Machine learning challenges in finance

23 October 2020


by Dr Mehrshad Motahari, Research Associate, Cambridge Centre for Finance and Cambridge Endowment for Research in Finance


Machine learning (ML) is the most important branch of artificial intelligence (AI), providing tools with wide-ranging applications in finance. My previous blog posts (‘Can robots beat the market?’ and ‘Artificial intelligence in asset management: hype or breakthrough?’) discuss some of the most important ML applications in finance. The success of ML is often linked to three key capabilities: providing flexible functional forms that can capture nonlinearities in data, selecting relevant model features without pre-specification, and extracting information from non-numerical data sources such as text. However, recent studies, including Israel et al. (2020) and Karolyi and Van Nieuwerburgh (2020), outline several challenges in using ML in finance. What follows is a summary of these challenges.

Finance is often thought of as a field awash with applicable data, ranging from traditional financial and economic sources to more recent unstructured data such as online news and social media posts. While the breadth of data that can be used in finance is large, the time series are often very short by ML standards. With only a limited number of time series observations, any model fitted to the data must be kept proportionally small. As a result, data-hungry ML tools cannot operate anywhere near their full potential. Nor can financial data be generated through experiments, as is done in other fields. In image recognition, for example, which is a successful area of ML application, scientists can simply produce millions of photos for the models to train on. In finance, however, one has no alternative but to wait for the data to be produced over time.
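To put rough numbers on how short these time series are, the sketch below counts the observations available for a single asset and divides them by a model's parameter count. All figures are illustrative assumptions rather than values from the studies cited: a 50-year history, 252 trading days per year (a common convention), and two hypothetical model sizes.

```python
# Rough sample-size arithmetic; every figure here is an illustrative assumption.
PERIODS_PER_YEAR = {"monthly": 12, "daily": 252}  # 252 trading days is a common convention


def n_observations(years: int, frequency: str) -> int:
    """Number of time series observations for one asset."""
    return years * PERIODS_PER_YEAR[frequency]


def observations_per_parameter(n_obs: int, n_params: int) -> float:
    """How many observations are available to pin down each model parameter."""
    return n_obs / n_params


monthly_obs = n_observations(50, "monthly")  # 600
daily_obs = n_observations(50, "daily")      # 12600

# Hypothetical model sizes: a 10-coefficient linear model and a small neural network.
for n_params in (10, 10_000):
    ratio = observations_per_parameter(monthly_obs, n_params)
    print(f"{n_params:>6} parameters: {ratio:.2f} monthly observations each")
```

Under these assumptions, even half a century of monthly data yields only a few hundred observations, far fewer than the millions of examples available in image recognition.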

There are exceptional cases in finance where data is available at high frequency, such as high-frequency trading (HFT) data, providing ML tools with a larger number of observations across time to learn from. However, even in these cases, ML faces its second-biggest challenge: a low signal-to-noise ratio. ML tools are highly dependent on data quality, and poor-quality, noisy data leads to unreliable models. It is no surprise that financial data is considerably noisy, especially at high frequencies. The reason is that, under the Efficient Market Hypothesis (EMH), only one variable should be predictable in fully efficient financial markets. That variable is the risk premium, which is small and difficult to capture over short horizons. In the absence of large and reliable datasets, ML tools in finance are essentially tasked with finding a needle in a haystack.
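The effect of a low signal-to-noise ratio can be illustrated with a small simulation. This is a minimal sketch under assumed values, not a model taken from the cited studies: returns are generated as a tiny predictable component (slope `BETA`) plus much larger noise (`NOISE_SD`), and a one-variable regression is re-estimated on samples of different lengths. The function names and parameter values are hypothetical.

```python
import random


def simulate(n_obs, beta, noise_sd, rng):
    """Draw a signal x and a return r = beta * x + noise for n_obs periods."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    rs = [beta * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, rs


def ols_slope(xs, rs):
    """Slope from a one-variable least-squares regression of r on x."""
    mean_x = sum(xs) / len(xs)
    mean_r = sum(rs) / len(rs)
    cov = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, rs))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def population_r2(beta, noise_sd):
    """Share of return variance explained by the signal when var(x) = 1."""
    return beta ** 2 / (beta ** 2 + noise_sd ** 2)


BETA, NOISE_SD = 0.1, 1.0  # assumed values: the predictable part is tiny
rng = random.Random(0)

print(f"population R^2: {population_r2(BETA, NOISE_SD):.4f}")
for n_obs in (120, 600, 20_000):
    # Re-estimate the slope on 20 independent samples of the same length.
    slopes = [ols_slope(*simulate(n_obs, BETA, NOISE_SD, rng)) for _ in range(20)]
    print(f"n={n_obs:>6}: slope estimates range {min(slopes):+.3f} to {max(slopes):+.3f}")
```

With these assumed parameters the signal explains about 1% of the variance in returns, so slope estimates from short samples should scatter widely around the true value of 0.1, and only the longest sample pins it down tightly.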

Another difference between finance and other areas in which ML is applied is data evolution. Taking image recognition again as an example, images of humans always share the same features, and ML tools can learn to recognise images from these features. In contrast, financial data changes and evolves over time, as do the financial markets themselves. It is therefore difficult to assume that financial variables carry the same meaning they had several decades ago. There are, of course, economic principles that do not change over time and that underlie the markets’ behaviour. However, most ML models are so-called black boxes and provide no insight into how they produce specific results. This lack of interpretability makes it difficult to tell whether an ML model is capturing economically meaningful patterns or pure noise.
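The data evolution problem described above can be sketched with a toy example. It assumes, purely for illustration, that the relationship between a signal and returns flips sign between an early and a late period. The regime parameters and function names are hypothetical and do not come from the cited papers.

```python
import random


def make_regime(n_obs, beta, noise_sd, rng):
    """Simulate (x, y) pairs where y = beta * x + noise."""
    data = []
    for _ in range(n_obs):
        x = rng.gauss(0.0, 1.0)
        data.append((x, beta * x + rng.gauss(0.0, noise_sd)))
    return data


def fit_slope(data):
    """Least-squares slope through the origin."""
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)


def mse(data, slope):
    """Mean squared error of the forecast slope * x."""
    return sum((y - slope * x) ** 2 for x, y in data) / len(data)


rng = random.Random(42)
early = make_regime(500, beta=0.5, noise_sd=0.5, rng=rng)   # assumed old regime
late = make_regime(500, beta=-0.5, noise_sd=0.5, rng=rng)   # assumed new regime

stale_slope = fit_slope(early)  # model trained only on the early period
print(f"slope learned on early data: {stale_slope:+.2f}")
print(f"error on early data:         {mse(early, stale_slope):.2f}")
print(f"error on late data:          {mse(late, stale_slope):.2f}")
print(f"late-data error of a zero forecast: {mse(late, 0.0):.2f}")
```

In this setup the model trained on the early period performs worse on the late period than a forecast of zero. A black-box model would give no direct indication that the relationship it learned had changed.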

ML tools now have essential applications in finance. The three main ML challenges, namely the lack of data, the low signal-to-noise ratio, and the absence of model interpretability, now define the frontier of research in finance. A growing number of papers attempt to find novel and creative solutions to these issues (see Israel et al., 2020). These developments can pave the way for a stronger presence of ML in finance in the years to come.


References

Israel, R., Kelly, B.T. and Moskowitz, T.J. (2020) “Can machines ‘learn’ finance?” Social Science Research Network, No. 3624052.

Karolyi, G.A. and Van Nieuwerburgh, S. (2020) “New methods for the cross-section of returns.” Review of Financial Studies, 33(5): 1879-1890.