I live in a house with three kids of primary-school age. Between them they have a variety of interests, but they share one in common that is almost entirely alien to me: Minecraft. This computer game involves – as the name suggests – mining for materials and then crafting useful objects from them. There are all sorts of dangers lurking for the game’s protagonist, most of them after the sun goes down, which leads to a well-known mantra among my kids and their gaming friends: don’t mine at night.
We have a similar warning that we frequently remind ourselves of in the Asset Allocation team (unfortunately without the associated catchy song) about the need to avoid data mining. However, for a recent joiner in our team who has a background in data science, this was the source of some confusion. From a data scientist’s perspective, the confusion is warranted: data mining is a key component of machine learning, and refers to extracting information from patterns in large datasets. Why, then, does the term have a negative connotation when used in our team?
What we really mean when we use the term ‘data mining’ in an investment context is ‘finding relationships in historical data that appear causal, and building a model that assumes that causation will continue, when in reality the relationship was coincidental and unlikely to persist’. It is relatively easy to over-parameterise a model and endlessly tweak it to produce the best possible back-test (i.e. simulated past performance); it is almost certain that such a model will underperform in the future, when those specific circumstances do not repeat themselves.
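A minimal sketch of why this happens, using entirely invented noise data: if you test enough random ‘signals’ against random returns, the best of them will look like a winning strategy in the back-test while being worthless out of sample. Nothing below reflects a real strategy; it only illustrates the selection effect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 1,000 random "signals" traded against pure-noise returns.
n_days, n_signals = 1000, 1000
returns = rng.standard_normal(n_days)               # market returns: pure noise
signals = rng.standard_normal((n_days, n_signals))  # candidate signals: also noise

# Daily P&L of each signal (long when the signal is positive, short when negative).
strat = np.sign(signals) * returns[:, None]

def sharpe(pnl):
    """Annualised Sharpe ratio of each column of daily P&L."""
    return pnl.mean(axis=0) / pnl.std(axis=0) * np.sqrt(252)

# Split into a back-test (in-sample) and a "live" (out-of-sample) period.
split = n_days // 2
is_sharpe = sharpe(strat[:split])
oos_sharpe = sharpe(strat[split:])

best = is_sharpe.argmax()  # cherry-pick the best back-test
print(f"best in-sample Sharpe:    {is_sharpe[best]:.2f}")
print(f"its out-of-sample Sharpe: {oos_sharpe[best]:.2f}")
```

The cherry-picked signal posts an impressive simulated Sharpe ratio purely by chance; out of sample it is indistinguishable from the noise it always was.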
The following example, offered by my colleague Chris Teschmacher, highlights the point quite well. His grandmother started playing golf in her late 50s, and hit her only ever hole-in-one in her 80s. A remarkable achievement, I’m sure you’ll agree. But the truth of the matter is that she did it with a thinned 7-iron and the ball bounced off a tree. It is unlikely anyone would advocate that the suggested process to achieve the same hole-in-one result going forward should be to aim for that tree.
In a similar vein, earlier this year Chris Jeffery discussed the Super Bowl indicator. This was a clear case of correlation without causation, even before the apparent relationship between American football and equity markets fell apart. But we see far more subtle examples of this type of spurious relationship all the time in our line of work.
The distinction we make between alternative risk premia (ARP) strategies and the much broader universe of quantitative strategies is this: for ARPs, there must be a behavioural or structural rationale for why the strategy should continue to work in the future, rather than its being just a combination of signals that happen to have worked well together in the past.
Passing the back-test
There are all sorts of reasons why even a well-founded rationale might not play out (e.g. crowding of the strategy, or a regime change that causes a structural shift and a breakdown of some prior imbalance), and being mindful of changes in the dynamics of strategies is an important part of our process. We often come across strategies proposed to us by third parties that have fantastic back-tests, but whose performance drops off a cliff once the strategy goes ‘live’.
We use a variety of statistical tests to determine whether the live performance belongs to the same distribution as the back-test; our research (backed up by academic literature on the topic) shows that, on average, one needs to apply a ‘haircut’ of more than 75% to a strategy’s simulated past performance to arrive at an indicator of its likely future performance.
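One simple member of that family of tests is the two-sample Kolmogorov–Smirnov statistic, which measures how far apart two empirical return distributions sit. The sketch below is illustrative only: it is not the team’s actual test battery, and every return figure in it is invented.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical
    gap between the empirical CDFs of samples a and b."""
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return float(np.abs(cdf_a - cdf_b).max())

rng = np.random.default_rng(1)
# Invented daily returns: a three-year back-test, then one "live" year drawn
# from the same distribution, and one from a degraded distribution.
backtest   = rng.normal( 0.0005, 0.010, 750)
live_same  = rng.normal( 0.0005, 0.010, 250)
live_worse = rng.normal(-0.0050, 0.015, 250)

print(f"KS vs similar live period:  {ks_statistic(backtest, live_same):.3f}")
print(f"KS vs degraded live period: {ks_statistic(backtest, live_worse):.3f}")
```

A statistic that is large relative to its null distribution flags that live returns no longer look like the back-test; in practice one would convert it to a p-value, and no single test would be used in isolation.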
But despite the myriad reasons why a strategy may not live up to expectations, we try where possible to rule out data mining as the cause of future failure by keeping the testing and design process as simple as possible.
For many applications of machine learning, this is not an issue; data mining is appropriate where the output, rather than the model process, is the only thing that matters (e.g. improved cancer diagnosis) and the inputs are relatively stationary and unlikely to see a structural break (e.g. a large sample of people).
There are also many types of cross-sectional applications in finance where machine-learning models improve upon more traditional tools, usually aided by the availability of large datasets. Financial time-series data, however, are rarely as well behaved, with little guarantee that the future will look much like the past, so we have to be extremely careful about how we make inferences.
As we increasingly incorporate new techniques into our toolkit, and hire data scientists into the team, it is important we don’t lose sight of our data minecraft mantras.