Finance and economics suffer from a severe disease. Hubris reigns in these domains, and our minds work against the logic and rationality that these fields require. Let's go through a few biases which cripple finance and economics, but also many other disciplines.
The normal distribution
Finance is the biggest victim of the normal distribution (also called the Gaussian distribution). This probability distribution can be represented in the form of a bell curve, as follows:
The problem with this distribution of events (say, stock returns) is that it does not apply at all in an overwhelming number of cases. For example, the daily returns of stocks do not fit such a distribution. Typically, there are huge "fat tails": extreme events are much more likely to occur than a normal distribution would allow.
Let's take an example to illustrate this point: the daily returns of Valeant Pharmaceuticals over the past year. For that period, the average daily return was -0.51%, and the standard deviation of the sample was 5.93%. Based on these data, the normal distribution tells us that virtually all events (about 99.99%) should fall within plus or minus 4 standard deviations of the mean (about 95% should already fall within plus or minus 2). That is, nearly every observation should be within [-24.24%; +23.21%]. So far so good, or almost: 99.60% of our values belong to this range, i.e. only one observation lies outside it.
However, this value is basically the only one of interest within the whole sample. Were you an investor, it would be the only one you would care about. Valeant experienced a -51.46% single-day drop on 15 March 2016. Such an enormous variation outweighs all others. Yet, according to a normal distribution with the parameters mentioned above (i.e. a mean of -0.51% and a standard deviation of 5.93%), the probability of such a variation is 4.39e-18. What our normal distribution tells us is that such a variation may statistically occur once every 600 trillion years. The universe is not that old, and may never last this long. Yet Valeant's stock did drop.
The worst thing in this example is that I have taken the whole sample to derive my mean and standard deviation. That is, the extraordinary drop was taken into account, which clearly shows that the normal distribution cannot cope with such outliers. Had I calculated them the day before the large drop, my probability would have been 5.11e-28, i.e. once every 5,000,000,000,000 trillion years. I think that we can settle on this number being approximately 'never'. But had you owned Valeant stock on the 15th of March, your investment in that company was halved in a matter of hours. So sad, too bad. The normal distribution never imagined such a thing.
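The tail probability quoted for the full-sample parameters can be roughly reproduced with a few lines of Python. This is only a sketch: it uses the rounded mean and standard deviation given in the text, so it lands on the same order of magnitude rather than on the exact figure, and the conversion to years assumes one observation per calendar day (my assumption, not stated in the text). The function name `normal_cdf` is chosen here for illustration.

```python
import math

def normal_cdf(x, mean, std):
    """P(X <= x) for X ~ Normal(mean, std), via the complementary error function."""
    z = (x - mean) / std
    return 0.5 * math.erfc(-z / math.sqrt(2))

mean, std = -0.51, 5.93   # full-sample daily mean and standard deviation, in %
drop = -51.46             # the single-day drop, in %

z_score = (drop - mean) / std          # how many standard deviations away
p_drop = normal_cdf(drop, mean, std)   # probability of a drop at least this bad

# Assumption: one observation per calendar day, 365 days per year.
years_between = 1 / p_drop / 365

print(f"z-score: {z_score:.2f}")
print(f"P(return <= {drop}%): {p_drop:.2e}")
print(f"expected wait: {years_between:.2e} years")
```

The drop sits more than eight standard deviations below the mean, which is why a normal model treats it as practically impossible.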
I admit that I chose this example on purpose to demonstrate my point, and not randomly. But maths work that way: if you find one counter-example, then the theorem falls and is not to be considered true. The normal distribution is a valid probability distribution. It holds mathematically. But it doesn't apply in finance nearly as often as is thought. It might make sense for specific situations, but the rule of nature and of financial markets is that it doesn't apply, and that rare events do occur.
Correlation and Causality
Correlation is a coefficient which indicates how two series change with respect to one another. Causality is a logical relationship which provokes a change in one series when an event happens in the other. Those two concepts are very different: correlation tells us "when this happened in our sample, then that happened in our other sample". Causality tells us "if this happens in our sample, then it will generate that in our other sample".
Correlation does not imply causality, whereas causality implies correlation. A bunch of great examples of correlations can be found on the website Spurious Correlations, which greatly helps to understand the difference with causality. Here is a good example:
These two phenomena do correlate. But there is no causal link between suicides by hanging and US spending on science, space and technology. Sure, after seeing this chart, you might be tempted to find explanations, to make up theories, to explain it one way or another. That is story-telling; it is the making of a narrative fallacy, as Nassim Nicholas Taleb would put it. Our minds want to understand all that they see, and we are always tempted to see causality.
We humans want to understand. We want to be able to explain things. Sometimes, we can't, either because the phenomenon that we want to understand is too complex, or because there is no explanation. Not everything is linked. Darwin showed us that there is no purpose in evolution. The fittest survive, which leads to evolution, but evolution is not a purpose. Nature doesn't try to improve its creatures. We need to restrain ourselves from trying to see causal links between phenomena that are merely correlated. We need to rely strongly on logic, and seek to dismantle causality, rather than falling into confirmation bias. The remaining causal links that we are not able to disprove will be all the stronger and more reliable.
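To see how easily correlation appears without any causal link, here is a small sketch with made-up data. Two series are generated independently of each other, and their only common feature is that both trend upwards over time. The series, their slopes and the helper names `pearson` and `diffs` are all hypothetical, chosen purely for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def diffs(xs):
    """Period-to-period changes of a series."""
    return [b - a for a, b in zip(xs, xs[1:])]

random.seed(42)
n = 200
# Two unrelated series: each is a time trend plus its own independent noise.
series_a = [1.0 * t + random.gauss(0, 2) for t in range(n)]
series_b = [0.5 * t + random.gauss(0, 2) for t in range(n)]

r_levels = pearson(series_a, series_b)                  # correlation of the levels
r_changes = pearson(diffs(series_a), diffs(series_b))   # correlation of the changes

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

The levels correlate almost perfectly simply because both grow with time, while the period-to-period changes show little or no relationship. Neither series has any influence on the other.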
Another big issue in economics and finance (but not only in these domains) is that we typically use confidence levels that are too low. A confidence level expresses how reliable we think our estimates are. For example, we typically compute a 95% Value at Risk (VaR) in finance. This VaR tells us that, with a 95% confidence level, the single-day loss should not exceed a certain number. It means that in 5% of cases, the loss might be greater. And 5% is huge. Where do you think the 2007 crisis fell? Within this range. Basically every crisis or burst bubble falls among these extreme events and is therefore only marginally accounted for.
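A minimal sketch of a historical 95% VaR, on synthetic data, shows what the number does and does not say. The returns below are invented (499 ordinary days plus one crash day, all in %), and `historical_var` is a hypothetical helper that uses one simple cutoff convention among several in use.

```python
import random

def historical_var(returns, confidence=0.95):
    """Loss threshold exceeded on roughly (1 - confidence) of the days in the sample."""
    ordered = sorted(returns)                      # worst returns first
    index = int((1 - confidence) * len(ordered))   # position of the cutoff
    return -ordered[index]                         # report the loss as a positive number

random.seed(1)
# Synthetic daily returns in %: 499 quiet days and a single -20% crash.
returns = [random.gauss(0, 1) for _ in range(499)] + [-20.0]

var_95 = historical_var(returns)
worst_loss = -min(returns)
tail_losses = [-r for r in returns if -r > var_95]   # days worse than the VaR

print(f"95% VaR:            {var_95:.2f}%")
print(f"days beyond VaR:    {len(tail_losses)}")
print(f"worst loss in tail: {worst_loss:.2f}%")
```

The VaR only marks where the worst 5% of days begin. It says nothing about how bad those days can get, and in this sample the worst one is many times larger than the VaR itself.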
A confidence level of 95% might seem high. But it actually means that one in twenty observations might be a false positive. That is, if you build a model based on such a confidence level, there is a one in twenty chance that it is not reliable. In particular when building models in economics and finance, where a lot of adjustments are made to the criteria and parameters in order to produce a proper finding, this confidence level needs to be much higher.
A comic strip from XKCD, Significant, brilliantly exposed the need for a stronger standard in modelling and statistics. Changing their hypothesis twenty times in order to find something, the researchers end up finding nothing but a false positive. Note that the hypothesis they were trying to prove had already been invalidated by the first finding.
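The arithmetic behind the comic can be sketched in a few lines. Assuming the twenty tests are independent (a simplification), the chance of at least one false positive at a 5% threshold is far above 5%. The last lines show the Bonferroni correction, a common and conservative remedy that is my addition and is not discussed in the text above.

```python
def chance_of_false_positive(alpha, n_tests):
    """Probability of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

alpha = 0.05     # per-test threshold, i.e. a 95% confidence level
n_tests = 20     # number of hypotheses tried

p_any = chance_of_false_positive(alpha, n_tests)

# Bonferroni correction: divide the threshold by the number of tests.
bonferroni_alpha = alpha / n_tests
p_any_corrected = chance_of_false_positive(bonferroni_alpha, n_tests)

print(f"chance of at least one false positive: {p_any:.1%}")
print(f"per-test threshold after correction:   {bonferroni_alpha}")
print(f"chance after correction:               {p_any_corrected:.1%}")
```

With twenty attempts, a spurious "finding" is more likely than not, which is why the per-test standard has to be much stricter when many hypotheses are tried.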
Building a model is hard. There are a huge number of biases which can affect our result, and also the way we build the model. Indeed, models typically rely on past observations. This overlooks the fact that the future is not like the past. LTCM, a hedge fund, learned this the hard way in 1998, when it went broke in a matter of weeks, though it thought it was solvent and liquid. LTCM relied on past volatility as a basis to anticipate future volatility. It didn't work. The fund thought the worst one-day loss it could experience (95% VaR... see above) was about $40 million. It lost $550 million on August 21, 1998. Past data are a bad basis for predicting the future.
Building models is therefore very hard. But assessing the reliability of a model is harder still. A widely known trap consists in testing the model against the data which helped build it. If your model succeeds, it doesn't tell you anything about its validity, because it was designed to fit the data you are using. If it fails, it's even worse: your model doesn't even hold together on its own data.
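The trap of testing a model on the data that built it can be illustrated with a deliberately silly model. In this sketch the data are pure noise, so there is nothing to learn, and the "model" simply memorises its training points (a 1-nearest-neighbour lookup). All names and numbers are hypothetical.

```python
import random

random.seed(0)

def make_data(n):
    """Pure noise: x carries no information at all about y."""
    return [(random.random(), random.gauss(0, 1)) for _ in range(n)]

def predict(train, x):
    """Return the y of the training point whose x is closest (memorisation)."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def mean_squared_error(train, data):
    return sum((predict(train, x) - y) ** 2 for x, y in data) / len(data)

train = make_data(200)      # data used to build the model
holdout = make_data(200)    # fresh data the model has never seen

in_sample = mean_squared_error(train, train)
out_of_sample = mean_squared_error(train, holdout)

print(f"error on the data that built the model: {in_sample:.3f}")
print(f"error on unseen data:                   {out_of_sample:.3f}")
```

The model looks perfect on its own data and useless on new data. Only the second number says anything about its validity.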
Another danger lies in collecting the data to build the model. It is hard to have objective data, and to use the full set, the full universe, or at least a representative sample. The risk is, for example, to fall into survivor bias. Say you want to assess the performance of investment managers. If you take all the investment managers in activity today and compute their track record over the last 10 years, you are doing it wrong. All the investment managers who went broke during the period are de facto excluded from the sample. The correct procedure would be to take all the investment managers at a given date, and observe what happens after that. Those are only a portion of the biases that affect model builders.
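The survivor bias described above can be illustrated with a small simulation. It is a sketch under invented assumptions: 1,000 managers with no skill at all (average yearly return of zero, 20% volatility), each of whom closes shop after losing half of the starting capital. None of these parameters come from real data.

```python
import random

random.seed(7)
N_MANAGERS, N_YEARS = 1000, 10
RUIN_LEVEL = 0.5   # hypothetical: a manager goes broke after losing half the capital

final_wealth = []  # one entry per manager in the starting cohort
survived = []      # True if the manager is still in activity after N_YEARS

for manager in range(N_MANAGERS):
    wealth, alive = 1.0, True
    for year in range(N_YEARS):
        wealth *= 1 + random.gauss(0.0, 0.2)   # zero skill: mean return of 0
        if wealth < RUIN_LEVEL:
            alive = False                      # out of business, track record stops
            break
    final_wealth.append(wealth)
    survived.append(alive)

# Correct procedure: everyone who was there at the start.
cohort_mean = sum(final_wealth) / N_MANAGERS
# Biased procedure: only those still in activity today.
survivors = [w for w, ok in zip(final_wealth, survived) if ok]
survivor_mean = sum(survivors) / len(survivors)

print(f"survivors: {len(survivors)} of {N_MANAGERS}")
print(f"average final wealth, full cohort:    {cohort_mean:.2f}")
print(f"average final wealth, survivors only: {survivor_mean:.2f}")
```

Looking only at survivors makes a group of skill-less managers look better than the cohort actually performed, simply because the failures have been removed from the sample.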
- Proper distributions need to be used (i.e. rarely a normal distribution).
- Confidence levels need to be appropriately high.
- We must be aware of the biases which affect us.
- Long story short, statistics are for statisticians.
Readings to go further (and better)
Here are a few things to read and listen to that I recommend to any person with an interest in finance or economics.
- James Heckman on Facts, Evidence, and the State of Econometrics: difficulty of interpreting data and collecting reliable data, complexity of the world and insufficiency of models.
- Townsend on Development, Poverty, and Financial Institutions: survivor bias, dangers of mixing causality and correlation.
- Paul Pfleiderer on the Misuse of Economic Models: cherry-picking of data, chameleon models, calibration of models and non-reproducibility of findings
- Nassim Nicholas Taleb Podcast Episodes and Extras: normal distribution fallacy, 'Black Swans' (rare events)
- Hansen on Risk, Ambiguity, and Measurement: failure of modelling and low reliability of models based on historical observations
- Fama on Finance: how past information can't help you predict future price variations (Efficient Markets Hypothesis)
- Campbell Harvey on Randomness, Skill, and Investment Strategies: back-testing, bad practices in modelling, confidence levels, and reliability of models.
Books and articles:
- Nassim Nicholas Taleb, The Black Swan: on bias of human thinking in general.
- Roger Lowenstein, When Genius Failed: an example of failure of modelling and risk management.
- Friedrich Hayek, Nobel Prize Lecture, The Pretence of Knowledge: how little economists actually know, and how they can't predict the future.