Causation and correlation

Yet another thing that should be taught in schools, but it’s a bit technical and somewhat boring. Mind you, it beats the coalfields of Saarbrücken, which I had to learn about for O-level geography.

Figure 1: Everyone’s idea of cause and effect

I don’t really know what intelligence is (I don’t think anyone else does either), but spotting patterns is needed to understand a hazard: we did X and Y happened, and since we don’t want Y again, let’s not do X. We are taught such rules from childhood: look both ways before you cross the road or a car might hit you, don’t put your fingers in the mains socket or you will get hurt, and so on. Many things are based on this simple and, as far as it goes, undeniably true model. But life is more complex than that, and thinking that this is the complete picture is misleading.

In my world of engineering, quite a few problems are simple enough for this model to be useful. We engineers like to keep things as simple as possible[1] because we know that complexity is the enemy of reliability. Of course, sometimes you need complexity, but in general the engineers’ maxim of KISS (“keep it simple, stupid”) works. Software engineers, in particular, are told to keep things as simple and as clear as possible so that, to paraphrase Tony Hoare, there are “obviously no bugs, rather than no obvious bugs”.

When something goes wrong we need to find out why, so we perform a “root cause analysis”: finding out what mistake (bug) in the design caused the failure in question. I’ve been involved in quite a few of these and have often noticed the lack of correlation between the complexity of a cause (the bug) and that of its effect (the failure in the vehicle). I was once faced with the most complex pattern-dependent errors on a communications link. I tried to analyse which messages were transmitted correctly and which were not, hunting for what was clearly a deeply buried processing error. It turned out to be a couple of small and normally unimportant components that had the wrong value: a very simple cause that produced very complex results. Against that, I’ve had quite complex bugs that led to something as simple as the wrong engine temperature being displayed. In one case, three separate modules were each not doing the right thing and each was trying to cope with its companions’ mistakes.

Knuth describes a program he designed to generate a sequence of ten-digit random numbers[2] that was as complex as he could make it. To generate the next number in the sequence, his algorithm used the previous random number to pick one of ten different random number algorithms. A more complex cause is difficult to imagine. Yet the first time he ran it, the sequence quickly got stuck on a single number that the algorithm transformed into itself, which is hardly a complex outcome. Even in engineering, where things are generally kept as simple as possible, the simple cause-and-effect model can break down.
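Knuth’s experience is an instance of a general fact. Any deterministic rule that maps a finite set of states (such as ten-digit numbers) back into itself must eventually revisit a state, and from then on it repeats forever, however convoluted the rule is. The Python sketch below uses a made-up four-digit generator to find where the sequence starts looping and how long the loop is. The generator, `messy_step`, is hypothetical and is not Knuth’s algorithm.

```python
def messy_step(x: int) -> int:
    """Hypothetical, deliberately convoluted update rule on four-digit numbers (0..9999).

    The leading two digits of the current value choose which of four unrelated
    'methods' produces the next value. An illustration only, not Knuth's Algorithm K.
    """
    choice = (x // 100) % 4                  # leading two digits (mod 4) pick a method
    if choice == 0:
        return (x * x // 100) % 10_000       # middle-square step
    if choice == 1:
        return (x * 37 + 11) % 10_000        # linear congruential step
    if choice == 2:
        return int(f"{x:04d}"[::-1])         # reverse the four digits
    return (x ^ (x >> 3)) % 10_000           # bit-mixing step


def find_cycle(step, seed: int) -> tuple[int, int]:
    """Return (tail_length, cycle_length) for the sequence seed, step(seed), ...

    Records the index at which each value was first seen; the first value seen
    twice is where the sequence enters its loop. Fine for small state spaces.
    """
    first_seen: dict[int, int] = {}
    x, i = seed, 0
    while x not in first_seen:
        first_seen[x] = i
        x = step(x)
        i += 1
    tail = first_seen[x]
    return tail, i - tail


for seed in (1234, 2025, 9999):
    tail, cycle = find_cycle(messy_step, seed)
    print(f"seed {seed}: starts looping after {tail} steps, loop length {cycle}")
```

With only 10,000 possible states, a loop must appear within 10,000 steps. Knuth’s ten-digit state space is far larger, but the same pigeonhole argument applies.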

The trouble is that the cause-and-effect model is so obvious that few people question it. When trying to find the cause of an accident, we start with the final event and work backwards, like this:

Figure 2: Discovering the root cause

Then having done so, we are confident that what happened was this:

Figure 3: Naïve model of causality

Then we argue that if we cure the root cause, that effect will never happen again. Real life is more complex than this: what if there are ten more root causes waiting in the wings to cause more accidents? For example, what was the root cause of the accident involving the ship Baltic Star[3], which ran aground in thick fog at full speed?

  • One of the boilers had broken down;
  • The steering system reacted only slowly;
  • The compass was maladjusted;
  • The captain had gone down into the ship to telephone;
  • The lookout on the prow was taking a coffee break;
  • The pilot had given an order to the helmsman;
    • that was wrong;
    • in English to a sailor who understood only Greek;
    • and who was hard of hearing.

I like this example because I don’t think anyone got hurt, and it’s a bit funny. If you want a much more serious example, look up the chain of failures that led to the 1984 Bhopal disaster, often described as the world’s worst industrial accident.

After analysis, a lot of accidents seem to have happened through sheer coincidence, leading to the attitude that “accidents are always going to happen”. This is a false way of looking at things. It’s not a simple causal chain but a complicated net, and you are looking at just one thread. What you have actually got might look more like this:

Figure 4 – A realistic model of how an accident happens


On another day you might have this, when a different root cause feeds through to a different effect:

Figure 5 – Solving one root cause is not enough
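Figures 4 and 5 can be made concrete with a small graph. The Python sketch below treats causes and effects as a directed graph of “grey arrows”. It then asks which root causes can still reach the accident once one event is prevented. The node names and arrows are purely illustrative and loosely based on the Baltic Star list above. They are assumptions for the sake of the example, not a real accident analysis.

```python
from collections import deque

# Hypothetical causal net: each event can cause each event in its list.
CAUSES = {
    "boiler failure": ["slow steering"],
    "maladjusted compass": ["wrong heading"],
    "captain below decks": ["nobody checking the pilot"],
    "lookout on break": ["hazard not seen"],
    "pilot's wrong order": ["wrong heading"],
    "language barrier": ["order not understood"],
    "order not understood": ["wrong heading"],
    "slow steering": ["late correction"],
    "nobody checking the pilot": ["wrong heading"],
    "wrong heading": ["grounding"],
    "hazard not seen": ["late correction"],
    "late correction": ["grounding"],
}


def can_reach(graph, start, target, removed=frozenset()):
    """Breadth-first search: can `start` still lead to `target` if the events in `removed` are prevented?"""
    if start in removed:
        return False
    queue, seen = deque([start]), {start}
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen and nxt not in removed:
                seen.add(nxt)
                queue.append(nxt)
    return False


def root_causes(graph):
    """Events that nothing else in the graph causes."""
    caused = {n for targets in graph.values() for n in targets}
    return sorted(n for n in graph if n not in caused)


def remaining_routes(graph, target, removed):
    """Root causes that can still reach `target` after the `removed` events are prevented."""
    return [r for r in root_causes(graph) if can_reach(graph, r, target, removed)]


# "Fix the root cause" (the pilot's order) and five other routes to the accident remain:
print(remaining_routes(CAUSES, "grounding", {"pilot's wrong order"}))
# ['boiler failure', 'captain below decks', 'language barrier', 'lookout on break', 'maladjusted compass']

# Even blocking a shared intermediate event leaves other threads in the net:
print(remaining_routes(CAUSES, "grounding", {"wrong heading"}))
# ['boiler failure', 'lookout on break']
```

Curing one root cause removes one thread. The accident stays reachable as long as any path of grey arrows survives.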


The fundamental systemic problem is all the grey arrows, which represent the ability of one event to cause another. A safe system is one in which, apart from its intended operation, no event can propagate to cause another. This is analogous to epidemiology and the famous R number, the average number of people each infected person goes on to infect. If R stays below 1, each outbreak fizzles out, and the disease won’t spread at all if we are all cooped up in separate rooms without sharing air.
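The R-number analogy can be sketched as a simple branching process. The model is an assumption for illustration only. Each event has a fixed number of outgoing grey arrows, and each arrow independently “fires” with the same probability. The average number of follow-on events per event is then R = arrows × probability. Standard branching-process theory says that for R below 1 a cascade dies out, with an average total size of 1 / (1 − R). Above 1, a cascade can run away.

```python
import random


def cascade_size(arrows: int, p_fire: float, rng: random.Random, cap: int = 1_000) -> int:
    """Count all events set off by one initial event in a simple branching model.

    Each event has `arrows` grey arrows; each arrow independently triggers a new
    event with probability `p_fire`. The count is capped so that a runaway
    (R > 1) cascade still terminates.
    """
    total = 1      # the initial event
    pending = 1    # events whose consequences have not been resolved yet
    while pending and total < cap:
        pending -= 1
        triggered = sum(rng.random() < p_fire for _ in range(arrows))
        total += triggered
        pending += triggered
    return min(total, cap)


rng = random.Random(42)
for p_fire in (0.0, 0.125, 0.2, 0.3):
    r = 4 * p_fire                       # R = average follow-on events per event
    sizes = [cascade_size(4, p_fire, rng) for _ in range(2_000)]
    theory = f"{1 / (1 - r):.2f}" if r < 1 else "unbounded (capped at 1,000)"
    print(f"R = {r:.1f}: mean cascade size {sum(sizes) / len(sizes):.1f}, theory {theory}")
```

Removing grey arrows (fewer arrows, or a lower chance of each firing) pushes R below 1. That is the same lever the safety culture described next is pulling.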

The more modern way of preventing accidents tries to get rid of the grey arrows by introducing a “safety culture” in which reporting potential hazards is easy. These concerns are listened to and, most importantly, nobody gets punished if a reported concern turns out not to be an issue.

So cause always precedes effect[4], but the network of causes and effects is more complex than it appears at first sight, and very often more complex than can ever be discovered. Thinking that cause and effect is a linear chain of consequences, and so arguing that preventing the root cause will end the issue, is naïve, yet it is widely taught and very prevalent. Here’s an old adage that takes this to an extreme:

For want of a nail, the shoe was lost;
For want of the shoe, the horse was lost;
For want of the horse, the rider was lost;
For want of the rider, the battle was lost;
For want of the battle, the kingdom was lost;
And all from the want of a horseshoe nail.

Really? What if one of the knights had been a bit quicker with his sword? Or if it hadn’t rained the night before, making the ground treacherous underfoot?

The law of unintended consequences[5]

The law of unintended consequences is closely allied to this. If it is naïve to think that there is just one path through the forest, then it is also simple-minded to think that an action will have just one outcome (and that that outcome is the desired one). An excellent example of this is funding irrigation systems in underdeveloped countries. Nothing could be better: bring water to land that doesn’t have enough to grow crops and thereby alleviate hunger – no chemicals, no big polluters, just water. Except that with all that water sitting around, mosquitoes had somewhere to breed and malaria made a comeback.

When India was ruled by the British, there was an infestation of cobras in Delhi, so the British offered a cash reward for every cobra caught and killed. This worked for a while, but enterprising locals saw an opportunity and set up snake farms just to supply dead cobras. When the British realised this, they stopped the reward. At that point the farmed snakes, now worthless, were released into the wild, making the infestation a lot worse.

Figure 6: Indian cobra

My partner thinks that a lot of the problems in politics these days are caused by the PPE (politics, philosophy and economics) degree that many of our Great Leaders seem to take at Oxford University. With three subjects in three years, they don’t have time to go into much depth, and so only touch on the basics. The law of unintended consequences doesn’t seem to get taught, nor does the real complexity of causality.

Our government wanted to raise money by changing the rules on when you have to pay income tax, so that you could no longer avoid it by living out of the country for part of every year. It’s also great politics for a Labour government – make the rich pay their fair share[6]. They had to change this policy when it was pointed out that it would actually lose them money: those rich enough to be caught by the scheme would just leave the country and pay no taxes here at all. They have also taxed private education, but now private education is only for the very rich rather than just the rich (it used to be possible for the children of middle earners like me). A lot of private schools have closed down, putting strain on an already overburdened national education system. In fact I don’t think they think about the consequences at all; they just do things that sound good to them. I’d like to say that our current government is a particularly stupid one, but it isn’t. However, I digress.


Correlation and causation

If we have a tendency to oversimplify the real world, then our readiness to infer causation from correlation makes this much worse. We like rules and patterns, and we search for them constantly by looking for correlations between causes and effects. I think this has strong evolutionary roots; to me it feels like the basis of intelligence. But it can give quite erroneous results because we are, in some ways, too good at it. False correlations are everywhere, but they arise from only a few mechanisms.

After it, therefore because of it

Or, as President Bartlet would say, post hoc ergo propter hoc[7]. This is one of the most deep-seated ways in which the human brain finds patterns, and it has the advantage of quite often being correct. But not always.

I don’t know how apocryphal the following story is, but it illustrates the point nicely. In the 1950s and 1960s, before everyone had a freezer in the kitchen (and yes, I am that old), if you wanted ice-cream you had to buy it from the shop and eat it immediately. One particular family in America used to have dinner together and then jointly decide which flavour of ice-cream they would have for dessert. The father would then go out in the car, park just outside the local shop, buy the ice-cream and return. He bought a new car and a few days later took it back, complaining that it would not carry vanilla ice-cream, although it was fine with other flavours. On further investigation, as they say, the garage that sold the car found that the fuel line could create a vapour lock, which meant that the car would not start for a short time after stopping. Vanilla, being the most popular flavour, was at the front of the store, whereas all the less popular flavours were at the back. The extra time taken to walk to the rear of the store and back again was enough for the vapour lock to disperse. He had an almost perfect correlation between the flavour of the ice-cream and his car not starting; his evidence was overwhelming, and statistically significant. Even if he was real, I cannot really imagine the driver thinking that his car was disabled by the smell of vanilla flavouring, but we all suffer from such misattributions to a greater or lesser extent.
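The vanilla story can be replayed as a toy simulation in Python. Every number in it is an invented assumption: the 60% vanilla share, the time in the shop, the queue, and the vapour-lock clearing time. The only rule that decides whether the engine starts is how long the car was parked, and flavour never enters that rule. The failure rate still splits sharply by flavour, and the split disappears once you compare stops of the same length.

```python
import random

VAPOUR_LOCK_CLEARS_AFTER = 4.0   # minutes; an invented figure for illustration


def trip(rng: random.Random) -> tuple[str, float, bool]:
    """One hypothetical trip to the shop: (flavour, minutes parked, failed to start?)."""
    flavour = "vanilla" if rng.random() < 0.6 else rng.choice(["chocolate", "strawberry", "pistachio"])
    # Vanilla is at the front of the shop, so that visit is short; other flavours
    # mean a walk to the back. A random queue adds some variation to both.
    in_shop = rng.uniform(1.5, 3.0) if flavour == "vanilla" else rng.uniform(4.5, 7.0)
    parked = in_shop + rng.expovariate(1.0)
    # The engine only "knows" how long it has been parked; flavour never appears here.
    failed = parked < VAPOUR_LOCK_CLEARS_AFTER
    return flavour, parked, failed


def failure_rate(trips, keep) -> float:
    """Fraction of failed starts among trips for which keep(flavour, parked) is true."""
    chosen = [failed for flavour, parked, failed in trips if keep(flavour, parked)]
    return sum(chosen) / len(chosen) if chosen else float("nan")


rng = random.Random(7)
trips = [trip(rng) for _ in range(1_000)]
print("after vanilla:          ", failure_rate(trips, lambda f, p: f == "vanilla"))
print("after other flavours:   ", failure_rate(trips, lambda f, p: f != "vanilla"))
print("vanilla, but a long stop:", failure_rate(trips, lambda f, p: f == "vanilla" and p >= VAPOUR_LOCK_CLEARS_AFTER))
```

The last line is the garage’s insight in miniature: once the length of the stop is known, the flavour tells you nothing more.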

Confounding variables


Confounding variables are another source of false correlations. If cause A correlates with effect B, it may be that rather than A causing B, something else is causing both A and B.

Figure 7 – Confounding variables (hidden causes)


For example, coffee drinkers have more heart disease than non-coffee drinkers, so coffee is bad for your health, right? However, coffee drinkers are also more likely to smoke, and once smoking is taken into account, coffee in moderation may even be a benefit. (You guessed it: I drink a lot of coffee but don’t smoke.)
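A confounder can be shown with a few lines of Python. In the sketch below, all the rates are invented assumptions, and coffee is deliberately given no effect on heart disease: only smoking matters, and smokers are simply more likely to drink coffee. Comparing coffee drinkers with non-drinkers across everyone still shows a gap. With these made-up rates it is about 11% against 6.5% in expectation. Within each smoking group, the gap vanishes.

```python
import random


def person(rng: random.Random) -> tuple[bool, bool, bool]:
    """One hypothetical person: (smokes, drinks coffee, has heart disease). All rates invented."""
    smokes = rng.random() < 0.25
    drinks_coffee = rng.random() < (0.8 if smokes else 0.4)   # smokers drink more coffee
    # Heart-disease risk depends ONLY on smoking; coffee has no effect in this model.
    disease = rng.random() < (0.20 if smokes else 0.05)
    return smokes, drinks_coffee, disease


def disease_rate(people, coffee: bool, smokes=None) -> float:
    """Share with heart disease among coffee drinkers (or non-drinkers), optionally within one smoking group."""
    group = [d for s, c, d in people if c == coffee and (smokes is None or s == smokes)]
    return sum(group) / len(group)


rng = random.Random(3)
people = [person(rng) for _ in range(100_000)]
print(f"everyone:    coffee {disease_rate(people, True):.3f} vs none {disease_rate(people, False):.3f}")
for smokes in (True, False):
    label = "smokers:" if smokes else "non-smokers:"
    print(f"{label:<13}coffee {disease_rate(people, True, smokes):.3f} vs none {disease_rate(people, False, smokes):.3f}")
```

Splitting the data by the hidden cause (stratifying) is the simplest way to expose a confounder. It only works if you have thought of the confounder and measured it.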

The favourite one is that ice-cream causes sunburn (or drownings). They are indeed correlated, but only because hot, sunny weather drives all three.

Figure 8 – Correlation does not imply causation

Pure chance

Correlation does not rule out causation, of course, but it is not at all reliable evidence of it. Given enough data, some correlations will be purely coincidental, and false correlations abound. An example is the correlation between the number of non-commercial space launches and the number of sociology doctorates awarded in the U.S.[8], shown below.
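How easily chance produces such pairs can be sketched in Python. The “statistics” below are 200 invented random walks, ten values each, like a decade of annual figures, and nothing connects any of them. Searching all pairs for the strongest correlation is exactly what you are doing when you trawl a large dataset for patterns. The series length, the number of series and the seed are arbitrary assumptions.

```python
import itertools
import math
import random


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def random_walk(rng: random.Random, years: int) -> list[float]:
    """An invented 'annual statistic' that drifts up and down for no reason at all."""
    value, series = 0.0, []
    for _ in range(years):
        value += rng.gauss(0.0, 1.0)
        series.append(value)
    return series


rng = random.Random(2024)
series = [random_walk(rng, years=10) for _ in range(200)]   # 200 unrelated statistics
best = max(itertools.combinations(range(len(series)), 2),
           key=lambda pair: abs(pearson(series[pair[0]], series[pair[1]])))
r = pearson(series[best[0]], series[best[1]])
print(f"best of {len(series) * (len(series) - 1) // 2} pairs: series {best}, r = {r:.3f}")
```

Short, trending series like these are particularly prone to impressive-looking correlations, which is why so many of the chance examples involve annual figures over a decade or so.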

Figure 8 – Example of correlation by chance

The Church of the Flying Spaghetti Monster does a lovely T-shirt showing the correlation between the increasing temperature of the globe and the number of pirates:

Figure 9 – Correlation between global warming and the number of pirates

It’s also a wonderful example of dishonest presentation. At first glance the correlation looks convincing, and it’s easy to conclude that all we need to do to stop the storms in Florida is to have more pirates. The x-axis bears closer examination, though: not only is its scale highly non-linear, but its values go up and then down.

Pure chance turns up in other guises too. When people reach 100, they usually get interviewed and asked “What is the secret of a long life?” From the centenarian’s point of view, the honest answer is probably pure chance[9]. (It could also be sampling bias: we only ever ask the people who made it to 100, never those with the same habits who didn’t.)

I’m guilty too. I once worked for a company that was doing really well; we couldn’t hire fast enough. Along came a reporter who asked us the secret of our success, and we all came up with things we were doing differently, feeling rather full of ourselves for being asked. The truth was that we got lucky: we had found a niche market in which, at that time, it was pretty easy to succeed. That market has since gone away, and our company went under.
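Luck of this kind is easy to underestimate, so here is a Python sketch of it. The parameters are invented assumptions: 10,000 companies over 8 years, where each company’s “good year or bad year” is a fair coin toss. On average 10,000 / 2⁸ ≈ 39 of them will have a flawless record by luck alone. Those are the ones the reporter visits, and each will have a story about why.

```python
import random


def lucky_streaks(companies: int, years: int, rng: random.Random) -> int:
    """How many of `companies` have a good year every year, when each year is a fair coin toss?"""
    return sum(
        all(rng.random() < 0.5 for _ in range(years))
        for _ in range(companies)
    )


rng = random.Random(5)
companies, years = 10_000, 8
expected = companies / 2 ** years
trials = [lucky_streaks(companies, years, rng) for _ in range(20)]
print(f"expected about {expected:.0f} 'star' companies per run; simulated: {trials}")
```

The same arithmetic applies to the centenarians: with enough people living enough different lives, some will reach 100 whatever their habits, and survivorship does the rest.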


Conclusion

Correlation does not imply causation, but nor does it rule it out. Finding a correlation is quite easy; proving that it reflects causation is very difficult. As humans we have evolved to see patterns, and we are very, very good at it. It is very easy to find a correlation and have an inner certainty that we have discovered a great truth. We then discount information that contradicts it, and these days that gets filed under “fake news”. This is dangerous, particularly in combination with our desire to categorise things – wars start this way.

See the next chapter for the users and uses of false correlation.

Footnotes and references

[1] Regrettably, not always true.

[2] Or at least pseudo-random numbers which try to behave as if they are random.

[3] https://ocw.mit.edu/courses/16-863j-system-safety-spring-2016/3bd61d02522835a834be4b6e245b0961_MIT16_863JS16_LecNotes1.pdf

[4] Except in quantum mechanics, where everything is a bit magical.

[5] https://en.wikipedia.org/wiki/Perverse_incentive

[6] A phrase usually meaning “what I think is their fair share, and by the way, in this context, I’m not rich”.

[7] I’m rewatching The West Wing.

[8] See https://www.tylervigen.com/spurious-correlations for some wonderful examples (including this one).

[9] See https://theonion.com/114-year-old-attributes-longevity-to-sheer-random-chanc-1819563850/

Figure 1: https://www.mashupmath.com/blog/cause-and-effect-examples

Figure 2: Own work

Figure 3: Own work

Figure 4: Own work

Figure 5: Own work

Figure 6: Photo attribution Kamalnv, CC BY 3.0 <https://creativecommons.org/licenses/by/3.0>, via Wikimedia Commons

Figure 7: Own work

Figure 8: https://statisticseasily.com/correlation-vs-causality/

Figure 9: https://upload.wikimedia.org/wikipedia/commons/7/77/Pirate_Global_Warming_Graph.gif The original uploader was LiamVleck at English Wikibooks, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons