Or "Why the number of films Nicolas Cage stars in annually apparently predicts pool drownings."
Why: I wanted to write a buzz-worthy post and began to wonder whether an article's title and search terms matter more than its content. Do others share my technology-induced ADD? Do all of us need the counsel of ADD leadership coach and psychologist Phil Bossiere?
That said, to make the premise work: Big Data is coming. More and more, we will be able to see correlations between ever larger and more disparate sets of data and further unravel the onion-like layers of psychology, science, society, technology and the subtle perturbations of the quantum physics underlying it all. They say "the data doesn't lie"... but that depends on whether you are asking it the right question. This is where Design Thinking DOES trump Big Data. The core of design thinking is always to ask, "Are we asking the right question?" Or, as the adage goes:
You can have all the right answers, but it doesn't matter if you are asking the wrong question.
Correlation Doesn't Prove Causality: everyone who takes Stats 101 learns this phrase, yet history is plagued by an embarrassing pattern of confirmation bias and the erroneous attribution of causality to correlation. Here are a few examples from this excellent article:
- Hormone replacement therapy was correlated with a reduced incidence of cardiac disease... until a more important correlation with higher socioeconomic status upended the claim.
- Crime reduction in New York City in the 1990s was widely credited to policing based on the "broken windows theory," an explanation popularized by Malcolm Gladwell in The Tipping Point... until a better correlation emerged: a possible drop in births to low-income mothers following Roe v. Wade, as popularized by Levitt and Dubner in Freakonomics... Except, as it turns out, the birth rate of at-risk children may actually have increased after Roe v. Wade, so... which is it?
Other similar claims that were ultimately disproven as nonsense:
- Eating breakfast = weight loss
- Eating dinner together as a family = less teen drug use
- Vaccinations = autism and other issues
Other, funnier spurious correlations: as it turns out, given enough data and the ability to process it (i.e., "Big Data"), spurious correlations can emerge from all directions. A couple of my favorites:
- The number of people who drown in a pool each year correlates nearly precisely with the number of films Nicolas Cage appears in.
- The age of Miss America correlates with annual murders via "steam and hot vapors" (how does one do this? No idea.)
- The divorce rate in Maine correlates precisely with the consumption of margarine.
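To see how "Big Data" manufactures coincidences like these, here is a minimal Python sketch. It is an illustration only, not the method behind the examples above: it generates a couple hundred completely independent random 11-year series (stand-ins for things like annual pool drownings or yearly film counts), then searches every pair for the strongest correlation. The data are random numbers, not real statistics, and the function names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def random_walk(years, rng):
    """A trending but meaningless yearly series: running sum of random steps."""
    level, series = 0.0, []
    for _ in range(years):
        level += rng.gauss(0, 1)
        series.append(level)
    return series


def strongest_spurious_pair(num_series=200, years=11, seed=42):
    """Return (i, j, r) for the most correlated pair among independent series."""
    rng = random.Random(seed)
    data = [random_walk(years, rng) for _ in range(num_series)]
    best = (0, 1, pearson(data[0], data[1]))
    # Compare every pair once; no series is related to any other by construction.
    for i in range(num_series):
        for j in range(i + 1, num_series):
            r = pearson(data[i], data[j])
            if abs(r) > abs(best[2]):
                best = (i, j, r)
    return best


if __name__ == "__main__":
    i, j, r = strongest_spurious_pair()
    print(f"series {i} vs series {j}: r = {r:+.3f} (pure coincidence)")
```

Because every series is generated independently, whatever the search turns up is chance by construction; compare enough pairs (here 19,900) and a near-perfect match is all but guaranteed.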
Asking the right question (and the null hypothesis). Design thinking constantly anchors to empathy and to reframing the question. Correlation errors are often driven by confirmation bias: a "belief" that the answer has already been found. True statisticians know that you can never "prove" causality from correlation; what they can do is reject the null hypothesis that two things are unrelated. That is a double negative, and if it is carelessly simplified it suggests that you can prove causality from correlation.
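A permutation test is one standard way to put "rejecting the null hypothesis" into practice. The Python sketch below is illustrative only (the function names and the data are made up): it shuffles one variable to simulate "these two things are unrelated" and asks how often chance alone produces a correlation as strong as the observed one. Note what it cannot do: the test treats its two inputs symmetrically, so a tiny p-value says the variables are related, but not which one, if either, causes the other.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def permutation_test(xs, ys, n_perm=1000, seed=0):
    """Two-sided permutation test of the null hypothesis "xs and ys are unrelated".

    Shuffling ys destroys any real pairing with xs, so the shuffled correlations
    show what chance alone produces. Returns (observed_r, p_value).
    """
    rng = random.Random(seed)
    observed = pearson(xs, ys)
    shuffled = list(ys)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= abs(observed):
            as_extreme += 1
    # Counting the observed arrangement itself keeps the p-value above zero.
    return observed, (as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(7)
    x = list(range(30))
    y = [2 * v + rng.gauss(0, 3) for v in x]  # hypothetical, strongly related data
    r_xy, p_xy = permutation_test(x, y)
    r_yx, p_yx = permutation_test(y, x)
    # Same r in both directions: the test cannot say which variable drives the other.
    print(f"x vs y: r={r_xy:.3f}, p={p_xy:.4f}")
    print(f"y vs x: r={r_yx:.3f}, p={p_yx:.4f}")
```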
A terrible question, and the source of one of the most embarrassing errors in science: "Does eating fat, and in particular saturated fat, cause obesity and heart disease?" This question and the correlation/causality errors that followed over the last 45 years may have killed more Americans than smoking and car accidents combined. Was there a correlation between eating fat and obesity? Apparently, yes. Was it possible that a lower-fat (and likely more plant-based) diet also correlated with a more health-conscious, higher-income group of people who exercised more and had better access to medical care than those who ate a higher proportion of fat? Probably. So the data (at the time) rejected the null hypothesis that eating fat has nothing to do with obesity, and the inverse of the null hypothesis then seemed very logical (if completely flawed):
If you eat fat, you get fat.
Seemingly every doctor and nutritionist in the world, with the exception of the much-maligned Robert Atkins, jumped on the bandwagon, and the US (and the world in its wake) shifted to a low-fat, and hence high-carb, diet in the 1970s in accordance with this data and logic.
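The confounding scenario behind the fat question can be sketched in a few lines of Python. This is a toy model built on stated assumptions, not real nutrition data: a hidden trait (here called `health_conscious`, a hypothetical variable) lowers both fat intake and an obesity score, while fat intake has no direct effect on obesity at all. The overall correlation still comes out clearly positive; splitting the data by the hidden trait makes it vanish.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def simulate_confounded(n=20000, seed=1):
    """Toy data where a hidden trait drives BOTH fat intake and obesity.

    Assumed model (hypothetical numbers, arbitrary units):
      health_conscious ~ coin flip (0 or 1)
      fat     = 30 - 10 * health_conscious + noise
      obesity = 50 - 20 * health_conscious + noise
    Fat intake never appears in the obesity formula: no causal effect.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        health_conscious = 1 if rng.random() < 0.5 else 0
        fat = 30 - 10 * health_conscious + rng.gauss(0, 5)
        obesity = 50 - 20 * health_conscious + rng.gauss(0, 10)
        rows.append((health_conscious, fat, obesity))
    return rows


def overall_and_within_groups(rows):
    """Correlation of fat vs. obesity overall, then within each hidden group."""
    overall = pearson([f for _, f, _ in rows], [o for _, _, o in rows])
    within = {}
    for group in (0, 1):
        sub = [(f, o) for g, f, o in rows if g == group]
        within[group] = pearson([f for f, _ in sub], [o for _, o in sub])
    return overall, within


if __name__ == "__main__":
    overall, within = overall_and_within_groups(simulate_confounded())
    print(f"overall r = {overall:.2f}")
    for group, r in within.items():
        print(f"health_conscious={group}: r = {r:.2f}")
```

With these made-up coefficients the expected overall correlation is 0.5, while within each group it is zero up to sampling noise: the "fat predicts obesity" signal comes entirely from the hidden trait.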
The Big Fat Lie: unfortunately, the belief that eating fat causes obesity, and the corresponding advice to shift to a low-fat, high-carb diet, has almost certainly had catastrophic outcomes. The obesity rate in the US held steady at 15% until 1980 and has since climbed to nearly 40%, close to triple! Lest I also confuse correlation with causality, let's get back to our original question and reframe it better: "Do our bodies process all foods and calories the same way?" and "Which foods lead us to store fat, and which lead us to burn it?" These are better questions, ones that lead to chemistry rather than correlation.
Chemistry and causality vs. correlation and causality. Data correlations are, as mentioned, a slippery slope to causal claims, but when you speak physics or chemistry, you CAN prove causality. In physics we have Newton's third law: "for every action, there is an equal and opposite reaction." In chemistry and biology there are similar laws, including conservation of mass, chemical equilibrium, etc. So in testing, we can examine which chemicals and hormones the body releases when we consume certain types of food, then determine chemically and biologically which of those food/chemical combinations cause our bodies to burn or store fat, and then announce findings with a much greater level of confidence.
And the chemistry says?
Eating fat causes us to burn fat, eating carbohydrates causes us to store fat.
Whoa! How did we get it exactly wrong for so very long?? So basically everything we have been told for 30 years by the "experts" is wrong?? Here we are lambasting McDonald's, when it is really Coke and General Mills that are to blame. I, for one, get angry every time I think about this. For two decades I avoided fats, ate boatloads of bread and pasta, and "carbo-loaded" before every single competition, throwing off all kinds of balances (including insulin), causing inflammation and soreness, and probably taking on 5 lbs of extra water weight to carry around the track. Now that I'm low-carb, I can see the effect dramatically any time I go off the wagon: shaky nerves, creaky joints, and an immediate weight gain of 5 lbs of water. I can speak to this from personal experience, but slowly and quietly the world's experts have also abandoned their prior guidance. Here are three great books:
- Eat Fat, Get Thin
- In Defense of Food
- It Starts with Food
For God's sake, I ate margarine instead of butter for 25 years! At the time, dietary guidelines cast butter as Charles Manson and coconut oil as the devil. Now, along with millions of other bulletproof-coffee aficionados, I put a tablespoon each of coconut oil and butter into my coffee every morning. Just 3 years ago I switched to a high-fat, low-carb diet, and within 3 weeks I had lost 20 lbs and looked 10 years younger. Even though I eat eggs and butter every day, my cholesterol has dropped significantly, and my weight and body fat have stabilized at very healthy levels.
Conclusion: Design Thinking (and asking the right questions) needs to guide Big Data correlations. We are all human beings, subject to all sorts of biases driven by complex psychological schemes and evolutionary holdovers and shortcuts. The emergence of Big Data, with its limitless possibilities for spurious correlations, will most likely lead to a host of new rabbit trails and red herrings. Who knows what new wrong questions we might ask and what unintended consequences may result. I, for one, am glad we no longer have saccharin in our diet sodas, but the reality is that saccharin is slightly less carcinogenic than green beans. Oops. Oh well, bring on the butter!
What kinds of questions are we going to get wrong with the advent of Big Data?
PS: I assume that somewhere in this article I too mistook correlation for causality, an egregious hypocritical fractal. Apologies in advance; please point it out kindly.