If you’re trying to gain a rapid understanding of a new area, one of the most important things you can do is to identify common mistakes people make, then avoid them. Here are some of the most predictable errors we tend to make when thinking about statistics.
Gaining a better understanding of probability will give you a more accurate picture of the world and help you make better decisions. However, many people fall prey to the same handful of issues because aspects of probability go against what we think is intuitive. Even if you haven’t studied the topic since high school, you likely use probability assessments every single day in your work and life.
In Naked Statistics, Charles Wheelan takes the reader on a whistlestop tour of the basics of statistics. In one chapter, he offers pointers for avoiding some of the “most common probability-related errors, misunderstandings, and ethical dilemmas.” Whether you’re somewhat new to the topic or just want a refresher, here’s a summary of Wheelan’s lessons and how you can apply them.
Assuming events are independent when they are not
“The probability of flipping heads with a fair coin is 1/2. The probability of flipping two heads in a row is (1/2)^2 or 1/4 since the likelihood of two independent events both happening is the product of their individual probabilities.”
When an event is interconnected with another event, the former happening increases or decreases the probability of the latter happening. Your car insurance gets more expensive after an accident because car accidents are not independent events. A person who gets in one is more likely to get into another in the future. Maybe they’re not such a good driver, maybe they tend to drive after a drink, or maybe their eyesight is imperfect. Whatever the explanation, insurance companies know to revise their risk assessment.
Sometimes though, an event happening might lead to changes that make it less probable in the future. If you spilled coffee on your shirt this morning, you might be less likely to do the same this afternoon because you’ll exercise more caution. If an airline had a crash last year, you may well be safer flying with them because they will have made extensive improvements to their safety procedures to prevent another disaster.
One place we should pay extra attention to the independence or dependence of events is when making plans. Most of our plans don’t go as we’d like. We get delayed, we have to backtrack, we have to make unexpected changes. Sometimes we think we can compensate for a delay in one part of a plan by moving faster later on. But the parts of a plan are not independent. A delay in one area makes delays elsewhere more likely as problems compound and accumulate.
Any time you think about the probability of sequences of events, be sure to identify whether they’re independent or not.
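As a rough sketch of the difference, the snippet below multiplies plain probabilities for the independent coin flips from Wheelan's quote, then uses a conditional probability for the car-accident case. The accident figures (`p_accident`, `p_accident_given_previous`) are made up purely for illustration and are not real insurance data.

```python
# Independent events: multiply the individual probabilities.
p_heads = 0.5
p_two_heads = p_heads * p_heads  # (1/2)^2 = 1/4, as in the quote

# Dependent events: multiply by a conditional probability instead.
# Both figures below are hypothetical, chosen only for illustration.
p_accident = 0.05                 # chance of an accident in a given year
p_accident_given_previous = 0.10  # chance of another, given one already happened

naive = p_accident * p_accident                      # wrongly assumes independence
dependent = p_accident * p_accident_given_previous   # P(A and B) = P(A) * P(B | A)

print(f"Two heads in a row: {p_two_heads}")
print(f"Two accidents, assuming independence: {naive:.4f}")
print(f"Two accidents, allowing for dependence: {dependent:.4f}")
```

With these assumed numbers, treating the accidents as independent understates the chance of a repeat by half, which is the kind of gap an insurer cannot afford to ignore.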
Not understanding when events are independent
“A different kind of mistake occurs when events that are independent are not treated as such . . . If you flip a fair coin 1,000,000 times and get 1,000,000 heads in a row, the probability of getting heads on the next flip is still 1/2. The very definition of statistical independence between two events is that the outcome of one has no effect on the outcome of another.”
Imagine you’re grabbing a breakfast sandwich at a local cafe when someone rudely barges into line in front of you and ignores your protestations. Later that day, as you’re waiting your turn to order a latte in a different cafe, the same thing happens: a random stranger pushes in front of you. By the time you go to pick up some pastries for your kids at a different place before heading home that evening, you’re so annoyed by all the rudeness you’ve encountered that you angrily eye every person to enter the shop, on guard for any attempts to take your place. But of course, the two incidents were independent events. It’s unlikely the strangers were working together to annoy you. The fact that it happened twice in one day doesn’t make it happening a third time more probable.
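If you'd like to check this for yourself, here is a small simulation sketch. It flips a simulated fair coin many times and looks only at the flips that come straight after a run of heads. The function name, the streak length of three, and the number of flips are all arbitrary choices for illustration.

```python
import random

def heads_rate_after_streak(num_flips=200_000, streak=3, seed=0):
    """Share of heads on flips that immediately follow `streak` heads in a row."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    flips = [rng.random() < 0.5 for _ in range(num_flips)]  # True means heads
    run = 0         # length of the current run of heads
    followers = []  # outcomes of flips that came right after a streak
    for flip in flips:
        if run >= streak:
            followers.append(flip)
        run = run + 1 if flip else 0
    return sum(followers) / len(followers)

rate = heads_rate_after_streak()
print(f"Heads rate right after three heads in a row: {rate:.3f}")
```

The rate should land close to one half, because the coin has no memory of the streak that came before.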
The most important thing to remember here is that the probability of two events both happening is never higher than the probability of either one happening on its own.
Clusters happen
“You’ve likely read the story in the newspaper or perhaps seen the news exposé: Some statistically unlikely number of people in a particular area have contracted a rare form of cancer. It must be the water, or the local power plant, or the cell phone tower.
. . . But this cluster of cases may also be the product of pure chance, even when the number of cases appears highly improbable. Yes, the probability that five people in the same school or church or workplace will contract the same rare form of leukemia may be one in a million, but there are millions of schools and churches and workplaces. It’s not highly improbable that five people might get the same rare form of leukemia in one of those places.”
An important lesson of probability is that while any particular improbable event is, well, improbable, it is highly probable that some improbable event will happen somewhere. Your chances of winning the lottery are almost zero. But sooner or later, someone wins it. Your chances of getting struck by lightning are almost zero. But with so many people walking around and so many storms, it is bound to happen to someone sooner or later.
The same is true for clusters of improbable events. The chance of any individual winning the lottery multiple times or getting struck by lightning more than once is even closer to zero than the chance of it happening once. Yet when we look at all the people in the world, it’s almost certain to happen to someone.
We’re all pattern-matching creatures. We find randomness hard to process and look for meaning in chaotic events. So it’s no surprise that clusters often fool us. If you encounter one, it’s wise to keep in mind the possibility that it’s a product of chance, not anything more meaningful. Sure, it might be jarring to be involved in three car crashes in a year or to run into two college roommates at the same conference. Is it all that improbable that it would happen to someone, though?
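To put rough numbers on the leukemia example, the sketch below takes Wheelan's "one in a million" chance for a single place and asks how likely it is that at least one place somewhere sees such a cluster. It assumes every place is independent and equally likely to see a cluster, which is a simplification, and the place counts are illustrative guesses at what "millions" might mean.

```python
def prob_at_least_one(p_single, num_places):
    """Chance that an event of probability p_single occurs in at least one of num_places."""
    # Complement rule: 1 minus the chance that it happens nowhere.
    return 1 - (1 - p_single) ** num_places

p_cluster = 1e-6  # "one in a million", from the quote
for num_places in (1, 1_000_000, 3_000_000):  # place counts are assumptions
    chance = prob_at_least_one(p_cluster, num_places)
    print(f"{num_places:>9,} places: {chance:.4f}")
```

Under these assumptions, a cluster that is wildly unlikely in any one school becomes more likely than not once a million schools, churches, and workplaces are in play.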
The prosecutor’s fallacy
“The prosecutor’s fallacy occurs when the context surrounding statistical evidence is neglected . . . the chances of finding a coincidental one in a million match are relatively high if you run the sample through a database with samples from a million people.”
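A quick way to see why the database matters is to count how many coincidental matches you would expect on average. The sketch below simply multiplies the per-comparison match probability by the number of records searched. The one-in-a-million figure and the million-person database come from the quote; the function name and the smaller database sizes are added here for comparison only.

```python
def expected_coincidental_matches(database_size, p_match):
    """Average number of purely coincidental matches when one sample is compared with every record."""
    return database_size * p_match

p_match = 1e-6  # a "one in a million" coincidental match, from the quote
for database_size in (1, 10_000, 1_000_000):  # smaller sizes are illustrative
    expected = expected_coincidental_matches(database_size, p_match)
    print(f"{database_size:>9,} records: {expected:.6f} coincidental matches expected")
```

Against a single suspect, a one-in-a-million match is strong evidence. Against a million records, you should expect about one such match from chance alone, so the match by itself says far less.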
It’s important to look at the context surrounding statistics. Let’s say you’re evaluating whether to take a medication your doctor suggests. A quick glance at the information leaflet tells you that it carries a 1 in 10,000 risk of blood clots. Should you be concerned? Well, that depends on context. The 1 in 10,000 figure takes into account the wide spectrum of people with different genes and different lifestyles who might take the medication. If you’re an overweight chain-smoker with a family history of blood clots who takes twelve-hour flights twice a month, you might want to have a more serious discussion with your doctor than an active non-smoker with no relevant family history would.
Statistics give us a simple snapshot, but if we want a finer-grained picture, we need to think about context.
Reversion to the mean (or regression to the mean)
“Probability tells us that any outlier—an observation that is particularly far from the mean in one direction or the other—is likely to be followed by outcomes that are most consistent with the long-term average.
. . . One way to think about this mean reversion is that performance—both mental and physical—consists of underlying talent-related effort plus an element of luck, good or bad. (Statisticians would call this random error.) In any case, those individuals who perform far above the mean for some stretch are likely to have had luck on their side; those who perform far below the mean are likely to have had bad luck. . . . When a spell of very good luck or very bad luck ends—as it inevitably will—the resulting performance will be closer to the mean.”
Moderate events tend to follow extreme ones. One area where regression to the mean often misleads us is in judging how people perform in fields like sports or management. We may think a single extraordinary success is predictive of future successes. Yet from one result, we can’t know whether it’s an outcome of talent or luck. If it was luck, the next result may well be average. An extreme failure or success is usually followed by an event closer to the mean, not the other extreme.
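Wheelan's "talent plus luck" picture is easy to simulate. In the sketch below, each person's score is a fixed talent value plus fresh random luck in every round. We pick the top performers from round one and see how the same people do in round two. The population size, the 5% cutoff, and the choice of equal-sized normal distributions for talent and luck are all assumptions made for illustration.

```python
import random
import statistics

def simulate_regression(num_people=10_000, top_share=0.05, seed=1):
    """Return (round 1 average, round 2 average) for round 1's top performers."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    talent = [rng.gauss(0, 1) for _ in range(num_people)]
    # Each observed score is fixed talent plus fresh, independent luck.
    round1 = [t + rng.gauss(0, 1) for t in talent]
    round2 = [t + rng.gauss(0, 1) for t in talent]
    # Select the people with the highest round 1 scores.
    cutoff = int(num_people * top_share)
    top = sorted(range(num_people), key=lambda i: round1[i], reverse=True)[:cutoff]
    return (
        statistics.mean(round1[i] for i in top),
        statistics.mean(round2[i] for i in top),
    )

first, second = simulate_regression()
print(f"Top performers, round 1 average: {first:.2f}")
print(f"Same people, round 2 average:    {second:.2f}")
```

The round two average should sit well below the round one average but still above the population mean of zero. The stars keep their talent, but the good luck that helped put them at the top does not carry over.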
Regression to the mean teaches us that the way to differentiate between skill and luck is to look at someone’s track record. The more information you have, the better. Even if past performance is not always predictive of future performance, a track record of consistent high performance is a far better indicator than a single highlight.
If you want an accessible tour of basic statistics, check out Naked Statistics by Charles Wheelan.