Tag: Statistics

Common Probability Errors to Avoid

If you’re trying to gain a rapid understanding of a new area, one of the most important things you can do is to identify common mistakes people make, then avoid them. Here are some of the most predictable errors we tend to make when thinking about statistics.

Amateurs tend to focus on seeking brilliance. Professionals often know that it’s far more effective to avoid stupidity. Side-stepping typical blunders is the simplest way to get ahead of the crowd.

Gaining a better understanding of probability will give you a more accurate picture of the world and help you make better decisions. However, many people fall prey to the same handful of errors because aspects of probability run against our intuitions. Even if you haven’t studied the topic since high school, you likely use probability assessments every single day in your work and life.

In Naked Statistics, Charles Wheelan takes the reader on a whistlestop tour of the basics of statistics. In one chapter, he offers pointers for avoiding some of the “most common probability-related errors, misunderstandings, and ethical dilemmas.” Whether you’re somewhat new to the topic or just want a refresher, here’s a summary of Wheelan’s lessons and how you can apply them.

***

Assuming events are independent when they are not

“The probability of flipping heads with a fair coin is 1/2. The probability of flipping two heads in a row is (1/2)^2 or 1/4 since the likelihood of two independent events both happening is the product of their individual probabilities.”

When two events are interconnected, the first one happening increases or decreases the probability of the second. Your car insurance gets more expensive after an accident because car accidents are not independent events. A person who gets into one is more likely to get into another in the future. Maybe they’re not such a good driver, maybe they tend to drive after a drink, or maybe their eyesight is imperfect. Whatever the explanation, insurance companies know to revise their risk assessment.

Sometimes though, an event happening might lead to changes that make it less probable in the future. If you spilled coffee on your shirt this morning, you might be less likely to do the same this afternoon because you’ll exercise more caution. If an airline had a crash last year, you may well be safer flying with them because they will have made extensive improvements to their safety procedures to prevent another disaster.

One place we should pay extra attention to the independence or dependence of events is when making plans. Most of our plans don’t go as we’d like. We get delayed, we have to backtrack, we have to make unexpected changes. Sometimes we think we can compensate for a delay in one part of a plan by moving faster later on. But the parts of a plan are not independent. A delay in one area makes delays elsewhere more likely as problems compound and accumulate.

Any time you think about the probability of sequences of events, be sure to identify whether they’re independent or not.
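
To see the difference in numbers, here is a minimal sketch with made-up delay probabilities: treating two plan stages as independent makes a double delay look much rarer than it is once one delay feeds the next.

```python
# Hypothetical two-stage plan; all probabilities are illustrative, not data.
p_delay_1 = 0.2            # chance the first stage runs late
p_delay_2 = 0.2            # chance the second stage runs late, viewed in isolation
p_delay_2_given_1 = 0.5    # chance the second runs late once the first already has

# If the stages were truly independent, multiply the individual probabilities:
p_both_independent = p_delay_1 * p_delay_2        # 0.04

# If a first delay compounds, use the conditional probability instead:
p_both_dependent = p_delay_1 * p_delay_2_given_1  # 0.10

print(f"P(both delayed) assuming independence: {p_both_independent:.2f}")
print(f"P(both delayed) with dependence:       {p_both_dependent:.2f}")
```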

***

Not understanding when events are independent

“A different kind of mistake occurs when events that are independent are not treated as such . . . If you flip a fair coin 1,000,000 times and get 1,000,000 heads in a row, the probability of getting heads on the next flip is still 1/2. The very definition of statistical independence between two events is that the outcome of one has no effect on the outcome of another.”

Imagine you’re grabbing a breakfast sandwich at a local cafe when someone rudely barges into line in front of you and ignores your protestations. Later that day, as you’re waiting your turn to order a latte in a different cafe, the same thing happens: a random stranger pushes in front of you. By the time you go to pick up some pastries for your kids at a different place before heading home that evening, you’re so annoyed by all the rudeness you’ve encountered that you angrily eye every person to enter the shop, on guard for any attempts to take your place. But of course, the two rude strangers were independent events. It’s unlikely they were working together to annoy you. The fact it happened twice in one day doesn’t make it happening a third time more probable.

The most important thing to remember here is that the probability of two events both happening can never be higher than the probability of each happening on its own.

***

Clusters happen

“You’ve likely read the story in the newspaper or perhaps seen the news expose: Some statistically unlikely number of people in a particular area have contracted a rare form of cancer. It must be the water, or the local power plant, or the cell phone tower.

. . . But this cluster of cases may also be the product of pure chance, even when the number of cases appears highly improbable. Yes, the probability that five people in the same school or church or workplace will contract the same rare form of leukemia may be one in a million, but there are millions of schools and churches and workplaces. It’s not highly improbable that five people might get the same rare form of leukemia in one of those places.”

An important lesson of probability is that while any particular improbable event is, well, improbable, the chance that some improbable event will happen somewhere is very high. Your chances of winning the lottery are almost zero. But someone has to win it. Your chances of getting struck by lightning are almost zero. But with so many people walking around and so many storms, it has to happen to someone sooner or later.

The same is true for clusters of improbable events. The chance of any individual winning the lottery multiple times or getting struck by lightning more than once is even closer to zero than the chance of it happening once. Yet when we look at all the people in the world, it’s certain to happen to someone.
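
The arithmetic behind “it has to happen to someone” is worth seeing once. With stand-in numbers, a one-in-a-million cluster becomes near-certain somewhere once millions of places are at risk:

```python
p_cluster = 1e-6       # chance of the rare cluster at any single place (stand-in)
n_places = 3_000_000   # schools, churches, workplaces, etc. (stand-in)

# Probability that at least one place, somewhere, shows the cluster:
p_somewhere = 1 - (1 - p_cluster) ** n_places
print(f"P(cluster appears somewhere): {p_somewhere:.0%}")  # ~95%
```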

We’re all pattern-matching creatures. We find randomness hard to process and look for meaning in chaotic events. So it’s no surprise that clusters often fool us. If you encounter one, it’s wise to keep in mind the possibility that it’s a product of chance, not anything more meaningful. Sure, it might be jarring to be involved in three car crashes in a year or to run into two college roommates at the same conference. Is it all that improbable that it would happen to someone, though?

***

The prosecutor’s fallacy

“The prosecutor’s fallacy occurs when the context surrounding statistical evidence is neglected . . . the chances of finding a coincidental one in a million match are relatively high if you run the sample through a database with samples from a million people.”
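
A rough sketch of Wheelan’s point: with a one-in-a-million match probability and a database of a million people, about one coincidental match is expected, so a match on its own proves far less than it seems to. The uniform prior below is a simplifying assumption, not part of Wheelan’s text.

```python
p_random_match = 1e-6     # chance an innocent person matches by pure coincidence
database_size = 1_000_000

# Trawling the whole database, how many coincidental matches should we expect?
expected_false_matches = p_random_match * database_size
print(f"Expected coincidental matches: {expected_false_matches:.1f}")  # ~1.0

# If the database also contains the one true source, a naive uniform-prior
# estimate of P(a given match is the true source) is only about:
p_true_source = 1 / (1 + expected_false_matches)
print(f"P(match is the true source): {p_true_source:.0%}")  # ~50%, not 99.9999%
```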

It’s important to look at the context surrounding statistics. Let’s say you’re evaluating whether to take a medication your doctor suggests. A quick glance at the information leaflet tells you that it carries a 1 in 10,000 risk of blood clots. Should you be concerned? Well, that depends on context. The 1 in 10,000 figure takes into account the wide spectrum of people with different genes and different lifestyles who might take the medication. If you’re an overweight chain-smoker with a family history of blood clots who takes twelve-hour flights twice a month, you might want to have a more serious discussion with your doctor than an active non-smoker with no relevant family history.

Statistics give us a simple snapshot, but if we want a finer-grained picture, we need to think about context.

***

Reversion to the mean (or regression to the mean)

“Probability tells us that any outlier—an observation that is particularly far from the mean in one direction or the other—is likely to be followed by outcomes that are most consistent with the long-term average.

. . . One way to think about this mean reversion is that performance—both mental and physical—consists of underlying talent-related effort plus an element of luck, good or bad. (Statisticians would call this random error.) In any case, those individuals who perform far above the mean for some stretch are likely to have had luck on their side; those who perform far below the mean are likely to have had bad luck. . . . When a spell of very good luck or very bad luck ends—as it inevitably will—the resulting performance will be closer to the mean.”

Moderate events tend to follow extreme ones. One area where regression to the mean often misleads us is in judging how people perform in fields like sports or management. We may think a single extraordinary success predicts future successes. Yet from one result, we can’t tell whether it was produced by talent or by luck, in which case the next result may simply be average. Extreme failure or success is usually followed by an event closer to the mean, not the other extreme.

Regression to the mean teaches us that the way to differentiate between skill and luck is to look at someone’s track record. The more information you have, the better. Even if past performance is not always predictive of future performance, a track record of consistent high performance is a far better indicator than a single highlight.
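
A small simulation makes the mechanism concrete. The model below (illustrative, not from Wheelan) treats each performance as fixed skill plus fresh random luck, picks the top 1% of performers in one period, and checks how the same people do in the next:

```python
import random

random.seed(42)

n_people = 10_000
skill = [random.gauss(0, 1) for _ in range(n_people)]
period1 = [s + random.gauss(0, 1) for s in skill]  # skill + luck, period 1
period2 = [s + random.gauss(0, 1) for s in skill]  # same skill, fresh luck

# Top 1% of performers in period 1, then their average in period 2.
top = sorted(range(n_people), key=lambda i: period1[i], reverse=True)[:100]
avg1 = sum(period1[i] for i in top) / len(top)
avg2 = sum(period2[i] for i in top) / len(top)

print(f"Top 1% average, period 1: {avg1:.2f}")  # far above the mean of 0
print(f"Same people, period 2:    {avg2:.2f}")  # noticeably closer to the mean
```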

***

If you want an accessible tour of basic statistics, check out Naked Statistics by Charles Wheelan.

What Sharks Can Teach Us About Survivorship Bias

Survivorship bias refers to the idea that we get a false representation of reality when we base our understanding only on the experiences of those who live to tell their story. Taking a look at how we misrepresent shark attacks highlights how survivorship bias distorts reality in other situations.

When asked what the deadliest shark is to humans, most people will say the great white. The lasting influence of the movie Jaws, reinforced by dozens of pop culture references and news reports, keeps that species of shark top of mind when one considers the world’s most fearsome predators. While it is true that great white sharks do attack humans (rarely), they also leave a lot of survivors. And they’re not after humans in particular. They usually just mistake us for seals, one of their key food sources.

We must be careful to not let a volume of survivors in one area blind us to the stories of a small number of survivors elsewhere. Most importantly, we need to ask ourselves what stories are not being told because no one is around to tell them. The experiences of the dead are necessary if we want an accurate understanding of the world.

***

Before we drill down into some interesting statistics, it’s important to understand that great whites are one member of a group of sharks with many common characteristics. Great whites, tiger sharks, and bull sharks all have similar habitats, physiology, and instincts. They are also all large, averaging over ten feet in length.

Tiger and bull sharks rarely attack humans, and to someone being bitten by one of these huge creatures, there isn’t all that much difference between them. The Florida Museum’s International Shark Attack File explains that “positive identification of attacking sharks is very difficult since victims rarely make adequate observations of the attacker during the ‘heat’ of the interaction. Tooth remains are seldom found in wounds and diagnostic characters for many requiem sharks [the family that includes tiger and bull sharks] are difficult to discern even by trained professionals.”

The fatality rate in known attacks is 21.5% for the bull shark, 16% for the great white, and 26% for the tiger shark. But in sheer volume, attacks attributed to great whites outnumber the other two species three to one. So there are three times as many survivors to tell the story of their great white attack.

***

When it comes to our picture of the most dangerous shark, there are other blind spots. Not all sharks behave like those three, which swim close to shore and encounter enough prey to develop a preference for fat seals over bony humans. Pelagic sharks live in the water desert that is the open ocean and have to eat pretty much whatever they can find. The oceanic white tip is a pelagic shark that is probably far more dangerous to humans—we just don’t come into contact with them as often.

There are only fifteen documented attacks by an oceanic white tip, with three of those being fatal. But since most attacks occur in the open ocean in more isolated situations (e.g., a couple of people on a boat versus five hundred people swimming at a beach), we really have no idea how dangerous oceanic white tips are. There could be hundreds of undocumented attacks that left behind no survivors to tell the tale.

One famous survivor story gives us a glimpse of how dangerous oceanic white tips might be. In 1945, a Japanese submarine torpedoed and sank the USS Indianapolis. For a multitude of reasons, partly due to the fact that the Indianapolis was on a top secret mission and partly due to tragic incompetence, a rescue ship was not sent for four days. Those who survived the ship’s sinking then had to survive in the open ocean with little gear until rescue arrived. The water was full of sharks.

In Indianapolis: The True Story of the Worst Sea Disaster in US Naval History and the Fifty-Year Fight to Exonerate an Innocent Man, Lynn Vincent and Sara Vladic quote Boatswain’s Mate Second Class Eugene Morgan as he described part of his experience: “All the time, the sharks never let up. We had a cargo net that had Styrofoam things attached to keep it afloat. There were about fifteen sailors on this, and suddenly, ten sharks hit it and there was nothing left. This went on and on.” These sharks are believed to have been oceanic white tips. It’s unknown how many men died from shark attacks. Many also perished due to exposure, dehydration, injury, and exhaustion. Of the 1,195 crewmen originally aboard the ship, only 316 survived. It remains the greatest loss of life at sea from a single ship in US naval history.

Because humans are rarely in the open ocean in large numbers, not only are attacks by this shark less common, there are also fewer survivor stories. The story of the USS Indianapolis is a rare, brutal case that provides a unique picture.

***

Our estimation of the shark that could do us the most harm is often formed by survivorship bias. We develop an inaccurate picture based on the stories of those who live to tell the tale of their shark attack. We don’t ask ourselves who didn’t survive, and so we miss out on the information we need to build an accurate picture of reality.

The point is not to shift our fear to oceanic white tips, which are, in fact, critically endangered. Our fear of sharks seems to make us indifferent to what happens to them, even though they are an essential part of the ocean ecosystem. We are also much more of a danger to sharks than they are to us. We kill them by the millions every year. Neither should we shift our fear to other, more lethal animals, which will likely result in the same indifference to their role in the ecosystem.

The point is rather to consider how well you make decisions when you only factor in the stories of the survivors. For instance, if you were to try to reduce instances of shark attacks or try to limit their severity, you will not likely get the results you are after if you only pay attention to the survivor stories. You need to ask who didn’t make it and try to figure out their stories as well. If you try to implement measures aimed only at great whites near beaches, your measures might not be effective against other predatory sharks. And if you conclude that swimmers are better off in the open ocean because sharks seem to only attack near beaches, you’d be completely wrong.

***

Survivorship bias crops up all over our lives and impedes us from accurately assessing danger. Replace “dangerous sharks” with “dangerous cities” or “dangerous vacation spots” and you can easily see how your picture of a certain location might be skewed based on the experiences of survivors. We can’t be afraid of a tale if no one lives to tell it. More survivors can make something seem more dangerous rather than less dangerous because the volume of stories makes them more memorable.

If fewer people survived shark attacks, we wouldn’t have survivor stories influencing our perception of how dangerous sharks are. In all likelihood, we would attribute some of the ocean deaths to other causes, like drowning, because it wouldn’t occur to us that sharks could be responsible.

Understanding survivorship bias prompts us to look for the stories of those who weren’t successful. A lack of visible survivors with memorable stories might mean we view other fields as far safer and easier than they are.

For example, a field of business where people who experience failures go on to do other things might seem riskier than one where people who fail are too ashamed to talk about it. The failure of tech start-ups sometimes feels like daily news. We don’t often, however, hear about the real estate agent who has trouble making sales or who keeps getting outbid on offers. Nor do we hear much about architects who design terrible houses or construction companies who don’t complete projects.

Survivorship bias prompts us to associate more risk with industries that exhibit more public failures. But the failures from industries or businesses that aren’t shared are equally important. If we focus only on the survivor stories, we might think that being a real estate agent or an architect is safer than starting a technology company. It might be, but we can’t base our understanding of which career option is the best bet only on the widely shared stories of failure.

If we don’t factor survivorship bias into our thinking we end up in a classic map is not the territory problem. The survivor stories become a poor navigational tool for the terrain.

Most of us know that we shouldn’t become a writer based on the results achieved by J.K. Rowling and John Grisham. But even if we go out and talk to other writers, or learn about their careers, or attend writing seminars given by published authors, we are still only talking to the survivors.

Yes, it’s super inspiring to know Stephen King got so many rejections early in his career that the stack of them was enough to pull a nail out of the wall. But what about the writers who got just as many rejections and never published anything? Not only can we learn a lot from them about the publishing industry, we need to consider their experiences if we want to anticipate and understand the challenges involved in being a writer.

***

Not recognizing survivorship bias can lead to faulty decision making. We don’t see the big picture and end up optimizing for a small slice of reality. We can’t completely overcome survivorship bias. The best we can do is acknowledge it, and when the stakes are high or the result important, stop and look for the stories of those who were unsuccessful. They have just as much, if not more, to teach us.

The next time you’re assessing risk, ask yourself: am I paying too much attention to the great white sharks and not enough to the oceanic white tips?

Predicting the Future with Bayes’ Theorem

In a recent podcast, we talked with professional poker player Annie Duke about thinking in probabilities, something good poker players do all the time. At the poker table or in life, it’s useful to think in probabilities versus absolutes based on all the information you have available to you. You can improve your decisions and get better outcomes.

Probabilistic thinking leads you to ask yourself, how confident am I in this prediction? What information would impact this confidence?

Bayes’ Theorem

Bayes’ theorem is an accessible way of integrating probability thinking into our lives. Thomas Bayes was an English minister in the 18th century, whose most famous work, “An Essay toward Solving a Problem in the Doctrine of Chances,” was brought to the attention of the Royal Society in 1763—two years after his death—by his friend Richard Price. The essay did not contain the theorem as we now know it but had the seeds of the idea. It looked at how we should adjust our estimates of probabilities when we encounter new data that influence a situation. Later development by French scholar Pierre-Simon Laplace and others helped codify the theorem and develop it into a useful tool for thinking.

Knowing the exact math of probability calculations is not the key to understanding Bayesian thinking. More critical is your ability and desire to assign probabilities of truth and accuracy to anything you think you know, and then being willing to update those probabilities when new information comes in. Here is a short example, found in Investing: The Last Liberal Art, of how it works:

Let’s imagine that you and a friend have spent the afternoon playing your favorite board game, and now, at the end of the game, you are chatting about this and that. Something your friend says leads you to make a friendly wager: that with one roll of the die from the game, you will get a 6. Straight odds are one in six, a 16 percent probability. But then suppose your friend rolls the die, quickly covers it with her hand, and takes a peek. “I can tell you this much,” she says; “it’s an even number.” Now you have new information and your odds change dramatically to one in three, a 33 percent probability. While you are considering whether to change your bet, your friend teasingly adds: “And it’s not a 4.” With this additional bit of information, your odds have changed again, to one in two, a 50 percent probability. With this very simple example, you have performed a Bayesian analysis. Each new piece of information affected the original probability, and that is Bayesian [updating].
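
The die example can be written as a few lines of conditioning: each clue shrinks the set of outcomes still possible, and the probability of a 6 is recomputed over what remains. A minimal sketch:

```python
from fractions import Fraction

possible = {1, 2, 3, 4, 5, 6}      # one roll of a fair die; we bet on a 6
print(Fraction(1, len(possible)))  # 1/6 -- the prior

possible &= {2, 4, 6}              # "it's an even number"
print(Fraction(1, len(possible)))  # 1/3 -- updated probability of a 6

possible -= {4}                    # "and it's not a 4"
print(Fraction(1, len(possible)))  # 1/2 -- updated again
```

Since 6 stays in the set of possible outcomes throughout, its probability is simply one over however many outcomes remain.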

Both Nate Silver and Eliezer Yudkowsky have written about Bayes’ theorem in the context of medical testing, specifically mammograms. Imagine you live in a country with 100 million women under 40. Past trends have revealed that there is a 1.4% chance of a woman under 40 in this country getting breast cancer—so roughly 1.4 million women.

Mammograms will detect breast cancer 75% of the time. They will give out false positives, saying a woman has breast cancer when she actually doesn’t, about 10% of the time. At first, you might focus just on the mammogram numbers and think that a 75% success rate means that a positive result is bad news. Let’s do the math.

If all the women under 40 get mammograms, then the false positive rate will give roughly 10 million women under 40 the news that they have breast cancer. But because you know the first statistic, that only 1.4 million women under 40 actually get breast cancer, you know that at least 8.6 million of the women who tested positive cannot actually have breast cancer!
That’s a lot of needless worrying, which leads to a lot of needless medical care. To remedy this poor understanding and make better decisions about using mammograms, we absolutely must consider prior knowledge when we look at the results, and try to update our beliefs with that knowledge in mind.
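
Putting the whole calculation in one place (a sketch using the rounded figures above) shows how weak a positive result is on its own:

```python
population  = 100_000_000   # women under 40
p_cancer    = 0.014         # 1.4% prevalence
sensitivity = 0.75          # mammogram detects cancer 75% of the time
p_false_pos = 0.10          # 10% false positive rate

with_cancer = population * p_cancer
without_cancer = population - with_cancer

true_positives  = with_cancer * sensitivity     # ~1.05 million
false_positives = without_cancer * p_false_pos  # ~9.86 million

# Bayes: among all positive tests, what share actually has cancer?
p_cancer_given_pos = true_positives / (true_positives + false_positives)
print(f"P(cancer | positive test) = {p_cancer_given_pos:.1%}")  # roughly 10%
```

Fewer than one positive test in ten reflects an actual cancer, which is exactly why the prior matters so much.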

Weigh the Evidence

Often we ignore prior information, simply called “priors” in Bayesian-speak. We can blame this habit in part on the availability heuristic—we focus on what’s readily available. In this case, we focus on the newest information, and the bigger picture gets lost. We fail to adjust the probability of old information to reflect what we have learned.

The big idea behind Bayes’ theorem is that we must continuously update our probability estimates on an as-needed basis. In his book The Signal and the Noise, Nate Silver gives a contemporary example, reminding us that new information is often most useful when we put it in the larger context of what we already know:

Bayes’ theorem is an important reality check on our efforts to forecast the future. How, for instance, should we reconcile a large body of theory and evidence predicting global warming with the fact that there has been no warming trend over the last decade or so? Skeptics react with glee, while true believers dismiss the new information.

A better response is to use Bayes’ theorem: the lack of recent warming is evidence against recent global warming predictions, but it is weak evidence. This is because there is enough variability in global temperatures to make such an outcome unsurprising. The new information should reduce our confidence in our models of global warming—but only a little.

The same approach can be used in anything from an economic forecast to a hand of poker, and while Bayes’ theorem can be a formal affair, Bayesian reasoning also works as a rule of thumb. We tend to either dismiss new evidence, or embrace it as though nothing else matters. Bayesians try to weigh both the old hypothesis and the new evidence in a sensible way.

Limitations of the Bayesian Approach

Don’t walk away thinking the Bayesian approach will enable you to predict everything! In addition to seeing the world as an ever-shifting array of probabilities, we must also remember the limitations of inductive reasoning.

A high probability of something being true is not the same as saying it is true. Consider this example from Bertrand Russell’s The Problems of Philosophy:

A horse which has been often driven along a certain road resists the attempt to drive him in a different direction. Domestic animals expect food when they see the person who usually feeds them. We know that all these rather crude expectations of uniformity are liable to be misleading. The man who has fed the chicken every day throughout its life at last wrings its neck instead, showing that more refined views as to the uniformity of nature would have been useful to the chicken.

In the final analysis, though, picking up Bayesian reasoning can change your life, as observed in this Big Think video by Julia Galef of the Center for Applied Rationality:

After you’ve been steeped in Bayes’ rule for a little while, it starts to produce some fundamental changes to your thinking. For example, you become much more aware that your beliefs are grayscale. They’re not black and white, and you have levels of confidence in your beliefs about how the world works that are less than 100 percent but greater than zero percent. Even more importantly, as you go through the world and encounter new ideas and new evidence, that level of confidence fluctuates as you encounter evidence for and against your beliefs.

So be okay with uncertainty, and use it to your advantage. Instead of holding on to outdated beliefs by rejecting new information, take in what comes your way through a system of evaluating probabilities.

Bayes’ Theorem is part of the Farnam Street latticework of mental models. Still Curious? Read Bayes and Deadweight: Using Statistics to Eject the Deadweight From Your Life next. 


What’s So Significant About Significance?

One of my favorite studies of all time took the 50 most common ingredients from a cookbook and searched the literature for a connection to cancer: 72% had a study linking them to increased or decreased risk of cancer.

Meta-analyses (studies examining multiple studies) quashed the effect pretty seriously, but how many of those single studies were probably reported on in multiple media outlets, permanently causing changes in readers’ dietary habits? (We know from studying juries that people are often unable to “forget” things that are subsequently proven false or misleading — misleading data is sticky.)

The phrase “statistically significant” is one of the more unfortunately misleading ones of our time. The word significant in the statistical sense — meaning distinguishable from random chance — does not carry its everyday meaning of something that actually matters. Don’t worry, we will get to what that means.

Confusing the two gets at the heart of a lot of misleading headlines and it’s worth a brief look into why they don’t mean the same thing, so you can stop being scared that everything you eat or do is giving you cancer.

***

The term statistical significance is used to denote when an effect is found to be extremely unlikely to have occurred by chance. To make that determination, we have to propose a null hypothesis to be rejected. Let’s say we propose that eating an apple a day reduces the incidence of colon cancer. The “null hypothesis” here would be that eating an apple a day does nothing to the incidence of colon cancer — that we’d be equally likely to get colon cancer if we ate that daily apple.

When we analyze the data of our study, we’re technically not looking to say “Eating an apple a day prevents colon cancer” — that’s a bit of a misconception. What we’re actually doing is an inversion: we want the data to provide us with sufficient weight to reject the idea that apples have no effect on colon cancer.

And even when that happens, it’s not an all-or-nothing determination. What we’re actually saying is, “It would be extremely unlikely for the data we have, which shows a daily apple reduces colon cancer by 50%, to have popped up by chance. Not impossible, but very unlikely.” The world does not quite allow us to have absolute conviction.

How unlikely? The currently accepted standard in many fields is 5% — there is a less than 5% chance the data would come up this way randomly. That immediately tells you that roughly 1 in 20 “significant” results could be a fluke of chance, but alas, that is where we’re at. (The problem with the 5% p-value, and the associated problem of p-hacking, has been subject to some intense debate, but we won’t deal with that here.)

We’ll get to why “significance can be insignificant,” and why that’s so important, in a moment. But let’s make sure we’re fully on board with the importance of sorting chance events from real ones with another illustration, this one outlined by Jordan Ellenberg in his wonderful book How Not to Be Wrong. Pay close attention:

Suppose we’re in null hypothesis land, where the chance of death is exactly the same (say, 10%) for the fifty patients who got your drug and the fifty who got [a] placebo. But that doesn’t mean that five of the drug patients die and five of the placebo patients die. In fact, the chance that exactly five of the drug patients die is about 18.5%; not very likely, just as it’s not very likely that a long series of coin tosses would yield precisely as many heads as tails. In the same way, it’s not very likely that exactly the same number of drug patients and placebo patients expire during the course of the trial. I computed:

13.3% chance equally many drug and placebo patients die
43.3% chance fewer placebo patients than drug patients die
43.3% chance fewer drug patients than placebo patients die

Seeing better results among the drug patients than the placebo patients says very little, since this isn’t at all unlikely, even under the null hypothesis that your drug doesn’t work.

But things are different if the drug patients do a lot better. Suppose five of the placebo patients die during the trial, but none of the drug patients do. If the null hypothesis is right, both classes of patients should have a 90% chance of survival. But in that case, it’s highly unlikely that all fifty of the drug patients would survive. The first of the drug patients has a 90% chance; now the chance that not only the first but also the second patient survives is 90% of that 90%, or 81%–and if you want the third patient to survive as well, the chance of that happening is only 90% of that 81%, or 72.9%. Each new patient whose survival you stipulate shaves a little off the chances, and by the end of the process, where you’re asking about the probability that all fifty will survive, the slice of probability that remains is pretty slim:

(0.9) x (0.9) x (0.9) x … fifty times! … x (0.9) x (0.9) = 0.00515 …

Under the null hypothesis, there’s only one chance in two hundred of getting results this good. That’s much more compelling. If I claim I can make the sun come up with my mind, and it does, you shouldn’t be impressed by my powers; but if I claim I can make the sun not come up, and it doesn’t, then I’ve demonstrated an outcome very unlikely under the null hypothesis, and you’d best take notice.
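
Ellenberg’s two figures are easy to verify with a few lines of binomial arithmetic:

```python
from math import comb

# Chance that exactly 5 of 50 patients die when each dies with probability 0.1:
p_exactly_5 = comb(50, 5) * 0.1**5 * 0.9**45
print(f"P(exactly 5 of 50 die) = {p_exactly_5:.3f}")    # ~0.185

# Chance that all 50 drug patients survive when each survives with probability 0.9:
p_all_survive = 0.9**50
print(f"P(all 50 survive)      = {p_all_survive:.5f}")  # ~0.00515, about 1 in 200
```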

So you see, all this null hypothesis stuff is pretty important because what you want to know is if an effect is really “showing up” or if it just popped up by chance.

A final illustration should make it clear:

Imagine you were flipping coins with a particular strategy for getting more heads, and after 30 flips you had 18 heads and 12 tails. Would you call it a miracle? Probably not — you’d realize immediately that it’s perfectly possible for an 18/12 ratio to happen by chance. You wouldn’t write an article in U.S. News and World Report proclaiming you’d figured out coin-flipping.

Now let’s say instead you flipped the coin 30,000 times and you get 18,000 heads and 12,000 tails…well, then your case for statistical significance would be pretty tight.  It would be approaching impossible to get that result by chance — your strategy must have something to it. The null hypothesis of “My coin flipping technique is no better than the usual one” would be easy to reject! (The p-value here would be orders of magnitude less than 5%, by the way.)
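
Both coin-flip cases can be checked directly: exact binomial arithmetic for the 30-flip case, and a normal approximation for the 30,000-flip case, where the exact tail is impractically tiny.

```python
from math import comb, erfc, sqrt

def p_at_least(heads, flips):
    """Exact P(X >= heads) for a fair coin, X ~ Binomial(flips, 1/2)."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2**flips

# 18 heads in 30 flips: happens by chance almost one time in five.
print(f"P(>=18 heads in 30 flips) = {p_at_least(18, 30):.3f}")  # ~0.18

# 18,000 heads in 30,000 flips: about 35 standard deviations above the mean.
z = (18_000 - 15_000) / sqrt(30_000 * 0.25)
print(f"z = {z:.1f}, tail probability ~ {erfc(z / sqrt(2)) / 2:.1e}")
```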

That’s what this whole business is about.

***

Now that we’ve got this idea down, we come to the big question that statistical significance cannot answer: Even if the result is distinguishable from chance, does it actually matter?

Statistical significance cannot tell you whether the result is worth paying attention to — even if you get the p-value down to a minuscule number, increasing your confidence that what you saw was not due to chance. 

In How Not to Be Wrong, Ellenberg provides a perfect example:

A 1995 study published in a British journal indicated that a new birth control pill doubled the risk of venous thrombosis (a potentially deadly blood clot) in its users. Predictably, 1.5 million British women freaked out, and some meaningfully large percentage of them stopped taking the pill. In 1996, 26,000 more babies were born than the previous year, and there were 13,600 more abortions. Whoops!

So what, right? Lots of mothers’ lives were saved, right?

Not really. The initial probability of a woman getting a venous thrombosis with any old birth control pill was about 1 in 7,000, or roughly 0.014%. That means that the “Killer Pill,” even if it was indeed increasing thrombosis risk, only increased that risk to 2 in 7,000, or roughly 0.029%! Is that worth rearranging your life for? Probably not.
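
Translating the relative claim into absolute terms takes two lines:

```python
baseline_risk = 1 / 7_000   # risk of venous thrombosis on the old pill
doubled_risk  = 2 / 7_000   # the headline "doubled" risk on the new pill

extra_risk = doubled_risk - baseline_risk
print(f"Extra absolute risk per woman: {extra_risk:.4%}")          # ~0.014%
print(f"Women on the pill per extra clot: {1 / extra_risk:,.0f}")  # ~7,000
```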

Ellenberg makes the excellent point that, at least in the case of health, the null hypothesis is unlikely to be right in most cases! The body is a complex system — of course what we put in it affects how it functions in one direction or another. The true effect is unlikely to be exactly zero.

But numerical and scale-based thinking, indispensable for anyone looking to not be a sucker, tells us that we must distinguish between small and meaningless effects (like the connection between almost all individual foods and cancer so far) and real ones (like the connection between smoking and lung cancer).

And now we arrive at the problem of “significance” — even if an effect is really happening, it still may not matter!  We must learn to be wary of “relative” statistics (i.e., “the risk has doubled”), and look to favor “absolute” statistics, which tell us whether the thing is worth worrying about at all.

So we have two important ideas:

A. Just like coin flips, many results are perfectly possible by chance. We use the concept of “statistical significance” to figure out how likely it is that the effect we’re seeing is real and not just a random illusion, like seeing 18 heads in 30 coin tosses.

B. Even if it is really happening, it still may be unimportant – an effect so insignificant in real terms that it’s not worth our attention.

These effects should combine to raise our level of skepticism when hearing about groundbreaking new studies! (A third and equally important problem is the fact that correlation is not causation, a common problem in many fields of science including nutritional epidemiology. Just because x is associated with y does not mean that x is causing y.)

Tread carefully and keep your thinking cap on.

***

Still Interested? Read Ellenberg’s great book to get your head working correctly, and check out our posts on Bayesian updating, another very useful statistical tool, and learn a little about how we distinguish science from pseudoscience.

Leonard Mlodinow: The Three Laws of Probability

“These three laws, simple as they are, form much of the basis of probability theory. Properly applied, they can give us much insight into the workings of nature and the everyday world.”

***

In his book, The Drunkard’s Walk, Leonard Mlodinow outlines the three key “laws” of probability.

The first law of probability is the most basic of all. But before we get to that, let’s look at this question.

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
Which is more probable?

Linda is a bank teller.
Linda is a bank teller and is active in the feminist movement.

To Kahneman and Tversky’s surprise, 87 percent of the subjects in the study believed it was more probable that Linda is a bank teller and active in the feminist movement than that Linda is a bank teller.

1. The probability that two events will both occur can never be greater than the probability that each will occur individually.

This is the conjunction fallacy.

Mlodinow explains:

Why not? Simple arithmetic: the chances that event A will occur = the chances that events A and B will occur + the chance that event A will occur and event B will not occur.

The interesting thing that Kahneman and Tversky discovered was that we don’t tend to make this mistake unless we know something about the subject.

“For example,” Mlodinow muses, “suppose Kahneman and Tversky had asked which of these statements seems most probable:”

Linda owns an International House of Pancakes franchise.
Linda had a sex-change operation and is now known as Larry.
Linda had a sex-change operation, is now known as Larry, and owns an International House of Pancakes franchise.

In this case, it’s unlikely you would choose the last option.

Via The Drunkard’s Walk:

If the details we are given fit our mental picture of something, then the more details in a scenario, the more real it seems and hence the more probable we consider it to be—even though any act of adding less-than-certain details to a conjecture makes the conjecture less probable.

Or as Kahneman and Tversky put it, “A good story is often less probable than a less satisfactory… .”

2. If two possible events, A and B, are independent, then the probability that both A and B will occur is equal to the product of their individual probabilities.

Via The Drunkard’s Walk:

Suppose a married person has on average roughly a 1 in 50 chance of getting divorced each year. On the other hand, a police officer has about a 1 in 5,000 chance each year of being killed on the job. What are the chances that a married police officer will be divorced and killed in the same year? According to the above principle, if those events were independent, the chances would be roughly 1⁄50 × 1⁄5,000, which equals 1⁄250,000. Of course the events are not independent; they are linked: once you die, darn it, you can no longer get divorced. And so the chance of that much bad luck is actually a little less than 1 in 250,000.

Why multiply rather than add? Suppose you make a pack of trading cards out of the pictures of those 100 guys you’ve met so far through your Internet dating service, those men who in their Web site photos often look like Tom Cruise but in person more often resemble Danny DeVito. Suppose also that on the back of each card you list certain data about the men, such as honest (yes or no) and attractive (yes or no). Finally, suppose that 1 in 10 of the prospective soul mates rates a yes in each case. How many in your pack of 100 will pass the test on both counts? Let’s take honest as the first trait (we could equally well have taken attractive). Since 1 in 10 cards lists a yes under honest, 10 of the 100 cards will qualify. Of those 10, how many are attractive? Again, 1 in 10, so now you are left with 1 card. The first 1 in 10 cuts the possibilities down by 1⁄10, and so does the next 1 in 10, making the result 1 in 100. That’s why you multiply. And if you have more requirements than just honest and attractive, you have to keep multiplying, so . . . well, good luck.
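
A quick simulation of the trading-card example confirms the product rule for independent events:

```python
import random

random.seed(1)

trials = 1_000_000
# Two independent 1-in-10 traits, as in the card example (honest, attractive).
both = sum(
    1 for _ in range(trials)
    if random.random() < 0.1 and random.random() < 0.1
)
print(f"Simulated P(both traits): {both / trials:.4f}")  # ~0.0100
print(f"Product rule 0.1 * 0.1:   {0.1 * 0.1:.4f}")      # 0.0100
```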

And there are situations where probabilities should be added. That’s the next law.

“These occur when we want to know the chances of either one event or another occurring, as opposed to the earlier situation, in which we wanted to know the chance of one event and another event happening.”

3. If an event can have a number of different and distinct possible outcomes, A, B, C, and so on, then the probability that either A or B will occur is equal to the sum of the individual probabilities of A and B, and the sum of the probabilities of all the possible outcomes (A, B, C, and so on) is 1 (that is, 100 percent).

Via The Drunkard’s Walk:

When you want to know the chances that two independent events, A and B, will both occur, you multiply; if you want to know the chances that either of two mutually exclusive events, A or B, will occur, you add. Back to our airline: when should the gate attendant add the probabilities instead of multiplying them? Suppose she wants to know the chances that either both passengers or neither passenger will show up. In this case she should add the individual probabilities, which according to what we calculated above, would come to 55 percent.
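
The 55 percent figure refers to a calculation earlier in the book that isn’t excerpted here; it is consistent with each passenger independently showing up with probability 2/3, which is the assumption in this sketch:

```python
from fractions import Fraction

p_show = Fraction(2, 3)  # assumed per-passenger show-up probability (inferred;
                         # it reproduces the book's 55 percent figure)

p_both    = p_show * p_show              # multiply: independent events
p_neither = (1 - p_show) * (1 - p_show)  # multiply: independent events

# The two outcomes are mutually exclusive, so their probabilities add:
print(p_both + p_neither, float(p_both + p_neither))  # 5/9, ~0.556 -- about 55%
```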

These three simple laws form the basis of probability. “Properly applied,” Mlodinow writes, “they can give us much insight into the workings of nature and the everyday world.” We use them all the time, we just don’t use them properly.

Mental Model: Bias from Insensitivity to Sample Size

The widespread misunderstanding of randomness causes a lot of problems.

Today we’re going to explore a concept that causes a lot of human misjudgment. It’s called the bias from insensitivity to sample size, or, if you prefer, the law of small numbers.

Insensitivity to small sample sizes, in particular, trips us up in many areas of life.

* * *

If I measured one person, who happened to be 6 feet tall, and then told you that everyone in the whole world was 6 feet tall, you’d intuitively realize this is a mistake. You’d say, you can’t measure only one person and then draw such a conclusion. To do that you’d need a much larger sample.

And, of course, you’d be right.

While simple, this example is a key building block to our understanding of how insensitivity to sample size can lead us astray.

As Stuart Sutherland writes in Irrationality:

Before drawing conclusions from information about a limited number of events (a sample) selected from a much larger number of events (the population) it is important to understand something about the statistics of samples.

In Thinking, Fast and Slow, Daniel Kahneman writes, “A random event, by definition, does not lend itself to explanation, but collections of random events do behave in a highly regular fashion.” Kahneman continues, “extreme outcomes (both high and low) are more likely to be found in small than in large samples. This explanation is not causal.”

We all intuitively know that “the results of larger samples deserve more trust than smaller samples, and even people who are innocent of statistical knowledge have heard about this law of large numbers.”

The law of large numbers says that as the sample size grows, results should converge toward a stable frequency. So, if we’re flipping coins and measuring the proportion of heads, we’d expect it to approach 50% over a large number of flips, say 100, but not necessarily over 2 or 4.

In our minds, we often fail to account for the uncertainty that comes with a given sample size.

While we all understand it intuitively, it’s hard for us to realize in the moment of processing and decision making that larger samples are better representations than smaller samples.

We understand the difference between a sample size of 6 and 6,000,000 fairly well but we don’t, intuitively, understand the difference between 200 and 3,000.
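
A simulation (with stand-in numbers) shows what that missing intuition looks like: repeat a poll many times at each sample size and measure how much the result wobbles.

```python
import random

random.seed(7)

def poll_spread(true_share, n, repeats=2_000):
    """Standard deviation of the measured share across repeated n-person polls."""
    results = [
        sum(random.random() < true_share for _ in range(n)) / n
        for _ in range(repeats)
    ]
    mean = sum(results) / repeats
    return (sum((r - mean) ** 2 for r in results) / repeats) ** 0.5

for n in (200, 3_000):
    print(f"n = {n:>5}: results wobble by ~{poll_spread(0.6, n):.1%}")
# n = 200 swings by ~3.5 points poll to poll; n = 3,000 by ~0.9 points.
```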

* * *

This bias comes in many forms.

In a telephone poll of 300 seniors, 60% support the president.

If you had to summarize the message of this sentence in exactly three words, what would they be? Almost certainly you would choose “elderly support president.” These words provide the gist of the story. The omitted details of the poll, that it was done on the phone with a sample of 300, are of no interest in themselves; they provide background information that attracts little attention. Of course, if the sample was extreme, say 6 people, you’d question it. Unless you’re fully mathematically equipped, however, you’ll intuitively judge the sample size and may not react differently to a sample of, say, 150 and one of 3,000. That, in a nutshell, is exactly the meaning of the statement that “people are not adequately sensitive to sample size.”

Part of the problem is that we focus on the story over the reliability, or robustness, of the results.

System 1 thinking, that is, our intuition, is “not prone to doubt. It suppresses ambiguity and spontaneously constructs stories that are as coherent as possible. Unless the message is immediately negated, the associations that it evokes will spread as if the message were true.”

Considering sample size, unless it’s extreme, is not a part of our intuition.

Kahneman writes:

The exaggerated faith in small samples is only one example of a more general illusion – we pay more attention to the content of messages than to information about their reliability, and as a result end up with a view of the world around us that is simpler and more coherent than the data justify. Jumping to conclusions is a safer sport in the world of our imagination than it is in reality.

* * *

In engineering, for example, we can encounter this in the evaluation of precedent.

Steven Vick writes in Degrees of Belief: Subjective Probability and Engineering Judgment:

If something has worked before, the presumption is that it will work again without fail. That is, the probability of future success conditional on past success is taken as 1.0. Accordingly, a structure that has survived an earthquake would be assumed capable of surviving an earthquake of the same magnitude and distance, with the underlying presumption being that the operative causal factors must be the same. But the seismic ground motions are quite variable in their frequency content, attenuation characteristics, and many other factors, so that a precedent for a single earthquake represents a very small sample size.

Bayesian thinking tells us that a single success, absent other information, raises the likelihood of survival in the future.

In a way this is related to robustness. The more you’ve had to handle while still surviving, the more robust you are.

Let’s look at some other examples.

* * *

Hospital

Daniel Kahneman and Amos Tversky demonstrated our insensitivity to sample size with the following question:

A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know, about 50% of all babies are boys. However, the exact percentage varies from day to day. Sometimes it may be higher than 50%, sometimes lower. For a period of 1 year, each hospital recorded the days on which more than 60% of the babies born were boys. Which hospital do you think recorded more such days?

  1. The larger hospital
  2. The smaller hospital
  3. About the same (that is, within 5% of each other)

Most people incorrectly choose 3. The correct answer is, however, 2.

In Judgment in Managerial Decision Making, Max Bazerman explains:

Most individuals choose 3, expecting the two hospitals to record a similar number of days on which 60 percent or more of the babies born are boys. People seem to have some basic idea of how unusual it is to have 60 percent of a random event occurring in a specific direction. However, statistics tells us that we are much more likely to observe 60 percent male babies in a smaller sample than in a larger sample. This effect is easy to understand. Think about which is more likely: getting more than 60 percent heads in three flips of a coin or getting more than 60 percent heads in 3,000 flips.
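
A direct simulation of the two hospitals (assuming each birth is a 50/50 boy-or-girl draw) bears this out:

```python
import random

random.seed(3)

def days_over_60_pct_boys(births_per_day, days=365, years=200):
    """Average days per year on which more than 60% of births are boys."""
    total = 0
    for _ in range(years):
        for _ in range(days):
            boys = sum(random.random() < 0.5 for _ in range(births_per_day))
            if boys / births_per_day > 0.6:
                total += 1
    return total / years

print(f"Large hospital (45/day): ~{days_over_60_pct_boys(45):.0f} days/year")
print(f"Small hospital (15/day): ~{days_over_60_pct_boys(15):.0f} days/year")
# The small hospital records roughly twice as many such days.
```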

* * *

Another interesting example comes from poker.

Over short periods of time luck is more important than skill. The more luck contributes to the outcome, the larger the sample you’ll need to distinguish between someone’s skill and pure chance.

David Einhorn explains:

People ask me “Is poker luck?” and “Is investing luck?”

The answer is, not at all. But sample sizes matter. On any given day a good investor or a good poker player can lose money. Any stock investment can turn out to be a loser no matter how large the edge appears. Same for a poker hand. One poker tournament isn’t very different from a coin-flipping contest and neither is six months of investment results.

On that basis luck plays a role. But over time – over thousands of hands against a variety of players and over hundreds of investments in a variety of market environments – skill wins out.

As the number of hands played increases, skill plays a larger and larger role and luck plays less of a role.

* * *

But this goes way beyond hospitals and poker. Baseball is another good example. Over a long season, odds are the best teams will rise to the top. In the short term, anything can happen. If you look at the standings 10 games into the season, odds are they will not be representative of where things will land after the full 162-game season. In the short term, luck plays too much of a role.

In Moneyball, Michael Lewis writes “In a five-game series, the worst team in baseball will beat the best about 15% of the time.”
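
Lewis’s 15 percent figure is easy to sanity-check. If we assume the worst team wins any single game against the best about 30 percent of the time (an assumed figure, not from Moneyball), the binomial arithmetic lands close:

```python
from math import comb

def p_win_series(p_game, games=5):
    """P(winning a best-of-'games' series with per-game win probability p_game)."""
    need = games // 2 + 1
    return sum(
        comb(games, k) * p_game**k * (1 - p_game)**(games - k)
        for k in range(need, games + 1)
    )

print(f"P(worst team wins a 5-game series) = {p_win_series(0.30):.2f}")  # ~0.16
```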

* * *

If you promote people or work with colleagues you’ll also want to keep this bias in mind.

If you assume that performance at work is some combination of skill and luck you can easily see that sample size is relevant to the reliability of performance.

Performance sampling works like anything else: the bigger the sample size, the bigger the reduction in uncertainty and the more likely you are to make good decisions.

This has been studied by one of my favorite thinkers, James March. He calls it the false record effect.

He writes:

False Record Effect. A group of managers of identical (moderate) ability will show considerable variation in their performance records in the short run. Some will be found at one end of the distribution and will be viewed as outstanding; others will be at the other end and will be viewed as ineffective. The longer a manager stays in a job, the less the probable difference between the observed record of performance and actual ability. Time on the job increases the expected sample of observations, reduces expected sampling error, and thus reduces the chance that the manager (of moderate ability) will either be promoted or exit.

Hero Effect. Within a group of managers of varying abilities, the faster the rate of promotion, the less likely it is to be justified. Performance records are produced by a combination of underlying ability and sampling variation. Managers who have good records are more likely to have high ability than managers who have poor records, but the reliability of the differentiation is small when records are short.

(I realize promotions are a lot more complicated than I’m letting on. Some jobs, for example, are more difficult than others. It gets messy quickly and that’s part of the problem. Often when things get messy we turn off our brains and concoct the simplest explanation we can. Simple but wrong. I’m only pointing out that sample size is one input into the decision. I’m by no means advocating an “experience is best” approach, as that comes with a host of other problems.)
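
March’s false record effect is easy to reproduce in a few lines: give 1,000 managers of identical ability eight coin-flip quarters each, and luck alone mints both stars and duds.

```python
import random

random.seed(11)

n_managers, quarters = 1_000, 8
# Identical ability: every quarter is a 50/50 success, pure luck.
records = [
    sum(random.random() < 0.5 for _ in range(quarters))
    for _ in range(n_managers)
]

stars = sum(r >= 7 for r in records)  # 7 or 8 good quarters out of 8
duds  = sum(r <= 1 for r in records)  # 0 or 1 good quarters out of 8
print(f"'Outstanding' managers: {stars}")  # ~35 per 1,000, by luck alone
print(f"'Ineffective' managers: {duds}")   # ~35 per 1,000, by luck alone
```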

* * *

This bias is also used against you in advertising.

The next time you see a commercial that says “4 out of 5 doctors recommend…,” remember that the claim is meaningless without knowing the sample size. Odds are pretty good that the sample size is 5.

* * *

Large sample sizes are not a panacea. Things change. Systems evolve and faith in those results can be unfounded as well.

The key, at all times, is to think.

This bias leads to a whole slew of things, such as:
– under-estimating risk
– over-estimating risk
– undue confidence in trends/patterns
– undue confidence in the lack of side-effects/problems

Bias from insensitivity to sample size is part of the Farnam Street latticework of mental models.
