Tag: Statistics

Predicting the Future with Bayes’s Theorem

In a recent podcast, we talked with professional poker player Annie Duke about thinking in probabilities, something good poker players do all the time. At the poker table or in life, it’s really useful to think in probabilities versus absolutes based on all the information you have available to you. You can improve your decisions and get better outcomes. Probabilistic thinking leads you to ask yourself, how confident am I in this prediction? What information would impact this confidence?

Bayes’s Theorem

Bayes’s theorem is an accessible way of integrating probability thinking into our lives. Thomas Bayes was an English minister in the 18th century, whose most famous work, “An Essay towards solving a Problem in the Doctrine of Chances,” was brought to the attention of the Royal Society in 1763—two years after his death—by his friend Richard Price. The essay did not contain the theorem as we now know it, but had the seeds of the idea. It looked at how we should adjust our estimates of probabilities when we encounter new data that influence a situation. Later development by French scholar Pierre-Simon Laplace and others helped codify the theorem and develop it into a useful tool for thinking.

Knowing the exact math of probability calculations is not the key to understanding Bayesian thinking. More critical is your ability and desire to assign probabilities of truth and accuracy to anything you think you know, and then being willing to update those probabilities when new information comes in. Here is a short example, found in Investing: The Last Liberal Art, of how it works:

Let’s imagine that you and a friend have spent the afternoon playing your favorite board game, and now, at the end of the game, you are chatting about this and that. Something your friend says leads you to make a friendly wager: that with one roll of the die from the game, you will get a 6. Straight odds are one in six, a 16 percent probability. But then suppose your friend rolls the die, quickly covers it with her hand, and takes a peek. “I can tell you this much,” she says; “it’s an even number.” Now you have new information and your odds change dramatically to one in three, a 33 percent probability. While you are considering whether to change your bet, your friend teasingly adds: “And it’s not a 4.” With this additional bit of information, your odds have changed again, to one in two, a 50 percent probability. With this very simple example, you have performed a Bayesian analysis. Each new piece of information affected the original probability, and that is Bayesian [updating].
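
That updating is easy to reproduce. Here is a minimal Python sketch of the same calculation, using only the numbers from the example (the variable and function names are mine):

```python
from fractions import Fraction

# Start with the six equally likely faces of the die.
possible = {1, 2, 3, 4, 5, 6}

def p_six(faces):
    """Probability of rolling a 6, given the faces still consistent with what we know."""
    return Fraction(1, len(faces)) if 6 in faces else Fraction(0)

print(p_six(possible))      # 1/6  (~16%)

possible &= {2, 4, 6}       # new information: the roll is even
print(p_six(possible))      # 1/3  (~33%)

possible -= {4}             # more information: it's not a 4
print(p_six(possible))      # 1/2  (50%)
```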

Both Nate Silver and Eliezer Yudkowsky have written about Bayes’s theorem in the context of medical testing, specifically mammograms. Imagine you live in a country with 100 million women under 40. Past trends have revealed that there is a 1.4% chance of a woman under 40 in this country getting breast cancer—so roughly 1.4 million women.

Mammograms will detect breast cancer 75% of the time. They will give false positives—saying a woman has breast cancer when she actually doesn’t—about 10% of the time. At first, you might focus just on the mammogram numbers and think that a 75% detection rate means a positive result is bad news. Let’s do the math.

If all the women under 40 get mammograms, then the false positive rate will give roughly 10 million women under 40 the news that they have breast cancer. But because you know the first statistic, that only 1.4 million women under 40 actually get breast cancer, you know that at least 8.6 million of the women who tested positive are not actually going to have breast cancer!
That’s a lot of needless worrying, which leads to a lot of needless medical care. In order to remedy this poor understanding and make better decisions about using mammograms, we absolutely must consider prior knowledge when we look at the results, and try to update our beliefs with that knowledge in mind.
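
Plugging the passage’s numbers into Bayes’s theorem makes the point concrete. Here is a minimal Python sketch (the 1.4% base rate, 75% detection rate, and 10% false positive rate come from the text above; the rest is just arithmetic):

```python
prior = 0.014           # base rate: chance a woman under 40 has breast cancer
sensitivity = 0.75      # mammograms detect cancer 75% of the time
false_positive = 0.10   # mammograms wrongly flag about 10% of healthy women

# Total probability of a positive test
p_positive = sensitivity * prior + false_positive * (1 - prior)

# Bayes's theorem: P(cancer | positive test)
posterior = sensitivity * prior / p_positive
print(f"{posterior:.1%}")   # roughly 10% -- most positive tests are false alarms
```

Even after a positive mammogram, the updated probability of cancer is only about one in ten, which is exactly the kind of prior-aware reading of a test result the passage argues for.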

Weigh the Evidence

Often we ignore prior information, simply called “priors” in Bayesian-speak. We can blame this habit in part on the availability heuristic—we focus on what’s readily available. In this case, we focus on the newest information and the bigger picture gets lost. We fail to adjust the probability of old information to reflect what we have learned.

The big idea behind Bayes’s theorem is that we must continuously update our probability estimates on an as-needed basis. In his book The Signal and the Noise, Nate Silver gives a contemporary example, reminding us that new information is often most useful when we put it in the larger context of what we already know:

Bayes’s theorem is an important reality check on our efforts to forecast the future. How, for instance, should we reconcile a large body of theory and evidence predicting global warming with the fact that there has been no warming trend over the last decade or so? Skeptics react with glee, while true believers dismiss the new information.

A better response is to use Bayes’s theorem: the lack of recent warming is evidence against recent global warming predictions, but it is weak evidence. This is because there is enough variability in global temperatures to make such an outcome unsurprising. The new information should reduce our confidence in our models of global warming—but only a little.

The same approach can be used in anything from an economic forecast to a hand of poker, and while Bayes’s theorem can be a formal affair, Bayesian reasoning also works as a rule of thumb. We tend to either dismiss new evidence, or embrace it as though nothing else matters. Bayesians try to weigh both the old hypothesis and the new evidence in a sensible way.

Limitations of the Bayesian Approach

Don’t walk away thinking the Bayesian approach will enable you to predict everything! In addition to seeing the world as an ever-shifting array of probabilities, we must also remember the limitations of inductive reasoning. A high probability of something being true is not the same as saying it is true. A great example of this is from Bertrand Russell’s The Problems of Philosophy:

A horse which has been often driven along a certain road resists the attempt to drive him in a different direction. Domestic animals expect food when they see the person who usually feeds them. We know that all these rather crude expectations of uniformity are liable to be misleading. The man who has fed the chicken every day throughout its life at last wrings its neck instead, showing that more refined views as to the uniformity of nature would have been useful to the chicken.

In the final analysis, though, picking up Bayesian reasoning can truly change your life, as observed in this Big Think video by Julia Galef of the Center for Applied Rationality:

After you’ve been steeped in Bayes’s rule for a little while, it starts to produce some fundamental changes to your thinking. For example, you become much more aware that your beliefs are grayscale. They’re not black and white; you have levels of confidence in your beliefs about how the world works that are less than 100 percent but greater than zero percent. Even more importantly, as you go through the world and encounter new ideas and new evidence, that level of confidence fluctuates as you encounter evidence for and against your beliefs.

So be okay with uncertainty, and use it to your advantage. Instead of holding on to outdated beliefs by rejecting new information, take in what comes your way through a system of evaluating probabilities.

Bayes’s Theorem is part of the Farnam Street latticework of mental models. Still Curious? Read Bayes and Deadweight: Using Statistics to Eject the Deadweight From Your Life next. 


What’s So Significant About Significance?

How Not to Be Wrong

One of my favorite studies of all time took the 50 most common ingredients from a cookbook and searched the literature for a connection to cancer: 72% had a study linking them to increased or decreased risk of cancer. (Here’s the link for the interested.)

Meta-analyses (studies examining multiple studies) quashed the effect pretty seriously, but how many of those single studies were probably reported on in multiple media outlets, permanently causing changes in readers’ dietary habits? (We know from studying juries that people are often unable to “forget” things that are subsequently proven false or misleading — misleading data is sticky.)

The phrase “statistically significant” is one of the more unfortunately misleading ones of our time. The word significant in the statistical sense — meaning distinguishable from random chance — does not carry the same meaning in common parlance, in which we mean distinguishable from something that does not matter. We’ll get to what that means.

Confusing the two gets at the heart of a lot of misleading headlines and it’s worth a brief look into why they don’t mean the same thing, so you can stop being scared that everything you eat or do is giving you cancer.

***

The term statistical significance is used to denote when an effect is found to be extremely unlikely to have occurred by chance. In order to make that determination, we have to propose a null hypothesis to be rejected. Let’s say we propose that eating an apple a day reduces the incidence of colon cancer. The “null hypothesis” here would be that eating an apple a day does nothing to the incidence of colon cancer — that we’d be equally likely to get colon cancer if we ate that daily apple.

When we analyze the data of our study, we’re technically not looking to say “Eating an apple a day prevents colon cancer” — that’s a bit of a misconception. What we’re actually doing is an inversion: we want the data to provide us with sufficient weight to reject the idea that apples have no effect on colon cancer.

And even when that happens, it’s not an all-or-nothing determination. What we’re actually saying is “It would be extremely unlikely for the data we have, which shows a daily apple reduces colon cancer by 50%, to have popped up by chance. Not impossible, but very unlikely.” The world does not quite allow us to have absolute conviction.

How unlikely? The currently accepted standard in many fields is 5% — there is a less than 5% chance the data would come up this way randomly. That threshold alone means roughly 1 in 20 “significant” results could be a fluke of chance, but alas, that is where we’re at. (The problem with the 5% p-value, and the associated problem of p-hacking, has been subject to some intense debate, but we won’t deal with that here.)

We’ll get to why “significance can be insignificant,” and why that’s so important, in a moment. But let’s make sure we’re fully on board with the importance of sorting chance events from real ones with another illustration, this one outlined by Jordan Ellenberg in his wonderful book How Not to Be Wrong. Pay close attention:

Suppose we’re in null hypothesis land, where the chance of death is exactly the same (say, 10%) for the fifty patients who got your drug and the fifty who got [a] placebo. But that doesn’t mean that five of the drug patients die and five of the placebo patients die. In fact, the chance that exactly five of the drug patients die is about 18.5%; not very likely, just as it’s not very likely that a long series of coin tosses would yield precisely as many heads as tails. In the same way, it’s not very likely that exactly the same number of drug patients and placebo patients expire during the course of the trial. I computed:

13.3% chance equally many drug and placebo patients die
43.3% chance fewer placebo patients than drug patients die
43.3% chance fewer drug patients than placebo patients die

Seeing better results among the drug patients than the placebo patients says very little, since this isn’t at all unlikely, even under the null hypothesis that your drug doesn’t work.

But things are different if the drug patients do a lot better. Suppose five of the placebo patients die during the trial, but none of the drug patients do. If the null hypothesis is right, both classes of patients should have a 90% chance of survival. But in that case, it’s highly unlikely that all fifty of the drug patients would survive. The first of the drug patients has a 90% chance; now the chance that not only the first but also the second patient survives is 90% of that 90%, or 81%–and if you want the third patient to survive as well, the chance of that happening is only 90% of that 81%, or 72.9%. Each new patient whose survival you stipulate shaves a little off the chances, and by the end of the process, where you’re asking about the probability that all fifty will survive, the slice of probability that remains is pretty slim:

(0.9) x (0.9) x (0.9) x … fifty times! … x (0.9) x (0.9) = 0.00515 …

Under the null hypothesis, there’s only one chance in two hundred of getting results this good. That’s much more compelling. If I claim I can make the sun come up with my mind, and it does, you shouldn’t be impressed by my powers; but if I claim I can make the sun not come up, and it doesn’t, then I’ve demonstrated an outcome very unlikely under the null hypothesis, and you’d best take notice.
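
If you want to check Ellenberg’s numbers yourself, a short sketch using scipy’s binomial distribution reproduces them (scipy is my choice of convenience; any binomial implementation will do):

```python
from scipy.stats import binom

n, p = 50, 0.10   # 50 patients per arm, 10% chance of death under the null

# Chance that exactly 5 of the 50 drug patients die
print(binom.pmf(5, n, p))                                # ~0.185

# Chance that the two arms see exactly the same number of deaths
tie = sum(binom.pmf(k, n, p) ** 2 for k in range(n + 1))
print(tie)                                               # ~0.133

# By symmetry, each arm "does better" in half of the remaining cases
print((1 - tie) / 2)                                     # ~0.433

# Chance that all 50 drug patients survive, at 90% survival each
print(0.9 ** 50)                                         # ~0.00515, about 1 in 200
```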

So you see, all this null hypothesis stuff is pretty important because what you want to know is if an effect is really “showing up” or if it just popped up by chance.

A final illustration should make it clear:

Imagine you were flipping coins with a particular strategy of getting more heads, and after 30 flips you had 18 heads and 12 tails. Would you call it a miracle? Probably not — you’d realize immediately that it’s perfectly possible for an 18/12 ratio to happen by chance. You wouldn’t write an article in U.S. News and World Report proclaiming you’d figured out coin flipping.

Now let’s say instead you flipped the coin 30,000 times and got 18,000 heads and 12,000 tails…well, then your case for statistical significance would be pretty tight. It would be nearly impossible to get that result by chance — your strategy must have something to it. The null hypothesis of “My coin flipping technique is no better than the usual one” would be easy to reject! (The p-value here would be orders of magnitude less than 5%, by the way.)
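
A rough way to see the difference is to compute the one-sided p-values directly. This sketch assumes scipy is available; binomtest is its exact binomial test:

```python
from scipy.stats import binomtest

# 18 heads in 30 flips: easily explained by chance
print(binomtest(18, 30, 0.5, alternative="greater").pvalue)        # ~0.18

# 18,000 heads in 30,000 flips: the same ratio, effectively impossible by chance
print(binomtest(18000, 30000, 0.5, alternative="greater").pvalue)  # vanishingly small
```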

That’s what this whole business is about.

***

Now that we’ve got this idea down, we come to the big question that statistical significance cannot answer: Even if the result is distinguishable from chance, does it actually matter?

Statistical significance cannot tell you whether the result is worth paying attention to — even if you get the p-value down to a minuscule number, increasing your confidence that what you saw was not due to chance. 

In How Not to Be Wrong, Ellenberg provides a perfect example:

A 1995 study published in a British journal indicated that a new birth control pill doubled the risk of venous thrombosis (a potentially deadly blood clot) in its users. Predictably, 1.5 million British women freaked out, and some meaningfully large percentage of them stopped taking the pill. In 1996, 26,000 more babies were born than in the previous year, and there were 13,600 more abortions. Whoops!

So what, right? Lots of mothers’ lives were saved, right?

Not really. The initial probability of a woman getting a venous thrombosis with any old birth control pill was about 1 in 7,000, or about 0.01%. That means that the “Killer Pill,” even if it was indeed increasing “thrombosis risk,” only increased that risk to 2 in 7,000, or about 0.02%!! Is that worth rearranging your life for? Probably not.
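
The arithmetic behind that judgment is worth spelling out, because it is the difference between a relative statistic (“the risk doubled”) and an absolute one. A quick sketch using the passage’s numbers:

```python
baseline = 1 / 7000      # absolute risk of venous thrombosis on the older pill
relative_risk = 2        # the headline: "the risk has doubled"

new_risk = baseline * relative_risk
print(f"old risk: {baseline:.3%}")     # ~0.014%
print(f"new risk: {new_risk:.3%}")     # ~0.029%
print(f"extra cases per 7,000 users: {(new_risk - baseline) * 7000:.0f}")  # 1
```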

Ellenberg makes the excellent point that, at least in the case of health, the null hypothesis is unlikely to be right in most cases! The body is a complex system — of course what we put in it affects how it functions in some direction or another. It’s unlikely to be absolute zero.

But numerical and scale-based thinking, indispensable for anyone looking to not be a sucker, tells us that we must distinguish between small and meaningless effects (like the connection between almost all individual foods and cancer so far) and real ones (like the connection between smoking and lung cancer).

And now we arrive at the problem of “significance” — even if an effect is really happening, it still may not matter!  We must learn to be wary of “relative” statistics (i.e., “the risk has doubled”), and look to favor “absolute” statistics, which tell us whether the thing is worth worrying about at all.

So we have two important ideas:

A. Just like coin flips, many results are perfectly possible by chance. We use the concept of “statistical significance” to figure out how likely it is that the effect we’re seeing is real and not just a random illusion, like seeing 18 heads in 30 coin tosses.

B. Even if it is really happening, it still may be unimportant – an effect so insignificant in real terms that it’s not worth our attention.

These effects should combine to raise our level of skepticism when hearing about groundbreaking new studies! (A third and equally important problem is the fact that correlation is not causation, a common problem in many fields of science including nutritional epidemiology. Just because x is associated with y does not mean that x is causing y.)

Tread carefully and keep your thinking cap on.

***

Still Interested? Read Ellenberg’s great book to get your head working correctly, and check out our posts on Bayesian updating, another very useful statistical tool, and learn a little about how we distinguish science from pseudoscience.

Leonard Mlodinow: The Three Laws of Probability

"These three laws, simple as they are, form much of the basis of probability theory. Properly applied, they can give us much insight into the workings of nature and the everyday world. "
“These three laws, simple as they are, form much of the basis of probability theory. Properly applied, they can give us much insight into the workings of nature and the everyday world.”

 

In his book, The Drunkard’s Walk, Leonard Mlodinow outlines the three key “laws” of probability.

The first law of probability is the most basic of all. But before we get to that, let’s look at this question.

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
Which is more probable?

Linda is a bank teller.
Linda is a bank teller and is active in the feminist movement.

To Kahneman and Tversky’s surprise, 87 percent of the subjects in the study believed that the probability of Linda being a bank teller and active in the feminist movement was higher than the probability that Linda is a bank teller.

1. The probability that two events will both occur can never be greater than the probability that each will occur individually.

This is the conjunction fallacy.

Mlodinow explains:

Why not? Simple arithmetic: the chances that event A will occur = the chances that events A and B will occur + the chance that event A will occur and event B will not occur.
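
The rule is easy to sanity-check with a simulation. In this toy sketch the two events are arbitrary ones I have defined over a die roll, purely to illustrate that P(A and B) can never exceed P(A):

```python
import random

random.seed(0)
trials = 1_000_000
count_a = count_a_and_b = 0

for _ in range(trials):
    roll = random.randint(1, 6)
    a = roll % 2 == 0        # event A: the roll is even
    b = roll > 3             # event B: the roll is greater than 3
    count_a += a
    count_a_and_b += (a and b)

print(count_a / trials)          # P(A)       ~ 0.50
print(count_a_and_b / trials)    # P(A and B) ~ 0.33, never above P(A)
```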

The interesting thing that Kahneman and Tversky discovered was that we don’t tend to make this mistake unless we know something about the subject.

“For example,” Mlodinow muses, “suppose Kahneman and Tversky had asked which of these statements seems most probable:”

Linda owns an International House of Pancakes franchise.
Linda had a sex-change operation and is now known as Larry.
Linda had a sex-change operation, is now known as Larry, and owns an International House of Pancakes franchise.

In this case it’s unlikely you would choose the last option.

Via The Drunkard’s Walk:

If the details we are given fit our mental picture of something, then the more details in a scenario, the more real it seems and hence the more probable we consider it to be—even though any act of adding less-than-certain details to a conjecture makes the conjecture less probable.

Or as Kahneman and Tversky put it, “A good story is often less probable than a less satisfactory… .”

2. If two possible events, A and B, are independent, then the probability that both A and B will occur is equal to the product of their individual probabilities.

Via The Drunkard’s Walk:

Suppose a married person has on average roughly a 1 in 50 chance of getting divorced each year. On the other hand, a police officer has about a 1 in 5,000 chance each year of being killed on the job. What are the chances that a married police officer will be divorced and killed in the same year? According to the above principle, if those events were independent, the chances would be roughly 1⁄50 × 1⁄5,000, which equals 1⁄250,000. Of course the events are not independent; they are linked: once you die, darn it, you can no longer get divorced. And so the chance of that much bad luck is actually a little less than 1 in 250,000.

Why multiply rather than add? Suppose you make a pack of trading cards out of the pictures of those 100 guys you’ve met so far through your Internet dating service, those men who in their Web site photos often look like Tom Cruise but in person more often resemble Danny DeVito. Suppose also that on the back of each card you list certain data about the men, such as honest (yes or no) and attractive (yes or no). Finally, suppose that 1 in 10 of the prospective soul mates rates a yes in each case. How many in your pack of 100 will pass the test on both counts? Let’s take honest as the first trait (we could equally well have taken attractive). Since 1 in 10 cards lists a yes under honest, 10 of the 100 cards will qualify. Of those 10, how many are attractive? Again, 1 in 10, so now you are left with 1 card. The first 1 in 10 cuts the possibilities down by 1⁄10, and so does the next 1 in 10, making the result 1 in 100. That’s why you multiply. And if you have more requirements than just honest and attractive, you have to keep multiplying, so . . . well, good luck.
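
Both of Mlodinow’s examples reduce to a single multiplication once the rule is in hand. A small sketch with the numbers from the quotes:

```python
# Married police officer: divorced AND killed on the job in the same year
# (treating the events as independent, as the rough estimate in the quote does)
print((1 / 50) * (1 / 5000))    # 1/250,000

# Trading cards: honest (1 in 10) AND attractive (1 in 10)
pack_size = 100
p_both = 0.1 * 0.1
print(p_both * pack_size)       # about 1 card in the pack of 100
```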

And there are situations where probabilities should be added. That’s the next law.

“These occur when we want to know the chances of either one event or another occurring, as opposed to the earlier situation, in which we wanted to know the chance of one event and another event happening.”

3. If an event can have a number of different and distinct possible outcomes, A, B, C, and so on, then the probability that either A or B will occur is equal to the sum of the individual probabilities of A and B, and the sum of the probabilities of all the possible outcomes (A, B, C, and so on) is 1 (that is, 100 percent).

Via The Drunkard’s Walk:

When you want to know the chances that two independent events, A and B, will both occur, you multiply; if you want to know the chances that either of two mutually exclusive events, A or B, will occur, you add. Back to our airline: when should the gate attendant add the probabilities instead of multiplying them? Suppose she wants to know the chances that either both passengers or neither passenger will show up. In this case she should add the individual probabilities, which according to what we calculated above, would come to 55 percent.
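
The excerpt doesn’t include the earlier calculation the quote refers to, but if each passenger independently shows up with probability 2/3 (an assumption on my part), the numbers line up with the quoted 55 percent:

```python
p_show = 2 / 3   # assumed probability that each passenger shows up

p_both = p_show * p_show          # multiply: independent events
p_neither = (1 - p_show) ** 2     # multiply: independent events

# "Both show up" and "neither shows up" are mutually exclusive, so add them.
print(p_both + p_neither)         # ~0.56, roughly the 55 percent in the quote
```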

These three simple laws form the basis of probability. “Properly applied,” Mlodinow writes, “they can give us much insight into the workings of nature and the everyday world.” We use them all the time, we just don’t use them properly.

Mental Model: Bias from Insensitivity to Sample Size

The widespread misunderstanding of randomness causes a lot of problems.

Today we’re going to explore a concept that causes a lot of human misjudgment. It’s called the bias from insensitivity to sample size, or, if you prefer, the law of small numbers.

Insensitivity to sample size is one of the most common ways this misunderstanding shows up.

* * *

If I measured one person, who happened to be 6 feet tall, and then told you that everyone in the whole world was 6 feet tall, you’d intuitively realize this is a mistake. You’d say that you can’t measure only one person and then draw such a conclusion. To do that you’d need a much larger sample.

And, of course, you’d be right.

While simple, this example is a key building block to our understanding of how insensitivity to sample size can lead us astray.

As Stuart Sutherland writes in Irrationality:

Before drawing conclusions from information about a limited number of events (a sample) selected from a much larger number of events (the population) it is important to understand something about the statistics of samples.

In Thinking, Fast and Slow, Daniel Kahneman writes, “A random event, by definition, does not lend itself to explanation, but collections of random events do behave in a highly regular fashion.” Kahneman continues, “extreme outcomes (both high and low) are more likely to be found in small than in large samples. This explanation is not causal.”

We all intuitively know that “the results of larger samples deserve more trust than smaller samples, and even people who are innocent of statistical knowledge have heard about this law of large numbers.”

The law of large numbers says that as the sample size grows, results should converge toward a stable frequency. So, if we’re flipping coins and measuring the proportion of times that we get heads, we’d expect it to approach 50% over a large sample of, say, 100 flips, but not necessarily over 2 or 4.
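
A quick simulation (my own illustration) shows the effect:

```python
import random

random.seed(42)

for n in (2, 4, 100, 10_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n:>6} flips: {heads / n:.1%} heads")

# Tiny samples swing wildly (0%, 75%, and so on), while the large samples
# settle close to 50%.
```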

In our minds, we often fail to account for the accuracy and uncertainty that come with a given sample size.

While we all understand it intuitively, it’s hard for us to realize in the moment of processing and decision making that larger samples are better representations than smaller samples.

We understand the difference between a sample size of 6 and 6,000,000 fairly well but we don’t, intuitively, understand the difference between 200 and 3,000.

* * *

This bias comes in many forms.

In a telephone poll of 300 seniors, 60% support the president.

If you had to summarize the message of this sentence in exactly three words, what would they be? Almost certainly you would choose “elderly support president.” These words provide the gist of the story. The omitted details of the poll, that it was done on the phone with a sample of 300, are of no interest in themselves; they provide background information that attracts little attention. Of course, if the sample was extreme, say 6 people, you’d question it. Unless you’re fully mathematically equipped, however, you’ll intuitively judge the sample size, and you may not react differently to a sample of, say, 150 and one of 3,000. That, in a nutshell, is exactly the meaning of the statement that “people are not adequately sensitive to sample size.”
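
One way to see why samples of 150 and 3,000 deserve different reactions is to look at the sampling error. This rough normal-approximation sketch is my own illustration, not part of the quoted text:

```python
import math

p = 0.60   # observed share supporting the president

for n in (6, 150, 300, 3000):
    se = math.sqrt(p * (1 - p) / n)   # standard error of a proportion
    margin = 1.96 * se                # rough 95% margin of error
    print(f"n = {n:>4}: 60% +/- {margin:.1%}")

# n = 6 is close to meaningless (+/- ~39 points), n = 150 gives about
# +/- 8 points, and n = 3,000 narrows it to under +/- 2 points.
```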

Part of the problem is that we focus on the story over the reliability, or robustness, of the results.

System 1 thinking, that is, our intuition, is “not prone to doubt. It suppresses ambiguity and spontaneously constructs stories that are as coherent as possible. Unless the message is immediately negated, the associations that it evokes will spread as if the message were true.”

Considering sample size, unless it’s extreme, is not a part of our intuition.

Kahneman writes:

The exaggerated faith in small samples is only one example of a more general illusion – we pay more attention to the content of messages than to information about their reliability, and as a result end up with a view of the world around us that is simpler and more coherent than the data justify. Jumping to conclusions is a safer sport in the world of our imagination than it is in reality.

* * *

In engineering, for example, we can encounter this in the evaluation of precedent.

Steven Vick, in Degrees of Belief: Subjective Probability and Engineering Judgment, writes:

If something has worked before, the presumption is that it will work again without fail. That is, the probability of future success conditional on past success is taken as 1.0. Accordingly, a structure that has survived an earthquake would be assumed capable of surviving another earthquake of the same magnitude and distance, with the underlying presumption being that the operative causal factors must be the same. But the seismic ground motions are quite variable in their frequency content, attenuation characteristics, and many other factors, so that a precedent for a single earthquake represents a very small sample size.

Bayesian thinking tells us that a single success, absent other information, raises the likelihood of survival in the future.

In a way, this is related to robustness: the more you’ve had to handle while still surviving, the more robust you are.

Let’s look at some other examples.

* * *

Hospital

Daniel Kahneman and Amos Tversky demonstrated our insensitivity to sample size with the following question:

A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know, about 50% of all babies are boys. However, the exact percentage varies from day to day. Sometimes it may be higher than 50%, sometimes lower. For a period of 1 year, each hospital recorded the days on which more than 60% of the babies born were boys. Which hospital do you think recorded more such days?

  1. The larger hospital
  2. The smaller hospital
  3. About the same (that is, within 5% of each other)

Most people incorrectly choose 3. The correct answer is, however, 2.

In Judgment in Managerial Decision Making, Max Bazerman explains:

Most individuals choose 3, expecting the two hospitals to record a similar number of days on which 60 percent or more of the babies born are boys. People seem to have some basic idea of how unusual it is to have 60 percent of a random event occurring in a specific direction. However, statistics tells us that we are much more likely to observe 60 percent of male babies in a smaller sample than in a larger sample. This effect is easy to understand. Think about which is more likely: getting more than 60 percent heads in three flips of a coin or getting more than 60 percent heads in 3,000 flips.
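
The binomial distribution makes Bazerman’s point precise. A short sketch (again leaning on scipy as a convenience):

```python
from scipy.stats import binom

for births_per_day in (15, 45):
    # "More than 60% boys" means strictly more than 0.6 * n boys that day.
    cutoff = (6 * births_per_day) // 10              # 9 for the small hospital, 27 for the large
    p_day = binom.sf(cutoff, births_per_day, 0.5)    # P(boys > cutoff), with P(boy) = 0.5
    print(f"{births_per_day} births/day: {p_day:.1%} of days exceed 60% boys")

# The smaller hospital sees such lopsided days roughly twice as often.
```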

* * *

Another interesting example comes from poker.

Over short periods of time luck is more important than skill. The more luck contributes to the outcome, the larger the sample you’ll need to distinguish between someone’s skill and pure chance.

David Einhorn explains:

People ask me “Is poker luck?” and “Is investing luck?”

The answer is, not at all. But sample sizes matter. On any given day a good investor or a good poker player can lose money. Any stock investment can turn out to be a loser no matter how large the edge appears. Same for a poker hand. One poker tournament isn’t very different from a coin-flipping contest and neither is six months of investment results.

On that basis luck plays a role. But over time – over thousands of hands against a variety of players and over hundreds of investments in a variety of market environments – skill wins out.

As the number of hands played increases, skill plays a larger and larger role and luck plays less of a role.

* * *

But this goes way beyond hospitals and poker. Baseball is another good example. Over a long season, odds are the best teams will rise to the top. In the short term, anything can happen. If you look at the standings 10 games into the season, odds are they will not be representative of where things will land after the full 162-game season. In the short term, luck plays too much of a role.

In Moneyball, Michael Lewis writes “In a five-game series, the worst team in baseball will beat the best about 15% of the time.”
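
You can get a feel for that figure with a quick calculation. The per-game probability below is my assumption, not a number from Moneyball; something near 30% for the worst team against the best reproduces the quoted result:

```python
from scipy.stats import binom

p_game = 0.30   # assumed chance the worst team wins any single game against the best

# Winning a five-game series means winning at least 3 of the 5 games
p_series = binom.sf(2, 5, p_game)   # P(wins > 2)
print(f"{p_series:.0%}")            # ~16%, close to the "about 15%" figure
```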

* * *

If you promote people or work with colleagues you’ll also want to keep this bias in mind.

If you assume that performance at work is some combination of skill and luck you can easily see that sample size is relevant to the reliability of performance.

Performance sampling works like anything else: the bigger the sample size, the bigger the reduction in uncertainty and the more likely you are to make good decisions.

This has been studied by one of my favorite thinkers, James March. He calls it the false record effect.

He writes:

False Record Effect. A group of managers of identical (moderate) ability will show considerable variation in their performance records in the short run. Some will be found at one end of the distribution and will be viewed as outstanding; others will be at the other end and will be viewed as ineffective. The longer a manager stays in a job, the less the probable difference between the observed record of performance and actual ability. Time on the job increased the expected sample of observations, reduced expected sampling error, and thus reduced the chance that the manager (of moderate ability) will either be promoted or exit.

Hero Effect. Within a group of managers of varying abilities, the faster the rate of promotion, the less likely it is to be justified. Performance records are produced by a combination of underlying ability and sampling variation. Managers who have good records are more likely to have high ability than managers who have poor records, but the reliability of the differentiation is small when records are short.
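
A tiny simulation captures the false record effect; the numbers here are mine, chosen for illustration. Give every manager identical ability and a short track record, and the records still spread out enough to crown “stars” and “duds”:

```python
import random

random.seed(1)

managers, decisions = 100, 12   # 100 managers, each judged on 12 outcomes
p_success = 0.5                 # identical ability: every decision is a coin flip

records = [sum(random.random() < p_success for _ in range(decisions)) / decisions
           for _ in range(managers)]

print(f"best short-run record:  {max(records):.0%}")
print(f"worst short-run record: {min(records):.0%}")
# With only 12 observations each, identical managers typically span records
# from roughly 20% to 80% -- the "outstanding" and "ineffective" labels
# are mostly sampling noise.
```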

(I realize promotions are a lot more complicated than I’m letting on. Some jobs, for example, are more difficult than others. It gets messy quickly and that’s part of the problem. Often when things get messy we turn off our brains and concoct the simplest explanation we can. Simple but wrong. I’m only pointing out that sample size is one input into the decision. I’m by no means advocating an “experience is best” approach, as that comes with a host of other problems.)

* * *

This bias is also used against you in advertising.

The next time you see a commercial that says “4 out of 5 doctors recommend…,” remember that these results are meaningless without knowing the sample size. Odds are pretty good that the sample size is 5.

* * *

Large sample sizes are not a panacea. Things change. Systems evolve and faith in those results can be unfounded as well.

The key, at all times, is to think.

This bias leads to a whole slew of things, such as:
– under-estimating risk
– over-estimating risk
– undue confidence in trends/patterns
– undue confidence in the lack of side-effects/problems

The Bias from insensitivity to sample size is part of the Farnam Street latticework of mental models.

Nassim Taleb: The Big Errors of Big Data


I am not saying here that there is no information in big data. There is plenty of information. The problem — the central issue — is that the needle comes in an increasingly larger haystack.

***

Nassim Taleb offers another way to look at big data.

We’re more fooled by noise than ever before, and it’s because of a nasty phenomenon called “big data.” With big data, researchers have brought cherry-picking to an industrial level.

Modernity provides too many variables, but too little data per variable. So the spurious relationships grow much, much faster than real information.

In other words: Big data may mean more information, but it also means more false information.

To me, this relates to reading the news.

We’re consumed—bombarded even—by all of this incoming information that’s constructed in a way to capture and maintain our attention. Somewhat counter-intuitively, this distraction offers negative, not positive, utility. Not only does it give us easily accessible information that’s full of noise from people who are not deeply fluent in the subjects they’re talking about, but we rarely consider the opportunity cost of this time or the false confidence it gives us (which causes us to take undue risks).

It would be much better to focus our limited attention in two places.

The first is our niche. That is our narrow specialization, or our circle of competence. This is, after all, how we’ll make a living.

The second is how the world works. These are the time-tested ideas that repeat throughout history and don’t change as time passes.

These are the mental models that you can use to not only better understand how the world works and why people behave as they do, but also to make better decisions.

They combine a narrow specialization with a general view of how the world works. If you further add how to think, now you’re really getting somewhere.