Insensitivity to Sample Size

The Disproportional Power of Anecdotes

Humans, it seems, have an innate tendency to overgeneralize from small samples. How many times have you been caught in an argument where the only proof offered is anecdotal? Perhaps your co-worker saw this bratty kid make a mess in the grocery store while the parents appeared to do nothing. “They just let that child pull things off the shelves and create havoc! My parents would never have allowed that. Parents are so permissive now.” Hmm. Is it true that most parents commonly allow young children to cause trouble in public? It would be a mistake to assume so based on the evidence presented, but a lot of us would go with it anyway. Your co-worker did.

Our propensity to confuse the “now” with “what always is,” as if the immediate world before our eyes consistently represents the entire universe, leads us to bad conclusions and bad decisions. We don’t bother asking questions and verifying validity. So we make mistakes and allow ourselves to be easily manipulated.

Political polling is a good example. It’s actually really hard to design and conduct a good poll. Matthew Mendelsohn and Jason Brent, in their article “Understanding Polling Methodology,” say:

Public opinion cannot be understood by using only a single question asked at a single moment. It is necessary to measure public opinion along several different dimensions, to review results based on a variety of different wordings, and to verify findings on the basis of repetition. Any one result is filled with potential error and represents one possible estimation of the state of public opinion.

This makes sense. But it’s amazing how often we forget.

We see a headline screaming out about the state of affairs and we dive right in, instant believers, without pausing to question the validity of the methodology. How many people did they sample? How did they select them? Most polling aims for random sampling, but there is pre-selection at work immediately, depending on the medium the pollsters use to reach people.

Truly random samples of people are hard to come by. In order to poll people, you have to be able to reach them. The more complicated this is, the more expensive the poll becomes, which acts as a deterrent to thoroughness. The internet can offer high accessibility for a relatively low cost, but it’s a lot harder to verify the integrity of the demographics. And if you go the telephone route, as a lot of polling does, are you already distorting the true randomness of your sample? Are the people who answer “unknown” numbers already different from those who ignore them?

Polls are meant to reveal larger patterns of behavior from small samples. You need to put a lot of effort in to make sure that sample is truly representative of the population you are trying to generalize about. Otherwise, erroneous information is presented as truth.
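
To see how much the raw numbers matter, here is a minimal sketch (standard survey arithmetic, not any particular pollster’s method) of the 95 percent margin of error for a simple random sample:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for an estimated proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# The worst case is p = 0.5, where sampling variance is largest.
for n in (100, 400, 1600, 6400):
    print(f"n = {n:4d}: +/- {margin_of_error(0.5, n) * 100:.1f} points")
```

Quadrupling the sample only halves the error. And note what the formula cannot fix: it assumes a truly random sample, so a biased selection process stays biased no matter how many people you reach.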

Why does this matter?

It matters because generalization is a widespread human bias, which means a lot of our understanding of the world actually is based on extrapolations made from relatively small sample sizes. Consequently, our individual behavior is shaped by potentially incomplete or inadequate facts that we use to make the decisions that are meant to lead us to success. This bias also shapes a fair degree of public policy and government legislation. We don’t want people who make decisions that affect millions to be dependent on captivating bullshit. (A further concern is that once you are invested, other biases kick in).

Some really smart people are perpetual victims of the problem.

Joseph Henrich, Steven J. Heine, and Ara Norenzayan wrote an article called “The weirdest people in the world?” It’s about how many scientific psychology studies use college students who are predominantly Western, Educated, Industrialized, Rich, and Democratic (WEIRD), and then draw conclusions about the entire human race from these outliers. They reviewed scientific literature from domains such as “visual perception, fairness, cooperation, spatial reasoning, categorization and inferential induction, moral reasoning, and the heritability of IQ. The findings suggest that members of WEIRD societies, including young children, are among the least representative populations one could find for generalizing about humans.”

Uh-oh. This is a double whammy. “It’s not merely that researchers frequently make generalizations from a narrow subpopulation. The concern is that this particular subpopulation is highly unrepresentative of the species.”

This is why it can be dangerous to make major life decisions based on small samples, like anecdotes or a one-off experience. The small sample may be an outlier in the greater range of possibilities. You could be correcting for a problem that doesn’t exist or investing in an opportunity that isn’t there.

This tendency of mistaken extrapolation from small samples can have profound consequences.

Are you a fan of the San Francisco 49ers? They exist, in part, because of our tendency to overgeneralize. In the 19th century in Western America and Canada, a few findings of gold along some creek beds led to a massive rush as entire populations flocked to these regions in the hope of getting rich. San Francisco grew from 200 residents in 1846 to about 36,000 only six years later. The gold rush provided enormous impetus toward California becoming a state, and the corresponding infrastructure developments touched off momentum that long outlasted the mining of gold.

But for most of the actual rushers, those hoping for gold based on the anecdotes that floated east, there wasn’t much to show for their decision to head west. The Canadian Encyclopedia states, “If the nearly $29 million (figure unadjusted) in gold that was recovered during the heady years of 1897 to 1899 [in the Klondike] was divided equally among all those who participated in the gold rush, the amount would fall far short of the total they had invested in time and money.”

How did this happen? Because those miners took anecdotes as being representative of a broader reality. Quite literally, they learned mining from rumor, and didn’t develop any real knowledge. Most people fought for claims along the creeks, where easy gold had been discovered, while rejecting the bench claims on the hillsides above, which often had just as much gold.

You may be thinking that these men must have been desperate if they packed themselves up, heading into unknown territory, facing multiple dangers along the way, to chase a dream of easy money. But most of us aren’t that different. How many times have you invested in a “hot stock” on a tip from one person, only to have the company go under within a year? Ultimately, the smaller the sample size, the greater the role chance plays in determining the outcome.

If you want to limit the capriciousness of chance in your quest for success, increase your sample size when making decisions. You need enough information to be able to plot the range of possibilities, identify the outliers, and define the average.
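
A quick simulation makes the advice concrete. This hypothetical sketch (the population numbers are invented) draws samples of different sizes from the same population and counts how often a small sample badly misrepresents it:

```python
import random
import statistics

random.seed(42)
TRUE_MEAN, TRUE_SD = 100, 20   # invented population parameters
TRIALS = 10_000

def sample_mean(n: int) -> float:
    return statistics.fmean(random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n))

for n in (3, 30, 300):
    # How often does the sample mean miss the true mean by more than 10?
    misses = sum(abs(sample_mean(n) - TRUE_MEAN) > 10 for _ in range(TRIALS))
    print(f"n = {n:3d}: off by more than 10 in {misses / TRIALS:.0%} of trials")
```

An anecdote is the n = 1 version of this table: whatever it shows, chance alone could have produced it.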

So next time you hear the words “the polls say,” “studies show,” or “you should buy this,” ask questions before you take action. Think about the population that is actually being represented before you start modifying your understanding. Accept the limits of small sample sizes from large populations. And don’t give power to anecdotes.

Promoting People In Organizations

In their 1978 paper “Performance Sampling in Social Matches,” researchers March and March discussed the implications of performance sampling for understanding careers in organizations, and came to some interesting conclusions for those of us who work in them.

Considerable evidence exists documenting that individuals confronted with problems requiring the estimation of proportions act as though sample size were substantially irrelevant to the reliability of their estimates. We do this in hiring all the time. Yet we know that sample size matters.

On how this cognitive bias affects hiring, March and March offer some good insights, including the false record effect, the hero effect, and the disappointment effect.

False Record Effect

A group of managers of identical (moderate) ability will show considerable variation in their performance records in the short run. Some will be found at one end of the distribution and will be viewed as outstanding; others will be at the other end and will be viewed as ineffective. The longer a manager stays in a job, the less the probable difference between the observed record of performance and actual ability. Time on the job increases the expected sample of observations, reduces expected sampling error, and thus reduces the chance that the manager (of moderate ability) will either be promoted or exit.
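
The false record effect is easy to reproduce. The sketch below (an illustrative model with invented numbers, not March and March’s data) gives every manager an identical 50 percent success rate and watches how far the observed records spread when they are short:

```python
import random
import statistics

random.seed(0)
ABILITY = 0.5       # every manager succeeds exactly half the time
MANAGERS = 1000

def observed_rate(observations: int) -> float:
    """Success rate actually recorded for one manager."""
    wins = sum(random.random() < ABILITY for _ in range(observations))
    return wins / observations

for observations in (4, 16, 64, 256):
    records = [observed_rate(observations) for _ in range(MANAGERS)]
    print(f"{observations:3d} observations: spread (sd) = {statistics.pstdev(records):.3f}, "
          f"best = {max(records):.2f}, worst = {min(records):.2f}")
```

With four observations, some of these identical managers look like stars and others like failures; by 256 observations the records have converged on the shared ability.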

Hero Effect

Within a group of managers of varying abilities, the faster the rate of promotion, the less likely it is to be justified. Performance records are produced by a combination of underlying ability and sampling variation. Managers who have good records are more likely to have high ability than managers who have poor records, but the reliability of the differentiation is small when records are short.

Disappointment Effect

On the average, new managers will be a disappointment. The performance records by which managers are evaluated are subject to sampling error. Since a manager is promoted to a new job on the basis of a good previous record, the proportion of promoted managers whose past records are better than their abilities will be greater than the proportion whose past records are poorer. As a result, on the average, managers will do less well in their new jobs than they did in their old ones, and observers will come to believe that higher level jobs are more difficult than lower level ones, even if they are not.

…The present results reinforce the idea that indistinguishability among managers is a joint property of the individuals being evaluated and the process by which they are evaluated. Performance sampling models show how careers may be the consequences of erroneous interpretations of variations in performance produced by equivalent managers. But they also indicate that the same pattern of careers could be the consequence of unreliable evaluation of managers who do, in fact, differ, or of managers who do, in fact, learn over the course of their experience.
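
The hero and disappointment effects fall out of the same arithmetic. In this hypothetical sketch (the abilities and record lengths are invented), the manager with the best short record gets promoted, and that past record is then compared with the manager’s true ability:

```python
import random
import statistics

random.seed(1)
TRIALS = 2000
RECORD = 8     # number of observations behind each promotion decision

def promote_one() -> tuple[float, float]:
    """Promote the best of ten managers; return (past record, true ability)."""
    abilities = [random.uniform(0.3, 0.7) for _ in range(10)]
    records = [sum(random.random() < a for _ in range(RECORD)) / RECORD
               for a in abilities]
    star = max(range(10), key=records.__getitem__)
    return records[star], abilities[star]

past, ability = zip(*(promote_one() for _ in range(TRIALS)))
print(f"promoted managers' average past record: {statistics.fmean(past):.2f}")
print(f"promoted managers' average true ability: {statistics.fmean(ability):.2f}")
```

The winning record overstates the winner’s ability, so on average the new job looks like a step down even though nothing about the manager changed: exactly the disappointment effect.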

But hold on a second before you stop promoting new managers (who, by definition, have a limited sample size).

I’m not sure that sample size alone is the right way to think about this.

Consider two people, Manager A and Manager B, who are both up for promotion. Manager A has 10 years of experience and is an “all-star” (that is, great performance with little variation in observations). Manager B, on the other hand, has only 5 years of experience but has shown a lot of variance in performance.

If you had to choose one, you’d likely pick A. But before misinterpreting the results of March and March, it’s worth digging a little deeper.

What if we add one more variable to our two managers?

Manager A’s job has been “easy” whereas Manager B took a very “tough” assignment.

With this in mind, it seems reasonable to conclude that Manager B’s variance in performance could be explained by the difficulty of their task. This could also explain the lack of variance in Manager A’s performance.

Some jobs are tougher than others.

If you don’t factor in degree of difficulty, you’re missing something big and sending your workforce a message that discourages people from taking on difficult assignments.

Measuring performance over a meaningful sample size is the key to distinguishing between luck and skill. When in doubt, go with the person who has excelled across a wider range of difficulty.
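
One crude way to factor in degree of difficulty (a hypothetical scoring sketch; the scores and baselines below are invented, not anything from March and March) is to judge each observation against the typical outcome for an assignment of that difficulty:

```python
import statistics

# Invented performance scores (0-100) tagged with assignment difficulty.
manager_a = [(82, "easy"), (84, "easy"), (83, "easy"), (85, "easy")]
manager_b = [(70, "tough"), (88, "tough"), (64, "tough"), (90, "tough")]

# Assumed baseline: what an average manager scores at each difficulty.
baseline = {"easy": 80, "tough": 60}

def adjusted(record) -> float:
    """Average score above the baseline for each assignment's difficulty."""
    return statistics.fmean(score - baseline[difficulty] for score, difficulty in record)

for name, record in (("Manager A", manager_a), ("Manager B", manager_b)):
    raw = statistics.fmean(score for score, _ in record)
    print(f"{name}: raw average = {raw:.0f}, difficulty-adjusted = {adjusted(record):+.0f}")
```

On raw averages A wins; against a difficulty baseline B does. The numbers are made up, but the structure of the comparison is the point: variance that comes from tough assignments is not the same as variance that comes from weak performance.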

Predicting the Improbable

One natural human bias is that we tend to draw strong conclusions based on few observations. This bias, misconceptions of chance, shows itself in many ways, including the gambler’s fallacy and the hot hand fallacy. Such biases may induce public opinion and the media to call for dramatic swings in policies or regulation in response to highly improbable events. These biases are made even worse by our natural tendency to “do something.”

***

An event like an earthquake happens, making it more available in our minds.

We think the event is more probable than the evidence supports, so we run out and buy earthquake insurance. Over many years, as the earthquake fades from our minds (making it less available), we paradoxically come to believe that the risk is lower (based on recent evidence), so we cancel our policy. …

Some events are hard to predict. This becomes even more complicated when you consider not only predicting the event but the timing of the event as well. The article below points out that experts base their predictions on inference from observing the past and are just as prone to these biases as the rest of us.

Why do people over-infer from recent events?

There are two plausible but apparently contradictory intuitions about how people over-infer from observing recent events.

The gambler’s fallacy claims that people expect rapid reversion to the mean.

For example, upon observing three outcomes of red in roulette, gamblers tend to think that black is now due and tend to bet more on black (Croson and Sundali 2005).

The hot hand fallacy claims that upon observing an unusual streak of events, people tend to predict that the streak will continue. (See Misconceptions of Chance)

The hot hand fallacy term originates from basketball, where players who have scored several times in a row are believed to have a “hot hand,” i.e., to be more likely to score on their next attempt.

Recent behavioural theory has proposed a foundation to reconcile the apparent contradiction between the two types of over-inference. The intuition behind the theory can be explained with reference to the example of roulette play.

A person believing in the law of small numbers thinks that small samples should look like the parent distribution, i.e. that the sample should be representative of the parent distribution. Thus, the person believes that out of, say, 6 spins, 3 should be red and 3 should be black (ignoring green). If observed outcomes in the small sample differ from the 50:50 ratio, immediate reversal is expected. Thus, somebody who has observed red on the first 2 of 6 consecutive spins believes that black is “due” on the 3rd spin to restore the 50:50 ratio.

Now suppose such a person is uncertain about the fairness of the roulette wheel. Upon observing an improbable event (say, 6 reds in 6 spins), the person starts to doubt the fairness of the wheel, because a long streak does not correspond to what he believes a random sequence should look like. He then revises his model of the data-generating process and starts to believe that the event on a streak is more likely. The upshot of the theory is that the same person may at first (when the streak is short) believe in reversion of the trend (the gambler’s fallacy) and later (when the streak is long) believe in continuation of the trend (the hot hand fallacy).
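
A stylized sketch of that reconciliation (the prior, the bias parameters, and the “rigged wheel” probability below are all invented for illustration, not the paper’s actual model):

```python
P_BIASED_PRIOR = 0.05    # prior belief that the wheel is rigged toward red
P_RED_IF_BIASED = 0.80   # what a rigged wheel is assumed to produce

def believed_p_red_if_fair(streak: int) -> float:
    # Law-of-small-numbers belief: after a streak of reds, a "fair" wheel
    # is thought to owe some black, so red feels less likely than 0.5.
    return max(0.5 - 0.05 * streak, 0.2)

def predicted_p_red(streak: int) -> float:
    """Believer's probability of red on the next spin after `streak` reds."""
    like_biased = P_RED_IF_BIASED ** streak
    like_fair = 1.0
    for k in range(streak):
        like_fair *= believed_p_red_if_fair(k)
    posterior_biased = (P_BIASED_PRIOR * like_biased) / (
        P_BIASED_PRIOR * like_biased + (1 - P_BIASED_PRIOR) * like_fair)
    # The prediction mixes the two hypotheses by their posterior weights.
    return (posterior_biased * P_RED_IF_BIASED
            + (1 - posterior_biased) * believed_p_red_if_fair(streak))

for streak in (1, 2, 3, 6, 10):
    print(f"after {streak:2d} reds: P(red next) = {predicted_p_red(streak):.2f}")
```

Short streaks leave the prediction below 0.5 (black feels “due”), while a long streak tips the posterior toward “the wheel is rigged” and pushes the prediction above 0.5: the same person exhibits first the gambler’s fallacy, then the hot hand fallacy.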


Choice Under Uncertainty

People use general heuristics—rules of thumb—in making judgments, and these heuristics produce biases: toward classifying situations according to their representativeness, toward judging frequencies according to the availability of examples in memory, or toward interpretations warped by the way in which a problem has been framed. These heuristics have important implications for individuals and society.

Insensitivity to Base Rates
When people are given information about the probabilities of certain events (e.g., how many lawyers and how many engineers are in a population that is being sampled), and then are given some additional information as to which of the events has occurred (which person has been sampled from the population), they tend to ignore the prior probabilities in favor of incomplete or even quite irrelevant information about the individual event. Thus, if they are told that 70 percent of the population are lawyers, and if they are then given a noncommittal description of a person (one that could equally well fit a lawyer or an engineer), half the time they will predict that the person is a lawyer and half the time that he is an engineer–even though the laws of probability dictate that the best forecast is always to predict that the person is a lawyer.
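
The arithmetic behind that point is short. A minimal check (the 70:30 split comes from the passage; the function is just Bayes’ rule):

```python
def p_lawyer(prior_lawyer: float, likelihood_ratio: float) -> float:
    """Posterior P(lawyer | description), where likelihood_ratio is how much
    more likely the description is for a lawyer than for an engineer."""
    prior_engineer = 1 - prior_lawyer
    return (prior_lawyer * likelihood_ratio) / (
        prior_lawyer * likelihood_ratio + prior_engineer)

# A noncommittal description fits both professions equally well (ratio = 1),
# so the posterior simply equals the 70 percent base rate.
print(p_lawyer(0.70, 1.0))   # 0.7 -> "lawyer" is always the better forecast
```

With an uninformative description, the base rate is all the information there is.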

Insensitivity to Sample Size
People commonly misjudge probabilities in many other ways. Asked to estimate the probability that 60 percent or more of the babies born in a hospital during a given week are male, they ignore information about the total number of births, although it is evident that the probability of a departure of this magnitude from the expected value of 50 percent is smaller if the total number of births is larger (the standard error of a percentage varies inversely with the square root of the population size).
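
This one can be checked directly. A minimal calculation (assuming each birth is an independent 50/50 event, as the problem intends; the weekly birth counts are illustrative):

```python
from math import comb

def p_at_least_60pct_boys(births: int) -> float:
    """Probability that 60% or more of `births` are boys if each is 50/50."""
    threshold = -(-births * 3 // 5)      # ceil(0.6 * births)
    return sum(comb(births, k) for k in range(threshold, births + 1)) / 2 ** births

for births in (15, 45):   # a small vs. a large hospital's weekly intake
    print(f"{births} births: P(>=60% boys) = {p_at_least_60pct_boys(births):.2f}")
```

The smaller hospital sees such lopsided weeks more than twice as often, purely because its weekly sample is smaller.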

Availability
There are situations in which people assess the frequency of a class by the ease with which instances can be brought to mind. In one experiment, subjects heard a list of names of persons of both sexes and were later asked to judge whether there were more names of men or women on the list. In lists presented to some subjects, the men were more famous than the women; in other lists, the women were more famous than the men. For all lists, subjects judged that the sex that had the more famous personalities was the more numerous.

Framing and Loss Aversion
The way in which an uncertain possibility is presented may have a substantial effect on how people respond to it. When asked whether they would choose surgery in a hypothetical medical emergency, many more people said that they would when the chance of survival was given as 80 percent than when the chance of death was given as 20 percent.

Source: Decision Making and Problem Solving, Herbert A. Simon