Category: Mental Models

Poker, Speeding Tickets, and Expected Value: Making Decisions in an Uncertain World

“Take the probability of loss times the amount of possible loss from the probability of gain times the amount of possible gain. That is what we’re trying to do. It’s imperfect but that’s what it’s all about.”

— Warren Buffett

You can train your brain to think like CEOs, professional poker players, investors, and others who make tricky decisions in an uncertain world by weighing probabilities.

All decisions involve potential tradeoffs and opportunity costs. The question is, how can we make the best possible choices when the factors involved are often so complicated and confusing? How can we determine which statistics and metrics are worth paying attention to? How do we think about averages?

Expected value is one of the simplest tools you can use to think better. While not a natural way of thinking for most people, it instantly turns the world into shades of grey by forcing us to weigh probabilities and outcomes. Once we’ve mastered it, our decisions become supercharged. We know which risks to take, when to quit projects, when to go all in, and more.

Expected value refers to the long-run average of a random variable.

If you flip a fair coin ten times, the heads-to-tails ratio will probably not be exactly equal. If you flip it one hundred times, the ratio will be closer to 50:50, though still not exact. But over a very large number of flips, you can expect heads to come up half the time and tails the other half. The law of large numbers dictates that the running average will, in the long run, converge on the expected value, even if the first few flips seem lopsided.

The more coin flips, the closer you get to the 50:50 ratio. If you bet a sum of money on a coin flip, the potential winnings on a fair coin have to be bigger than your potential loss to make the expected value positive.
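A few lines of code make the law of large numbers concrete. This is a minimal simulation sketch; the sample sizes and seed are arbitrary:

```python
import random

random.seed(42)  # fixed seed so runs are reproducible

def heads_ratio(flips: int) -> float:
    """Flip a fair coin `flips` times and return the fraction of heads."""
    heads = sum(random.random() < 0.5 for _ in range(flips))
    return heads / flips

# The ratio drifts toward 0.5 as the number of flips grows.
for n in (10, 100, 10_000, 1_000_000):
    print(n, round(heads_ratio(n), 4))
```

With ten flips the ratio bounces around; with a million it sits within a fraction of a percent of 0.5.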

We make many expected-value calculations without even realizing it. If we decide to stay up late and have a few drinks on a Tuesday, we regard the expected value of an enjoyable evening as higher than the expected costs the following day. If we decide to always leave early for appointments, we weigh the expected value of being on time against the frequent instances when we arrive early. When we take on work, we view the expected value in terms of income and other career benefits as higher than the cost in terms of time and/or sanity.

Likewise, anyone who reads a lot knows that most books they choose will have minimal impact on them, while a few books will change their lives and be of tremendous value. Looking at the required time and money as an investment, books have a positive expected value (provided we choose them with care and make use of the lessons they teach).

These decisions might seem obvious. But the math behind them would be somewhat complicated if we tried to sit down and calculate it. Who pulls out a calculator before deciding whether to open a bottle of wine (certainly not me) or walk into a bookstore?

The factors involved are impossible to quantify in a non-subjective manner – like trying to explain how to catch a baseball. We just have a feel for them. This expected-value analysis is unconscious – something to consider if you have ever labeled yourself as “bad at math.”

Parking Tickets

Another example of expected value is parking tickets. Let’s say that a parking spot costs \$5 and the fine for not paying is \$10. If you expect to be caught only one-third of the time, why pay for parking? The expected cost of skipping payment is about \$3.33 per visit, so the fine is a poor disincentive: you can park without paying three times and expect only \$10 in fines, instead of paying \$15 for three parking spots. But if the fine is \$100, the probability of getting caught would have to be lower than one in twenty for skipping payment to be worthwhile. This is why fines tend to seem excessive: they have to cover the people who are never caught while still giving everyone an incentive to pay.
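The arithmetic behind the disincentive is simple enough to sketch in code (the dollar figures are the ones from the parking example; the function name is mine):

```python
def expected_cost_of_skipping(fine: float, p_caught: float) -> float:
    """Long-run average cost per parking session if you never pay the meter."""
    return fine * p_caught

meter_fee = 5.00

# $10 fine, caught one-third of the time: about $3.33 per session on average,
# cheaper than the $5 meter -- the fine fails as a deterrent.
print(expected_cost_of_skipping(10, 1/3) < meter_fee)   # True

# $100 fine at the same catch rate: about $33.33 per session on average,
# so paying the meter is now clearly the better deal.
print(expected_cost_of_skipping(100, 1/3) < meter_fee)  # False
```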

Consider speeding tickets. Here, the expected value can be more abstract, encompassing different factors. If speeding on the way to work saves 15 minutes, then a monthly \$100 fine might seem worthwhile to some people. For most of us, though, a weekly fine would mean that speeding has a negative expected value. Add in other disincentives (such as the loss of your driver’s license), and speeding is not worth it. So the calculation is not just financial; it takes into account other tradeoffs as well.

The same goes for free samples and trial periods on subscription services. Many companies (such as Graze, Blue Apron, and Amazon Prime) offer generous free trials. How can they afford to do this? Again, it comes down to expected value. The companies know how much the free trials cost them. They also know the probability of someone’s paying afterwards and the lifetime value of a customer. Basic math reveals why free trials are profitable. Say that a free trial costs the company \$10 per person, and one in ten people then sign up for the paid service, going on to generate \$150 in profits. The expected value is positive. If only one in twenty people sign up, the company needs to find a cheaper free trial or scrap it.
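The free-trial math in the paragraph above can be sketched the same way (the numbers come from the example; the function name is mine):

```python
def trial_expected_value(trial_cost: float, conversion_rate: float,
                         profit_per_customer: float) -> float:
    """Expected profit per free trial offered."""
    return conversion_rate * profit_per_customer - trial_cost

# One in ten converts: 0.1 * $150 - $10 = +$5 per trial. Worth running.
print(trial_expected_value(10, 1/10, 150))

# One in twenty converts: 0.05 * $150 - $10 = -$2.50 per trial. Scrap it.
print(trial_expected_value(10, 1/20, 150))
```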

Similarly, expected value applies to services that offer a free “lite” version (such as Buffer and Spotify). Doing so costs them a small amount or even nothing. Yet it increases the chance of someone’s deciding to pay for the premium version. For the expected value to be positive, the combined cost of the people who never upgrade needs to be lower than the profit from the people who do pay.

Lottery tickets prove useless when viewed through the lens of expected value. If a ticket costs \$1 and there is a possibility of winning \$500,000, it might seem as if the expected value of the ticket is positive. But it is almost always negative. If one million people purchase a ticket, the expected value is \$0.50: half the ticket price. That difference is the profit the lottery makes. Only on sporadic occasions does the expected value turn positive, and even then the probability of winning remains minuscule.
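The lottery arithmetic reduces to a one-line function (a sketch assuming a single prize and exactly one winning ticket, as in the example above):

```python
def ticket_expected_value(prize: float, tickets_sold: int,
                          ticket_price: float) -> float:
    """Expected net value of one ticket when exactly one ticket wins the prize."""
    return prize / tickets_sold - ticket_price

# $500,000 prize, one million $1 tickets: each ticket is worth $0.50
# in expectation but costs $1.00 -- a long-run loss of $0.50 per ticket.
print(ticket_expected_value(500_000, 1_000_000, 1))  # -0.5
```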

Failing to understand expected value is a common logical fallacy. Getting a grasp of it can help us to overcome many limitations and cognitive biases.

“Constantly thinking in expected value terms requires discipline and is somewhat unnatural. But the leading thinkers and practitioners from somewhat varied fields have converged on the same formula: focus not on the frequency of correctness, but on the magnitude of correctness.”

— Michael Mauboussin

Expected Value and Poker

Let’s look at poker. How do professional poker players manage to win large sums of money and hold impressive track records? Well, we can be certain that the answer isn’t all luck, although there is some of that involved.

Professional players rely on mathematical mental models that create order among random variables. Although these models are basic, it takes extensive experience to create the fingerspitzengefühl (“fingertips feeling,” or instinct) necessary to use them.

A player needs to make correct calculations every minute of a game with an automaton-like mindset. Emotions and distractions can corrupt the accuracy of the raw math.

In a game of poker, the expected value is the average return on each dollar invested in the pot. Each time a player bets or calls, they are weighing the probability of winning more money than they put in. If a player risks \$100 with a 1 in 5 probability of success, the pot must contain at least \$500 for the call to break even: one time in five, the winnings at least cover the \$100 the player stands to lose. If the pot contains only \$300 at those odds, the expected value is negative. The idea is that even if this tactic fails at times, in the long run the player will profit.
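The pot-odds comparison above fits in a few lines (a sketch using the same simplified all-or-nothing model as the example, in which a winning call returns the whole pot):

```python
def expected_return(pot: float, win_probability: float) -> float:
    """Average amount a call wins back, under a simplified all-or-nothing model."""
    return win_probability * pot

call_amount = 100

# A $500 pot at 1-in-5 odds returns $100 per call on average:
# exactly what the player risks, so the call breaks even.
print(expected_return(500, 1/5) >= call_amount)  # True

# A $300 pot at the same odds returns only $60 on average: a losing call.
print(expected_return(300, 1/5) >= call_amount)  # False
```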

Expected-value analysis gives players a clear idea of probabilistic payoffs. Successful poker players can win millions one week, then make nothing or lose money the next, depending on the probability of winning. Even the best possible hands can lose due to simple probability. With each move, players also need to use Bayesian updating to adapt their calculations, because sticking with a stale prior could prove disastrous. Casinos make their fortunes from people who bet on situations with a negative expected value.

Expected Value and the Ludic Fallacy

In The Black Swan, Nassim Taleb explains the difference between everyday randomness and randomness in the context of a game or casino. Taleb coined the term “ludic fallacy” to refer to “the misuse of games to model real-life situations.” (Or, as the website logicallyfallacious.com puts it: the assumption that flawless statistical models apply to situations where they don’t actually apply.)

In Taleb’s words, gambling is “sterilized and domesticated uncertainty. In the casino, you know the rules, you can calculate the odds… ‘The casino is the only human venture I know where the probabilities are known, Gaussian (i.e., bell-curve), and almost computable.’ You cannot expect the casino to pay out a million times your bet, or to change the rules abruptly during the game….”

Games like poker have a defined, calculable expected value. That’s because we know the outcomes, the cards, and the math. Most decisions are more complicated. If you decide to bet \$100 that it will rain tomorrow, the expected value of the wager is incalculable. The factors involved are too numerous and complex to compute. Relevant factors do exist; you are more likely to win the bet if you live in England than if you live in the Sahara, for example. But that doesn’t rule out Black Swan events, nor does it give you the neat probabilities which exist in games. In short, there is a key distinction between Knightian risks, which are computable because we have enough information to calculate the odds, and Knightian uncertainty, which is non-computable because we don’t have enough information to calculate odds accurately. (This distinction between risk and uncertainty is based on the writings of economist Frank Knight.) Poker falls into the former category. Real life is in the latter. If we take the concept literally and only plan for the expected, we will run into some serious problems.

As Taleb writes in Fooled By Randomness:

Probability is not a mere computation of odds on the dice or more complicated variants; it is the acceptance of the lack of certainty in our knowledge and the development of methods for dealing with our ignorance. Outside of textbooks and casinos, probability almost never presents itself as a mathematical problem or a brain teaser. Mother nature does not tell you how many holes there are on the roulette table, nor does she deliver problems in a textbook way (in the real world one has to guess the problem more than the solution).

The Monte Carlo Fallacy

Even in the domesticated environment of a casino, probabilistic thinking can go awry if the principle of expected value is forgotten. This famously occurred at the Monte Carlo Casino in 1913. A group of gamblers lost millions when the roulette wheel landed on black 26 times in a row. Treating red and black as a coin flip, that sequence is no more or less likely than any of the other 67,108,863 possible 26-spin sequences, but the people present kept thinking, “It has to be red next time.” They saw the likelihood of the wheel landing on red as higher each time it landed on black. In hindsight, what sense does that make? A roulette wheel does not remember the color it landed on last time. The likelihood of red is the same on every spin, regardless of the previous iteration; in fact, because of the green zero, it is slightly under 50%, which is why an even-money payout carries a negative expected value.

“A lot of people start out with a 400-horsepower motor but only get 100 horsepower of output. It’s way better to have a 200-horsepower motor and get it all into output.”

— Warren Buffett

Given all the casinos and roulette tables in the world, the Monte Carlo incident had to happen at some point. Perhaps some day a roulette wheel will land on red 26 times in a row and the incident will repeat. The gamblers involved did not consider the negative expected value of each bet they made. We know this mistake as the Monte Carlo fallacy (or the “gambler’s fallacy” or “the fallacy of the maturity of chances”) – the assumption that prior independent outcomes influence future outcomes that are actually also independent. In other words, people assume that “a random process becomes less random and more predictable as it is repeated”1.
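A short simulation makes the gamblers’ error visible: the proportion of red immediately after a run of blacks matches the overall proportion. This is a sketch using a simplified 50/50 wheel with no green zero:

```python
import random

random.seed(0)  # fixed seed for reproducibility

def spin() -> str:
    """One simplified spin: red or black with equal probability (no green zero)."""
    return "red" if random.random() < 0.5 else "black"

spins = [spin() for _ in range(200_000)]

# Collect the outcomes that immediately follow three blacks in a row.
after_black_run = [
    spins[i] for i in range(3, len(spins))
    if spins[i - 3] == spins[i - 2] == spins[i - 1] == "black"
]

# Both proportions hover around one half: the streak changes nothing.
print(sum(s == "red" for s in after_black_run) / len(after_black_run))  # ≈ 0.5
print(sum(s == "red" for s in spins) / len(spins))                      # ≈ 0.5
```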

It’s a common error. People who play the lottery for years without success think that their chance of winning rises with each ticket, but the expected value is unchanged between iterations. Amos Tversky and Daniel Kahneman consider this kind of thinking a component of the representativeness heuristic, stating that the more we believe we control random events, the more likely we are to succumb to the Monte Carlo fallacy.

Magnitude over Frequency

Steven Crist, in his book Bet with the Best, offers an example of how an expected-value mindset can be applied. Consider a hypothetical race with four horses. If you’re trying to maximize return on investment, you shouldn’t automatically back the horse with the highest likelihood of winning; it all depends on the odds. Crist writes,

“The point of this exercise is to illustrate that even a horse with a very high likelihood of winning can be either a very good or a very bad bet, and that the difference between the two is determined by only one thing: the odds.”2

Everything comes down to payoffs. A horse with a 50% chance of winning might be a good bet, but it depends on the payoff. The same holds for a 100-to-1 longshot. It’s not the frequency of winning but the magnitude of the win that matters.
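Crist’s point can be sketched numerically (the probabilities and payouts below are hypothetical, chosen only to illustrate the idea):

```python
def bet_expected_value(win_probability: float, return_per_dollar: float) -> float:
    """Expected net value of a $1 bet: collect the full return if the horse
    wins, lose the stake otherwise."""
    return win_probability * return_per_dollar - 1

# The same 50% favorite can be a good or a bad bet -- only the odds differ:
print(bet_expected_value(0.5, 2.5) > 0)   # True: paid $2.50 per $1, positive EV
print(bet_expected_value(0.5, 1.8) > 0)   # False: paid $1.80 per $1, negative EV

# A longshot that wins just 2% of the time but returns $101 per $1:
print(bet_expected_value(0.02, 101) > 0)  # True: magnitude beats frequency
```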

Error Rates, Averages, and Variability

When Bill Gates walks into a room with 20 people, the average wealth per person in the room quickly goes beyond a billion dollars. It doesn’t matter if the 20 people are wealthy or not; Gates’s wealth is off the charts and distorts the results.

An old joke tells of the man who drowns in a river which is, on average, three feet deep. If you’re deciding to cross a river and can’t swim, the range of depths matters a heck of a lot more than the average depth.
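Both stories reduce to the same caution: the mean hides the spread. A small sketch with hypothetical numbers makes it concrete:

```python
from statistics import mean, median

# Twenty people of modest wealth, plus one extreme outlier
# (the outlier's figure is hypothetical).
room = [50_000] * 20 + [90_000_000_000]
print(f"mean:   {mean(room):,.0f}")    # billions: distorted by the outlier
print(f"median: {median(room):,.0f}")  # still 50,000

# A river that is "three feet deep on average" (hypothetical cross-section):
depths = [1, 1, 2, 2, 3, 12, 3, 2, 2, 2]
print(mean(depths), max(depths))  # averages 3 feet, but one 12-foot hole
```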

The Use of Expected Value: How to Make Decisions in an Uncertain World

Thinking in terms of expected value requires discipline and practice. And yet, the top performers in almost any field think in terms of probabilities. While this isn’t natural for most of us, once you implement the discipline of the process, you’ll see the quality of your thinking and decisions improve.

In poker, players can predict the likelihood of a particular outcome. In the vast majority of cases, we cannot predict the future with anything approaching accuracy. So what use is the expected value outside gambling? It turns out, quite a lot. Recognizing how expected value works puts any of us at an advantage. We can mentally leap through various scenarios and understand how they affect outcomes.

Expected value takes into account wild deviations. Averages are useful, but they have limits, as the man who tried to cross the river discovered. When making predictions about the future, we need to consider the range of outcomes. The greater the possible variance from the average, the more our decisions should account for a wider range of outcomes.

There’s a saying in the design world: when you design for the average, you design for no one. Large deviations can mean more risk, which is not always a bad thing. So expected-value calculations take the deviations into account. If we can make decisions with a positive expected value and the lowest possible risk, we open ourselves to large benefits.

Investors use expected value to make decisions. Choices with a positive expected value and minimal risk of losing money are wise. Even if some losses occur, the net gain should be positive over time. In investing, unlike in poker, the potential losses and gains cannot be calculated in exact terms. Expected-value analysis reveals opportunities that people who just use probabilistic thinking often miss. A trade with a low probability of success can still carry a high expected value. That’s why it is crucial to have a large number of robust mental models. As useful as probabilistic thinking can be, it has far more utility when combined with expected value.

Understanding expected value is also an effective way to overcome the sunk costs fallacy. Many of our decisions are based on non-recoverable past investments of time, money, or resources. These investments are irrelevant; we can’t recover them, so we shouldn’t factor them into new decisions. Sunk costs push us toward situations with a negative expected value. For example, consider a company that has invested considerable time and money in the development of a new product. As the launch date nears, they receive irrefutable evidence that the product will be a failure. Perhaps research shows that customers are uninterested, or a competitor launches a similar, better product. The sunk costs fallacy would lead them to release their product anyway. Even if they take a loss. Even if it damages their reputation. After all, why waste the money they spent developing the product? Here’s why: because the product has a negative expected value, which will only worsen their losses. An escalation of commitment will only increase the sunk costs.

When we try to justify a prior expense, calculating the expected value can prevent us from worsening the situation. The sunk costs fallacy robs us of our most precious resource: time. Each day we are faced with the choice between continuing and quitting numerous endeavors. Expected-value analysis reveals where we should continue, and where we should cut our losses and move on to a better use of time and resources. It’s an efficient way to work smarter, and not engage in unnecessary projects.

Thinking in terms of expected value will make you feel awkward when you first try it. That’s the hardest thing about it; you need to practice it a while before it becomes second nature. Once you get the hang of it, you’ll see that it’s valuable in almost every decision. That’s why the most rational people in the world constantly think about expected value. They’ve uncovered the key insight that the magnitude of correctness matters more than its frequency. And yet, human nature is such that we’re happier when we’re frequently right.

Footnotes
• 1

From https://rationalwiki.org/wiki/Gambler's_fallacy, accessed on 11 January 2018.

• 2

Steven Crist, “Crist on Value,” in Andrew Beyer et al., Bet with the Best: All New Strategies From America’s Leading Handicappers (New York: Daily Racing Form Press, 2001), 63-64.

Complexity Bias: Why We Prefer Complicated to Simple

Complexity bias is a logical fallacy that leads us to give undue credence to complex concepts.

Faced with two competing hypotheses, we are likely to choose the most complex one. That’s usually the option with the most assumptions and regressions. As a result, when we need to solve a problem, we may ignore simple solutions — thinking “that will never work” — and instead favor complex ones.

To understand complexity bias, we need first to establish the meaning of three key terms associated with it: complexity, simplicity, and chaos.

Complexity, like pornography, is hard to define when we’re put on the spot, although most of us recognize it when we see it. The Cambridge Dictionary defines complexity as “the state of having many parts and being difficult to understand or find an answer to.” The definition of simplicity is the inverse: “something [that] is easy to understand or do.” Chaos is defined as “a state of total confusion with no order.”

“Life is really simple, but we insist on making it complicated.”

— Confucius

Complex systems contain individual parts that combine to form a collective that often can’t be predicted from its components. Consider humans. We are complex systems. We’re made of about 100 trillion cells and yet we are so much more than the aggregation of our cells. You’d never predict what we’re like or who we are from looking at our cells.

Complexity bias is our tendency to look at something that is easy to understand, or something we encounter in a state of confusion, and view it as having many parts that are difficult to understand.

We often find it easier to face a complex problem than a simple one.

A person who feels tired all the time might insist that their doctor check their iron levels while ignoring the fact that they are unambiguously sleep deprived. Someone experiencing financial difficulties may stress over the technicalities of their telephone bill while ignoring the large sums of money they spend on cocktails.

Marketers make frequent use of complexity bias.

They do this by incorporating confusing language or insignificant details into product packaging or sales copy. Most people who buy “ammonia-free” hair dye, or a face cream which “contains peptides,” don’t fully understand the claims. Terms like these often mean very little, but we see them and imagine that they signify a product that’s superior to alternatives.

How many of you know what probiotics really are and how they interact with gut flora?

Meanwhile, we may also see complexity where only chaos exists. This tendency manifests in many forms, such as conspiracy theories, superstition, folklore, and logical fallacies. The distinction between complexity and chaos is not a semantic one. When we imagine that something chaotic is in fact complex, we are seeing it as having an order and more predictability than is warranted. In fact, there is no real order, and prediction is incredibly difficult at best.

Complexity bias is interesting because the majority of cognitive biases occur in order to save mental energy. For example, confirmation bias enables us to avoid the effort associated with updating our beliefs. We stick to our existing opinions and ignore information that contradicts them. Availability bias is a means of avoiding the effort of considering everything we know about a topic. It may seem like the opposite is true, but complexity bias is, in fact, another cognitive shortcut. By opting for impenetrable solutions, we sidestep the need to understand. Of the fight-or-flight responses, complexity bias is the flight response. It is a means of turning away from a problem or concept and labeling it as too confusing. If you think something is harder than it is, you surrender your responsibility to understand it.

“Most geniuses—especially those who lead others—prosper not by deconstructing intricate complexities but by exploiting unrecognized simplicities.”

— Andy Benoit

Faced with too much information on a particular topic or task, we see it as more complex than it is. Often, understanding the fundamentals will get us most of the way there. Software developers often find that 90% of the code for a project takes about half the allocated time. The remaining 10% takes the other half. Writing — and any other sort of creative work — is much the same. When we succumb to complexity bias, we are focusing too hard on the tricky 10% and ignoring the easy 90%.

Research has revealed our inherent bias towards complexity.

In a 1989 paper entitled “Sensible reasoning in two tasks: Rule discovery and hypothesis evaluation,” Hilary F. Farris and Russell Revlin evaluated the topic. In one study, participants were asked to establish an arithmetic rule. They received a set of three numbers (such as 2, 4, 6) and tried to generate a hypothesis by asking the experimenter if other number sequences conformed to the rule. Farris and Revlin wrote, “This task is analogous to one faced by scientists, with the seed triple functioning as an initiating observation, and the act of generating the triple is equivalent to performing an experiment.”

The actual rule was simple: list any three ascending numbers.

The participants could have said anything from “1, 2, 3” to “3, 7, 99” and been correct. It should have been easy for the participants to guess this, but most of them didn’t. Instead, they came up with complex rules for the sequences. (Also see Falsification of Your Best Loved Ideas.)
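The gap between the simple rule and the participants’ elaborate hypotheses is easy to see in code (a sketch; the rule is the ascending-numbers rule described above):

```python
def conforms(triple) -> bool:
    """The experimenters' actual rule: any three ascending numbers."""
    a, b, c = triple
    return a < b < c

# The seed triple and wildly different sequences all satisfy the rule...
print(conforms((2, 4, 6)), conforms((1, 2, 3)), conforms((3, 7, 99)))  # True True True

# ...yet participants guessed complex rules like "even numbers rising by two,"
# which a descending triple would have quickly refuted.
print(conforms((6, 4, 2)))  # False
```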

A paper by Helena Matute looked at how intermittent reinforcement leads people to see complexity in chaos. Three groups of participants were placed in rooms and told that a loud noise would play from time to time. The volume, length, and pattern of the sound were identical for each group. Group 1 (Control) was told to sit and listen to the noises. Group 2 (Escape) was told that there was a specific action they could take to stop the noises. Group 3 (Yoked) was told the same as Group 2, but in their case, there was actually nothing they could do.

Matute wrote:

Yoked participants received the same pattern and duration of tones that had been produced by their counterparts in the Escape group. The amount of noise received by Yoked and Control subjects depends only on the ability of the Escape subjects to terminate the tones. The critical factor is that Yoked subjects do not have control over reinforcement (noise termination) whereas Escape subjects do, and Control subjects are presumably not affected by this variable.

The result? Not one member of the Yoked group realized that they had no control over the sounds. Many members came to repeat particular patterns of “superstitious” behavior. Indeed, the Yoked and Escape groups had very similar perceptions of task controllability. Faced with randomness, the participants saw complexity.

Does that mean the participants were stupid? Not at all. We all exhibit the same superstitious behavior when we believe we can influence chaotic or simple systems.

Funnily enough, animal studies have revealed much the same. In particular, consider B.F. Skinner’s well-known research on the effects of random rewards on pigeons. Skinner placed hungry pigeons in cages equipped with a random-food-delivery mechanism. Over time, the pigeons came to believe that their behavior affected the food delivery. Skinner described this as a form of superstition. One bird spun in counterclockwise circles. Another butted its head against a corner of the cage. Other birds swung or bobbed their heads in specific ways. Although there is some debate as to whether “superstition” is an appropriate term to apply to birds, Skinner’s research shed light on the human tendency to see things as being more complex than they actually are.

Skinner wrote (in “‘Superstition’ in the Pigeon,” Journal of Experimental Psychology, 38):

The bird behaves as if there were a causal relation between its behavior and the presentation of food, although such a relation is lacking. There are many analogies in human behavior. Rituals for changing one’s fortune at cards are good examples. A few accidental connections between a ritual and favorable consequences suffice to set up and maintain the behavior in spite of many unreinforced instances. The bowler who has released a ball down the alley but continues to behave as if he were controlling it by twisting and turning his arm and shoulder is another case in point. These behaviors have, of course, no real effect upon one’s luck or upon a ball half way down an alley, just as in the present case the food would appear as often if the pigeon did nothing—or, more strictly speaking, did something else.

The world around us is a chaotic, entropic place. But it is rare for us to see it that way.

In Living with Complexity, Donald A. Norman offers a perspective on why we need complexity:

We seek rich, satisfying lives, and richness goes along with complexity. Our favorite songs, stories, games, and books are rich, satisfying, and complex. We need complexity even while we crave simplicity… Some complexity is desirable. When things are too simple, they are also viewed as dull and uneventful. Psychologists have demonstrated that people prefer a middle level of complexity: too simple and we are bored, too complex and we are confused. Moreover, the ideal level of complexity is a moving target, because the more expert we become at any subject, the more complexity we prefer. This holds true whether the subject is music or art, detective stories or historical novels, hobbies or movies.

As an example, Norman asks readers to contemplate the complexity we attach to tea and coffee. Most people in most cultures drink tea or coffee each day. Both are simple beverages, made from water and coffee beans or tea leaves. Yet we choose to attach complex rituals to them. Even those of us who would not consider ourselves to be connoisseurs have preferences. Offer to make coffee for a room full of people, and we can be sure that each person will want it made in a different way.

Coffee and tea start off as simple beans or leaves, which must be dried or roasted, ground and infused with water to produce the end result. In principle, it should be easy to make a cup of coffee or tea. Simply let the ground beans or tea leaves [steep] in hot water for a while, then separate the grounds and tea leaves from the brew and drink. But to the coffee or tea connoisseur, the quest for the perfect taste is long-standing. What beans? What tea leaves? What temperature water and for how long? And what is the proper ratio of water to leaves or coffee?

The quest for the perfect coffee or tea maker has been around as long as the drinks themselves. Tea ceremonies are particularly complex, sometimes requiring years of study to master the intricacies. For both tea and coffee, there has been a continuing battle between those who seek convenience and those who seek perfection.

Complexity, in this way, can enhance our enjoyment of a cup of tea or coffee. It’s one thing to throw some instant coffee in hot water. It’s different to select the perfect beans, grind them ourselves, calculate how much water is required, and use a fancy device. The question of whether this ritual makes the coffee taste better or not is irrelevant. The point is the elaborate surrounding ritual. Once again, we see complexity as superior.

“Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better.”

— Edsger W. Dijkstra

The Problem with Complexity

Imagine a person who sits down one day and plans an elaborate morning routine. Motivated by the routines of famous writers they have read about, they lay out their ideal morning. They decide they will wake up at 5 a.m., meditate for 15 minutes, drink a liter of lemon water while writing in a journal, read 50 pages, and then prepare coffee before planning the rest of their day.

The next day, they launch into this complex routine. They try to keep at it for a while. Maybe they succeed at first, but entropy soon sets in and the routine gets derailed. Sometimes they wake up late and do not have time to read. Their perceived ideal routine has many different moving parts. Their actual behavior ends up being different each day, depending on random factors.

Now imagine that this person is actually a famous writer. A film crew asks to follow them around on a “typical day.” On the day of filming, they get up at 7 a.m., write some ideas, make coffee, cook eggs, read a few news articles, and so on. This is not really a routine; it is just a chaotic morning based on reactive behavior. When the film is posted online, people look at the morning and imagine they are seeing a well-planned routine rather than the randomness of life.

This hypothetical scenario illustrates the issue with complexity: it is unsustainable without effort.

The more individual constituent parts a system has, the greater the chance of its breaking down. Charlie Munger once said that “Where you have complexity, by nature you can have fraud and mistakes.” Any complex system — be it a morning routine, a business, or a military campaign — is difficult to manage. Addressing one of the constituent parts inevitably affects another (see the Butterfly Effect). Unintended and unexpected consequences are likely to occur.

As Daniel Kahneman and Amos Tversky wrote in 1974 (in Judgment Under Uncertainty: Heuristics and Biases): “A complex system, such as a nuclear reactor or the human body, will malfunction if any of its essential components fails. Even when the likelihood of failure in each component is slight, the probability of an overall failure can be high if many components are involved.”

This is why complexity is less common than we think. It is unsustainable without constant maintenance, self-organization, or adaptation. Chaos tends to disguise itself as complexity.

“Human beings are pattern-seeking animals. It’s part of our DNA. That’s why conspiracy theories and gods are so popular: we always look for the wider, bigger explanations for things.”

— Adrian McKinty, The Cold Cold Ground

Complexity Bias and Conspiracy Theories

A musician walks barefoot across a zebra-crossing on an album cover. People decide he died in a car crash and was replaced by a lookalike. A politician’s eyes look a bit odd in a blurry photograph. People conclude that he is a blood-sucking reptilian alien taking on a human form. A photograph shows an indistinct shape beneath the water of a Scottish lake. The area floods with tourists hoping to glimpse a surviving prehistoric creature. A new technology overwhelms people. So, they deduce that it is the product of a government mind-control program.

Conspiracy theories are the ultimate symptom of our desire to find complexity in the world. We don’t want to acknowledge that the world is entropic. Disasters happen and chaos is our natural state. The idea that hidden forces animate our lives is an appealing one. It seems rational. But as we know, we are all much less rational and logical than we think. Studies have shown that a high percentage of people believe in some sort of conspiracy. It’s not a fringe concept. According to research by Joseph E. Uscinski and Joseph M. Parent, about one-third of Americans believe the notion that Barack Obama’s birth certificate is fake. Similar numbers are convinced that 9/11 was an inside job orchestrated by George Bush. Beliefs such as these are present in all types of people, regardless of class, age, gender, race, socioeconomic status, occupation, or education level.

Conspiracy theories are invariably far more complex than reality. Although education does reduce the chances of someone’s believing in conspiracy theories, one in five Americans with postgraduate degrees still hold conspiratorial beliefs.

Uscinski and Parent found that, just as uncertainty led Skinner’s pigeons to see complexity where only randomness existed, a sense of losing control over the world around us increases the likelihood of our believing in conspiracy theories. Faced with natural disasters and political or economic instability, we are more likely to concoct elaborate explanations. In the face of horrific but chaotic events such as Hurricane Katrina, or the recent Grenfell Tower fire, many people decide that secret institutions are to blame.

Take the example of the “Paul McCartney is dead” conspiracy theory. Since the 1960s, a substantial number of people have believed that McCartney died in a car crash and was replaced by a lookalike, usually said to be a Scottish man named William Campbell. Of course, conspiracy theorists declare, The Beatles wanted their most loyal fans to know this, so they hid clues in songs and on album covers.

The beliefs surrounding the Abbey Road album are particularly illustrative of the desire to spot complexity in randomness and chaos. A police car is parked in the background — an homage to the officers who helped cover up the crash. A car’s license plate reads “LMW 28IF” — naturally, a reference to McCartney being 28 if he had lived (although he was 27) and to Linda McCartney (whom he had not met yet). Matters were further complicated once The Beatles heard about the theory and began to intentionally plant “clues” in their music. The song “I’m So Tired” does in fact feature backwards mumbling about McCartney’s supposed death. The 1960s were certainly a turbulent time, so is it any wonder that scores of people pored over album art or played records backwards, looking for evidence of a complex hidden conspiracy?

As Henry Louis Gates Jr. wrote, “Conspiracy theories are an irresistible labor-saving device in the face of complexity.”

Complexity Bias and Language

We have all, at some point, had a conversation with someone who speaks the way philosopher Theodor Adorno wrote: in incessant jargon and technical terms, even when simpler synonyms exist and would be perfectly appropriate. We have all heard people say things which we do not understand, but which we do not question for fear of sounding stupid.

Jargon is an example of how complexity bias affects our communication and language usage. When we use jargon, especially out of context, we are putting up unnecessary semantic barriers that reduce the chances of someone’s challenging or refuting us.

In an article for The Guardian, James Gingell describes his work translating scientific jargon into plain, understandable English:

It’s quite simple really. The first step is getting rid of the technical language. Whenever I start work on refining a rough-hewn chunk of raw science into something more pleasant I use David Dobbs’ (rather violent) aphorism as a guiding principle: “Hunt down jargon like a mercenary possessed, and kill it.” I eviscerate acronyms and euthanise decrepit Latin and Greek. I expunge the esoteric. I trim and clip and pare and hack and burn until only the barest, most easily understood elements remain.

[…]

Jargon…can be useful for people as a shortcut to communicating complex concepts. But it’s intrinsically limited: it only works when all parties involved know the code. That may be an obvious point but it’s worth emphasising — to communicate an idea to a broad, non-specialist audience, it doesn’t matter how good you are at embroidering your prose with evocative imagery and clever analogies, the jargon simply must go.

Gingell writes that even the most intelligent scientists struggle to differentiate between thinking (and speaking and writing) like a scientist, and thinking like a person with minimal scientific knowledge.

Unnecessarily complex language is not just annoying. It’s outright harmful. The use of jargon in areas such as politics and economics does real harm. People without the requisite knowledge to understand it feel alienated and removed from important conversations. It leads people to believe that they are not intelligent enough to understand politics, or not educated enough to comprehend economics. When a politician talks of fiscal charters or rolling four-quarter growth measurements in a public statement, they are sending a crystal clear message to large numbers of people whose lives will be shaped by their decisions: this is not about you.

Complexity bias is a serious issue in politics. For those in the public eye, complex language can be a means of minimizing the criticism of their actions. After all, it is hard to dispute something you don’t really understand. Gingell considers jargon to be a threat to democracy:

If we can’t fully comprehend the decisions that are made for us and about us by the government then how can we possibly revolt or react in an effective way? Yes, we have a responsibility to educate ourselves more on the big issues, but I also think it’s important that politicians and journalists meet us halfway.

[…]

Economics and economic decisions are more important than ever now, too. So we should implore our journalists and politicians to write and speak to us plainly. Our democracy depends on it.

In his essay “Politics and the English Language,” George Orwell wrote:

In our time, political speech and writing are largely the defence of the indefensible. … Thus, political language has to consist largely of euphemism, question-begging and sheer cloudy vagueness. Defenceless villages are bombarded from the air, the inhabitants driven out into the countryside, the cattle machine-gunned, the huts set on fire with incendiary bullets: this is called pacification. Millions of peasants are robbed of their farms and sent trudging along the roads with no more than they can carry: this is called transfer of population or rectification of frontiers. People are imprisoned for years without trial, or shot in the back of the neck or sent to die of scurvy in Arctic lumber camps: this is called elimination of unreliable elements.

An example of the problems with jargon is the Sokal affair. In 1996, Alan Sokal (a physics professor) submitted a fabricated scientific paper entitled “Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quantum Gravity.” The paper had absolutely no relation to reality and argued that quantum gravity is a social and linguistic construct. Even so, the paper was published in a respected journal. Sokal’s paper consisted of convoluted, essentially meaningless claims, such as this paragraph:

Secondly, the postmodern sciences deconstruct and transcend the Cartesian metaphysical distinctions between humankind and Nature, observer and observed, Subject and Object. Already quantum mechanics, earlier in this century, shattered the ingenious Newtonian faith in an objective, pre-linguistic world of material objects “out there”; no longer could we ask, as Heisenberg put it, whether “particles exist in space and time objectively.”

(If you’re wondering why no one called him out, or more specifically why we have a bias to not call BS out, check out pluralistic ignorance).

Jargon does have its place. In specific contexts, it is absolutely vital. But in everyday communication, its use is a sign that we wish to appear complex and therefore more intelligent. Great thinkers throughout the ages have stressed the crucial importance of using simple language to convey complex ideas. Many of the ancient thinkers whose work we still reference today — people like Plato, Marcus Aurelius, Seneca, and Buddha — were known for their straightforward communication and their ability to convey great wisdom in a few words.

“Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius — and a lot of courage — to move in the opposite direction.”

— Ernst F. Schumacher

How Can We Overcome Complexity Bias?

The most effective tool we have for overcoming complexity bias is Occam’s razor. Also known as the principle of parsimony, this is a problem-solving principle used to eliminate improbable options in a given situation. Occam’s razor suggests that the simplest solution or explanation is usually the correct one. When we don’t have enough empirical evidence to disprove a hypothesis, we should avoid making unfounded assumptions or adding unnecessary complexity so we can make quick decisions or establish truths.

An important point to note is that Occam’s razor does not state that the simplest hypothesis is the correct one, but states rather that it is the best option before the establishment of empirical evidence. It is also useful in situations where empirical data is difficult or impossible to collect. While complexity bias leads us towards intricate explanations and concepts, Occam’s razor can help us to trim away assumptions and look for foundational concepts.

Returning to Skinner’s pigeons, had they known of Occam’s razor, they would have realized that there were two main possibilities:

• Their behavior affects the food delivery.

Or:

• Their behavior is irrelevant because the food delivery is random or on a timed schedule.

Using Occam’s razor, the head-bobbing, circle-turning pigeons would have realized that the first hypothesis involves numerous assumptions, including:

• There is a particular behavior they must enact to receive food.
• The delivery mechanism can somehow sense when they enact this behavior.
• The required behavior is different from behaviors that would normally give them access to food.
• The delivery mechanism is consistent.

And so on. Occam’s razor would dictate that because the second hypothesis is the simplest, involving the fewest assumptions, it is most likely the correct one.

Many geniuses are really good at eliminating unnecessary complexity. Einstein, for instance, was a master at sifting the essential from the non-essential. Steve Jobs was the same.

Power Laws: How Nonlinear Relationships Amplify Results

“The greatest shortcoming of the human race is our inability to understand the exponential function.”

— Albert Allen Bartlett

Defining A Power Law

Consider a person who begins weightlifting for the first time.

During their initial sessions, they can lift only a small amount of weight. But as they invest more time, they find that for each training session, their strength increases a surprising amount.

For a while, they make huge improvements. Eventually, however, their progress slows down. At first, they could increase their strength by as much as 10% per session; now it takes months to improve by even 1%. Perhaps they resort to taking performance-enhancing drugs or training more often. Their motivation is sapped and they find themselves getting injured, without any real change in the amount of weight they can lift.

Now, let’s imagine that our frustrated weightlifter decides to take up running instead. Something similar happens. While the first few runs are incredibly difficult, the person’s endurance increases rapidly with the passing of each week, until it levels off and diminishing returns set in again.

Both of these situations are examples of power laws — a relationship between two things in which a change in one thing can lead to a large change in the other, regardless of the initial quantities. In both of our examples, a small investment of time in the beginning of the endeavor leads to a large increase in performance.

Power laws are interesting because they reveal surprising correlations between disparate factors. As a mental model, power laws are versatile, with numerous applications in different fields of knowledge.

If parts of this post look intimidating to non-mathematicians, bear with us. Understanding the math behind power laws is worthwhile in order to grasp their many applications. Invest a little time in reading this and reap the value — which is in itself an example of a power law!

A power law is often represented by an equation with an exponent:

Y=MX^B

Each letter represents a number. Y is the result; X is the variable (the thing you can change); B is the order of scaling (the exponent); and M is a constant (unchanging).

If M is equal to 1, the equation is then Y=X^B. If B=2, the equation becomes Y=X^2 (Y=X squared). If X is 1, Y is also 1. But if X=2, then Y=4; if X=3, then Y=9, and so on. A small change in the value of X leads to a disproportionately large change in the value of Y.
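As a quick sketch, the equation can be evaluated in a few lines of Python (the names mirror the variables above):

```python
# Evaluate the power law Y = M * X**B for a few values of X.
def power_law(x, m=1, b=2):
    return m * x ** b

# With M=1 and B=2, Y grows as the square of X:
for x in [1, 2, 3, 4]:
    print(x, power_law(x))  # 1 1, 2 4, 3 9, 4 16
```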

B=1 is known as the linear scaling law.

To double a cake recipe, you need twice as much flour. To drive twice as far will take twice as long. (Unless you have kids, in which case you need to factor in bathroom breaks that seemingly have little to do with distance.) Linear relationships, in which twice-as-big requires twice-as-much, are simple and intuitive.

Nonlinear relationships are more complicated. In these cases, you don’t need twice as much of the original value to get twice the increase in some measurable characteristic. For example, an animal that’s twice our size requires only about 75% more food than we do. This means that on a per-unit-of-size basis, larger animals are more energy efficient than smaller ones. As animals get bigger, the energy required to support each unit decreases.

One of the characteristics of a complex system is that the behavior of the system differs from the simple addition of its parts. This characteristic is called emergent behavior. “In many instances,” writes Geoffrey West in Scale: The Universal Laws of Growth, Innovation, Sustainability, and the Pace of Life in Organisms, Cities, Economies, and Companies, “the whole seems to take on a life of its own, almost dissociated from the specific characteristics of its individual building blocks.”

When we set out to understand a complex system, our intuition tells us to break it down into its component pieces. But that’s linear thinking, and it explains why so much of our thinking about complexity falls short. Small changes in a complex system can cause sudden and large effects, cascading among the connected parts like knocking over the first domino in a long row.

Let’s return to the example of our hypothetical weightlifter-turned-runner. As they put in more time on the road, constraints will naturally arise on their progress.

Recall our power-law equation: Y=MX^B. Try applying it to the runner. (We’re going to simplify running, but stick with it.)

Y is the distance the runner can run before becoming exhausted. That’s what we’re trying to calculate. M, the constant, represents their running ability: some combination of their natural endowment and their training history. (Think of it this way: Olympic champion Usain Bolt has a high M; film director Woody Allen has a low M.)

That leaves us with the final term: X^B. The variable X represents the thing we have control over: in this case, our training mileage. If B, the exponent, is between 0 and 1, then the relationship between X and Y — between training mileage and endurance — becomes progressively less proportional. All it takes is plugging in a few numbers to see the effect.

Let’s set M to 1 for the sake of simplicity. If B=0.5 and X=4, then Y=2. Four miles on the road gives the athlete the ability to run two miles at a clip.

Increase X to 16, and Y increases only to 4. The runner has to put in four times the road mileage to merely double their running endurance.

Here’s the kicker: With both running and weightlifting, as we increase X, we’re likely to see the exponent, B, decline! Quadrupling our training mileage from 16 to 64 miles is unlikely to double our endurance again. It might take a 10x increase in mileage to do that. Eventually, the ratio of training mileage to endurance will become nearly infinite.

We know this state, of course, as diminishing returns: the point where more input yields progressively less output. Not only is the relationship between training mileage and endurance not linear to begin with, but it also gets less linear as we increase our training.

It gets even more interesting. If B=−0.5 and X=4, then Y=0.5. Four miles on the road gets us a half-mile of endurance. If X is increased to 16, Y declines to 0.25. More training, less endurance! This is akin to someone putting in way too much mileage, way too soon: the training is less than useful as injuries pile up.
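A small sketch reproduces the runner’s numbers for both the fractional and the negative exponent (M is set to 1, as above):

```python
# The runner example: Y = X**B with M = 1.
def endurance(mileage, b):
    return mileage ** b

# Diminishing returns: B between 0 and 1.
assert endurance(4, 0.5) == 2.0    # 4 miles of training -> 2 miles of endurance
assert endurance(16, 0.5) == 4.0   # 4x the mileage only doubles endurance

# Inverse power law: negative B (the overtraining case).
assert endurance(4, -0.5) == 0.5   # more training, less endurance
assert endurance(16, -0.5) == 0.25
```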

With negative numbers, the more X increases, the more Y shrinks. This relationship is known as an inverse power law. B=−2, for example, is known as the inverse square law and is an important equation in physics.

The relationship between gravity and distance follows an inverse power law. G is the gravitational constant; it’s the constant in Newton’s law of gravitation, relating gravity to the masses and separation of particles, equal to:

6.67 × 10⁻¹¹ N m² kg⁻²

Any force radiating from a single point — including heat, light intensity, and magnetic and electrical forces — follows the inverse square law. At 1m away from a fire, 4 times as much heat is felt as at 2m, and so on.
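The inverse square relationship is easy to verify numerically. A minimal sketch (the reference distance of 1m is just a convenient baseline):

```python
# Inverse square law: intensity falls with the square of distance.
def relative_intensity(distance, reference_distance=1.0):
    """Intensity at `distance`, relative to the intensity at `reference_distance`."""
    return (reference_distance / distance) ** 2

print(relative_intensity(2))  # 0.25: a quarter of the heat at twice the distance
```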

Higher Order Power Laws

When B is a positive integer (a whole number larger than zero), there are names for the power laws.

When B is equal to 1, we have a linear relationship, as we discussed above. This is also known as a first-order power law.

Things really get interesting after that.

When B is 2, we have a second-order power law. A great example of this is kinetic energy.

Kinetic Energy = (1/2)mv^2

When B is 3, we have a third-order power law. An example of this is the power converted from wind into rotational energy.

Power Available = ½ (Air Density)( πr^2)(Windspeed^3)(Power Coefficient)

(There is a natural limit here. Albert Betz concluded in 1919 that wind turbines cannot convert more than 59.3% of the kinetic energy of the wind into mechanical energy. This number is called the Betz Limit and represents the power coefficient above.)[1]
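To see the third-order law in action, here is a sketch of the wind-power formula above. The rotor radius and wind speeds are illustrative assumptions; the air density is the standard sea-level value.

```python
import math

# Wind power: a third-order power law in wind speed.
AIR_DENSITY = 1.225   # kg/m^3, sea-level value
BETZ_LIMIT = 0.593    # theoretical maximum power coefficient

def wind_power(radius_m, windspeed_ms, power_coefficient=BETZ_LIMIT):
    swept_area = math.pi * radius_m ** 2
    return 0.5 * AIR_DENSITY * swept_area * windspeed_ms ** 3 * power_coefficient

# Because windspeed is cubed, doubling it yields 2^3 = 8 times the power:
ratio = wind_power(20, 10) / wind_power(20, 5)
print(round(ratio))  # 8
```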

The law of heat radiation is a fourth-order power law. Derived first by the Austrian physicist Josef Stefan in 1879 and separately by Austrian physicist Ludwig Boltzmann, the law works like this: the radiant heat energy emitted from a unit area in one second is equal to the constant of proportionality (the Stefan-Boltzmann constant) times the absolute temperature to the fourth power.[2]

There is only one power law with a variable exponent, and it’s considered to be one of the most powerful forces in the universe. It’s also the most misunderstood. We call it compounding. The formula looks like this:

Future Value = (Present Value)(1+i)^n

where i is the interest rate and n is the number of years.

Unlike in the other equations, the growth here is potentially limitless. As long as the interest rate i is positive, the future value keeps increasing as n, the number of years, does.

Non-integer power laws (where B is a fraction, as with our running example above) are also of great use to physicists. Formulas in which B=0.5 are common.

Imagine a car driving at a certain speed. The petrol burnt per mile follows a power law with respect to speed: for the car to go twice as fast, it must use 4 times as much petrol per mile, and to go 3 times as fast, 9 times as much. Air resistance increases with the square of speed, and that is why faster cars use such ridiculous amounts of petrol. It might seem logical to think that a car going from 40 miles an hour to 50 miles an hour would use a quarter more fuel. That is incorrect, though, because the relationship between air resistance and speed is itself a power law.

Another instance of a power law is the area of a square. Double the length of each side and the area quadruples. Do the same for a 3D cube and the volume increases by a factor of eight. It doesn’t matter if the side of the square went from 1cm to 2cm, or from 100m to 200m; the area still quadruples. We are all familiar with second-order (or square) power laws. The name comes from squares, since the relationship between side length and area reflects the way second-order power laws change a number. Third-order (or cubic) power laws are likewise named for their relationship to cubes.
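A few lines of Python make the scale-independence of this point concrete:

```python
# Second- and third-order power laws: area and volume scaling.
def square_area(side):
    return side ** 2

def cube_volume(side):
    return side ** 3

# Doubling the side quadruples the area and multiplies the volume by eight,
# regardless of the starting size:
assert square_area(2) / square_area(1) == 4
assert square_area(200) / square_area(100) == 4
assert cube_volume(2) / cube_volume(1) == 8
```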

Using Power Laws in Our Lives

Now that we’ve gotten through the complicated part, let’s take a look at how power laws crop up in many fields of knowledge. Most careers involve an understanding of them, even if it might not be so obvious.

“What’s the most powerful force in the universe? Compound interest. It builds on itself. Over time, a small amount of money becomes a large amount of money. Persistence is similar. A little bit improves performance, which encourages greater persistence which improves persistence even more. And on and on it goes.”

— Daniel H. Pink, The Adventures of Johnny Bunko

The Power Behind Compounding

Compounding is one of our most important mental models and is absolutely vital to understand for investing, personal development, learning, and other crucial areas of life.

In economics, we calculate compound interest with an equation using these variables: P is the original sum of money, P′ is the resulting sum, r is the annual interest rate, n is the compounding frequency, and t is the length of time in years:

P′ = P(1 + r/n)^(nt)

Using this equation, we can illustrate the power of compounding. If a person deposits \$1,000 in a bank for five years at a quarterly interest rate of 4%, the calculation becomes:

Future Value = Present Value × (1 + Quarterly Interest Rate)^(Number of Quarters)

With 20 quarters of compounding, the account holds \$1,000 × 1.04^20 ≈ \$2,191.12 after five years, more than double the original deposit.
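The arithmetic is easy to check directly (20 quarters at 4% per quarter):

```python
# Compound interest: five years of quarterly compounding.
present_value = 1000
quarterly_rate = 0.04
quarters = 5 * 4  # 20 compounding periods

future_value = present_value * (1 + quarterly_rate) ** quarters
print(round(future_value, 2))  # 2191.12
```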

Compound interest is a power law because the relationship between the amount of time a sum of money is left in an account and the amount accumulated at the end is non-linear.

In A Random Walk Down Wall Street, Burton Malkiel gives the example of two brothers, William and James. Beginning at age 20 and stopping at age 40, William invests \$4,000 per year. Meanwhile, James invests the same amount per year between the ages of 40 and 65. By the time William is 65, he has invested less money than his brother, but has allowed it to compound for 25 years. As a result, when both brothers retire, William has 600% more money than James — a gap of \$2 million. One of the smartest financial choices we can make is to start saving as early as possible: by harnessing power laws, we increase the exponent as much as possible.
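Malkiel's two-brothers example can be sketched in a short simulation. The 10% annual return is an illustrative assumption on our part, not a figure taken from the book, but it reproduces the shape of the result: the early starter ends up millions ahead despite contributing less.

```python
# A sketch of the two-brothers example. Contributions are made at the start
# of each year, then the balance compounds annually until retirement at 65.
ANNUAL_RETURN = 0.10   # illustrative assumption
CONTRIBUTION = 4000

def final_balance(start_age, stop_age, retire_age=65):
    balance = 0.0
    for age in range(start_age, retire_age):
        if start_age <= age < stop_age:
            balance += CONTRIBUTION      # contribute this year
        balance *= 1 + ANNUAL_RETURN     # then compound for a year
    return balance

william = final_balance(20, 40)  # invests early, then lets it ride for 25 years
james = final_balance(40, 65)    # invests more in total, but starts late
print(round(william), round(james))
```

Under these assumptions, William retires with several times James's balance, even though James contributed \$20,000 more.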

Compound interest can help us achieve financial freedom and wealth, without the need for a large annual income. Members of the financial independence movement (such as the blogger Mr. Money Mustache) are living examples of how we can apply power laws to our lives.

As far back as the 1800s, Robert G. Ingersoll emphasized the importance of compound interest:

One dollar at compound interest, at twenty-four per cent., for one hundred years, would produce a sum equal to our national debt. Interest eats night and day, and the more it eats the hungrier it grows. The farmer in debt, lying awake at night, can, if he listens, hear it gnaw. If he owes nothing, he can hear his corn grow. Get out of debt as soon as possible. You have supported idle avarice and lazy economy long enough.

Compounding can apply to areas beyond finance — personal development, health, learning, relationships and more. For each area, a small input can lead to a large output, and the results build upon themselves.

Nonlinear Language Learning

When we learn a new language, it’s always a good idea to start by learning the 100 or so most used words.

In all known languages, a small percentage of words make up the majority of usage. This is known as Zipf’s law, after George Kingsley Zipf, who first identified the phenomenon. The most used word in a language may make up as much as 7% of all words used, while the second-most-used word is used half as much, and so on. As few as 135 words can together form half of a language (as used by native speakers).

Why Zipf’s law holds true is unknown, although the concept is logical. Many languages include a large number of specialist terms that are rarely needed (including legal or anatomy terms). A small change in the frequency ranking of a word means a huge change in its usefulness.

Understanding Zipf’s law is a central component of accelerated language learning. Each new word we learn from the most common 100 words will have a huge impact on our ability to communicate. As we learn less-common words, diminishing returns set in. If each word in a language were listed in order of frequency of usage, the further we moved down the list, the less useful a word would be.
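A toy model shows how steep this drop-off is. The sketch below assumes an idealized Zipf distribution (the word at rank r has frequency proportional to 1/r) and an arbitrary vocabulary size; real languages only approximate this.

```python
# Idealized Zipf distribution: frequency of the rank-r word is proportional to 1/r.
VOCABULARY_SIZE = 50_000  # illustrative assumption

weights = [1 / rank for rank in range(1, VOCABULARY_SIZE + 1)]
total = sum(weights)

def coverage(top_n):
    """Fraction of all word usage accounted for by the top_n most common words."""
    return sum(weights[:top_n]) / total

# Even against a 50,000-word vocabulary, the top 100 words cover a large
# share of everything said:
print(f"top 100 words cover {coverage(100):.0%} of usage")
```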

Power Laws in Business, Explained by Peter Thiel

Peter Thiel, the founder of PayPal (as well as an early investor in Facebook and Palantir), considers power laws to be a crucial concept for all businesspeople to understand. In his fantastic book, Zero to One, Thiel writes:

Indeed, the single most powerful pattern I have noticed is that successful people find value in unexpected places, and they do this by thinking about business from first principles instead of formulas.

And:

In 1906, economist Vilfredo Pareto discovered what became the “Pareto Principle,” or the 80-20 rule, when he noticed that 20% of the people owned 80% of the land in Italy—a phenomenon that he found just as natural as the fact that 20% of the peapods in his garden produced 80% of the peas. This extraordinarily stark pattern, when a small few radically outstrip all rivals, surrounds us everywhere in the natural and social world. The most destructive earthquakes are many times more powerful than all smaller earthquakes combined. The biggest cities dwarf all mere towns put together. And monopoly businesses capture more value than millions of undifferentiated competitors. Whatever Einstein did or didn’t say, the power law—so named because exponential equations describe severely unequal distributions—is the law of the universe. It defines our surroundings so completely that we usually don’t even see it.

… [I]n venture capital, where investors try to profit from exponential growth in early-stage companies, a few companies attain exponentially greater value than all others. … [W]e don’t live in a normal world; we live under a power law.

The biggest secret in venture capital is that the best investment in a successful fund equals or outperforms the entire rest of the fund combined.

This implies two very strange rules for VCs. First, only invest in companies that have the potential to return the value of the entire fund. … This leads to rule number two: because rule number one is so restrictive, there can’t be any other rules.

…[L]ife is not a portfolio: not for a startup founder, and not for any individual. An entrepreneur cannot “diversify” herself; you cannot run dozens of companies at the same time and then hope that one of them works out well. Less obvious but just as important, an individual cannot diversify his own life by keeping dozens of equally possible careers in ready reserve.

Thiel teaches a class called Startup at Stanford, where he hammers home the value of understanding power laws. In his class, he imparts copious wisdom. From Blake Masters’ notes on Class 7:

Consider a prototypical successful venture fund. A number of investments go to zero over a period of time. Those tend to happen earlier rather than later. The investments that succeed do so on some sort of exponential curve. Sum it over the life of a portfolio and you get a J curve. Early investments fail. You have to pay management fees. But then the exponential growth takes place, at least in theory. Since you start out underwater, the big question is when you make it above the water line. A lot of funds never get there.

To answer that big question you have to ask another: what does the distribution of returns in [a] venture fund look like? The naïve response is just to rank companies from best to worst according to their return in multiple of dollars invested. People tend to group investments into three buckets. The bad companies go to zero. The mediocre ones do maybe 1x, so you don’t lose much or gain much. And then the great companies do maybe 3-10x.

But that model misses the key insight that actual returns are incredibly skewed. The more a VC understands this skew pattern, the better the VC. Bad VCs tend to think the dashed line is flat, i.e. that all companies are created equal, and some just fail, spin wheels, or grow. In reality you get a power law distribution.
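A toy portfolio illustrates the skew Thiel describes. Here we simply assume the k-th best investment returns C/k² (the exponent 2 is an illustrative choice, not a claim about real funds); under that assumption, the single best investment outperforms all the others combined.

```python
# A toy power-law portfolio: the k-th best investment returns 1 / k**2.
def returns(num_companies, exponent=2):
    return [1 / rank ** exponent for rank in range(1, num_companies + 1)]

portfolio = returns(10)
best, rest = portfolio[0], sum(portfolio[1:])
print(best > rest)  # the best investment beats the rest of the fund combined
```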

Thiel explains how investors can apply the mental model of power laws (more from Masters’ notes on Class 7):

…Given a big power law distribution, you want to be fairly concentrated. … There just aren’t that many businesses that you can have the requisite high degree of conviction about. A better model is to invest in maybe 7 or 8 promising companies from which you think you can get a 10x return. …

Despite being rooted in middle school math, exponential thinking is hard. We live in a world where we normally don’t experience anything exponentially. Our general life experience is pretty linear. We vastly underestimate exponential things.

He also cautions against over-relying on power laws as a strategy (an assertion that should be kept in mind for all mental models). From Masters’ notes:

One shouldn’t be mechanical about this heuristic, or treat it as some immutable investment strategy. But it actually checks out pretty well, so at the very least it compels you to think about power law distribution.

Understanding exponents and power law distributions isn’t just about understanding VC. There are important personal applications too. Many things, such as key life decisions or starting businesses, also result in similar distributions.

Thiel then explains why founders should focus on one key revenue stream, rather than trying to build multiple equal ones:

Even within an individual business, there is probably a sort of power law as to what’s going to drive it. It’s troubling if a startup insists that it’s going to make money in many different ways. The power law distribution on revenues says that one source of revenue will dominate everything else.

For example, if you’re an entrepreneur who opens a coffee shop, you’ll have a lot of ways you can make money. You can sell coffee, cakes, paintings, merchandise, and more. But each of those things will not contribute to your success in an equal way. While there is value in the discovery process, once you’ve found the variable that matters most, you should place more time on that one and less on the others. The importance of finding this variable cannot be overstated.

He also acknowledges that power laws are one of the great secrets of investing success. From Masters’ notes on Class 11:

On one level, the anti-competition, power law, and distribution secrets are all secrets about nature. But they’re also secrets hidden by people. That is crucial to remember. Suppose you’re doing an experiment in a lab. You’re trying to figure out a natural secret. But every night another person comes into the lab and messes with your results. You won’t understand what’s going on if you confine your thinking to the nature side of things. It’s not enough to find an interesting experiment and try to do it. You have to understand the human piece too.

… We know that, per the power law secret, companies are not evenly distributed. The distribution tends to be bimodal; there are some great ones, and then there are a lot of ones that don’t really work at all. But understanding this isn’t enough. There is a big difference between understanding the power law secret in theory and being able to apply it in practice.

The key to all mental models is knowing the facts and being able to use the concept. As George Box said, “all models are false but some are useful.” Once we grasp the basics, the best next step is to start figuring out how to apply it.

Thiel’s image of an unseen person sabotaging laboratory results is an excellent metaphor for how cognitive biases and shortcuts cloud our judgement.

Natural Power Laws

Anyone who has kept a lot of pets will have noticed the link between an animal’s size and its lifespan. Small animals, like mice and hamsters, tend to live for a year or two. Larger ones, like dogs and cats, can live to 10-20 years, or even older in rare cases. Scaling up even more, some whales can live for 200 years. This comes down to power laws.

Biologists have found clear links between an animal’s size and its metabolism. Kleiber’s law (identified by Max Kleiber) states that an animal’s metabolic rate scales with the three-fourths power of its mass. So although an average rabbit (2 kg) weighs one hundred times as much as an average mouse (20 g), the rabbit’s metabolic rate will be only about 32 times the mouse’s, not 100 times. In other words, the rabbit burns far less energy per gram of body weight; its structure is more efficient. It all comes down to the geometry behind their mass.
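The rabbit-and-mouse arithmetic follows directly from the three-fourths exponent:

```python
# Kleiber's law: metabolic rate scales with mass**(3/4).
# If one animal weighs k times as much as another, its metabolic
# rate is k**0.75 times higher, not k times higher.

mass_ratio = 100                     # rabbit (2 kg) vs. mouse (20 g)
metabolic_ratio = mass_ratio ** 0.75
print(round(metabolic_ratio, 1))     # 31.6, i.e. roughly 32

# Per gram of body weight, the larger animal burns less energy:
per_gram_ratio = metabolic_ratio / mass_ratio
print(round(per_gram_ratio, 3))      # 0.316
```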

This leads us to another biological power law: smaller animals require more energy per gram of body weight, which is why mice eat around half their body weight in dense foods each day. The reason is that, in terms of percentage of mass, larger animals have more structure (bones, etc.) and fewer reserves (fat stores).

Research has illustrated how power laws apply to blood circulation in animals. The end units through which oxygen, water, and nutrients enter cells from the bloodstream are the same size in all animals. Only the number per animal varies. The relationship between the total area of these units and the size of the animal is a third-order power law. The distance blood travels to enter cells and the actual volume of blood are also subject to power laws.

The Law of Diminishing Returns

As we have seen, a small change in one area can lead to a huge change in another. However, past a certain point, diminishing returns set in and more is worse. Working an hour extra per day might mean more gets done, whereas working three extra hours is likely to lead to less getting done due to exhaustion. Going from a sedentary lifestyle to running two days a week may result in greatly improved health, but stepping up to seven days a week will cause injuries. Overzealousness can turn a positive exponent into a negative exponent. For a busy restaurant, hiring an extra chef will mean that more people can be served, but hiring two new chefs might spoil the proverbial broth.
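The pattern of returns that rise, flatten, and then turn negative can be sketched with a simple concave model. The function and numbers below are invented purely for illustration:

```python
# Toy model of total output vs. extra hours worked per day.
# output = 10h - h**2: each extra hour helps less than the last,
# and past the peak, additional hours reduce total output.

def output(extra_hours):
    return 10 * extra_hours - extra_hours ** 2

# Marginal return of each successive extra hour:
marginal = [output(h + 1) - output(h) for h in range(8)]
print(marginal)  # [9, 7, 5, 3, 1, -1, -3, -5]
# The gain shrinks with every hour and goes negative after the fifth,
# the point where "more" starts making things worse.
```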

Perhaps the most underappreciated diminishing return, the one we never want to end up on the wrong side of, is the one between money and happiness.

In David and Goliath, Malcolm Gladwell discusses how diminishing returns relate to family incomes. Most people assume that the more money they make, the happier they and their families will be. This is true — up to a point. An income that’s too low to meet basic needs makes people miserable, leading to far more physical and mental health problems. A person who goes from earning $30,000 a year to earning $40,000 is likely to experience a dramatic boost in happiness. However, going from $100,000 to $110,000 leads to a negligible change in well-being.

The scholars who research happiness suggest that more money stops making people happier at a family income of around seventy-five thousand dollars a year. After that, what economists call “diminishing marginal returns” sets in. If your family makes seventy-five thousand and your neighbor makes a hundred thousand, that extra twenty-five thousand a year means that your neighbor can drive a nicer car and go out to eat slightly more often. But it doesn’t make your neighbor happier than you, or better equipped to do the thousands of small and large things that make for being a good parent.


The Fairness Principle: How the Veil of Ignorance Helps Test Fairness

“But the nature of man is sufficiently revealed for him to know something of himself and sufficiently veiled to leave much impenetrable darkness, a darkness in which he ever gropes, forever in vain, trying to understand himself.”

— Alexis de Tocqueville, Democracy in America

The Basics

If you could redesign society from scratch, what would it look like?

How would you distribute wealth and power?

Would you make everyone equal or not? How would you define fairness and equality?

And — here’s the kicker — what if you had to make those decisions without knowing who you would be in this new society?

Philosopher John Rawls asked just that in a thought experiment known as “the Veil of Ignorance” in his 1971 book, A Theory of Justice.

Like many thought experiments, the Veil of Ignorance could never be carried out in the literal sense, nor should it be. Its purpose is to explore ideas about justice, morality, equality, and social status in a structured manner.

The Veil of Ignorance, a component of social contract theory, allows us to test ideas for fairness.

Behind the Veil of Ignorance, no one knows who they are. They lack clues as to their class, their privileges, their disadvantages, or even their personality. They exist as an impartial group, tasked with designing a new society with its own conception of justice.

As a thought experiment, the Veil of Ignorance is powerful because our usual opinions regarding what is just and unjust are informed by our own experiences. We are shaped by our race, gender, class, education, appearance, sexuality, career, family, and so on. On the other side of the Veil of Ignorance, none of that exists. Technically, the resulting society should be a fair one.

In Ethical School Leadership, Spencer J. Maxcy writes:

Imagine that you have set for yourself the task of developing a totally new social contract for today’s society. How could you do so fairly? Although you could never actually eliminate all of your personal biases and prejudices, you would need to take steps at least to minimize them. Rawls suggests that you imagine yourself in an original position behind a veil of ignorance. Behind this veil, you know nothing of yourself and your natural abilities, or your position in society. You know nothing of your sex, race, nationality, or individual tastes. Behind such a veil of ignorance all individuals are simply specified as rational, free, and morally equal beings. You do know that in the “real world,” however, there will be a wide variety in the natural distribution of natural assets and abilities, and that there will be differences of sex, race, and culture that will distinguish groups of people from each other.

“The Fairness Principle: When contemplating a moral action, imagine that you do not know if you will be the moral doer or receiver, and when in doubt err on the side of the other person.”

— Michael Shermer, The Moral Arc: How Science and Reason Lead Humanity Toward Truth, Justice, and Freedom

The Purpose of the Veil of Ignorance

Because people behind the Veil of Ignorance do not know who they will be in this new society, any choice they make in structuring that society could either harm them or benefit them.

If they decide men will be superior, for example, they must face the risk that they will be women. If they decide that 10% of the population will be slaves to the others, they cannot be surprised if they find themselves to be slaves. No one wants to be part of a disadvantaged group, so the logical belief is that the Veil of Ignorance would produce a fair, egalitarian society.

Behind the Veil of Ignorance, cognitive biases melt away. The hypothetical people are rational thinkers. They use probabilistic thinking to assess the likelihood of their being affected by any chosen measure. They possess no opinions for which to seek confirmation. Nor do they have any recently learned information to pay undue attention to. The sole incentive they are biased towards is their own self-preservation, which is equivalent to the preservation of the entire group. They cannot stereotype any particular group as they could be members of it. They lack commitment to their prior selves as they do not know who they are.
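The self-preservation logic can be expressed as a simple expected-value calculation. The utility numbers below are invented; the point is the structure of the reasoning, not the specific values:

```python
# Behind the Veil of Ignorance, you evaluate a society by the expected
# outcome for a randomly assigned member, since you could be anyone.

def expected_utility(society):
    """Society = list of (probability of being in group, utility of that life)."""
    return sum(prob * utility for prob, utility in society)

slave_society = [(0.90, 100), (0.10, -1000)]  # 10% chance of being a slave
equal_society = [(1.00, 80)]                  # everyone modestly well off, and free

print(expected_utility(slave_society))  # -10.0
print(expected_utility(equal_society))  # 80.0
```

Even though 90% of people are better off in the slave society, the rational chooser behind the veil rejects it: the downside, weighted by its probability, swamps the upside.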

So, what would these people decide on? According to Rawls, in a fair society all individuals must possess the following:

• Rights and liberties (including the right to vote, the right to hold public office, free speech, free thought, and fair legal treatment)
• Power and opportunities
• Income and wealth sufficient for a good quality of life (Not everyone needs to be rich, but everyone must have enough money to live a comfortable life.)
• The conditions necessary for self-respect

For these conditions to occur, the people behind the Veil of Ignorance must figure out how to achieve what Rawls regards as the two key components of justice:

• Everyone must have the best possible life which does not cause harm to others.
• Everyone must be able to improve their position, and any inequalities must be present solely if they benefit everyone.

However, the people behind the Veil of Ignorance cannot be completely blank slates or it would be impossible for them to make rational decisions. They understand general principles of science, psychology, politics, and economics. Human behavior is no mystery to them. Neither are key economic concepts, such as comparative advantage and supply and demand. Likewise, they comprehend the deleterious impact of social entropy, and they have a desire to create a stable, ordered society. Knowledge of human psychology leads them to be cognizant of the universal desire for happiness and fulfillment. Rawls considered all of this to be the minimum viable knowledge for rational decision-making.

Ways of Understanding the Veil of Ignorance

One way to understand the Veil of Ignorance is to imagine that you are tasked with cutting up a pizza to share with friends. You will be the last person to take a slice. Being of sound mind, you want to get the largest possible share, and the only way to ensure this is to make all the slices the same size. You could cut one huge slice for yourself and a few tiny ones for your friends, but one of them might take the large slice and leave you with a meager share. (Not to mention, your friends won’t think very highly of you.)

Another means of appreciating the implications of the Veil of Ignorance is by considering the social structures of certain species of ants. Even though queen ants are able to form colonies alone, they will band together to form stronger, more productive colonies. Once the first group of worker ants reaches maturity, the queens fight to the death until one remains. When they first form a colony, the queen ants are behind a Veil of Ignorance. They do not know if they will be the sole survivor or not. All they know, on an instinctual level, is that cooperation is beneficial for their species. Like the people behind the Veil of Ignorance, the ants make a decision which, by necessity, is selfless.

The Veil of Ignorance, as a thought experiment, shows us that ignorance is not always detrimental to a society. In some situations, it can create robust social structures. In the animal kingdom, we see many examples of creatures that cooperate even though they do not know if they will suffer or benefit as a result. In a paper entitled “The Many Selves of Social Insects,” Queller and Strassmann write of bees:

…social insect colonies are so tightly integrated that they seem to function as single organisms, as a new level of self. The honeybees’ celebrated dance about food location is just one instance of how their colonies integrate and act on information that no single individual possesses. Their unity of purpose is underscored by the heroism of workers, whose suicidal stinging attacks protect the single reproducing queen.

We can also consider the Tragedy of the Commons. Introduced by ecologist Garrett Hardin, this mental model states that shared resources will be exploited if no system for fair distribution is implemented. Individuals have no incentive to leave a share of free resources for others. Hardin’s classic example is an area of land which everyone in a village is free to use for their cattle. Each person wants to maximize the usefulness of the land, so they put more and more cattle out to graze. Yet the land is finite and at some point will become too depleted to support livestock. If the people behind the Veil of Ignorance had to choose how the common land should be shared, the logical decision would be to give each person an equal part and forbid them from introducing too many cattle.
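Hardin’s grazing example can be sketched as a toy simulation. The capacity and value numbers are invented for illustration:

```python
# Toy tragedy of the commons: a pasture supports up to 10 cattle.
# Beyond that capacity, every additional animal degrades grazing for all.

def value_per_cow(total_cows, capacity=10):
    """Grazing value per animal falls once the land is overstocked."""
    if total_cows <= capacity:
        return 10
    return max(0, 10 - 2 * (total_cows - capacity))

for total in (10, 12, 15):
    print(total, total * value_per_cow(total))
# 10 cows -> 100 total value (at capacity, the fair-share outcome)
# 12 cows -> 72  (two herders each gained a cow; everyone's cows are worth less)
# 15 cows -> 0   (the pasture is destroyed)
```

Each individual herder gains by adding a cow, yet collectively the village ends up with less, which is exactly why the group behind the veil would agree on equal shares and a stocking limit in advance.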

As N. Gregory Mankiw writes in Principles of Microeconomics:

The Tragedy of the Commons is a story with a general lesson: when one person uses a common resource, he diminishes other people’s enjoyment of it. Because of this negative externality, common resources tend to be used excessively. The government can solve the problem by reducing use of the common resource through regulation or taxes. Alternatively, the government can sometimes turn the common resource into a private good.

This lesson has been known for thousands of years. The ancient Greek philosopher Aristotle pointed out the problem with common resources: “What is common to many is taken least care of, for all men have greater regard for what is their own than for what they possess in common with others.”

In The Case for Meritocracy, Michael Faust uses other thought experiments to support the Veil of Ignorance:

Let’s imagine another version of the thought experiment. If inheritance is so inherently wonderful — such an intrinsic good — then let’s collect together all of the inheritable money in the world. We shall now distribute this money in exactly the same way it would be distributed in today’s world… but with one radical difference. We are going to distribute it by lottery rather than by family inheritance, i.e, anyone in the world can receive it. So, in these circumstances, how many people who support inheritance would go on supporting it? Note that the government wouldn’t be getting the money… just lucky strangers. Would the advocates of inheritance remain as fiercely committed to their cherished principle? Or would the entire concept instantly be exposed for the nonsense it is?

If inheritance were treated as the lottery it is, no one would stand by it.

[…]

In the world of the 1% versus the 99%, no one in the 1% would ever accept a lottery to decide inheritance because there would be a 99% chance they would end up as schmucks, exactly like the rest of us.

And a further surrealistic thought experiment:

Imagine that on a certain day of the year, each person in the world randomly swaps bodies with another person, living anywhere on earth. Well, for the 1%, there’s a 99% chance that they will be swapped from heaven to hell. For the 99%, 1% might be swapped from hell to heaven, while the other 98% will stay the same as before. What kind of constitution would the human race adopt if annual body swapping were a compulsory event?! They would of course choose a fair one.

“In the immutability of their surroundings the foreign shores, the foreign faces, the changing immensity of life, glide past, veiled not by a sense of mystery but by a slightly disdainful ignorance.”

— Joseph Conrad, Heart of Darkness

The History of Social Contract Theory

Although the Veil of Ignorance was first described by Rawls in 1971, many other philosophers and writers have discussed similar concepts in the past. Philosophers discussed social contract theory as far back as ancient Greece.

In Crito, Plato describes a conversation in which Socrates discusses the laws of Athens and how they are responsible for his existence. Finding himself in prison and facing the death penalty, Socrates rejects Crito’s suggestion that he should escape. He states that further injustice is not an appropriate response to prior injustice. Crito believes that by refusing to escape, Socrates is aiding his enemies, as well as failing to fulfil his role as a father. But Socrates views the laws of Athens as a single entity that has always protected him. He describes breaking any of the laws as being like injuring a parent. Having lived a long, fulfilling life as a result of the social contract he entered at birth, he has no interest in now turning away from Athenian law. Accepting death is essentially a symbolic act that Socrates intends to use to illustrate rationality and reason to his followers. If he were to escape, he would be acting out of accord with the rest of his life, during which he was always concerned with justice.

Social contract theory is concerned with the laws and norms a society decides on and the obligation individuals have to follow them. Socrates’ dialogue with Crito has similarities with the final scene of Arthur Miller’s The Crucible. At the end of the play, John Proctor is hanged for witchcraft despite having the option to confess and avoid death. In continuing to follow the social contract of Salem and not confessing to a crime he obviously did not commit, Proctor believes that his death will redeem his earlier mistakes. We see this in the final dialogue between Reverend Hale and Elizabeth (Proctor’s wife):

HALE: Woman, plead with him! […] Woman! It is pride, it is vanity. […] Be his helper! What profit him to bleed? Shall the dust praise him? Shall the worms declare his truth? Go to him, take his shame away!

ELIZABETH: […] He have his goodness now. God forbid I take it from him!

In these two situations, individuals allow themselves to be put to death in the interest of following the social contract they agreed upon by living in their respective societies. Earlier in their lives, neither person knew what their ultimate fate would be. They were essentially behind the Veil of Ignorance when they chose (consciously or unconsciously) to follow the laws enforced by the people around them. Just as the people behind the Veil of Ignorance must accept whatever roles they receive in the new society, Socrates and Proctor followed social contracts. To modern eyes, the decision both men make to abandon their children in the interest of proving a point is not easily defensible.

Immanuel Kant wrote about justice and freedom in the late 1700s. Kant believed that fair laws should not be based on making people happy or reflecting the desire of individual policymakers, but should be based on universal moral principles:

Is it not of the utmost necessity to construct a pure moral philosophy which is completely freed from everything that may be only empirical and thus belong to anthropology? That there must be such a philosophy is self-evident from the common idea of duty and moral laws. Everyone must admit that a law, if it is to hold morally, i.e., as a ground of obligation, must imply absolute necessity; he must admit that the command, “Then shalt not lie,” does not apply to men only, as if other rational beings had no need to observe it. The same is true for all other moral laws properly so called. He must concede that the ground of obligation here must not be sought in the nature of man or in the circumstances in which he is placed, but sought a priori solely in the concepts of pure reason, and that every other precept which is in certain respects universal, so far as it leans in the least on empirical grounds (perhaps only in regard to the motive involved), may be called a practical rule but never a moral law.

How We Can Apply This Concept

We can use the Veil of Ignorance to test whether a certain issue is fair.

When my kids are fighting over the last cookie, which happens more often than you’d imagine, I ask one of them to split the cookie; the other then picks their half first. This is the old playground rule, “you split, I pick.” Without this rule, one of them would surely give the other a smaller portion. With it, the halves are as equal as they would be if cut by sensible adults.

When considering whether we should endorse a proposed law or policy, we can ask: if I did not know if this would affect me or not, would I still support it? Those who make big decisions that shape the lives of large numbers of people are almost always those in positions of power. And those in positions of power are almost always members of privileged groups. As Benjamin Franklin once wrote: “Justice will not be served until those who are unaffected are as outraged as those who are.”

Laws allowing or prohibiting abortion have typically been made by men, for example. As the issue lacks real significance in their personal lives, they are free to base decisions on their own ideological views, rather than consider what is fair and sane. However, behind the Veil of Ignorance, no one knows their sex. Anyone deciding on abortion laws would have to face the possibility that they themselves will end up as a woman with an unwanted pregnancy.

In Justice as Fairness: A Restatement, Rawls writes:

So what better alternative is there than an agreement between citizens themselves reached under conditions that are fair for all?

[…]

[T]hreats of force and coercion, deception and fraud, and so on must be ruled out.

And:

Deep religious and moral conflicts characterize the subjective circumstances of justice. Those engaged in these conflicts are surely not in general self-interested, but rather, see themselves as defending their basic rights and liberties which secure their legitimate and fundamental interests. Moreover, these conflicts can be the most intractable and deeply divisive, often more so than social and economic ones.

In Ethics: Studying the Art of Moral Appraisal, Ronnie Littlejohn explains:

We must have a mechanism by which we can eliminate the arbitrariness and bias of our “situation in life” and insure that our moral standards are justified by the one thing all people share in common: reason. It is the function of the veil of ignorance to remove such bias.

When we have to make decisions that will affect other people, especially disadvantaged groups (such as when a politician decides to cut benefits or a CEO decides to outsource manufacturing to a low-income country), we can use the Veil of Ignorance as a tool for making fair choices.

As Robert F. Kennedy (the younger brother of John F. Kennedy) said in the 1960s:

Few will have the greatness to bend history itself, but each of us can work to change a small portion of events. It is from numberless diverse acts of courage and belief that human history is shaped. Each time a man stands up for an ideal, or acts to improve the lot of others, or strikes out against injustice, he sends forth a tiny ripple of hope, and crossing each other from a million different centers of energy and daring, those ripples build a current which can sweep down the mightiest walls of oppression and resistance.

When we choose to position ourselves behind the Veil of Ignorance, we have a better chance of creating one of those all-important ripples.

The Power of Incentives: Inside the Hidden Forces That Shape Behavior

“Never, ever, think about something else when you should be thinking about the power of incentives.”

— Charlie Munger

According to Charlie Munger, there are only a few forces more powerful than incentives. In his speech “The Psychology of Human Misjudgment,” he reflects on how the power of incentives never disappoints him:

Well, I think I’ve been in the top 5% of my age cohort all my life in understanding the power of incentives, and all my life I’ve underestimated it. And never a year passes but I get some surprise that pushes my limit a little farther.

Sometimes the solution to a behavior problem is simply to revisit incentives and make sure they align with the desired goal. Munger talks about Federal Express, which is one of his favorite examples of the power of incentives:

The heart and soul of the integrity of the system is that all the packages have to be shifted rapidly in one central location each night. And the system has no integrity if the whole shift can’t be done fast. And Federal Express had one hell of a time getting the thing to work.
And they tried moral suasion, they tried everything in the world, and finally somebody got the happy thought that they were paying the night shift by the hour, and that maybe if they paid them by the shift, the system would work better. And lo and behold, that solution worked.

If you’re trying to change a behavior, reason will take you only so far. Reflecting on another example where misaligned incentives hampered the sales of a superior product, Munger said:

Early in the history of Xerox, Joe Wilson, who was then in the government, had to go back to Xerox because he couldn’t understand how their better, new machine was selling so poorly in relation to their older and inferior machine. Of course when he got there, he found out that the commission arrangement with the salesmen gave a tremendous incentive to the inferior machine.

Ignoring incentives almost never works out well. Thinking about the incentives of others is necessary to create win-win relationships.

We can turn to psychology to obtain a more structured and thorough understanding of how incentives shape our actions.

The Science of Reinforcement

The science of reinforcement was furthered by Burrhus Frederic Skinner (usually called B.F. Skinner), a professor of psychology at Harvard from 1958 to 1974.

Skinner, unlike his contemporaries, refused to hypothesize about what happened on the inside (what people or animals thought and felt) and preferred to focus on what we can observe. To him, focusing on how much people ate meant more than focusing on subjective measures, like how hungry people were or how much pleasure they got from eating. He wanted to find out how environmental variables affected behavior, and he believed that behavior is shaped by its consequences.

If we don’t like the consequences of an action we’ve taken, we’re less likely to do it again; if we do like the consequences, we’re more likely to do it again. That assumption is the basis of operant conditioning, “a type of learning in which the strength of a behavior is modified by [its] consequences, such as reward or punishment.” 1

One of Skinner’s most important inventions was the operant conditioning chamber, also known as a “Skinner box,” which was used to study the effects of reinforcers on lab animals. The rats in the box had to figure out how to do a task (such as pushing a lever) that would reward them with food. Such an automated system allowed Skinner and thousands of successors to study conditioned behavior in a controlled setting.

What years of studies on reinforcement have revealed is that consistency and timing play important roles in shaping new behaviors. Psychologists argue that the best way for us to learn complex behaviors is via continuous reinforcement, in which the desired behavior is reinforced every time it’s performed.

If you want to teach your dog a new trick, for example, it is smart to reward him for every correct response. At the very beginning of the learning curve, a failure to reward a correct response immediately might be misinterpreted by the dog as a sign that the behavior was wrong.

Intermittent reinforcement is reinforcement that is given only some of the times that the desired behavior occurs, and it can be done according to various schedules, some predictable and some not (see “Scheduling Reinforcement,” below). Intermittent reinforcement is argued to be the most efficient way to maintain an already learnt behavior, for three reasons.

First, rewarding the behavior takes time away from the behavior’s continuation. Paying a worker after each piece is assembled on the assembly line simply does not make sense.

Second, intermittent reinforcement is better from an economic perspective. Not only is it cheaper not to reward every instance of a desired behavior, but by making the rewards unpredictable, you trigger excitement and thus get an increase in response without increasing the amount of reinforcement. Intermittent reinforcement is how casinos work; they want people to gamble, but they can’t afford to have people win large amounts very often.

Finally, intermittent reinforcement can induce resistance to extinction (stopping the behavior when reinforcement is removed). Consider the example of resistance outlined in the textbook Psychology: Core Concepts:

Imagine two gamblers and two slot machines. One machine inexplicably pays off on every trial and another, a more usual machine, pays on an unpredictable, intermittent schedule. Now, suppose that both devices suddenly stop paying. Which gambler will catch on first?

Most of us would probably guess it right:

The one who has been rewarded for each pull of the lever (continuous reinforcement) will quickly notice the change, while the gambler who has won only occasionally (on partial reinforcement) may continue playing unrewarded for a long time.
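The slot-machine intuition can be made quantitative with a simple likelihood calculation. This is a sketch, not a model of real gambler psychology: the 5% threshold is an arbitrary choice, and `plays_to_suspect` is a hypothetical helper name:

```python
# How many consecutive unrewarded plays does it take before a working
# machine becomes implausible (likelihood of that streak below 5%)?

def plays_to_suspect(p_reward, threshold=0.05):
    """Smallest n with (1 - p_reward)**n < threshold."""
    n = 0
    likelihood = 1.0
    while likelihood >= threshold:
        likelihood *= (1 - p_reward)
        n += 1
    return n

print(plays_to_suspect(1.0))      # 1: on continuous reinforcement,
                                  #    a single dry pull is conclusive
print(plays_to_suspect(1 / 110))  # 329: on a 1-in-110 machine, hundreds
                                  #    of dry pulls look perfectly normal
```

This is one way to see why intermittently reinforced behavior resists extinction: the absence of reward simply carries very little information when rewards were rare to begin with.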

Scheduling Reinforcement

Intermittent reinforcement can be used on various schedules, each with its own degree of effectiveness and situations to which it can be appropriately applied. Ratio schedules are based on the number of responses (the amount of work done), whereas interval schedules are based on the amount of time spent.

• Fixed-ratio schedules are used when you pay your employees based on the amount of work they do. Fixed-ratio schedules are common in freelancing, where contractors are paid on a piecework basis. Managers like fixed-ratio schedules because the response to reinforcement is usually very high (if you want to get paid, you do the work).
• Variable-ratio schedules are unpredictable because the number of responses between reinforcers varies. Telemarketers, salespeople, and slot machine players are on this schedule because they never know when the next sale or the next big win will occur. Skinner himself demonstrated the power of this schedule by showing that a hungry pigeon would peck a disk 12,000 times an hour while being rewarded on average for only every 110 pecks. Unsurprisingly, this is the type of reinforcement that normally produces more responses than any other schedule. (Varying the intervals between reinforcers is another way of making reinforcement unpredictable, but if you want people to feel appreciated, this kind of schedule is probably not the one to use.)
• Fixed-interval schedules are the most common type of payment — they reward people for the time spent on a specific task. You might have already guessed that the response rate on this schedule is very low. Even a rat in a Skinner box programmed for a fixed-interval schedule learns that lever presses beyond the required minimum are just a waste of energy. Ironically, the “9-5 job” is a preferred way to reward employees in business.
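The difference in response rates between ratio and interval schedules comes down to what the reward is contingent on. A minimal sketch, with hypothetical pay rates of my own choosing:

```python
def payout(schedule: str, responses: int, hours: float,
           per_piece: float = 2.0, per_hour: float = 20.0,
           pieces_per_reward: int = 10) -> float:
    """Toy payouts under two reinforcement schedules (illustrative numbers)."""
    if schedule == "fixed_ratio":      # pay per completed batch of work
        return (responses // pieces_per_reward) * per_piece * pieces_per_reward
    if schedule == "fixed_interval":   # pay per hour, regardless of output
        return hours * per_hour
    raise ValueError(schedule)

# Doubling effort doubles pay on a ratio schedule...
print(payout("fixed_ratio", 100, 8), payout("fixed_ratio", 200, 8))
# ...but changes nothing on an interval schedule.
print(payout("fixed_interval", 100, 8), payout("fixed_interval", 200, 8))
```

On the ratio schedule, extra responses always earn more; on the interval schedule, responses beyond the required minimum earn nothing, which is exactly the lesson the rat learns in the Skinner box.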

While the design of scheduling can be a powerful technique for continuing or amplifying a specific behavior, we may still fail to recognize an important aspect of reinforcement — individual preferences for specific rewards.

At the most basic level, survival is propelled by our need for food and water. However, most of us don’t live in conditions of extreme scarcity, and thus the types of reinforcement that appeal to us will differ.

Culture plays an important role in determining effective reinforcers. And what’s reinforced shapes culture. Offering tickets to a cricket match might serve as a powerful reward for someone in a country where cricket is a big deal, but would be meaningless to most Americans. Similarly, an air-conditioned office might be a powerful incentive for employees in Indonesia, but won’t matter as much to employees in a more temperate area.

So far we’ve talked about positive reinforcement — the carrot, if you will. However, there is also a stick.

There is no doubt that our society relies heavily on threat and punishment as a way to keep ourselves in line. Still, we keep arriving late, forgetting birthdays, and receiving parking fines, even though we know there is the potential to be punished.

There are several reasons that punishment might not be the best way to alter someone’s behavior.

First of all, Skinner observed that the power of punishment to suppress behavior usually disappears when the threat of punishment is removed. Indeed, we all refrain from using social networks during work hours, when we know our boss is around, and we similarly adhere to the speed limit when we know we are being watched by a police patrol.

Second, punishment often triggers a fight-or-flight response and renders us aggressive. When punished, we seek to flee from further punishment, and when escape is blocked, we may become aggressive. This punishment-aggression link may also explain why abusive parents often come from abusive families themselves.

Third, punishment inhibits the ability to learn new and better responses. Punishment leads to a variety of responses — such as escape, aggression, and learned helplessness — none of which aid in the subject’s learning process. Punishment also fails to show subjects what exactly they must do and instead focuses on what not to do. This is why environments that forgive failure are so important in the learning process.

Finally, punishment is often applied unequally. We are ruled by bias in our assessment of who deserves to be punished. We scold boys more often than girls, physically punish grade-schoolers more often than adults, and control members of racial minorities more often (and more harshly) than whites.

There are three alternatives that you can try the next time you feel tempted to punish someone.

The first we already touched upon — extinction. A response will usually diminish or disappear if it ceases to produce the rewards it once did. However, it is important that all possible reinforcements are withheld. This is far more difficult to do in real life than in a lab setting.

What makes it especially difficult is that during the extinction process, organisms tend to look for novel techniques to obtain reinforcement. This means that a whining child will either redouble her efforts or change tactics to regain the parent’s attention before ceasing the behavior. In this case, a better extinction strategy is to combine methods by withholding attention after whining occurs and rewarding more desirable behaviors with attention before the whining occurs.

The second alternative is positively reinforcing preferred activities. For example, people who exercise regularly (and enjoy it) might use a daily run as a reward for getting other tasks done. Similarly, young children learn to sit still by being rewarded with occasional permission to run around and make noise. The underlying principle, that a preferred activity such as running around can be used to reinforce a less preferred one, is known as the Premack principle.

Finally, prompting and shaping are two actions we can use together to change behavior in an iterative manner. A prompt is a cue or stimulus that encourages the desired behavior. When shaping begins, any approximation of the target response is reinforced. Once you see the approximation occurring regularly, you can make the criterion for the target more strict (the actual behavior has to match the desired behavior more closely), and you continue narrowing the criteria until the specific target behavior is performed. This tactic is often the preferred method of developing a habit gradually and of training animals to perform a specific behavior.
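The narrowing-criteria loop of shaping can be sketched in code. Everything below, including the learner model, the tolerance steps, and the streak criterion for “occurring regularly,” is a hypothetical illustration rather than a standard training algorithm.

```python
import random

def shape(learner, target, tolerances, required_streak=3, max_trials=10_000):
    """Sketch of shaping: reinforce any approximation of the target that
    falls within the current tolerance, and tighten the tolerance once
    the learner has met it several times in a row."""
    for tol in tolerances:
        streak, trials = 0, 0
        while streak < required_streak and trials < max_trials:
            behavior = learner.act()
            if abs(behavior - target) <= tol:   # close enough -> reinforce
                learner.reinforce(behavior)
                streak += 1
            else:
                streak = 0
            trials += 1
    return learner

class NoisyLearner:
    """Toy learner: acts near its current habit, and shifts the habit
    toward whatever behavior gets reinforced."""
    def __init__(self, habit=0.0, noise=3.0):
        self.habit, self.noise = habit, noise
    def act(self):
        return self.habit + random.gauss(0, self.noise)
    def reinforce(self, behavior):
        self.habit += 0.5 * (behavior - self.habit)

random.seed(0)
trained = shape(NoisyLearner(), target=10.0, tolerances=[8.0, 4.0, 2.0, 1.0])
print(round(trained.habit, 1))  # habit has been pulled toward the target
```

Each pass through the loop reinforces a closer approximation of the target, so the habit drifts toward behavior that would never have occurred spontaneously at the start.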

***

I hope that you are now better equipped to recognize incentives as powerful forces shaping the way we and others behave. The next time you wish someone would change the way they behave, think about changing their incentives.

Like any parent, I experiment with my kids all the time. One of the most effective things I do when one of them has misbehaved is to acknowledge my child’s feelings and ask him what he was trying to achieve.

When one kid hits the other, for example, I ask him what he was trying to accomplish. Usually, the response is “He hit me. (So I hit him back.)” I know this touches on an automatic human response that many adults can’t control, which makes me wonder how I can change my kids’ behavior to be more effective.

“So, you were angry and you wanted him to know?”

“Yes.”

“People are not for hitting. If you want, I’ll help you go tell him why you’re angry.”

Tensions dissipate. And I’m (hopefully) starting to get my kids thinking about effective and ineffective ways to achieve their goals.

Punishment works best to prevent actions whereas incentives work best to encourage them.

Let’s end with an excellent piece of advice regarding incentives. Here is Charlie Munger, speaking at the University of Southern California commencement:

You do not want to be in a perverse incentive system that’s causing you to behave more and more foolishly or worse and worse — incentives are too powerful a control over human cognition or human behavior. If you’re in one [of these systems], I don’t have a solution for you. You’ll have to figure it out for yourself, but it’s a significant problem.

Footnotes

Complex adaptive systems are hard to understand. Messy and complicated, they cannot be broken down into smaller bits. It would be easier to ignore them, or simply leave them as mysteries. But given that we are living in one such system, it might be more useful to buckle down and sort it out. That way, we can make choices that are aligned with how the world actually operates.

In his book Diversity and Complexity, Scott E. Page explains, “Complexity can be loosely thought of as interesting structures and patterns that are not easily described or predicted. Systems that produce complexity consist of diverse rule-following entities whose behaviors are interdependent. Those entities interact over a contact structure or network. In addition, the entities often adapt.”

Understanding complexity is important, because sometimes things are not further reducible. While the premise of Occam’s Razor is that things should be made as simple as possible but not simpler, sometimes there are things that cannot be reduced. There is, in fact, an irreducible minimum. Certain things can be properly contemplated only in all their complicated, interconnected glory.

Take, for example, cities.

Those of us who live in cities know what makes a particular neighborhood great. We can get what we need and have the interactions we want, and that’s ultimately because we feel safe there.

But how is this achieved? What magic combination of people and locations, uses and destinations, makes a vibrant, safe neighborhood? Is there a formula for, say, the ratio of houses to businesses, or of children to workers?

No. Cities are complex adaptive systems. They cannot be created for success from the top down by the imposition of simple rules.

In her seminal book The Death and Life of Great American Cities, Jane Jacobs approached the city as a complex adaptive system, turned city planning on its head, and likely saved many North American cities by taking them apart and showing that they cannot be reduced to a series of simple behavioral interactions.

Cities fall exactly into the definition of complexity given above by Page. They are full of rule-following humans, cars, and wildlife, whose behaviors are interdependent with those of the other entities and respond to feedback.

These components of a city interact over multiple interfaces in a city network and will adapt easily, changing their behavior based on food availability, road closures, or perceived safety. But the city itself cannot be understood by looking at just one of these behaviors.

Jacobs starts with “the kind of problem which cities pose — a problem in handling organized complexity” — and a series of observations about that common, almost innocuous, part of all cities: the sidewalk.

What makes a particular neighborhood safe?

Jacobs argues that there is no one factor but rather a series of them. In order to understand how a city street can be safe, you must examine the full scope of interactions that occur on its sidewalk. “The trust of a city street is formed over time from many, many little public sidewalk contacts.” Nodding to people you know, noticing people you don’t. Recognizing which parent goes with which kid, or whose business seems to be thriving. People create safety.

Given that most of them are strangers to each other, how do they do this? How come these strangers are not all perceived as threats?

Safe streets are streets that are used by many different types of people throughout the 24-hour day. Children, workers, caregivers, tourists, diners — the more people who use the sidewalk, the more eyes that participate in the safety of the street.

Safety on city streets is “kept primarily by an intricate, almost unconscious, network of voluntary controls and standards among the people themselves, and enforced by the people themselves.” Essentially, we all contribute to safety because we all want safety. It increases our chances of survival.

Jacobs brings an amazing eye for observational detail in describing neighborhoods that work and those that don’t. In describing sidewalks, she explains that successful, safe neighborhoods are orderly. “But there is nothing simple about that order itself, or the bewildering number of components that go into it. Most of those components are specialized in one way or another. They unite in their joint effect upon the sidewalk, which is not specialized in the least. That is its strength.” For example, restaurant patrons, shopkeepers, loitering teenagers, etc. — some of whom belong to the area and some of whom are transient — all use the sidewalk and in doing so contribute to the interconnected and interdependent relationships that produce the perception of safety on that street. And real safety will follow perceived safety.

To get people participating in this unorganized street safety, you have to have streets that are desirable. “You can’t make people use streets they have no reason to use. You can’t make people watch streets they do not want to watch.” But Jacobs points out time and again that there is no predictable prescription for how to achieve this mixed use where people are unconsciously invested in the maintenance of safety.

This is where considering the city as a complex adaptive system is most useful.

Each individual component has a part to play, so a top-down imposition of theory that doesn’t allow for the unpredictable behavior of each individual is doomed to fail. “Orthodox planning is much imbued with puritanical and Utopian conceptions of how people should spend their free time, and in planning, these moralisms on people’s private lives are deeply confused with concepts about the workings of cities.” A large, diverse group of people is not going to conform to only one way of living. And it’s the diversity that offers the protection.

For example, a city planner might decide to not have bars in residential neighborhoods. The noise might keep people up, or there will be a negative moral impact on the children who are exposed to the behavior of loud, obnoxious drunks. But as Jacobs reveals, safe city areas can’t be built on the basis of this type of simplistic assumption.

By stretching the use of a street through as many hours of the day as possible, you might create a safer neighborhood. I say “might” because in this complex system, other factors might connect to manifest a different reality.

As Scott Page explains, “Creating a complex system from scratch takes skill (or evolution). Therefore, when we see diverse complex systems in the real world, we should not assume that they’ve been assembled from whole cloth. Far more likely, they’ve been constructed bit by bit.”

Urban planning that doesn’t respect the spectrum of diverse behavior and instead aims to insist on an ideal based on a few simple concepts (fresh air, more public space, large private space) will hinder the natural ability of a city system to adapt in a way that suits the residents. And it is this ability to adapt that is the cornerstone requirement of this type of complex system. Inhibit the adaptive property and you all but ensure the collapse of the system.

As Jacobs articulates:

Under the seeming disorder of the old city, wherever the old city is working successfully, is a marvelous order for maintaining the safety of the streets and the freedom of the city. It is a complex order. Its essence is intricacy of sidewalk use, bringing with it a constant succession of eyes. This order is all composed of movement and change, and although it is life, not art, we may fancifully call it the art form of the city and liken it to the dance — … to an intricate ballet in which the individual dancers and ensembles all have distinctive parts which miraculously reinforce each other and compose an orderly whole. The ballet of the good city sidewalk never repeats itself from place to place, and in any one place is always replete with new improvisations.

This is the essence of complexity. As Scott Page argues, “Adaptation occurs at the level of individuals or of types. The system itself doesn’t adapt. The parts do; they alter their behaviors leading to system level adaptation.”

Jacobs maintains that “the sight of people attracts still other people.” We feel more secure when we know there are multiple eyes on us, eyes that are concerned only with the immediate function that might affect them and are not therefore invasive.

Our complex behavior as individuals in cities, interacting with various components in any given day, is multiplied by everyone, so a city that produces a safe environment seems to be almost miraculous. But ultimately our behavior is governed by certain rules — not rules that are imposed by theory or external forces, but rules that we all feel are critical to our well-being and success in our city.

Thus, the workings of a desirable city are produced by a multitude of small interactions that have evolved and adapted as they have promoted the existence of the things that most support the desires of individuals.

“The look of things and the way they work are inextricably bound together, and in no place more so than cities,” claims Jacobs. Use is not independent of form. That is why we must understand the system as a whole. No matter how many components and unpredictable potential interactions there are, they are all part of what makes the city function.

As Jacobs concludes, “There is no use wishing it were a simpler problem, because in real life it is not a simpler problem. No matter what you try to do to it, a city park behaves like a problem in organized complexity, and that is what it is. The same is true of all other parts or features of cities. Although the inter-relations of their many factors are complex, there is nothing accidental or irrational about the ways in which these factors affect each other.”