Power Laws: How Nonlinear Relationships Amplify Results

“The greatest shortcoming of the human race is our inability to understand the exponential function.”
— Albert Allen Bartlett

Defining A Power Law

Consider a person who begins weightlifting for the first time.

During their initial sessions, they can lift only a small amount of weight. But as they invest more time, they find that with each training session, their strength increases by a surprising amount.

For a while, they make huge improvements. Eventually, however, their progress slows down. At first, they could increase their strength by as much as 10% per session; now it takes months to improve by even 1%. Perhaps they resort to taking performance-enhancing drugs or training more often. Their motivation is sapped, and they find themselves getting injured, without any real change in the amount of weight they can lift.

Now, let’s imagine that our frustrated weightlifter decides to take up running instead. Something similar happens. While the first few runs are incredibly difficult, the person’s endurance increases rapidly with the passing of each week, until it levels off and diminishing returns set in again.

Both of these situations are examples of power laws: relationships between two quantities in which a relative change in one produces a proportional relative change in the other, regardless of the initial quantities. In both of our examples, a small investment of time at the beginning of the endeavor leads to a large increase in performance.

Power laws are interesting because they reveal surprising correlations between disparate factors. As a mental model, power laws are versatile, with numerous applications in different fields of knowledge.

If parts of this post look intimidating to non-mathematicians, bear with us. Understanding the math behind power laws is worth the effort if you want to grasp their many applications. Invest a little time in reading this and reap the value, which is in itself an example of a power law!

A power law is often represented by an equation with an exponent:

Y=MX^B

Each letter represents a number. Y is the result (a function of X); X is the variable (the thing you can change); B is the order of scaling (the exponent), and M is a constant (unchanging).

If M is equal to 1, the equation is then Y=X^B. If B=2, the equation becomes Y=X^2 (Y=X squared). If X is 1, Y is also 1. But if X=2, then Y=4; if X=3, then Y=9, and so on. A small change in the value of X leads to a proportionally large change in the value of Y.
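To make the arithmetic concrete, here is a minimal Python sketch of Y=MX^B. The function name power_law and its argument names are our own labels, chosen for illustration. It also checks a property worth remembering: with B=2, doubling X multiplies Y by 4 whether X starts at 1 or at 100.

```python
def power_law(x, m=1.0, b=2.0):
    """Return Y = M * X**B."""
    return m * x ** b

# With M=1 and B=2, Y is simply X squared.
for x in (1, 2, 3):
    print(x, power_law(x))  # 1 1.0, then 2 4.0, then 3 9.0

# Doubling X multiplies Y by 2**B no matter where X starts.
print(power_law(2) / power_law(1))      # 4.0
print(power_law(200) / power_law(100))  # 4.0
```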

B=1 is known as the linear scaling law.

To double a cake recipe, you need twice as much flour. To drive twice as far will take twice as long. (Unless you have kids, in which case you need to factor in bathroom breaks that seemingly have little to do with distance.) Linear relationships, in which twice-as-big requires twice-as-much, are simple and intuitive.

Nonlinear relationships are more complicated. In these cases, you don’t need twice as much of the original value to get twice the increase in some measurable characteristic. For example, an animal that’s twice our size requires only about 75% more food than we do. This means that on a per-unit-of-size basis, larger animals are more energy-efficient than smaller ones. As animals get bigger, the energy required to support each unit decreases.

One of the characteristics of a complex system is that the behavior of the system differs from the simple addition of its parts. This characteristic is called emergent behavior. “In many instances,” writes Geoffrey West in Scale: The Universal Laws of Growth, Innovation, Sustainability, and the Pace of Life in Organisms, Cities, Economies, and Companies, “the whole seems to take on a life of its own, almost dissociated from the specific characteristics of its individual building blocks.”

This collective outcome, in which a system manifests significantly different characteristics from those resulting from simply adding up all of the contributions of its individual constituent parts, is called an emergent behavior.

When we set out to understand a complex system, our intuition tells us to break it down into its component pieces. But that’s linear thinking, and it explains why so much of our thinking about complexity falls short. Small changes in a complex system can cause sudden and large changes. Small changes cause cascades among the connected parts, like knocking over the first domino in a long row.

Let’s return to the example of our hypothetical weightlifter-turned-runner. As they put in more time on the road, constraints will naturally arise on their progress.

Recall our power law equation: Y=MX^B. Try applying it to the runner. (We’re going to simplify running, but stick with it.)

Y is the distance the runner can run before becoming exhausted. That’s what we’re trying to calculate. M, the constant, represents their running ability: some combination of their natural endowment and their training history. (Think of it this way: Olympic champion Usain Bolt has a high M; film director Woody Allen has a low M.)

That leaves us with the final term: X^B. The variable X represents the thing we have control over: in this case, our training mileage. If B, the exponent, is between 0 and 1, then the relationship between X and Y (between training mileage and endurance) becomes progressively less proportional. All it takes is plugging in a few numbers to see the effect.

Let’s set M to 1 for the sake of simplicity. If B=0.5 and X=4, then Y=2. Four miles on the road gives the athlete the ability to run two miles at a clip.

Increase X to 16, and Y increases only to 4. The runner has to put in four times the road mileage to merely double their running endurance.

Here’s the kicker: With both running and weightlifting, as we increase X, we’re likely to see the exponent, B, decline! Quadrupling our training mileage from 16 to 64 miles is unlikely to double our endurance again. It might take a 10x increase in mileage to do that. Eventually, the extra mileage needed for each further gain in endurance becomes impractically large.
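One way to see the effect of a shrinking exponent is to ask how much X must grow in order to double Y. For Y=MX^B, that multiple is 2^(1/B), which follows from solving (kX)^B = 2X^B for k. The sketch below uses a helper we have named input_multiple_to_double; the exponent 0.3 is an arbitrary illustrative value, not a measured one.

```python
def input_multiple_to_double(b):
    """Factor by which X must grow to double Y when Y = M * X**B (B > 0)."""
    return 2 ** (1 / b)

print(input_multiple_to_double(1.0))            # 2.0: linear, twice the input
print(input_multiple_to_double(0.5))            # 4.0: the 4-to-16-mile example
print(round(input_multiple_to_double(0.3), 1))  # 10.1: roughly a 10x increase
```

Notice that M never appears in the helper: the runner's natural ability shifts the whole curve up or down but does not change how steeply returns diminish.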

We know this state, of course, as diminishing returns: the point where more input yields progressively less output. Not only is the relationship between training mileage and endurance not linear to begin with, but it also gets less linear as we increase our training.

And what about negative exponents?

It gets even more interesting. If B=−0.5 and X=4, then Y=0.5. Four miles on the road gets us a half-mile of endurance. If X is increased to 16, Y declines to 0.25. More training, less endurance! This is akin to someone putting in way too much mileage, way too soon: the training is less than useful as injuries pile up.

With negative exponents, the more X increases, the more Y shrinks. This relationship is known as an inverse power law. B=−2, for example, is known as the inverse square law and is an important equation in physics.

The relationship between gravity and distance follows an inverse power law. In Newton’s law of gravitation, F = G(m1)(m2)/r^2, the force F between two particles of mass m1 and m2 falls off with the square of their separation r. G is the gravitational constant, equal to:

6.67 × 10⁻¹¹ N m² kg⁻²

Any force radiating from a single point — including heat, light intensity, and magnetic and electrical forces — follows the inverse square law. At 1m away from a fire, 4 times as much heat is felt as at 2m, and so on.
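The fire example can be checked with a few lines of Python. This is only a sketch of the inverse square relationship for an idealized point source; the function name relative_intensity and its parameters are our own.

```python
def relative_intensity(near_m, far_m):
    """How many times stronger a point source feels at near_m than at far_m."""
    return (far_m / near_m) ** 2

print(relative_intensity(1, 2))  # 4.0
print(relative_intensity(1, 3))  # 9.0
```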

Higher-Order Power Laws

When B is a positive integer (a whole number larger than zero), there are names for the power laws.

When B is equal to 1, we have a linear relationship, as we discussed above. This is also known as a first-order power law.

Things really get interesting after that.

When B is 2, we have a second-order power law. A great example of this is kinetic energy. Kinetic energy = 1/2 mv^2

When B is 3, we have a third-order power law. An example of this is the power converted from the wind into rotational energy.

Power Available = ½ (Air Density)(πr^2)(Windspeed^3)(Power Coefficient)

(There is a natural limit here. Albert Betz concluded in 1919 that wind turbines cannot convert more than 59.3% of the kinetic energy of the wind into mechanical energy. This number is called the Betz Limit and is the maximum value the power coefficient above can take.)[1]
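To see the cubic relationship at work, here is a rough Python sketch of the wind power formula. The air density of 1.225 kg/m^3 (a common sea-level figure), the 40 m rotor radius, and the power coefficient of 0.4 are all illustrative assumptions of ours, not values from a real turbine.

```python
import math

BETZ_LIMIT = 16 / 27  # about 0.593, the theoretical ceiling on the power coefficient

def wind_power_watts(wind_speed, rotor_radius, air_density=1.225, power_coefficient=0.4):
    """Power = 1/2 * air density * swept area * wind speed**3 * power coefficient."""
    if power_coefficient > BETZ_LIMIT:
        raise ValueError("power coefficient cannot exceed the Betz limit")
    swept_area = math.pi * rotor_radius ** 2
    return 0.5 * air_density * swept_area * wind_speed ** 3 * power_coefficient

slow = wind_power_watts(wind_speed=5, rotor_radius=40)   # speeds in m/s, radius in m
fast = wind_power_watts(wind_speed=10, rotor_radius=40)
print(fast / slow)  # very close to 8: doubling the wind speed gives 2**3 times the power
```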

The law of heat radiation is a fourth-order power law. Derived first by the Austrian physicist Josef Stefan in 1879 and separately by Austrian physicist Ludwig Boltzmann, the law works like this: the radiant heat energy emitted from a unit area in one second is equal to the constant of proportionality (the Stefan-Boltzmann constant) times the absolute temperature to the fourth power.[2]
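A short sketch shows how steep a fourth-order law is. It assumes an ideal black body, a simplification the description above does not spell out, and uses the standard value of the Stefan-Boltzmann constant. The function name radiant_emittance is our own.

```python
STEFAN_BOLTZMANN = 5.670374419e-8  # W m^-2 K^-4

def radiant_emittance(temperature_kelvin):
    """Energy radiated per unit area per second by an ideal black body."""
    return STEFAN_BOLTZMANN * temperature_kelvin ** 4

# Doubling the absolute temperature multiplies the radiated energy by 2**4.
print(radiant_emittance(600) / radiant_emittance(300))  # about 16
```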

There is only one power law with a variable exponent, and it’s considered to be one of the most powerful forces in the universe. It’s also the most misunderstood. We call it compounding. The formula looks like this:

Future Value = (Present Value)(1+i)^n

where i is the interest rate, and n is the number of years.

Unlike in the other equations, the relationship between X and Y is potentially limitless. As long as B is positive, Y will increase as X does.

Non-integer power laws (where B is a fraction, as with our running example above) are also of great use to physicists. Formulas in which B=0.5 are common.

Imagine a car driving at a certain speed. A non-integer power law applies. V is the speed of the car, P is the petrol burnt per second to reach that speed, and A is the air resistance. In this simplified picture, V is proportional to P^0.5: for the car to go twice as fast, it must use 4 times as much petrol, and to go 3 times as fast, it must use 9 times as much petrol. Air resistance increases as speed increases, and that is why faster cars use such ridiculous amounts of petrol. It might seem logical to think that a car going from 40 miles an hour to 50 miles an hour would use a quarter more fuel. That is incorrect, though, because the relationship between air resistance and speed is itself a power law.
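Plugging the 40-to-50 example into the simplified rule that petrol use scales with the square of speed gives a concrete figure. The helper name fuel_multiple is ours, and the exponent of 2 is simply the one implied by the doubling and tripling examples, not a measured property of any real car.

```python
def fuel_multiple(old_speed, new_speed, exponent=2):
    """Factor by which fuel burn rises if fuel scales as speed**exponent."""
    return (new_speed / old_speed) ** exponent

print(fuel_multiple(40, 80))  # 4.0
print(fuel_multiple(40, 50))  # 1.5625, about 56% more fuel rather than 25% more
```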

Another instance of a power law is the area of a square. Double the length of each side and the area quadruples. Do the same for a 3D cube, and the volume increases by a factor of eight. It doesn’t matter if the length of the square went from 1cm to 2cm, or from 100m to 200m; the area still quadruples. We are all familiar with second-order (or square) power laws. This name comes from squares since the relationship between length and area reflects the way second-order power laws change a number. Third-order (or cubic) power laws are likewise named due to their relationship to cubes.

Using Power Laws in Our Lives

Now that we’ve gotten through the complicated part, let’s take a look at how power laws crop up in many fields of knowledge. Most careers involve an understanding of them, even if it might not be so obvious.

“What’s the most powerful force in the universe? Compound interest. It builds on itself. Over time, a small amount of money becomes a large amount of money. Persistence is similar. A little bit improves performance, which encourages greater persistence, which improves persistence even more. And on and on it goes.”
— Daniel H. Pink, The Adventures of Johnny Bunko

The Power Behind Compounding

Compounding is one of our most important mental models and is absolutely vital to understand for investing, personal development, learning, and other crucial areas of life.

In economics, we calculate compound interest with the equation P’ = P(1 + r/n)^(nt), where P is the original sum of money, P’ is the resulting sum of money, r is the annual interest rate, n is the compounding frequency (the number of compounding periods per year), and t is the length of time in years. Using this equation, we can illustrate the power of compounding.

If a person deposits $1,000 in a bank for five years, at a quarterly interest rate of 4%, the equation becomes this:

Future Value = Present Value * ((1 + Quarterly Interest Rate) ^ Number of Quarters)

This formula can be used to calculate how much money will be in the account after five years, or 20 quarters. The answer is about $2,191.12.
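The calculation is easy to reproduce. The sketch below applies the quarterly formula to the $1,000 deposit and, for contrast, works out what the same rate would yield with no interest-on-interest. The function and variable names are our own.

```python
def future_value(present_value, rate_per_period, periods):
    """Future Value = Present Value * (1 + rate per period) ** periods."""
    return present_value * (1 + rate_per_period) ** periods

compounded = future_value(1000, 0.04, 20)  # 4% per quarter for 5 years = 20 quarters
simple = 1000 * (1 + 0.04 * 20)            # same rate with no interest-on-interest

print(round(compounded, 2))  # 2191.12
print(round(simple, 2))      # 1800.0
```

The gap between the two figures is the interest earned on earlier interest, and it widens with every additional quarter.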

Compound interest is a power law because the relationship between the amount of time a sum of money is left in an account and the amount accumulated at the end is non-linear.

In A Random Walk Down Wall Street, Burton Malkiel gives the example of two brothers, William and James. Beginning at age 20 and stopping at age 40, William invests $4,000 per year. Meanwhile, James invests the same amount per year between the ages of 40 and 65. By the time William is 65, he has invested less money than his brother but has allowed it to compound for 25 years. As a result, when both brothers retire, William has 600% more money than James — a gap of $2 million. One of the smartest financial choices we can make is to start saving as early as possible: by harnessing power laws, we increase the exponent as much as possible.

Compound interest can help us achieve financial freedom and wealth, without the need for a large annual income. Members of the financial independence movement (such as the blogger Mr. Money Mustache) are living examples of how we can apply power laws to our lives.

As far back as the 1800s, Robert G. Ingersoll emphasized the importance of compound interest:

One dollar at compound interest, at twenty-four per cent., for one hundred years, would produce a sum equal to our national debt. Interest eats night and day, and the more it eats the hungrier it grows. The farmer in debt, lying awake at night, can, if he listens, hear it gnaw. If he owes nothing, he can hear his corn grow. Get out of debt as soon as possible. You have supported idle avarice and lazy economy long enough.

Compounding can apply to areas beyond finance — personal development, health, learning, relationships, and more. For each area, a small input can lead to large output, and the results build upon themselves.

Nonlinear Language Learning

When we learn a new language, it’s always a good idea to start by learning the 100 or so most used words.

In all known languages, a small percentage of words make up the majority of usage. This is known as Zipf’s law, after George Kingsley Zipf, who first identified the phenomenon. The most used word in a language may make up as much as 7% of all words used, while the second-most-used word is used half as much, and so on. As few as 135 words can together form half of a language (as used by native speakers).

Why Zipf’s law holds true is unknown, although the concept is logical. Many languages include a large number of specialist terms that are rarely needed (including legal or anatomy terms). A small change in the frequency ranking of a word means a huge change in its usefulness.

Understanding Zipf’s law is a central component of accelerated language learning. Each new word we learn from the most common 100 words will have a huge impact on our ability to communicate. As we learn less-common words, diminishing returns set in. If each word in a language were listed in order of frequency of usage, the further we moved down the list, the less useful a word would be.
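To get a feel for how quickly the usefulness of each extra word falls away, here is a sketch of an idealized Zipf curve in which a word’s frequency is proportional to 1 divided by its rank. The 10,000-word vocabulary is a hypothetical figure of ours, and real languages deviate from this clean curve, so the shares it prints should not be expected to match the percentages quoted above.

```python
def zipf_shares(vocabulary_size):
    """Share of total usage for each rank, assuming frequency is proportional to 1/rank."""
    weights = [1 / rank for rank in range(1, vocabulary_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]

shares = zipf_shares(10_000)  # hypothetical 10,000-word vocabulary

print(round(shares[0] / shares[1], 2))  # 2.0: the top word is used twice as often as the second
print(round(sum(shares[:100]), 2))      # cumulative share of the 100 most common words
print(round(sum(shares[100:200]), 2))   # share added by the next 100 words
```

Comparing the last two lines shows the diminishing return: the second hundred words add far less coverage than the first hundred.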

Power Laws in Business, Explained by Peter Thiel

Peter Thiel, a co-founder of PayPal (as well as an early investor in Facebook and Palantir), considers power laws to be a crucial concept for all businesspeople to understand. In his fantastic book, Zero to One, Thiel writes:

Indeed, the single most powerful pattern I have noticed is that successful people find value in unexpected places, and they do this by thinking about business from first principles instead of formulas.

And:

In 1906, economist Vilfredo Pareto discovered what became the “Pareto Principle,” or the 80-20 rule, when he noticed that 20% of the people owned 80% of the land in Italy—a phenomenon that he found just as natural as the fact that 20% of the peapods in his garden produced 80% of the peas. This extraordinarily stark pattern, when a small few radically outstrip all rivals, surrounds us everywhere in the natural and social world. The most destructive earthquakes are many times more powerful than all smaller earthquakes combined. The biggest cities dwarf all mere towns put together. And monopoly businesses capture more value than millions of undifferentiated competitors. Whatever Einstein did or didn’t say, the power law—so named because exponential equations describe severely unequal distributions—is the law of the universe. It defines our surroundings so completely that we usually don’t even see it.

… [I]n venture capital, where investors try to profit from exponential growth in early-stage companies, a few companies attain exponentially greater value than all others. … [W]e don’t live in a normal world; we live under a power law.

… The biggest secret in venture capital is that the best investment in a successful fund equals or outperforms the entire rest of the fund combined.

This implies two very strange rules for VCs. First, only invest in companies that have the potential to return the value of the entire fund. … This leads to rule number two: because rule number one is so restrictive, there can’t be any other rules.

…[L]ife is not a portfolio: not for a startup founder, and not for any individual. An entrepreneur cannot “diversify” herself; you cannot run dozens of companies at the same time and then hope that one of them works out well. Less obvious but just as important, an individual cannot diversify his own life by keeping dozens of equally possible careers in ready reserve.

Thiel taught a class called Startup at Stanford, where he hammered home the value of understanding power laws. In his class, he imparted copious wisdom. From Blake Masters’ notes on Class 7:

Consider a prototypical successful venture fund. A number of investments go to zero over a period of time. Those tend to happen earlier rather than later. The investments that succeed do so on some sort of exponential curve. Sum it over the life of a portfolio and you get a J curve. Early investments fail. You have to pay management fees. But then the exponential growth takes place, at least in theory. Since you start out underwater, the big question is when you make it above the water line. A lot of funds never get there.

To answer that big question you have to ask another: what does the distribution of returns in [a] venture fund look like? The naïve response is just to rank companies from best to worst according to their return in multiple of dollars invested. People tend to group investments into three buckets. The bad companies go to zero. The mediocre ones do maybe 1x, so you don’t lose much or gain much. And then the great companies do maybe 3-10x.

But that model misses the key insight that actual returns are incredibly skewed. The more a VC understands this skew pattern, the better the VC. Bad VCs tend to think the dashed line is flat, i.e. that all companies are created equal, and some just fail, spin wheels, or grow. In reality you get a power law distribution.

Thiel explains how investors can apply the mental model of power laws (more from Masters’ notes on Class 7):

…Given a big power law distribution, you want to be fairly concentrated. … There just aren’t that many businesses that you can have the requisite high degree of conviction about. A better model is to invest in maybe 7 or 8 promising companies from which you think you can get a 10x return. …

Despite being rooted in middle school math, exponential thinking is hard. We live in a world where we normally don’t experience anything exponentially. Our general life experience is pretty linear. We vastly underestimate exponential things.

He also cautions against over-relying on power laws as a strategy (an assertion that should be kept in mind for all mental models). From Masters’ notes:

One shouldn’t be mechanical about this heuristic, or treat it as some immutable investment strategy. But it actually checks out pretty well, so at the very least it compels you to think about power law distribution.

Understanding exponents and power law distributions isn’t just about understanding VC. There are important personal applications too. Many things, such as key life decisions or starting businesses, also result in similar distributions.

Thiel then explains why founders should focus on one key revenue stream, rather than trying to build multiple equal ones:

Even within an individual business, there is probably a sort of power law as to what’s going to drive it. It’s troubling if a startup insists that it’s going to make money in many different ways. The power law distribution on revenues says that one source of revenue will dominate everything else.

For example, if you’re an entrepreneur who opens a coffee shop, you’ll have a lot of ways you can make money. You can sell coffee, cakes, paintings, merchandise, and more. But each of those things will not contribute to your success in an equal way. While there is value in the discovery process, once you’ve found the variable that matters most, you should place more time on that one and less on the others. The importance of finding this variable cannot be overstated.

He also acknowledges that power laws are one of the great secrets of investing success. From Masters’ notes on Class 11:

On one level, the anti-competition, power law, and distribution secrets are all secrets about nature. But they’re also secrets hidden by people. That is crucial to remember. Suppose you’re doing an experiment in a lab. You’re trying to figure out a natural secret. But every night another person comes into the lab and messes with your results. You won’t understand what’s going on if you confine your thinking to the nature side of things. It’s not enough to find an interesting experiment and try to do it. You have to understand the human piece too.

… We know that, per the power law secret, companies are not evenly distributed. The distribution tends to be bimodal; there are some great ones, and then there are a lot of ones that don’t really work at all. But understanding this isn’t enough. There is a big difference between understanding the power law secret in theory and being able to apply it in practice.

The key to all mental models is knowing the facts and being able to use the concept. As George Box said, “all models are wrong, but some are useful.” Once we grasp the basics, the best next step is to start figuring out how to apply them.

The image of an unseen person sabotaging laboratory results is an excellent metaphor for how cognitive biases and shortcuts cloud our judgment.

Natural Power Laws

Anyone who has kept a lot of pets will have noticed the link between an animal’s size and its lifespan. Small animals, like mice and hamsters, tend to live for a year or two. Larger ones, like dogs and cats, can live to 10-20 years, or even older in rare cases. Scaling up even further, some whales can live for 200 years. This comes down to power laws.

Biologists have found clear links between an animal’s size and its metabolism. Kleiber’s law (identified by Max Kleiber) states that an animal’s metabolic rate scales with the three-fourths power of the animal’s weight (mass). If an average rabbit (2 kg) weighs one hundred times as much as an average mouse (20 g), the rabbit’s metabolic rate will be about 32 times the mouse’s, not 100 times. In other words, the rabbit’s structure is more efficient. It all comes down to the geometry behind their mass.
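The rabbit-and-mouse figure comes straight from raising the mass ratio to the power of 0.75, as this small sketch shows. The function name metabolic_rate_ratio is our own, and the calculation ignores everything about real animals except their mass.

```python
def metabolic_rate_ratio(mass_ratio, exponent=0.75):
    """Ratio of metabolic rates for two animals whose masses differ by mass_ratio."""
    return mass_ratio ** exponent

rabbit_vs_mouse = metabolic_rate_ratio(2000 / 20)  # 2 kg rabbit vs. 20 g mouse

print(round(rabbit_vs_mouse, 1))        # 31.6: about 32 times the mouse's rate
print(round(rabbit_vs_mouse / 100, 2))  # 0.32: per gram, the rabbit burns about a third as much
```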

This leads us to another biological power law: Smaller animals require more energy per gram of body weight, meaning that mice eat around half their body weight in dense foods each day. The reason is that, in terms of the percentage of mass, larger animals have more structure (bones, etc.) and fewer reserves (fat stores).

Research has illustrated how power laws apply to blood circulation in animals. The end units through which oxygen, water, and nutrients enter cells from the bloodstream are the same size in all animals. Only the number per animal varies. The relationship between the total area of these units and the size of the animal is a third-order power law. The distance blood travels to enter cells and the actual volume of blood are also subject to power laws.

The Law of Diminishing Returns

As we have seen, a small change in one area can lead to a huge change in another. However, past a certain point, diminishing returns set in and more is worse. Working an hour extra per day might mean more gets done, whereas working three extra hours is likely to lead to less getting done due to exhaustion. Going from a sedentary lifestyle to running two days a week may result in greatly improved health, but stepping up to seven days a week will cause injuries. Overzealousness can turn a positive exponent into a negative exponent. For a busy restaurant, hiring an extra chef will mean that more people can be served, but hiring two new chefs might spoil the proverbial broth.

Perhaps the most underappreciated diminishing return, the one we never want to end up on the wrong side of, is the one between money and happiness.

In David and Goliath, Malcolm Gladwell discusses how diminishing returns relate to family incomes. Most people assume that the more money they make, the happier they and their families will be. This is true — up to a point. An income that’s too low to meet basic needs makes people miserable, leading to far more physical and mental health problems. A person who goes from earning $30,000 a year to earning $40,000 is likely to experience a dramatic boost in happiness. However, going from $100,000 to $110,000 leads to a negligible change in well-being.

Gladwell writes:

The scholars who research happiness suggest that more money stops making people happier at a family income of around seventy-five thousand dollars a year. After that, what economists call “diminishing marginal returns” sets in. If your family makes seventy-five thousand and your neighbor makes a hundred thousand, that extra twenty-five thousand a year means that your neighbor can drive a nicer car and go out to eat slightly more often. But it doesn’t make your neighbor happier than you, or better equipped to do the thousands of small and large things that make for being a good parent.