Making the Most of Second Chances

We all get lucky. Once in a while we do something really stupid that could have resulted in death, but didn’t. Just the other day, I saw someone who was texting walk out into oncoming traffic, narrowly avoiding the car whose driver slammed on the brakes. As the adrenaline starts to dissipate, we realize that we don’t ever want to be in that situation again. What can we do? We can make the most of our second chances by building margins of safety into our lives.

What is a margin of safety and where can I get one?

The concept is a cornerstone of engineering. Engineers design systems to withstand significantly more emergencies, unexpected loads, misuse, or degradation than would normally be expected.

Take a bridge. You are designing a bridge to cross just under two hundred feet of river. The bridge has two lanes going in each direction. Given the average car size, the bridge could reasonably carry 50 to 60 cars at a time. At 4,000 pounds per car, your bridge needs to be able to carry at least 240,000 pounds of weight; otherwise, don’t bother building it. So that’s the minimum consideration for safety — but only the worst engineer would stop there.

Can anyone walk across your bridge? Can anyone park their car on the shoulder? What if cars get heavier? What if 20 cement trucks are on the bridge at the same time? How does the climate affect the integrity of your materials over time? You don’t want the weight capacity of the bridge to ever come close to the actual load. Otherwise, one seagull decides to land on the railing and the whole structure collapses.

Considering these questions and looking at the possibilities is how you get the right information so you can adjust your specs to build in a margin of safety. That’s the difference between what your system is expected to withstand and what it actually could. So when you are designing a bridge, the first step is to figure out the maximum load it should ever see (bumper-to-bumper vehicles, hordes of tourist groups, and birds perched wing to wing), and then you design for at least double that load.
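The bridge arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `design_capacity` and the `extras_allowance` figure for pedestrians, parked cars, and birds are hypothetical. The only numbers taken from the text are 60 cars, 4,000 pounds per car, and the "at least double" rule.

```python
CARS_ON_BRIDGE = 60      # upper end of the 50 to 60 car estimate in the text
POUNDS_PER_CAR = 4_000   # average car weight used in the text
SAFETY_FACTOR = 2.0      # "at least double" the maximum expected load


def design_capacity(max_expected_load_lb: float,
                    safety_factor: float = SAFETY_FACTOR) -> float:
    """Return the load the structure should be built to carry."""
    if safety_factor < 1:
        raise ValueError("a safety factor below 1 leaves no margin at all")
    return max_expected_load_lb * safety_factor


# Minimum consideration: cars only.
car_load = CARS_ON_BRIDGE * POUNDS_PER_CAR

# Hypothetical allowance for pedestrians, parked cars, and birds (made-up number).
extras_allowance = 60_000

max_expected = car_load + extras_allowance
capacity = design_capacity(max_expected)
margin = capacity - max_expected

print(f"cars only:             {car_load:,} lb")
print(f"maximum expected load: {max_expected:,} lb")
print(f"design capacity:       {capacity:,.0f} lb")
print(f"margin of safety:      {margin:,.0f} lb")
```

The point of the sketch is the order of operations: first estimate the worst load the bridge should ever see, and only then apply the multiplier.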

Knowing that the infrastructure was designed to withstand significantly more than the anticipated maximum load makes us happy when we are on bridges, or in airplanes, or jumping on the bed in our second-story bedroom. We feel confident that many smart people have conspired to make these activities as safe as possible. We’re so sure of this that it almost never crosses our minds. Sure, occasional accidents happen. But it is remarkably reassuring that these structures can withstand quite a bit of the unexpected.

So how do we make ourselves a little more resilient? Less susceptible to the vagaries of change? Turns out that engineers aren’t the only ones obsessed with building in margins of safety. Spies are pretty good at it, too, and we can learn a lot from them.

Operation Kronstadt, by Harry Ferguson, chronicles the remarkable story of Paul Dukes, the only British secret agent working in Russia in 1919, and the equally amazing adventures of the small team that was sent in to rescue him.

Paul Dukes was not an experienced spy. He was actually a pianist. It was his deep love of Russian culture that led him to approach his government and volunteer for the mission of collecting information on Bolshevik activities in St. Petersburg. As Ferguson writes, “Paul had no military experience, let alone any experience of intelligence work and yet they were going to send him back into one of the toughest espionage environments in the world.”

However, MI6, the part of British Intelligence that Paul worked for, wasn’t exactly the powerful and well-prepared agency that it’s portrayed as today. Consider this description by Ferguson: “having dragged Paul out of Russia, MI6 did not appear to have given much thought to how he should get back or how he would survive once he got there: ‘As to the means whereby you gain access to the country, under what cover you will live there, and how you will send out reports, we shall leave it to you, being best informed as to the conditions’.”

So off went Paul into Russia, not as a musician but as a spy. No training, no gadgets, no emergency network, no safe houses. Just a bunch of money and sentiments of ‘good luck’. So it is all the more amazing that Paul Dukes turned out to be an excellent spy. After reading his story, I think the primary reason for this is that he learned extremely quickly from his experiences. One of the things he learned quickly was how to build margins of safety into his tradecraft.

There is no doubt that the prospect of death wakes us up. We don’t often think about how dangerous something can be until we almost die doing it. Then, thanks to our big brains that let us learn from experience, we adapt. We recognize that if we don’t, we might not be so lucky next time. And no one wants to rely on luck as a survival strategy.

This is where margins of safety come in. We build them to reduce the precariousness of chance.

Imagine you are in St. Petersburg in 1919. What you have going for you is that you speak the language, understand the culture, and know the streets. Your major problem is that you have no idea how to start this spying thing. How do you get contacts and build a network in a city that is under psychological siege? The few names you have been given come from dubious sources at the border, and the people attached to those names may have been compromised, arrested, or both. You have nowhere to sleep at night, and although you have some money, it can’t buy anything, not even food, because there is nothing for sale. The whole country is on rations.

Not to mention, if by some miracle you actually get a few good contacts who give you useful information, how do you get it home? There are no cell phones or satellites. Your passport is fake and won’t hold up to any intense scrutiny, yet all your intelligence has to be taken out by hand from a country that has sealed its borders. And it’s 1919. You can’t hop on a plane or drive a car. Train or foot are your only options.

This is what Paul Dukes faced. Daunting to be sure. Which is why his ultimate success reads like the improbable plot of a Hollywood movie. Although he made mistakes, he learned from them as they were happening.

Consider this tense moment as described by Ferguson:

The doorbell in the flat rang loudly and Paul awoke with a start.

He had slept late. Stepanova had kindly allowed him to sleep in one of the spare beds and she had even found him an old pair of Ivan’s pyjamas. There were no sheets, but there were plenty of blankets and Paul had been cosy and warm. Now it was 7.45 a.m., and here he was half-asleep and without his clothes. Suppose it was the Cheka [Russian Bolshevik Police] at the door? In a panic he realised that he had no idea what to do. The windows of the apartment were too high for him to jump from and like a fool he had chosen a hiding place with no other exits. … He was reduced to waiting nervously as he stood in Ivan’s pyjamas whilst Stepanova shuffled to the door to find out who it was. As he stood there with his stomach in knots, Paul swore that he would never again sleep in a place from which there was only one exit.

One exit was good enough for normal, anticipated use. But one exit wouldn’t allow him to adapt to the unexpected, the unusual load produced by the appearance of the state police. So from then on, his sleeping accommodations were chosen with a minimum margin of safety of two exits.

This type of thinking dictated a lot of his actions. He never stayed at the same house more than two nights in a row, and often moved after just one night. He arranged for the occupants to signal him, such as by placing a plant in the window, if they believed the house was unsafe. He siloed knowledge as much as he could, never letting the occupants of one safe house know about the others. Furthermore, as Ferguson writes:

He also arranged a back-up plan in case the Cheka finally got him. He had to pick one trustworthy agent … and soon Paul began entrusting her with all the details of his movements and told her at which safe house he would be sleeping so that if he did disappear MI6 would have a better idea of who had betrayed him. He even used her as part of his courier service and she hid all his reports in the float while he was waiting for someone who could take them out of the country.

Admittedly this plan didn’t provide a large margin of safety, but at least he wasn’t so arrogant as to assume he was never going to get captured.

Large margins of safety are not always possible. Sometimes they are too expensive. Sometimes they are not available. Dukes liked to have an extra identity handy should some of his dubious contacts turn him in, but this wasn’t always an option in a country that changed identity papers frequently. Most important, though, he was aware that planning for the unexpected was his best chance of staying alive, even if he couldn’t always put in place as large a margin of safety as he would have liked. And survival was a daily challenge, not something to take for granted.

The disaster at the Fukushima nuclear power plant taught us a lot about being cavalier regarding margins of safety. The unexpected is just that: not anticipated. That doesn’t mean it is impossible or even improbable. The unexpected is not the worst thing that has happened before. It is the worst thing, given realistic parameters such as the laws of physics, that could happen.

In the Fukushima case, the margin of safety was good enough to deal with the weather of the recent past. But preparing for the worst we have seen is not the same as preparing for the worst.

The Fukushima power plant was overwhelmed by a tsunami, creating a nuclear disaster on par with Chernobyl. Given the seismic activity in the area, although a tsunami wasn’t predictable, it was certainly possible. The plant could have been designed with a margin of safety to better withstand a tsunami. It wasn’t. Why? Because redundancy is expensive. That’s the trade-off. You are safer, but it costs more money.

Sometimes when the stakes are low, we decide the trade-off isn’t worth it. For instance, maybe we wouldn’t pay to insure a wedding ring that wasn’t expensive. You would think, however, that power plants wouldn’t cut it close. The consequences of a lost ring are some emotional pain and the cost of a new one. The consequences of a nuclear accident are orders of magnitude higher. Lives are lost, and the environment is contaminated. In the Fukushima case, the world will be dealing with the negative effects for a long time.

What decisions would you make differently if you were factoring safety margins into your life? To be fair, you can’t put them everywhere. Otherwise, your life might be all margin and no living. But you can identify the maximum load your life is currently designed to withstand and figure out how close to it you are coming.

For example, having your expenses equal 100 percent of your income leaves you no flexibility in the load you have to carry. A job loss, a bad flood in your neighborhood, or a serious illness are all unexpected events that would change the load your financial structure has to support. Without a margin of safety, such as healthy savings or an investment account, you could find your structure collapsing, compromising the roof over your head.

The idea is to identify the unlikely but possible risks to your survival and build margins of safety that will allow you to continue your lifestyle should these things come to pass. That way, a missed paycheck will be easily absorbed instead of jeopardizing your ability to put food on the table.

To figure out where else you should build margins of safety into your life, think of the times you’ve been terrified and desperate. Those might be good places to start learning from experience and making the most of your second chances.

Margin of Safety: An Introduction to the Mental Model

Previously on Farnam Street, we covered the idea of Redundancy — a central concept in both the world of engineering and in practical life. Today we’re going to explore a related concept: Margin of Safety.

The margin of safety is another concept rooted in engineering and quality control. Let’s start there, then see where else our model might apply in practical life, and lastly, where it might have limitations.

* * *

Consider a highly engineered jet engine part. If the part were to fail, the engine would also fail, perhaps at the worst possible moment: in flight with passengers on board. Like most jet engine parts, let us assume the part is replaceable over time. We don’t want to replace it too often (creating prohibitively high costs), but we don’t expect it to last the lifetime of the engine either. We design the part for 10,000 hours of average flying time.

That brings us to a central question: After how many hours of service do we replace this critical part? The easily available answer might be 9,999 hours. Why replace it any sooner than we have to? Wouldn’t that be a waste of money?

The first problem is, we know nothing of the composition of the 10,000 hours any individual part has gone through. Were they 10,000 particularly tough hours, filled with turbulent skies? Was it all relatively smooth sailing? Somewhere in the middle?

Just as importantly, how confident are we that the part will really last the full 10,000 hours? What if it had a slight flaw during manufacturing? What if we made an assumption about its reliability that was not conservative enough? What if the material degraded in bad weather to a degree we didn’t foresee?

The challenge is clear, and the implication obvious: we do not wait until the part has been in service for 9,999 hours. Perhaps at 7,000 hours, we seriously consider replacing the part, and we put a hard stop at 7,500 hours.

The difference between waiting until the last minute and replacing it comfortably early gives us a margin of safety. The sooner we replace the part, the more safety we have—by not pushing the boundaries, we leave ourselves a cushion. (Ever notice how your gas tank indicator goes on long before you’re really on empty? It’s the same idea.)
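As a rough sketch of that replacement policy, the thresholds from the example (10,000 rated hours, a review at 7,000, a hard stop at 7,500) can be written down directly. The function names and status strings below are hypothetical, chosen only for illustration.

```python
RATED_LIFE_HOURS = 10_000   # design life from the example
REVIEW_AT_HOURS = 7_000     # point at which replacement is seriously considered
HARD_STOP_HOURS = 7_500     # point at which replacement is mandatory


def replacement_status(hours_in_service: float) -> str:
    """Classify a part by how close it is to the replacement thresholds."""
    if hours_in_service >= HARD_STOP_HOURS:
        return "replace now"
    if hours_in_service >= REVIEW_AT_HOURS:
        return "schedule replacement"
    return "in service"


def margin_fraction(limit_hours: float,
                    rated_hours: float = RATED_LIFE_HOURS) -> float:
    """Share of the rated life that is deliberately left unused."""
    return (rated_hours - limit_hours) / rated_hours


for hours in (3_000, 7_200, 9_999):
    print(f"{hours:>6,} h: {replacement_status(hours)}")

print(f"unused rated life at the hard stop: {margin_fraction(HARD_STOP_HOURS):.0%}")
```

Under these numbers, a quarter of the rated life is never used. That unused quarter is the cushion against rough flying hours, manufacturing flaws, and optimistic reliability estimates.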

The principle is essential in bridge building. Let’s say we calculate that, on an average day, a proposed bridge will be required to support 5,000 tons at any one time. Do we build the structure to withstand 5,001 tons? I’m not interested in driving on that bridge. What if we get a day with much heavier traffic than usual? What if our calculations and estimates are a little off? What if the material weakens over time at a rate faster than we imagined? To account for these, we build the bridge to support 20,000 tons. Only now do we have a margin of safety.

This fundamental engineering principle is useful in many practical areas of life, even for non-engineers. Let’s look at one we all face.

* * *

Take a couple earning $100,000 per year after taxes, or about $8,300 per month. In designing their life, they must necessarily decide what standard of living to enjoy. (The part which can be quantified, anyway.) What sort of monthly expenses should they allow themselves to accumulate?

One all-too-familiar approach is to build in monthly expenses approaching $8,000. A $4,000 mortgage, $1,000 worth of car payments, $1,000/month for private schools…and so on. The couple rationalizes that they have “earned” the right to live large.

However, what if life throws some massive unexpected expenditures their way, as it often does? What if one of them lost their job and their combined monthly income dropped to $4,000?

The couple must ask themselves whether the ensuing misery is worth the lavish spending. If they kept up their $8,000/month spending habit after a loss of income, they would have to choose between two difficult paths: Rapidly eating into their savings or considerably downsizing their life. Either is likely to cause extreme misery from the loss of long-held luxuries.

Thinking in reverse, how can we avoid the potential misery?

A common refrain is to tell the couple to make sure they’ve stashed away some money in case of emergency, to provide a buffer. Often there is a specific multiple of current spending we’re told to have in reserve—perhaps 6-12 months. In this case, savings of $48,000-$96,000 should suffice.

However, is there a way we can build them a much larger margin for error?

Let’s say the couple decides instead to permanently limit their monthly spending to $4,000 by owning a smaller house, driving less expensive cars, and trusting their public schools. What happens?

Our margin of safety now compounds. Obviously, a savings rate exceeding 50% will rapidly accumulate in their favor — $4,300 put away by the first month, $8,600 by the second month, and so on. The mere act of systematically underspending their income rapidly gives them a cushion without much trying. If an unexpected expenditure comes up, they’ll almost certainly be ready.

The unseen benefit, and the extra margin of safety in this choice, comes if either spouse loses their income, whether by choice (perhaps to care for a child) or by bad luck (health issues). In this case, not only has a high savings rate accumulated in their favor, but because their spending is systematically low, they are able to avoid tapping it altogether! Their savings simply stop growing temporarily while they live on one income. This sort of “belt and suspenders” solution is the essence of margin-of-safety thinking.
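To make the comparison concrete, here is a minimal sketch of the two households. The income and spending figures come from the example; the helper names, the choice of a six-month emergency fund for the high spenders, and the one-year savings horizon for the frugal couple are assumptions made for illustration. Interest and taxes are ignored.

```python
MONTHLY_INCOME = 8_300   # about $100,000 a year after taxes, per the example
REDUCED_INCOME = 4_000   # combined income after one spouse stops earning


def savings_after(months: int, income: float, spending: float,
                  start: float = 0.0) -> float:
    """Savings balance after `months` of steady income and spending."""
    return start + months * (income - spending)


def months_of_runway(savings: float, income: float, spending: float) -> float:
    """Months until savings reach zero; infinite when income covers spending."""
    shortfall = spending - income
    if shortfall <= 0:
        return float("inf")
    return savings / shortfall


# High spenders: $8,000 a month, with a six-month emergency fund of $48,000.
lavish_runway = months_of_runway(48_000, REDUCED_INCOME, 8_000)

# Frugal couple: $4,000 a month, banking the difference for one year.
frugal_savings = savings_after(12, MONTHLY_INCOME, 4_000)
frugal_runway = months_of_runway(frugal_savings, REDUCED_INCOME, 4_000)

print(f"high spenders' runway: {lavish_runway:.0f} months")
print(f"frugal savings after a year: ${frugal_savings:,.0f}")
print(f"frugal runway: {frugal_runway}")
```

Under these assumptions, the high spenders' fund is gone in about a year unless they cut back, while the frugal couple's runway is unbounded because the remaining income still covers their spending.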

(On a side note: Let’s take it even one step further. Say their former $8,000 monthly spending rate meant they probably could not retire until age 70, given their current savings rate, investment choices, and desired lifestyle post-retirement. Reducing their needs to $4,000 not only provides them much needed savings, quickly accelerating their retirement date, but they now need even less to retire on in the first place. Retiring at 70 can start to look like retiring at 45 in a hurry.)

* * *

Clearly, the margin of safety model is very powerful and we’re wise to use it whenever possible to avoid failure. But it has limitations.

One obvious issue, most salient in the engineering world, comes in the tradeoff with time and money. Given an unlimited runway of time and the most expensive materials known to mankind, it’s likely that we could “fail-proof” many products to such a ridiculous degree as to be impractical in the modern world.

For example, it’s possible to imagine Boeing designing a plane that would have a fail rate indistinguishable from zero, with parts being replaced 10% into their useful lives, built with rare but super-strong materials, etc.—so long as the world was willing to pay $25,000 for a coach seat from Boston to Chicago. Given the impracticability of that scenario, our tradeoff has been to accept planes that are not “fail-proof,” but merely extremely unlikely to fail, in order to give the world safe enough air travel at an affordable cost. This tradeoff has been enormously wise and helpful to the world. Simply put, the margin-of-safety idea can be pushed into farce without careful judgment.

* * *

This brings us to another limitation of the model, which is the failure to engage in “total systems” thinking. I’m reminded of a quote I’ve used before at Farnam Street:

The reliability that matters is not the simple reliability of one component of a system,
but the final reliability of the total control system
— Garrett Hardin in Filters Against Folly

Let’s return to the Boeing analogy. Say we did design the safest and most reliable jet airplane imaginable, with parts that would not fail in one billion hours of flight time under the most difficult weather conditions imaginable on Earth—and then let it be piloted by a drug addict high on painkillers.

The problem is that the whole flight system includes much more than just the reliability of the plane itself. Just because we built in safety margins in one area does not mean the system will not fail. This illustrates not so much a failure of the model itself, but a common mistake in the way the model is applied.

* * *

Which brings us to a final issue with the margin of safety model—naïve extrapolation of past data. Let’s look at a common insurance scenario to illustrate this one.

Suppose we have a 100-year-old reinsurance company – PropCo – which reinsures major primary insurers in the event of property damage in California caused by a catastrophe – most worrying being an earthquake and its aftershocks. Throughout its entire (long) history, PropCo had never experienced a yearly loss on this sort of coverage worse than $1 billion. Most years saw no loss worse than $250 million, and in fact, many years had no losses at all – giving them comfortable profit margins.

Thinking like engineers, the directors of PropCo insisted that the company maintain a financial position strong enough to safely cover a loss twice as bad as anything ever encountered. Given their historical losses, the directors believed this extra capital would give PropCo a comfortable “margin of safety” against the worst case. Right?

However, our directors missed a few crucial details. The $1 billion loss, the insurer’s worst, had been incurred in the year 1994 during the Northridge earthquake. Since then, the building density of Californian cities had increased significantly, and due to ongoing budget issues and spreading fraud, strict building codes had not been enforced. Considerable inflation in the period since 1994 also ensured that losses per damaged square foot would be far higher than ever faced previously.

With these conditions present, let’s propose that California is hit with an earthquake reading 7.0 on the Richter scale, with an epicenter 10 miles outside of downtown LA. PropCo faces a bill of $5 billion – not twice as bad, but five times as bad as it had ever faced. In this case, PropCo fails.

This illustration (which recurs every so often in the insurance field) shows the limitation of naïvely assuming a margin of safety is present based on misleading or incomplete past data.
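The PropCo arithmetic is simple enough to sketch. The dollar figures are the ones in the scenario; the function names are hypothetical, and the sketch deliberately ignores everything else that affects a real reinsurer's solvency.

```python
WORST_HISTORICAL_LOSS = 1.0   # $ billions, PropCo's worst year (1994)
CAPITAL_MULTIPLE = 2.0        # directors' rule: cover twice the worst loss


def capital_held(worst_loss: float, multiple: float = CAPITAL_MULTIPLE) -> float:
    """Capital the directors' rule of thumb would set aside."""
    return worst_loss * multiple


def survives(capital: float, loss: float) -> bool:
    """True if the capital on hand is enough to pay the loss."""
    return capital >= loss


def implied_multiple(loss: float,
                     worst_loss: float = WORST_HISTORICAL_LOSS) -> float:
    """How many times worse than the historical record a loss is."""
    return loss / worst_loss


capital = capital_held(WORST_HISTORICAL_LOSS)
new_loss = 5.0   # $ billions, the hypothetical earthquake in the scenario

print(f"capital held: ${capital:.1f}B")
print(f"loss is {implied_multiple(new_loss):.0f}x the historical worst")
print(f"survives: {survives(capital, new_loss)}")
```

The margin looked like 2x only because it was measured against the past record. Measured against what the current exposure could actually produce, it was no margin at all.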

* * *

Margin of safety is an important component of some decisions, and of life. You can think of it as a reservoir to absorb errors or poor luck. Size matters. At least, in this case, bigger is better. And if you need a calculator to figure out how much room you have, you’re doing something wrong.

Margin of safety is part of the Farnam Street Latticework of Mental Models.

Eric Drexler on taking action in the face of limited knowledge

Science pursues answers to questions, but not always the questions that engineering must ask.

Eric Drexler, often called the founding father of nanotechnology, explains how science and engineering differ in the way they approach solutions in a world of limited knowledge.

Drexler’s explanation, found in his insightful book Radical Abundance: How a Revolution in Nanotechnology Will Change Civilization, discusses how there is a certain amount of ignorance that pervades everything. How then, should we respond? Engineers apply a margin of safety.

Drexler writes:

When faced with imprecise knowledge, a scientist will be inclined to improve it, yet an engineer will routinely accept it. Might predictions be wrong by as much as 10 percent, and for poorly understood reasons? The reasons may pose a difficult scientific puzzle, yet an engineer might see no problem at all. Add a 50 percent margin of safety, and move on.

Safety margins are standard parts of design, and imprecise knowledge is but one of many reasons.
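Drexler's numbers can be checked with a small sketch. The 10 percent error and the 50 percent margin are his; the predicted load of 100 units and the function names are hypothetical, chosen only to make the arithmetic visible.

```python
MAX_PREDICTION_ERROR = 0.10   # predictions may be off by as much as 10 percent
SAFETY_MARGIN = 0.50          # engineer's response: add a 50 percent margin


def worst_case_load(predicted: float,
                    max_error: float = MAX_PREDICTION_ERROR) -> float:
    """Largest load consistent with the prediction and its error bound."""
    return predicted * (1 + max_error)


def capacity_with_margin(predicted: float,
                         margin: float = SAFETY_MARGIN) -> float:
    """Capacity the engineer designs for, given the predicted load."""
    return predicted * (1 + margin)


predicted = 100.0   # hypothetical predicted load, arbitrary units
worst = worst_case_load(predicted)
capacity = capacity_with_margin(predicted)

print(f"worst-case load: {worst:.1f}")
print(f"design capacity: {capacity:.1f}")
print(f"headroom left:   {capacity - worst:.1f}")
```

The engineer never needs to learn why the prediction is off by 10 percent. It is enough to know that the margin comfortably exceeds the error.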

Engineers and scientists ask different questions:

… Accuracy can only be judged with respect to a purpose and engineers often can choose to ask questions for which models give good-enough answers.

The moral of the story: Beware of mistaking the precise knowledge that scientists naturally seek for the reliable knowledge that engineers actually need.


Nature presents puzzles that thwart human understanding.

Some of this is necessary fallibility—some things we simply cannot understand or predict. Just because we want to understand something doesn’t mean it’s within our capacity to do so.

Other problems represent limited understanding and predictability — there are things we simply cannot do yet, for a variety of reasons.

… Predicting the weather, predicting the folding of membrane proteins, predicting how particular molecules will fit together to form a crystal— all of these problems are long-standing areas of research that have achieved substantial but only partial success. In each of these cases, the unpredictable objects of study result from a spontaneous process— evolution, crystallization, atmospheric dynamics— and none has the essential features of engineering design.

What leads to system-level predictability?

— Well-understood parts with predictable local interactions, whether predictability stems from calculation or testing
— Design margins and controlled system dynamics to limit the effects of imprecision and variable conditions
— Modular organization, to facilitate calculation and testing and to insulate subsystems from one another and the external environment

… When judging engineering concepts, beware of assuming that familiar concerns will cause problems in systems designed to avoid them.

Seeking Unique Answers vs. Seeking Multiple Options

Expanding the range of possibilities plays opposite roles in inquiry and design.

If elephantologists have three viable hypotheses about an animal’s ancestry, at least two hypotheses must be wrong. Discovering yet another possible line of descent creates more uncertainty, not less— now three must be wrong. In science, alternatives represent ignorance.

If automobile engineers have three viable designs for a car’s suspension, all three designs will presumably work. Finding yet another design reduces overall risk and increases the likelihood that at least one of the designs will be excellent. In engineering, alternatives represent options. Not knowing which scientific hypothesis is true isn’t at all like having a choice of engineering solutions. Once again, what may seem like similar questions in science and engineering are more nearly opposite.

Knowledge of options is sometimes mistaken for ignorance of facts.

Remarkably, in engineering, even scientific uncertainty can contribute to knowledge, because uncertainty about scientific facts can suggest engineering options.

Simple, Specific Theories vs. Complex, Flexible Designs

Engineers value what scientists don’t: flexibility.

Science likewise has no use for a theory that can be adjusted to fit arbitrary data, because a theory that fits anything forbids nothing, which is to say that it makes no predictions at all. In developing designs, by contrast, engineers prize flexibility — a design that can be adjusted to fit more requirements can solve more problems. The components of the Saturn V vehicle fit together because the design of each component could be adjusted to fit its role.

In science, a theory should be easy to state and within reach of an individual’s understanding. In engineering, however, a fully detailed design might fill a truck if printed out on paper.

This is why engineers must sometimes design, analyze, and judge concepts while working with descriptions that take masses of detail for granted. A million parameters may be left unspecified, but these parameters represent adjustable engineering options, not scientific uncertainty; they represent, not a uselessly bloated and flexible theory, but a stage in a process that routinely culminates in a fully specified product.

Beware of judging designs as if they were theories in science. An esthetic that demands uniqueness and simplicity is simply misplaced.

Curiosity-Driven Investigation vs. Goal-Oriented Development

Organizational structure differs between scientific and engineering pursuits. The coordination of work isn’t interchangeable.

In science, independent exploration by groups with diverse ideas leads to discovery, while in systems engineering, independent work would lead to nothing of use, because building a tightly integrated system requires tight coordination. Small, independent teams can design simple devices, but never a higher-order system like a passenger jet.

In inquiry, investigator-led, curiosity-driven research is essential and productive. If the goal is to engineer complex products, however, even the most brilliant independent work will reliably produce no results.

The moral of the story: Beware of approaching engineering as if it were science, because this mistake has opportunity costs that reduce the value of science itself.

In closing, Drexler comments on applying the engineering perspective.

Drawing on established knowledge to expand human capabilities, by contrast, requires an intellectual discipline that, in its fullest, high-level form, differs from science in almost every respect.

Radical Abundance: How a Revolution in Nanotechnology Will Change Civilization is worth reading in its entirety.