
How To Spot Bad Science

In a digital world that clamors for clicks, news is sensationalized and “facts” change all the time. Here’s how to discern what is trustworthy and what is hogwash.

***

Unless we’ve studied it, most of us are never taught how to evaluate science or how to parse the good from the bad. Yet it is something that dictates every area of our lives. It is vital for helping us understand how the world works. It might be too much effort and time to appraise research for yourself, however. Often, it can be enough to consult an expert or read a trustworthy source.

But some decisions require us to understand the underlying science. There is no way around it. Many of us hear about scientific developments from news articles and blog posts. Some sources put the work into presenting useful information. Others manipulate or misinterpret results to get more clicks. So we need the thinking tools necessary to know what to listen to and what to ignore. When it comes to important decisions, like knowing what individual action to take to minimize your contribution to climate change or whether to believe the friend who cautions against vaccinating your kids, being able to assess the evidence is vital.

Much of the growing (and concerning) mistrust of scientific authority is based on a misunderstanding of how it works and a lack of awareness of how to evaluate its quality. Science is not some big immovable mass. It is not infallible. It does not pretend to be able to explain everything or to know everything. Furthermore, there is no such thing as “alternative” science. Science does involve mistakes. But we have yet to find a system of inquiry capable of achieving what it does: move us closer and closer to truths that improve our lives and understanding of the universe.

“Rather than love, than money, than fame, give me truth.”

— Henry David Thoreau

There is a difference between bad science and pseudoscience. Bad science is a flawed version of good science, with the potential for improvement. It follows the scientific method, only with errors or biases. Often, it’s produced with the best of intentions, just by researchers who are responding to skewed incentives.

Pseudoscience has no basis in the scientific method. It does not attempt to follow standard procedures for gathering evidence. The claims involved may be impossible to disprove. Pseudoscience focuses on finding evidence to confirm it, disregarding disconfirmation. Practitioners invent narratives to preemptively ignore any actual science contradicting their views. It may adopt the appearance of actual science to look more persuasive.

While the tools and pointers in this post are geared towards identifying bad science, they will also help with easily spotting pseudoscience.

Good science is science that adheres to the scientific method, a systematic method of inquiry involving making a hypothesis based on existing knowledge, gathering evidence to test if it is correct, then either disproving or building support for the hypothesis. It takes many repetitions of applying this method to build reasonable support for a hypothesis.

In order for a hypothesis to count as such, there must be evidence that, if collected, would disprove it.

In this post, we’ll talk you through two examples of bad science to point out some of the common red flags. Then we’ll look at some of the hallmarks of good science you can use to sort the signal from the noise. We’ll focus on the type of research you’re likely to encounter on a regular basis, including medicine and psychology, rather than areas less likely to be relevant to your everyday life.

[Note: we will use the terms “research” and “science” and “researcher” and “scientist” interchangeably here.]

Power Posing

“The saddest aspect of life right now is that science gathers knowledge faster than society gathers wisdom.” ―Isaac Asimov

First, here’s an example of flawed science from psychology: power posing. A 2010 study by Dana Carney, Andy J. Yap, and Amy Cuddy entitled “Power Posing: Brief Nonverbal Displays Affect Neuroendocrine Levels and Risk Tolerance” claimed “open, expansive” poses caused participants to experience elevated testosterone levels, reduced cortisol levels, and greater risk tolerance. These are all excellent things in a high-pressure situation, like a job interview. The abstract concluded that “a person can, via a simple two-minute pose, embody power and instantly become more powerful.” The idea took off. It spawned hundreds of articles, videos, and tweets espousing the benefits of including a two-minute power pose in your day.

Yet at least eleven follow-up studies, many led by Joseph Cesario of Michigan State University and summarized in “‘Power Poses’ Don’t Work, Eleven New Studies Suggest,” failed to replicate the results. None found that power posing has a measurable impact on people’s performance in tasks or on their physiology. While subjects did report a subjective feeling of increased powerfulness, their performance did not differ from subjects who did not strike a power pose.

One of the researchers of the original study, Carney, has since changed her mind about the effect. Carney stated she no longer believes the results of the original study. Unfortunately, this isn’t always how researchers respond when confronted with evidence discrediting their prior work. We all know how uncomfortable changing our minds is.

The notion of power posing is exactly the kind of nugget that spreads fast online. It’s simple, free, promises dramatic benefits with minimal effort, and is intuitive. We all know posture is important. It has a catchy, memorable name. Yet examining the details of the original study reveals a whole parade of red flags. The study had 42 participants. That might be reasonable for preliminary or pilot studies. But it is in no way sufficient to “prove” anything. It was not blinded. Feedback from participants was self-reported, which is notorious for being biased and inaccurate.

There is also a clear correlation/causation issue. Powerful, dominant animals tend to use expansive body language that exaggerates their size. Humans often do the same. But that doesn’t mean it’s the pose making them powerful. Being powerful could make them pose that way.

A TED Talk in which Amy Cuddy, the study’s co-author, claimed power posing could “significantly change the way your life unfolds” is one of the most popular to date, with tens of millions of views. The presentation of the science in the talk is also suspect. Cuddy makes strong claims with a single, small study as justification. She portrays power posing as a panacea. Likewise, the original study’s claim that a power pose makes someone “instantly become more powerful” is suspiciously strong.

This is one of the examples of psychological studies related to small tweaks in our behavior that have not stood up to scrutiny. We’re not singling out the power pose study as being unusually flawed or in any way fraudulent. The researchers had clear good intentions and a sincere belief in their work. It’s a strong example of why we should go straight to the source if we want to understand research. Coverage elsewhere is unlikely to even mention methodological details or acknowledge any shortcomings. It would ruin the story. We even covered power posing on Farnam Street in 2016—we’re all susceptible to taking these ‘scientific’ results seriously, without checking on the validity of the underlying science.

It is a good idea to be skeptical of research promising anything too dramatic or extreme with minimal effort, especially without substantial evidence. If it seems too good to be true, it most likely is.

Green Coffee Beans

“An expert is a person who has made all the mistakes that can be made in a very narrow field.” ―Niels Bohr

The world of weight-loss science is one where bad science is rampant. We all know, deep down, that we cannot circumnavigate the need for healthy eating and exercise. Yet the search for a magic bullet, offering results without effort or risks, continues. Let’s take a look at one study that is a masterclass in bad science.

Entitled “Randomized, Double-Blind, Placebo-Controlled, Linear Dose, Crossover Study to Evaluate the Efficacy and Safety of a Green Coffee Bean Extract in Overweight Subjects,” it was published in 2012 in the journal Diabetes, Metabolic Syndrome and Obesity: Targets and Therapy. On the face of it, and to the untrained eye, the study may appear legitimate, but it is rife with serious problems, as Scott Gavura explained in the article “Dr. Oz and Green Coffee Beans – More Weight Loss Pseudoscience” in the publication Science-Based Medicine. The original paper was later retracted by its authors. The Federal Trade Commission (FTC) ordered the supplement manufacturer who funded the study to pay a $3.5 million fine for using it in their marketing materials, describing it as “botched.”

The Food and Drug Administration (FDA) recommends studies relating to weight-loss consist of at least 3,000 participants receiving the active medication and at least 1,500 receiving a placebo, all for a minimum period of 12 months. This study used a mere 16 subjects, with no clear selection criteria or explanation. None of the researchers involved had medical experience or had published related research. They did not disclose the conflict of interest inherent in the funding source. It didn’t cover efforts to avoid any confounding factors. It is vague about whether subjects changed their diet and exercise, showing inconsistencies. The study was not double-blinded, despite claiming to be. It has not been replicated.

The FTC reported that the study’s lead investigator “repeatedly altered the weights and other key measurements of the subjects, changed the length of the trial, and misstated which subjects were taking the placebo or GCA during the trial.” A meta-analysis by Rachel Buchanan and Robert D. Beckett, “Green Coffee for Pharmacological Weight Loss,” published in the Journal of Evidence-Based Complementary & Alternative Medicine, failed to find evidence for green coffee beans being safe or effective; all the available studies had serious methodological flaws, and most did not comply with FDA guidelines.

Signs of Good Science

“That which can be asserted without evidence can be dismissed without evidence.” ―Christopher Hitchens

We’ve inverted the problem and considered some of the signs of bad science. Now let’s look at some of the indicators a study is likely to be trustworthy. Unfortunately, there is no single sign a piece of research is good science. None of the signs mentioned here are, alone, in any way conclusive. There are caveats and exceptions to all. These are simply factors to evaluate.

It’s Published by a Reputable Journal

“The discovery of instances which confirm a theory means very little if we have not tried, and failed, to discover refutations.” —Karl Popper

A journal, any journal, publishing a study says little about its quality. Some will publish any research they receive in return for a fee. A few so-called “vanity publishers” claim to have a peer-review process, yet they typically have a short gap between receiving a paper and publishing it. We’re talking days or weeks, not the expected months or years. Many predatory publishers do not even make any attempt to verify quality.

No journal is perfect. Even the most respected journals make mistakes and publish low-quality work sometimes. However, anything that is not published research or based on published research in a journal is not worth consideration. Not as science. A blog post saying green smoothies cured someone’s eczema is not comparable to a published study. The barrier is too low. If someone cared enough about using a hypothesis or “finding” to improve the world and educate others, they would make the effort to get it published. The system may be imperfect, but reputable researchers will generally make the effort to play within it to get their work noticed and respected.

It’s Peer Reviewed

Peer review is a standard process in academic publishing. It’s intended as an objective means of assessing the quality and accuracy of new research. Uninvolved researchers with relevant experience evaluate papers before publication. They consider factors like how well it builds upon pre-existing research or if the results are statistically significant. Peer review should be double-blinded. This means the researcher doesn’t know who is reviewing their work and the reviewer doesn’t know who the researcher is.

Publishers only perform a cursory “desk check” before moving on to peer review. This is to check for major errors, nothing more. They cannot have the expertise necessary to vet the quality of every paper they handle, hence the need for external experts. The number of reviewers and the strictness of the process depend on the journal. Reviewers either declare a paper unpublishable or suggest improvements. It is rare for them to suggest publishing without modifications.

Sometimes several rounds of modifications prove necessary. It can take years for a paper to see the light of day, which is no doubt frustrating for the researcher. But it means fewer mistakes and weak areas survive to publication.

Pseudoscientific practitioners will often claim they cannot get their work published because peer reviewers suppress anything contradicting prevailing doctrines. Good researchers know having their work challenged and argued against is positive. It makes them stronger. They don’t shy away from it.

Peer review is not a perfect system. Seeing as it involves humans, there is always room for bias and manipulation. In a small field, it may be easy for a reviewer to get past the double-blinding. However, as it stands, peer review seems to be the best available system. In isolation, it’s not a guarantee that research is perfect, but it’s one factor to consider.

The Researchers Have Relevant Experience and Qualifications

One of the red flags in the green coffee bean study was that the researchers involved had no medical background or experience publishing obesity-related research.

While outsiders can sometimes make important advances, researchers should have relevant qualifications and a history of working in that field. It is too difficult to make scientific advancements without the necessary background knowledge and expertise. If someone cares enough about advancing a given field, they will study it. If it’s important, verify their backgrounds.

It’s Part of a Larger Body of Work

“Science, my lad, is made up of mistakes, but they are mistakes which it is useful to make, because they lead little by little to the truth.” ―Jules Verne

We all like to stand behind the maverick. But we should be cautious of doing so when it comes to evaluating the quality of science. On the whole, science does not progress in great leaps. It moves along millimeter by millimeter, gaining evidence in increments. Even if a piece of research is presented as groundbreaking, it has years of work behind it.

Researchers do not work in isolation. Good science is rarely, if ever, the result of one person or even one organization. It comes from a monumental collective effort. So when evaluating research, it is important to see if other studies point to similar results and if it is an established field of work. For this reason, meta-analyses, which analyze the combined results of many studies on the same topic, are often far more useful to the public than individual studies. Scientists are humans and they all make mistakes. Looking at a collective body of work helps smooth out any problems. Individual studies are valuable in that they further the field as a whole, allowing for the creation of meta-studies.
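To make the idea of combining studies concrete, here is a minimal sketch of one common pooling approach, fixed-effect inverse-variance weighting, in which more precise studies count for more. The effect sizes and standard errors are made-up illustrative numbers, not data from any study mentioned in this article, and real meta-analyses also have to weigh study quality, differences between studies, and publication bias.

```python
import math

def pool_fixed_effect(effects, standard_errors):
    """Combine study estimates using inverse-variance weights.

    Each study is weighted by 1 / SE^2, so a precise study (small
    standard error) influences the pooled result more than a noisy one.
    Returns (pooled_effect, pooled_standard_error).
    """
    if len(effects) != len(standard_errors) or not effects:
        raise ValueError("need one standard error per effect, and at least one study")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical example: one small, noisy study reporting a big effect,
# followed by two larger, more precise studies reporting little or none.
effects = [0.5, 0.1, 0.0]
standard_errors = [0.5, 0.1, 0.1]
pooled, pooled_se = pool_fixed_effect(effects, standard_errors)
print(f"pooled effect = {pooled:.3f} (standard error {pooled_se:.3f})")
```

With these invented numbers the pooled estimate lands near 0.06, far closer to the two precise studies than to the eye-catching 0.5 from the small one.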

Science is about evidence, not reputation. Sometimes well-respected researchers, for whatever reason, produce bad science. Sometimes outsiders produce amazing science. What matters is the evidence they have to support it. While an established researcher may have an easier time getting support for their work, the overall community accepts work on merit. When we look to examples of unknowns who made extraordinary discoveries out of the blue, they always had extraordinary evidence for it.

Questioning the existing body of research is not inherently bad science or pseudoscience. Doing so without a remarkable amount of evidence is.

It Doesn’t Promise a Panacea or Miraculous Cure

Studies that promise anything a bit too amazing can be suspect. This is more common in media reporting of science or in research used for advertising.

In medicine, a panacea is something that can supposedly solve all, or many, health problems. These claims are rarely substantiated by anything even resembling evidence. The more outlandish the claim, the less likely it is to be true. Occam’s razor teaches us that the simplest explanation with the fewest inherent assumptions is most likely to be true. This is a useful heuristic for evaluating potential magic bullets.

It Avoids or at Least Discloses Potential Conflicts of Interest

A conflict of interest is anything that incentivizes producing a particular result. It distorts the pursuit of truth. A government study into the health risks of recreational drug use will be biased towards finding evidence of negative risks. A study of the benefits of breakfast cereal funded by a cereal company will be biased towards finding plenty of benefits. Researchers do have to get funding from somewhere, so this does not automatically make a study bad science. But research without conflicts of interest is more likely to be good science.

High-quality journals require researchers to disclose any potential conflicts of interest. But not all journals do. Media coverage of research may not mention this (another reason to go straight to the source). And people do sometimes lie. We don’t always know how unconscious biases influence us.

It Doesn’t Claim to Prove Anything Based on a Single Study

In the vast majority of cases, a single study is a starting point, not proof of anything. The results could be random chance, or the result of bias, or even outright fraud. Only once other researchers replicate the results can we consider a study persuasive. The more replications, the more reliable the results are. If attempts at replication fail, this can be a sign the original research was biased or incorrect.
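A little arithmetic shows why a lone result deserves caution and why replication helps. The sketch below assumes the common 5% significance threshold, fully independent studies, and an effect that does not actually exist. These are simplifying assumptions for illustration, not a description of any study discussed here.

```python
def chance_of_any_false_positive(n_studies, alpha=0.05):
    """Probability that at least one of n independent studies of a
    nonexistent effect crosses the significance threshold by luck."""
    return 1.0 - (1.0 - alpha) ** n_studies

def chance_all_false_positive(n_studies, alpha=0.05):
    """Probability that every one of n independent studies of a
    nonexistent effect crosses the threshold by luck."""
    return alpha ** n_studies

# If many teams test the same nonexistent effect, a fluke "hit" becomes likely.
for n in (1, 5, 20):
    print(n, "studies:", round(chance_of_any_false_positive(n), 3))

# An original result plus one independent replication both being flukes is rare.
print("original plus one replication:", round(chance_all_false_positive(2), 4))
```

Under these assumptions, twenty attempts at a nonexistent effect give roughly a 64% chance of at least one "significant" result, while the chance that an original study and an independent replication are both flukes is 0.25%.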

A note on anecdotes: they’re not science. Anecdotes, especially from people close to us or those who have a lot of letters behind their name, have a disproportionate clout. But hearing something from one person, no matter how persuasive, should not be enough to discredit published research.

Science is about evidence, not proof. And evidence can always be discredited.

It Uses a Reasonable, Representative Sample Size

A representative sample represents the wider population, not one segment of it. If it does not, then the results may only be relevant for people in that demographic, not everyone. Bad science will often also use very small sample sizes.

There is no set target for what makes a large enough sample size; it all depends on the nature of the research. In general, the larger, the better. The exception is in studies that may put subjects at risk, which use the smallest possible sample to achieve usable results.
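As a rough illustration of why larger samples help, the sketch below computes the approximate 95% margin of error of a sample average for the sample sizes mentioned in this article. It assumes a simple random sample and uses an arbitrary standard deviation of 1.0, so only the relative sizes matter. It also says nothing about bias: a large sample drawn from the wrong population is still unrepresentative.

```python
import math

def margin_of_error(std_dev, n, z=1.96):
    """Approximate 95% margin of error for a sample mean.

    Assumes a simple random sample and a roughly normal sampling
    distribution; z = 1.96 is the usual 95% multiplier.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * std_dev / math.sqrt(n)

# Sample sizes mentioned in this article: 16 and 42 participants in the
# two flawed studies, 3,000 in the FDA recommendation for the treatment group.
# std_dev = 1.0 is an arbitrary unit chosen purely for illustration.
for n in (16, 42, 3000):
    print(f"n = {n:>4}: margin of error is about {margin_of_error(1.0, n):.3f}")
```

The margin shrinks with the square root of the sample size, so quadrupling the number of participants only halves the uncertainty. That is part of why small studies produce such unstable results.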

In areas like nutrition and medicine, it’s also important for a study to last a long time. A study looking at the impact of a supplement on blood pressure over a week is far less useful than one over a decade. Long-term data smooths out fluctuations and offers a more comprehensive picture.

The Results Are Statistically Significant

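Statistical significance is usually reported as a p-value: the probability of seeing results at least as extreme as those observed if there were no real effect and only random chance were at work. The smaller the p-value, the harder the results are to explain by chance alone. The threshold for statistical significance varies between fields; 0.05 (5%) is a common convention. Check whether the reported p-value falls below the accepted threshold. If it doesn’t, the finding is weak evidence at best.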
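One way to see what a significance test is asking is a permutation test: shuffle the group labels many times and count how often chance alone produces a gap between groups at least as large as the one observed. The sketch below is a minimal illustration with made-up scores. The function name, the number of shuffles, and the fixed seed are choices made for this example, not a method taken from any study discussed here.

```python
import random

def permutation_p_value(treated, control, n_shuffles=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Repeatedly reshuffles all scores between the two groups and reports
    the share of shuffles whose gap in means is at least as large as
    the gap actually observed. Both groups must be non-empty.
    """
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    n_treated = len(treated)
    pooled = list(treated) + list(control)

    def mean_gap(values):
        first, second = values[:n_treated], values[n_treated:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = mean_gap(pooled)
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        # Small tolerance guards against floating-point rounding.
        if mean_gap(pooled) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / n_shuffles

# Made-up scores for illustration only.
treated = [10, 11, 12, 13, 14, 15]
control = [1, 2, 3, 4, 5, 6]
print("estimated p-value:", permutation_p_value(treated, control))
```

A small estimate means random relabeling rarely reproduces the observed gap. A value near 1 means the gap is entirely unremarkable. Note that a low p-value says nothing about whether the effect is large enough to matter, or whether the study was well designed.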

It Is Well Presented and Formatted

“When my information changes, I alter my conclusions. What do you do, sir?” ―John Maynard Keynes

As basic as it sounds, we can expect good science to be well presented and carefully formatted, without prominent typos or sloppy graphics.

It’s not that bad presentation makes something bad science. It’s more the case that researchers producing good science have an incentive to make it look good. As Michael J. I. Brown of Monash University explains in “How to Quickly Spot Dodgy Science,” this is far more than a matter of aesthetics. The way a paper looks can be a useful heuristic for assessing its quality. Researchers who are dedicated to producing good science can spend years on a study, fretting over its results and investing in gaining support from the scientific community. This means they are less likely to present work looking bad. Brown gives an example of looking at an astrophysics paper and seeing blurry graphs and misplaced image captions, then finding more serious methodological issues upon closer examination. In addition to other factors, sloppy formatting can sometimes be a red flag. At the minimum, a thorough peer-review process should eliminate glaring errors.

It Uses Control Groups and Double-Blinding

A control group serves as a point of comparison in a study. The control group should be people as similar as possible to the experimental group, except they’re not subject to whatever is being tested. The control group may also receive a placebo to see how the outcome compares.

Blinding refers to the practice of obscuring which group participants are in. For a single-blind experiment, the participants do not know if they are in the control or the experimental group. In a double-blind experiment, neither the participants nor the researchers know. This is the gold standard and is essential for trustworthy results in many types of research. If people know which group they are in, the results are not trustworthy. If researchers know, they may (unintentionally or not) nudge participants towards the outcomes they want or expect. So a double-blind study with a control group is far more likely to be good science than one without.

It Doesn’t Confuse Correlation and Causation

In the simplest terms, two things are correlated if they tend to occur together. Causation is when one thing causes another thing to happen. For example, one large-scale study entitled “Are Non-Smokers Smarter than Smokers?” found that people who smoke tobacco tend to have lower IQs than those who don’t. Does this mean smoking lowers your IQ? It might, but there is also a strong link between socio-economic status and smoking. People of low income are, on average, likely to have lower IQ than those with higher incomes due to factors like worse nutrition, less access to education, and sleep deprivation. According to a study by the Centers for Disease Control and Prevention entitled “Cigarette Smoking and Tobacco Use Among People of Low Socioeconomic Status,” people of low socio-economic status are also more likely to smoke and to do so from a young age. There might be a correlation between smoking and IQ, but that doesn’t mean causation.

Disentangling correlation and causation can be difficult, but good science will take this into account and may detail potential confounding factors and the efforts made to control for them.
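A small simulation can show how a hidden third factor produces a correlation with no causal link behind it. In the sketch below, the variables x and y never influence each other; both are simply pushed up by the same background factor z. Everything here is synthetic and hypothetical. It does not model smoking, IQ, or any real data set.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n=10_000, seed=1):
    """Generate (z, x, y) rows where z drives both x and y,
    but x and y have no direct effect on each other."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.choice([0, 1])        # hidden background factor
        x = 2 * z + rng.gauss(0, 1)   # depends on z plus independent noise
        y = 2 * z + rng.gauss(0, 1)   # depends on z plus independent noise
        rows.append((z, x, y))
    return rows

rows = simulate()
overall = pearson_r([x for _, x, _ in rows], [y for _, _, y in rows])
within = {
    level: pearson_r([x for z, x, _ in rows if z == level],
                     [y for z, _, y in rows if z == level])
    for level in (0, 1)
}
print("correlation ignoring z:", round(overall, 2))
print("correlation within each z group:", {k: round(v, 2) for k, v in within.items()})
```

By construction, the overall correlation should come out near 0.5, while within each group of z it should hover near zero. Comparing like with like in this way is one simple form of controlling for a confounder, and it is the kind of check a careful study will report.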

Conclusion

“The scientist is not a person who gives the right answers, he’s one who asks the right questions.” ―Claude Lévi-Strauss

The points raised in this article are all aimed at the linchpin of the scientific method—we cannot necessarily prove anything; we must consider the most likely outcome given the information we have. Bad science is generated by those who are willfully ignorant or are so focused on trying to “prove” their hypotheses that they fudge results and cherry-pick to shape their data to their biases. The problem with this approach is that it transforms what could be empirical and scientific into something subjective and ideological.

When we look to disprove what we know, we are able to approach the world with a more flexible way of thinking. If we are unable to defend what we know with reproducible evidence, we may need to reconsider our ideas and adjust our worldviews accordingly. Only then can we properly learn and begin to make forward steps. Through this lens, bad science and pseudoscience are simply the intellectual equivalent of treading water, or even sinking.

Article Summary

  • Most of us are never taught how to evaluate science or how to parse the good from the bad. Yet it is something that dictates every area of our lives.
  • Bad science is a flawed version of good science, with the potential for improvement. It follows the scientific method, only with errors or biases.
  • Pseudoscience has no basis in the scientific method. It does not attempt to follow standard procedures for gathering evidence. The claims involved may be impossible to disprove.
  • Good science is science that adheres to the scientific method, a systematic method of inquiry involving making a hypothesis based on existing knowledge, gathering evidence to test if it is correct, then either disproving or building support for the hypothesis.
  • Science is about evidence, not proof. And evidence can always be discredited.
  • In science, if it seems too good to be true, it most likely is.

Signs of good science include:

  • It’s Published by a Reputable Journal
  • It’s Peer Reviewed
  • The Researchers Have Relevant Experience and Qualifications
  • It’s Part of a Larger Body of Work
  • It Doesn’t Promise a Panacea or Miraculous Cure
  • It Avoids or at Least Discloses Potential Conflicts of Interest
  • It Doesn’t Claim to Prove Anything Based on a Single Study
  • It Uses a Reasonable, Representative Sample Size
  • The Results Are Statistically Significant
  • It Is Well Presented and Formatted
  • It Uses Control Groups and Double-Blinding
  • It Doesn’t Confuse Correlation and Causation

Karl Popper on The Line Between Science and Pseudoscience

It’s not immediately clear, to the layman, what the essential difference is between science and something masquerading as science: pseudoscience. The distinction gets at the core of what comprises human knowledge: How do we actually know something to be true? Is it simply because our powers of observation tell us so? Or is there more to it?

Sir Karl Popper (1902-1994), the scientific philosopher, was interested in the same problem. How do we actually define the scientific process? How do we know which theories can be said to be truly explanatory?


He began addressing it in a lecture, which is printed in the book Conjectures and Refutations: The Growth of Scientific Knowledge (also available online):

When I received the list of participants in this course and realized that I had been asked to speak to philosophical colleagues I thought, after some hesitation and consultation, that you would probably prefer me to speak about those problems which interest me most, and about those developments with which I am most intimately acquainted. I therefore decided to do what I have never done before: to give you a report on my own work in the philosophy of science, since the autumn of 1919 when I first began to grapple with the problem, ‘When should a theory be ranked as scientific?’ or ‘Is there a criterion for the scientific character or status of a theory?’

Popper saw a problem with the number of theories he considered non-scientific that, on their surface, seemed to have a lot in common with good, hard, rigorous science. But the question of how we decide which theories are compatible with the scientific method, and those which are not, was harder than it seemed.

***

It is most common to say that science is done by collecting observations and grinding out theories from them. Charles Darwin once said, after working long and hard on the problem of the Origin of Species,

My mind seems to have become a kind of machine for grinding general laws out of large collections of facts.

This is a popularly accepted notion. We observe, observe, and observe, and we look for theories to best explain the mass of facts. (Although even this is not really true: Popper points out that we must start with some a priori knowledge to be able to generate new knowledge. Observation is always done with some hypotheses in mind–we can’t understand the world from a totally blank slate. More on that another time.)

The problem, as Popper saw it, is that some bodies of knowledge more properly named pseudosciences would be considered scientific if the “Observe & Deduce” operating definition were left alone. For example, a believing astrologist can ably provide you with “evidence” that their theories are sound. The biographical information of a great many people can be explained this way, they’d say.

The astrologist would tell you, for example, about how “Leos” seek to be the centre of attention; ambitious, strong, seeking the limelight. As proof, they might follow up with a host of real-life Leos: world leaders, celebrities, politicians, and so on. In some sense, the theory would hold up. The observations could be explained by the theory, which is how science works, right?

Sir Karl ran into this problem in a concrete way because he lived during a time when psychoanalytic theories were all the rage at just the same time Einstein was laying out a new foundation for the physical sciences with the concept of relativity. What made Popper uncomfortable were comparisons between the two. Why did he feel so uneasy putting Marxist theories and Freudian psychology in the same category of knowledge as Einstein’s Relativity? Did all three not have vast explanatory power in the world? Each theory’s proponents certainly believed so, but Popper was not satisfied.

It was during the summer of 1919 that I began to feel more and more dissatisfied with these three theories–the Marxist theory of history, psychoanalysis, and individual psychology; and I began to feel dubious about their claims to scientific status. My problem perhaps first took the simple form, ‘What is wrong with Marxism, psycho-analysis, and individual psychology? Why are they so different from physical theories, from Newton’s theory, and especially from the theory of relativity?’

I found that those of my friends who were admirers of Marx, Freud, and Adler, were impressed by a number of points common to these theories, and especially by their apparent explanatory power. These theories appeared to be able to explain practically everything that happened within the fields to which they referred. The study of any of them seemed to have the effect of an intellectual conversion or revelation, opening your eyes to a new truth hidden from those not yet initiated. Once your eyes were thus opened you saw confirming instances everywhere: the world was full of verifications of the theory.

Whatever happened always confirmed it. Thus its truth appeared manifest; and unbelievers were clearly people who did not want to see the manifest truth; who refused to see it, either because it was against their class interest, or because of their repressions which were still ‘un-analysed’ and crying aloud for treatment.

Here was the salient problem: The proponents of these new sciences saw validations and verifications of their theories everywhere. If you were having trouble as an adult, it could always be explained by something your mother or father had done to you when you were young, some repressed something-or-other that hadn’t been analysed and solved. They were confirmation bias machines.

What was the missing element? Popper had figured it out before long: The non-scientific theories could not be falsified. They were not testable in a legitimate way. There was no possible objection that could be raised which would show the theory to be wrong.

In a true science, the following statement can be easily made: “If x happens, it would show demonstrably that theory y is not true.” We can then design an experiment, a physical one or sometimes a simple thought experiment, to figure out if x actually does happen. It’s the opposite of looking for verification; you must try to show the theory is incorrect, and if you fail to do so, thereby strengthen it.

Pseudosciences cannot and do not do this–they are not strong enough to hold up. As an example, Popper discussed Freud’s theories of the mind in relation to Alfred Adler’s so-called “individual psychology,” which was popular at the time:

I may illustrate this by two very different examples of human behaviour: that of a man who pushes a child into the water with the intention of drowning it; and that of a man who sacrifices his life in an attempt to save the child. Each of these two cases can be explained with equal ease in Freudian and in Adlerian terms. According to Freud the first man suffered from repression (say, of some component of his Oedipus complex), while the second man had achieved sublimation. According to Adler the first man suffered from feelings of inferiority (producing perhaps the need to prove to himself that he dared to commit some crime), and so did the second man (whose need was to prove to himself that he dared to rescue the child). I could not think of any human behaviour which could not be interpreted in terms of either theory. It was precisely this fact–that they always fitted, that they were always confirmed–which in the eyes of their admirers constituted the strongest argument in favour of these theories. It began to dawn on me that this apparent strength was in fact their weakness.

Popper contrasted these theories against Relativity, which made specific, verifiable predictions, giving the conditions under which the predictions could be shown false. It turned out that Einstein’s predictions held up when tested, thus corroborating the theory through attempts to falsify it. But the essential nature of the theory gave grounds under which it could have been wrong. To this day, physicists seek to figure out where Relativity breaks down in order to come to a more fundamental understanding of physical reality. And while the theory may eventually be proven incomplete or a special case of a more general phenomenon, it has still made accurate, testable predictions that have led to practical breakthroughs.

Thus, in Popper’s words, science requires testability: “If observation shows that the predicted effect is definitely absent, then the theory is simply refuted.” This means a good theory must have an element of risk to it. It must be able to be proven wrong under stated conditions.

From there, Popper laid out his essential conclusions, which are useful to any thinker trying to figure out if a theory they hold dear is something that can be put in the scientific realm:

1. It is easy to obtain confirmations, or verifications, for nearly every theory–if we look for confirmations.

2. Confirmations should count only if they are the result of risky predictions; that is to say, if, unenlightened by the theory in question, we should have expected an event which was incompatible with the theory–an event which would have refuted the theory.

3. Every ‘good’ scientific theory is a prohibition: it forbids certain things to happen. The more a theory forbids, the better it is.

4. A theory which is not refutable by any conceivable event is nonscientific. Irrefutability is not a virtue of a theory (as people often think) but a vice.

5. Every genuine test of a theory is an attempt to falsify it, or to refute it. Testability is falsifiability; but there are degrees of testability: some theories are more testable, more exposed to refutation, than others; they take, as it were, greater risks.

6. Confirming evidence should not count except when it is the result of a genuine test of the theory; and this means that it can be presented as a serious but unsuccessful attempt to falsify the theory. (I now speak in such cases of ‘corroborating evidence’.)

7. Some genuinely testable theories, when found to be false, are still upheld by their admirers–for example by introducing ad hoc some auxiliary assumption, or by re-interpreting the theory ad hoc in such a way that it escapes refutation. Such a procedure is always possible, but it rescues the theory from refutation only at the price of destroying, or at least lowering, its scientific status. (I later described such a rescuing operation as a ‘conventionalist twist’ or a ‘conventionalist stratagem’.)

One can sum up all this by saying that the criterion of the scientific status of a theory is its falsifiability, or refutability, or testability.

Finally, Popper was careful to say that it is not possible to prove that Freudianism was not true, at least in part. But we can say that we simply don’t know whether it’s true because it does not make specific testable predictions. It may have many kernels of truth in it, but we can’t tell. The theory would have to be restated.

This is the essential “line of demarcation,” as Popper called it, between science and pseudoscience.

Richard Feynman Teaches You the Scientific Method

The scientific method refers to a process of thought based on integrating previous knowledge, observing, measuring, and logical reasoning.

“If it disagrees with experiment, it’s wrong. In that simple statement is the key to science.”

— Richard Feynman

In this short video taken from his lectures, physicist Richard Feynman offers perhaps one of the greatest definitions of science and the scientific method that I’ve ever heard. And he does it in about a minute.

Now I’m going to discuss how we would look for a new law. In general, we look for a new law by the following process. First, we guess it (audience laughter), no, don’t laugh, that’s the truth. Then we compute the consequences of the guess, to see what, if this is right, if this law we guess is right, to see what it would imply and then we compare the computation results to nature or we say compare to experiment or experience, compare it directly with observations to see if it works.

If it disagrees with experiment, it’s wrong. In that simple statement is the key to science. It doesn’t make any difference how beautiful your guess is, it doesn’t matter how smart you are who made the guess, or what his name is … If it disagrees with experiment, it’s wrong. That’s all there is to it.

For more color, watch the longer version below, which offers the next 9 minutes of the lecture. In this clip, Feynman explains that guessing is not unscientific: “It is not unscientific to take a guess, although many people who are not in science believe that it is.”

The Scientific Method is part of the Farnam Street Latticework of Mental Models.