Tag: Algorithms

A Primer on Algorithms and Bias

The growing influence of algorithms on our lives means we owe it to ourselves to better understand what they are and how they work. Understanding how the data we use to inform algorithms influences the results they give can help us avoid biases and make better decisions.

***

Algorithms are everywhere: driving our cars, designing our social media feeds, dictating which mixer we end up buying on Amazon, diagnosing diseases, and much more.

Two recent books explore algorithms and the data behind them. In Hello World: Being Human in the Age of Algorithms, mathematician Hannah Fry shows us the potential and the limitations of algorithms. And Invisible Women: Data Bias in a World Designed for Men by writer, broadcaster, and feminist activist Caroline Criado Perez demonstrates how we need to be much more conscientious of the quality of the data we feed into them.

Humans or algorithms?

First, what is an algorithm? Explanations of algorithms can be complex. Fry explains that at their core, they are defined as step-by-step procedures for solving a problem or achieving a particular end. We tend to use the term to refer to mathematical operations that crunch data to make decisions.

When it comes to decision-making, we don’t necessarily have to choose between doing it ourselves and relying wholly on algorithms. The best outcome may be a thoughtful combination of the two.

We all know that in certain contexts, humans are not the best decision-makers. For example, when we are tired, or when we already have a desired outcome in mind, we may ignore relevant information. In Thinking, Fast and Slow, Daniel Kahneman gave multiple examples from his research with Amos Tversky that demonstrated we are heavily influenced by cognitive biases such as availability and anchoring when making certain types of decisions. It’s natural, then, that we would want to employ algorithms that aren’t vulnerable to the same tendencies. In fact, their main appeal for use in decision-making is that they can override our irrationalities.

Algorithms, however, aren’t without their flaws. One of the obvious ones is that because algorithms are written by humans, we often code our biases right into them. Criado Perez offers many examples of algorithmic bias.

For example, an online platform designed to help companies find computer programmers looks through activity such as sharing and developing code in online communities, as well as visiting Japanese manga (comics) sites. People visiting certain sites with frequency received higher scores, thus making them more visible to recruiters.

However, Criado Perez presents the analysis of this recruiting algorithm by Cathy O’Neil, scientist and author of Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, who points out that “women, who do 75% of the world’s unpaid care work, may not have the spare leisure time to spend hours chatting about manga online . . . and if, like most of techdom, that manga site is dominated by males and has a sexist tone, a good number of women in the industry will probably avoid it.”

Criado Perez postulates that the authors of the recruiting algorithm didn’t intend to encode a bias that discriminates against women. But, she says, “if you aren’t aware of how those biases operate, if you aren’t collecting data and taking a little time to produce evidence-based processes, you will continue to blindly perpetuate old injustices.”

Fry also covers algorithmic bias and asserts that “wherever you look, in whatever sphere you examine, if you delve deep enough into any system at all, you’ll find some kind of bias.” We aren’t perfect—and we shouldn’t expect our algorithms to be perfect, either.

In order to have a conversation about the value of an algorithm versus a human in any decision-making context, we need to understand, as Fry explains, that “algorithms require a clear, unambiguous idea of exactly what we want them to achieve and a solid understanding of the human failings they are replacing.”

Garbage in, garbage out

No algorithm is going to be successful if the data it uses is junk. And there’s a lot of junk data in the world. Far from being a new problem, Criado Perez argues that “most of recorded human history is one big data gap.” And that has a serious negative impact on the value we are getting from our algorithms.

Criado Perez explains the situation this way: We live in “a world [that is] increasingly reliant on and in thrall to data. Big data. Which in turn is panned for Big Truths by Big Algorithms, using Big Computers. But when your data is corrupted by big silences, the truths you get are half-truths, at best.”

A common human bias is one regarding the universality of our own experience. We tend to assume that what is true for us is generally true across the population. We have a hard enough time considering how things may be different for our neighbors, let alone for other genders or races. It becomes a serious problem when we gather data about one subset of the population and mistakenly assume that it represents all of the population.

For example, Criado Perez examines the data gap in relation to incorrect information being used to inform decisions about safety and women’s bodies. From personal protective equipment like bulletproof vests that don’t fit properly and thus increase the chances of the women wearing them getting killed to levels of exposure to toxins that are unsafe for women’s bodies, she makes the case that without representative data, we can’t get good outputs from our algorithms. She writes that “we continue to rely on data from studies done on men as if they apply to women. Specifically, Caucasian men aged twenty-five to thirty, who weigh 70 kg. This is ‘Reference Man’ and his superpower is being able to represent humanity as whole. Of course, he does not.” Her book contains a wide variety of disciplines and situations where the gender gap in data leads to increased negative outcomes for women.

The limits of what we can do

Although there is a lot we can do better when it comes to designing algorithms and collecting the data sets that feed them, it’s also important to consider their limits.

We need to accept that algorithms can’t solve all problems, and there are limits to their functionality. In Hello World, Fry devotes a chapter to the use of algorithms in justice. Specifically, algorithms designed to provide information to judges about the likelihood of a defendant committing further crimes. Our first impulse is to say, “Let’s not rely on bias here. Let’s not have someone’s skin color or gender be a key factor for the algorithm.” After all, we can employ that kind of bias just fine ourselves. But simply writing bias out of an algorithm is not as easy as wishing it so. Fry explains that “unless the fraction of people who commit crimes is the same in every group of defendants, it is mathematically impossible to create a test which is equally accurate at predicting across the board and makes false positive and false negative mistakes at the same rate for every group of defendants.”

Fry comes back to such limits frequently throughout her book, exploring them in various disciplines. She demonstrates to the reader that “there are boundaries to the reach of algorithms. Limits to what can be quantified.” Perhaps a better understanding of those limits is needed to inform our discussions of where we want to use algorithms.

There are, however, other limits that we can do something about. Both authors make the case for more education about algorithms and their input data. Lack of understanding shouldn’t hold us back. Algorithms that have a significant impact on our lives specifically need to be open to scrutiny and analysis. If an algorithm is going to put you in jail or impact your ability to get a mortgage, then you ought to be able to have access to it.

Most algorithm writers and the companies they work for wave the “proprietary” flag and refuse to open themselves up to public scrutiny. Many algorithms are a black box—we don’t actually know how they reach the conclusions they do. But Fry says that shouldn’t deter us. Pursuing laws (such as the data access and protection rights being instituted in the European Union) and structures (such as an algorithm-evaluating body playing a role similar to the one the U.S. Food and Drug Administration plays in evaluating whether pharmaceuticals can be made available to the U.S. market) will help us decide as a society what we want and need our algorithms to do.

Where do we go from here?

Algorithms aren’t going away, so it’s best to acquire the knowledge needed to figure out how they can help us create the world we want.

Fry suggests that one way to approach algorithms is to “imagine that we designed them to support humans in their decisions, rather than instruct them.” She envisions a world where “the algorithm and the human work together in partnership, exploiting each other’s strengths and embracing each other’s flaws.”

Part of getting to a world where algorithms provide great benefit is to remember how diverse our world really is and make sure we get data that reflects the realities of that diversity. We can either actively change the algorithm, or we change the data set. And if we do the latter, we need to make sure we aren’t feeding our algorithms data that, for example, excludes half the population. As Criado Perez writes, “when we exclude half of humanity from the production of knowledge, we lose out on potentially transformative insights.”

Given how complex the world of algorithms is, we need all the amazing insights we can get. Algorithms themselves perhaps offer the best hope, because they have the inherent flexibility to improve as we do.

Fry gives this explanation: “There’s nothing inherent in [these] algorithms that means they have to repeat the biases of the past. It all comes down to the data you give them. We can choose to be ‘crass empiricists’ (as Richard Berk put it ) and follow the numbers that are already there, or we can decide that the status quo is unfair and tweak the numbers accordingly.”

We can get excited about the possibilities that algorithms offer us and use them to create a world that is better for everyone.

Do Algorithms Beat Us at Complex Decision Making?

Decision-making algorithms are undoubtedly controversial. If a decision is being made that will have a major influence on your life, most people would prefer a human make it. But what if algorithms really can make better decisions?

***

Algorithms are all the rage these days. AI researchers are taking more and more ground from humans in areas like rules-based games, visual recognition, and medical diagnosis. However, the idea that algorithms make better predictive decisions than humans in many fields is a very old one.

In 1954, the psychologist Paul Meehl published a controversial book with a boring sounding name: Clinical vs. Statistical Prediction: A Theoretical Analysis and a Review of the Evidence.

The controversy? After reviewing the data, Meehl claimed that mechanical, data-driven algorithms could better predict human behavior than trained clinical psychologists — and with much simpler criteria. He was right.

The passing of time has not been friendly to humans in this game: Studies continue to show that the algorithms do a better job than experts in a range of fields. In Daniel Kahneman’s Thinking Fast and Slow, he details a selection of fields which have demonstrated inferior human judgment compared to algorithms:

The range of predicted outcomes has expanded to cover medical variables such as the longevity of cancer patients, the length of hospital stays, the diagnosis of cardiac disease, and the susceptibility of babies to sudden infant death syndrome; economic measures such as the prospects of success for new businesses, the evaluation of credit risks by banks, and the future career satisfaction of workers; questions of interest to government agencies, including assessments of the suitability of foster parents, the odds of recidivism among juvenile offenders, and the likelihood of other forms of violent behavior; and miscellaneous outcomes such as the evaluation of scientific presentations, the winners of football games, and the future prices of Bordeaux wine.

The connection between them? Says Kahneman: “Each of these domains entails a significant degree of uncertainty and unpredictability.” He called them “low-validity environments”, and in those environments, simple algorithms matched or outplayed humans and their “complex” decision making criteria, essentially every time.

***

A typical case is described in Michael Lewis’ book on the relationship between Daniel Kahneman and Amos Tversky, The Undoing Project. He writes of work done at the Oregon Research Institute on radiologists and their x-ray diagnoses:

The Oregon researchers began by creating, as a starting point, a very simple algorithm, in which the likelihood that an ulcer was malignant depended on the seven factors doctors had mentioned, equally weighted. The researchers then asked the doctors to judge the probability of cancer in ninety-six different individual stomach ulcers, on a seven-point scale from “definitely malignant” to “definitely benign.” Without telling the doctors what they were up to, they showed them each ulcer twice, mixing up the duplicates randomly in the pile so the doctors wouldn’t notice they were being asked to diagnose the exact same ulcer they had already diagnosed. […] The researchers’ goal was to see if they could create an algorithm that would mimic the decision making of doctors.

This simple first attempt, [Lewis] Goldberg assumed, was just a starting point. The algorithm would need to become more complex; it would require more advanced mathematics. It would need to account for the subtleties of the doctors’ thinking about the cues. For instance, if an ulcer was particularly big, it might lead them to reconsider the meaning of the other six cues.

But then UCLA sent back the analyzed data, and the story became unsettling. (Goldberg described the results as “generally terrifying”.) In the first place, the simple model that the researchers had created as their starting point for understanding how doctors rendered their diagnoses proved to be extremely good at predicting the doctors’ diagnoses. The doctors might want to believe that their thought processes were subtle and complicated, but a simple model captured these perfectly well. That did not mean that their thinking was necessarily simple, only that it could be captured by a simple model.

More surprisingly, the doctors’ diagnoses were all over the map: The experts didn’t agree with each other. Even more surprisingly, when presented with duplicates of the same ulcer, every doctor had contradicted himself and rendered more than one diagnosis: These doctors apparently could not even agree with themselves.

[…]

If you wanted to know whether you had cancer or not, you were better off using the algorithm that the researchers had created than you were asking the radiologist to study the X-ray. The simple algorithm had outperformed not merely the group of doctors; it had outperformed even the single best doctor.

The fact that doctors (and psychiatrists, and wine experts, and so forth) cannot even agree with themselves is a problem called decision making “noise”: Given the same set of data twice, we make two different decisions. Noise. Internal contradiction.

Algorithms win, at least partly, because they don’t do this: The same inputs generate the same outputs every single time. They don’t get distracted, they don’t get bored, they don’t get mad, they don’t get annoyed. Basically, they don’t have off days. And they don’t fall prey to the litany of biases that humans do, like the representativeness heuristic.

The algorithm doesn’t even have to be a complex one. As demonstrated above with radiology, simple rules work just as well as complex ones. Kahneman himself addresses this in Thinking, Fast and Slow when discussing Robyn Dawes’s research on the superiority of simple algorithms using a few equally-weighted predictive variables:

The surprising success of equal-weighting schemes has an important practical implication: it is possible to develop useful algorithms without prior statistical research. Simple equally weight formulas based on existing statistics or on common sense are often very good predictors of significant outcomes. In a memorable example, Dawes showed that marital stability is well predicted by a formula: Frequency of lovemaking minus frequency of quarrels.

You don’t want your result to be a negative number.

The important conclusion from this research is that an algorithm that is constructed on the back of an envelope is often good enough to compete with an optimally weighted formula, and certainly good enough to outdo expert judgment. This logic can be applied in many domains, ranging from the selection of stocks by portfolio managers to the choices of medical treatments by doctors or patients.

Stock selection, certainly a “low validity environment”, is an excellent example of the phenomenon.

As John Bogle pointed out to the world in the 1970’s, a point which has only strengthened with time, the vast majority of human stock-pickers cannot outperform a simple S&P 500 index fund, an investment fund that operates on strict algorithmic rules about which companies to buy and sell and in what quantities. The rules of the index aren’t complex, and many people have tried to improve on them with less success than might be imagined.

***

Another interesting area where this holds is interviewing and hiring, a notoriously difficult “low-validity” environment. Even elite firms often don’t do it that well, as has been well documented.

Fortunately, if we take heed of the advice of the psychologists, operating in a low-validity environment has rules that can work very well. In Thinking Fast and Slow, Kahneman recommends fixing your hiring process by doing the following (or some close variant), in order to replicate the success of the algorithms:

Suppose you need to hire a sales representative for your firm. If you are serious about hiring the best possible person for the job, this is what you should do. First, select a few traits that are prerequisites for success in this position (technical proficiency, engaging personality, reliability, and so on). Don’t overdo it — six dimensions is a good number. The traits you choose should be as independent as possible from each other, and you should feel that you can assess them reliably by asking a few factual questions. Next, make a list of questions for each trait and think about how you will score it, say on a 1-5 scale. You should have an idea of what you will call “very weak” or “very strong.”

These preparations should take you half an hour or so, a small investment that can make a significant difference in the quality of the people you hire. To avoid halo effects, you must collect the information one at a time, scoring each before you move on to the next one. Do not skip around. To evaluate each candidate, add up the six scores. […] Firmly resolve that you will hire the candidate whose final score is the highest, even if there is another one whom you like better–try to resit your wish to invent broken legs to change the ranking. A vast amount of research offers a promise: you are much more likely to find the best candidate if you use this procedure than if you do what people normally do in such situations, which is to go into the interview unprepared and to make choices by an overall intuitive judgment such as “I looked into his eyes and liked what I saw.”

In the battle of man vs algorithm, unfortunately, man often loses. The promise of Artificial Intelligence is just that. So if we’re going to be smart humans, we must learn to be humble in situations where our intuitive judgment simply is not as good as a set of simple rules.