The growing influence of algorithms on our lives means we owe it to ourselves to better understand what they are and how they work. Understanding how the data we use to inform algorithms influences the results they give can help us avoid biases and make better decisions.
Algorithms are everywhere: driving our cars, designing our social media feeds, dictating which mixer we end up buying on Amazon, diagnosing diseases, and much more.
Two recent books explore algorithms and the data behind them. In Hello World: Being Human in the Age of Algorithms, mathematician Hannah Fry shows us the potential and the limitations of algorithms. And in Invisible Women: Data Bias in a World Designed for Men, writer, broadcaster, and feminist activist Caroline Criado Perez demonstrates that we need to be much more careful about the quality of the data we feed into them.
Humans or algorithms?
First, what is an algorithm? Explanations can get complex, but Fry explains that at their core, algorithms are step-by-step procedures for solving a problem or achieving a particular end. We tend to use the term to refer to mathematical operations that crunch data to make decisions.
When it comes to decision-making, we don’t necessarily have to choose between doing it ourselves and relying wholly on algorithms. The best outcome may be a thoughtful combination of the two.
We all know that in certain contexts, humans are not the best decision-makers. For example, when we are tired, or when we already have a desired outcome in mind, we may ignore relevant information. In Thinking, Fast and Slow, Daniel Kahneman gave multiple examples from his research with Amos Tversky that demonstrated we are heavily influenced by cognitive biases such as availability and anchoring when making certain types of decisions. It’s natural, then, that we would want to employ algorithms that aren’t vulnerable to the same tendencies. In fact, their main appeal for use in decision-making is that they can override our irrationalities.
Algorithms, however, aren’t without their flaws. One of the obvious ones is that because algorithms are written by humans, we often code our biases right into them. Criado Perez offers many examples of algorithmic bias.
For example, an online platform designed to help companies find computer programmers looks through activity such as sharing and developing code in online communities, as well as visiting Japanese manga (comics) sites. People who visit certain sites frequently receive higher scores, which makes them more visible to recruiters.
However, Criado Perez presents the analysis of this recruiting algorithm by Cathy O’Neil, scientist and author of Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, who points out that “women, who do 75% of the world’s unpaid care work, may not have the spare leisure time to spend hours chatting about manga online . . . and if, like most of techdom, that manga site is dominated by males and has a sexist tone, a good number of women in the industry will probably avoid it.”
Criado Perez postulates that the authors of the recruiting algorithm didn’t intend to encode a bias that discriminates against women. But, she says, “if you aren’t aware of how those biases operate, if you aren’t collecting data and taking a little time to produce evidence-based processes, you will continue to blindly perpetuate old injustices.”
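To make the mechanism concrete, here is a minimal sketch of how a proxy signal can tilt a ranking. It is not the platform's actual scoring method, which the source does not describe. The `Candidate` fields, the weights, and the numbers are all hypothetical, chosen only to show that two people with identical coding activity can end up with different scores.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    code_contributions: int  # hypothetical count of code shared in online communities
    manga_site_visits: int   # hypothetical leisure-time proxy signal


# Hypothetical weights; the real platform's weights are not given in the source.
WEIGHTS = {"code_contributions": 1.0, "manga_site_visits": 0.5}


def score(candidate: Candidate, weights: dict = WEIGHTS) -> float:
    """Weighted sum of the signals the scoring rule looks at."""
    return (weights["code_contributions"] * candidate.code_contributions
            + weights["manga_site_visits"] * candidate.manga_site_visits)


# Two candidates with identical coding activity and different leisure habits.
a = Candidate("A", code_contributions=40, manga_site_visits=30)
b = Candidate("B", code_contributions=40, manga_site_visits=0)

score_a = score(a)
score_b = score(b)
print(score_a, score_b)  # A outranks B only because of the proxy signal

# Dropping the proxy from the weights removes the gap.
no_proxy = {"code_contributions": 1.0, "manga_site_visits": 0.0}
print(score(a, no_proxy) == score(b, no_proxy))
```

Nothing in the scoring rule mentions gender. The gap appears only if the proxy signal is unevenly distributed across groups, which is the point O’Neil makes.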
Fry also covers algorithmic bias and asserts that “wherever you look, in whatever sphere you examine, if you delve deep enough into any system at all, you’ll find some kind of bias.” We aren’t perfect—and we shouldn’t expect our algorithms to be perfect, either.
In order to have a conversation about the value of an algorithm versus a human in any decision-making context, we need to understand, as Fry explains, that “algorithms require a clear, unambiguous idea of exactly what we want them to achieve and a solid understanding of the human failings they are replacing.”
Garbage in, garbage out
No algorithm is going to be successful if the data it uses is junk. And there’s a lot of junk data in the world. The problem is far from new: Criado Perez argues that “most of recorded human history is one big data gap.” And that has a serious negative impact on the value we are getting from our algorithms.
Criado Perez explains the situation this way: We live in “a world [that is] increasingly reliant on and in thrall to data. Big data. Which in turn is panned for Big Truths by Big Algorithms, using Big Computers. But when your data is corrupted by big silences, the truths you get are half-truths, at best.”
A common human bias is assuming that our own experience is universal. We tend to assume that what is true for us is generally true across the population. We have a hard enough time considering how things may be different for our neighbors, let alone for other genders or races. It becomes a serious problem when we gather data about one subset of the population and mistakenly assume that it represents all of the population.
For example, Criado Perez examines how the data gap leads to incorrect information being used in decisions about safety and women’s bodies. From personal protective equipment like bulletproof vests that don’t fit properly, and thus increase the chances of the women wearing them getting killed, to levels of exposure to toxins that are unsafe for women’s bodies, she makes the case that without representative data, we can’t get good outputs from our algorithms. She writes that “we continue to rely on data from studies done on men as if they apply to women. Specifically, Caucasian men aged twenty-five to thirty, who weigh 70 kg. This is ‘Reference Man’ and his superpower is being able to represent humanity as a whole. Of course, he does not.” Her book covers a wide variety of disciplines and situations in which the gender gap in data leads to worse outcomes for women.
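A small numeric sketch shows why a value derived from one subgroup serves the rest of the population poorly. The measurements below are synthetic and made up for illustration; they are not taken from either book, and `group_a` and `group_b` are hypothetical names for a studied group and an unstudied one.

```python
from statistics import mean

# Synthetic, made-up measurements (say, a body dimension used to size equipment).
group_a = [70, 72, 68, 71, 69]  # the only group that was studied
group_b = [58, 60, 62, 59, 61]  # the group left out of the data

# A "reference" value built from group A alone versus one built from everyone.
reference_only = mean(group_a)
representative = mean(group_a + group_b)


def mean_abs_error(estimate, values):
    """Average distance between a single design value and the people it must fit."""
    return mean(abs(v - estimate) for v in values)


error_b_reference = mean_abs_error(reference_only, group_b)
error_b_representative = mean_abs_error(representative, group_b)

print(reference_only, representative)
print(error_b_reference, error_b_representative)
```

In this toy data, the design value built from group A alone misses group B by twice as much as the value built from both groups. A single average is still a crude fix, so a real design would more likely model the groups separately.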
The limits of what we can do
Although there is a lot we can do better when it comes to designing algorithms and collecting the data sets that feed them, it’s also important to consider their limits.
We need to accept that algorithms can’t solve all problems, and there are limits to their functionality. In Hello World, Fry devotes a chapter to the use of algorithms in justice, specifically algorithms designed to provide information to judges about the likelihood of a defendant committing further crimes. Our first impulse is to say, “Let’s not rely on bias here. Let’s not have someone’s skin color or gender be a key factor for the algorithm.” After all, we can employ that kind of bias just fine ourselves. But simply writing bias out of an algorithm is not as easy as wishing it so. Fry explains that “unless the fraction of people who commit crimes is the same in every group of defendants, it is mathematically impossible to create a test which is equally accurate at predicting across the board and makes false positive and false negative mistakes at the same rate for every group of defendants.”
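The arithmetic behind that impossibility can be sketched in a few lines. This is one way to read Fry's statement, not her own derivation: it assumes "equally accurate at predicting" means equal precision (the share of people flagged as high risk who do go on to reoffend). From the definitions of the rates, the false positive rate is then fixed by the base rate, the precision, and the false negative rate. The base rates and other numbers below are hypothetical.

```python
def false_positive_rate(base_rate: float, precision: float,
                        false_negative_rate: float) -> float:
    """False positive rate implied by the other three quantities.

    Follows from precision = TP / (TP + FP), with
    TP = (1 - FNR) * base_rate and FP = FPR * (1 - base_rate).
    """
    true_positive_rate = 1.0 - false_negative_rate
    return (base_rate / (1.0 - base_rate)
            * (1.0 - precision) / precision
            * true_positive_rate)


# Hypothetical groups: same precision and same false negative rate,
# but different fractions of people who reoffend.
fpr_a = false_positive_rate(base_rate=0.5, precision=0.8, false_negative_rate=0.2)
fpr_b = false_positive_rate(base_rate=0.2, precision=0.8, false_negative_rate=0.2)

print(round(fpr_a, 3), round(fpr_b, 3))  # about 0.2 versus about 0.05
```

With precision and the false negative rate held equal, the group with the higher base rate is forced into a higher false positive rate. The three measures can only be equal across groups when the base rates are equal, which is the condition in Fry's sentence.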
Fry comes back to such limits frequently throughout her book, exploring them in various disciplines. She demonstrates to the reader that “there are boundaries to the reach of algorithms. Limits to what can be quantified.” Perhaps a better understanding of those limits is needed to inform our discussions of where we want to use algorithms.
There are, however, other limits that we can do something about. Both authors make the case for more education about algorithms and their input data. Lack of understanding shouldn’t hold us back. Algorithms that have a significant impact on our lives, in particular, need to be open to scrutiny and analysis. If an algorithm is going to put you in jail or affect your ability to get a mortgage, then you ought to have access to it.
Most algorithm writers and the companies they work for wave the “proprietary” flag and refuse to open themselves up to public scrutiny. Many algorithms are a black box—we don’t actually know how they reach the conclusions they do. But Fry says that shouldn’t deter us. Pursuing laws (such as the data access and protection rights being instituted in the European Union) and structures (such as an algorithm-evaluating body playing a role similar to the one the U.S. Food and Drug Administration plays in evaluating whether pharmaceuticals can be made available to the U.S. market) will help us decide as a society what we want and need our algorithms to do.
Where do we go from here?
Algorithms aren’t going away, so it’s best to acquire the knowledge needed to figure out how they can help us create the world we want.
Fry suggests that one way to approach algorithms is to “imagine that we designed them to support humans in their decisions, rather than instruct them.” She envisions a world where “the algorithm and the human work together in partnership, exploiting each other’s strengths and embracing each other’s flaws.”
Part of getting to a world where algorithms provide great benefit is to remember how diverse our world really is and make sure we get data that reflects the realities of that diversity. We can either actively change the algorithm or change the data set. And if we do the latter, we need to make sure we aren’t feeding our algorithms data that, for example, excludes half the population. As Criado Perez writes, “when we exclude half of humanity from the production of knowledge, we lose out on potentially transformative insights.”
Given how complex the world of algorithms is, we need all the amazing insights we can get. Algorithms themselves perhaps offer the best hope, because they have the inherent flexibility to improve as we do.
Fry gives this explanation: “There’s nothing inherent in [these] algorithms that means they have to repeat the biases of the past. It all comes down to the data you give them. We can choose to be ‘crass empiricists’ (as Richard Berk put it) and follow the numbers that are already there, or we can decide that the status quo is unfair and tweak the numbers accordingly.”
We can get excited about the possibilities that algorithms offer us and use them to create a world that is better for everyone.