The Faults in “Big Data”

Cathy O’Neil’s book Weapons of Math Destruction delves into the idea that mathematical models aren’t as objective as we think. O’Neil calls the worst of these models “weapons of math destruction,” or WMDs for short. She defines them as algorithms that attempt to quantitatively rank characteristics such as teaching skill or creditworthiness, but that in practice produce harmful outcomes and deepen the inequality that permeates society; these algorithms keep the rich rich and the poor poor.

O’Neil opens with a story: teachers in a poor school district are rated by an algorithm that determines which of them are effective, and thus which ones get fired. One of the main inputs to this mathematical model is student performance on standardized tests; specifically, whether each student’s scores hold steady or rise from year to year once the difficulty of the tests is adjusted for. There are obvious problems with this: for one, students’ experiences outside the classroom have a significant impact on their in-class performance. Despite its flaws, though, the model was accepted as a fair way to evaluate teachers, on the grounds that it would sidestep humans’ natural biases. Sentiments changed when it led to many popular educators being fired. Sarah Wysocki was one of them, and she investigated the standardized tests her students had taken. At the beginning of the year, she had been pleasantly surprised that the incoming fifth graders had scored very well, with 29% of them reading at an “advanced” level. But when those same students struggled to read simple sentences in her class, she grew suspicious. O’Neil writes that a high rate of erasures on the previous year’s answer sheets suggested that teachers had fudged the scores to avoid being fired themselves. That left Wysocki responsible for maintaining year-to-year gains for students who were never at the level their tests claimed.
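To make the model’s mechanics concrete, here is a minimal sketch in Python of the kind of difficulty-adjusted, year-over-year score it relies on. This is my own toy illustration, not the proprietary formula from the book (which was never disclosed), and every number in it is invented:

```python
# Toy value-added score: the average change in students' test scores
# from one year to the next, with a flat adjustment for how much
# harder this year's test was. NOT the actual model from the book.

def value_added(prior_scores, current_scores, difficulty_adjustment):
    """Mean year-over-year score change, adjusted for test difficulty."""
    deltas = [
        (curr - prior) + difficulty_adjustment
        for prior, curr in zip(prior_scores, current_scores)
    ]
    return sum(deltas) / len(deltas)

# If last year's scores were inflated (say, by erasures), an honest
# teacher inherits an impossible baseline and scores badly.
inflated_prior = [85, 90, 88]   # fudged upward the previous year
honest_current = [70, 74, 72]   # the students' real level this year

print(value_added(inflated_prior, honest_current, difficulty_adjustment=2.0))
# => roughly -13.67, and the teacher is flagged as ineffective
```

The sketch shows exactly why inflated prior-year scores doomed Wysocki: the model cannot tell the difference between a teacher who failed her students and a baseline that was never real.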

How does this relate to WMDs? O’Neil writes,

“After the shock of her firing, Sarah Wysocki was out of a job for a few days. She had plenty of people, including her principal, to vouch for her as a teacher, and she promptly landed a position at a school in an affluent district in northern Virginia. So thanks to a highly questionable model, a poor school lost a good teacher, and a rich school, which didn’t fire people on the basis of their students’ scores, gained one.” 

This story illustrates the “dark side” of data science: one that fuels inequality across communities worldwide. 

As mathematical models spread into every domain, from banking to the justice system, O’Neil identifies three criteria that make a model a WMD: opacity, scale, and damage. In other words, the actual workings of the algorithm are secret, it affects a large number of people, and a bad score can wreck lives and deepen economic inequality. Despite these flaws, she argues that it would be nearly impossible to simply rip WMDs out of American society, because they are embedded in so many institutions. So the goal shouldn’t be to get rid of them. Instead, the biases can be audited out, so that rather than amplifying inequality, the models do what they were intended to do: help people.
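O’Neil’s three-part test is simple enough to state as code. Here is a tiny Python predicate capturing it; the field names are my own shorthand, not terminology from the book:

```python
from dataclasses import dataclass

@dataclass
class Model:
    opaque: bool      # are the inner workings secret, even to the scored?
    widespread: bool  # does it score large numbers of people?
    damaging: bool    # can a bad score seriously harm someone's life?

def is_wmd(m: Model) -> bool:
    """O'Neil's three criteria: opacity, scale, and damage."""
    return m.opaque and m.widespread and m.damaging

# The teacher-scoring model from the opening story checks all three boxes.
teacher_scoring = Model(opaque=True, widespread=True, damaging=True)
print(is_wmd(teacher_scoring))  # True
```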

I recently read an article in my computer science class about the ethics of data science that resonated with my experience reading this book. It told the story of an African American woman who sought medical care for an illness, and a model was used to predict how much health care she would need. The primary input to that model, however, was the amount of money spent on health care in previous years, presumably on the theory that higher spending indicates greater need. Looking at the data, though, the model was biased: African Americans on average spent less on health care than others with the same level of need, which ties into the idea that racial inequality and economic inequality are linked. As a result, the woman was allotted less care than she actually needed, which could have led to catastrophic consequences.
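To see how a spending proxy encodes that bias, here is a stripped-down sketch. All the numbers are invented and the real model was certainly more complicated; the point is only the mechanism:

```python
# Two patients with the SAME underlying need, but one has historically
# had less access to care and so spent less. A model that predicts
# *spending* reads that access gap as a difference in need.

def predicted_need(past_spending, dollars_per_unit_of_need=1_000):
    """Proxy model: infer care needed from what the patient spent before."""
    return past_spending / dollars_per_unit_of_need

true_need = 10                # both patients need 10 units of care
patient_a_spending = 10_000   # full access to care in prior years
patient_b_spending = 6_000    # same need, less access, lower spending

print(predicted_need(patient_a_spending))  # 10.0 -> need fully met
print(predicted_need(patient_b_spending))  #  6.0 -> under-allocated care
```

The model is “accurate” at predicting spending, yet it systematically shortchanges the patients who were already underserved.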

Although data analysis is often helpful, we must be careful that our machines don’t inherit the same biases we hold. Perhaps mathematical modeling is the way of the future, but there’s a lot to iron out before then.
