Book Review - Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
- Ganesh Sreeramulu
- May 10
Updated: May 17

Disclaimer: What follows began as a book review, but quickly became an amalgamation of my own experience, both personal and professional (in an omnipresent IT services industry), along with the elements of the book that I could relate to.
I picked up this book on the recommendation of a colleague (whom I am yet to meet; our interaction has been entirely virtual to date).
The who’s who of the IT industry, be it the multi-billion-dollar darlings of the stock market or the ever-mushrooming start-ups, can all be heard hawking their Big Data (mal)wares. It is pretty much the “snake oil” of the modern IT world, the one “silver bullet” to cure all real-world problems. One single silver bullet.
Yes, there is definitely an element of truth to the potential of Big Data, but it has been hyped well beyond comprehension. In the past few years I have had the opportunity to work with clients across domains who are ever so gingerly taking their baby steps into the world of Big Data. A vast majority of them look upon Big Data platforms with awe. They want to jump on the bandwagon and create a “Data Lake” for their future Big Data initiatives.
Heard these somewhere?
· Customer micro-segmentation? Big Data is the way to go. Let’s build a Data Lake
· Fraud analytics? Big Data is the way to go. Let’s build a Data Lake
· Improve sales? Big Data is the way to go. Let’s build a Data Lake
· Operational efficiency? Big Data is the way to go. Let’s build a Data Lake
.
.
.
· Online dating? Big Data is the way to go. Let’s build a Data Lake
· Influence election campaigns? Big Data is the way to go. Let’s build a Data Lake
· Citizen programs? Big Data is the way to go. Let’s build a Data Lake (And in this case, give admin access to this treasure trove to a person who gladly sells it for $8. You would have thought it would be worth more, much more. But as it turns out, just $8 was sufficient.)
While Data Lakes are all the rage these days, companies seldom start with a clear vision of what they want to accomplish with one, or the steps to realize that vision. One may argue that with ever-decreasing entry barriers into the Big Data Super Bowl (fueled by commodity servers driving infrastructure costs down significantly), it may still be wise to create a massive dump (yes, a dump it is, until the point you start squeezing intelligence out of it) and then wait for the “Eureka” moment to figure out what to do with it.
But with multiple glitches creeping into the system, these Data Lakes often end up morphing into the epitome of “garbage in, garbage out”. And these systems are far-reaching, with the ability to critically influence outcomes: background checks for jobs, health and auto insurance premiums (or, in some cases, eligibility for insurance at all), financing and refinancing (remember the sub-prime crisis, anyone?), even jail sentences (yes, Big Data is used to determine the severity of sentences; and by the way, many jails in the US are owned and run by private entities. Get the drift?)
Is this fear misplaced? No, not really. Below are some of the key culprits highlighted by Cathy O’Neil that push such Big Data initiatives into a nefarious loop, which only furthers inequality.
1. Human Biases: At the end of the day, it is humans who model these systems, and their inherent biases also make their way into the system (intentionally or unintentionally). What are these biases? Race, gender, color, city and so on. One might argue that these biases were present in the pre-digital era too. True, but in the pre-digital era, these biases, taken across a wider pool, tended to counterbalance and even out (well, at least to some extent). To quote the author:
“Human decision making, while often flawed, has one chief virtue. It can evolve”
But in this Big Data-driven world, such a simple bias, once codified into a model, can be applied to an extremely large set of people with great efficiency. And that, ladies and gentlemen, is what a scary future looks like.
This resonated with my own experience in 2005, when I was hunting for a student loan to pursue my MBA. Armed with a certificate from my prospective university stating that I was a merit candidate with a 25% scholarship (based on their entrance examinations), I approached nearby banks under the illusion of smooth sailing ahead. Then reality struck: the manager of the first bank spent a grand total of roughly five minutes with me, decided I was a bad bet, and turned down my loan. The human bias that went against me: I couldn’t show collateral for the loan (property, investment policies in my name and so on). And I being a first-generation literate, my parents couldn’t conjure up compelling credentials on their behalf either (educational qualifications and the like). This, by the way, was after the Indian Government had already announced that collateral wasn’t required for education loans (hence I call it a human bias on the bank manager’s part). So much for government policies.
Luckily for me, a business associate of my father’s was able to write a recommendation letter for me to a bank they were already doing business with. This opened the gates, and I did get the loan. The whole process is very similar to the pre-digital era, where you would get a job only if you knew of an opening via an insider who was also willing to recommend you.
In the case of my education loan, human biases eventually cancelled out as I approached other sources of funds, which considered other data points and granted me the loan. Now imagine my scenario in the digital world: if the manager who turned down my loan happened to be the one providing critical inputs to the Big Data-fueled model that discerns a person’s creditworthiness, then I, and all others like me, would be doomed. We would be refused a loan not just at that branch but at every branch of the bank across India, all of which would evaluate us on the same model. Admittedly, there is a chance the model would de-risk the bank from bad debts and the manager would be lauded for it, but it would also inadvertently increase the inequality in the system.

So you see, these Big Data models can exponentially magnify a human bias, and do so at scale with relative ease. This, for me, is scary. Very scary.
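To make that concrete, here is a minimal Python sketch; the names, data and decision rule are entirely my own hypothetical illustration, not anything taken from the book. It shows how one manager’s heuristic, once codified, gets applied identically to every applicant at every branch:

```python
# Toy illustration (hypothetical data and rule): one manager's heuristic,
# once codified, is applied to every applicant at every branch, so a
# single bias is replicated at scale.

applicants = [
    {"name": "A", "has_collateral": False, "merit_scholarship": True,  "branch": "Chennai"},
    {"name": "B", "has_collateral": True,  "merit_scholarship": False, "branch": "Mumbai"},
    {"name": "C", "has_collateral": False, "merit_scholarship": True,  "branch": "Delhi"},
]

def managers_rule(applicant):
    """The one biased heuristic: no collateral means no loan,
    regardless of merit or of policy saying collateral is not required."""
    return applicant["has_collateral"]

# Pre-digital era: each branch manager judges differently, so one manager's
# bias does not decide every application in the country.
# Codified era: the same rule runs everywhere, identically.
for applicant in applicants:
    decision = "approve" if managers_rule(applicant) else "reject"
    print(f"{applicant['branch']}: {applicant['name']} -> {decision}")
# Every branch now rejects A and C for the same reason, every time.
```

The point is not the three-line rule itself but its reach: in the pre-digital era a second manager might have judged differently, whereas the codified rule admits no second opinion.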
2. Corrupt Data: It is not just human bias that we are up against. Good data is a unicorn. The problem is not information paucity but information accuracy: we have excessive information, but not necessarily the accurate information required to train our models. Much of the existing data, say the probability or correlation of a person from location X being selected for a job Y, is already the systematic end result of previous human biases (not all of it, but a good majority suffers from this). This is further accentuated by the fact that in many areas you cannot actually monitor or track the exact data that is required (due to government regulations, the high cost of collecting it, and so on) and must therefore rely on “proxies”. This compounds the probability of corrupt data even more.
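A toy sketch of that proxy problem, with entirely made-up numbers: if the historical approval labels were produced by a biased process, then a model trained to imitate them through a proxy such as pincode simply carries the old bias forward.

```python
# Toy sketch (made-up numbers): the historical labels already encode a
# bias, and the proxy feature (pincode) smuggles it into the new model.

import random

random.seed(42)

def historical_decision(pincode):
    """Hypothetical past process: applications from pincode '600001'
    were approved far less often, for reasons unrelated to repayment."""
    base_rate = 0.3 if pincode == "600001" else 0.8
    return random.random() < base_rate

# Build a "training set" from those historical decisions.
records = [(pincode, historical_decision(pincode))
           for pincode in ["600001", "600042"] * 500]

# Any model trained to imitate these labels simply learns the old bias:
for target in ["600001", "600042"]:
    approvals = [approved for pin, approved in records if pin == target]
    print(target, "historical approval rate:",
          round(sum(approvals) / len(approvals), 2))
# Garbage in, garbage out: the proxy carries the bias forward.
```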
3. Simpson’s Paradox: Simpson’s paradox has to do with data interpretation, and is again a human error that the system can multiply to great effect. It arises when a whole body of data displays one trend, yet when broken into subgroups, the opposite trend emerges within each subgroup. A good data scientist or statistician always accounts for this and uses “stratification” effectively. But a vast majority are willing to jump to the conclusion if it matches their initial hypothesis; a kind of “expectation bias”, if you will.
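A worked example of the paradox, with invented admission numbers loosely in the spirit of the classic Berkeley admissions case: each subgroup shows one trend, the aggregate shows the reverse, and only stratifying by department reveals it.

```python
# Worked toy example of Simpson's paradox (made-up numbers): within each
# department women are admitted at a HIGHER rate than men, yet the
# aggregate makes it look like the opposite.

admissions = {
    # department: (applied_men, admitted_men, applied_women, admitted_women)
    "Engineering": (800, 480, 100, 70),   # men 60%, women 70%
    "Humanities":  (200, 20,  900, 180),  # men 10%, women 20%
}

def rate(admitted, applied):
    return admitted / applied

men_applied = men_admitted = women_applied = women_admitted = 0
for dept, (am, adm_m, aw, adm_w) in admissions.items():
    print(f"{dept}: men {rate(adm_m, am):.0%}, women {rate(adm_w, aw):.0%}")
    men_applied += am; men_admitted += adm_m
    women_applied += aw; women_admitted += adm_w

# The aggregated ("un-stratified") view reverses the trend:
print(f"Overall: men {rate(men_admitted, men_applied):.0%}, "
      f"women {rate(women_admitted, women_applied):.0%}")
# Each department favours women, yet overall men land at 50% vs women at 25%,
# because women applied mostly to the more selective department.
```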

4. No Feedback Loop: Nobody gets a model right the very first time. Don’t let anybody tell you otherwise. A good data scientist needs constant feedback from the real world to keep tweaking the model. Yet a model is seldom monitored consistently for effectiveness against real-world outcomes. Looking ahead, with more behavioral data being pumped into AI-driven decision engines, these systems are slowly but surely becoming opaque black boxes to humans, and with that we lose the ability to question the model or its outcome and provide a valuable feedback loop.
All four of the above vital elements feed off each other and can quickly turn a Big Data model into one mammoth nefarious loop.
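Here is a minimal sketch of what such a feedback loop could look like in code; the class name, window size and accuracy threshold are my own illustrative choices, not anything prescribed by the book.

```python
# Minimal sketch (hypothetical names and thresholds) of a feedback loop:
# keep scoring the model against what actually happened in the real
# world, and flag it for human review when it drifts.

from collections import deque

class OutcomeMonitor:
    """Tracks recent (prediction, real-world outcome) pairs and raises a
    flag when accuracy over the window drops below a chosen threshold."""

    def __init__(self, window=100, min_accuracy=0.8):
        self.window = deque(maxlen=window)
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        self.window.append(predicted == actual)

    def needs_review(self):
        if len(self.window) < self.window.maxlen:
            return False  # not enough real-world evidence yet
        accuracy = sum(self.window) / len(self.window)
        return accuracy < self.min_accuracy

# Usage: whenever a real-world outcome arrives (say, a loan repaid or
# defaulted), feed it back in; a raised flag should send humans back to
# question and retune the model.
monitor = OutcomeMonitor(window=100, min_accuracy=0.8)
monitor.record(predicted=True, actual=False)
if monitor.needs_review():
    print("Model drifting from reality; trigger a human review / retrain.")
```

The specific numbers do not matter; what matters is that real-world outcomes flow back into the evaluation at all, so the model never becomes an unquestioned black box.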
When designing Big Data systems, always remember what Uncle Ben said: “With great power comes great responsibility.” After all, he had enough experience with an equally intricate web-spinning protagonist.
For more on the book: https://weaponsofmathdestructionbook.com