The first rule of Big Data
- Posted on October 25, 2016
The following blog post was written by Avanade alum Jamal Khawaja.
How many of you have watched the movie Fight Club? Great movie, replete with violence, ennui, and a lament for the narcissistic failings of the baby boomers. Anyway, there’s a scene in the beginning of the movie where Edward Norton talks about how he has to assess risk associated with car accidents for an insurance company. His job is to figure out whether a recall is the right decision for an auto manufacturer to make – from a financial perspective. Using actuarial tables, he identifies the company’s financial exposure from accidents and compares this to the costs of a recall. If a problem will result in $50,000 per claim against 1,000 total claims ($50,000,000), and the recall cost is $75,000,000, then they don’t authorize a recall. Financially, it makes more sense to let people die in accidents than to correct the problem.
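The recall math above boils down to a single comparison. Here is a minimal sketch of it in Python – the function name and signature are my own invention, not anything from the movie or an actual actuarial system:

```python
def should_recall(cost_per_claim: float, expected_claims: int, recall_cost: float) -> bool:
    """Return True only when the expected claim payouts exceed the recall cost.

    This is the cold-blooded decision rule described above: the recall is
    'worth it' financially only if paying out claims would cost more.
    """
    expected_payout = cost_per_claim * expected_claims
    return expected_payout > recall_cost

# The numbers from the example: $50,000 per claim x 1,000 claims = $50M
# in exposure, against a $75M recall. No recall is authorized.
print(should_recall(50_000, 1_000, 75_000_000))  # False
```

Flip any one input – say, double the expected claims – and the same formula authorizes the recall, which is exactly the point: human lives enter the decision only as a dollar figure.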
Hello, weird precursor to Big Data.
This is a nuanced but substantive problem associated with Big Data – a company has so much information that morality, ethics, and even the rule of law are subsumed to analytical realities. Let’s fast forward ten years into the future. Big Data is 10,000x more advanced. Companies use it to measure and manage customer satisfaction in every facet of their organizations. An imaginary company, BigCorp, is reviewing customer service costs. After correlating data from Facebook and its own returned merchandise receipts, it has found that Chinese consumers are 60% more likely to complain on the internet during a product return than everyone else. This information is an algorithmic compendium of data captured from credit card transactions, age/sex/racial data compiled from loyalty programs, and negative feedback from Facebook posts, tweets, and employee-documented information from the return itself. Consequently, a policy is issued for customer service departments of BigCorp located in Chinatowns of major cities to ensure that the return process is quick and painless. Well, as someone who is not from China, I don’t think that seems fair.
Here’s another angle: let’s say BigCorp determines that white, under-30 females in Louisiana complain on the internet about returns only 15% of the time. Sales data suggests that most returned products are baby carriages, baby onesies, and women’s shoes. Loyalty cards indicate purchases are made by married white women who make under $50,000/yr. The sales for this demographic are 15% of total sales. However, the costs associated with their returns are 75% of all returned products. Consequently, BigCorp issues a policy in Louisiana to make the return process more cumbersome.
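The two BigCorp scenarios follow the same pattern: segments that complain loudly get a smoother process, and quiet segments whose returns cost more than their sales justify get friction. A hypothetical sketch of that decision rule – every field name and threshold here is invented for illustration, not taken from any real system:

```python
def return_policy(segment: dict) -> str:
    """Pick a return-process policy for a customer segment using nothing
    but its complaint rate and its cost/sales profile -- the kind of
    rule the post is warning about."""
    if segment["complaint_rate"] >= 0.60:
        return "fast-track"    # loud segments get a quick, painless process
    if segment["return_cost_share"] > segment["sales_share"]:
        return "cumbersome"    # quiet but costly segments get friction
    return "standard"

# The Louisiana numbers from above: 15% complaint rate, 15% of sales,
# 75% of return costs -> the algorithm recommends a cumbersome process.
louisiana = {"complaint_rate": 0.15, "sales_share": 0.15, "return_cost_share": 0.75}
print(return_policy(louisiana))  # cumbersome
```

Note that race, gender, and income never appear in the rule itself – the discrimination rides in on the segmentation, which is what makes it so hard to spot.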
What we are looking at are cultural clichés validated by empirical data. Once we overlay Big Data against predictive analytics, the result is (apologies for the pun) predictable. Customer segmentation is no longer just a function of increasing sales; it becomes a relevant factor in policies that discriminate between racial, gender, age, or socio-economic classes.
In America, we have laws that mitigate the effect of relevant data on the employment process, under the legal doctrine of “disparate impact.” For instance, it is illegal to refuse to hire a woman because she is pregnant. Similarly, you cannot refuse service to someone based upon a protected class. Even if you have empirical data to suggest that gay males are more likely to complain about service on social media than other customers, you cannot refuse to sell them a product.
But can you do the opposite? Is it fair/legal to provide a protected class advantageous policies like reward programs, special discounts/promotions, or other preferential treatment? Should pregnant women get mommy discounts? Should gay males receive special perks in order to mitigate the effects of negative social media comments? After all, there is a difference between fair and profitable. Companies exist to make a profit. Although it is illegal to discriminate against a protected class, should it be illegal to give preferential treatment to one if it will result in higher profits?
I don’t know what the first rule of Big Data is, because there are no real rules for Big Data. It is a sea of information that can be used for the forces of good or evil. We need to start thinking about Big Data issues as they relate to how our society functions and is governed: security, privacy, and advantageous or discriminatory policies. The potential for the abuse of Big Data is enormous, even as it promises a better tomorrow.