In his work, “Everybody Lies,” Seth Stephens-Davidowitz presents an intriguing exploration into the world of big data, using it as a lens to better understand human nature. It’s a compelling synthesis of data science’s early achievements, although it arguably falls short in discussing the potential pitfalls of this field. With a writing style that’s both accessible and engaging, he successfully demystifies complex concepts, making them digestible for readers with varying levels of familiarity with data science.
Stephens-Davidowitz began his journey as a data scientist with Google Trends, leveraging its ability to track how often particular searches are made in different locations at different times. He later expanded his sources to include Google AdWords, Wikipedia, Facebook, and even the adult entertainment site Pornhub. This last source, in particular, provided him with invaluable, anonymized data on human sexuality that philosophers and thinkers such as Schopenhauer, Nietzsche, Freud, and Foucault would have eagerly sought.
The author uncovered some shocking truths through his analysis. For instance, he found a 30% spike in searches for racist jokes on Martin Luther King Jr. Day in the US, and he identified a correlation between regions that heavily supported Donald Trump during the Republican primaries and the frequency of racially derogatory Google searches. His research also highlighted intriguing patterns in loan applications on the Prosper website, where words and phrases such as “God”, “promise”, “will pay”, “hospital”, and “thank you” were identified as potential red flags.
The book outlines four primary advantages of big data for social scientists, highlighting its role as a “digital truth serum,” its capacity to facilitate large-scale experiments, its precision in focusing on small population subsets, and its potential to provide new types of data. These insights were particularly appealing to me as an aspiring data scientist, providing a positive outlook on the field’s potential.
I was also intrigued by the book’s exploration of how perceptions of a different group of people can be shaped through associations and by provoking curiosity, rather than through overt statements about how people ought to feel. Equally intriguing is the proposal to improve medical diagnostics by comparing a patient’s (confidential) medical data with that of others who share similar characteristics.
I was not entirely persuaded by the method used to track what people are interested in, such as the author’s analysis of Pornhub data, which is presented as showing “what people really want and really do, not what they say they want and say they do.” Just because people search for something online doesn’t mean they want the same thing in real life. For instance, a person who enjoys action movies doesn’t necessarily want to be involved in a high-speed car chase. Search data also says little about intent: should a journalist who investigates sensitive topics like drug trafficking, illegal wildlife trade, or cyber warfare be concerned about law enforcement knocking on their door? It’s hard to differentiate between a scholarly search and one made out of curiosity or for darker purposes.
Stephens-Davidowitz remains optimistic about big data’s potential to revolutionize social science and improve our lives. He believes that the raw and honest information harvested from our digital footprints could lead to a greater understanding of human behavior. However, his perspective largely overlooks the potential negatives, including the misuse of big data in predatory advertising and the undermining of democracy, and the ethical implications and privacy concerns raised by such data remain unexplored. A balanced examination that considers both the tremendous potential and the possible downsides of this transformative technology is needed.