The Exaggerated Promise of So-Called Unbiased Data Mining

The Feynman trap, ransacking data for patterns without any preconceived idea of what one is looking for, is the Achilles heel of studies based on data mining.

In 2018, a Yale economics professor and a graduate student calculated correlations between daily changes in Bitcoin prices and hundreds of other financial variables.

In a 2016 blog post titled “The Grad Student Who Never Said No,” another professor wrote about a PhD student who had been given data collected at an all-you-can-eat Italian buffet.

Email correspondence surfaced in which the professor advised the graduate student to separate the diners into “Males, females, lunch goers, dinner goers, people sitting alone, people eating with groups of 2, people eating in groups of 2+, people who order alcohol, people who order soft drinks, people who sit close to buffet, people who sit far away, and so on.” Then she could look at different ways in which these subgroups might differ: “# pieces of pizza, # trips, fill level of plate, did they get dessert, did they order a drink, and so on.”

Data mining just looks for patterns and inevitably finds some.
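Why "inevitably"? A small simulation makes the point concrete. The sketch below is not taken from any of the studies mentioned here; it is a minimal illustration under stated assumptions. It correlates one series of pure random noise against hundreds of other series of pure random noise and counts how many clear a conventional 5 percent significance bar. The sample size (250 days), the number of candidate variables (500), the rough cutoff of 1.96 divided by the square root of the sample size, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(n_days=250, n_variables=500, seed=0):
    """Count noise variables that look 'significantly' correlated with a noise target.

    Every series is independent random noise, so any correlation that clears
    the cutoff is a coincidence by construction.
    """
    rng = random.Random(seed)
    # Approximate two-sided 5% cutoff for |r| when there is no true relationship
    # (an assumption: r is treated as roughly normal with sd 1/sqrt(n)).
    cutoff = 1.96 / math.sqrt(n_days)
    target = [rng.gauss(0, 1) for _ in range(n_days)]
    hits = 0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_days)]
        if abs(pearson(target, candidate)) > cutoff:
            hits += 1
    return hits


def chance_of_any_hit(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests is a false positive."""
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    print("spurious 'discoveries':", count_spurious())
    print("chance of at least one in 500 tests:", chance_of_any_hit(500))
```

With a 5 percent bar, roughly one candidate in twenty should pass by luck alone, so a search across hundreds of variables is all but certain to turn up something, even though nothing in the data is related to anything else.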

Data miners have found correlations between Twitter words or Google search queries and criminal activity, heart attacks, stock prices, election outcomes, Bitcoin prices, and soccer matches.

Finding an unusual pattern in Big Data is no more convincing than finding an unusual license plate outside Feynman’s classroom.
