Kaggle’s algorithms show machines are getting too good at judging humans

Kaggle, a San Francisco-based startup that hosts data science competitions, has uncovered some disconcerting insights about human behavior in its two-year run. At times, its founders have been surprised by the accuracy of an algorithm, and the competitions continue to spark controversy.

In short, data can be dangerous. I caught up with the company’s founder and CEO, Anthony Goldbloom, to find out more about recent data-driven discoveries that have rocked the boat.

1) “The Essay Scoring Competition”

Sponsor: Hewlett Foundation / Prize: $100,000
Goal: To get the computer to give an essay the same score a human grader would.

The idea was that by analyzing spelling and punctuation, as well as sentence structure, an algorithm could give an essay a reliable score, perhaps even more consistent than a human grader.
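To make that concrete, here is a minimal sketch of the kind of surface features such a scorer might extract from an essay. This is purely illustrative; the function name and feature set are assumptions, not the actual Kaggle winners' model, which used far richer features and trained statistical models.

```python
import re

def essay_features(text):
    """Extract simple surface features (word counts, sentence lengths,
    punctuation) of the kind an automated essay scorer might feed into
    a statistical model. Illustrative only, not the competition's model."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "avg_word_length": sum(map(len, words)) / max(len(words), 1),
        "comma_count": text.count(","),
    }
```

In a real system, features like these would be fed to a regression or ranking model trained on human-assigned scores, which is what makes the machine's scores as consistent as (or more consistent than) a single grader's.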

Martin O’Leary, a glacier scientist at the University of Michigan, was one of hundreds of competitors from around the world. He told Reuters that he discovered human graders are rarely in agreement: they are swayed by irrelevant, aesthetic factors like how neatly a student writes, and, unlike an algorithm, they award scores that can seem arbitrary.

“The reality is, humans are not very good at doing this,” said Steve Graham, a Vanderbilt University professor who has researched essay-grading techniques, in an interview with Reuters.

A controversial study, indeed. Reflecting on the competition, Goldbloom was not initially convinced it could be done. “I remember thinking: Are we going to be falling flat on our faces? It’s really hard to take an essay and give it a grade,” he recalled. The biggest obstacle was finding teams with the requisite machine learning expertise and the ability to handle unstructured data such as text.

As critics pointed out, it’s easy to outsmart an algorithm and optimize for success. In response to a New York Times article that advocated “obfuscating mellifluously” when facing a robo-grader (i.e., disarming it with big words), Goldbloom said the algorithm would need to be refreshed at least once a year.

Key insight: An algorithm is no less reliable at scoring essays than the average teacher. 

2) “The Twitter psychopathy competition”

Sponsor: Online Privacy Foundation / Prize: $1,000
Goal: To determine whether a person’s tweets can reveal psychopathy and other personality traits.

In two separate competitions, the goal was to analyze a person’s social media stream to detect personality type and risk for psychopathy. Can we get a sense of your personality from a single tweet?

The data, drawn from the tweets of 3,000 people, may surprise you. “It turns out that with Twitter data alone, we can go quite some way into figuring out someone’s personality,” said Goldbloom. The signals for psychopathy: good grammar, an angry tone, use of swear words and conjunctions.
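As a rough illustration of what “signals” means here, the sketch below counts a few of the surface features the article mentions in a single tweet. The word lists and function name are toy assumptions for demonstration; the actual competition models were trained on thousands of labeled users.

```python
# Toy word lists, purely illustrative.
SWEAR_WORDS = {"damn", "hell"}
CONJUNCTIONS = {"and", "but", "or", "because", "so"}

def tweet_signals(tweet):
    """Count simple surface signals in one tweet: swear words,
    conjunctions, and whether it replies to (references) another user.
    A hypothetical sketch, not the competition's model."""
    tokens = [t.strip(".,!?") for t in tweet.lower().split()]
    return {
        "swears": sum(t in SWEAR_WORDS for t in tokens),
        "conjunctions": sum(t in CONJUNCTIONS for t in tokens),
        "is_reply": tweet.startswith("@"),  # referencing others in a reply
    }
```

Aggregated over a user’s whole timeline, counts like these become the input features a classifier would correlate with the personality-test scores.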

Before their tweets were analyzed, the users completed a personality test. The algorithm quickly picked out the most extroverted of the lot: those who reference others in a reply are more likely to be extroverted. One competitor, a Japanese ad-targeting expert, found that openness is, ironically, the personality trait that is most difficult to detect.

Are you giving away more information about yourself than you realize? Goldbloom said this information could be used by prospective employers, as well as for targeted online advertising.

Key insight: With only 140 characters to work from, data scientists and statisticians can get a strong sense of your personality. That’s fairly worrying, considering that this information could fall into the wrong hands.
