Researchers at Carnegie Mellon’s Language Technologies Institute say they’ve developed a system that taps machine learning to analyze online comments and pick out those that defend or sympathize with disenfranchised peoples. Although it hasn’t been commercialized, they’ve used it in experiments to search through nearly a million YouTube comments, focusing on the Rohingya refugee crisis and the February 2019 Pulwama terrorist attack in Kashmir. And they hope it will form the foundation of a future system that reduces the manual effort necessary to curate comments on publisher websites, social media, and elsewhere.
Tamping down on abusive online behavior is no easy feat, particularly considering the level of toxicity in some social circles. More than one in five respondents to a survey by the Anti-Defamation League, a nonprofit that tracks and fights anti-Semitism, reported having been subjected to threats of violence. Nearly one in five said they’d experienced sexual harassment, stalking, or sustained harassment, while upwards of 20% said the harassment was the result of their gender identity, race, ethnicity, sexual orientation, religion, occupation, or disability.
As the researchers explain, improvements in AI language models — which learn from many examples to predict what words are likely to occur in a given sentence — made it possible to analyze such large quantities of text. The study’s contribution was a technique enabling those models to digest short texts originating from South Asia, which can be difficult to interpret because they often contain spelling and grammar mistakes and combine different languages and systems of writing.
Specifically, the researchers obtained embeddings — numerical representations of words — that revealed novel language groupings or clusters. Language models create these so that words with similar meanings are represented in the same way, making it possible to compute the proximity of a word to others in a comment or post.
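As a rough illustration (not the researchers’ actual code, and using made-up toy vectors rather than learned embeddings), computing the proximity of words from their embeddings typically comes down to a similarity measure such as cosine similarity:

```python
import math

# Toy word embeddings: invented 3-dimensional vectors for illustration only.
# Real language models learn vectors with hundreds of dimensions.
embeddings = {
    "refugee":   [0.9, 0.1, 0.3],
    "displaced": [0.8, 0.2, 0.4],
    "keyboard":  [0.1, 0.9, 0.0],
}

def cosine_similarity(u, v):
    """Proximity of two word vectors: closer to 1.0 means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Words with related meanings get nearby vectors, so similarity is high;
# unrelated words score much lower.
related = cosine_similarity(embeddings["refugee"], embeddings["displaced"])
unrelated = cosine_similarity(embeddings["refugee"], embeddings["keyboard"])
print(related, unrelated)
```

Clustering such vectors is what can reveal the kinds of language groupings the researchers describe, including comments that mix languages and writing systems.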
The team reports that in experiments, their approach worked as well as or better than commercially available solutions. In random samplings of the YouTube comments, only about 10% were positive, compared with 88% of those surfaced by the AI algorithm.
“Even if there’s lots of hateful content, we can still find positive comments,” said post-doctoral researcher Ashiqur R. KhudaBukhsh, a contributing author on a forthcoming paper describing the work. He’ll present his findings with coauthors Shriphani Palakodety and Jaime Carbonell at the Association for the Advancement of Artificial Intelligence annual conference next month in New York City.
The study follows the release of a data set by Jigsaw — the organization working under Google parent company Alphabet to tackle cyberbullying, censorship, disinformation, and other digital issues of the day — containing hundreds of thousands of comments annotated with toxicity and identity labels. It’s intended to help measure bias in AI comment classification systems, which Jigsaw and others have historically measured using synthetic data from template sentences.
In a related development, researchers at the Georgia Institute of Technology and email marketing startup Mailchimp recently proposed RECAST, an interactive tool for examining toxicity detection models by visualizing explanations for predictions and providing alternative wordings for detected toxic speech. They plan to release an open source browser extension in the near future.