Jigsaw releases data set to help develop AI that detects toxic comments

Mitigating prejudicial and abusive behavior online is no easy feat, given the level of toxicity in some communities. More than one in five respondents in a recent survey reported being subjected to physical threats, and nearly one in five experienced sexual harassment, stalking, or sustained harassment. Of those who experienced harassment, upwards of 20% said it was the result of their gender identity, race, ethnicity, sexual orientation, religion, occupation, or disability.

In pursuit of a solution, Jigsaw, which works under Google parent company Alphabet to tackle cyberbullying, censorship, disinformation, and other digital issues of the day, today released what it claims is the largest public data set of comments annotated with both toxicity labels and identity labels. It's intended to help measure bias in AI comment classification systems, which Jigsaw and others have historically measured using synthetic data built from template sentences.

"While synthetic comments are easy to create, they do not capture any of the complexity and variety of real comments from online discussion forums," wrote Jigsaw software engineers Daniel Borkan, Jeff Sorensen, and Lucy Vasserman in a Medium post. "By labeling identity mentions in real data, we are able to measure bias in our models in a more realistic setting, and we hope to enable further research into unintended bias across the field."

The corpus originates from a competition Jigsaw launched in April, which challenged entrants to build a model that recognizes toxicity and minimizes bias with respect to mentions of identities. The first release contained roughly 250,000 comments labeled for identity mentions: raters were asked to indicate whether a given comment referenced gender, sexual orientation, religion, race, ethnicity, disability, or mental illness. This version adds the individual annotations from almost 9,000 human raters, the judgments that effectively teach machine learning models the meaning of toxicity.

Each comment was shown to 3 to 10 human raters to obtain the annotations, though Jigsaw says that some comments were seen by as many as thousands of raters due to "sampling and strategies used to improve ... accuracy." The idea is that data scientists will train models on these annotations to predict the probability that an individual will find a given comment toxic. For instance, if 7 out of 10 people rate a comment as "toxic," a system might predict a 70% likelihood that someone will find the comment toxic.
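To make the arithmetic concrete, here is a minimal Python sketch that turns individual ratings into a per-comment toxicity fraction, the kind of value a model could be trained to predict. The `(comment_id, is_toxic)` record layout and the function name `toxicity_fraction` are hypothetical illustrations, not the actual schema or tooling of the released data set.

```python
from collections import defaultdict


def toxicity_fraction(annotations):
    """Return, per comment, the share of raters who labeled it toxic.

    `annotations` is an iterable of (comment_id, is_toxic) pairs, one per
    individual rating. This layout is a hypothetical stand-in, not the
    real data set's schema.
    """
    toxic = defaultdict(int)
    total = defaultdict(int)
    for comment_id, is_toxic in annotations:
        total[comment_id] += 1
        if is_toxic:
            toxic[comment_id] += 1
    # Fraction of "toxic" votes among all raters who saw each comment.
    return {cid: toxic[cid] / total[cid] for cid in total}


# Hypothetical ratings: comment "a" seen by 10 raters, 7 of whom said toxic;
# comment "b" seen by 4 raters, none of whom said toxic.
ratings = [("a", True)] * 7 + [("a", False)] * 3 + [("b", False)] * 4
scores = toxicity_fraction(ratings)
print(scores)  # {'a': 0.7, 'b': 0.0}
```

Keeping the fraction instead of a majority-vote label preserves how much raters disagreed, which is what lets a model output something like "70% likely to be seen as toxic" rather than a flat yes or no.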

Not every human rater in the data set settled on the same rating, and Jigsaw says that weighting individual annotators differently based on expertise or background could improve model accuracy. The team leaves this to future work.
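Jigsaw hasn't specified how such weighting would work. One simple way to express the idea is a weighted share of "toxic" votes, sketched below. The rater IDs, the weight values, and the default weight of 1.0 are all hypothetical choices for illustration only.

```python
def weighted_toxicity(ratings, weights):
    """Weighted share of "toxic" votes for a single comment.

    `ratings` maps a hypothetical rater ID to True/False (toxic or not).
    `weights` maps rater IDs to a non-negative weight; raters missing
    from `weights` default to 1.0. The weighting scheme is an assumption.
    """
    total = sum(weights.get(r, 1.0) for r in ratings)
    if total == 0:
        raise ValueError("all rater weights are zero")
    toxic = sum(weights.get(r, 1.0) for r, is_toxic in ratings.items() if is_toxic)
    return toxic / total


votes = {"r1": True, "r2": True, "r3": False, "r4": False}
print(weighted_toxicity(votes, {}))           # 0.5: every rater counts equally
print(weighted_toxicity(votes, {"r3": 2.0}))  # 0.4: r3's "not toxic" vote counts double
```

With all weights equal, this reduces to the plain fraction of raters who chose "toxic"; any real weighting would depend on how expertise or background is measured, which remains open.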

"By releasing the individual annotations on the Civil Comments set, we're inviting the industry to join us in taking the first step in exploring [open] questions," wrote Borkan, Sorensen, and Vasserman. "Building effective models and capturing the nuance of human opinion is a complex challenge that can't be solved by any one team alone ... We're excited to see what we learn."

Data sets like the one released today underpin Jigsaw's products, like the comment-filtering Chrome extension it released in March and its Perspective API tool for web publishers. Beyond this work, the think tank conducts experiments that have at times proven controversial, such as hiring a disinformation-for-hire service to attack a dummy website. Other projects underway include Outline, an open source tool that lets news organizations give journalists safer access to the internet; a service that protects against distributed denial-of-service attacks; and a methodology to dissuade potential ISIS recruits from joining that group.