Jigsaw's AI-powered toxic language detector is now processing 500 million requests daily

In 2017, Google's Counter Abuse Technology team and Jigsaw, the organization working under Google parent company Alphabet to tackle cyberbullying and disinformation, released an AI-powered API for content moderation called Perspective. It's used by media organizations including the New York Times, Vox Media, OpenWeb, and Disqus, and today Jigsaw announced that it's now processing 500 million requests daily.

While studies have found Perspective to be susceptible to various forms of bias, including racial and ethnic bias, engineers at the company say the service has improved in its ability to detect comments containing objective hate speech and toxicity.

"Toxicity on the internet is a pervasive problem that disproportionately impacts marginalized groups, threatens independent journalism, and crowds out freedom of expression and healthy dialogue," Jigsaw CEO Jared Cohen said in a statement. "We're committed to working with our partners and academic institutions to continuously train and retrain our models to become even better at identifying toxicity while minimizing bias in support of healthier conversations."

Bias in the models

Perspective offers a score from zero to 100 based on how similar new comments are to others previously identified as toxic, defined as how likely a comment is to make someone leave a conversation. Publishers can use Perspective in a number of ways, from offering readers instant feedback on the toxicity of their comments to giving readers the power to filter conversations based on the level of toxicity they'd like to see. Jigsaw claims its AI can immediately spit out an assessment of a phrase's "toxicity" more accurately than any keyword blacklist, and more quickly than any human moderator.
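The two publisher uses described above, instant feedback and reader-controlled filtering, can be sketched in a few lines. This is a minimal illustration that assumes scores on the zero-to-100 scale mentioned here and already fetched from the service; the `ScoredComment` class, the function names, and the `warn_at` cutoff are hypothetical and are not part of the Perspective API.

```python
from dataclasses import dataclass


@dataclass
class ScoredComment:
    text: str
    toxicity: float  # assumed to be on the 0-100 scale described in the article


def filter_for_reader(comments, max_toxicity):
    """Keep only comments at or below the reader's chosen toxicity ceiling."""
    if not 0 <= max_toxicity <= 100:
        raise ValueError("max_toxicity must be between 0 and 100")
    return [c for c in comments if c.toxicity <= max_toxicity]


def feedback_message(toxicity, warn_at=80.0):
    """Return a nudge for the commenter, or None if the score is below the cutoff.

    The default cutoff of 80 is an arbitrary placeholder, not a recommended value.
    """
    if toxicity >= warn_at:
        return "Your comment may be perceived as toxic. Consider rephrasing."
    return None
```

A publisher would pick its own cutoffs; a reader choosing `max_toxicity=30` would see fewer comments than one choosing 70.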

But some auditors claim Perspective doesn't moderate hate and toxic speech equally across different groups of people. A study published by researchers at the University of Oxford, the Alan Turing Institute, Utrecht University, and the University of Sheffield found that the Perspective API particularly struggles with denouncements of hate that quote the hate speech or make direct references to it. According to the results of their experiments with a purpose-built dataset, Perspective classifies only 15.6% to 18.4% of these correctly and recognizes just 66% of hate speech that uses a slur and 62.9% of abuse targeted at "non-protected" groups like "artists" and "capitalists" (for example, in statements like "artists are parasites to our society" and "death to all capitalists"). Moreover, they say Perspective only recognizes 54% of "reclaimed" slurs like "queer" and can fail to catch spelling variations like missing characters, added spaces between characters, and spellings with numbers in place of words.

An earlier University of Washington study published in 2019 found that Perspective was more likely to label "Black-aligned English" offensive versus "white-aligned English." After feeding Perspective a sample of tweets from Black and white users, the coauthors saw correlations between dialects and groups in their datasets and the Perspective toxicity scores. All correlations were significant, they said, indicating potential racial bias for all of the datasets.
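The kind of check described above, correlating a dialect signal with toxicity scores, can be illustrated with a plain Pearson correlation. This is only a sketch of the general statistic, not the study's actual pipeline; the function name and the idea of pairing a per-tweet dialect probability with a per-tweet score are assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

In an audit, `xs` might hold an estimated probability that each tweet is written in a given dialect and `ys` the toxicity score the model assigned to it; a coefficient well above zero would suggest the model's scores rise with the dialect signal.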

Bias mitigation

But Jigsaw claims to have made progress toward mitigating the biases in its models. In 2019, the company released what it claimed was the largest public dataset of comments and annotations with toxicity labels and identity labels. The corpus originated from a competition Jigsaw launched in April 2019 that challenged entrants to build a model that recognizes toxicity and minimizes bias with respect to any mention of identities. The first release contained roughly 250,000 comments labeled for identities, where raters were asked to indicate references to gender, sexual orientation, religion, race, ethnicity, disability, and mental illness in a given comment. The new version added individual annotations from almost 9,000 human raters -- annotations that effectively teach machine learning models the meaning of toxicity.

Improved datasets alone might not be enough to correct for some biases in toxicity detection models like those at the core of Perspective. Recently, researchers at the Allen Institute investigated methods to address lexical and dialectal imbalances in hate speech training data, where "lexical biases" associate toxicity with the presence of certain words (e.g., profanities) and "dialectal biases" correlate toxicity with "markers" of language variants like African-American English (AAE). According to the researchers, even models debiased with state-of-the-art techniques disproportionately flagged certain snippets as toxic, particularly text from Black people.

One area of research in the early stages at Jigsaw aims to investigate how annotators from different backgrounds and experiences classify things according to toxicity. The goal is to see to what extent someone's life history affects what they consider to be toxic, and to use that understanding to develop a better version of Perspective by taking this lens into account when weighing each person's labeling decisions.

"We're looking to understand how people's experience affects their toxicity decisions," Jigsaw product manager Adesola Sanusi told VentureBeat in a phone interview last week. "We're hoping that we can do a better job in the future of matching individuals with data that [is] best suited to their background knowledge."

Jigsaw is also exploring uncertainty modeling, which could enable the models powering Perspective to understand when they might be wrong about a particular snippet of speech. Currently, the models rate the toxicity of any speech given to them, but with uncertainty modeling, they might decline to rate speech if there's a high chance it would be misclassified.
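The idea of declining to rate can be illustrated with a simple abstention rule. The sketch below uses distance from 0.5 as a crude stand-in for uncertainty, which is an assumption made purely for illustration; the article does not describe how Jigsaw's models would estimate uncertainty, and the function name and `abstain_band` parameter are hypothetical.

```python
from typing import Optional


def rate_or_abstain(toxicity_probability: float, abstain_band: float = 0.15) -> Optional[float]:
    """Return a 0-100 score, or None when the model is too unsure to rate.

    Probabilities within `abstain_band` of 0.5 are treated as too uncertain.
    This proxy is an illustrative assumption, not Jigsaw's method.
    """
    if not 0.0 <= toxicity_probability <= 1.0:
        raise ValueError("toxicity_probability must be between 0 and 1")
    if abs(toxicity_probability - 0.5) < abstain_band:
        return None  # abstain: likely to be misclassified either way
    return round(toxicity_probability * 100, 1)
```

A comment for which the function returns `None` could be routed to a human moderator instead of being scored automatically.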

"A good portion of toxicity across the internet is not really people who are out to get each other, but somebody who's just having a bad day," Jigsaw exec Patricia Georgiou said. "A big part of the initial idea behind Perspective was that humans really do just want to have good conversations and want to connect with each other."

Multimodality and new languages

Unlike most AI systems, humans understand the meaning of text, videos, audio, and images together in context. For example, given text and an image that seem innocuous when considered apart (e.g., "Look how many people love you" and a picture of a barren desert), people recognize that these elements take on potentially hurtful connotations when they're paired or juxtaposed. Different modalities can carry complementary information or trends, which often become evident only when all of them are included in the learning process. And this holds promise for applications from transcription to detecting hate speech in different languages.

When asked whether Jigsaw is pursuing multimodal research, perhaps toward a hateful meme-detecting system, Jigsaw engineer Lucy Vasserman said that while the company has explored toxicity detection across images, videos, and other mediums, it remains focused first and foremost on text. "Text is the most high-impact way that our team can make great digital technology at this moment," Vasserman said. "I don't think we're opposed to venturing into other mediums, as we feel confident that we're able to provide solid interventions. But I'd say that for now, we still feel very strongly that text is the medium that we want to focus on, as well as where we can have the best impact."

To that end, Jigsaw recently trained Perspective in new languages, ostensibly positioning the service to better help moderate conversations online at scale. It's currently available in 14 languages, including English, Spanish, French, German, Italian, and Portuguese.

Sanusi explained the process of adding support for a new language to Perspective. "For every language, we collect really high-quality testing data. We have data that was originally written in the language [that] comes from forums with the language that we're targeting," she said. "We have speakers annotate that language for toxicity -- that's one dataset that we used to test on. Then we gather a lot of additional data to build a multilingual model that we fine-tune for the specific target language. According to what we can measure from our test sets, we'll then usually put the model out into experiments and work with our partners to test the model."

Of course, language is used differently across different demographics, locales, and cultures. But Sanusi says that with the modeling techniques Jigsaw is currently using, there's often less of a boundary between languages than one might expect.

"You can have a whole set of languages all at once," she said. "Our strategy is not so much that we need a specific model for each different locale, but more that we need to make sure our data is representative of all of those different locales and base languages so that the model can perform well in multiple scenarios, even if it's a smaller set of models or one model for every language."

Future work

Beyond Perspective and the comment-filtering Chrome extension it released in March 2019, Jigsaw conducts experiments that have at times proven controversial, like assigning a disinformation-for-hire service to attack a dummy website. Other projects underway include an open source tool called Outline that lets news organizations provide journalists safer access to the internet; an anti-distributed denial-of-service solution; a methodology to dissuade potential recruits from joining extremist groups; and SimSquad, a virtual reality tool that aims to reduce police violence.

Jigsaw doesn't charge for these -- or for Perspective. And according to chief operating officer Dan Keyserling, that won't change anytime soon. "Jigsaw's mandate is to help people in the world and to [foster] open societies, so at the moment, all of our considerations are around how we can have the biggest impact -- how we can develop this technology further," he said. "You'll see this from how we share all of our technology with the community, open-sourcing as much as we can. It's sort of core to how we operate."