Facebook is using more AI to detect hate speech

In Q1 2020, 9.6 million pieces of content posted on Facebook were removed for violating the company's hate speech policy, the "largest gain in a period of time," Facebook CTO Mike Schroepfer told journalists today. For context, as recently as four years ago, Facebook removed no content with AI. The data comes from Facebook's Community Standards Enforcement Report (CSER), which says AI detected 88.8% of the hate speech content removed by Facebook in Q1 2020, up from 80.2% in the previous quarter. Schroepfer attributes the growth to advances in language models like XLM. Another potential factor: As a result of COVID-19, Facebook also sent some of its human moderators home, though Schroepfer said Facebook moderators can now do some work from home.

"I'm not naive; AI is not the answer to every single problem," Schroepfer said. "I think humans are going to be in the loop for the indefinite future. I think these problems are fundamentally human problems about life and communication, and so we want humans in control and making the final decisions, especially when the problems are nuanced. But what we can do with AI is, you know, take the common tasks, the billion scale tasks, the drudgery out."

Facebook AI Research today also launched the Hateful Memes data set of 10,000 mean memes scraped from public Facebook groups in the U.S. The Hateful Memes challenge will offer $100,000 in prizes for top-performing networks, with a final competition at leading machine learning conference NeurIPS in December. Hateful Memes at NeurIPS follows the Facebook Deepfake Detection Challenge held at NeurIPS in 2019.

The Hateful Memes data set is made to assess the performance of models for removing hate speech and to fine-tune and test multimodal learning models, which take input from multiple forms of media to measure multimodal understanding and reasoning. The paper includes documentation on the performance of a range of BERT-derived unimodal and multimodal models. The most accurate AI-driven multimodal model -- Visual BERT COCO -- achieves 64.7% accuracy, while humans demonstrated 85% accuracy on the data set, reflecting the difficulty of the challenge.

The data set was put together by an external team of annotators (not including Facebook moderators), and its most common memes target race, ethnicity, or gender. Memes categorized as comparing people with animals, invoking negative stereotypes, or using mocking hate speech -- which Facebook's community standards consider a form of hate speech -- are also common in the data set.

Facebook today also shared additional information about how it's using AI to combat COVID-19 misinformation and stop merchants scamming on the platform. Under development for years at Facebook, SimSearchNet is a convolutional neural network for recognizing duplicate content, and it's being used to apply warning labels to content deemed untrustworthy by dozens of independent human fact-checker organizations around the world. Warning labels were applied to 50 million posts in the month of April. Encouragingly, Facebook users click through to content with warning labels only 5% of the time, on average. Computer vision is also being used to automatically detect and reject ads for COVID-19 testing kits, medical face masks, and other items Facebook does not allow on its platform.
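To make the idea of recognizing duplicate content concrete, here is a minimal sketch of one common approach: compare embedding vectors with cosine similarity and treat anything above a threshold as a near-duplicate. This is not Facebook's implementation of SimSearchNet. The vectors, names, and the 0.95 threshold below are all made up for illustration, and in a real system the embeddings would come from a trained network rather than being typed in by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_near_duplicates(query, flagged, threshold=0.95):
    """Return the names of flagged items whose embedding is close to the query.

    `threshold` is an assumed cutoff, not a value from any published system.
    """
    return [
        name
        for name, vector in flagged.items()
        if cosine_similarity(query, vector) >= threshold
    ]


# Hypothetical embeddings of posts that fact-checkers already flagged.
FLAGGED = {
    "flagged_post_1": [0.9, 0.1, 0.0],
    "flagged_post_2": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a new upload, e.g. a slightly cropped copy.
new_upload = [0.88, 0.12, 0.01]

matches = find_near_duplicates(new_upload, FLAGGED)
print(matches)
```

In this toy setup, a match would be the point at which the warning label from the original flagged post could be carried over to the copy.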

Multimodal learning

Machine learning experts like Google AI chief Jeff Dean called progress on multimodal models a trend in 2020. Indeed, multimodal learning has been used to do things like automatically comment on videos and caption images. Multimodal systems like CLEVRER from MIT-IBM Watson Lab are also applying NLP and computer vision to improve AI systems' ability to carry out accurate visual reasoning.

Excluded from the data set are memes that call for violence or self-injury, contain nudity, or encourage terrorism or human trafficking.

The memes were made using a custom tool and text scraped from meme imagery in public Facebook groups. To overcome licensing issues common to memes, photos from the Getty Images API were used to replace the background images and create new memes. Annotators were required to verify that each new meme retained the meaning and intent of the original.

The Hateful Memes data set includes what Facebook calls benign confounders, or memes whose meaning shifts when the image behind the meme text changes.

"Hate speech is an important societal problem, and addressing it requires improvements in the capabilities of modern machine learning systems. Detecting hate speech in memes requires reasoning about subtle cues, and the task was constructed such that unimodal models find it difficult, by including 'benign confounders' that flip the label of a multimodal hateful meme," Facebook AI Research coauthors said in a paper detailing the Hateful Memes data set that was shared with VentureBeat.
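A toy sketch can show why benign confounders make the task hard for unimodal models. The four examples below are abstract placeholders, not items from the Hateful Memes data set, and the "classifier" is just the best possible lookup table for whatever input it is allowed to see. Because every caption and every image appears once with each label, a model that sees only one of them cannot beat chance, while a model that sees both can separate the classes.

```python
from collections import Counter, defaultdict

# Placeholder examples: (text, image, label), where 1 = hateful, 0 = benign.
# Swapping only the image or only the text flips the label.
MEMES = [
    ("caption_a", "image_x", 1),
    ("caption_a", "image_y", 0),
    ("caption_b", "image_x", 0),
    ("caption_b", "image_y", 1),
]


def best_accuracy(examples, key):
    """Accuracy of the best lookup classifier that only sees key(text, image).

    For each distinct input the classifier predicts the majority label,
    which is the best any deterministic model could do on this data.
    """
    votes = defaultdict(Counter)
    for text, image, label in examples:
        votes[key(text, image)][label] += 1
    correct = sum(counter.most_common(1)[0][1] for counter in votes.values())
    return correct / len(examples)


text_only = best_accuracy(MEMES, lambda text, image: text)
image_only = best_accuracy(MEMES, lambda text, image: image)
multimodal = best_accuracy(MEMES, lambda text, image: (text, image))

print(text_only, image_only, multimodal)  # 0.5 0.5 1.0
```

Real memes are far less tidy than this, and the gap between the best reported model and human annotators suggests that combining the two inputs well is the hard part.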

The evolution of visual reasoning like the kind sought by the Hateful Memes data set and challenge can help AI better detect hate speech and determine whether memes violate Facebook policy. Accurate multimodal systems may also help Facebook avoid removing counterspeech, which happens when human or AI moderators unintentionally censor content from activists speaking out against hate speech instead of actual hate speech.

Removing hate speech from the internet is the right thing to do, but quick hate speech detection is also in Facebook's economic interests. After EU regulators spent years urging Facebook to adopt stricter measures, German lawmakers passed a law requiring social media companies with more than 1 million users to quickly remove hate speech or face fines of up to €50 million.

Governments have urged Facebook to moderate content in order to address problems like terrorist propaganda and election meddling, particularly following backlash from the Cambridge Analytica scandal, and Facebook and its CEO Mark Zuckerberg have promised more human and AI moderation.
