Facebook says it’s using AI to prioritize potentially problematic posts for human moderators to review as it works to more quickly remove content that violates its community guidelines. The social media giant previously used machine learning models to proactively take down low-priority content, leaving high-priority content reported by users to human reviewers. But Facebook says it now combines content flagged by users and by its models into a single collection, then filters, ranks, and deduplicates that content before handing it off to thousands of moderators, many of whom are contract employees.
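Facebook hasn’t published implementation details, but the merge, dedupe, and rank pipeline described above can be sketched roughly as a priority queue. Everything here (the class, the function, and the scoring callback) is an illustrative assumption for this article, not Facebook’s actual code:

```python
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class FlaggedPost:
    # heapq pops the smallest item first, so sort_key is the negated
    # risk score: the riskiest post reaches a moderator soonest.
    sort_key: float
    post_id: str = field(compare=False)
    source: str = field(compare=False)  # "user_report" or "model"

def build_review_queue(user_reports, model_detections, score):
    """Merge both streams, dedupe by post_id, and rank by one score."""
    merged = {}
    for post in user_reports + model_detections:
        pid = post["post_id"]
        # Keep one entry per post; if flagged twice, keep the riskier one.
        if pid not in merged or score(post) > score(merged[pid]):
            merged[pid] = post
    heap = [FlaggedPost(-score(p), p["post_id"], p["source"])
            for p in merged.values()]
    heapq.heapify(heap)
    return heap
```

A moderator-facing service would then pop the queue (`heapq.heappop`) to fetch the next post for review.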
Facebook’s continued investment in moderation comes as reports suggest the company is failing to stem the spread of misinformation, disinformation, and hate speech on its platform. Reuters recently found over three dozen pages and groups that featured discriminatory language about Rohingya refugees and undocumented migrants. In January, Seattle University associate professor Caitlin Carlson published results from an experiment in which she and a colleague collected more than 300 posts that appeared to violate Facebook’s hate speech rules and reported them via the service’s tools. According to the report, only about half of the posts were ultimately removed. More recently, civil rights groups including the Anti-Defamation League, the National Association for the Advancement of Colored People, and Color of Change claimed that Facebook fails to enforce its hate speech policies. The groups organized an advertising boycott in which over 1,000 companies reduced spending on social media advertising for a month.
Facebook says its AI systems now give potentially objectionable content that’s being shared quickly on Facebook, Instagram, Facebook Messenger, and other Facebook properties greater weight than content with few shares or views. Messages, photos, and videos relating to real-world harm, like suicide, self-harm, terrorism, and child exploitation, are prioritized over other categories (like spam) as they’re reported or detected. Beyond this, posts containing signals similar to content that previously violated Facebook’s policies are more likely to reach the top of the moderation queue.
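Taken together, the three signals above (real-world-harm severity, virality, and similarity to past violations) suggest a weighted scoring function. The categories, weights, and normalization below are this article’s illustrative assumptions; Facebook hasn’t disclosed how it actually combines these signals:

```python
# Higher severity = reviewed sooner. Category list and values are assumed.
SEVERITY = {
    "child_exploitation": 1.0,
    "terrorism": 0.95,
    "self_harm": 0.9,
    "hate_speech": 0.6,
    "spam": 0.1,
}

def priority_score(category: str,
                   shares_per_hour: float,
                   similarity_to_past_violations: float) -> float:
    """Combine the three signals into one queue-ranking score in [0, 1]."""
    severity = SEVERITY.get(category, 0.3)
    # Cap raw share velocity so a viral spam post can't outrank
    # child-safety content on virality alone.
    virality = min(shares_per_hour / 1000.0, 1.0)
    return (0.5 * severity
            + 0.3 * virality
            + 0.2 * similarity_to_past_violations)
```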
Using a technique called “whole post integrity embeddings,” or WPIE, Facebook’s systems ingest deluges of information, including images, videos, text titles and bodies, comments, text in images from optical character recognition, transcribed text from audio recordings, user profiles, interactions between users, external context from the web, and knowledge base information. A representation learning stage enables the systems to automatically discover representations needed to detect commonalities in harmful content from the data. Then fusion models combine the representations to create millions of content representations, or embeddings, which are used to train supervised multitask learning and self-supervised learning models that flag content for each category of violations.
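In outline, WPIE follows the standard multimodal pattern: encode each modality separately, then fuse the results into one whole-post embedding. The toy encoders and fusion step below are untrained stand-ins meant only to show the data flow; the real encoders and fusion weights are learned:

```python
import numpy as np

def embed(modality_input: str, dim: int = 16) -> np.ndarray:
    """Stand-in for a per-modality encoder (image model, text
    transformer, OCR pipeline, audio transcriber, ...). Here we just
    hash the input to a deterministic pseudo-random vector."""
    seed = abs(hash(modality_input)) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

def fuse(modality_embeddings: list) -> np.ndarray:
    """Toy fusion model: concatenate per-modality embeddings and
    project them into a single whole-post embedding."""
    concat = np.concatenate(modality_embeddings)
    projection = np.random.default_rng(42).standard_normal((concat.size, 16))
    return concat @ projection

# One whole-post embedding built from several modalities of one post.
post_embedding = fuse([
    embed("photo pixels"),      # image content
    embed("caption text"),      # post body
    embed("OCR'd meme text"),   # text extracted from the image
    embed("audio transcript"),  # transcribed audio
])
```

Downstream, embeddings like `post_embedding` would be the training inputs for the supervised and self-supervised violation classifiers the article describes.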
One of these models is XLM-R, a natural language understanding algorithm Facebook is also using to match people in need through its Community Hub. Facebook says that XLM-R, which was trained on 2.5 terabytes of webpages spanning roughly 100 languages, allows its content moderation systems to learn across dialects so that “every new human review of a violation makes our system[s] better globally instead of just in the reviewer’s language.” (Facebook currently has about 15,000 content reviewers who collectively speak over 50 languages.)
“It’s important to note that all content violations … still receive some substantial human review — we’re using our system[s] to better prioritize content,” Facebook product manager Ryan Barnes told members of the press on Thursday. “We expect to use more automation when violating content is less severe, especially if the content isn’t viral, or being … quickly shared by a large number of people [on Facebook platforms].”
Across many of its divisions, Facebook has for years been moving broadly toward self-supervised learning, in which models generate their own training signals from unlabeled data, so that only small amounts of labeled data are needed to reach a given accuracy. Facebook claims its deep entity classification (DEC) machine learning framework was responsible for a 20% reduction in abusive accounts on the platform in the two years since it was deployed, and that its SybilEdge system can detect fake accounts that are less than a week old and have sent fewer than 20 friend requests. In a separate experiment, Facebook researchers say they were able to train a language understanding model that made more precise predictions with just 80 hours of data than a model trained on 12,000 hours of manually labeled data.
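As a concrete illustration of the self-supervised idea, where the training signal is manufactured from the data itself rather than supplied by human annotators, here is a minimal masked-word pretext task. The function and setup are this article’s illustration, not Facebook’s training code:

```python
import random

def make_selfsupervised_examples(sentences, mask_token="[MASK]"):
    """Self-supervision in miniature: each training pair hides one word
    and asks the model to recover it, so the label comes straight from
    the raw text and no human annotation is needed."""
    rng = random.Random(0)  # fixed seed for reproducibility
    examples = []
    for sentence in sentences:
        words = sentence.split()
        i = rng.randrange(len(words))
        label = words[i]  # the hidden word is the training label
        masked = words[:i] + [mask_token] + words[i + 1:]
        examples.append((" ".join(masked), label))
    return examples
```

A model trained on millions of such pairs learns general language structure for free, which labeled violation examples can then fine-tune.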
For virality prediction, Facebook relies on a supervised machine learning model that looks at past examples of posts and the number of views they racked up over time. Rather than analyzing the view history in isolation, the model takes into account context such as trends and the post’s privacy settings (e.g., whether it was viewable only by friends).
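Facebook hasn’t described the model’s features or architecture, but a supervised virality predictor over view-history features might look like the following sketch, with hand-set weights standing in for learned ones (the feature set and weights are assumptions):

```python
import math

def virality_features(view_history, friends_only: bool, trending_topic: bool):
    """Features of the kind the article describes: the view trajectory
    plus context such as privacy setting and topical trends."""
    early, late = view_history[0], view_history[-1]
    growth = (late - early) / max(early, 1)  # relative growth over the window
    return [growth,
            0.0 if friends_only else 1.0,    # public posts can spread further
            1.0 if trending_topic else 0.0]

def predict_viral(features, weights=(1.2, 0.8, 0.6), bias=-2.0):
    """Stand-in for the trained model: a logistic score over the features.
    In practice the weights would be fit on past posts and view counts."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1 / (1 + math.exp(-z))
```

A fast-growing public post on a trending topic would score near 1, pushing it up the review queue; a flat, friends-only post would score near 0.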
Virality prediction aside, Facebook asserts that this embrace of self-supervised techniques — along with automatic content prioritization — has allowed it to address harmful content faster while letting human review teams spend more time on complex decisions, like those involving bullying and harassment. Among other metrics, the company points to its Community Standards Enforcement Report, which covered April 2020 through June 2020 and showed that the company’s AI detected 95% of hate speech taken down in Q2 2020. However, it’s unclear to what extent that’s true.
Facebook admitted that much of the content flagged in a Wall Street Journal report would have been given low priority for review because it had little potential to go viral. According to a lawsuit, Facebook also failed to remove pages and accounts belonging to those who coordinated the events that culminated in deadly shootings in Kenosha, Wisconsin at the end of August. Nonprofit activism group Avaaz found that misleading content generated an estimated 3.8 billion views on Facebook over the past year, with the spread of medical disinformation (particularly about COVID-19) outstripping that of information from trustworthy sources. And Facebook users in Papua New Guinea say the company has been slow to remove child abuse content, or has failed to remove it altogether, with ABC Science identifying a naked image of a young girl on a page with over 6,000 followers.
There’s a limit to what AI can accomplish, particularly with respect to content like memes and sophisticated deepfakes. The top performer among the more than 35,000 models submitted by over 2,000 participants in Facebook’s Deepfake Detection Challenge achieved only 82.56% accuracy against a public dataset of 100,000 videos created for the task. When Facebook launched the Hateful Memes dataset, a benchmark designed to assess the performance of models for removing hate speech, the most accurate algorithm — Visual BERT COCO — achieved 64.7% accuracy, while humans demonstrated 85% accuracy on the dataset. And a New York University study published in July estimated that Facebook’s AI systems make about 300,000 content moderation mistakes per day.
Potential bias and other shortcomings in Facebook’s AI models and datasets threaten to further complicate matters. A recent NBC investigation revealed that on Instagram in the U.S. last year, Black users were about 50% more likely to have their accounts disabled by automated moderation systems than those whose activity indicated they were white. And when Facebook had to send content moderators home and rely more on AI during quarantine, CEO Mark Zuckerberg said mistakes were inevitable because the system often fails to understand context.
Technological challenges aside, groups have blamed Facebook’s inconsistent, unclear, and in some cases controversial content moderation policies for its stumbles in taking down abusive posts. According to the Wall Street Journal, Facebook often fails to handle user reports swiftly and enforce its own rules, allowing material — including depictions and praise of “grisly violence” — to stand, perhaps because many of its moderators are physically distant and don’t recognize the gravity of the content they’re reviewing. In one instance, 100 Facebook groups affiliated with QAnon, a conspiracy theory the FBI has labeled a domestic terrorist threat, grew at a combined pace of over 13,600 new followers a week this summer, according to a New York Times database.
In response to pressure, Facebook implemented rules this summer and fall aimed at tamping down viral content that violates its standards. Members and administrators belonging to groups removed for running afoul of its policies are temporarily unable to create any new groups. Facebook no longer includes any health-related groups in its recommendations, and QAnon is banned across all of the company’s platforms. Facebook is applying labels to — but not removing — politicians’ posts that break its rules. And the Facebook Oversight Board, an external group that will make decisions and influence precedents about what kind of content should and shouldn’t be allowed on Facebook’s platform, began reviewing content moderation cases in October.
Facebook has also adopted an ad hoc approach to hate speech moderation to meet political realities in certain regions around the world. The company’s hate speech rules are stricter in Germany than in the U.S. In Singapore, Facebook agreed to append a “correction notice” to news stories deemed false by the government. And in Vietnam, Facebook said it would restrict access to “dissident” content deemed illegal in exchange for the government ending its practice of disrupting the company’s local servers.
Meanwhile, problematic posts continue to slip through Facebook’s filters. In one Facebook group that was created this past week and rapidly grew to nearly 400,000 people, members calling for a nationwide recount of the 2020 U.S. presidential election swapped unfounded accusations about alleged election fraud and state vote counts every few seconds.
“The system is about marrying AI and human reviewers to make less total mistakes,” Facebook’s Chris Parlow, part of the company’s moderator engineering team, said during the briefing. “The AI is never going to be perfect.”