Facebook's improved AI isn't preventing harmful content from spreading

Facebook claims it's becoming better at detecting -- and removing -- objectionable content from its platform, despite the fact that misleading, untrue, and otherwise harmful posts continue to make their way into millions of users' feeds. During a briefing with reporters ahead of Facebook's latest Community Standards Enforcement Report, which outlines the actions Facebook took between July and September to remove posts that violate its rules, the company said that it has deployed new AI systems optimized to identify hate speech and misinformation uploaded to Instagram and Facebook before it's reported by members of the community.

Facebook's continued investment in AI content-filtering technologies comes as reports suggest the company is failing to stem the spread of problematic photos, videos, and posts. BuzzFeed News this week reported that according to internal Facebook documents, labels being attached to misleading or false posts around the 2020 U.S. presidential election have had little to no impact on how the posts are being shared. Reuters recently found over three dozen pages and groups that featured discriminatory language about Rohingya refugees and undocumented migrants. In January, Seattle University associate professor Caitlin Carlson published results from an experiment in which she and a colleague collected more than 300 posts that appeared to violate Facebook's hate speech rules and reported them via the service's tools. According to the report, only about half of the posts were ultimately removed.

In its defense, Facebook says that it now proactively detects 94.7% of hate speech it ultimately removes, the same percentage as Q2 2020 and up from 80.5% in all of 2019. It claims 22.1 million hate speech posts were taken down from Facebook and Instagram in Q3, of which 232,400 were appealed and 4,700 were restored. Facebook says it couldn't always offer users the option to appeal decisions due to pandemic-related staffing shortages -- Facebook's moderators, roughly 15,000 of whom are contract employees, have encountered roadblocks while working from home related to the handling of sensitive data. But the company says that it gave people the ability to indicate they disagreed with decisions, which in some cases led to the overturning of takedowns.
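To put the appeal figures in proportion, here is a minimal arithmetic sketch in Python. It uses only the numbers quoted above and assumes the restored posts are counted among the appealed ones, which the summary does not make explicit. The function name is illustrative.

```python
def enforcement_rates(removed, appealed, restored):
    """Return three fractions: appeals per removal, restorations per appeal,
    and restorations per removal."""
    return appealed / removed, restored / appealed, restored / removed


# Figures quoted by Facebook for Q3 hate speech enforcement.
appeal_rate, restored_of_appealed, restored_of_removed = enforcement_rates(
    removed=22_100_000, appealed=232_400, restored=4_700
)

print(f"Appealed:             {appeal_rate:.2%} of removals")
print(f"Restored (of appeals): {restored_of_appealed:.2%}")
print(f"Restored (of removals): {restored_of_removed:.3%}")
```

On these figures, roughly 1% of removals were appealed and roughly 2% of those appeals ended in a restoration, though the limited availability of appeals during the period makes both rates hard to interpret.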

To achieve the incremental performance gains and automatically place labels on 150 million pieces of content viewed from the U.S., Facebook says it launched an AI model architecture called Linformer, which is now used to analyze billions of Facebook and Instagram posts. Facebook says the computations of Linformer, which was made available in open source earlier this year, grow linearly with the length of the input rather than quadratically, as they do in standard Transformer models, making it possible to use larger pieces of training text and theoretically achieve better content detection performance.
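The linear-scaling idea can be illustrated with a small sketch. In the published Linformer design, the keys and values of a sequence of length n are projected down to a fixed number of rows before attention is computed, so the attention weights form an n-by-k table instead of an n-by-n one. The Python below is a toy, dependency-free illustration under that description, not Facebook's production code: the projection matrices are random here, whereas in a trained model they are learned, and all function and variable names are assumptions.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def linformer_attention(q, k, v, e_proj, f_proj):
    """q, k, v are n x d; e_proj and f_proj are k_dim x n.

    Returns the n x d output and the n x k_dim attention weights.
    """
    d = len(q[0])
    k_small = matmul(e_proj, k)              # k_dim x d: compressed keys
    v_small = matmul(f_proj, v)              # k_dim x d: compressed values
    scores = matmul(q, transpose(k_small))   # n x k_dim instead of n x n
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scaled)
    return matmul(weights, v_small), weights


rng = random.Random(0)
n, d, k_dim = 16, 4, 4  # toy sizes; k_dim stays fixed as n grows
q, k, v = (random_matrix(n, d, rng) for _ in range(3))
e_proj = random_matrix(k_dim, n, rng)  # random stand-in for a learned projection
f_proj = random_matrix(k_dim, n, rng)
output, weights = linformer_attention(q, k, v, e_proj, f_proj)
print(len(weights), "x", len(weights[0]), "attention weights")
```

Because k_dim is held constant, doubling the input length doubles the size of the weight table rather than quadrupling it, which is the property that makes longer inputs affordable.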

Also new is SimSearchNet++, an improved version of Facebook's existing SimSearchNet computer vision algorithm that's trained to match variations of an image with a high degree of precision. Facebook says the system, which is deployed as part of a photo indexing system that runs on user-uploaded images, is resilient to manipulations such as crops, blurs, and screenshots and predictive of matching, allowing it to identify more matches while grouping collages of misinformation. For images containing text, moreover, the company claims that SimSearchNet++ can spot matches with "high" accuracy using optical character recognition.
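Facebook hasn't published SimSearchNet++ in a form that can be reproduced here, but the general idea of near-duplicate image matching can be shown with a much simpler, well-known stand-in: an average hash. The Python sketch below is explicitly not Facebook's method. It only demonstrates how a fingerprint can survive a manipulation (here, uniform brightening) that changes every pixel. The tiny 4x4 grayscale "images" are made up for illustration.

```python
def average_hash(pixels):
    """Fingerprint a grayscale image: 1 where a pixel is above the mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming_distance(a, b):
    """Count the positions at which two fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 4x4 image: dark on the left, bright on the right.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
# Same picture with every pixel brightened by 30.
brightened = [[min(255, p + 30) for p in row] for row in original]
# A different picture: the same image mirrored left to right.
mirrored = [list(reversed(row)) for row in original]

print(hamming_distance(average_hash(original), average_hash(brightened)))
print(hamming_distance(average_hash(original), average_hash(mirrored)))
```

A small distance suggests a variant of the same image and a large one suggests a different image. Learned systems are meant to extend this kind of tolerance to crops, blurs, and screenshots, which a simple hash like this handles poorly.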

Beyond SimSearchNet++, Facebook says it's developed algorithms to determine when two pieces of content convey the same meaning and that detect variations of content independent fact-checkers have already debunked. (It should be noted that Facebook has reportedly pressured at least a portion of its over 70 third-party international fact-checkers to change their rulings, potentially rendering the new algorithms less useful than they might be otherwise.) The approaches build on technologies including Facebook's ObjectDNA, which focuses on specific objects within an image while ignoring distracting clutter. This allows the algorithms to find reproductions of a claim that incorporates pieces from an image that's been flagged, even if the pictures seem different from each other. Facebook's LASER cross-language sentence-level embedding, meanwhile, represents 93 languages across text and images in ways that enable the algorithms to evaluate the semantic similarity of sentences.
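Semantic similarity between sentence embeddings is commonly scored with cosine similarity, and the sketch below shows that step in Python. The four-dimensional vectors are invented stand-ins, not real LASER outputs, and the 0.9 threshold is an arbitrary assumption. The point is only that two phrasings of the same claim, even in different languages, should land close together while unrelated text should not.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_probable_match(a, b, threshold=0.9):
    """Flag two embeddings as conveying the same claim (threshold is hypothetical)."""
    return cosine_similarity(a, b) >= threshold


# Hypothetical embeddings; a real encoder would produce far longer vectors.
debunked_claim_en = [0.90, 0.10, 0.30, 0.00]
reworded_claim_es = [0.85, 0.15, 0.35, 0.05]
unrelated_post = [0.00, 0.90, 0.00, 0.40]

print(is_probable_match(debunked_claim_en, reworded_claim_es))
print(is_probable_match(debunked_claim_en, unrelated_post))
```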

To tackle disinformation, Facebook claims to have begun using a deepfake detection model trained on over 100,000 videos from a unique dataset commissioned for the Deepfake Detection Challenge, an open, collaborative initiative organized by Facebook and other corporations and academic institutions. When a new deepfake video is detected, Facebook taps multiple generative adversarial networks to create new, similar deepfake examples to serve as large-scale training data for its deepfake detection model.

Facebook declined to disclose the accuracy rate of its deepfake detection model, but the early results of the Deepfake Detection Challenge imply that deepfakes are a moving target. The top-performing model of over 35,000 from more than 2,000 participants achieved only 82.56% accuracy against the public dataset created for the task.

Facebook also says it built and deployed a framework called Reinforcement Integrity Optimizer (RIO), which uses reinforcement learning to optimize the hate speech classifiers that review content uploaded to Facebook and Instagram. RIO, whose impact wasn't reflected in the newest enforcement report because it was deployed during Q3 2020, guides AI models to learn directly from millions of pieces of content and uses metrics as reward signals to optimize models throughout development. As opposed to Facebook's old classification systems, which were trained on fixed datasets and then deployed to production, RIO continuously evaluates how well it's doing and attempts to learn and adapt to new scenarios, according to Facebook.

Facebook points out that hate speech varies widely from region to region and group to group, and that it can evolve rapidly, drawing on current events and topics like elections. Users often try to disguise hate speech with sarcasm and slang, intentional misspellings, and photo alterations. The conspiracy movement known as QAnon infamously uses codenames and innocuous-sounding hashtags to hide its activities on Facebook and other social media platforms.

A data sampler within RIO estimates the value of rule-violating and rule-following Facebook posts as training examples, deciding which ones will produce the most effective hate speech classifier models. Facebook says it's working to deploy additional RIO modules, including a model optimizer that will enable engineers to write a customized search space of parameters and features; a "deep reinforced controller" that will generate candidate data sampling policies, features, architectures, and hyperparameters; and an enforcement and ranking system simulator to provide the right signals for candidates from the controller.

"In typical AI-powered integrity systems, prediction and enforcement are two separate steps. An AI model predicts whether something is hate speech or an incitement to violence, and then a separate system determines whether to take an action, such as deleting it, demoting it, or sending it for review by a human expert ... This approach has several significant drawbacks, [because] a system might be good at catching hate speech that reaches only very few people but fails to catch other content that is more widely distributed," Facebook explains in a blog post. "With RIO, we don't just have a better sampling of training data. Our system can focus directly on the bottom-line goal of protecting people from seeing this content."

There's a limit to what AI can accomplish, however, particularly with respect to content like memes. When Facebook launched the Hateful Memes dataset, a benchmark made to assess the performance of models for removing hate speech, the most accurate algorithm -- Visual BERT COCO -- achieved 64.7% accuracy, while humans demonstrated 85% accuracy on the dataset. A New York University study published in July estimated that Facebook's AI systems make about 300,000 content moderation mistakes per day, and problematic posts continue to slip through Facebook's filters. In one Facebook group that was created this month and rapidly grew to nearly 400,000 people, members calling for a nationwide recount of the 2020 U.S. presidential election swapped unfounded accusations about alleged election fraud and state vote counts every few seconds.

Countering this last assertion, Facebook says that during the lead-up to the U.S. elections, it removed more than 265,000 pieces of content from Facebook proper and Instagram for violating its voter interference policies. Moreover, the company claims that the prevalence of hate speech on its platform between July and September was as little as 0.10% to 0.11%, equating to "10 to 11 views of hate speech for every 10,000 views of content." (It's important to note that the prevalence metric is based on a random sample of posts, measures the reach of content rather than pure post count, and hasn't been evaluated by external sources.)
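The difference between a view-based prevalence figure and a simple share of posts is easy to miss, so here is a short Python sketch of the arithmetic. The sample data is invented for illustration, and the function names are assumptions, not anything Facebook has published.

```python
def view_prevalence(posts):
    """Share of all sampled views that landed on violating posts."""
    total_views = sum(views for views, _ in posts)
    violating_views = sum(views for views, violating in posts if violating)
    return violating_views / total_views


def post_share(posts):
    """Share of sampled posts that violate the rules, ignoring reach."""
    return sum(1 for _, violating in posts if violating) / len(posts)


def views_per_ten_thousand(rate):
    """Convert a fractional rate into views per 10,000."""
    return rate * 10_000


# Hypothetical sample of (views, is_violating) pairs.
sample = [(9_000, False), (900, False), (89, False), (11, True)]

rate = view_prevalence(sample)
print(f"View prevalence: {rate:.2%}")
print(f"Post share:      {post_share(sample):.0%}")
print(f"Reported range:  {views_per_ten_thousand(0.0010):.0f} to "
      f"{views_per_ten_thousand(0.0011):.0f} views per 10,000")
```

In this made-up sample, one post in four violates the rules, yet it accounts for only 0.11% of views. A reach-weighted metric can therefore look very different from a count of violating posts.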

Potential bias and other shortcomings in Facebook's AI models and datasets threaten to further complicate matters. A recent NBC investigation revealed that on Instagram in the U.S. last year, Black users were about 50% more likely to have their accounts disabled by automated moderation systems than those whose activity indicated they were white. And when Facebook had to send content moderators home and rely more on AI during quarantine, CEO Mark Zuckerberg said mistakes were inevitable because the system often fails to understand context.

Technological challenges aside, groups have blamed Facebook's inconsistent, unclear, and in some cases controversial content moderation policies for stumbles in taking down abusive posts. According to the Wall Street Journal, Facebook often fails to handle user reports swiftly and enforce its own rules, allowing material -- including depictions and praise of "grisly violence" -- to stand, perhaps because many of its moderators are physically distant.

In one instance, 100 Facebook groups affiliated with QAnon grew at a combined pace of over 13,600 new followers a week this summer, according to a New York Times database. In another, Facebook failed to enforce a year-old "call to arms" policy prohibiting pages from encouraging people to bring weapons to intimidate, allowing Facebook users to organize an event at which two protesters were killed in Kenosha, Wisconsin. Zuckerberg himself allegedly said that former White House advisor Steve Bannon's suggestion that Dr. Anthony Fauci and FBI Director Christopher Wray be beheaded was not enough of a violation of Facebook's rules to permanently suspend him from the platform -- even in light of Twitter's decision to permanently suspend Bannon's account.

Civil rights groups including the Anti-Defamation League, the National Association for the Advancement of Colored People, and Color of Change also claim that Facebook fails to enforce its hate speech policies both in the U.S. and in regions of the world like India and Myanmar, where Facebook has been used to promote violence against and internment of minorities. The groups organized an advertising boycott in which over 1,000 companies reduced spending on social media advertising for a month.

Last week, Facebook revealed that it now combines content identified by users and models into a single collection before filtering, ranking, deduplicating, and handing it off to its thousands of moderators. The idea is to use AI to prioritize potentially fraught posts for moderators to review while delegating the removal of low-priority content to automatic systems. But a reliance on human moderation isn't necessarily better than leaning heavily on AI. Lawyers involved in a $52 million settlement with Facebook's content moderators earlier this year determined that as many as half of all Facebook moderators may develop mental health issues on the job attributable to exposure to graphic videos, hate speech, and other disturbing material.
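As a rough illustration of that kind of pipeline, the Python sketch below merges user reports and model flags into one collection, then filters, ranks, and deduplicates them before they reach a review queue. Facebook hasn't detailed its scoring here, so the severity score, the view count, the priority formula (severity times views), and the 0.2 cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    post_id: str
    source: str      # "user" or "model"
    severity: float  # hypothetical score between 0 and 1
    views: int       # hypothetical measure of reach


def build_review_queue(user_flags, model_flags, min_severity=0.2):
    """Combine, filter, rank, and deduplicate flags into one review queue."""
    combined = list(user_flags) + list(model_flags)            # single collection
    kept = [f for f in combined if f.severity >= min_severity]  # filter
    ranked = sorted(kept, key=lambda f: f.severity * f.views, reverse=True)  # rank
    queue, seen = [], set()
    for flag in ranked:                                         # deduplicate
        if flag.post_id not in seen:
            seen.add(flag.post_id)
            queue.append(flag)
    return queue


user_flags = [Flag("a", "user", 0.9, 100), Flag("b", "user", 0.1, 5_000)]
model_flags = [Flag("a", "model", 0.8, 100), Flag("c", "model", 0.5, 10_000)]

queue = build_review_queue(user_flags, model_flags)
print([(f.post_id, f.source) for f in queue])
```

In this toy run, post "b" falls below the cutoff and is dropped, the duplicate flag on post "a" is collapsed into one entry, and the widely viewed post "c" moves to the front of the queue.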

Just this week, more than 200 Facebook contractors said in an open letter that the company is making content moderators return to the office during the pandemic because its attempt to rely more heavily on automated systems has "failed." The workers called on Facebook and its outsourcing partners including Accenture and CPL to improve safety and working conditions and offer hazard pay. They also want Facebook to hire all of its moderators directly, let those who live with high-risk people work from home indefinitely, and offer better health care and mental health support.

In response to pressure from lawmakers, the FCC, and others, Facebook implemented rules this summer and fall aimed at tamping down on viral content that violates standards. Members and administrators belonging to groups removed for running afoul of its policies are temporarily unable to create any new groups. Facebook no longer includes any health-related groups in its recommendations, and QAnon is banned across all of the company's platforms. The Facebook Oversight Board, an external group that will make decisions and influence precedents about what kind of content should and shouldn't be allowed on Facebook's platform, began reviewing content moderation cases in October. And Facebook agreed to provide mental health coaching to moderators as it rolls out changes to its moderation tools designed to reduce the impact of viewing harmful content.

But it's becoming increasingly evident that preventing the spread of harmful content on Facebook is an intractable problem -- a problem worsened by the company's purported political favoritism and reluctance to act on research suggesting its algorithms stoke polarization. For all its imperfections, AI could be a part of the solution, but it'll take more than novel algorithms to reverse Facebook's worrisome trend toward divisiveness.