Yelp built an AI system to identify spam and inappropriate photos

Malicious actors are constantly finding ways to circumvent platforms' policies and game their systems -- and 2020 was no exception. According to online harassment tracker L1ght, in the first few weeks of the pandemic, there was a 40% increase in toxicity on popular gaming services including Discord. Anti-fraud experts saw a rise in various types of fraud last year across online platforms, including bank and insurance fraud. And from March 2020 to April 2020, IBM observed a more than 6,000% increase in COVID-19-related spam.

Yelp wasn't immune from the uptick in problematic digital content. With a rise in travel cancellations, the company noticed an increase in images being uploaded with text to promote fake customer support numbers and other promotional spam. To mitigate the issue and automate a process that had relied on manual content reporting from its community of users, Yelp says its engineers built a custom, in-house system using machine learning algorithms to analyze hundreds of thousands of photo uploads per day -- detecting inappropriate and spammy photos at scale.

Automating content moderation

Yelp's use of AI and machine learning runs the gamut from advertising to restaurant, salon, and hotel recommendations. The app's Collections feature leverages a combination of machine learning, algorithmic sorting, and manual curation to put local hotspots at users' fingertips. (Deep learning-powered image analysis automatically identifies the color, texture, and shape of objects in user-submitted photos, allowing Yelp to predict attributes like "good for kids" and "ambiance is classy.") Yelp optimizes photos on businesses' listings to serve up the most relevant image for browsing potential customers. And advertisers can opt to have an AI system recommend photos and review content to use in banner ads based on their "impactfulness" with users.

There's also Popular Dishes, Yelp's feature that highlights the name, photos, and reviews of most-ordered restaurant menu items. More recently, the platform added tools to help reopening businesses indicate whether they're taking steps like enforcing distancing and sanitization, employing a combination of human moderation and machine learning to update sections with information businesses have posted elsewhere.

Building the new content moderation system was more challenging than previous AI projects because Yelp engineers had a limited dataset to work with, the company told VentureBeat. Most machine learning algorithms are trained on input data annotated for a particular output until they can detect the underlying relationships between the inputs and output results. During the training phase, the system is fed with labeled datasets, which tell it which output is related to each specific input value.

Yelp's annotated corpus of spam was limited prior to the pandemic and had to be augmented over time. "Ultimately, our engineers developed a multi-stage, multimodel approach for promotional spam and inappropriate content," a spokesperson said. In this context, "inappropriate" refers to content that runs afoul of Yelp's Content Guidelines, including suggestive or explicit nudity (e.g., revealing clothes, sexual activity), violence (weapons, offensive gestures, hate symbols), and substances like drugs, tobacco, and alcohol.

Yelp also had to ensure that the system understood the context of uploaded content. Unlike most AI systems, humans understand the meaning of text, videos, audio, and images together in context. For example, given text and an image that seem innocuous when considered apart (e.g., "Look how many people love you" and a picture of a barren desert), people recognize that these elements take on potentially hurtful connotations when they're paired or juxtaposed.

Two-part framework

Yelp's anti-spam solution is a two-part framework that first identifies photos most likely to contain spam. During the second stage, flagged content is run through machine learning models tuned for precision, which send only a small number of photos to be reviewed by human moderators. A set of heuristics works alongside the models to speed up the pipeline and react quickly to new potential spam and inappropriate content.
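
The two-stage flow described above can be sketched in a few lines of Python. This is a minimal illustration, not Yelp's implementation: the function name, the two scoring callables, and both threshold values are assumptions made for the example.

```python
from typing import Callable, Iterable, List, TypeVar

P = TypeVar("P")  # whatever object represents an uploaded photo


def triage(
    photos: Iterable[P],
    stage_one: Callable[[P], float],   # hypothetical recall-oriented scorer
    stage_two: Callable[[P], float],   # hypothetical precision-tuned scorer
    recall_threshold: float = 0.3,     # assumed value, deliberately permissive
    precision_threshold: float = 0.9,  # assumed value, deliberately strict
) -> List[P]:
    """Return only the photos that should reach human moderators."""
    # Stage 1: keep anything that might be spam.
    candidates = [p for p in photos if stage_one(p) >= recall_threshold]
    # Stage 2: keep only what the stricter model is confident about.
    return [p for p in candidates if stage_two(p) >= precision_threshold]
```

Because the second scorer only runs on photos that pass the first, a more expensive model can be used there without scoring every upload.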

"We used a custom dataset of tens of thousands of Yelp photos and applied transfer learning to tune pre-trained large-scale models," Vivek Raman, Yelp's VP of engineering for trust and safety, told VentureBeat via email. "The models were trained on GPU-accelerated instances, which made the transfer-learning process training very efficient -- compared to training a deep neural network from scratch. The performance of the models in production is monitored to catch any drift and allow us to react quickly to any evolving threats."

In the case of promotional spam, the system searches for simple graphics that are text- or logo-heavy. Inappropriate content is a bit more complex, so the framework leverages a residual neural network to identify photos that violate Yelp's policies as well as a convolutional neural network model to spot photos containing people. Residual neural networks build on constructs known from pyramidal cells in the cerebral cortex, which transform inputs into outputs of action potentials. Convolutional neural networks, which are similarly inspired by biological processes, are adept at analyzing visual imagery.
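
What makes a network "residual" is a shortcut that adds a block's input back to its output. The sketch below shows that idea with plain Python lists; it is a toy illustration with made-up layer shapes, and says nothing about the architecture Yelp actually uses.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in v]


def linear(weights: Matrix, bias: Vector, v: Vector) -> Vector:
    """Dense layer: weights @ v + bias."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def residual_block(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    """Compute relu(F(x) + x), where F is two dense layers with a ReLU between."""
    hidden = relu(linear(w1, b1, x))
    transformed = linear(w2, b2, hidden)
    # The shortcut connection: add the original input back in.
    return relu([f + xi for f, xi in zip(transformed, x)])
```

If the learned layers contribute nothing (all weights zero), the block simply passes a non-negative input through unchanged, which is part of why very deep stacks of such blocks remain trainable.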

When the system detects promotional spam, it extracts the text from the photos using another deep learning neural network and performs classification via a regular expression and a natural language processing service. For inappropriate content, a deep learning model is used to help the framework calibrate for precision based on confidence scores and a set of context heuristics, like business category, that take into account where the content is being displayed.
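
As a rough illustration of those two steps, the Python sketch below applies a regular expression to text already extracted from a photo, and applies a per-category confidence threshold to a model score. The patterns, category names, and threshold values are all assumptions invented for the example; the text-extraction model and the natural language processing service are left out.

```python
import re

# Hypothetical patterns: a North American style phone number plus
# wording typical of fake support-line images.
PHONE_RE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
PROMO_RE = re.compile(
    r"customer\s+(?:support|service|care)|helpline|toll[\s-]?free",
    re.IGNORECASE,
)


def looks_like_promo_spam(extracted_text: str) -> bool:
    """Flag text that contains both a phone number and support-line wording."""
    has_phone = PHONE_RE.search(extracted_text) is not None
    has_promo = PROMO_RE.search(extracted_text) is not None
    return has_phone and has_promo


# Hypothetical context heuristic: demand more confidence where, say,
# alcohol or tobacco imagery is expected for the business.
DEFAULT_THRESHOLD = 0.80
CATEGORY_THRESHOLDS = {"bars": 0.95, "tobacco shops": 0.95}


def should_flag(confidence: float, business_category: str) -> bool:
    """Compare a model confidence score against a category-specific threshold."""
    threshold = CATEGORY_THRESHOLDS.get(business_category.lower(), DEFAULT_THRESHOLD)
    return confidence >= threshold
```

Under these assumed values, a photo scored at 0.9 would be flagged on a daycare listing but not on a bar listing.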

Combating adversaries

Yelp's heuristics help combat repeat spammers. Photos flagged as spam are tracked by a fuzzy matching service so that if users try to reupload spam, it's automatically discarded by the system. If there's no similar spam match, it could end up in the content moderation team queue.
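
Yelp did not say how its fuzzy matching service works. One common way to catch near-duplicate images is a perceptual "difference hash" compared by Hamming distance, sketched below in Python. The technique is swapped in for illustration, and the function names, the grayscale-grid input format, and the distance cutoff are assumptions.

```python
from typing import Iterable, List, Sequence


def dhash_bits(pixels: Sequence[Sequence[int]]) -> List[int]:
    """Difference hash of a small grayscale grid.

    Each bit is 1 when a pixel is brighter than its right-hand neighbor.
    """
    return [
        1 if row[i] > row[i + 1] else 0
        for row in pixels
        for i in range(len(row) - 1)
    ]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length hashes differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def is_known_spam(
    pixels: Sequence[Sequence[int]],
    known_spam_hashes: Iterable[Sequence[int]],
    max_distance: int = 2,  # assumed tolerance for small edits
) -> bool:
    """True if the image hashes to within max_distance of any known spam hash."""
    candidate = dhash_bits(pixels)
    return any(hamming(candidate, known) <= max_distance for known in known_spam_hashes)
```

A reupload with slightly altered brightness or compression tends to keep most of the same brighter-than-neighbor relationships, so its hash stays close to the original even though the raw bytes differ.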

While awaiting moderation, images are hidden from users so that they're not exposed to potentially unsafe content. And the content moderation team has the ability to act on user profiles instead of single pieces of content. For example, if a user is found to be generating spam, their user profile is closed and all associated content is removed.

AI is by no means a silver bullet when it comes to content moderation. Researchers have documented instances in which automated content moderation tools on platforms such as YouTube mistakenly categorized videos posted by nongovernmental organizations documenting human rights abuses by ISIS in Syria as extremist content and removed them. A New York University study estimates that Facebook's AI systems alone make about 300,000 content moderation mistakes per day, and that problematic posts continue to slip through Facebook's filters.

Raman acknowledges that AI moderation systems are susceptible to bias, but says that Yelp's engineers have taken steps to mitigate it. "[Bias] can come from the conscious or unconscious biases of their designers, or from the datasets themselves ... When designing this system, we used sophisticated sampling techniques specifically to produce balanced training sets with the explicit goal of reducing bias in the system. We also train the model for precision to minimize mistakes or the likelihood of removing false positives."
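
Raman did not detail the sampling techniques involved. The simplest way to build a balanced training set is to downsample every class to the size of the smallest one, as in the Python sketch below. It is a basic stand-in for illustration only; the function name and the (item, label) input format are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Example = Tuple[Hashable, str]  # (item, label)


def balanced_sample(examples: List[Example], seed: int = 0) -> List[Example]:
    """Downsample so every label appears the same number of times."""
    if not examples:
        return []
    by_label: Dict[str, List[Example]] = defaultdict(list)
    for item, label in examples:
        by_label[label].append((item, label))
    # Every class is cut down to the size of the rarest class.
    per_class = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample: List[Example] = []
    for label in sorted(by_label):
        sample.extend(rng.sample(by_label[label], per_class))
    rng.shuffle(sample)
    return sample
```

Downsampling discards data from the larger classes, which is one reason production systems often use more elaborate schemes.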

Raman also asserts that Yelp's new system augments rather than replaces its team of human moderators. The goal is not to take down spam proactively but to prioritize the items reviewed by moderation teams, who have the power to restore falsely flagged content.

"While it's important to leverage technology to create more efficient processes and manage content at scale, it's even more important to create checks and balances through human moderation," Raman said. "Business pages that receive less traffic are less likely to have a consumer or business owner catch and report the content to our moderators -- so, our photo moderation workflow helps weed out suspicious content in a more scalable way."