AI could make trolls think twice before retweeting offensive content

While hate speech in the U.S. dates back to the country's earliest days, the rise of social media has thrust hateful keyboard warriors into the spotlight. Many social media platforms have attempted to counter hate speech using AI-driven tactics, but the number of vile messages spread on sites like Twitter and Facebook remains steady. While there is no easy fix for this deep-rooted problem, a few third-party companies are taking a stab at it with anti-hate bots designed to clean up the sludge on some of the most popular social platforms.

We Counter Hate

Possible recently launched a campaign called We Counter Hate, which aims to curb the spread of hate speech on Twitter by turning each retweet of a hateful message into a donation to an organization called Life After Hate. The company teamed up with Spredfast to train AI to identify Twitter users spreading hateful messages. Once the AI detects a potentially hateful tweet, human moderators step in to determine the appropriate response.

If a moderator decides the tweet picked up by the AI is, in fact, hate speech, they send a counter message that says, "This hate tweet is now being countered. Think twice before retweeting. For every retweet, a donation will be committed to a non-profit fighting for equality, inclusion, and diversity." The message also links to the campaign's website to provide additional information.

In an email to VentureBeat, a Possible spokesperson said, "This reply permanently marks these messages of hate and makes it clear to those who wish to spread hate speech that each retweet of this message equals a $1 donation to U.S. non-profit Life After Hate, an organization that helps reform and remove people from violent extremist groups."
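The pledge comes down to simple arithmetic. Here is a minimal sketch, assuming the $1-per-retweet rate quoted above. The function name and the retweet counts are illustrative and are not part of the campaign's software.

```python
from decimal import Decimal

# Rate quoted by the Possible spokesperson: $1 per retweet of a countered tweet.
DONATION_PER_RETWEET = Decimal("1.00")


def committed_donation(retweet_count: int) -> Decimal:
    """Return the dollars committed for one countered tweet (hypothetical helper)."""
    if retweet_count < 0:
        raise ValueError("retweet_count cannot be negative")
    return DONATION_PER_RETWEET * retweet_count


# Made-up retweet counts for three countered tweets, for illustration only.
example_retweets = [12, 0, 250]
total = sum((committed_donation(n) for n in example_retweets), Decimal("0"))
print(f"Total committed: ${total}")  # Total committed: $262.00
```

Using `Decimal` rather than floats keeps the dollar amounts exact, which matters once many small donations are added together.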

The team selected Twitter as their first target because it seems to be the megaphone of choice for hate groups.

We did some digging and found an example of the bot at work.

How the machine determines hate speech

The team at Possible adapted Gregory Stanton's 10 Stages of Genocide to build a system for identifying hate speech, condensing the framework to include only the points relevant to the Twittersphere. The team also added contemporary phenomena found on social media, such as coded language.

The company spokesperson shared the table below to explain the AI's structure for defining hate speech.

A look at the AI driving this campaign

Possible's technology employs machine learning to analyze thousands of tweets and return hate classifications within milliseconds. Possible's spokesperson noted that the platform is flexible enough to let moderators adapt the technology as they identify new terminology used by hate groups on social media.

The company's first step in bringing this campaign to life was building the machine. To do this, the team used enterprise-level AI platforms, including natural language processing and image recognition APIs, to review and interpret tweets in real time.

The next step in the process was to train the machine. This is where Possible worked with Spredfast to use its intelligent social listening platform to moderate incoming messages and categorize them into streams of hate speech. The team at Possible feeds these streams into the machine on an ongoing basis so it can understand the linguistic nuances and continue learning.

Although AI can help filter through a massive number of tweets to find potentially hateful messages, machines are not perfect at identifying all instances of hate speech. Some innocent messages could be misinterpreted by the machine based on certain words or phrases. This is why human moderators step in to evaluate the machine's work and respond only to tweets that actually include hateful content.
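The two-stage review described here can be sketched in a few lines. This is a rough illustration only: the article says the real system uses machine learning, so the keyword match below is a deliberately crude stand-in, and every name in it (`Tweet`, `machine_flags`, `tweets_to_counter`, the placeholder term `badword`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    text: str


def machine_flags(tweet: Tweet, watched_terms: Set[str]) -> bool:
    """Keyword stand-in for the classifier; the real system uses machine learning."""
    words = {w.strip(".,!?\"'").lower() for w in tweet.text.split()}
    return bool(words & watched_terms)


def tweets_to_counter(
    tweets: Iterable[Tweet],
    watched_terms: Set[str],
    moderator_confirms: Callable[[Tweet], bool],
) -> List[Tweet]:
    """Stage 1: the machine filters. Stage 2: a human confirms before any reply."""
    flagged = [t for t in tweets if machine_flags(t, watched_terms)]
    return [t for t in flagged if moderator_confirms(t)]


# A placeholder token stands in for real terms; moderators could add more over time.
watched = {"badword"}
stream = [
    Tweet(1, "Have a nice day"),
    Tweet(2, "You are a BADWORD!"),
    Tweet(3, "Please stop calling people badword, it hurts"),  # innocent, but flagged
]


def moderator(tweet: Tweet) -> bool:
    """Simulated human decision: only tweet 2 is judged to be hate speech."""
    return tweet.tweet_id == 2


flagged_ids = [t.tweet_id for t in stream if machine_flags(t, watched)]
countered_ids = [t.tweet_id for t in tweets_to_counter(stream, watched, moderator)]
print(flagged_ids)    # [2, 3]
print(countered_ids)  # [2]
```

Tweet 3 shows the failure mode the moderators are there to catch: it contains a watched term but objects to its use, so the machine flags it and the human declines to counter it.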

Where does the money come from?

The campaign runs entirely on donations from the public. Supporters can pledge a monthly dollar amount, which funds the per-retweet donations triggered by countered tweets. Apart from the service fee collected by the online fundraising platform, Public Good, the company spokesperson says all donations go directly to the campaign's beneficiary.
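To see how far a pledge might stretch, here is a back-of-the-envelope sketch. It assumes the $1-per-retweet rate quoted earlier in the article. The fee rate is a placeholder, since the article does not state what Public Good actually charges, and the function name is hypothetical.

```python
from decimal import Decimal

DONATION_PER_RETWEET = Decimal("1.00")  # rate quoted by the Possible spokesperson
ASSUMED_FEE_RATE = Decimal("0.05")      # placeholder: the real service fee is not given


def retweets_covered(monthly_pledge: Decimal, fee_rate: Decimal = ASSUMED_FEE_RATE) -> int:
    """Return how many $1 retweet donations a pledge could fund after the fee."""
    if monthly_pledge < 0 or not (Decimal("0") <= fee_rate < Decimal("1")):
        raise ValueError("pledge must be non-negative and fee_rate in [0, 1)")
    net = monthly_pledge * (Decimal("1") - fee_rate)
    return int(net // DONATION_PER_RETWEET)


# Illustrative pledge only: $25.00 less an assumed 5% fee leaves $23.75.
print(retweets_covered(Decimal("25.00")))  # 23
```

Swapping in the platform's real fee would only change `ASSUMED_FEE_RATE`; the rest of the calculation stays the same.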

We Counter Hate also accepts suggestions from the public for Twitter handles they should watch for hate speech.

Will it work?

It's hard to put faith in a third-party solution when Twitter itself can't seem to find an effective way to curb hate speech on its platform. Possible has set itself a hefty goal with this project, and while its strategy is intriguing, the team will have its work cut out. Here's hoping the effort puts at least a small dent in the amount of hate speech we see in our Twitter feeds.