Sentropy raises $13 million to develop AI tools that detect online abuse

Sentropy, a startup developing platform-agnostic tools to detect online abuse, emerged from stealth today with $13 million in venture capital. Its products aim to address the lack of oversight on social media platforms, where content moderators only review a small portion of the millions of examples of abuse flagged daily.

Sentropy counts among its workforce former Apple, Microsoft, and Lattice employees, as well as the former head of strategy and operations at Palantir. A number of the company's founders worked together at Apple acquisition target Lattice Data, which collaborated with the U.S. Defense Advanced Research Projects Agency (DARPA) through the Memex program to fight human trafficking. As part of that effort, they examined signals from sources like Craigslist, Backpage, and dark web forums, such as language commonly associated with trafficking, and this work inspired Sentropy's solution.

It's a timely launch. According to the Pew Research Center, 4 in 10 Americans have personally experienced some form of online harassment. And 37% of U.S.-based internet users say they've been the target of severe attacks -- including sexual harassment and stalking -- based on their sexual orientation, religion, race, ethnicity, gender identity, or disability.

Sentropy's flagship product, which has been in private beta testing since June of last year, provides API access to classification technologies, with recommendations for addressing harassment. Sentropy Defend, a browser-based interface, supports end-to-end moderation workflows, while Sentropy Detect furnishes tools to identify forms of abuse, discover behavioral trends, and make moderation decisions.

Both Defend and Detect ship with intuitive workflow tooling and "constantly evolving" detection models tuned to community norms. The platform monitors the web for new behaviors and adapts to individual community guidelines and norms, with the goal of reducing abuse and driving deeper engagement.

Given a brief sentence, Sentropy says it can identify attacks, threats, or hatred directed at people based on a shared identity or affiliation, including gender, race, nationality, sexual orientation, religion, government, country, or political group. By looking at dehumanizing speech and other vulgarities, expressions of contempt or disgust, and calls for violence and exclusion, the platform aims to spot:

Insults referring to a person's physical traits (including race, sex, appearance), intelligence, personality, or behavior.
Threats of physical violence -- for example, expressing a desire to physically harm a person or groups of people (including violent sexual acts), advocating for the death of a person or groups, or encouraging another person to commit self-injury or violence.
Self-harm, like mentioning wanting to deliberately harm one's own body via cutting, burning, or other dangerous behaviors; threatening suicide or conveying suicidal ideation; or advising another person on how to commit self-harm or suicide.
Sexual aggression, which refers to obscene, graphic, sexual language directed at a person, such as the threat of unwanted sexual acts.
White supremacists who seek to revive and implement ideologies like white racial superiority, white cultural supremacy and nostalgia, white nationalism, eugenics, Western traditional gender roles, racism, homophobia, xenophobia, anti-Semitism, Holocaust denial, Jewish conspiracy theories, and praise of Adolf Hitler.

It's worth noting that semi-automated moderation remains an unsolved challenge. Last year, researchers showed that Perspective, a tool developed by Google and its subsidiary Jigsaw, often classified online comments written in the African American vernacular as toxic. A separate study revealed that bad grammar and awkward spelling -- like "Ihateyou love" instead of "I hate you" -- make toxic content far more difficult for AI and machine learning detectors to spot.
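To see why run-together spelling is hard to catch, consider a toy word-level phrase filter. The sketch below is purely illustrative and is not how Sentropy, Perspective, or the detectors in the study work (those are machine learning models); the phrase list and the function names `naive_match` and `squashed_match` are hypothetical. It shows the evasion and one crude countermeasure, which brings its own cost.

```python
import re

# Hypothetical phrase list, for illustration only.
BLOCKED_PHRASES = ["i hate you"]


def naive_match(text: str) -> bool:
    """Flag text only if a blocked phrase appears as separate whole words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    normalized = " ".join(tokens)
    return any(
        re.search(rf"\b{re.escape(phrase)}\b", normalized)
        for phrase in BLOCKED_PHRASES
    )


def squashed_match(text: str) -> bool:
    """Flag text if a blocked phrase appears after all non-letters are removed."""
    squashed = re.sub(r"[^a-z]", "", text.lower())
    return any(phrase.replace(" ", "") in squashed for phrase in BLOCKED_PHRASES)


print(naive_match("I hate you"))        # plain spelling is caught
print(naive_match("Ihateyou love"))     # removing the spaces evades the filter
print(squashed_match("Ihateyou love"))  # ignoring spaces catches it again
print(squashed_match("Hi hate young"))  # but also flags unrelated text
```

Ignoring word boundaries recovers the run-together phrase, but it also flags harmless text that happens to contain the same letters in sequence, which hints at why obfuscated spelling remains difficult even for far more sophisticated systems.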

Even perceived pack leaders in the abuse detection domain have attracted criticism for their approaches. Facebook, which claims it can now spot 88.8% of hate speech on its platform proactively, was the subject of a scathing report from NYU's Stern Center for Business and Human Rights that estimated the company makes around 300,000 moderation mistakes per day.

But Palo Alto-based Sentropy claims it has taken steps to minimize any potential bias in its systems with "embedded bias mitigation" and "deep bias" research. During the private beta, for example, it tracked the speed at which anti-Asian racism grew within the first few months of the COVID-19 pandemic, and it fine-tuned its models to take into account newly coined racist phrases like "ching-demic," "Shanghai shivers," and "kung flu" surfacing around the web. (More than 100 variants of abusive language were directed toward Asian peoples and cultures, Sentropy found, 85% of which were specifically related to COVID-19.)

"Spending time in digital communities, one thing that stands out is the rapid pace at which language morphs and develops over time," a Sentropy spokesperson told VentureBeat via email. "Machine learning aids [communities] in detecting totally new linguistic signals -- all so that [they] can better protect those who are most vulnerable to abuse."

Alexis Ohanian and Garry Tan's Initialized Capital contributed much of the backing for this initial round. Additional investors include King River Capital; Horizons; Playground Global; founders and leaders from Riot Games, Nextdoor, Twitch, OpenAI, and Twitter; and a former head of state.