OpenAI promotes GPT-4 as a way to reduce burden on human content moderators

One of the most unsung jobs of the internet era is that of the content moderator.

Casey Newton, Adrian Chen and others have previously reported eloquently and harrowingly on the plight of these laborers, who number in the thousands and are tasked by large social networks such as Facebook with reviewing troves of user-generated content for policy violations and removing the offending material from those platforms.

The content they are exposed to often includes detailed descriptions and photographic or video evidence of humanity at its worst — such as depictions of child sexual abuse — not to mention various other crimes, atrocities and horrors.

Moderators charged with identifying and removing this content have reported struggling with post-traumatic stress disorder (PTSD), anxiety and various other mental illnesses and psychological maladies due to their exposure.

AI shouldering content moderation

Wouldn't it be an improvement if an artificial intelligence (AI) program could shoulder some, or potentially even most, of the load of online content moderation?

That's the hope of OpenAI, which today published a blog post detailing its findings that GPT-4 — its latest publicly available large language model (LLM) that forms the backbone of one version of ChatGPT — can be used effectively to moderate content for other companies and organizations.

"We believe this offers a more positive vision of the future of digital platforms, where AI can help moderate online traffic according to platform-specific policy and relieve the mental burden of a large number of human moderators," write OpenAI authors Lilian Weng, Vik Goel and Andrea Vallone.

In fact, according to OpenAI's research, GPT-4 trained for content moderation performs better than human moderators with minimal training, although both are still outperformed by highly trained and experienced human mods.

Image credit: OpenAI

How GPT-4's content moderation works

OpenAI outlines a three-step framework for getting its LLMs, including GPT-4, to moderate content according to a hypothetical organization's given policies.

The first step in the process is drafting the content policy (presumably this is done by humans, although OpenAI's blog post does not specify this), then identifying a "golden set" of data that human moderators will label. This data could include content that is obviously in violation of policies or content that is more ambiguous, but still ultimately deemed by human moderators to be in violation. It might also include examples of data that are clearly in line with the policies.

Whatever the golden data set contains, its human-assigned labels serve as the benchmark against which an AI model's performance is measured. Step two is taking the model, in this case GPT-4, and prompting it to read the content policy, review the same "golden" dataset and assign its own labels.

Finally, a human supervisor would compare GPT-4's labels to those originally created by humans. If there are discrepancies, or examples of content that GPT-4 "got wrong" or labeled incorrectly, the human supervisor(s) could then ask GPT-4 to explain its reasoning for the label. Once the model describes its reasoning, the human may see a way to rewrite or clarify the original content policy so that GPT-4 reads it and follows the instruction correctly going forward.
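The comparison step described above can be illustrated with a short Python sketch. This is not OpenAI's implementation: the golden set, the policy text and the `stub_model_label` function are all hypothetical stand-ins, and a real version would replace the stub with a call that sends the policy and the content to GPT-4.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical golden set: content id -> (text, label assigned by human moderators).
GOLDEN_SET: Dict[str, Tuple[str, str]] = {
    "c1": ("How do I bake bread?", "allowed"),
    "c2": ("Instructions for stealing a car", "violation"),
    "c3": ("Where can I buy a lock pick set?", "violation"),
}


def stub_model_label(policy: str, text: str) -> str:
    """Stand-in for the model (assumption): a crude keyword check, not GPT-4."""
    return "violation" if "steal" in text.lower() else "allowed"


def find_discrepancies(
    policy: str,
    golden_set: Dict[str, Tuple[str, str]],
    label_fn: Callable[[str, str], str],
) -> List[Tuple[str, str, str]]:
    """Return (content id, human label, model label) for every disagreement."""
    discrepancies = []
    for content_id, (text, human_label) in golden_set.items():
        model_label = label_fn(policy, text)
        if model_label != human_label:
            discrepancies.append((content_id, human_label, model_label))
    return discrepancies


policy = "Disallow advice that facilitates theft."  # hypothetical policy text
mismatches = find_discrepancies(policy, GOLDEN_SET, stub_model_label)
agreement = 1 - len(mismatches) / len(GOLDEN_SET)

for content_id, human_label, model_label in mismatches:
    # In the workflow the article describes, these are the items a supervisor
    # would ask the model to explain before revising the policy wording.
    print(f"{content_id}: human={human_label}, model={model_label}")
print(f"agreement: {agreement:.2f}")
```

In this toy run, the stub disagrees with the human label on the ambiguous lock-pick example, which is the kind of case that might prompt a supervisor to clarify the policy and run the comparison again.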

"This iterative process yields refined content policies that are translated into classifiers, enabling the deployment of the policy and content moderation at scale," write the OpenAI authors.

The OpenAI blog post goes on to describe how this approach improves on "traditional approaches to content moderation." It cites "more consistent labels" compared to an army of human moderators who may interpret the same policy differently, a "faster feedback loop" for updating content policies to account for new violations, and, of course, a "reduced mental burden" on human content moderators. Those moderators might presumably be called in only to help train the LLM or diagnose issues with it, leaving the front-line work and the bulk of the moderation to the model.

Calling out Anthropic

OpenAI's blog post and its promotion of content moderation as a good use case for its signature LLMs make sense, especially alongside its recent investments in and partnerships with media organizations including The Associated Press and the American Journalism Project. Media organizations have long struggled with effectively moderating reader comments on articles, while still allowing for freedom of speech, discussion and debate.

Interestingly, OpenAI's blog post also took the time to call out the "Constitutional AI" framework espoused by rival Anthropic for its Claude and Claude 2 LLMs, in which an AI is trained to follow a single human-derived ethical framework in all of its responses.

"Different from Constitutional AI (Bai, et al. 2022) which mainly relies on the model's own internalized judgment of what is safe vs. not, our approach makes platform-specific content policy iteration much faster and less effortful," write the OpenAI authors. "We encourage trust and safety practitioners to try out this process for content moderation, as anyone with OpenAI API access can implement the same experiments today."

The dig comes just one day after Anthropic, arguably the leading proponent of Constitutional AI, received a $100 million investment to create a telecom-specific LLM.

A noteworthy irony

There is of course a noteworthy irony to OpenAI's promotion of GPT-4 as a way to ease the mental burden of human content moderators: according to detailed investigative reports published in Time magazine and The Wall Street Journal, OpenAI itself employed human content moderators in Kenya through contractors and subcontractors such as Sama, to read content, including AI-generated content, and label it according to the severity of the harms described.

As Time reported, these human laborers were paid less than $2 (USD) per hour for their work, and both reports indicate that workers experienced lasting trauma and mental illness from it.

"One Sama worker tasked with reading and labeling text for OpenAI told Time he suffered from recurring visions after reading a graphic description of a man having sex with a dog in the presence of a young child," the Time article states.

Workers recently petitioned the government of Kenya to enact new laws that would further protect and provide for content moderators.

Perhaps, then, OpenAI's automated content moderation push is in some sense a way of making amends, or of preventing future harms like the ones that were involved in its creation.
