Legions of DEF CON hackers will attack generative AI models

At the 31st annual DEF CON this weekend, thousands of hackers will join the AI Village to attack some of the world's top large language models — in the largest red-teaming exercise ever for any group of AI models: the Generative Red Team (GRT) Challenge.

According to the National Institute of Standards and Technology (NIST), "red-teaming" refers to "a group of people authorized and organized to emulate a potential adversary's attack or exploitation capabilities against an enterprise's security posture." This is the first public generative AI red team event at DEF CON, which is partnering with organizations Humane Intelligence, SeedAI, and the AI Village. Models provided by Anthropic, Cohere, Google, Hugging Face, Meta, Nvidia, OpenAI and Stability will be tested on an evaluation platform developed by Scale AI.

The Biden-Harris administration announced the challenge in May. It is supported by the White House Office of Science and Technology Policy (OSTP) and is aligned with the goals of the Biden-Harris Blueprint for an AI Bill of Rights and the NIST AI Risk Management Framework. It will also be adapted into educational programming for the Congressional AI Caucus and other officials.

An OpenAI spokesperson confirmed that GPT-4 will be one of the models available for red-teaming as part of the GRT Challenge.

"Red-teaming has long been a critical part of deployment at OpenAI and we're pleased to see it becoming a norm across the industry," the spokesperson said. "Not only does it allow us to gather valuable feedback that can make our models stronger and safer, red-teaming also provides different perspectives and more voices to help guide the development of AI."

DEF CON hackers seek to identify AI model weaknesses

A red-teamer's job is to simulate an adversary, carrying out adversarial emulation and simulation against the systems they are trying to red-team, said Alex Levinson, Scale AI's head of security, who has over a decade of experience running red-teaming exercises and events.

"In this context, what we're trying to do is actually emulate behaviors that people might take and identify weaknesses in the models and how they work," he explained. "Every one of these companies develops their models in different ways; they have secret sauces." But, he cautioned, the challenge is not a competition between the models. "This is really an exercise to identify what wasn't known before. It's that unpredictability and being able to say we never thought of that," he said.

The challenge will provide 150 laptop stations and timed access to multiple LLMs from the vendors — the models and AI companies will not be identified in the challenge. The challenge also provides a capture-the-flag (CTF) style point system to promote testing a wide range of harms.

And there's a not-too-shabby grand prize at the end: The individual who gets the highest number of points wins a high-end Nvidia GPU (which sells for over $40,000).

AI companies seeking feedback on embedded harms

Rumman Chowdhury, cofounder of the nonprofit Humane Intelligence, which offers safety, ethics and subject-specific expertise to AI model owners, said in a media briefing that the AI companies providing their models are most excited about the kind of feedback they will get, particularly about the embedded harms and emergent risks that come from automating these new technologies at scale.

Chowdhury pointed to challenges focusing on multilingual harms of AI models: "If you can imagine the breadth of complexity in not just identifying trust and safety mechanisms in English for every kind of nuance, but then trying to translate that into many, many languages, that's something that is quite a difficult thing to do," she said.

Another challenge, she said, is internal consistency of the models. "It's very difficult to try to create the kinds of safeguards that will perform consistently across a wide range of issues," she explained.

A large-scale red-teaming event

The AI Village organizers said in a press release that they are bringing in hundreds of students from "overlooked institutions and communities" to be among the thousands who will experience hands-on LLM red-teaming for the first time.

Scale AI's Levinson said that while others have run red-team exercises with one model, this challenge is far more complex because it involves so many testers and so many models, and because the organizers want to make sure to cover various principles in the AI Bill of Rights.

"That's what makes the scale of this unique," he said. "I'm sure there are other AI events that have happened, but they've probably been very targeted, like finding great prompt injection. But there's so many more dimensions to safety and security with AI — that's what we're trying to cover here."

That scale is key to the success of the challenge, said Michael Sellitto, interim head of policy and societal impacts at Anthropic. So is the DEF CON format, which brings together diverse participants, including those who typically have not participated in the development and deployment of LLMs.

"Red-teaming is an important part of our work, as was highlighted in the recent AI company commitments announced by the White House, and it is just as important to do externally ... to better understand the risks and limitations of AI technology at scale," he said.
