It’s generative AI models vs. hackers at DEF CON’s AI Village

One of the most effective ways of testing an application’s security is through adversarial attacks. In this method, security researchers actively attack the technology in a controlled environment to try to find previously unknown vulnerabilities.

It's an approach that’s now being advocated by the Biden-Harris administration to help secure generative artificial intelligence (AI). As part of its Actions to Promote Responsible AI announcement yesterday, the administration called for public assessments of existing generative AI systems. As a result, this year’s DEF CON 31 security conference, being held August 10–13, will feature a public assessment of generative AI at the AI Village.

"This independent exercise will provide critical information to researchers and the public about the impacts of these models, and will enable AI companies and developers to take steps to fix issues found in those models," the White House stated in a release.

Some of the leading vendors in the generative AI space will be participating in the AI Village hack, including: Anthropic, Google, Hugging Face, Microsoft, Nvidia, OpenAI and Stability AI.

DEF CON villages have a history of advancing security knowledge

The DEF CON security conference is one of the largest gatherings of security researchers in any given year and has long been a location where new vulnerabilities have been discovered and disclosed.

This won't be the first time that a village at DEF CON has taken aim at a technology that is making national headlines, either. In years past, especially after the 2016 U.S. election and fears over election interference, a Voting Village was set up at DEF CON to examine the security (or lack thereof) of voting machine technologies, infrastructure and processes.

Image source: AI Village.

At DEF CON's villages, attendees are able to discuss and probe technologies under a responsible disclosure model that aims to help improve the state of security overall. With AI, there is a particular need to examine the technology for risks as it becomes more widely deployed in society at large.

How the generative AI hack will work

Sven Cattell, the founder of AI Village, commented in a statement that, traditionally, companies have solved the problem of identifying risks by using specialized red teams.

A red team is a type of cybersecurity group that simulates attacks in an effort to detect potential issues. The challenge with generative AI, according to Cattell, is that a lot of the work around generative AI has happened in private, without the benefit of a red team evaluation.

"The diverse issues with these models will not be resolved until more people know how to red team and assess them," Cattell said.

In terms of specifics, the AI Village generative AI attack simulation will consist of on-site access to large language models (LLMs) from the participating vendors. The event will use a capture-the-flag-style point system in which attackers earn points for achieving certain objectives that demonstrate a range of potentially harmful activities. The individual with the highest number of points will win a "high-end Nvidia GPU."
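As a rough illustration of how a point system like the one described above might be tallied, here is a minimal Python sketch. It is not the event's actual scoring platform: the `Scoreboard` class, the objective names and the point values are all hypothetical, invented purely for illustration, and the rule that a repeated objective earns no extra points is likewise an assumption.

```python
from collections import defaultdict

# Hypothetical objectives and point values, invented for illustration only.
# The real event's objectives and scoring rules are not described in the article.
OBJECTIVE_POINTS = {
    "objective_a": 10,
    "objective_b": 25,
    "objective_c": 50,
}


class Scoreboard:
    """Tracks which objectives each participant has completed."""

    def __init__(self, objective_points):
        self.objective_points = dict(objective_points)
        # Maps participant name -> set of completed objective names.
        self.completed = defaultdict(set)

    def record(self, participant, objective):
        """Record a completed objective and return the points it earned.

        Assumption: repeating an already-completed objective earns 0 points.
        """
        if objective not in self.objective_points:
            raise KeyError(f"unknown objective: {objective}")
        if objective in self.completed[participant]:
            return 0
        self.completed[participant].add(objective)
        return self.objective_points[objective]

    def score(self, participant):
        """Total points for a participant (0 if they have no completions)."""
        done = self.completed.get(participant, ())
        return sum(self.objective_points[name] for name in done)

    def leader(self):
        """Participant with the most points, or None if nobody has scored."""
        if not self.completed:
            return None
        return max(self.completed, key=self.score)


board = Scoreboard(OBJECTIVE_POINTS)
board.record("alice", "objective_a")
board.record("alice", "objective_a")  # duplicate, earns nothing
board.record("bob", "objective_c")
print(board.leader(), board.score(board.leader()))
```

Storing completed objectives as a set per participant keeps the duplicate check simple; a real platform would also need to verify that an objective was actually achieved, which this sketch does not attempt.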

The evaluation platform the event will run on is being developed by Scale AI. "As foundation model use becomes widespread, it’s critical to ensure that they are evaluated carefully for reliability and accuracy," Alexandr Wang, founder and CEO of Scale, told VentureBeat.

Wang noted that Scale has spent more than seven years building AI systems from the ground up. He also said that his company is unbiased and not beholden to any single ecosystem. As such, Wang said Scale is able to independently test and evaluate systems to ensure they’re ready to be deployed into production.

"By bringing our expertise to a wider audience at DEF CON, we hope to ensure progress in foundation model capabilities happens alongside progress in model evaluation and safety," Wang said.
