Tenable report shows how generative AI is changing security research

Today, vulnerability management provider Tenable published a new report demonstrating how its research team is experimenting with large language models (LLMs) and generative AI to enhance security research.

The research focuses on four new tools designed to help human researchers streamline reverse engineering, vulnerability analysis, code debugging and web application security, and identify cloud-based misconfigurations.

These tools, now available on GitHub, demonstrate that generative AI tools like ChatGPT have a valuable role to play in defensive use cases, particularly when it comes to analyzing code and translating it into human-readable explanations so that defenders can better understand how the code works and its potential vulnerabilities.

“Tenable has already used LLMs to build new tools that are speeding out processes and helping us identify vulnerabilities faster and more efficiently,” the report said. “While these tools are far from replacing security engineers, they can act as a force multiplier and reduce some labor-intensive and complex work when used by experienced researchers.”

Automating reverse engineering with G-3PO

One of the key tools outlined in the research is G-3PO, a translation script for the reverse engineering framework Ghidra. Developed by the NSA, G-3PO is a tool that disassembles code and decompiles it into “something resembling source code” in the C programming language.

Traditionally, a human analyst would need to analyze this against the original assembly listing to ascertain how a piece of code functions. G-3PO automates the process by sending Ghidra’s decompiled C code to an LLM (supporting models from OpenAI and Anthropic) and requests an explanation for what the function does. As a result the researcher can understand the code’s function without having to analyze it manually.

While this can save time, in a YouTube video explaining how G-3PO works, Olivia Fraser, Tenable's zero-day researcher, warns that researchers should always double-check the output for accuracy.

“It goes without saying of course that the output of G-3PO, just like any automated tool, should be taken with a grain of salt and in the case of this tool, probably with several tablespoons of salt,” Fraser said. “Its output should of course always be checked against the decompiled code and against the disassembly, but this is par for the course for the reverse engineer.”

BurpGPT: The web app security AI assistant

Another promising solution is BurpGPT, an extension for application testing software Burp Suite that enables users to use GPT to analyze HTTP requests and responses.

BurpGPT intercepts HTTP traffic and forwards it to the OpenAI API, at which point the traffic is analyzed to identify risks and potential fixes. In the report, Tenable noted that BurpGPT has proved successful at identifying cross site scripting (XSS) vulnerabilities and misconfigured HTTP headers.

This tool therefore demonstrates how LLMs can play a role in reducing manual testing for web application developers, and can be used to partially automate the vulnerability discovery process.

“EscalateGPT appears to be a very promising tool. IAM policies often represent a tangled complex web of privilege assignments. Oversights during policy creation and maintenance often creep in, creating unintentional vulnerabilities that criminals exploit to their advantage. Past breaches against cloud-based data and applications proves this point over and over again,” said Avivah Litan, VP analyst at Gartner in an email to VentureBeat.

EscalateGPT: Identify IAM policy issues with AI

In an attempt to identify IAM policy misconfigurations, Tenable’s research team developed EscalateGPT, a Python tool designed to identify privilege-escalation opportunities in Amazon Web Services IAM.

Essentially, EscalateGPT collects the IAM policies associated with individual users or groups and submits them to the OpenAI API to be processed, asking the LLM to identify potential privilege escalation opportunities and mitigations.

Once this is done, EscalateGPT shares an output detailing the path of privilege escalation and the Amazon Resource Name (ARN) of the policy that could be exploited, and recommends mitigation strategies to fix the vulnerabilities.

More broadly, this use case illustrates how LLMs like GPT-4 can be used to identify misconfigurations in cloud-based environments. For instance, the report notes GPT-4 successfully identified complex scenarios of privilege escalation based on non-trivial policies through multi-IAM accounts.

When taken together, these use cases highlight that LLMs and generative AI can act as a force multiplier for security teams to identify vulnerabilities and process code, but that their output still needs to be checked manually to ensure reliability.