OpenAI's leadership drama underscores why its GPT model security needs fixing

The leadership drama unfolding at OpenAI underscores how important it is to have security built into the company’s GPT model creation process.

The drastic action by the OpenAI board Friday to fire CEO Sam Altman led to the reported possible departure of senior architects responsible for AI security, which heightens concerns by potential enterprise users of GPT models about their risks.

Security must be built into the creation process of AI models for them to scale and outlast any leader and their team, but that hasn’t happened yet.

Indeed, the OpenAI board fired CEO Sam Altman Friday, apparently partly for moving too fast on the product and business side, and neglecting the company’s mandate for ensuring safety and security in the company’s models.

This is a part of the new wild west of AI: Tension and conflict is created when boards with independent directors want greater control over safety and need, and need to balance the commerce about risks with pressures to grow.

So if co-founder Ilya Sutskever and the independent board members supporting him in the leadership change Friday manage to hang on – in the face of significant blowback over the weekend from investors and other supporters of Altman – here are some of security issues that researchers and others have found that underscore how security needs to be injected much earlier in the GPT software development lifecycle.

Data privacy and leakage security

Brian Roemmele, editor of the award-winning expert prompt engineer, wrote Saturday about a security hole he discovered in GPTs made by OpenAI. The vulnerability enables ChatGPT to download or display the prompt information and the uploaded files of a given session. He advises what should be added to GPT prompts to alleviate the risk in the session below:

A related problem was observed in March, when Open AI admitted to, and then patched, a bug in an open-source library that allowed users to see titles from another active user's chat history. It was also possible that the first message of a newly-created conversation was visible in someone else's chat history if both users were active around the same time. OpenAI said the vulnerability was in the Redis memory database, which the company uses to store user information. "The bug also unintentionally provided visibility of payment-related information of 1.2% of active ChatGPT Plus subscribers during a specific nine-hour window," OpenAI said.

Data manipulation and misuse cases are increasing

Despite claims of guardrails for GPT sessions, attackers are fine-tuning their tradecraft in prompt engineering to overcome them. One is creating hypothetical situations and asking GTP models for guidance on how to solve the problem or using languages. Brown University researchers found that "using less common languages like Zulu and Gaelic, they could bypass various restrictions. The researchers claim they had a 79% success rate running typically restricted prompts in those non-English tongues versus a less than 1% success rate using English alone." The team observed that "we find that simply translating unsafe inputs to low-resource natural languages using Google Translate is sufficient to bypass safeguards and elicit harmful responses from GPT-4."OpenAI's leadership drama underscores why its GPT model security needs fixing

Growing vulnerability to jailbreaks is common

Microsoft researchers evaluated the trustworthiness of GPT models in their research paper, DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models, and found that GPT models "can be easily misled to generate toxic and biased outputs and leak private information in both training data and conversation history. We also find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, which are maliciously designed to bypass the security measures of LLMs, potentially because GPT-4 follows (misleading) instructions more precisely," the researchers concluded.

Researchers found that through carefully scripted dialogues, they could successfully steal internal system prompts of GPT-4V and mislead its answering logic. The finding shows potential exploitable security risks with multimodal large language models (MLLMs). Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts published this month show MLLMs' vulnerability to deception and fraudulent activity. The researchers deployed GPT-4 as a red teaming tool against itself, looking to search for potential jailbreak prompts leveraging stolen system prompts. To strengthen the attacks, the researchers included human modifications, which led to an attack success rate of 98.7%. The following GPT-4V session illustrates the researchers' findings.

GPT-4V is vulnerable to multimodal prompt injection image attacks

OpenAI's GPT-4V release supports image uploads, making the company's large language models (LLMs) vulnerable to multimodal injection image attacks. By embedding commands, malicious scripts, and code in images, bad actors can get the LLMs to comply and execute tasks. LLMs don't yet have a data sanitization step in their processing workflow, which leads to every image being trusted. GPT-4V is a primary attack vector for prompt injection attacks and LLMs are fundamentally gullible, programmer Simon Willison writes in a blog post. "(LLMs) only source of information is their training data combined with the information you feed them. If you feed them a prompt that includes malicious instructions—however those instructions are presented—they will follow those instructions," he writes. Willison has also shown how prompt injection can hijack autonomous AI agents like Auto-GPT. He explained how a simple visual prompt injection could start with commands embedded in a single image, followed by an example of a visual prompt injection exfiltration attack.

GPT needs to achieve continuous security

Teams developing the next-generation GPT models are already under enough pressure to get code releases out, achieve aggressive timelines for new features, and respond to bug fixes. Security must be automated and designed from the first phases of new app and code development. It needs to be integral to how a product comes together.

The goal needs to be improving code deployment rates while reducing security risks and improving code quality. Making security a core part of the software development lifecycle (SDLC), along with core metrics and workflows tailored to the unique challenges of iterating GPT, LLM, and MLLM code, needs to happen. Undoubtedly, the GPT devops leaders have years of experience in these areas from previous roles. What makes it so hard in the world of GPT development is that the concepts of software quality assurance and reliability are so new and being defined simultaneously.

High-performing devops teams deploy code 208 times more frequently than low performers. Creating the foundation for devops teams to achieve that needs to start by including security from the initial design phases of any new project. Security must be defined in the initial product specs and across every devops cycle. The goal is to iteratively improve security as a core part of any software product.

By integrating security into the SDLC devops, leaders gain valuable time that would have been spent on stage gate reviews and follow-on meetings. The goal is to get devops and security teams continually collaborating by breaking down the system and process roadblocks that hold each team back.

The greater the collaboration, the greater the shared ownership of deployment rates, improvements in software quality, and security metrics — core measures of each team's performance.

Additional reading:

Ekwere, Paul. Multimodal LLM Security, GPT-4V(ision), and LLM Prompt Injection Attacks. GoPenAI, Medium. Published October 17, 2023.

Liu, Y., Deng, G., Li, Y., Wang, K., Zhang, T., Liu, Y., Wang, H., Zheng, Y., & Liu, Y. (2023). Prompt Injection attack against LLM-integrated Applications. arXiv preprint arXiv:2306.05499. Link: https://arxiv.org/pdf/2306.05499.pdf

OpenAI GPT-4V(ision) system card white paper. Published September 23, 2023

Simon Willison's Weblog, Multimodal prompt injection image attacks against GPT-4V, October 14, 2023.