Generative AI: A pragmatic blueprint for data security

The rapid rise of large language models (LLMs) and generative AI has presented new challenges for security teams everywhere. Because gen AI creates new ways for data to be accessed, it doesn’t fit traditional security paradigms, which focus on keeping data away from people who aren’t supposed to have it.

To enable organizations to move quickly on gen AI without introducing undue risk, security teams need to update their programs to account for the new types of risk and the pressure those risks put on existing programs.

Untrusted middlemen: A new source of shadow IT

An entire industry is currently being built and expanded on top of LLMs hosted by such services as OpenAI, Hugging Face and Anthropic. In addition, there are a number of open models available such as LLaMA from Meta and GPT-2 from OpenAI.

Access to these models could help employees in an organization solve business challenges. But for a variety of reasons, not everybody is in a position to access these models directly. Instead, employees often look for tools — such as browser extensions, SaaS productivity applications, Slack apps and paid APIs — that promise easy use of the models.

These intermediaries are quickly becoming a new source of shadow IT. Using a Chrome extension to write a better sales email doesn’t feel like using a vendor; it feels like a productivity hack. It’s not obvious to many employees that sharing sensitive data with such a third party introduces a leak, even if your organization is comfortable with the underlying models and providers themselves.

Training across security boundaries

Training or fine-tuning a model on data that spans security boundaries is a relatively new type of risk for most organizations. Three potential boundaries play into it:

Boundaries between users of a foundational model
Boundaries between customers of a company that is fine-tuning on top of a foundational model
Boundaries between users within an organization with different access rights to data used to fine-tune a model

In each of these cases, the issue is understanding what data is going into a model. Only the individuals with access to the training, or fine-tuning, data should have access to the resulting model.

As an example, let’s say that an organization uses a product that fine-tunes an LLM using the contents of its productivity suite. How would that tool ensure that I can’t use the model to retrieve information originally sourced from documents I don’t have permission to access? In addition, how would it update that mechanism after access I originally had is revoked?

These are tractable problems, but they require special consideration.
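One way to picture the rule that only people with access to the fine-tuning data should reach the resulting model is a check made at request time. The sketch below is a minimal illustration, not a description of any real product: `FineTunedModel`, `can_query` and the document IDs are all hypothetical names, and it assumes the tool records which documents went into each model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FineTunedModel:
    """Hypothetical record: a model plus the IDs of every document used to fine-tune it."""
    name: str
    training_doc_ids: frozenset[str]


def can_query(model: FineTunedModel, readable_doc_ids: set[str]) -> bool:
    """Allow access only if the user can currently read every training document."""
    return model.training_doc_ids <= readable_doc_ids


# Hypothetical example data.
model = FineTunedModel("sales-assistant", frozenset({"doc-1", "doc-2"}))
alice_docs = {"doc-1", "doc-2", "doc-3"}
bob_docs = {"doc-1"}

alice_allowed = can_query(model, alice_docs)  # Alice can read all training docs
bob_allowed = can_query(model, bob_docs)      # Bob is missing doc-2

# Revocation: because the check runs against current permissions on each
# request, removing Alice's access to doc-2 also removes her model access.
alice_docs.discard("doc-2")
alice_allowed_after_revocation = can_query(model, alice_docs)
```

Evaluating permissions on every request, rather than once when the model is built, is what lets the check follow revocations. It does not address what the model has already memorized, which is why the training data lineage matters.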

Privacy violations: Using AI and PII

While privacy considerations aren’t new, using gen AI with personal information can make these issues especially challenging.

In many jurisdictions, automated processing of personal information in order to analyze or predict certain aspects of that person is a regulated activity. Using AI tools can add nuance to these processes and make it more difficult to comply with requirements like offering opt-out.

Another consideration is how training or fine-tuning models on personal information might affect your ability to honor deletion requests, restrictions on repurposing of data, data residency and other challenging privacy and regulatory requirements.

Adapting security programs to AI risks

Vendor security, enterprise security and product security are particularly stretched by the new types of risk introduced by gen AI. Each of these programs needs to adapt to manage risk effectively going forward. Here’s how.

Vendor security: Treat AI tools like those from any other vendor

The starting point for vendor security when it comes to gen AI tools is to treat these tools like the tools you adopt from any other vendor. Ensure that they meet your usual requirements for security and privacy. Your goal is to ensure that they will be a trustworthy steward of your data.

Given the novelty of these tools, many of your vendors may be using them in ways that aren’t the most responsible. As such, you should add gen AI-specific considerations to your due diligence process.

You might consider adding questions to your standard questionnaire, for example:

Will data provided by our company be used to train or fine-tune machine learning (ML) models?
How will those models be hosted and deployed?
How will you ensure that models trained or fine-tuned with our data are only accessible to individuals who are both within our organization and have access to that data?
How do you approach the problem of hallucinations in gen AI models?

Your due diligence may take another form, and I expect many standard compliance frameworks, like SOC 2 and ISO 27001, will build relevant controls into future versions. Now is the right time to start considering these questions and ensuring that your vendors consider them too.

Enterprise security: Set the right expectations

Each organization has its own approach to the balance between friction and usability. Your organization may have already implemented strict controls around browser extensions and OAuth applications in your SaaS environment. Now is a great time to take another look at your approach to make sure it still strikes the right balance.

Untrusted intermediary applications often take the form of easy-to-install browser extensions or OAuth applications that connect to your existing SaaS applications. These are vectors that can be observed and controlled. The risk of employees using tools that send customer data to an unapproved third party is especially potent now that so many of these tools are offering impressive solutions using gen AI.
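Because these vectors can be observed, a simple audit can surface the riskiest ones. The following is a rough sketch under stated assumptions: it supposes you can export an inventory of OAuth grants from your SaaS environment, and the `OAuthGrant` type, the app names and the scope strings are all hypothetical placeholders rather than any real provider’s API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OAuthGrant:
    """Hypothetical inventory row: which user granted which app which scopes."""
    user: str
    app: str
    scopes: tuple[str, ...]


# Assumed policy inputs; real values would come from your own review process.
APPROVED_APPS = {"calendar-sync"}
SENSITIVE_SCOPES = {"mail.read", "drive.read"}


def flag_unapproved(grants: list[OAuthGrant]) -> list[OAuthGrant]:
    """Return grants to unapproved apps that hold at least one sensitive scope."""
    return [
        g
        for g in grants
        if g.app not in APPROVED_APPS and SENSITIVE_SCOPES & set(g.scopes)
    ]


# Hypothetical example inventory.
grants = [
    OAuthGrant("ana", "calendar-sync", ("calendar.read",)),
    OAuthGrant("ben", "ai-email-writer", ("mail.read",)),
    OAuthGrant("cy", "emoji-pack", ("profile.read",)),
]
flagged = flag_unapproved(grants)
for g in flagged:
    print(f"Review: {g.user} granted {g.app} scopes {g.scopes}")
```

Flagging for review rather than revoking automatically fits the advice to assume good intentions: the output is a starting point for a conversation with the employee.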

In addition to technical controls, it’s important to set expectations with your employees and assume good intentions. Ensure that your colleagues know what is appropriate and what is not when it comes to using these tools. Collaborate with your legal and privacy teams to develop a formal AI policy for employees.

Product security: Transparency builds trust

The biggest change to product security is ensuring that you aren’t becoming an untrusted middleman for your customers. Make it clear in your product how you use customer data with gen AI. Transparency is the first and most powerful tool in building trust.

Your product should also respect the same security boundaries your customers have come to expect. Don’t let individuals access models trained on data they can’t access directly. In the future, there may be more mainstream technologies for applying fine-grained authorization policies to model access, but we’re still very early in this sea change. Prompt engineering and prompt injection are fascinating new areas of offensive security, and you don’t want your use of these models to become a source of security breaches.

Give your customers options, allowing them to opt in or opt out of your gen AI features. This puts the tools in their hands to choose how they want their data to be used.
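In code, giving customers that choice can be as simple as checking a per-customer setting before any data reaches a model. This is a minimal sketch: `CustomerSettings`, `summarize` and the `llm` callable are hypothetical, and defaulting the flag to off is an assumption of this example, not something the article prescribes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CustomerSettings:
    """Hypothetical per-customer settings; gen AI is off unless enabled."""
    customer_id: str
    gen_ai_opt_in: bool = False


def summarize(settings: CustomerSettings, text: str, llm: Callable[[str], str]) -> str:
    """Send text to the model only for customers who opted in."""
    if not settings.gen_ai_opt_in:
        # Non-AI fallback: the customer's data never reaches the model.
        return text[:80]
    return llm(text)


# Stand-in for a real model call; it records what it was sent.
calls: list[str] = []


def fake_llm(text: str) -> str:
    calls.append(text)
    return "summary"


opted_out = CustomerSettings("acme")
opted_in = CustomerSettings("globex", gen_ai_opt_in=True)

result_out = summarize(opted_out, "Quarterly notes", fake_llm)
result_in = summarize(opted_in, "Quarterly notes", fake_llm)
```

Placing the check in front of the model call, rather than in the user interface alone, means an opted-out customer’s data is never sent even if a new feature forgets to hide its button.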

At the end of the day, it’s important that you don’t stand in the way of progress. If these tools will make your company more successful, then avoiding them due to fear, uncertainty and doubt may be more of a risk than diving headlong into the conversation.

Rob Picard is head of security at Vanta.
