How ChatGPT and other advanced AI tools are helping secure the software supply chain

The software supply chain is the infrastructure of the modern world — so the importance of securing it cannot be overstated.

Securing it is complicated, however, by how widespread and disparate it is: a patchwork of open-source code and tools. In fact, an estimated 97% of applications contain open-source code.

But, experts say, rapidly evolving AI tools such as ChatGPT and other tools built on large language models (LLMs) are a boon to software supply chain security, from vulnerability detection and management to vulnerability patching and real-time intelligence gathering.

"These new technologies offer exciting possibilities for improving software security," said Mikaela Pisani-Leal, ML lead at product development company Rootstrap, "and are sure to become an increasingly important tool for developers and security professionals."

Identifying vulnerabilities not otherwise seen

For starters, experts say, AI can be used to more quickly and accurately identify vulnerabilities in open-source code.

One example is DroidGPT from open-source developer tool platform Endor Labs. The tool is overlaid with risk scores revealing the quality, popularity, trustworthiness and security of each software package, according to the company. Developers can query it conversationally about the packages they are considering. For example:

“What are the best logging packages for Java?”
“What packages in Go have a similar function as log4j?”
“What packages are similar to go-memdb?”
“Which Go packages have the least known vulnerabilities?”

Generally speaking, AI tools like these can scan code for vulnerabilities at scale and can learn to identify new vulnerabilities as they emerge, explained Marshall Jung, lead solutions architect at AI code and development platform company Tabnine. This is, of course, with some help from human supervisors, he emphasized.

One example of this is an autoencoder, an unsupervised learning technique that uses neural networks for representation learning, he said. Another is the one-class support vector machine (SVM), a variant of the SVM (a model family normally used for supervised classification and regression) that is trained on examples of a single class and flags inputs that deviate from it.

Such automated code analysis lets developers check code for potential vulnerabilities quickly and accurately, and it can suggest improvements and fixes, said Pisani-Leal. This automated process is particularly useful in identifying common security issues like buffer overflows, injection attacks and other flaws that could be exploited by cybercriminals, she said.
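To make the idea of automated analysis concrete, here is a deliberately naive sketch in Java. It is not how any tool named in this article works. It is a single hand-written heuristic, whereas the AI tools described here learn such patterns from data. The class name NaiveSqlConcatScanner and the regular expression are assumptions made for illustration. The heuristic flags any line where a string literal beginning with a SQL verb is later followed by a `+` and an identifier, a pattern that often indicates user input being concatenated into a query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class NaiveSqlConcatScanner {
    // Hypothetical heuristic: a string literal that starts with a SQL verb,
    // followed later on the same line by '+' and an identifier.
    // It will miss many real flaws and can flag harmless constants.
    private static final Pattern SQL_CONCAT = Pattern.compile(
        "\"\\s*(?i:select|insert|update|delete)\\b[^\"]*\".*\\+\\s*[A-Za-z_]\\w*");

    /** Returns the 1-based numbers of the lines that match the heuristic. */
    public static List<Integer> flagLines(List<String> sourceLines) {
        List<Integer> flagged = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            if (SQL_CONCAT.matcher(sourceLines.get(i)).find()) {
                flagged.add(i + 1);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "String pw = request.getParameter(\"pw\");",
            "String query = \"SELECT * from users where password = '\" + pw + \"'\";",
            "String safe = \"SELECT * from users where password = ?\";");
        System.out.println(flagLines(lines)); // prints [2]
    }
}
```

Only the second line is flagged: the first contains no SQL literal, and the third uses a placeholder rather than concatenation.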

Similarly, automation can help speed up the testing process by allowing integration and end-to-end tests to run continuously and quickly identify issues in production. Also, by automating compliance monitoring (such as for GDPR and HIPAA), organizations can identify issues early on and avoid costly fines and reputational damage, she said.

“By automating testing, developers can be confident that their code is secure and robust before it is deployed,” said Pisani-Leal.

Patch vulnerabilities, real-time intelligence

Furthermore, AI can be used to patch vulnerabilities in open-source code, said Jung. It can automate the process of identifying and applying patches, using neural networks for natural language processing (NLP) pattern matching or k-nearest neighbors (KNN) search on code embeddings, which can save time and resources.
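As a rough illustration of the KNN idea, the Java sketch below ranks previously fixed issues by how similar their embeddings are to the embedding of a new piece of code. Everything specific here is assumed for the example: the class NearestFixLookup, the KnownFix type, the fix notes and the three-dimensional vectors are hypothetical, and a real system would obtain much larger embeddings from a trained model. Cosine similarity is one common choice of distance measure, not necessarily the one Jung had in mind.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NearestFixLookup {
    /** A previously fixed issue: a note on the fix plus a (hypothetical) embedding of the flawed code. */
    public static final class KnownFix {
        public final String fixNote;
        public final double[] embedding;

        public KnownFix(String fixNote, double[] embedding) {
            this.fixNote = fixNote;
            this.embedding = embedding;
        }
    }

    /** Cosine similarity of two equal-length, non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns up to k known fixes, ordered from most to least similar to the query embedding. */
    public static List<KnownFix> nearest(double[] query, List<KnownFix> known, int k) {
        List<KnownFix> sorted = new ArrayList<>(known);
        sorted.sort(Comparator.comparingDouble((KnownFix f) -> cosine(query, f.embedding)).reversed());
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    public static void main(String[] args) {
        // Toy vectors standing in for real code embeddings.
        List<KnownFix> known = List.of(
            new KnownFix("Use a parameterized query", new double[] {1.0, 0.0, 0.0}),
            new KnownFix("Add a bounds check", new double[] {0.0, 1.0, 0.0}),
            new KnownFix("Escape HTML output", new double[] {0.0, 0.0, 1.0}));
        double[] newCode = {0.9, 0.1, 0.0};
        System.out.println(nearest(newCode, known, 1).get(0).fixNote); // prints Use a parameterized query
    }
}
```

The lookup only suggests candidate fixes. Deciding whether a suggested patch actually applies would still need tests and human review.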

Perhaps most importantly, AI can be used to educate developers about security best practices, he said. This can help developers write more secure code and identify and mitigate vulnerabilities.

“I believe this is where LLM technologies really shine,” said Jung.

When trained on secure and reviewed repositories, LLM-based tools can recommend best practices to developers in real time, removing the need to catch and fix vulnerabilities later in an automated pull/merge request (PR/MR).

“An ounce of prevention is worth a pound of bug fixes, as they say,” said Jung.

Putting GPT to the security test

The advent of LLMs, including GPT-4 and ChatGPT, empowers developers to test the security of open-source projects and to get high-quality results very quickly, said Jason Kent, hacker in residence at API security platform Cequence Security.

It makes sense for the automation to occur on the user end (rather than in a top-down fashion), he said. An LLM can be brought into an open-source project, where it can process the code, suggest changes and deploy them automatically internally; a system can then consume the ChatGPT output and integrate it into the project.

“It could be a nice workflow that would create a much better project in the long run,” Kent said.

As part of this process, developers can continue to ask ChatGPT if code or libraries are secure.

Kent put this to the test, asking ChatGPT to analyze some code and identify any flaws and how to fix them:

"Do you see anything wrong with this?"

String pw = "123456"; // this would come from the user
String query = "SELECT * from users where name = 'USER' " + "and password = '" + pw + "'"

ChatGPT replied, “Yes, there are potential security issues with this code snippet.”

The model explained that the code was concatenating (linking) a user-supplied string pw directly into the SQL query without any input validation or sanitization.

“This makes the code vulnerable to SQL injection attacks,” the model said, “where an attacker can manipulate the user input to execute malicious SQL code and potentially compromise the database.”

A better approach, according to ChatGPT, would be to use prepared statements and parameterized queries to safely pass user inputs to the SQL query. Java, for instance, provides the PreparedStatement interface for creating parameterized queries. (ChatGPT then provided an example.)
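ChatGPT's own example is not part of this article. The following is a minimal Java sketch of the same idea, written for illustration. The class LoginQueries, its method names and the users table layout are assumptions. The first method mirrors the concatenation Kent tested, so the effect of a crafted input is visible. The second binds the values through the standard java.sql.PreparedStatement API, so they are never parsed as SQL.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class LoginQueries {
    /** Vulnerable: builds the query by concatenation, as in the snippet Kent tested. */
    public static String unsafeQuery(String pw) {
        return "SELECT * from users where name = 'USER' and password = '" + pw + "'";
    }

    /**
     * Safer: the SQL text is fixed and the inputs are bound as parameters.
     * Assumes a hypothetical users(name, password) table. A real system would
     * compare password hashes rather than plaintext passwords.
     */
    public static boolean credentialsMatch(Connection conn, String user, String pw)
            throws SQLException {
        String sql = "SELECT 1 FROM users WHERE name = ? AND password = ?";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, user);
            stmt.setString(2, pw);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next(); // true if at least one row matched
            }
        }
    }

    public static void main(String[] args) {
        // A crafted "password" turns the WHERE clause into one that is always true.
        System.out.println(unsafeQuery("' OR '1'='1"));
        // prints SELECT * from users where name = 'USER' and password = '' OR '1'='1'
    }
}
```

With the parameterized version, the same crafted input would simply be compared against the password column as a literal string.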

“Don’t let me oversell this, it isn’t perfect,” said Kent. “It has learned from humans after all. But, what if we could take an open-source project and cleave off 80% of its vulnerabilities?”
