Mind your language: The risks of using AI-powered chatbots like ChatGPT in an organization

Millions of users have flocked to ChatGPT since its mainstream launch in November 2022. Thanks to its exceptional human-like language generation capabilities, its aptitude for coding software, and its lightning-fast text analysis, ChatGPT has quickly emerged as a go-to tool for developers, researchers and everyday users.

But as with any disruptive technology, generative AI systems like ChatGPT come with potential risks. In particular, major players in the tech industry, state intelligence agencies and other governmental bodies have all raised red flags about sensitive information being fed into AI systems like ChatGPT.

The concern stems from the possibility of such information eventually leaking into the public domain, whether through security breaches or the use of user-generated content to “train” chatbots.

In response to these concerns, tech organizations are taking action to mitigate the security risks associated with large language models (LLMs) and conversational AI (CAI). Several organizations have opted to prohibit the use of ChatGPT altogether, while others have cautioned their staff about the hazards of inputting confidential data into such models.

ChatGPT: A scary AI out in the open?

The AI-powered ChatGPT has become a popular tool for businesses looking to optimize their operations and simplify complex tasks. However, recent incidents have underscored the potential dangers of sharing confidential information through the platform.

In a disturbing development, three instances of sensitive data leakage via ChatGPT were reported in less than a month. The most recent occurred last week. Smartphone manufacturer Samsung was embroiled in controversy when Korean media reported that employees at its main semiconductor plants had entered confidential information, including highly sensitive source code used to resolve programming errors, into the AI chatbot.

Source code is one of any technology firm’s most closely guarded secrets, as it serves as the foundational building block for any software or operating system. Consequently, prized trade secrets have now inadvertently fallen into the possession of OpenAI, the formidable AI service provider that has taken the tech world by storm.

Samsung did not respond to VentureBeat's requests for comment, but sources close to the firm said the company has apparently curtailed its personnel's access to ChatGPT.

Other Fortune 500 conglomerates, including Amazon, Walmart and JPMorgan, encountered similar instances of employees accidentally pushing sensitive data into the chatbot.

Reports of Amazon employees using ChatGPT to access confidential customer information prompted the tech behemoth to swiftly restrict the use of the tool and sternly warn workers not to input any sensitive data into it.

Knowledge without wisdom

Mathieu Fortier, director of machine learning at AI-driven digital experience platform Coveo, said that LLMs such as GPT-4 and LLaMA suffer from several imperfections and warned that despite their prowess in language comprehension, these models lack the ability to discern accuracy, immutable laws, physical realities and other non-lingual aspects.

“While LLMs construct extensive intrinsic knowledge repositories through training data, they have no explicit concept of truth or factual accuracy. Additionally, they are susceptible to security breaches and data extraction attacks, and are prone to deviating from intended responses or exhibiting ‘unhinged personalities,’” Fortier told VentureBeat.

Fortier highlighted the high stakes involved for enterprises. The ramifications can severely erode customer trust and inflict irreparable harm to brand reputation, leading to major legal and financial woes.

Following in the footsteps of other tech giants, Walmart Global Tech, the technology division of the retail behemoth, has implemented measures to mitigate the risk of data breaches. In an internal memo to employees, the company directed staff to block ChatGPT after detecting suspicious activity that could potentially compromise the enterprise’s data and security.

A Walmart spokesperson stated that although the retailer is building its own chatbots on top of GPT-4's capabilities, it has implemented several measures to keep employee and customer data from being shared with generative AI tools such as ChatGPT.

“Most new technologies present new benefits as well as new risks. So it’s not uncommon for us to assess these new technologies and provide our associates with usage guidelines to protect our customers’, members’ and associates’ data,” the spokesperson told VentureBeat. “Leveraging available technology, like OpenAI, and building a layer on top that speaks retail more effectively enables us to develop new customer experiences and improve existing capabilities.”

Other firms, such as Verizon and Accenture, have also adopted steps to curtail the use of ChatGPT, with Verizon instructing its workers to restrict the chatbot to non-sensitive tasks, and Accenture implementing tighter controls to ensure compliance with data privacy regulations.

How ChatGPT uses conversational data

Compounding these concerns is the fact that ChatGPT retains user input data to train the model further, raising questions about the potential for sensitive information being exposed through data breaches or other security incidents.

OpenAI, the company behind the popular generative AI models ChatGPT and DALL-E, has recently implemented a new policy to improve user data privacy and security.

As of March 1 of this year, API users must explicitly opt in to sharing their data for training or improving OpenAI’s models.

In contrast, for non-API services, such as ChatGPT and DALL-E, users must opt out if they do not wish to have their data used by OpenAI.

“When you use our non-API consumer services ChatGPT or DALL-E, we may use the data you provide us to improve our models,” according to the OpenAI blog, recently updated. “Sharing your data with us not only helps our models become more accurate and better at solving your specific problem, it also helps improve their general capabilities and safety … You can request to opt-out of having your data used to improve our non-API services by filling out this form with your organization ID and email address associated with the owner of the account.”

This announcement comes amid concerns about the risks described above and the need for companies to be cautious when handling sensitive information. Italy's data protection authority recently joined the fray by temporarily banning ChatGPT in the country, citing concerns about data privacy and security.

OpenAI states that it removes any personally identifiable information from data used to improve its AI models, and only uses a small sample of data from each customer for this purpose.

Government warning

The U.K.’s Government Communications Headquarters (GCHQ) intelligence agency, through its National Cyber Security Centre (NCSC), has issued a cautionary note about the limitations and risks of LLMs like ChatGPT. While these models have been lauded for their impressive natural language processing capabilities, the NCSC warns that they are not infallible and may contain serious flaws.

According to the NCSC, LLMs can generate incorrect or “hallucinated” facts, as demonstrated during Google’s Bard chatbot’s first demo. They can also exhibit biases and gullibility, particularly when responding to leading questions. Additionally, these models require significant computational resources and vast amounts of data to train from scratch, and they are vulnerable to injection attacks and toxic content creation.

“LLMs generate responses to prompts based on the intrinsic similarity of that prompt to their internal knowledge, which memorized patterns seen in training data,” said Coveo’s Fortier. “However, given they have no intrinsic internal ‘hard rules’ or reasoning abilities, they can’t comply with 100% success to constraints that would command them not to disclose sensitive information.”

He added that despite efforts to reduce the generation of sensitive information, if the LLM is trained with such data, it can generate it back.

“The only solution is not to train these models with sensitive material,” he said. “Users should also refrain from providing them with sensitive information in the prompt, as most of the services in place today will keep that information in their logs.”
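One way to act on that advice is to scrub prompts before they leave the company network. The Python sketch below is illustrative only: the two regular expressions and the redact_prompt function are assumptions made for this example, not part of any vendor's tooling, and a real deployment would need far broader detection than a pair of patterns.

```python
import re
from typing import List, Tuple

# Illustrative patterns only (assumptions for this sketch); real data-loss
# prevention needs a much wider and better-tested set of detectors.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SECRET": re.compile(r"\b(?:sk|key|token)[-_][A-Za-z0-9]{16,}\b"),
}


def redact_prompt(prompt: str) -> Tuple[str, List[str]]:
    """Replace matches with placeholders and report which kinds were found."""
    found = []
    for label, pattern in PATTERNS.items():
        # subn returns the rewritten text plus the number of substitutions.
        prompt, count = pattern.subn(f"[{label} REDACTED]", prompt)
        if count:
            found.append(label)
    return prompt, found


if __name__ == "__main__":
    text = "Debug this: user alice@example.com fails with token_abcdef0123456789XYZ"
    clean, kinds = redact_prompt(text)
    print(clean)   # the text that would be sent to the chatbot
    print(kinds)   # the categories that were scrubbed
```

Because the function also returns the categories it scrubbed, a company could log which kinds of data employees try to share without storing the data itself.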

Best practices for safe and ethical use of generative AI

As companies continue to embrace AI and other emerging technologies, it will be crucial to ensure proper safeguards to protect sensitive data and prevent inadvertent disclosures of confidential information.

The actions taken by these companies highlight the importance of remaining vigilant when using AI language models such as ChatGPT. While these tools can greatly improve efficiency and productivity, they pose significant risks if not used appropriately.

“The best approach is to take every new development in the raw advancement of language models and fit it into an enterprise policy-driven architecture that surrounds a language model with pre-processors and post-processors for guard rails, fine-tune them for enterprise-specific data, and then maybe even go to on-prem deployment as well,” Peter Relan, chairman of conversational AI startup Got It AI, told VentureBeat. “Otherwise, raw language models are too powerful and sometimes harmful to deal with in the enterprise.”
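The architecture Relan describes, a language model surrounded by pre-processors and post-processors, can be sketched in a few lines of Python. Everything named here is hypothetical and chosen for illustration: the GuardedModel class, the "Project Falcon" codename, the two policy functions and the echo_model stand-in, which takes the place of a real LLM call. It is a minimal sketch of the pattern, not Got It AI's implementation.

```python
from typing import Callable, List

Processor = Callable[[str], str]


class GuardedModel:
    """Wraps a model callable with pre- and post-processing steps."""

    def __init__(self, model: Callable[[str], str],
                 pre: List[Processor], post: List[Processor]) -> None:
        self.model = model
        self.pre = pre
        self.post = post

    def ask(self, prompt: str) -> str:
        for step in self.pre:       # guard rails applied to the prompt
            prompt = step(prompt)
        answer = self.model(prompt)
        for step in self.post:      # guard rails applied to the response
            answer = step(answer)
        return answer


# Hypothetical policy: an internal codename that must never leave the company.
CODENAME = "Project Falcon"


def mask_codename(prompt: str) -> str:
    return prompt.replace(CODENAME, "[INTERNAL PROJECT]")


def withhold_if_confidential(answer: str) -> str:
    if "CONFIDENTIAL" in answer.upper():
        return "Response withheld by policy."
    return answer


def echo_model(prompt: str) -> str:
    """Stand-in for a real LLM call; simply echoes the prompt."""
    return f"You asked: {prompt}"


if __name__ == "__main__":
    guarded = GuardedModel(echo_model, [mask_codename], [withhold_if_confidential])
    print(guarded.ask("Summarize the Project Falcon roadmap"))
    print(guarded.ask("Share the confidential pricing sheet"))
```

Keeping the policy steps as plain functions in a list means an enterprise could add, remove or reorder guard rails without touching the model call itself, including swapping a hosted model for an on-prem one.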

For his part, Prasanna Arikala, CTO of Nvidia-backed conversational AI platform Kore.ai, says that moving forward, it will be essential for companies to limit LLMs’ access to sensitive and personal information to avoid breaches.

“Implementing strict access controls, such as multifactor authentication, and encrypting sensitive data can help to mitigate these risks. Regular security audits and vulnerability assessments can also be conducted to identify and address potential vulnerabilities,” Arikala told VentureBeat. “While LLMs are valuable tools if used correctly, it is crucial for companies to take the necessary precautions to protect sensitive data and maintain the trust of their customers and stakeholders.”

It remains to be seen how regulations governing generative AI will evolve, but businesses must remain vigilant and informed to stay ahead of the curve. With the potential benefits of generative AI come new responsibilities and challenges, and it is up to the tech industry to work alongside policymakers to ensure that the technology is developed and implemented responsibly and ethically.