How to create generative AI confidence for enterprise success

During her 2023 TED Talk, computer scientist Yejin Choi made a seemingly contradictory statement when she said, “AI today is unbelievably intelligent and then shockingly stupid." How could something intelligent be stupid?

On its own, AI — including generative AI — isn't built to deliver accurate, context-specific information oriented to a particular task. In fact, measuring a model in this way is a fool's errand. Think of these models as being geared toward relevancy based on what it has experienced and then generating responses on these probable theories.

That's why, while generative AI continues to dazzle us with creativity, it often falls short when it comes to B2B requirements. Sure, it's clever to have ChatGPT spin out social media copy as a rap, but if not kept on a short leash, generative AI can hallucinate. This is when the model produces false information masquerading as the truth. No matter what industry a company is in, these dramatic flaws are definitely not good for business.

The key to enterprise-ready generative AI is in rigorously structuring data so that it provides proper context, which can then be leveraged to train highly refined large language models (LLMs). A well-choreographed balance between polished LLMs, actionable automation and select human checkpoints forms strong anti-hallucination frameworks that allow generative AI to deliver correct results that create real B2B enterprise value.

For any business that wants to take advantage of generative AI's unlimited potential, here are three vital frameworks to incorporate into your technology stack.

Build strong anti-hallucination frameworks

Got It AI, a company that can identify generative falsehoods, ran a test and determined that ChatGPT’s LLM produced incorrect responses roughly 20% of the time. That high failure rate doesn’t serve a business’s goals. So, to solve this issue and keep generative AI from hallucinating, you can't let it work in a vacuum. It's essential that the system is trained on high-quality data to derive outputs, and that it’s regularly monitored by humans. Over time, these feedback loops can help correct errors and improve model accuracy.

It’s imperative that generative AI's beautiful writing is plugged into a context-oriented, outcome-driven system. The initial phase of any company’s system is the blank slate that ingests information tailored to a company and its specific goals. The middle phase is the heart of a well-engineered system, which includes rigorous LLM fine-tuning. OpenAI describes fine-tuning models as “a powerful technique to create a new model that's specific to your use case.” This occurs by taking generative AI’s normal approach and training models on many more case-specific examples, thus achieving better results.

In this phase, companies have a choice between using a mix of hard-coded automation and fine-tuned LLMs. While choreography may be different from company to company, leveraging each technology to its strength ensures the most context-oriented outputs.

Then, after everything on the back end is set up, it’s time to let generative AI really shine in external-facing communication. Not only are answers rapidly created and highly accurate, they also provide a personal tone without suffering from empathy fatigue.

Orchestrate technology with human checkpoints

By orchestrating various technology levers, any company can provide the structured facts and context needed to let LLMs do what they do best. First, leaders must identify tasks that are computationally intense for humans but easy for automation — and vice versa. Then, factor in where AI is better than both. Essentially, don't use AI when a simpler solution, like automation or even human effort, will suffice.

In a conversation with OpenAI's CEO Sam Altman at Stripe Sessions in San Francisco, Stripe’s founder John Collison said that Stripe uses OpenAI’s GPT-4 “anywhere someone is doing manual work or working on a series of tasks.” Businesses should use automation to conduct grunt work, like aggregating information and combing through company-specific documents. They can also hard-code definitive, black-and-white mandates, like return policies.

Only after setting up this strong base is it generative AI-ready. Because the inputs are highly curated before generative AI touches the information, systems are set up to accurately tackle more complexity. Keeping humans in the loop is still crucial to verify model output accuracy, as well as provide model feedback and correct results if need be.

Measure outcomes via transparency

At present, LLMs are black boxes. Upon releasing GPT-4, OpenAI stated that “Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.” While there have been some strides toward making models less opaque, how the model functions is still somewhat of a mystery. Not only is it unclear what is under the hood, it's also ambiguous what the difference is between models — other than cost and how you interact with them — because the industry as a whole doesn't have standardized efficacy measurements.

There are now companies changing this and bringing clarity across generative AI models. These standardizing efficacy measurements have downstream enterprise benefits. Companies like Gentrace link data back to customer feedback so that anyone can see how well an LLM performed for generative AI outputs. Other companies like Paperplane.ai take it a step further by capturing generative AI data and linking it with user feedback so leaders can evaluate deployment quality, speed and cost over time.

Liz Tsai is founder and CEO of HiOperator.

Welcome to the VentureBeat community!

Our guest posting program is where technical experts share insights and provide neutral, non-vested deep dives on AI, data infrastructure, cybersecurity and other cutting-edge technologies shaping the future of enterprise.

Read more from our guest post program — and check out our guidelines if you’re interested in contributing an article of your own!

Build strong anti-hallucination frameworks

Orchestrate technology with human checkpoints

Measure outcomes via transparency

More