Achieving reliable generative AI

The term “generative AI” has been all the buzz recently. Generative AI comes in several flavors, but common to all of them is the idea that the computer can automatically generate a lot of clever, useful content based on relatively little input from the user. If not something for nothing, at least a lot for a little.

The initial wave of excitement was fueled by visual generative AI systems, such as DALL·E 2 and Stable Diffusion, in which the machine generates novel images from brief textual descriptions. Want an image of “a donkey on the moon reading Tolstoy”? Voilà! In a few seconds, you get a never-before-seen image of this well-read, well-traveled donkey. And then there’s the compelling value exchange: you input a few words and, in return, get a picture that’s worth a thousand.

But this is misleading, since it reinforces the image of the computer doing all the work. If all you want is any aesthetically pleasing image of an erudite lunar donkey, chances are you’ll be satisfied with the system’s output; there are many such images, and the systems are good enough to produce one of them. But if you’re an artist, you have a more nuanced intent in mind. At best, you’d use the generative system as an interactive tool, experimenting with many prompts to generate candidate images, and you’d likely touch up the result yourself afterward.

This is even more striking in the case of textual generative AI, where, of course, ChatGPT has been all the rage. Here too, the promise is that the user jots down some key ideas and the system takes over and does most of the writing. And indeed, systems such as ChatGPT are impressive. They write poems, blog posts, emails, marketing copy, and the list goes on. They sometimes produce long-form text that’s surprisingly coherent, on message, and includes many correct and relevant facts not mentioned in the instructions.

Except when they don’t. And often enough, they won’t. In practice, textual generative AI deployed without proper controls generates as much wrong content as useful content. And “wrong” doesn’t mean “slightly off.” It means downright nonsensical. The internet is replete with examples of such ChatGPT behavior: it will explain why 1000 is greater than 1062; say it doesn’t know whether Lincoln and his assassin were on the same continent at the time of the assassination; explain at length that the University of Alabama prohibited admitting black students in 1973 while Emory University never discriminated (both wrong); and rank GPUs, CPUs, DNA computing and the abacus in increasing order of power for deep learning. All in fluent, convincing prose.

This is not a shortcoming specific to ChatGPT; it is endemic to all current textual generative systems. Only a month ago, Meta unveiled Galactica, which it billed as able to generate insightful scientific content. The public demo was taken down after two days, when it became apparent that the system was producing as much pseudo-science as credible scientific content.

The brittleness of textual generative AI was recognized early on. When GPT-2 was introduced in 2019, columnist Tiernan Ray wrote, “[GPT-2 displays] flashes of brilliance mixed with [...] gibberish.” And when GPT-3 was released a year later, my colleague Andrew Ng wrote, “Sometimes GPT-3 writes like a passable essayist, [but] it seems a lot like some public figures who pontificate confidently on topics they know little about.”

This brittleness of current generative AI limits its impact in the real world. As a well-known publisher recently complained to me, the time his company saved by using a certain generative system was offset by the time it needed to spend fixing the nonsense it produced.

To fully realize its potential, generative AI, especially the textual kind, must become more reliable. Several technological developments hold promise in this regard. One of them is increasing the degree to which the output is firmly anchored in trusted sources. By “firmly anchored,” I don’t mean merely being trained on trusted sources (itself already a challenge for current systems), but also that important parts of the output can be reliably traced back to the sources on which they were based. Current so-called “retrieval-augmented language models,” which access trusted sources to help guide the output of the neural network, point in a promising direction.
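To make “traced back to the sources” concrete, here is a minimal sketch, not how any production retrieval-augmented model works. It assumes a tiny in-memory corpus of trusted passages with hypothetical source IDs, uses crude word overlap as a stand-in for a learned retriever, and, instead of generating new text with a neural network, returns the retrieved evidence with each line tagged by its source. The names `Passage`, `retrieve`, and `grounded_answer` are illustrative only.

```python
import re
from dataclasses import dataclass


@dataclass
class Passage:
    source_id: str  # hypothetical identifier of a trusted source
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for a real retriever's encoder."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """Rank passages by word overlap with the query; keep the top k with any overlap."""
    q = tokenize(query)
    scored = [(len(q & tokenize(p.text)), p) for p in corpus]
    scored = [(s, p) for s, p in scored if s > 0]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:k]]


def grounded_answer(query: str, corpus: list[Passage], k: int = 2) -> str:
    """Return only retrieved evidence, each line tagged with its source,
    so every statement in the output can be traced back."""
    hits = retrieve(query, corpus, k)
    if not hits:
        return "No supporting source found."
    return "\n".join(f"{p.text} [{p.source_id}]" for p in hits)


# Hypothetical trusted corpus for illustration.
corpus = [
    Passage("encyclopedia:lincoln",
            "Abraham Lincoln was shot at Ford's Theatre in Washington, D.C., in April 1865."),
    Passage("encyclopedia:booth",
            "John Wilkes Booth, Lincoln's assassin, shot him at Ford's Theatre."),
    Passage("encyclopedia:abacus", "The abacus is a manual calculating tool."),
]

if __name__ == "__main__":
    print(grounded_answer("Where was Lincoln shot?", corpus))
```

The design choice worth noting is the refusal path: when nothing in the trusted corpus supports a query, the sketch says so rather than producing fluent but unsupported text.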

Another key element is increasing the degree to which the systems exhibit basic common sense and reasoning and avoid egregious mistakes. Long-form text tells a story, and the story must have internal logic, be factually correct, and have a point. Current systems don’t have these properties, at least not reliably. The statistical nature of the neural networks, which power the current systems, makes the systems capable of producing cogent passages some of the time, but they inevitably fall off the cliff when pushed beyond a certain limit. They make blatant factual or logical errors and can easily veer off-topic. There are several strands of work aimed at mitigating this. They include purely neural approaches, such as so-called “prompt decomposition” and “hierarchical generation.” Other approaches follow the so-called “neuro-symbolic” direction, which augments the neural machinery with explicit symbolic reasoning.

But I think the most important development is achieving what I call product-algo fit. The temptation to “get something for nothing” seduces people into giving generative systems too little guidance while demanding output that is too ambitious. Generative AI will never be perfect, and a good product manager understands the limitations of the underlying technology; she designs the product to compensate for them and, in particular, crafts the best division of labor between the user and the machine. Galactica, as mentioned earlier, is actually an interesting engineering artifact, but asking it to reliably produce scientific papers is simply too much. Generative AI needs more guidance: if you don’t know where you’re going, any road will get you there. If you don’t strongly care where you’re going (when any donkey on the moon or any generic birthday greeting to grandma will do), you’re on relatively safe ground. But if you’re writing a letter to your boss, your prized client, or your loved one, you want to get it just right, and for this, the system needs more guidance. That guidance can be given upfront, for example through an enriched set of prompts, but also interactively within the product itself.

The jury is still out on which combination of techniques will prove most useful, but I believe that the shortcomings of generative AI will be dramatically reduced. I also believe this will happen sooner rather than later because of the enormous economic benefits of reliable generative AI.

Does that mean the end of human writing? I don’t believe so. Certainly, some aspects of writing will be automated. Already today, we can’t live without spell-checking and grammar-correction software; copy editing has been automated. But we still write, and I don’t think that will change. What will change is that, as we write, we’ll have built-in research assistants and editors (in the sense of a book editor, not the software artifact). These functions, which have been a luxury enjoyed by the very few, will be democratized.

And that's a good thing.

Yoav Shoham is the co-founder and co-CEO of AI21 Labs.

Welcome to the VentureBeat community!

Our guest posting program is where technical experts share insights and provide neutral, non-vested deep dives on AI, data infrastructure, cybersecurity and other cutting-edge technologies shaping the future of enterprise.
