Inside the race to build an ‘operating system’ for generative AI

Generative AI, the technology that can auto-generate anything from text, to images, to full application code, is reshaping the business world. It promises to unlock new sources of value and innovation, potentially adding up to $4.4 trillion a year to the global economy, according to a recent report by McKinsey.

But for many enterprises, the journey to harness generative AI is just beginning. They face daunting challenges in transforming their processes, systems and cultures to embrace this new paradigm. And they need to act fast, before their competitors gain an edge.

One of the biggest hurdles is how to orchestrate the complex interactions between generative AI applications and other enterprise assets. These applications, powered by large language models (LLMs), are capable not only of generating content and responses, but of making autonomous decisions that affect the entire organization. They need a new kind of infrastructure that can support their intelligence and autonomy.

Ashok Srivastava, chief data officer of Intuit, a company that has been using LLMs for years in the accounting and tax industries, told VentureBeat in an extensive interview that this infrastructure could be likened to an operating system for generative AI: "Think of a real operating system, like macOS or Windows," he said, referring to assistant, management and monitoring capabilities. Similarly, LLMs need a way to coordinate their actions and access the resources they need. "I think this is a revolutionary idea," Srivastava said.

The operating-system analogy helps to illustrate the magnitude of the change that generative AI is bringing to enterprises. It is not just about adding a new layer of software tools and frameworks on top of existing systems. It is also about giving the system the authority and agency to run its own process, for example deciding which LLM to use in real time to answer a user's question, and when to hand off the conversation to a human expert. In other words, an AI managing an AI, according to Intuit’s Srivastava. Finally, it's about allowing developers to leverage LLMs to rapidly build generative AI applications.

This is similar to the way operating systems revolutionized computing by abstracting away the low-level details and enabling users to perform complex tasks with ease. Enterprises need to do the same for generative AI app development. Microsoft CEO Satya Nadella recently compared this transition to the shift from steam engines to electric power. "You couldn’t just put the electric motor where the steam engine was and leave everything else the same, you had to rewire the entire factory," he told Wired.

What does it take to build an operating system for generative AI?

According to Intuit’s Srivastava, there are four main layers that enterprises need to consider.

First, there is the data layer, which ensures that the company has a unified and accessible data system. This includes having a knowledge base that contains all the relevant information about the company's domain, such as — for Intuit — tax code and accounting rules. It also includes having a data governance process that protects customer privacy and complies with regulations.

Second, there is the development layer, which provides a consistent and standardized way for employees to create and deploy generative AI applications. Intuit calls this GenStudio, a platform that offers templates, frameworks, models and libraries for LLM app development. It also includes tools for prompt design and testing of LLMs, as well as safeguards and governance rules to mitigate potential risks. The goal is to streamline and standardize the development process, and to enable faster and easier scaling.

Third, there is the runtime layer, which enables LLMs to learn and improve autonomously, to optimize their performance and cost, and to leverage enterprise data. This is the most exciting and innovative area, Srivastava said. Here new open frameworks like LangChain are leading the way. LangChain provides an interface where developers can pull in LLMs through APIs, and connect them with data sources and tools. It can chain multiple LLMs together, and specify when to use one model versus another.
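To make the idea of choosing "one model versus another" concrete, here is a minimal Python sketch of a runtime router. It is not Intuit's or LangChain's implementation. The model functions, the keyword list and the word-count threshold are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical stand-ins for real model API calls; the names are illustrative only.
def small_model(prompt: str) -> str:
    return f"[small-model answer to: {prompt}]"


def large_model(prompt: str) -> str:
    return f"[large-model answer to: {prompt}]"


def human_handoff(prompt: str) -> str:
    return "Handing off to a human expert."


@dataclass
class Route:
    name: str
    handler: Callable[[str], str]


def choose_route(prompt: str) -> Route:
    """Pick a handler using simple, assumed heuristics."""
    text = prompt.lower()
    # Assumed escalation rule: sensitive topics go to a human expert.
    if any(word in text for word in ("audit", "lawsuit", "penalty")):
        return Route("human_expert", human_handoff)
    # Assumed cost rule: short questions go to the cheaper model.
    if len(text.split()) <= 12:
        return Route("small_model", small_model)
    return Route("large_model", large_model)


def answer(prompt: str) -> str:
    """Route the prompt, then let the chosen handler produce the reply."""
    return choose_route(prompt).handler(prompt)
```

In a production system the routing decision would more likely come from a classifier or from another LLM than from keyword rules, which is the "AI managing an AI" idea Srivastava describes.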

Fourth, there is the user experience layer, which delivers value and satisfaction to the customers who interact with the generative AI applications. This includes designing user interfaces that are consistent, intuitive and engaging. It also includes monitoring user feedback and behavior, and adjusting the LLM outputs accordingly.

Intuit recently announced a platform that encompasses all these layers, called GenOS, making it one of the first companies to embrace a full-fledged gen OS for its business. The news got limited attention, partly because the platform is mostly internal to Intuit and not open to outside developers.

How are other companies competing in the generative AI space?

While enterprises like Intuit are building their own gen OS platforms internally, there is also a vibrant and dynamic ecosystem of open software frameworks and platforms that are advancing the state of the art of LLMs. These frameworks and platforms are enabling enterprise developers to create more intelligent and autonomous generative AI applications for various domains.

One key trend: Developers are piggy-backing on the hard work of a few companies that have built out so-called foundational LLMs. These developers are finding ways to affordably leverage and improve those models, which contain billions of parameters and have already been trained by other organizations on massive amounts of data, at significant expense. These models, such as OpenAI’s GPT-4 or Google’s PaLM 2, are called foundational LLMs because they provide a general-purpose foundation for generative AI. However, they also have some limitations and trade-offs, depending on the type and quality of data they are trained on, and the task they are designed for. For example, some models focus on text-to-text generation, while others focus on text-to-image generation. Some do better at summarization, while others are better at classification tasks.

Developers can access these foundational large language models through APIs and integrate them into their existing infrastructure. But they can also customize them for their specific needs and goals, by using techniques such as fine-tuning, domain adaptation and data augmentation. These techniques allow developers to optimize the LLMs' performance and accuracy for their target domain or task, by using additional data or parameters that are relevant to their context. For example, a developer who wants to create a generative AI application for accounting can fine-tune an LLM with accounting data and rules, to make it more knowledgeable and reliable in that domain.

Another way that developers are enhancing the intelligence and autonomy of LLMs is by using frameworks that allow them to query both structured and unstructured data sources, depending on the user’s input or context. For example, if a user asks for specific company accounting data for the month of June, the framework can direct the LLM to query an internal SQL database or API, and generate a response based on the data.
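The structured-data path can be sketched in a few lines of Python using the standard library's sqlite3 module. This is an illustration under stated assumptions, not any vendor's framework: the `ledger` table, its columns and its figures are invented for the example, and the final call to an LLM is left out.

```python
import sqlite3
from typing import Dict


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up ledger rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ledger (month TEXT, category TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO ledger VALUES (?, ?, ?)",
        [
            ("2023-06", "revenue", 1200.0),
            ("2023-06", "expenses", 450.0),
            ("2023-05", "revenue", 900.0),
        ],
    )
    return conn


def monthly_totals(conn: sqlite3.Connection, month: str) -> Dict[str, float]:
    """Fetch per-category totals with a parameterized query."""
    rows = conn.execute(
        "SELECT category, SUM(amount) FROM ledger WHERE month = ? GROUP BY category",
        (month,),
    ).fetchall()
    return dict(rows)


def build_prompt(question: str, facts: Dict[str, float]) -> str:
    """Ground the model's answer in the retrieved figures."""
    lines = [f"{name}: {value:.2f}" for name, value in sorted(facts.items())]
    return (
        "Answer using only these figures:\n"
        + "\n".join(lines)
        + f"\nQuestion: {question}"
    )
```

The design point is that the numbers come from the database rather than from the model's memory. The LLM is only asked to phrase a response around facts the framework has already retrieved.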

Unstructured data sources, such as text or images, require a different approach. Developers use embeddings, which are numerical vectors that capture the semantic relationships between data points, to convert unstructured data into a format that can be searched efficiently and passed to LLMs. Embeddings are stored in vector databases, which are one of the hottest areas of investment right now. One company, Pinecone, has raised over $100 million in funding at a valuation of at least $750 million, thanks to its compatibility with data lakehouse technologies like Databricks.
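For readers who want to see what a similarity search over embeddings looks like, here is a toy Python sketch. It does not use Pinecone or any real embedding model. The `toy_embed` function is a bag-of-words stand-in for a learned embedding, and `TinyVectorStore` is a hypothetical in-memory substitute for a vector database.

```python
import math
from collections import Counter
from typing import List, Tuple


def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store; real vector databases add indexing and scale."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._items.append((text, toy_embed(text)))

    def search(self, query: str, k: int = 1) -> List[str]:
        """Return the k stored texts most similar to the query."""
        q = toy_embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

A real embedding model would place "invoice" and "bill" near each other even though the words differ, which word counts cannot do. The retrieval step, ranking stored vectors by similarity to the query vector, is the same idea.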

Tim Tully, former CTO of data monitoring company Splunk, who is now an investor at Menlo Ventures, invested in Pinecone after seeing the enterprise surge toward the technology. "That’s why you have 100 companies popping up trying to do vector embeddings," he told VentureBeat. "That’s the way the world is headed," he said. Other companies in this space include Zilliz, Weaviate and Chroma.

Image: The New Language Model Stack, courtesy of Michelle Fradin and Lauren Reeder of Sequoia Capital

What are the next steps toward enterprise LLM intelligence?

To be sure, the big-model leaders, like OpenAI and Google, are working on loading intelligence into their models from the get-go, so that enterprise developers can rely on their APIs, and avoid having to build proprietary LLMs themselves. Google’s Bard chatbot, based on Google’s PaLM 2 LLM, has introduced a feature called implicit code execution, for example. It detects prompts that call for computation, such as a complex math problem, then generates and runs code behind the scenes to produce the answer.

OpenAI, meanwhile, introduced function calling and plugins, which are similar in that they can turn natural language into API calls or database queries. If a user asks a chatbot about stock performance, for example, the bot can return accurate stock information from the relevant databases.
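The pattern behind function calling can be sketched without any vendor SDK. In the Python example below, `fake_model` is a hypothetical stand-in for an LLM that returns a structured call instead of prose. The JSON shape, the tool registry and the prices are assumptions made up for illustration, not OpenAI's actual API or real market data.

```python
import json
from typing import Any

# Placeholder values, not real market data; a real app would query a data source.
PRICES = {"INTU": 100.0, "MSFT": 200.0}


def get_stock_price(ticker: str) -> float:
    """Hypothetical tool the model is allowed to call."""
    return PRICES[ticker.upper()]


# Registry mapping tool names to callables the application trusts.
TOOLS = {"get_stock_price": get_stock_price}


def fake_model(user_message: str) -> str:
    """Stand-in for an LLM that emits a structured call as a JSON string."""
    ticker = "INTU" if "intuit" in user_message.lower() else "MSFT"
    return json.dumps({"name": "get_stock_price", "arguments": {"ticker": ticker}})


def run_tool_call(raw_call: str) -> Any:
    """Parse the model's structured output and dispatch to a registered tool."""
    call = json.loads(raw_call)
    func = TOOLS.get(call["name"])
    if func is None:
        raise ValueError(f"Unknown tool: {call['name']}")
    return func(**call["arguments"])
```

The application, not the model, executes the function. The model only proposes which registered tool to call and with what arguments, which is why an unknown tool name is rejected.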

Still, these models can only be so all-encompassing, and since they’re closed they can’t be fine-tuned for specific enterprise purposes. Enterprise companies like Intuit have the resources to fine-tune existing foundational models, or even build their own models, specialized around tasks where Intuit has a competitive edge — for example with its extensive accounting data or tax code knowledge base.

Intuit and other leading developers are now moving to new ground, experimenting with self-guided, automated LLM “agents” that are even smarter. These agents use what is called the context window within LLMs to remember where they are in fulfilling tasks, essentially using their own scratchpad and reflecting after each step. For example, if a user wants a plan to close the monthly accounting books by a certain date, the automated agent can list out the discrete tasks needed to do this, and then work through those individual tasks without asking for help. One popular open-source automated agent, AutoGPT, rocketed to more than 140,000 stars on GitHub. Intuit, meanwhile, has built its own agent, GenOrchestrator. It supports hundreds of plugins and meets Intuit’s accuracy requirements.
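The plan-then-execute loop with a scratchpad can be illustrated in a short Python sketch. This is not how AutoGPT or GenOrchestrator is built. The `plan` and `execute` functions are hypothetical placeholders for LLM and tool calls, and the task list is invented for the example.

```python
from typing import List


def plan(goal: str) -> List[str]:
    """Hypothetical planner; a real agent would ask an LLM to produce this list."""
    return [
        "reconcile bank accounts",
        "post accrual entries",
        "review trial balance",
    ]


def execute(step: str) -> str:
    """Hypothetical executor; a real agent would call tools or an LLM here."""
    return f"done: {step}"


def run_agent(goal: str, max_steps: int = 10) -> List[str]:
    """Work through the planned steps, recording each outcome on a scratchpad."""
    scratchpad: List[str] = [f"goal: {goal}"]
    for step in plan(goal)[:max_steps]:
        result = execute(step)
        # Reflection is reduced here to recording the outcome. In a real agent
        # the scratchpad would be fed back into the model's context window.
        scratchpad.append(result)
    return scratchpad
```

The `max_steps` cap is a simple guard against runaway loops, a common concern with autonomous agents.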

Image: Another depiction of the LLM app stack, courtesy of Matt Bornstein and Rajko Radovanovic of a16z

The future of generative AI is here

The race to build an operating system for generative AI is not just a technical challenge, but a strategic one. Enterprises that can master this new paradigm will gain a significant advantage over their rivals, and will be able to deliver more value and innovation to their customers. They arguably will also be able to attract and retain the best talent, as developers will flock to work on the most cutting-edge and impactful generative AI applications.

Intuit is one of the pioneers and is now reaping the benefits of its foresight and vision, as it is able to create and deploy generative AI applications at scale and with speed. Last year, even before it brought some of these OS pieces together, Intuit says it saved a million hours in customer call time using LLMs.

Most other companies will be a lot slower, because they’re only now putting the first layer — the data layer — in place. The challenge of putting the next layers in place will be at the center of VB Transform, a networking event on July 11 and 12 in San Francisco. The event focuses on the enterprise generative AI agenda, and presents a unique opportunity for enterprise tech executives to learn from each other and from the industry experts, innovators and leaders who are shaping the future of business and technology.

Intuit’s Srivastava has been invited to discuss the burgeoning GenOS and its trajectory. Other speakers and attendees include executives from McDonald's, Walmart, Citi, Mastercard, Hyatt, Kaiser Permanente, Capital One, Verizon and more. Representatives from large vendors will be present too, including Amazon’s Matt Wood, VP of product, Google’s Gerrit Kazmaier, VP and GM, data and analytics, and Naveen Rao, CEO of MosaicML, which helps enterprise companies build their own LLMs and just got acquired by Databricks for $1.3 billion. The conference will also showcase emerging companies and their products, with investors like Sequoia's Lauren Reeder and Menlo's Tim Tully providing feedback.

I’m excited about the event because it’s one of the first independent conferences to focus on the enterprise case of generative AI. We look forward to the conversation.
