Weights and Biases debuts LLMOps tools to support prompt engineers

The era of generative AI and large language models (LLMs) is spawning a new category of tooling known as LLMOps to support the needs of users.

San Francisco startup Weights and Biases this week announced a major update to its MLOps platform, geared toward LLMOps. With LLM-based operations, organizations and users are typically not building entirely new models; rather, they are often fine-tuning existing models and using prompts to generate the results they want. The need to support that use case is behind today's launch of the W&B Prompts feature on the Weights and Biases platform. The new feature includes capabilities to help users quickly build LLM-based applications from a series of chained prompts that lead to an optimized output.

"Our mission has always been to build the best tools for machine learning practitioners," Lukas Biewald, CEO and cofounder of Weights and Biases, said during a livestreamed user meetup from London. "We define machine learning practitioners broadly as anyone trying to make machine learning models work in the real world."

The path from machine learning to prompt engineering

Since 2017, W&B has been building out its MLOps platform and evolving it as the needs and types of users have changed.

Biewald noted the first thing the company built was a capability called experiments that was designed to help machine learning engineers do experiment tracking. That initial feature helped to track all the models an organization was building and understand how they progress or regress over time.

W&B has since expanded the platform to add parameter optimization for models, a reporting feature to help groups of developers collaborate, and a series of advanced features for artifact tracking, model workflow management and deployment.

Prompt engineering has been on the rise in recent months. The catalyst is organizations’ increasing reliance on LLMs from vendors such as OpenAI and Cohere, rather than trying to build entirely unique models of their own.

"Prompt engineering is the most popular way to use large language models right now. You don't fine-tune it, you don't build it yourself; you just take something off the shelf and then figure out how to make it useful," Biewald said.

Biewald noted that in the past it could take a data scientist or machine learning engineer significant time and effort to apply sentiment analysis to a dataset. In the era of LLMs, executing sentiment analysis is often as easy as just having the right prompt.
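To make that concrete, here is a minimal sketch of prompt-based sentiment analysis. It is an illustration only, not W&B's or any vendor's API: `build_prompt`, `classify_sentiment` and `stub_llm` are hypothetical names, and `stub_llm` is an offline stand-in for whatever hosted model a developer would actually call.

```python
from typing import Callable

LABELS = ("positive", "negative", "neutral")


def build_prompt(text: str) -> str:
    """Wrap the input text in a one-word classification instruction."""
    return (
        "Classify the sentiment of the following passage as "
        "positive, negative, or neutral. Answer with one word.\n\n"
        f"Text: {text}\nSentiment:"
    )


def classify_sentiment(text: str, llm: Callable[[str], str]) -> str:
    """Send the prompt to the model and normalize its reply to a known label."""
    reply = llm(build_prompt(text)).strip().lower()
    for label in LABELS:
        if reply.startswith(label):
            return label
    return "unknown"  # the reply did not match any expected label


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call (assumption: keyword rule)."""
    text = prompt.split("Text:", 1)[1]
    return " Positive." if "love" in text.lower() else "Negative"


if __name__ == "__main__":
    print(classify_sentiment("I love this product", stub_llm))
    print(classify_sentiment("It broke after a day", stub_llm))
```

In practice the only piece that changes is the callable passed as `llm`; the prompt itself carries the task definition that once required a trained classifier.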

"The market has just massively expanded and I think that every software developer — maybe every person now — can be a machine learning practitioner," he said. "Everyone can use machine learning models for real-world applications without needing a lot of training."

New tools for prompt engineering

The new W&B Prompts tools fit into the emerging LLMOps landscape by helping companies build accurate and effective prompts for complex tasks.

In a series of rapid-fire demos, Biewald showed what the new tools can do. First up was a set of debugging tools that help a prompt engineer track, trace and debug potential errors in a prompt chain, a set of prompts used together or in succession to get the ideal result.
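The idea of tracing a prompt chain can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the W&B Prompts implementation: `StepTrace`, `run_chain` and `fake_llm` are hypothetical names, and `fake_llm` stands in for a real model call so the example runs offline.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepTrace:
    """Record of one step in the chain: what was sent and what came back."""
    name: str
    prompt: str
    output: str = ""
    error: str = ""
    seconds: float = 0.0


def run_chain(
    steps: List[Tuple[str, str]],
    llm: Callable[[str], str],
    user_input: str,
) -> List[StepTrace]:
    """Run prompt templates in order, feeding each output into the next step.

    Stops at the first failing step so the trace shows where the chain broke.
    """
    traces: List[StepTrace] = []
    current = user_input
    for name, template in steps:
        trace = StepTrace(name=name, prompt=template.format(input=current))
        start = time.perf_counter()
        try:
            trace.output = llm(trace.prompt)
        except Exception as exc:  # capture the failure instead of losing it
            trace.error = repr(exc)
        trace.seconds = time.perf_counter() - start
        traces.append(trace)
        if trace.error:
            break
        current = trace.output
    return traces


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model: upper-cases, or fails on 'boom'."""
    if "boom" in prompt:
        raise ValueError("model refused")
    return prompt.upper()


if __name__ == "__main__":
    chain = [
        ("summarize", "Summarize: {input}"),
        ("classify", "Classify the sentiment of: {input}"),
    ]
    for t in run_chain(chain, fake_llm, "great product"):
        print(t.name, "|", t.prompt, "->", t.output or t.error)
```

Keeping one record per step, including the exact prompt, the output, any error and the elapsed time, is what lets an engineer see which link in the chain produced a bad result.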

LangChain, a framework for developing applications powered by language models, is also now integrated with W&B Prompts. For OpenAI-based LLMs, W&B offers integrated support to score prompts for effectiveness with the OpenAI Evals framework.

"We can look at how well different models are working, and hopefully know if models are improving or degrading as you change your prompts," Biewald said.
