WhyLabs launches LangKit to make large language models safe and responsible

WhyLabs, a Seattle-based startup that provides monitoring tools for AI and data applications, today announced the release of LangKit, an open-source technology that helps enterprises monitor and safeguard their large language models (LLMs). LangKit enables users to detect and prevent risks and issues in LLMs, such as toxic language, data leakage, hallucinations and jailbreaks.

WhyLabs cofounder and CEO Alessya Visnjic told VentureBeat in an exclusive interview ahead of today's launch that the product is designed to help enterprises monitor how their AI systems are functioning and catch problems before they affect customers or users.

“LangKit is a culmination of metrics that are critical to monitor for LLM models,” she said. “Essentially, what we have done is we’ve taken this wide range of popular metrics that our customers have been using to monitor LLMs, and we built them into LangKit.”

Meeting rapidly evolving LLM standards

LangKit is built on two core principles: open sourcing and extensibility. Visnjic believes that by leveraging the open-source community and creating a highly extensible platform, WhyLabs can keep pace with the evolving AI landscape and accommodate diverse customer needs, particularly in industries such as healthcare and fintech, which have higher safety standards.

Some of the metrics that LangKit provides are sentiment analysis, toxicity detection, topic extraction, text quality assessment, personally identifiable information (PII) detection and jailbreak detection. These metrics can help users validate and safeguard individual prompts and responses, evaluate the compliance of the LLM behavior with policy, monitor user interactions inside an LLM-powered application, and A/B test across different LLM and prompt versions.

Visnjic says LangKit is relatively easy to use and integrates with several popular platforms and frameworks, including OpenAI GPT-4, Hugging Face Transformers, AWS Boto3 and more. Users can get started with just a few lines of Python code and use the platform to track the metrics over time and set up alerts and guardrails. Users can also customize and extend LangKit with their own models and metrics to suit their specific use cases.

Early users have praised the solution's out-of-the-box metrics, ease of use and plug-and-play capabilities, according to Visnjic. These features have proved particularly valuable for stakeholders in regulated industries, as LangKit provides understandable insights into language models, enabling more accessible conversations about the technology.

An emerging market for AI monitoring

Visnjic said that LangKit was shaped by feedback from, and collaboration with, WhyLabs’ customers, who range from Fortune 100 companies to AI-first startups in various industries. She said that LangKit helps them gain visibility and control over their LLMs in production.

“With LangKit, what they're able to do is run ... very specialized LLM integration tests, where they specify a range of prompts like a golden set of prompts, that their model should be good at responding. And then they run this golden set of prompts every time they make small changes to either the model itself, or to some of the prompt engineering aspects,” Visnjic explained.
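The golden-set workflow Visnjic describes could be sketched roughly as follows. This is an illustration under stated assumptions, not LangKit's API: `GOLDEN_SET`, `run_golden_set` and `stub_model` are hypothetical names, the pass criterion is a simple substring check, and the stub stands in for a real LLM call.

```python
from typing import Callable

# Hypothetical golden set: each prompt is paired with a substring
# that an acceptable response is assumed to contain.
GOLDEN_SET = [
    ("What is 2 + 2?", "4"),
    ("Name the capital of France.", "Paris"),
]


def run_golden_set(model: Callable[[str], str], golden_set=GOLDEN_SET) -> list:
    """Run every golden prompt through the model and return the failing prompts."""
    failures = []
    for prompt, expected in golden_set:
        response = model(prompt)
        # Case-insensitive substring match as a deliberately simple check.
        if expected.lower() not in response.lower():
            failures.append(prompt)
    return failures


def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM call; returns canned answers."""
    canned = {
        "What is 2 + 2?": "The answer is 4.",
        "Name the capital of France.": "Paris.",
    }
    return canned.get(prompt, "")
```

Rerunning `run_golden_set` after each model or prompt change, and treating any non-empty result as a failed check, approximates the integration-test loop described in the quote.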

Early adopters of LangKit include Symbl.AI and Tryolabs, both of which have provided valuable feedback to help refine the product. Tryolabs, a company focused on helping enterprises adopt large language models, offers insights from a variety of use cases. Symbl.AI, on the other hand, is a prototypical customer using LangKit to monitor its LLM-powered application in production.

“In their [Symbl.AI's] case, they have an LLM-powered application, it's running in production, they have customers that are interacting with it. And they would like to have that transparency into how it's doing. How is it behaving over time? And they would like to have an ability to set up guardrails,” Visnjic said.

Model monitoring built for enterprises

LangKit is specifically designed to handle high-throughput, real-time, and automated systems that require a wide range of metrics and alerts to track LLM behavior and performance. Unlike the embedding-based approach that is commonly used for LLM monitoring and evaluation, LangKit uses a metrics-based approach that is more suitable for scalable and operational use cases.

“When you’re dealing with high-throughput systems in production you need to look at metrics,” said Visnjic. “You need to crunch down to what types of signals you would like to track or potentially have a really wide range of signals. Then you want these metrics to be extracted, you want some kind of baseline, and you want it to be monitored over time with as much automation as possible.”
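The pattern in that quote (extract a metric, establish a baseline, then monitor against it automatically) can be illustrated with a short sketch. It assumes a simple z-score rule and hypothetical names (`build_baseline`, `is_anomalous`); it is not how WhyLabs' platform computes alerts.

```python
from statistics import mean, stdev


def build_baseline(values):
    """Summarize historical metric values (needs at least two points)."""
    return {"mean": mean(values), "std": stdev(values)}


def is_anomalous(value, baseline, z_threshold=3.0):
    """Flag a value that lies more than z_threshold standard deviations from the baseline mean."""
    if baseline["std"] == 0:
        # With no historical variation, any change counts as a deviation.
        return value != baseline["mean"]
    z_score = abs(value - baseline["mean"]) / baseline["std"]
    return z_score > z_threshold
```

For example, a baseline built from a week of daily average toxicity scores could be checked against each new day's score, with an alert raised whenever `is_anomalous` returns `True`.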

LangKit will be integrated into WhyLabs’ AI observability platform, which also offers solutions for monitoring other types of AI applications, such as embeddings, model performance and unstructured data drift.

WhyLabs was founded in 2019 by former Amazon Machine Learning engineers and is backed by Andrew Ng’s AI Fund, Madrona Venture Group, Defy Partners and Bezos Expeditions. The company was also incubated at the Allen Institute for Artificial Intelligence (AI2).

LangKit is available today as an open-source library on GitHub and as a SaaS solution on WhyLabs’ website. Users can also check out a demo notebook and an overview video to learn more about LangKit’s features and capabilities.
