Today, Copenhagen-based healthcare AI Corti is launching Symphony for Speech-to-Text, a new generation of clinical-grade speech recognition models engineered specifically for real-time dictation, conversational transcription, and batch audio processing — and their accuracy rate is the highest for this specific use case yet recorded.

"We are focused on ensuring our AI scribes can be trusted by physicians, medical practitioners and patients...the entire healthcare system," said Andreas Cleve, co-founder and CEO of Corti, in an exclusive video call interview with VentureBeat.

The performance data the company is bringing to the table paints a stark picture of the current state of enterprise AI: when it comes to highly regulated, specialized industries, domain-specific models can outperform those from the foundation model providers.

In a newly published research paper, Corti reported that its new clinical-grade speech models reduced word error rates (WER) on medical terminology by up to 93% relative to leading generalist speech models and APIs.

On English medical terminology, its Symphony for Speech-to-Text achieved a remarkably low 1.4% WER. By comparison, OpenAI’s speech model registered a 17.7% WER, ElevenLabs hit 18.1%, Whisper recorded 17.4%, and Parakeet scored 18.9%.
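For readers less familiar with the metric: WER counts the word substitutions, deletions, and insertions needed to turn a model's transcript into a reference transcript, divided by the number of words in the reference. The Python sketch below is a standard word-level edit-distance implementation, not Corti's evaluation code (its methodology is described in the paper), and the helper names are hypothetical. It also shows how a relative reduction follows from the WER figures quoted above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / reference word count,
    computed with a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance
    # between the first i-1 reference words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution (or match)
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Share of the baseline's errors eliminated by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


# One misheard word in a three-word sentence is a 33% WER.
print(word_error_rate("patient has hyperthyroidism", "patient has hypothyroidism"))

# WER figures (in %) as reported by Corti for English medical terminology.
reported = {"OpenAI": 17.7, "ElevenLabs": 18.1, "Whisper": 17.4, "Parakeet": 18.9}
symphony = 1.4
for name, wer in reported.items():
    print(f"{name}: {relative_reduction(wer, symphony):.1%} relative WER reduction")
```

With the four baseline figures quoted here, the largest relative reduction is about 92.6% (against Parakeet's 18.9%), consistent with the "up to 93%" headline. The same calculation applied to Dragon's 5.7% and Corti's 4.6% dictation WER gives roughly 19%.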

Corti’s announcement serves as a critical inflection point for healthcare builders. While general-purpose models and APIs such as OpenAI’s Whisper are sufficient for broad-domain transcription, they frequently stumble over medical acronyms, complex medication dosages, shorthand, and noisy emergency room environments. Symphony for Speech-to-Text aims to solve this by providing developers with a highly specialized, production-grade API designed from the ground up for clinical workflows.

The agentic era demands flawless data inputs

The launch of Symphony for Speech-to-Text highlights a fundamental shift in how healthcare uses voice technology. For decades, medical speech recognition was primarily about generating a static text document for human doctors to review—a digital replacement for a notepad.

But as the healthcare industry hurtles into what technologists call the "agentic era," where autonomous AI agents actively assist in clinical decision-making, EHR navigation, and real-time support, the transcript is no longer the final product. It is the foundational data layer.

“Speech has always been one of healthcare’s most important inputs,” Cleve said in a statement provided to VentureBeat. “What is changing is what happens after the words are captured. In the agentic era, speech recognition requires more than simply producing a transcript - we need to give AI systems accurate clinical facts to reason from. If a model mishears a medication, dosage, or symptom, every downstream step becomes less reliable. Symphony for Speech-to-Text gives healthcare builders a speech layer accurate enough to thrive in clinical reality.”

This is where the compounding danger of high word error rates comes into play. If a general-purpose AI model hallucinates a transcription—turning "hyperthyroidism" into "hypothyroidism," or misinterpreting a critical medication dosage—every subsequent AI agent relying on that transcript will operate on corrupted data. Corti’s architecture mitigates this risk by producing structured, clinically usable output directly from the API, helping downstream AI applications reason over clean facts rather than messy, unformatted text.

Nowhere is this more evident than in Corti’s entity recall benchmarks. Symphony for Speech-to-Text reached an astonishing 98.3% recall rate on formatted clinical entities—such as dosages, measurements, and dates. In contrast, Corti reported that the strongest general-purpose baseline model maxed out at just 44.3% recall for the same entities.

For developers building ambient AI documentation tools, that 54-percentage-point gap is the difference between a tool that saves a physician time and a tool that constitutes a medical liability.
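WER treats every word equally, so a transcript can score reasonably well overall while still garbling the handful of tokens that matter most clinically. Entity recall instead asks what fraction of the important entities in a reference (dosages, measurements, dates) come through correctly in the output. The sketch below assumes exact string matching against a hand-labelled entity list; that matching rule, the function name, and the example sentences are illustrative assumptions, and Corti's paper defines its own criteria.

```python
def entity_recall(reference_entities: list[str], transcript: str) -> float:
    """Share of reference entities that appear verbatim in the transcript.

    Exact substring matching is a simplifying assumption for illustration:
    a formatted entity such as "50 mg" counts as missed if the model writes
    "fifty milligrams" instead.
    """
    if not reference_entities:
        raise ValueError("need at least one reference entity")
    found = sum(1 for entity in reference_entities if entity in transcript)
    return found / len(reference_entities)


# Hypothetical example: three formatted entities from a reference note.
entities = ["50 mg", "120/80 mmHg", "March 3"]

specialist = "Start metoprolol 50 mg daily. BP 120/80 mmHg. Follow up March 3."
generalist = "Start metoprolol fifty milligrams daily. BP 120 over 80. Follow up March 3."

print(f"specialist recall: {entity_recall(entities, specialist):.0%}")
print(f"generalist recall: {entity_recall(entities, generalist):.0%}")
```

Both hypothetical transcripts convey roughly the same words, yet the second one recovers only one of the three formatted entities. At the reported figures, 98.3% versus 44.3% recall means the specialist model recovers a little over twice as many formatted entities as the strongest general-purpose baseline.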

Dethroning the industry leaders

While Corti’s benchmarks against modern AI providers like OpenAI and ElevenLabs are striking, the company is also taking aim at legacy medical transcription giants.

For years, the gold standard for dedicated clinician dictation has been Dragon Medical One. However, these legacy systems were historically optimized strictly for intentional clinician dictation, not as underlying infrastructure for ambient AI, complex multi-party conversations, or real-time clinical support tools.

In evaluations of real-world English medical dictation, Corti achieved a 4.6% WER, outperforming Dragon’s 5.7% (a 19% relative improvement).

Furthermore, Corti demonstrated a higher medical term recall than Dragon (93.5% versus 92.9%).

By providing this level of accuracy via an API endpoint, Corti is enabling third-party developers, EHR vendors, and virtual care platforms to build their own custom dictation and ambient listening tools that outperform the industry's legacy incumbent.

"We want people to build apps atop our models," Cleve said. "The goal is to diffuse the technology as widely as it is needed so it can be as helpful as possible to patients and their doctors and professionals."

For Cleve and his co-founders, the mission is a personal one: Cleve's own mother, a healthcare professional, was attacked by a patient and spent years struggling to recover. He sought to improve healthcare processes as a way of honoring her sacrifice.

Solving the healthcare model puzzle

The demands of healthcare extend far beyond English-speaking hospitals, and global health systems have historically been underserved by clinical NLP models. Early adopters are already leveraging Corti’s new models in linguistically demanding environments, proving the technology's viability in complex international markets.

Switzerland, for instance, requires care delivery across multiple languages—often simultaneously within a single medical institution. It serves as one of the most stringent proving grounds for multilingual medical speech models in the world. Corti’s Symphony models demonstrated massive performance gains in these non-English tests, achieving a 2.4% WER in German (compared to 13.0% for the next-best system) and a 3.9% WER in French (versus 10.6%).

“In a clinical conversation, every word matters - a missed medication name, a misheard dosage, or a mistranscribed symptom can change the meaning of an encounter," said Pierre Corboz, Head of Solutions & Business Development at Voicepoint, a Swiss healthcare technology provider, in a statement provided to VentureBeat. "Symphony’s accuracy on clinical terminology gives us the foundation to bring more trusted AI capabilities into clinical workflows with our Voicepoint Xenon platform. When Corti improves the speech layer, the workflows we build together become sharper, safer, and more useful for clinicians in Switzerland.”

AI verticalization and specialization are yielding gains

Today’s announcement of Symphony for Speech-to-Text is not an isolated event; it is the culmination of a strategic narrative Corti has been aggressively pushing over the last several weeks.

The broader Symphony platform—which powers clinical and administrative applications for a global network of EHR vendors and life sciences organizations—has been systematically proving the defensibility of vertical AI labs against horizontal tech giants.

This marks the third major benchmark Corti has released in just six weeks, touching different layers of healthcare AI performance.

In April, the company revealed that its Symphony for Medical Coding system outperformed general-purpose models by more than 25% in clinical accuracy benchmarks, tackling one of healthcare’s most notoriously complex workflows.

And just last week, Corti announced that its flagship clinical-grade model outscored OpenAI's models on HealthBench Professional, OpenAI’s own healthcare benchmark.

Taken together, these three data points—medical coding, clinical reasoning, and speech-to-text accuracy—illustrate a growing consensus in the enterprise technology sector: generalized models are hitting a ceiling in regulated industries.

Models deployed in hospitals must inherently understand complex acronyms, sudden interruptions, medical shorthand, specialty-specific language, and strict compliance constraints. By training specifically on these unique edge cases, vertical AI labs like Corti are building a formidable moat that companies relying solely on API calls to generalized large language models cannot easily cross.

Availability and product lineup

Developers are clearly taking notice of the performance gap. According to momentum data provided to VentureBeat, Corti is seeing 30% growth in new sign-ups for its platform in quarter-to-date comparisons, signaling that developers and healthcare builders are actively gravitating toward vertical, clinical-grade models over generalist APIs.

Corti, which already serves over 100 million patients annually across major health systems including the UK’s National Health Service (NHS), is positioning Symphony for Speech-to-Text as the default engine for the next generation of healthcare software.

It is important to note that Corti is not launching the overarching Symphony platform itself today; rather, Symphony for Speech-to-Text operates as a new, distinct capability within that broader ecosystem, accessible via its own API endpoints.

Symphony for Speech-to-Text is generally available starting today. Developers and enterprise architects can access the models via the Corti API console, with full technical documentation available to help integrate the clinical-grade speech layer into their existing applications.

In a move toward research transparency, Corti has also published its full research paper detailing its methodology, along with a separate comparison tool designed to support transparent evaluation of medical speech recognition systems across the industry.

As the healthcare industry continues its rapid embrace of AI-driven automation, the foundational data layer has never been more critical. Corti’s latest launch is a stark reminder that in the medical field, generic AI simply isn't good enough. The future belongs to the specialists.