Google debuts better transcription, endless streaming, and more in Contact Center AI

Last July, during its Cloud Next conference in San Francisco, Google unveiled Contact Center AI. The machine learning-powered customer support toolkit taps Dialogflow (a conversational experiences development platform) and Cloud Speech-to-Text (a suite of audio-to-text technologies) to interact with callers over the phone. It has been a long time coming, but this week the tech giant bolstered the nascent service with a raft of features that vastly improve speech recognition accuracy.

"Contact centers are critical to many businesses, and the right technologies play an important role in helping them provide outstanding customer care," wrote product managers Dan Aharon and Shantanu Misra in a blog post. "We're excited to see how these improvements to speech recognition improve the customer experience for contact centers of all shapes and sizes."

Automatic speech adaptation

Contact Center AI's new Auto Speech Adaptation feature, which is available in beta, targets scenarios where Dialogflow agents' speech recognition systems might confuse similar-sounding words. It takes into account context -- specifically training phrases, entities, and other agent-specific information -- to respond appropriately using a learning process known as speech adaptation. For instance, if a caller attempts to arrange a product return, Contact Center AI will leverage its knowledge of the returns process to avoid mistaking the word "mail" for "nail."

Auto Speech Adaptation is switched off by default. You'll find it in the Dialogflow console.

Baseline model improvements

Google recently launched in preview premium speech-to-text models tuned to specific use cases, and in February it made one of them -- a phone model optimized for two- to four-person conversations -- generally available. The Mountain View company claimed at the time that this model had 62% fewer transcription errors, an improvement on the 54% reduction reported for its predecessor. Today, Google revealed that its engineers have further optimized the model for short utterances in U.S. English. The model is now 15% more accurate relative to the previously announced improvements.

"Applying speech adaptation can also provide additional improvements on top of that gain," wrote Aharon and Misra. "We're constantly adding more quality improvements to the roadmap -- an automatic benefit to any IVR or phone-based virtual agent, without any code changes needed -- and will share more about these updates in [the] future."

Better transcription and endless streaming

Increased contextual awareness and enhanced speech-to-text aren't the only new natural language understanding improvements coming down the Contact Center AI pipeline. Google debuted in beta today "richer" manual speech adaptation and entity classes, in addition to expanded phrase limits, endless streaming, and more.

There's a trio of new features within SpeechContext parameters, the collection of Cloud Speech-to-Text settings and toggles that tailor transcriptions to businesses' and verticals' vernaculars. SpeechContext classes -- prebuilt entities reflecting concepts like digit sequences, addresses, numbers, and money denominations -- optimize automatic speech recognition (ASR) for a list of words at once. As for SpeechContext boost, it helps adjust speech adaptation strength while cutting down on the number of false positives -- i.e., when a phrase wasn't mentioned but appears in a transcript. Lastly, SpeechContext now supports up to 5,000 phrase hints per API request (up from 500), increasing the probability that uncommon words or phrases will be captured by ASR.
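For readers who want a feel for how the three SpeechContext features fit together, here is a minimal sketch that assembles a recognition config as a plain Python dictionary. It is an illustration, not Google's code: the helper name build_recognition_config is hypothetical, the field names (speechContexts, phrases, boost, languageCode) and the class token $OOV_CLASS_DIGIT_SEQUENCE are assumptions based on the public REST documentation as best recalled, and the boost value is arbitrary. Only the 5,000-phrase ceiling comes from the announcement.

```python
import json

# Ceiling on phrase hints per API request, as quoted in the article.
MAX_PHRASES_PER_REQUEST = 5000


def build_recognition_config(phrases, boost=None, language_code="en-US"):
    """Return a dict shaped like a speech recognition config (hypothetical helper).

    Field names are assumed from the public REST docs; check them against
    the current API reference before relying on them.
    """
    phrases = list(phrases)
    if len(phrases) > MAX_PHRASES_PER_REQUEST:
        raise ValueError(
            f"{len(phrases)} phrase hints exceeds the limit of "
            f"{MAX_PHRASES_PER_REQUEST} per request"
        )

    context = {"phrases": phrases}
    if boost is not None:
        # A higher boost strengthens adaptation toward these phrases; too high
        # a value risks the false positives the article describes.
        context["boost"] = float(boost)

    return {"languageCode": language_code, "speechContexts": [context]}


# Plain phrase hints plus one class token (assumed name) for digit sequences.
config = build_recognition_config(
    ["return my order", "mail", "$OOV_CLASS_DIGIT_SEQUENCE"],
    boost=10.0,
)
print(json.dumps(config, indent=2))
```

The dictionary could then be sent as the config portion of a request body. Keeping the limit check on the client side simply surfaces an oversized phrase list before any network call is made.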

Perhaps more significantly, Cloud Speech-to-Text, which since launch has only supported streaming audio in one-minute increments, can now process sessions up to five minutes in length and resume streaming where the previous session left off. (Google notes that this effectively makes live automatic transcription infinite in length.) Additionally, Cloud Speech-to-Text now natively supports the MP3 file format; previously, MP3 files had to be expanded into the LINEAR16 format prior to processing.
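The idea behind endless streaming, chaining capped sessions back to back, can be sketched in a few lines. The snippet below is a simulation only, under the assumption that a client opens a new session whenever the five-minute cap is reached. It does not call the Cloud Speech-to-Text API, and the function names plan_sessions and to_absolute are hypothetical.

```python
# Per-session cap quoted in the article: five minutes, expressed in seconds.
SESSION_LIMIT_SECONDS = 5 * 60


def plan_sessions(total_seconds, limit=SESSION_LIMIT_SECONDS):
    """Split a recording into back-to-back (start, end) session windows."""
    if total_seconds < 0 or limit <= 0:
        raise ValueError("total_seconds must be >= 0 and limit must be > 0")

    windows = []
    start = 0
    while start < total_seconds:
        end = min(start + limit, total_seconds)
        windows.append((start, end))
        start = end  # the next session resumes where this one left off
    return windows


def to_absolute(session_start, relative_seconds):
    """Convert a timestamp measured within a session to one on the full call."""
    return session_start + relative_seconds


# A call lasting 12 minutes 30 seconds needs three consecutive sessions.
windows = plan_sessions(12 * 60 + 30)
for index, (start, end) in enumerate(windows):
    print(f"session {index}: {start}s to {end}s")
```

Because each window begins exactly where the previous one ended, adding the window's start to any within-session timestamp yields a position on the full call, which is what lets a stitched transcript read as one continuous stream.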

"We're excited to see how these improvements to speech recognition improve the customer experience for contact centers of all shapes and sizes -- whether you're working with one of our partners to deploy the Contact Center AI solution or taking a DIY approach using our conversational AI suite," wrote Aharon and Misra.

The veritable slew of announcements follows the debut of Calljoy, a graduate from Google's Area 120 incubator that aims to help small businesses harness language models to automate incoming call management. More recently, Google made available in beta Document Understanding AI, a serverless platform that automatically classifies and structures data within scanned physical and digital documents, and Vision Product Search, which uses the company's Cloud Vision technology to enable stores to create Google Lens-type smartphone experiences.

Contact Center AI remains in beta, with partners including 8×8, Avaya, Salesforce, Accenture, Cisco, Five9, Genesys, Mitel, Twilio, and Vonage.