Google AI researchers working with the ALS Therapy Development Institute today shared details about Project Euphonia, a speech-to-text transcription service for people with speech impairments. The researchers also say their approach can improve automatic speech recognition for people with non-native English accents.
People with amyotrophic lateral sclerosis (ALS) often have slurred speech, but existing AI systems are typically trained on voice data from those without any speech difficulty or accent.
The new approach owes its success primarily to adding small amounts of training data from people with ALS and from people with accents.
“We show that 71% of the improvement comes from only five minutes of training data,” researchers wrote in a paper titled “Personalizing ASR for Dysarthric and Accented Speech with Limited Data,” published on arXiv July 31.
Personalized models achieved relative word error rate (WER) improvements of 62% for ALS speech and 35% for accented speech.
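WER is the word-level edit distance (substitutions plus deletions plus insertions) between a recognizer's transcript and a reference transcript, divided by the number of words in the reference. A relative improvement expresses the drop in WER as a fraction of the baseline WER. The sketch below computes both. It also includes one plausible reading of the "71% of the improvement" figure quoted above: the share of the full-data WER reduction already achieved with the smaller training set. That reading, the function names, and the example WER values are assumptions for illustration, not code or numbers from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """WER reduction as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


def share_of_improvement(baseline_wer: float, partial_wer: float, full_wer: float) -> float:
    """Assumed reading: fraction of the full-data gain reached with less data."""
    return (baseline_wer - partial_wer) / (baseline_wer - full_wer)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Hypothetical WERs chosen only to illustrate the arithmetic.
    print(relative_improvement(0.5, 0.19))
    print(share_of_improvement(0.5, 0.287, 0.2))
```

With the hypothetical values above, a drop from 50% to 19% WER is a 62% relative improvement, which is why relative figures can look large even when the absolute error rate remains noticeable.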
The ALS speech data set consists of 36 hours of audio from 67 people with ALS, thanks to work with the ALS Therapy Development Institute.
The non-native English speaker data set, called L2 Arctic, includes 20 one-hour recordings of utterances.
In addition to fine-tuning, Project Euphonia draws on techniques from Parrotron, an AI tool for people with speech impediments that Google introduced in July.
Written by 12 coauthors, the paper will be presented at Interspeech 2019, the annual conference of the International Speech Communication Association, which takes place September 15-19 in Graz, Austria.
“This paper’s approach overcomes data scarcity by beginning with a base model trained on thousands of hours of standard speech. It gets around sub-group heterogeneity by training personalized models,” the paper reads.
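To make the "base model, then personalize" idea concrete, here is a deliberately tiny sketch. A one-dimensional linear model stands in for the speech recognizer, a large synthetic data set stands in for thousands of hours of standard speech, and ten shifted samples stand in for one speaker's few minutes of recordings. The function names, data, and learning rate are all hypothetical; this is not Project Euphonia's model or training recipe, and it only illustrates the mechanics (start from the base weights, keep training on one speaker's data) rather than why that beats training from scratch.

```python
import random


def train(w: float, b: float, data, lr: float, epochs: int):
    """Plain SGD on squared error for a toy linear model y = w*x + b."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def mse(w: float, b: float, data) -> float:
    """Mean squared error of the toy model on a data set."""
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)

# Hypothetical stand-in for plentiful "standard" data: y = 2x + 1.
base_data = [(x, 2 * x + 1) for x in (rng.uniform(-1, 1) for _ in range(2000))]

# Hypothetical stand-in for one speaker: same slope, shifted offset y = 2x + 3.
speaker_train = [(x, 2 * x + 3) for x in (rng.uniform(-1, 1) for _ in range(10))]
speaker_test = [(x, 2 * x + 3) for x in (rng.uniform(-1, 1) for _ in range(200))]

# 1) Base model: trained once on the large data set.
w0, b0 = train(0.0, 0.0, base_data, lr=0.05, epochs=5)

# 2) Personalized model: initialized from the base weights and fine-tuned
#    on only ten examples from the target speaker.
w1, b1 = train(w0, b0, speaker_train, lr=0.05, epochs=100)

print(f"base model error on speaker:     {mse(w0, b0, speaker_test):.4f}")
print(f"personalized error on speaker:   {mse(w1, b1, speaker_test):.4f}")
```

Training one such personalized copy per speaker is the toy analogue of the paper's answer to sub-group heterogeneity: each person gets a model adapted to their own speech rather than one shared model for everyone.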
The research, which a Google AI blog post highlighted today, follows the introduction in May of Project Euphonia and other initiatives, such as Live Relay, a feature that makes phone calls easier for deaf people, and Project Diva, an effort to make Google Assistant accessible to people who are nonverbal.
Google is soliciting data from people with ALS to improve its model’s accuracy and is working on next steps for Project Euphonia, such as using phoneme mistakes to reduce word error rates.