Speech recognition software has gotten really good in the past few years. It’s so good, in fact, that you can actually dictate on your phone faster than you can type out your words.
At least one recent study suggests that. Results of the study, published on August 25, show that entering text in an iOS app that uses Baidu speech recognition software is three times faster than typing on the standard iOS keyboard. Baidu conducted the study in collaboration with researchers at the University of Washington and Stanford University.
The rise of Siri, Google Now, Cortana, and Alexa is proof that speech recognition has become very good. Yet many of us still tap the keys on our phone and tablet keyboards when we need to write texts, emails, and documents, perhaps because we assume that fixing dictation errors makes speech slower than typing. The new study suggests that maybe it’s time to start relying more on speech software if we want to save time with text input.
Dictating to a speech-to-text system was about three times faster than typing. Tests were carried out in English and Mandarin Chinese, and while speech input was slower in Mandarin than in English, it was still about three times faster than typing in the same language.
The experiment pitted 32 people — ages 19 to 32 — against an experimental app that uses a Baidu program called Deep Speech 2.
Participants were asked to dictate or type with a smartphone keyboard more than 100 randomly selected phrases like “Wear a crown of many jewels” and “This person is a disaster.”
People speaking to Deep Speech 2 in English produced 172 words a minute, while people typing on QWERTY keyboards managed 53 words a minute. In Mandarin, people typed 38 words a minute and produced 124 words a minute with speech.
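For readers who want to check the "three times faster" claim against those figures, here is a minimal sketch of the arithmetic. It uses only the words-per-minute numbers quoted above; the names `rates`, `speedup`, and `ratios` are illustrative and do not come from the study.

```python
# Words-per-minute figures as quoted in this article.
rates = {
    "English": {"speech": 172, "keyboard": 53},
    "Mandarin": {"speech": 124, "keyboard": 38},
}


def speedup(speech_wpm, keyboard_wpm):
    """Return how many times faster speech input is than typing."""
    return speech_wpm / keyboard_wpm


# Ratio of speech speed to typing speed, rounded to two decimals.
ratios = {
    lang: round(speedup(r["speech"], r["keyboard"]), 2)
    for lang, r in rates.items()
}

for lang, ratio in ratios.items():
    print(f"{lang}: speech is about {ratio}x faster than typing")
```

Both ratios come out a little above three, which is consistent with the headline figure.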
Including all the auto-corrects and typos, the keyboard-finger combo made fewer mistakes than speech-to-computer efforts, but was also slower.
Speech is far more natural for people than typing, Baidu chief scientist Andrew Ng told NPR.
“Humanity was never designed to communicate by using our fingers to poke at a tiny little keyboard on a mobile phone,” he said. “Speech has always been a much more natural way for humans to communicate with each other.”
People have been promised great speech recognition for 40 years, said Stanford University computer science professor James Landay in a video accompanying the study. The study’s impressive findings are due to advances in machine learning and big data.
“The implications of this result is we should expect to put speech [in] a lot more of our user interfaces beyond just typing out an email or text message,” Landay said.
“We can imagine interfaces where you use speech and then you get the results and switch to a graphic interface and poke on it with a finger, or other situations in your car or your home where speech might make sense,” Landay explained. “How do we combine that with other interfaces in the future?”
Datasets and the complete study can be seen on the Stanford University website.