Google CEO Sundar Pichai today announced that the company’s speech recognition technology has now achieved a 4.9 percent word error rate. Put another way, Google’s system makes roughly one error for every 20 words it transcribes. That’s a big improvement from the 23 percent the company saw in 2013 and the 8 percent it shared two years ago at I/O 2015.
The tidbit was revealed at Google’s I/O 2017 developer conference, where a big emphasis is on artificial intelligence. Deep learning, a type of AI, is used to achieve accurate image recognition and speech recognition. The method involves ingesting lots of data to train systems called neural networks, and then feeding new data to those systems in an attempt to make predictions.
“We’ve been using voice as an input across many of our products,” Pichai said onstage. “That’s because computers are getting much better at understanding speech. We have had significant breakthroughs, but the pace even since last year has been pretty amazing to see. Our word error rate continues to improve even in very noisy environments. This is why if you speak to Google on your phone or Google Home, we can pick up your voice accurately.”
For the sake of comparison, Microsoft declared in October 2016 that it had reached speech recognition parity with humans. Its word error rate at the time was 5.9 percent, though it’s not clear whether the two companies are measuring against the same test data or following the same standards of evaluation.
Google has been touting its speech recognition improvements for a while now. Earlier this year, the company said it had slashed its speech recognition word error rate by more than 30 percent since 2012. The main reason for the drastic improvement? Google confirmed that it’s the use of neural networks.
Pichai also shared an interesting tidbit about Home’s development: “When we were shipping Google Home, we were originally planning to include eight microphones… But thanks to neural networks, using a technique called ‘neural beam forming’, we were able to ship it with just two microphones and achieve the same quality.”
So if you’re surprised at how well (or poorly) Google understands what you’re saying, this is why. Recognition is getting better and better, but there’s still room to get that word error rate closer to 0 percent.