Back in 2013, Google’s speech recognition technology had a 23% word error rate. At I/O 2015, the company shared it had dropped to an 8% word error rate. At I/O 2017, it had fallen to a 4.9% word error rate, as you can see above. Put another way, Google transcribes roughly every 20th word incorrectly.
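For anyone who wants to check that arithmetic, here is a minimal sketch in Python. It assumes the standard definition of word error rate (substitutions, deletions, and insertions divided by the number of words in the reference transcript). Google has not said exactly how, or on what test set, it measures its figure, so the function names `word_error_rate` and `words_per_error` are purely illustrative and not anything from Google.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumes the textbook WER definition; Google's exact methodology
    is not described in the column.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def words_per_error(wer: float) -> float:
    """Convert an error rate into 'one mistake every N words'."""
    return 1.0 / wer


# The three figures Google has announced, per the paragraph above.
for year, wer in [(2013, 0.23), (2015, 0.08), (2017, 0.049)]:
    print(f"{year}: {wer:.1%} WER, about one error every "
          f"{words_per_error(wer):.1f} words")
```

At 4.9%, the conversion works out to roughly one wrong word in every 20, which is where the "every 20th word" shorthand comes from.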
Deep learning, a type of AI, is used to achieve accurate image recognition and speech recognition. The method involves ingesting lots of data to train systems called neural networks, and then feeding new data to those systems in an attempt to make predictions. Google has been touting its speech recognition improvements for years, and points to the use of neural networks for the drastic improvement.
But Google CEO Sundar Pichai didn’t announce any progress at I/O 2018, nor at I/O 2019. Furthermore, Google executives and engineers seemed to avoid the topic altogether. And on top of that, in the only place I did manage to find a mention of word error rate, it hadn’t changed: still 4.9%.
I asked Google whether this number was accurate or just a typo. A company spokesperson confirmed that 4.9% is the latest announced metric that Google has shared.
The question is: Does that matter?
I find myself wondering whether Google hit a wall in recent years with its cloud-powered speech recognition. If so, it would make sense for the company to shift resources to improving offline, on-device speech recognition solutions. There are benefits to doing so, and tradeoffs.
Or did Google see the privacy firestorm coming first and shift focus accordingly? Maybe it was both.
Regardless of the reason, I’m quite happy Google is prioritizing on-device solutions that are “good enough.” That’s partly because I have little interest in sending even more data back to Google. I also happen to agree with the company’s push to bring this technology to more people. Getting imperfect speech recognition technology into the hands of millions of people is simply a more laudable goal than trying to perfect speech recognition for the few.
But this is Google we’re talking about. The company will likely try to do both.
ProBeat is a column in which Emil rants about whatever crosses him that week.