Following up on last year’s impressive comparison of personal AI solutions, Loup Ventures today released the results of its 2019 Digital Assistant IQ Test, and there’s good news if you enjoy giving voice commands to your phone, tablet, or speaker: All of the leading digital assistants are getting better at their jobs.
On a test consisting of the same 800 questions posed to each AI system, Google Assistant once again led the pack, understanding a full 100% of the questions it was asked, just like last year, and correctly answering 92.9% of them. That’s up from 85.5% correct last year, and it is rapidly approaching a level of accuracy where errors won’t be a common occurrence.
By contrast, Apple’s Siri improved in both categories, rising from a 99% understanding level last year to 99.8% this year, and from 2018’s 78.5% correct answer level to an 83.1% correct level for 2019. Another way of looking at that, even though it may conflict with real-world Siri user experiences, is that Siri is nearly as likely to respond correctly this year as Google Assistant was last year.
Amazon’s Alexa once again took third place, but made major strides this year, understanding 99.9% of the questions and answering them correctly 79.8% of the time, better than last year’s Siri performance. That’s a sharp rise in correct answers for Alexa, which jumped from a surprising low of 61.4% last year, and Loup notes that it’s the largest jump it has seen between years since it started recording results.
Notably, Loup left out Microsoft’s Cortana this year, which isn’t hugely surprising as the fourth-place AI has been disappearing from Microsoft’s products and third-party accessories. Cortana answered only 52.4% of last year’s questions correctly, which is to say that for any question with a binary answer, you’d have done about as well by flipping a coin.
One of the interesting aspects of Loup’s testing is that it covers five different categories: “local,” “commerce,” “navigation,” “information,” and “command,” each designed to test a different area of potential AI assistance. Top scores therefore go to assistants that are well-rounded rather than merely proficient in a single area, so when Alexa was heavily focused on Amazon commerce but not dialed into local information or navigation, its overall score suffered.
Google Assistant dominated four of those five categories, opening a particularly large gap in commerce, where its 92% accuracy outperformed Alexa (71%) and Siri (68%). It actually achieved top scores in everything except “command,” where Siri beat it by a 93% to 86% margin — the only time Assistant dropped below 92% in correct responses.
Alexa ranked behind both rivals in the “local,” “navigation,” and “command” departments, while only slightly edging Siri out in “commerce.” Outside of its “command” win, Siri finished twice in second place and twice in third place, with its second-biggest deficit to the leader in “information,” where it was markedly worse than the other AIs: 76% correct answers compared to Alexa’s 93% and Google’s 96%.
As Loup has mentioned before, the continued march toward 100% scores is impressive, but shouldn’t be taken to mean that the assistants are in fact “intelligent.” While they can understand “within reason, everything you say to them,” they are only getting good at responses within their primary use cases, and aren’t exhibiting higher-level reasoning skills. The next steps forward for digital assistants, Loup says, are adding use cases that “voice is uniquely suited to solve,” and providing simple user experiences to solve them.