Following up on February’s revealing comparison of smart speaker AI assistants, Loup Ventures today published its “annual digital assistant IQ test,” tracking the performance of the four major AI assistants on smartphones. The firm suggested that Google’s Assistant, Apple’s Siri, and Microsoft’s Cortana have all improved over the past year, with Siri surprisingly posting a big improvement, while Cortana fell well behind Amazon’s Alexa.
Loup used the same set of 800 questions for each digital assistant, and spread them across five categories: local, commerce, navigation, information, and command. To better reflect the capabilities of modern assistants, this year’s questions were modified from ones used in a similar April 2017 test, somewhat muddying direct year-to-year comparisons.
Overall, the clear winner of the test was Google’s Assistant, which correctly understood 100 percent of questions and had the most correct answers at 85.5 percent. Assistant won four out of the five categories, falling behind Siri only in “command,” which tests the AI’s ability to execute a specific feature. Loup singled out Assistant’s “information” performance for special praise, citing three advantages when searching: it confirms that it understood the query, finds the right information, and reads answers aloud.
Despite suffering through what virtually everyone would describe as a dismal year, Siri ranked second in the testing. Loup said that Siri understood 99 percent of queries and answered 78.5 percent of the 800 questions correctly. “[N]early every misunderstood question involved a proper noun,” said Loup, “often the name of a local town or restaurant.” Proper nouns aside, it claimed, virtually all of the AI assistants “will understand everything you say to them.”
Siri’s biggest strengths were in answering music-related queries and in its versatility controlling the phone, smart home accessories, and other features, where Loup said it showed greater flexibility in interpreting a user’s intent. Notably, unlike the limited versions of Siri found on the HomePod and Apple TV, Siri on the iPhone is fully featured, enabling it to reach higher scores than in the earlier smart speaker test.
Alexa and Cortana lagged behind the others; both had 98 percent success rates in understanding queries, but Alexa answered only 61.4 percent correctly, versus Cortana’s 52.4 percent rate. Loup criticized Alexa for too often responding to commerce-related inquiries with “Amazon’s choice” in a product category rather than a broader listing, forcing users to do additional research. Cortana’s issue was mediocre performance across the board, with particular weakness in the “commerce” category, where it answered only 20 percent of questions correctly.
Loup mentioned that Google’s and Apple’s ability to integrate their assistants directly into phones gave them natural advantages compared with Alexa and Cortana, which run solely as third-party apps on Android and iOS phones; this translates into superior navigation skills for pocket devices and deeper integration at the OS level.
Since all of the assistants showed major improvements in language processing and multi-device support, Loup said that it didn’t expect further huge jumps in interpretive accuracy; instead, the AI assistants will learn to do more things and control a wider range of devices. Loup expects that new services such as Siri Shortcuts will eliminate friction by letting users create custom voice commands for apps and functions, while existing features such as ride hailing and making payments will become ubiquitous.