Last September, Amazon unveiled the Voice Interoperability Initiative, a program aimed at ensuring that voice-enabled products like smart speakers and displays let users choose among multiple voice assistants. Today, the company announced that 38 new members, including Dolby, Facebook, Garmin, and Xiaomi, have joined the initiative, bringing the total number of member companies to 77. (Google remains conspicuously absent from the list.) To mark the milestone, Amazon published what it's calling the Multi-Agent design guide, a whitepaper outlining the design recommendations Voice Interoperability Initiative members should follow when building multi-assistant products.

The Voice Interoperability Initiative is organized around four core principles, the first of which is developing voice services that work "seamlessly" with others while ostensibly preserving privacy. (Amazon in particular has a spotty track record when it comes to voice privacy, but the company claims to have made strides in recent months.) Members aim to build devices that ship with multiple assistants while working to accelerate conversational AI research, with the goal of letting users tap the capabilities of Alexa, Cortana, and other services on a single platform.

The newly published Multi-Agent design guide covers three key topic areas: (1) customer choice and agent invocation, (2) multi-agent experiences, and (3) privacy and security. It recommends that multi-assistant products help customers explore each assistant's capabilities, and it lays out suggestions for two mechanisms that address requests one assistant can't fulfill on its own: agent transfer, in which one assistant summons another, and universal device commands (UDCs). UDCs are commands, like volume and timer controls, that any assistant recognizes even if it wasn't the one used to kick off the experience.


On a device with agent transfer and UDCs, asking Alexa to book a restaurant reservation through Google Duplex (a service Alexa can't access) could summon Google Assistant automatically, and asking Google Assistant to stop a timer could affect timers started by Alexa. "During an agent transfer, the [user] makes a request of an agent (Agent 1) who cannot directly fulfill their request (e.g. 'I can't do that')," the design guide explains. "However, if Agent 1 is aware of another agent (Agent 2) on the device which can likely fulfill that request, Agent 1 can summon the other agent to assist the customer. No data or context is passed between agents during a transfer, and the [user] repeats their request directly to Agent 2 without needing to say the wake word."
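To make the two mechanisms concrete, here is a minimal sketch of how a device might route requests, assuming each agent advertises a simple set of intents it can handle. All names here (`Agent`, `Device`, `UNIVERSAL_DEVICE_COMMANDS`, intent strings like `book_restaurant`) are hypothetical illustrations, not part of Amazon's guide or any real SDK. Note that the transfer path passes no request data to the second agent, mirroring the guide's rule that the user repeats the request.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    capabilities: set[str]  # hypothetical: intents this agent can fulfill

    def can_fulfill(self, intent: str) -> bool:
        return intent in self.capabilities


# Hypothetical UDC list: commands every agent on the device honors.
UNIVERSAL_DEVICE_COMMANDS = {"stop_timer", "volume_up", "volume_down"}


@dataclass
class Device:
    agents: list[Agent]
    # timer id -> name of the agent that started it
    timers: dict[str, str] = field(default_factory=dict)

    def handle(self, agent: Agent, intent: str) -> str:
        # UDCs act on shared device state, regardless of which agent set it up.
        if intent in UNIVERSAL_DEVICE_COMMANDS:
            return self._run_udc(agent, intent)
        if agent.can_fulfill(intent):
            return f"{agent.name} handles {intent}"
        # Agent transfer: summon another agent likely to fulfill the request.
        for other in self.agents:
            if other is not agent and other.can_fulfill(intent):
                # No data or context is handed over; the user repeats the
                # request to `other` without saying its wake word.
                return f"{agent.name} summons {other.name}; user repeats request"
        return f"{agent.name}: I can't do that"

    def _run_udc(self, agent: Agent, intent: str) -> str:
        if intent == "stop_timer":
            stopped = len(self.timers)
            self.timers.clear()  # stops timers started by any agent
            return f"{agent.name} stopped {stopped} timer(s)"
        return f"{agent.name} applied {intent}"
```

For example, with Alexa handling `play_music` and Google Assistant handling `book_restaurant`, asking Alexa to book a table triggers a transfer, while asking Google Assistant to stop a timer Alexa started clears it via the UDC path.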

Beyond this, the Multi-Agent design guide recommends that coexisting agents convey at least three core attention states (listening, thinking, and speaking) through visual and sound cues. This paradigm, it says, will make it easier for users to see which assistant is listening and when its state changes.


The Voice Interoperability Initiative's launch came a year after Microsoft and Amazon brought Alexa and Cortana to all Echo speakers and Windows 10 users in the U.S., following a partnership first made public in 2017 in an announcement featuring Microsoft CEO Satya Nadella and Amazon CEO Jeff Bezos. Each assistant brought distinctive features to the table. Cortana, for example, can schedule a meeting in Outlook or draw on LinkedIn to tell you about the people in your next meeting, while Alexa offers more than 100,000 voice apps covering a broad range of use cases.
