At Amazon’s re:Mars conference last week, the company rolled out Alexa Conversations in preview. Conversations is a module within the Alexa Skills Kit that stitches together Alexa voice apps into experiences that help you accomplish complex tasks.
Alexa Conversations may be Amazon’s most intriguing and substantial pitch to voice developers in years. Conversations will make creating skills possible with fewer lines of code. It will also do away with the need to understand the many different ways a person can ask to complete an action, as a recurrent neural network will automatically generate dialogue flow.
For users, Alexa Conversations will make it easier to complete tasks that require the incorporation of multiple skills and will cut down on the number of interactions needed to do things like reserve a movie ticket or order food.
Amazon VP David Limp refers to Conversations as a great next step forward. “It has been sort of the holy grail of voice science, which is how can you make a conversation string together when you didn’t actually programmatically think about it end-to-end. […] I think a year or two ago I would have said we didn’t see a way out of that tunnel, but now I think the science is showing us that [although] it will take us years to get more and more conversational, […] this breakthrough is very big for us, tip of the iceberg,” Limp said.
It begins with a night out and casual conversation
Alexa Conversations debuts with a night-out scenario. In an onstage demo last week at re:Mars, a woman bought a movie ticket, made dinner reservations, and hailed a ride in about one minute. (Atom Tickets, Uber, and OpenTable are early Alexa Conversations partners.)
The night-out scenario is the first of what Amazon says will become a collection of bundled experiences to get things done.
Conversations may someday power more difficult tasks, such as the weekend trip scenario that Limp demonstrated last fall at an event to introduce nearly a dozen new Alexa-powered devices. Limp’s holy grail is a transformation that every major tech company with an AI assistant is chasing: evolving assistants from a voice interface that completes basic tasks one at a time into one that can handle complex, multistep tasks.
Two years ago, during a rare onstage gathering of current or former leaders from the Alexa, Google Assistant, Siri, and Cortana teams, Viv cofounder and Siri co-creator Adam Cheyer, who has pondered the future of voice assistants since the 1990s, wondered aloud about an assistant that could guide you through planning your sister’s wedding. (Samsung acquired Viv in October 2016 to enhance its Bixby AI assistant.)
At the event, Cheyer talked about how voice will define the next decade of computing and the importance of bridging first-party AI assistant services with a third-party voice app ecosystem. “I don’t want to have to remember what a car assistant can do, the TV system do, the Alexa versus Cortana versus … too much. I want one assistant on every device to access every service without any differentiation between what’s core and what’s third-party,” Cheyer said.
Amazon is working towards that end, starting by reducing the number of interactions you need to get things done with Alexa. Last fall, Amazon introduced Follow-Up Mode, so you can engage in multiple interactions but only have to say the “Alexa” wake word once. With Conversations, the number of interactions necessary to execute the night-out scenario is cut down from 40 to about a dozen back-and-forth interactions.
To further increase the perception that Alexa is capable of natural conversation, the AI assistant learned to whisper when a person is whispering, and can now respond to name-free skill invocation. That means you can say “Get me a ride” instead of first having to launch the skill by saying, “Alexa, launch the Uber skill.”
Creating the perception of intelligence
Amazon isn’t alone in its ambition to make an assistant capable of fluid conversation like the kind you’d expect from another person. Google introduced Continued Conversations so you don’t have to say the wake word to continue to talk about something. Alexa Conversations also gives Amazon’s AI assistant the power to quickly take care of things or engage in commerce, akin to Google Assistant’s new food ordering powers and Google’s Duplex. Duplex for the Web and deep connections between Android apps and Google Assistant made their debut last month. Microsoft is also bringing similar intelligence to workplace assistants with Semantic Machines, a startup it acquired in 2018.
It all points to the issue that more complex tasks require more than a single exchange, which Alexa AI senior product manager Sanju Pancholi emphasized. “When you’re starting to solve more complex problems, there is more give and take of information, there are more decisions at each point in time, and hence there are multiple actions that can come in context of the same conversation with different individuals,” he said.
He led a session at re:Mars to make a pitch for Alexa Conversations for businesses and developers, and talked about an assistant that can “solve their product and service needs in the moment of recognition when they realize they need it.”
To be seen as intelligent, Amazon thinks an assistant should understand natural language, remember context, and make proactive, predictive suggestions, traits that can prove an assistant is smart enough to accomplish more complex tasks. Doing away with the need to repeat yourself is also critical.
“If you make [customers] repeat information again and again and again, you are forcing them to believe that they are talking to a dumb entity, and if that’s the rapport you’re building with them from the get-go, the chances are they’re never going to delegate higher order tasks to you, because they will never think you’re capable of solving higher-order problems for them,” Pancholi said.
The Alexa Skills Store now has more than 90,000 skills, and 325,000 developers have used the Alexa Skills Kit, Pancholi said. Alexa is now available on 100 million devices.
Pancholi shared with developers that potential next steps for Alexa Conversations scenarios may include collections of skills to help people watch content at home, get food delivered, or buy a gift.
Skills on skills
In an interview with VentureBeat, Alexa chief scientist Rohit Prasad declined to share details about use cases that may be taken up next, but said they could include ways to help plan a weekend. Prasad, who has led Alexa AI initiatives for language understanding and emotional intelligence, said Conversations is designed to stitch together the voice ecosystem and increase engagement for skills and Alexa alike.
“The developer proposition is that you start getting more traffic and more discovery as the more cross skilled we become, like the fact that night out experience is now getting you to order a cab. So Uber and Lyft will see more traffic as well and more customer engagement. So that, and plus skill discovery will happen naturally as part of that. So that’s a huge piece of our value proposition in this case,” Prasad said.
Even Blueprints, voice app templates for private, custom Echo skills, may soon incorporate Conversations, Prasad said. Batches of custom skills for the home could, for example, walk kids through multi-step routines, help with chores, and help count down to important dates.
The first proactive Alexa features were rolled out last fall: Hunches, which suggests event reminders and smart home actions, and Alexa Guard, which detects the sound of broken glass or a smoke alarm.
Brands and indie developers
In January 2018, CNBC reported that Amazon was in talks with brands like Procter & Gamble and Clorox to ink deals to promote their products to Alexa users.
Amazon Alexa VP Steve Rabuchin insists there’s no way for businesses or developers to get prioritized by Alexa’s voice app recommendation system, but the Alexa voice app ecosystem may face another problem. Because voice apps often work without a screen, packaging skills means some skills may inevitably be left out or go unranked.
This is especially important for voice apps. Unlike app search on a smartphone, Alexa’s voice app recommendation engine serves up only three skills at a time.
“Our vision isn’t to end up where it’s just the biggest brands or most popular,” Rabuchin said in an interview with VentureBeat. “A lot of our most popular skills are indie developers, individuals developers.”
Amazon’s skills recommendation engine, which responds when you say things like “Alexa, get me a ride,” recommends voice apps based on measurements like engagement levels, which Amazon started paying developers for in 2017.
Conversations will incorporate skill quality measurements like user ratings and engagement levels. Factors like regional significance, whether a skill works on a smart display, and personal information may also decide which skills appear during Alexa Conversations interactions.
“I think we have a good playbook to start from like, I don’t think it’s a perfect playbook, but it’s a great one to start with,” Prasad said.