Amazon launches Alexa Conversations in beta, lets developers deep-link skills to mobile apps

Today during Alexa Live, a virtual event for Alexa vendors and developer partners, Amazon unveiled tools and resources designed to enable new Alexa voice app experiences. Among other things, the company rolled out deep neural networks aimed at making Alexa's natural language understanding more accurate for custom apps, as well as an API that lets developers use web technologies to build gaming apps for select Alexa devices. Amazon also launched Alexa Conversations in beta, a deep learning-based way to help developers create more natural-feeling apps with fewer lines of code. And it debuted a new service in preview, Alexa for Apps, that lets Alexa apps trigger actions like searches within smartphone apps.

The reveals come as the pandemic supercharges voice app usage, which was already on an upswing. According to a study by NPR and Edison Research, the percentage of voice-enabled device owners who use voice commands at least once a day rose between the beginning of 2020 and the start of April. Just over a third of smart speaker owners say they listen to more music, entertainment, and news from their devices than they did before, and owners report asking their assistants to perform an average of 10.8 tasks per week this year, up from 9.4 in 2019.

Amazon says the deep neural networks for natural language understanding improve intent and slot value recognition accuracy by 15% on average. Intents represent actions that fulfill users' requests; each intent specifies a name and the utterances a user might say to invoke it. Slot values are intent arguments, like dates, phrases, and lists of items. "This essentially changes the modeling technology used by Alexa apps behind the scenes," Nedim Fresko, vice president of Alexa devices, told VentureBeat in a phone interview. "We're expanding it to cover more of the apps ... that are out there."

The use of deep neural networks -- which can currently generalize from phrases like "buy me an apple" to "order an orange for me" -- will expand to 400 eligible skills in the U.S., Great Britain, India, and Germany by later this year, according to Amazon.
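To make the intent and slot terminology concrete, here is a minimal Python sketch that assumes a toy template format in which `{name}` placeholders mark slots. The `Intent` class, the `resolve` function, and the intent names are hypothetical illustrations, not the Alexa Skills Kit's interaction model schema. A template matcher like this only recognizes phrasings it was explicitly given, which is the limitation the deep neural network models are meant to overcome.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Intent:
    """Hypothetical toy intent: a name plus sample utterances with {slot} placeholders."""
    name: str
    samples: list[str]


def compile_sample(sample: str) -> re.Pattern[str]:
    # re.split with a capture group alternates literal text and slot names:
    # "order {item} for me" -> ["order ", "item", " for me"]
    parts = re.split(r"\{(\w+)\}", sample)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    return re.compile(regex, re.IGNORECASE)


def resolve(utterance: str, intents: list[Intent]) -> tuple[str, dict[str, str]] | None:
    """Return (intent name, slot values) for the first matching sample, else None."""
    for intent in intents:
        for sample in intent.samples:
            match = compile_sample(sample).fullmatch(utterance.strip())
            if match:
                return intent.name, match.groupdict()
    return None


INTENTS = [
    Intent("OrderItemIntent", ["buy me {item}", "order {item} for me"]),
    Intent("GetWeatherIntent", ["what's the weather on {date}"]),
]

print(resolve("order an orange for me", INTENTS))
# ('OrderItemIntent', {'item': 'an orange'})
print(resolve("get me a pear", INTENTS))
# None: no template covers this phrasing
```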

With the new NFI (name-free interaction) Toolkit, in preview, developers can give Alexa additional signals about the requests their apps can handle. For example, they can supply alternate launch phrases customers might use to open the app, plus intents Alexa can consider when routing name-free requests (requests that don't mention an app by name), and then see from a dashboard which paths customers use to invoke the app. Fresko says early adopters have seen a 15% increase in usage.
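As a rough illustration of how extra launch phrases and intents could help route a name-free request, here is a minimal Python sketch that scores a request against each app's registered phrases using Jaccard word overlap. Amazon hasn't said this is how Alexa routes requests; the `SkillSignals` class, the scoring method, the 0.5 threshold, and the skill names are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class SkillSignals:
    """Hypothetical signals an app registers: alternate launch phrases and intent samples."""
    skill: str
    phrases: list[str]


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z']+", text.lower()))


def route(request: str, skills: list[SkillSignals], threshold: float = 0.5) -> str | None:
    """Pick the skill whose phrase best overlaps the request (Jaccard similarity)."""
    words = tokenize(request)
    best_skill, best_score = None, 0.0
    for signals in skills:
        for phrase in signals.phrases:
            phrase_words = tokenize(phrase)
            union = words | phrase_words
            score = len(words & phrase_words) / len(union) if union else 0.0
            if score > best_score:
                best_skill, best_score = signals.skill, score
    # Below the threshold, leave the request to Alexa's default handling.
    return best_skill if best_score >= threshold else None


SKILLS = [
    SkillSignals("WeatherSkill", ["what's the forecast", "will it rain today"]),
    SkillSignals("MeditationSkill", ["start a meditation", "help me relax"]),
]

print(route("will it rain today", SKILLS))          # WeatherSkill
print(route("start a meditation session", SKILLS))  # MeditationSkill
print(route("play some jazz", SKILLS))              # None
```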

Alexa Conversations

Alexa Conversations, which was announced last June in developer preview at Amazon's re:MARS conference, shrinks the lines of code needed to create a voice app from 5,500 to about 1,700. Because Conversations uses AI to understand intents and utterances so that developers don't have to define them, Amazon says it can also cut interactions that might have taken 40 exchanges down to a dozen or so.

Conversations' dialog manager is powered by two innovations, according to Amazon: a dialog simulator and a "conversations-first" modeling architecture. The dialog simulator generalizes a small number of sample dialogues provided by a developer into tens of thousands of annotated dialogues, and the modeling architecture uses those generated dialogues to train deep learning models that can handle dialogues beyond the simple paths covered by the samples.

Developers supply details such as the APIs their app can call and the entities those APIs have access to, in effect describing the app's functionality. Given these and a few example exchanges, the Conversations dialog manager can extrapolate the possible dialog turns.
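The simulator's core idea, turning a few annotated samples into many training dialogues, can be sketched in Python as simple combinatorial slot filling. This is a toy built on that assumption, not Amazon's simulator, which is described as generalizing well beyond slot substitution; the sample dialogue, slot names, and `simulate` function are hypothetical.

```python
from itertools import product

# Hypothetical developer-provided sample: (speaker, template) turns with {slot} placeholders.
SAMPLE_DIALOGUE = [
    ("user", "I want {count} tickets for {movie}"),
    ("alexa", "Which showtime for {movie}?"),
    ("user", "The {time} show, please"),
]

# Hypothetical slot catalogs the simulator can draw values from.
SLOT_VALUES = {
    "count": ["two", "four"],
    "movie": ["Jaws", "Up"],
    "time": ["7 pm", "9 pm"],
}


def simulate(sample, slot_values):
    """Yield one annotated dialogue per combination of slot values."""
    names = sorted(slot_values)
    for combo in product(*(slot_values[name] for name in names)):
        slots = dict(zip(names, combo))
        turns = [(speaker, template.format(**slots)) for speaker, template in sample]
        # The slot assignment doubles as the annotation used for training.
        yield {"turns": turns, "annotations": slots}


dialogues = list(simulate(SAMPLE_DIALOGUE, SLOT_VALUES))
print(len(dialogues))               # 8
print(dialogues[0]["turns"][0][1])  # I want two tickets for Jaws
```

Three slots with two values each already yield eight dialogues; adding alternative phrasings per turn would multiply the count again, which hints at how a handful of samples can grow into a large training set.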

Conversations' first use case, demoed last year, seamlessly strung Alexa apps together to let people buy movie tickets, summon rides, and book dinner reservations. (OpenTable, Uber, and Atom Tickets were among Conversations' early adopters.) In light of the pandemic, that scenario seems less useful. But Fresko said it merely illustrates how Conversations can combine elements from multiple apps without much effort on developers' part; companies like iRobot and Philosophical Creations (which publishes the Big Sky app) are already using it.

"Dialogues are really difficult to emulate with brute force techniques. Usually, developers resort to dialog trees and flow charts to anticipate every turn the conversation can take, and the complexity can get blown out of proportion," Fresko said. "With Conversations, you don't have to build context manually. We'll just do it for you."

'Immersive' audio and visuals

Alexa Presentation Language (APL), a toolset designed to make it easier for developers to create visual Alexa apps, is expanding to sound with APL for Audio. APL for Audio includes new mixing capabilities that support the creation of audio and soundscapes in Alexa apps; audio can be mixed with Alexa speech, multiple voices can be mixed together with sound effects, or visuals can be synced with clips that dynamically respond to users.

"This reflects the reality that Alexa has become useful not only in speakers but in a variety of devices," Fresko said. "It's a big improvement in the workflow for developers -- particularly developers of ambiance or meditation apps, that sort of thing."
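Mixing speech with background sound ultimately comes down to summing scaled audio samples. The Python sketch below shows that generic idea on plain lists of floats in the range [-1.0, 1.0]; it is not APL for Audio's document format, and the `mix` function and gain values are hypothetical.

```python
def mix(tracks: list[list[float]], gains: list[float]) -> list[float]:
    """Sum tracks sample by sample with per-track gain, padding short tracks with silence."""
    if len(tracks) != len(gains):
        raise ValueError("need one gain per track")
    length = max((len(track) for track in tracks), default=0)
    mixed = []
    for i in range(length):
        sample = sum(
            gain * (track[i] if i < len(track) else 0.0)
            for track, gain in zip(tracks, gains)
        )
        # Hard-clip so the sum stays in the valid [-1.0, 1.0] range.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed


speech = [0.5, 0.5, 0.5]
rain = [0.2, -0.2, 0.2, -0.2]  # a background soundscape, ducked to half volume below
out = mix([speech, rain], gains=[1.0, 0.5])
print([round(s, 3) for s in out])  # [0.6, 0.4, 0.6, -0.1]
```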

Joining APL for Audio is the web API for games, which makes open standards like Canvas 2D, Web Audio, WebGL, JavaScript, and CSS available to Alexa developers. On Echo Show and select Fire TV devices, developers can use it to launch web apps that render on the device, respond to voice requests, and react to local events such as the microphone opening or being muted. End users can interact with the web app through voice, touch, or (on Fire TV) a remote control.

On the go

The new Skill Resumption feature, which launches this week in preview, lets developers experiment with running apps in the background on Alexa devices. It keeps an app's logic intact so customers can engage with it as needed over an extended period or resume it where they left off.

Fresko gave this example: A user tells the Uber app for Alexa to hail a car, then switches away from the Uber app to music, the weather report, and news. As the car comes nearer, the Uber app comes back to the surface to notify them. "Skill Resumption ... lets apps inform users from the background proactively," Fresko said. "Think meditation or workout apps that keep a timer going while the user is performing other tasks."
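The ride-hailing example can be modeled as one foreground app plus a set of backgrounded apps whose state is preserved. The Python sketch below is a toy under that assumption only; `SessionManager`, `launch`, `notify`, and the skill names are hypothetical and don't correspond to the Alexa Skills Kit API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    state: dict[str, str] = field(default_factory=dict)


class SessionManager:
    """Toy model: one foreground skill; others wait in the background with state intact."""

    def __init__(self) -> None:
        self.foreground: Skill | None = None
        self.background: dict[str, Skill] = {}

    def launch(self, skill: Skill) -> None:
        # Move the current skill to the background instead of discarding it.
        if self.foreground is not None and self.foreground is not skill:
            self.background[self.foreground.name] = self.foreground
        self.background.pop(skill.name, None)
        self.foreground = skill

    def notify(self, name: str, message: str) -> str | None:
        """Let a backgrounded skill resurface proactively with a message."""
        skill = self.background.get(name)
        if skill is None:
            return None
        self.launch(skill)
        return f"{skill.name}: {message}"


manager = SessionManager()
manager.launch(Skill("RideSkill", {"status": "driver assigned"}))
manager.launch(Skill("MusicSkill"))  # the user switches to music
print(manager.notify("RideSkill", "Your car is two minutes away"))
# RideSkill: Your car is two minutes away
print(manager.foreground.state["status"])  # driver assigned
```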

Skill Resumption dovetails with Alexa for Apps, which integrates iOS and Android apps' content and functionality with Alexa. Through deep linking, developers can map Alexa voice commands to tasks like opening a mobile app's home page, rendering search results, or invoking other key features. A yellow pages-type app could use deep linking to pull up a restaurant's information when a user asks Alexa about it, Fresko explained, while a camera app could tie an Alexa command to the shutter button. TikTok publisher ByteDance worked with Amazon to support the command "Alexa, ask TikTok to start my recording."
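Deep linking of this kind amounts to a lookup from a voice intent to a URI the mobile app already knows how to open. Here is a minimal Python sketch assuming a hypothetical custom URI scheme, `exampleapp://`, and hypothetical intent names; it doesn't show how Alexa for Apps actually declares these mappings.

```python
from urllib.parse import quote

# Hypothetical mapping from voice intents to URI templates the mobile app handles.
DEEP_LINKS = {
    "OpenHomeIntent": "exampleapp://home",
    "SearchIntent": "exampleapp://search?q={query}",
    "ShowBusinessIntent": "exampleapp://business/{business_id}",
}


def deep_link_for(intent: str, slots: dict[str, str]) -> str:
    """Fill the intent's URI template, percent-encoding each slot value."""
    template = DEEP_LINKS[intent]  # raises KeyError for unmapped intents
    encoded = {name: quote(value, safe="") for name, value in slots.items()}
    return template.format(**encoded)


print(deep_link_for("OpenHomeIntent", {}))                    # exampleapp://home
print(deep_link_for("SearchIntent", {"query": "thai food"}))  # exampleapp://search?q=thai%20food
```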

Using Quick Links for Alexa (in beta for U.S. English and U.S. Spanish), developers can further leverage deep linking to drive traffic to voice apps from websites and mobile apps. They're able to deep-link to specific content in their apps using URL query string parameters and add attribution parameters to measure online ad campaign performance. "This makes it easier for customers to find skills, and for developers to promote their skill on a variety of media. We expect it'll lead to new opportunities," Fresko said.
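Passing a content target and attribution data through URL query string parameters can be sketched with Python's standard `urllib.parse` module. The base URL and the `content` parameter are hypothetical placeholders, and the `utm_*` names follow the common UTM campaign-tracking convention as an assumption; Amazon's actual Quick Link format may differ.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical base URL; a real Quick Link would come from Amazon's developer tools.
BASE_URL = "https://example.com/alexa/skill-link"


def build_quick_link(content_id: str, source: str, campaign: str) -> str:
    """Append a deep-link target plus attribution parameters as a query string."""
    params = {"content": content_id, "utm_source": source, "utm_campaign": campaign}
    return f"{BASE_URL}?{urlencode(params)}"


def attribution(link: str) -> dict[str, str]:
    """Recover the attribution parameters, e.g. to credit an ad campaign."""
    query = parse_qs(urlsplit(link).query)
    return {key: values[0] for key, values in query.items() if key.startswith("utm_")}


link = build_quick_link("episode-42", source="newsletter", campaign="spring promo")
print(link)
# https://example.com/alexa/skill-link?content=episode-42&utm_source=newsletter&utm_campaign=spring+promo
print(attribution(link))
# {'utm_source': 'newsletter', 'utm_campaign': 'spring promo'}
```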

Also announced today: In select regions, customers can now purchase premium in-app content -- like expansion packs, monthly subscriptions, and consumables -- on Amazon.com and on the displays of Echo devices with screens. Previously, the only way to make those purchases was through voice. (Amazon remains tight-lipped about exactly how much consumers spend on Alexa skills, but by some estimates, it's at least $2 billion per year.)
