As most people who have used Alexa will recognize, the experience of using your voice is very different from using a screen, which is why voice user experience design (or VUX design) requires a different set of skills. Here are 10 tips to help you make a new skill.

1. Manage user expectations

People’s expectations of what digital assistants can do vary greatly. Although the technology has progressed over the past 10 years, films like Her and Ex Machina are far from reality. We’re still in the equivalent of the Nokia 3310 days of the mobile phone revolution. Therefore, it’s important to align your users’ expectations with what your voice app can deliver if you want to avoid those one-star reviews. You can do this through the skill description, Alexa Cards, and a landing page for your skill that includes a how-to video, like Capital One has for its skill.

Choosing a list of features for your app with a common theme can also play an important role in the design process.

For example, it doesn’t make sense for a travel voice app to have 90 percent of its features centered on holiday inspiration and 10 percent on FAQs. Chances are as soon as a user has one question answered, they’ll ask more and be severely disappointed when the others aren’t supported. Keep it simple from the start.

2. There doesn’t need to be a hierarchy!

Screen-based applications have a hierarchical GUI, which users have to tap through, always starting from the home screen or the menu button. The delight of voice apps is you don’t have to do this — they can be designed so a user can reach any part of the experience on first launch. This is what differentiates a great voice app from one that sounds like an IVR system.

3. Consider the linguistics

The most charming moments in voice occur when you ask Alexa something in a niche and personalized way, and she still gets it. This is why a great voice app needs to accommodate differences in linguistics. One person may say “I’d like to order a taxi,” while their friend might ask “Please can you book me a ride?” This doesn’t apply to mobile, where launching an app is always done by tapping on it. In voice design, catering to how everyone speaks — also referred to as utterance expansion — is complex. So use the tools available, and establish a logical process for adding utterances, since a single voice app can have 40,000 of them!
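One way to make utterance expansion systematic is to generate sample utterances from a few phrase fragments rather than typing each one by hand. The sketch below is only an illustration: the fragment lists and the function name `expand_utterances` are hypothetical, and a real skill would need far more variety (and review by a human) than a simple cross product gives.

```python
from itertools import product

# Hypothetical phrase fragments for a taxi-booking intent (illustrative only).
OPENERS = ["", "please ", "can you ", "I'd like to "]
VERBS = ["order", "book", "get"]
OBJECTS = ["a taxi", "a ride", "a cab"]


def expand_utterances(openers, verbs, objects):
    """Return every opener/verb/object combination as one sample utterance."""
    return [
        f"{opener}{verb} {obj}".strip()
        for opener, verb, obj in product(openers, verbs, objects)
    ]


utterances = expand_utterances(OPENERS, VERBS, OBJECTS)
```

Three short lists already yield 36 utterances, which shows how quickly the count grows and why a repeatable process matters.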

4. Keep Alexa’s responses short

One of the problems when designing the voice interaction model between a user and Alexa is that it is done in writing but experienced with voice. There aren’t yet Alexa mock-up tools (that we know about). One thing you’ll notice is Alexa speaks a lot slower than a human. What looks like a short response on paper is far longer when read by Alexa. It can make the user impatient, so try to keep Alexa’s responses as concise as possible.
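A rough way to catch over-long responses while still working in writing is to estimate how long the text takes to speak. The sketch below assumes a speaking rate of 150 words per minute and a 10-second budget; both numbers are placeholders, not measured Alexa figures, so time your real responses on a device and adjust.

```python
def estimate_speech_seconds(text: str, words_per_minute: float = 150.0) -> float:
    """Roughly estimate how long `text` takes to speak at an assumed rate."""
    word_count = len(text.split())
    return word_count / words_per_minute * 60.0


def is_too_long(text: str, max_seconds: float = 10.0,
                words_per_minute: float = 150.0) -> bool:
    """Flag responses whose estimated duration exceeds the chosen budget."""
    return estimate_speech_seconds(text, words_per_minute) > max_seconds
```

Running a check like this over every scripted response gives an early warning before user testing.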

5. Don’t put too many steps in the conversation

This goes hand in hand with the point above. Alexa speaks slower than us, so she’s not like the advanced conversational digital assistants depicted in films. Try not to have too many steps in the conversation. For example, only have confirmation steps for important actions, such as transferring money or buying something. This will help to make your voice experience smooth and engaging.
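In code, this can be as simple as keeping a short list of high-stakes intents and asking for confirmation only for those. The intent names and the `next_step` helper below are hypothetical, chosen to mirror the money-transfer and purchase examples above.

```python
# Hypothetical intent names; a real skill would use its own.
HIGH_STAKES_INTENTS = {"TransferMoneyIntent", "PurchaseIntent"}


def next_step(intent_name: str, confirmed: bool = False) -> str:
    """Return "confirm" only for a high-stakes intent that is not yet confirmed."""
    if intent_name in HIGH_STAKES_INTENTS and not confirmed:
        return "confirm"
    return "fulfil"
```

Everything outside the list goes straight to fulfilment, which keeps the common paths one step shorter.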

6. Try not to answer a question with a question

Even in human-to-human conversations, having your question answered with a question can feel frustrating. In the cases where you want a back and forth conversation, have Alexa give a useful piece of information before she asks a counter question.

For example, say you’re designing a passenger train app, and the user asks, “How much does it cost to get from London to Leeds?” Instead of Alexa responding with the question “When did you want to travel?” you could make some assumptions, and have her say: “A standard ticket from London Kings Cross to Leeds leaving during peak hours tomorrow costs £110. An off-peak train costs £88. Just let me know the date you want to travel.”

7. Spend time on the edge cases

It’s easy for users to learn what they can and can’t do on a new mobile app by quickly navigating through all the different options on the screen. The same can’t be said for voice. Through Opearlo Analytics we’ve noticed users will often “stress test” a skill when they use it for the first time — for example, by cycling through all the options, or asking questions they half expect not to be supported. Spending time on the edge cases is super important. You want to make sure to safely guide experimental or new users back into the core functionality of the skill so they avoid getting trapped in an error loop, and quitting in frustration!
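One possible way to keep users out of an error loop is to count consecutive unrecognized requests in the session and make the guidance more explicit each time. The sketch below is an assumption about how that could look, not a description of any particular skill: the prompts, the `misses` counter, and the choice to end the session after three misses are all illustrative.

```python
# Hypothetical prompts for a recipe skill (illustrative only).
CORE_PROMPT = "You can ask for a recipe, or say help."
EXAMPLE_PROMPT = "I can't help with that yet. Try saying: find me a recipe."
EXIT_MESSAGE = "Sorry, I'm having trouble understanding. Let's try again later."


def handle_unrecognized(session: dict) -> dict:
    """Escalate guidance on each consecutive miss instead of repeating one error."""
    misses = session.get("misses", 0) + 1
    session["misses"] = misses
    if misses == 1:
        return {"speech": "Sorry, I didn't catch that. " + CORE_PROMPT,
                "end_session": False}
    if misses == 2:
        return {"speech": EXAMPLE_PROMPT, "end_session": False}
    # Third miss in a row: exit gracefully rather than looping forever.
    return {"speech": EXIT_MESSAGE, "end_session": True}


def handle_recognized(session: dict) -> None:
    """Reset the miss counter once the user is back on a supported path."""
    session["misses"] = 0
```

The first two responses steer the user back toward core functionality, and the counter resets as soon as a request succeeds.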

8. Minimize choice

We’re not used to remembering options as they are read out loud. This is why the Alexa certification team recommends giving the user a maximum of three choices at a time. Through our user testing workshops, we discovered that numbering the options also really helps. The user just needs to recall the number of the option they want, rather than the option itself, which could be a whole sentence. The A Cloud Guru skill demonstrates this well.
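The numbering idea can be sketched as two small helpers: one that reads out at most three numbered options, and one that maps the spoken number back to an option. The function names and the wording of the prompt are hypothetical, and a real skill would normally receive the number from a slot rather than from raw text.

```python
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3}


def format_choices(options, max_choices=3):
    """Build a prompt that reads out at most `max_choices` numbered options."""
    shown = options[:max_choices]
    parts = [f"Option {i}: {text}." for i, text in enumerate(shown, start=1)]
    return " ".join(parts) + " Which number would you like?"


def resolve_choice(spoken, options, max_choices=3):
    """Map a spoken number ("two" or "2") to its option, or None if invalid."""
    shown = options[:max_choices]
    token = spoken.strip().lower()
    index = NUMBER_WORDS.get(token)
    if index is None and token.isdigit():
        index = int(token)
    if index is not None and 1 <= index <= len(shown):
        return shown[index - 1]
    return None
```

Returning `None` for anything outside the offered range leaves the caller free to reprompt instead of guessing.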

9. Minimize pressure

The maximum amount of time Alexa will wait before shutting off after speaking is eight seconds, which isn’t very long. As a user, it’s easy to feel pressure while engaging with a skill. To avoid this, always give the user an option that buys them more deciding time.

For example, in a recipe skill we were building, our original design had Alexa read out the recipe title, then say “Would you like to hear the ingredients, another recipe, or start cooking?” But in our user testing workshops, we found users were unsure which option they wanted and often didn’t say any of them, leaving the session to finish unsuccessfully.

We experimented with the design, and after further user testing chose to amend Alexa’s response to “Would you like to start cooking, hear another recipe, or hear the details?” The “hear the details” option meant the user had more time to decide what they wanted to do next. This made for a much better experience.
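A reprompt is another way to buy the user time: if they stay silent, Alexa can repeat the options before the session closes. The sketch below builds a response in the JSON shape used by Alexa custom skills, as far as we understand it (`outputSpeech`, `reprompt`, `shouldEndSession`); the helper name `build_response` and the reprompt wording are assumptions, so check the current response reference before relying on it.

```python
import json


def build_response(speech: str, reprompt: str, end_session: bool = False) -> dict:
    """Build a skill response that keeps the session open and sets a reprompt."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "reprompt": {
                "outputSpeech": {"type": "PlainText", "text": reprompt}
            },
            "shouldEndSession": end_session,
        },
    }


recipe_prompt = build_response(
    "Would you like to start cooking, hear another recipe, or hear the details?",
    # Hypothetical reprompt wording, spoken only if the user says nothing.
    "You can say start cooking, another recipe, or details.",
)
payload = json.dumps(recipe_prompt)
```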

10. Audio, audio, audio

By far the best skills on Alexa contain audio content — this is when a piece of audio is played back through Alexa without using her voice.

The Grand Tour skill features the automotive Amazon Prime series’ presenters, and the Jamie Oliver skill (U.K.) has a short message from him at the end of each recipe.

Even if you don’t have a character who could feature in your skill, there are plenty of other ways to incorporate audio. For example, try an audio logo like the U.K. skill On My Way or background music like Inspire Me. Whichever way you do it, incorporating audio into your skill will make it stand out!
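An audio logo can be added by putting an SSML `audio` tag in front of the spoken text. The sketch below only builds the SSML string; the helper name and the example URL are placeholders, and Alexa places format and hosting requirements on audio files, so check the current documentation before using your own clip.

```python
from xml.sax.saxutils import escape, quoteattr


def with_audio_logo(audio_url: str, spoken_text: str) -> str:
    """Return SSML that plays an audio clip and then speaks `spoken_text`."""
    # quoteattr adds the surrounding quotes and escapes the URL for XML;
    # escape protects characters such as "&" in the spoken text.
    return f"<speak><audio src={quoteattr(audio_url)}/> {escape(spoken_text)}</speak>"


# Placeholder URL; replace with a clip you host yourself.
ssml = with_audio_logo("https://example.com/audio-logo.mp3", "Welcome back")
```

The resulting string would be sent as SSML output speech in place of plain text.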

Oscar Merry is the cofounder and head of technology at Opearlo, a voice design agency that specializes in designing and building voice applications for digital assistants.

This article is part of our Artificial Intelligence series.

