7 lessons from developing a voicebot using Amazon Lex

I spent the past five months developing an AI-powered VoiceOps interface for the enterprise. Since Amazon Lex wasn't available at the time, my team open-sourced the project. We launched it into production in February, and it is now live with several customers, helping them prioritize problem resolution for enterprise apps. More recently, we started beta testing with Lex. Now that Lex is widely available to developers, I want to share some advice and best practices based on our experience.

1. Limit intent scope

A voicebot depends on being able to infer the user's intention from a spoken phrase. We quickly learned to avoid combining too many related phrases into a single intent. For example, we originally mapped questions like "Are there any problems at the moment?" and "Have any problems affected travel this week?" to the same problem intent, which overloaded the bot's logic. A better approach is to keep the scope of each intent as limited as possible.
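As a rough illustration of narrow intents, here is a minimal Python sketch. It is not Lex's API: the intent names, the extra sample phrases, and the exact-match lookup are all assumptions made for the example. The point is only that the two questions above land in separate intents, so each handler has one job.

```python
import re
from typing import Optional

# Hypothetical intent table: the names and the second phrase in each list
# are illustrative, not taken from a real bot definition.
INTENTS = {
    "CurrentProblems": [
        "are there any problems at the moment",
        "is anything broken right now",
    ],
    "ProblemsInTimeRange": [
        "have any problems affected travel this week",
        "what problems occurred yesterday",
    ],
}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9\s]", "", text.lower()).split())


def match_intent(utterance: str) -> Optional[str]:
    """Return the name of the intent whose sample phrase matches exactly."""
    wanted = normalize(utterance)
    for name, phrases in INTENTS.items():
        if wanted in phrases:
            return name
    return None


print(match_intent("Are there any problems at the moment?"))
print(match_intent("Have any problems affected travel this week?"))
```

A real classifier generalizes beyond exact matches, but the same split applies: one intent for the current state, another for a time range.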

2. Use natural-ish language

You're going to be tempted to add as many phrases as possible so that the user can speak naturally without having to worry about a specific required syntax. However, too many phrases that are similar but not identical can confuse the classifier and cause unexpected results. We found that it's much better to have a smaller set of specific phrases that are sufficiently distinct. This may be more restrictive than true natural language, but it makes the results more consistent.
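One way to spot phrases that are too close to each other is to compare every pair of sample phrases across intents. The sketch below uses a simple word-overlap (Jaccard) score as a stand-in; it is an assumption for illustration and says nothing about how Lex's own classifier measures similarity. The intent names, phrases, and the 0.5 threshold are hypothetical.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Share of distinct words that two phrases have in common (0.0 to 1.0)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def confusable_pairs(intents, threshold=0.5):
    """List phrase pairs from different intents that overlap heavily."""
    labeled = [(name, p) for name, phrases in intents.items() for p in phrases]
    pairs = []
    for (intent_a, phrase_a), (intent_b, phrase_b) in combinations(labeled, 2):
        if intent_a != intent_b and jaccard(phrase_a, phrase_b) >= threshold:
            pairs.append((intent_a, phrase_a, intent_b, phrase_b))
    return pairs


# Hypothetical sample phrases.
sample = {
    "OpenProblems": ["show me open problems"],
    "OpenIssues": ["show me open issues"],
    "History": ["what happened yesterday"],
}

for pair in confusable_pairs(sample):
    print(pair)
```

Any pair this flags is a candidate for merging, rewording, or removal before the phrases go into the bot.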

3. Context matters

It might seem that matching a phrase with a specific intent would be straightforward, but without context it becomes confusing. For example, if a user says "yes" to a question, you need to store enough context in the voicebot to know exactly which question the user is answering.
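The "yes" problem can be sketched as a small piece of conversation state that remembers which question is still open. This is a minimal Python illustration, not the way Lex stores session data; the class, the question identifier, and the reply strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conversation:
    """Tracks the one question the bot is currently waiting on."""

    pending_question: Optional[str] = None

    def ask(self, question_id: str, prompt: str) -> str:
        # Remember what was asked so a bare "yes" can be interpreted later.
        self.pending_question = question_id
        return prompt

    def handle(self, utterance: str) -> str:
        answer = utterance.strip().lower()
        if answer in {"yes", "no"}:
            if self.pending_question is None:
                return "Sorry, I'm not sure what you're answering."
            # Consume the pending question so it is not answered twice.
            question, self.pending_question = self.pending_question, None
            return f"{question}:{answer}"
        return "unhandled"


chat = Conversation()
chat.ask("show_details", "Would you like more details?")
print(chat.handle("Yes"))
```

Without the stored question identifier, the bot would have no way to tell this "yes" apart from a "yes" to any other prompt.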

4. Accommodate accents and special words

Accents and different pronunciations can be tricky for voicebots. In my experience, Lex works best with a Midwestern American accent! But if you know your application is going to be used by people with a range of accents, it all comes down to training the system. You have to train the voicebot on each specific phrase and pronunciation so that the system knows how to match it with the desired intent. The same goes for specific words. For example, our company's name, Dynatrace, is not in Lex's dictionary. Lex hears "diner" or the name "Dinah" instead, so we had to train it to recognize not just the word, but also the specific actions and intents associated with its use when spoken.
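A complementary workaround, which is an assumption here and not necessarily what our team did, is to correct known mis-hearings in the transcript before it reaches intent matching. The Python sketch below maps the two mis-hearings mentioned above back to the product name; the function name and the mapping table are hypothetical.

```python
import re

# Known mis-hearings mapped to the intended word (illustrative table).
MISHEARD = {"diner": "Dynatrace", "dinah": "Dynatrace"}

# Match whole words only, ignoring case, so "dinner" is left untouched.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in MISHEARD) + r")\b",
    re.IGNORECASE,
)


def normalize_transcript(text: str) -> str:
    """Replace known mis-heard words with the intended term."""
    return _PATTERN.sub(lambda m: MISHEARD[m.group(1).lower()], text)


print(normalize_transcript("open Dinah dashboard"))
```

A blanket replacement like this is only safe when the mis-heard word has no legitimate use in the bot's domain; a bot that also handles restaurant queries could not rewrite "diner" this way.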

5. Be extensible

The key to being successful with voicebot development is to think broadly and laterally about all its potential use cases. Don't limit yourself.

6. Prepare to fail fast and often

There's plenty of documentation that comes with Lex, but it can be technical and overwhelming -- so much so that you might not want to experiment. But the good news is that, despite the complexity of building voicebots, Lex makes it simple to create test apps and quickly discover what works and what doesn't so you can learn as you go.

7. Look back to the future

Most parsing services, Amazon Lex included, will assume a future intent when resolving a date. For instance, if a user says "What happened on Thursday?" she is obviously asking about something that happened in the past. However, Lex doesn't inherently understand that. This is more a limitation of the current state of natural language processing (NLP) than a Lex-specific problem. As companies like Amazon invest in NLP, I'm hopeful we'll be able to distinguish timeframes with more clarity. But in the meantime, handling tense explicitly is something developers will need to factor in.
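One way to factor tense in is to look for past-tense cue words and, when they are present, resolve a weekday name to the most recent occurrence instead of the next one. The sketch below is a simplified Python illustration under that assumption; the cue-word list and function name are hypothetical, and real tense detection needs far more than a word list.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Illustrative cue words; a real bot would need a much richer check.
PAST_CUES = {"happened", "was", "were", "did", "had", "occurred"}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_weekday(utterance: str, today: date) -> Optional[date]:
    """Map a weekday name to a date, looking backward for past-tense phrases."""
    words = re.findall(r"[a-z]+", utterance.lower())
    day = next((w for w in words if w in WEEKDAYS), None)
    if day is None:
        return None
    target = WEEKDAYS.index(day)  # Monday is 0, matching date.weekday()
    if PAST_CUES & set(words):
        # Most recent past occurrence: 1 to 7 days back.
        days_back = (today.weekday() - target) % 7 or 7
        return today - timedelta(days=days_back)
    # Default (future) reading: 0 to 6 days ahead.
    return today + timedelta(days=(target - today.weekday()) % 7)


# 2024-01-10 is a Wednesday.
print(resolve_weekday("What happened on Thursday?", date(2024, 1, 10)))
```

With the past-tense cue, "Thursday" resolves to the previous Thursday; without it, the same word resolves to the upcoming one.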

Michael Beemer is a DevOps engineer at Dynatrace and led the VoiceOps development team for Davis, Dynatrace's AI-powered digital performance virtual assistant.