The future of search is conversational assistance. Instead of typing in keywords to get back a set of results, users will converse with software agents using natural language to specify increasingly complex tasks.
Now, truth be told, this has been “the future” for over 50 years, from the Star Trek computer to Iron Man’s Jarvis. This Apple concept video depicts a helpful conversational assistant you can use for any number of tasks, complete with bow tie. It’s from 1987.
Everyone knows the technology behind conversational assistance is hard. There’s a lot of excitement in this area right now because we’re at a tipping point where these assistants, long seen as cool tech toys, are starting to become genuinely useful.
What has been far less discussed is the difficult product design challenge we have with conversational assistants. Many assume that, because the UI depends on natural language, these assistants will by default be “natural” for people to use.
The UIs are indeed natural. My 4-year-old son doesn’t know how to Google something, but he understands how to “ask the lady.” But as users get more comfortable conversing with software agents, they will expect them to be able to perform more complex tasks. What becomes much more challenging in such a paradigm is setting user expectations on what’s possible versus what’s not.
A visual UI can guide users more readily, by essentially showing what’s possible in a series of icons or menus. It’s much harder for a conversational assistant to gracefully explain that it’s able to book restaurants, but not haircuts. What’s more, users often explore the bounds of what these agents can handle, exacerbating the problem.
Here are three strategies for setting users’ expectations.
1. Restrict to one domain
The simplest way to signal what your conversational agent should be used for is to bound its functionality within a single known domain. If you tell users the agent only handles finance use cases, they will accept that it can't answer a football question. Even so, this strategy isn't a panacea, as there are always limits within a domain. Does your finance assistant know about earnings calendars? Can it help with personal finance?
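As a minimal sketch of this idea, the routing logic below bounds a hypothetical finance assistant to a fixed set of intents and declines everything else by restating its scope. The keyword matcher is a toy stand-in for a real intent classifier; all names here are illustrative, not from any particular platform.

```python
from typing import Optional

# Hypothetical finance-only intent table; a real system would use a
# trained classifier rather than keyword matching.
FINANCE_INTENTS = {
    "stock_quote": ("stock", "share price", "ticker"),
    "earnings": ("earnings", "quarterly report"),
}

def classify(utterance: str) -> Optional[str]:
    """Toy keyword matcher standing in for a real intent classifier."""
    text = utterance.lower()
    for intent, keywords in FINANCE_INTENTS.items():
        if any(kw in text for kw in keywords):
            return intent
    return None

def respond(utterance: str) -> str:
    intent = classify(utterance)
    if intent is None:
        # Decline gracefully, restating the domain boundary instead of
        # returning a generic "I don't understand".
        return ("I can only help with finance questions, "
                "like stock quotes or earnings.")
    return f"Handling intent: {intent}"
```

The important design choice is the fallback branch: an out-of-domain question gets an answer that teaches the user where the boundary is, rather than a dead end.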
2. Make the set of functionality coherent
Whether you restrict functionality to a single domain or not, what the agent can handle should be as consistent as possible. Users can quickly become confused if, for example, the assistant understands how to recognize the user’s home city for the weather forecast, but not how to recognize the user’s location when suggesting flights. One useful exercise is to create a framework that categorizes what the agent will and won’t handle, and stick to it. That way, users can also map the agent’s functionality to their own mental model over time. Consistency is key.
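One way to make that framework concrete is to declare, per feature, which contextual signals the agent resolves, so inconsistencies (like understanding the user's location for weather but not for flights) surface at design time instead of confusing users. This is a hypothetical sketch; the feature and signal names are illustrative.

```python
# Hypothetical capability matrix: each feature lists the contextual
# signals it can resolve. Gaps become visible in an audit rather than
# being discovered by confused users.
CAPABILITIES = {
    "weather_forecast": {"home_city", "current_location"},
    "flight_search": {"home_city", "current_location"},
    "restaurant_booking": {"current_location"},
}

def audit(required_signals: set) -> list:
    """Return features that fail to support all of the required signals."""
    return [feature for feature, signals in CAPABILITIES.items()
            if not required_signals <= signals]
```

Running `audit({"home_city"})` here would flag `restaurant_booking` as the odd one out, prompting a decision: either add home-city support or document the exception deliberately.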
3. Hint and give examples
This strategy, while obvious, becomes particularly useful when combined with the two principles above. In Siri, this takes the form of “Some things you can ask me.” In most messaging platforms, this can now include giving suggested responses. Regardless of the shape these examples take, their goal is to train the user on how to best engage with the conversational agent.
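A hinting surface like Siri's can be sketched as a small formatter that samples example prompts per supported domain, to be shown at onboarding or after a failed query. The example prompts and function names below are assumptions for illustration.

```python
# Hypothetical example-prompt catalog, grouped by supported domain.
EXAMPLES = {
    "stocks": ["What's Apple trading at?", "Show me today's biggest movers"],
    "earnings": ["When does Tesla report earnings?"],
}

def hints(max_hints: int = 3) -> str:
    """Format a short 'Some things you can ask me' message."""
    samples = [ex for group in EXAMPLES.values() for ex in group][:max_hints]
    bullets = "\n".join(f"  - {s}" for s in samples)
    return f"Some things you can ask me:\n{bullets}"
```

Because the hints are drawn only from the supported-capability catalog, they double as expectation-setting: every example shown is a query the agent is guaranteed to handle.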
Combining these strategies can help optimize the user experience with conversational assistants. These assistants should train users to expect domain-bounded expertise, present a set of features that make sense together, and guide users through hints so that they don't come away disappointed.
Conversing with a software agent in natural language has long been an aspiration of technologists — this is the basis for the famous Turing test, for example. While the industry has been making tremendous progress on this front, the goalposts are actually moving to be even more ambitious. Not only should the agent be able to converse intelligibly with us humans, but now we want to be able to delegate tasks to it. Until we reach this long-term future, which still “may be 50 Nobel Prizes away,” we need to figure out how to make conversational interfaces not just natural, but more transparent as well. That way, it becomes clear what users can and cannot do with these agents.