Conversation is the ultimate user interface

We may be living in the golden age of information, but finding the right information is still a pain in the neck. To tackle this challenge, my team and I at Amazon Alexa are building what we believe is the next-generation user interface that will redefine how we interact with technology and find information.

We spend hours every day hunched over phones and laptops. We open and close and reopen apps. We scroll. We type on tiny QWERTY keyboards. And we click through an endless sea of blue links every time we search the web. The Internet is indeed amazing. The user interface is not.

We have accepted these conditions because, since the dawn of the digital age, this is all we have known. But these methods of interacting with the digital universe were developed in service of business models, not user experience. They are designed to increase the amount of time you spend online, drive click-throughs, and maximize engagement time. But it’s unfair to make humans find information this way. And it’s time to move on.

Conversation: The age-old interface

The first step is changing the way we interact with the Internet. And fortunately, recent advances in AI are making an entirely new user interface possible. In fact, it’s the original interface, the one we’ve been using for nearly two million years. It’s called “conversation.”

Not speech, mind you. We’ve already been using that for almost a decade, interacting with our phones and digital assistants like Alexa. I’m talking about actual, human-like conversation. The kind you might have with a friend over a beer, in which vague or poorly worded questions are understood. Conversations in which intent is inferred and answers to questions are summarized and personalized.

When two people converse, they understand each other’s context and incorporate visual cues. Conversations can be concise and efficient. Or they can range across a variety of topics, change direction, and lead to serendipitous discovery. Humans do this without even thinking about it. But teaching a machine to do this requires significant advances in the science of AI. This is not just about natural language processing (NLP) capabilities, which are improving rapidly with every voice interaction (Alexa alone gets more than a billion requests every week from hundreds of millions of devices in more than 17 languages).

AI within milliseconds

Rather, teaching a machine the give-and-take nature of conversation requires a fundamental rethinking of our current system of information retrieval. That includes the ability to crawl billions of web pages in real time (web-scale neural information retrieval), to concisely summarize information from the enormity of the Web (automated summarization), and to recognize an end user’s intent and recommend additional relevant content (using contextual recommendation engines).

Conversational interfaces require these systems (and more) to work together seamlessly and instantly. For example, if you ask an AI assistant, “Where is the world’s oldest living tree?” it should be able to not only answer that question quickly and concisely but also understand that you are currently only an hour’s drive from said tree, and follow up with directions and recommendations on hiking trails in the area.

Or if you’re watching the Dallas Cowboys on Thursday Night Football and vaguely ask, “Who just caught that pass?” it should be able to infer which game you’re watching, which team is on offense, who caught the pass and for how many yards. All within milliseconds.

These are difficult, unprecedented problems. As such, Amazon has assembled a team of world-class AI scientists dedicated to solving them. We’re investing in these resources because we believe these capabilities represent the future of human-machine interaction. And we’re not the only ones.

“These give-and-take interactions build relationships that will shape both the user and the system,” said Hae Won Park, a research scientist with MIT’s Personal Robots Group. “Relational agents can disrupt domains like personal assistance, healthcare, aging, education, and more. We’re just beginning to realize the user benefits.”

Moving toward "ambient intelligence"

Indeed, conversational AI can benefit any company interested in changing the way its customers or employees interact with digital information. And as with so many of the AI advances first developed in service of Alexa, such as Amazon Lex and Amazon Polly, we fully expect to make these capabilities available to any company, in any industry, through the AI services available on AWS.

The end goal is to shift the burden of retrieving and distilling relevant information from humans to AIs. And by embedding this conversational capability into the spaces where we live and work (our kitchens, cars, and offices), we can reduce the amount of time we spend peering into phones and laptops. We call this concept “ambient intelligence,” in which AI is available everywhere around you, assists you when you need it, and even anticipates your needs, but fades into the background when you don’t need it.

In other words, we can still benefit from the full awesomeness of the Internet while spending far less time with it. As for the business models that depend on tiny screens, endless scrolling, and a sea of blue links? It’s time for them to adapt to us, not the other way around.

Vishal Sharma is VP of Amazon Alexa AI Information.
