Alexa and Google Assistant execs on future trends for AI assistants

Businesses and developers making conversational AI experiences should start with the understanding that you're going to have to use unsupervised learning to scale, said Prem Natarajan, Amazon head of product and VP of Alexa AI and NLP. He spoke with Barak Turovsky, Google AI director of product for the NLU team, at VentureBeat's Transform 2020 AI conference today as part of a conversation about future trends for AI assistants.

Natarajan called unsupervised learning for language models an important trend for AI assistants and an essential part of creating conversational AI that works for everyone. "Don't wait for the unsupervised learning realization to come to you yet again. Start from the understanding that you're going to have to use unsupervised learning at some level of scale," he said.

Unsupervised learning draws inferences from raw, unlabeled data. A complementary trend, Natarajan said, is the development of self-learning systems that can adapt based on signals received from interacting with a person speaking with Alexa.

"It's the old thing, you know: If you fail once, that's OK, but don't make the same failures multiple times. And we're trying to build systems that learn from their past failures," he said. Members of Amazon's machine learning team and conversational AI teams told VentureBeat last fall that self-learning and unsupervised learning could be key to more humanlike interactions with AI assistants.

Another continuing trend is the effort to weave individual features into unified experiences. Last summer, Amazon launched Alexa Conversations in preview, which fuses Alexa skills into a single cohesive experience using a recurrent neural network to predict dialog paths. For example, the proverbial night out scenario involves skills for buying tickets, making dinner reservations, and making arrangements with a ridesharing app. At the June 2019 launch, Amazon VP of devices David Limp referred to Amazon's work on the feature as "the holy grail of voice science." Additional Alexa Conversations news is slated for an Amazon event next week.

Natarajan and Turovsky agreed that multimodal experience design is another emerging trend. Multimodal models combine input from multiple media, such as text and photos or videos. Some examples of models that combine language and imagery include Google's VisualBERT and OpenAI's ImageGPT, which received an honorable mention from the International Conference on Machine Learning (ICML) this week.

Turovsky talked about advances in surfacing the limited number of answers voice alone can offer. Without a screen, he said, there's no infinite scroll or first page of Google search results, and so responses should be limited to three potential results, tops. For both Amazon and Google, this means building smart displays and emphasizing AI assistants that can both share visual content and respond with voice.

In a conversation with VentureBeat in January, Google AI chief Jeff Dean predicted progress in multimodal models in 2020. The advancement of multimodal models could lead to a number of benefits for image recognition and language models, including more robust inference from models receiving input from more than a single medium.

Another continuing trend, Turovsky said, is the growth of access to smart assistants thanks to the maturation of translation models. Google Assistant is currently able to speak and translate 44 languages.

In a separate presentation earlier today, Turovsky detailed steps Google has taken to remove gender bias from language models. Earlier this year, Google introduced changes powered by unsupervised learning to reduce gender bias in neural machine translation models.

"In my opinion, we are in the early stages of this war. This problem could be seemingly simple; a lot of people could think it's very simple to fix. It's extremely hard to fix, because the notion of a bias in many cases doesn't exist in an AI environment, when we watch it learn, and get both training data and train models to actually address it well," Turovsky said. Indeed, earlier this year researchers affiliated with Georgetown University and Stanford University found racial disparities in automatic speech recognition systems from companies including Amazon and Google, which work better for White users than Black users.
