Amazon digs into ambient and generalizable intelligence at re:MARS

Many, if not most, AI experts maintain that artificial general intelligence (AGI) is still many decades away, or further, and the AGI debate has heated up over the past couple of months. According to Amazon, however, the route to “generalizable intelligence” begins with ambient intelligence, and the company says that future is already unfolding.

“We are living in the golden era of AI, where dreams and science fiction are becoming a reality,” said Rohit Prasad, senior vice president and head scientist for Alexa at Amazon.

Prasad spoke on the potential evolution from ambient intelligence to generalizable intelligence (GI) today at re:MARS, Amazon’s conference on machine learning (ML), automation, robotics and space.

Prasad made clear that by generalizable intelligence he does not mean an all-knowing, human-like AI. Rather, GI agents should have three key attributes: the ability to accomplish multiple tasks, to rapidly adapt to ever-changing environments and to learn new concepts and actions with minimal external input from humans.

Ambient intelligence, Prasad said, is when underlying AI is available everywhere, assists people when they need it – and also learns to anticipate needs – then fades into the background when it’s not needed.

A prime example and a significant step toward GI, Prasad said, is Amazon’s Alexa, which he described as a “personal assistant, advisor, companion.”

The virtual assistant is equipped with 30 ML systems that process various sensory signals, he explained. It gets more than 1 billion requests a week in 17 languages in dozens of countries. It will also, he said, be headed to the moon as part of the uncrewed Artemis 1 mission set to launch in August.

"One thing that surprised me the most about Alexa," Prasad said, "is the companionship relationship we have with it. Human attributes of empathy and affect are key for building trust." He added that these attributes have been even more important due the COVID-19 pandemic when so many of us have lost loved-ones. "While AI can't eliminate that pain of loss," Prasad said, "it can definitely make their memories last."

To help create those lasting personal relationships, a future Alexa feature will be able to synthesize longer speech from short audio clips. In a demonstration, Prasad showed a video in which Alexa read a boy a bedtime story in the voice of his deceased grandmother.

“This required inventions where we had to learn to produce a high-quality voice with less than a minute of recording versus hours of recording,” he said, adding that it involved framing the problem “as a voice conversion task and not a speech generation path.”

Ambient intelligence: reactive, proactive, predictive

As Prasad explained, ambient intelligence is both reactive (responding to direct requests) and proactive (anticipating needs). It does this by drawing on numerous sensing technologies, including vision, sound, ultrasound, depth, mechanical and atmospheric sensors, and then acting on those signals.
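To make the reactive and proactive modes concrete, here is a minimal Python sketch of a single decision step. Everything in it (the `SensorReading` type, the `ambient_step` function, the temperature rule and its threshold) is hypothetical and is not Alexa's implementation; it only shows an explicit request taking priority, a sensor signal triggering a proactive suggestion, and the agent staying in the background otherwise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SensorReading:
    """One observation from a hypothetical ambient sensor."""
    kind: str     # e.g. "sound", "depth", "temperature"
    value: float


# Hypothetical comfort threshold (degrees Celsius) for the proactive rule.
HOT_ROOM_C = 28.0


def ambient_step(request: Optional[str], readings: list[SensorReading]) -> str:
    """Decide what a toy ambient agent does on one tick."""
    # Reactive: an explicit request from the user always wins.
    if request is not None:
        return f"handling request: {request}"
    # Proactive: anticipate a need from sensor signals alone.
    for reading in readings:
        if reading.kind == "temperature" and reading.value > HOT_ROOM_C:
            return "suggesting: turn on the fan"
    # Nothing is needed, so the agent fades into the background.
    return "idle"
```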

All told, this requires deep learning as well as natural language processing (NLP). Ambient intelligence “agents” are also self-supervising and self-learning, which allows them to generalize what they learn and apply it to new contexts.

Alexa’s self-learning mechanism, for instance, automatically corrects tens of millions of defects a week, he said, including both customer errors and errors in its own natural language understanding (NLU) models.
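Prasad did not detail how the self-learning mechanism works. One common way to learn corrections without manual labels is to watch for users who rephrase a request after it fails, and the Python sketch below assumes exactly that signal. The `RephraseLearner` class and its `min_count` threshold are hypothetical illustrations, not Amazon's system.

```python
from collections import Counter


class RephraseLearner:
    """Hypothetical learner of query rewrites from user rephrasings.

    A rewrite is adopted once the same (failed utterance, successful
    rephrase) pair has been observed at least `min_count` times.
    """

    def __init__(self, min_count: int = 3):
        self.min_count = min_count
        self.pair_counts: Counter = Counter()
        self.rewrites: dict[str, str] = {}

    def observe(self, failed: str, succeeded: str) -> None:
        # Record one instance of a user rephrasing a failed request.
        pair = (failed, succeeded)
        self.pair_counts[pair] += 1
        if self.pair_counts[pair] >= self.min_count:
            self.rewrites[failed] = succeeded

    def rewrite(self, utterance: str) -> str:
        # Apply a learned correction if one exists; otherwise pass through.
        return self.rewrites.get(utterance, utterance)
```

Requiring several independent observations before adopting a rewrite is a simple guard against learning from one-off mistakes.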

He described this as the “most practical” route to GI as he defines it: agents that can take on multiple tasks, adapt to changing environments and learn with minimal human input.

Ultimately, “that’s why the ambient-intelligence path leads to generalized intelligence,” Prasad said.

What do GI agents actually do?

GI requires a significant dose of common sense, Prasad said, and he claimed that Alexa already exhibits it: If a user asks Alexa to set a reminder for the Super Bowl, for example, it identifies the date of the big game, converts the start time to the user's time zone and then reminds them before the game starts. Alexa also suggests routines and detects anomalies through its “hunches” feature.
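The time-zone step in the Super Bowl example amounts to converting a timezone-aware start time into the user's zone and subtracting a lead time. The Python sketch below assumes fixed UTC offsets and an illustrative kickoff time; `reminder_time` is a hypothetical helper, and a production assistant would use IANA time zones so that daylight saving is handled correctly.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets keep this sketch dependency-free; a real assistant
# would use IANA time zones (e.g. via zoneinfo) to handle daylight saving.
EASTERN = timezone(timedelta(hours=-5), "EST")
PACIFIC = timezone(timedelta(hours=-8), "PST")


def reminder_time(event_start: datetime, user_tz: timezone, lead: timedelta) -> datetime:
    """Return when to remind the user, expressed in the user's own time zone."""
    if event_start.tzinfo is None:
        raise ValueError("event_start must be timezone-aware")
    local_start = event_start.astimezone(user_tz)  # same instant, user's clock
    return local_start - lead


kickoff = datetime(2022, 2, 13, 18, 30, tzinfo=EASTERN)  # illustrative kickoff time
remind_at = reminder_time(kickoff, PACIFIC, timedelta(minutes=30))
print(remind_at.strftime("%Y-%m-%d %H:%M %Z"))  # 2022-02-13 15:00 PST
```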

Still, he emphasized, GI isn’t an “all-knowing, all-capable” technology that can accomplish any task.

“We humans are still the best example of generalization,” he said, “and the standard for AI to aspire to.”

GI is already being realized, he pointed out: Foundational transformer-based large language models trained with self-supervision are powering many tasks with far less manually labeled data than ever before. An example of this is Amazon’s Alexa Teacher Model, which gleans knowledge from NLU, speech recognition, dialogue prediction and visual scene understanding.
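Self-supervision is what lets these models train on far less manually labeled data: the training targets are carved out of raw text itself. The Python sketch below shows a simplified BERT-style masking step. It illustrates the general technique only; the `make_masked_examples` helper and the 15% masking rate are assumptions, not the Alexa Teacher Model's actual training objective.

```python
import random
from typing import Optional

MASK = "[MASK]"


def make_masked_examples(sentence: str, mask_prob: float = 0.15,
                         seed: int = 0) -> tuple[list[str], list[Optional[str]]]:
    """Turn raw text into (inputs, targets) with no human labeling."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    inputs: list[str] = []
    targets: list[Optional[str]] = []
    for token in sentence.split():
        if rng.random() < mask_prob:
            inputs.append(MASK)     # hide the token from the model
            targets.append(token)   # and ask the model to predict it
        else:
            inputs.append(token)
            targets.append(None)    # no prediction target at this position
    return inputs, targets
```

Because the hidden words become the labels, any amount of unlabeled text can be turned into training data.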

The aim is to take automated reasoning to new heights, starting with the “pervasive use” of commonsense knowledge in conversational AI, he said.

In working towards this, Amazon has released a dataset for commonsense knowledge with more than 11,000 newly collected dialogues to aid research in open-domain conversation.

The company has also invented a generative approach that it calls “think-before-you-speak.” The AI agent first learns to externalize implicit commonsense knowledge (“think”) using a large language model combined with a commonsense knowledge graph (such as the freely available semantic network ConceptNet). It then uses that knowledge to generate responses (“speak”).
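The two-step “think-before-you-speak” flow can be sketched in Python as follows. This is a toy illustration under loud assumptions: the three triples merely mimic ConceptNet's (head, relation, tail) format, and the template-based `speak` function stands in for the large language model that would actually generate the reply.

```python
Triple = tuple[str, str, str]

# Toy commonsense graph using ConceptNet-style relation names (illustrative).
GRAPH: list[Triple] = [
    ("umbrella", "UsedFor", "rain"),
    ("rain", "Causes", "wet clothes"),
    ("sunscreen", "UsedFor", "sunburn"),
]

# Templates that verbalize each relation (stand-in for an LLM).
TEMPLATES = {
    "UsedFor": "{head} is used for {tail}",
    "Causes": "{head} can cause {tail}",
}


def think(utterance: str, graph: list[Triple]) -> list[Triple]:
    """'Think': surface knowledge linked to any word the user mentioned."""
    words = set(utterance.lower().strip("?.!").split())
    return [t for t in graph if t[0] in words or t[2] in words]


def speak(knowledge: list[Triple]) -> str:
    """'Speak': build a reply conditioned on the externalized knowledge."""
    if not knowledge:
        return "Tell me more."
    facts = [TEMPLATES[rel].format(head=h, tail=t) for h, rel, t in knowledge]
    return "Good to know: " + " and ".join(facts) + "."


print(speak(think("Looks like rain today", GRAPH)))
# Good to know: umbrella is used for rain and rain can cause wet clothes.
```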

Amazon is also training Alexa to answer complex queries that require multiple inference steps, and it is enabling “conversational explorations” on ambient devices so that users don’t have to pull out their phones or laptops to explore the web.
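Answering a multistep query can be thought of as chaining lookups, where each answer feeds the next question. The Python sketch below assumes a tiny, invented fact store (the venue, city and region names are hypothetical) and records each hop in a trace, which also hints at how such an answer could be explained to a user.

```python
from typing import Optional

# Hypothetical fact store keyed by (entity, relation); all names are invented.
FACTS: dict[tuple[str, str], str] = {
    ("Harbor Arena", "city"): "Riverton",
    ("Riverton", "region"): "Westland",
    ("Westland", "time_zone"): "UTC-6",
}


def multi_hop(entity: str, relations: list[str]) -> tuple[Optional[str], list[str]]:
    """Follow one relation per inference step, recording each hop."""
    trace: list[str] = []
    current = entity
    for relation in relations:
        nxt = FACTS.get((current, relation))
        if nxt is None:
            # A missing link means the question cannot be answered.
            return None, trace
        trace.append(f"{current} -[{relation}]-> {nxt}")
        current = nxt
    return current, trace


# e.g. "What time zone is Harbor Arena in?" answered in three hops
answer, steps = multi_hop("Harbor Arena", ["city", "region", "time_zone"])
```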

Prasad said that this capability has required dialogue-flow prediction through deep learning, web-scale neural information retrieval, and automated summarization that can distill information from multiple sources.
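Alexa relies on deep learning models for summarization. As a much simpler stand-in, the Python sketch below performs frequency-based extractive summarization across several sources, favoring sentences whose words recur across them. The stopword list, scoring rule and `summarize` function are assumptions for illustration only.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not exhaustive).
STOPWORDS = {"the", "a", "an", "at", "is", "are", "for", "of", "to", "and", "in"}


def content_words(sentence: str) -> list[str]:
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(sources: list[str], k: int = 2) -> list[str]:
    """Pick the k sentences whose words recur most across all sources."""
    # Split every source into sentences.
    sentences = [s.strip() for src in sources
                 for s in re.split(r"(?<=[.!?])\s+", src) if s.strip()]
    # Words repeated across sources are treated as the important ones.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(s: str) -> float:
        words = content_words(s)
        return sum(freq[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Return the top-k sentences in their original order for readability.
    return [sentences[i] for i in sorted(ranked[:k])]
```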

The Alexa Conversations dialogue manager helps Alexa decide what actions it should take based on the interaction and dialogue history and on current inputs and queries, using query-guided and self-attention mechanisms. Neural information retrieval pulls information from different modalities and languages based on billions of data points. Transformer-based models – trained using a multistage paradigm optimized for diverse data sources – help to semantically match queries with relevant information. Deep learning models distill information for users while holding onto critical information.
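Semantic matching boils down to embedding the query and candidate passages and ranking passages by similarity. In the Python sketch below, a bag-of-words count vector stands in for the transformer embeddings Prasad described, so it matches overlapping words rather than meaning; `embed`, `cosine` and `retrieve` are hypothetical names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: word counts instead of a learned dense vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Missing words count as 0 in a Counter, so only shared words contribute.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: list[str], k: int = 1) -> list[str]:
    """Return the k passages most similar to the query."""
    q = embed(query)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]
```

Swapping `embed` for a learned encoder would keep the same ranking logic while matching on meaning instead of exact words.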

Prasad described the technology as multitasking, multilingual and multimodal, allowing for “more natural, human-like conversations.”

The ultimate goal is to make AI not only useful to customers in their daily lives but also simple: intuitive enough that they want to use it and even come to rely on it. That means AI that thinks before it speaks, is equipped with commonsense knowledge graphs and can generate explainable responses; in other words, AI that can handle questions and answers that are not always straightforward.

Overall, GI is becoming more realizable by the day because “AI can generalize better than before,” Prasad said.

For retail, AI learns to let customers just walk out

Amazon is also using ML and AI to “reinvent” physical retail through such capabilities as futuristic palm scanning and smart carts in its physical stores. These capabilities enable the “just walk out” experience, explained Dilip Kumar, vice president for physical retail and technology.

The company opened its first Amazon Go store to the public in January 2018. The stores have since evolved from 1,800-square-foot convenience stores to 40,000-square-foot grocery stores, Kumar said. The company added its Dash Cart in summer 2020 and Amazon One in fall 2020.

Advanced computer vision capabilities and ML algorithms allow people to scan their palms upon entry to a store, pick up items, add them to their carts, then walk out.

Palm scanning was selected because the gesture had to be intentional and intuitive, Kumar explained. Each palm is linked to the customer’s credit or debit card information, and accuracy is achieved in part through subsurface images of the palm’s vein pattern.

This allows for accuracy at “a greater order of magnitude than what face recognition can do,” Kumar said.
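Kumar did not describe the matching algorithm. A common pattern for biometric identification is to compare a probe template against all enrolled templates and accept the best match only above a strict threshold. The Python sketch below assumes the vein images have already been reduced to feature vectors; its `identify` function, the toy templates and the 0.95 threshold are hypothetical.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(probe: list[float], enrolled: dict[str, list[float]],
             threshold: float = 0.95) -> Optional[str]:
    """Return the enrolled customer whose palm template best matches, or None."""
    best_id, best_score = None, -1.0
    for customer_id, template in enrolled.items():
        score = cosine(probe, template)
        if score > best_score:
            best_id, best_score = customer_id, score
    # Reject weak matches rather than charge the wrong card.
    return best_id if best_score >= threshold else None
```

Returning no match rather than the nearest one keeps a poor scan from being charged to the wrong card; raising the threshold trades convenience for accuracy.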

Carts, meanwhile, are equipped with weight sensors that identify which items have been added and how many. Advanced algorithms can also handle the increased complexity of “picks and returns” – or when a customer changes their mind about an item – and can filter out ambient noise.
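The weight-sensor logic can be sketched as mapping each change in cart weight to an item and quantity, treating negative changes as returns and discarding small changes as noise. The catalog weights, tolerance and `interpret_delta` function below are hypothetical; a real cart could combine weight with other signals to tell apart items of similar weight.

```python
from typing import Optional

# Hypothetical catalog of unit weights in grams.
CATALOG = {"apple": 180.0, "milk": 1050.0, "cereal": 500.0}


def interpret_delta(delta_g: float, tolerance_g: float = 15.0) -> Optional[tuple[str, int]]:
    """Map a change in cart weight to (item, quantity); negative quantity = return."""
    if abs(delta_g) < tolerance_g:
        return None  # too small to be an item: treat as sensor noise
    sign = 1 if delta_g > 0 else -1
    best = None  # (item, signed quantity, residual grams)
    for item, unit in CATALOG.items():
        qty = round(abs(delta_g) / unit)
        if qty < 1:
            continue
        residual = abs(abs(delta_g) - qty * unit)
        # Keep the item whose whole-unit multiple best explains the change.
        if residual <= tolerance_g and (best is None or residual < best[2]):
            best = (item, sign * qty, residual)
    return (best[0], best[1]) if best else None
```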

These algorithms are run locally in-store, in the cloud, and on the edge, Kumar explained. “We can mix and match depending on the environment,” he said.

The goal is to “make this technology entirely recede into the background,” Kumar said, so that customers can focus on shopping. “We hid all of this complexity from customers,” he said, so that they can be “immersed in their shopping experience, their mission.”

Along similar lines, the company opened its first Amazon Style store in May 2022. Inside the store, customers can scan items on the shop floor to have them sent automatically to fitting rooms or pick-up desks. They also receive suggestions for additional items.

Ultimately, Kumar said, “we’re very early in our exploration, our pushing the boundaries of ML. We have a whole lot of innovation ahead of us.”
