Microsoft proposes AI that improves when you smile

Positive affectivity, the characteristic that describes how people experience affect (e.g., sensations, emotions, and sentiments) and, as a consequence, how they interact with others, has been linked to increased interest and curiosity as well as satisfaction in learning. Inspired by this, a team of Microsoft researchers proposes imbuing reinforcement learning, an AI training technique that uses rewards to spur systems toward goals, with positive affect. They argue this could drive the kind of exploration that gathers experiences critical to learning.

As the researchers explain, reinforcement learning is commonly implemented with policy-specific rewards designed for a predefined goal. The problem is that these extrinsic rewards are narrow in scope and can be difficult to define. Intrinsic rewards, by contrast, are task-independent and quickly indicate success or failure.

In pursuit of an intrinsically motivated policy, the researchers developed a framework of mechanisms inspired by human affect, one that motivates agents through drives like delight. It treats human smiles as a measure of positive affect, using a computer vision system to model the reward and a separate system that uses the collected data to solve multiple tasks.
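The reward model can be pictured as a function that maps a camera frame to a smile probability, which then serves as the intrinsic reward. The article does not describe the model's architecture, so the minimal sketch below swaps in logistic regression trained by batch gradient descent on hypothetical precomputed frame features. The names `train_smile_model`, `predict_smile`, and `intrinsic_reward` are illustrative, not the researchers' code, and the soft labels stand in for per-frame smile probabilities recorded from human drivers.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_smile(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    # Predicted smile probability for one frame's (hypothetical) feature vector.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_smile_model(
    features: List[List[float]],
    smile_labels: List[float],
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit logistic regression with batch gradient descent.

    Labels are soft targets in [0, 1] (smile probabilities), so the loss is
    binary cross-entropy against a probability rather than a hard class.
    """
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, smile_labels):
            err = predict_smile(weights, bias, x) - y  # d(BCE)/d(logit)
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def intrinsic_reward(weights: Sequence[float], bias: float, frame_features: Sequence[float]) -> float:
    # Assumption: the intrinsic reward is simply the predicted smile probability.
    return predict_smile(weights, bias, frame_features)
```

In a real system the features would come from a convolutional network over raw frames; logistic regression is used here only to keep the mapping from "frame in, smile probability out" visible.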

The framework encourages agents to explore virtual or real-world environments without getting into perilous situations, and it has the advantage of being agnostic to any specific machine intelligence application. A positive intrinsic reward mechanism predicts human smile responses as exploration unfolds, while a sequential decision-making framework learns a generalizable policy. The positive intrinsic affect model modifies action selection, biasing it toward actions that yield higher intrinsic rewards. A final component uses data collected during the agent's exploration to build representations for visual recognition and understanding tasks.
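One way to read "biasing action selection toward actions that yield higher intrinsic rewards" is to add a weighted intrinsic term to each action's score before choosing. The sketch below assumes a discrete action set, a hypothetical value estimate per action from the decision-making policy (`q_values`), and a predicted smile probability for the state each action leads to (`smile_preds`). The weighting `lam` and the softmax `temperature` are illustrative parameters, and the researchers' actual mechanism may differ.

```python
import math
import random
from typing import List, Optional, Sequence


def biased_action_probs(
    q_values: Sequence[float],
    smile_preds: Sequence[float],
    lam: float = 1.0,
    temperature: float = 1.0,
) -> List[float]:
    """Softmax over Q(s, a) + lam * predicted smile probability.

    temperature must be > 0; lower values make selection greedier.
    """
    scores = [(q + lam * s) / temperature for q, s in zip(q_values, smile_preds)]
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(sc - m) for sc in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(
    q_values: Sequence[float],
    smile_preds: Sequence[float],
    lam: float = 1.0,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    # Sample an action index in proportion to the biased probabilities.
    rng = rng or random.Random()
    probs = biased_action_probs(q_values, smile_preds, lam, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With `lam = 0` this reduces to ordinary softmax exploration; larger values push the agent toward states the affect model expects a human would smile at.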

To test the framework, the researchers collected data from five subjects who were asked to explore a virtual three-dimensional maze by driving a vehicle, along with synchronized video of each subject's face. (Each person drove for 11 minutes, yielding a total of 64,000 frames.) Participants were told to explore the environment but were given no other objectives, and their smile responses were computed and recorded by an open source algorithm.

The affect-based intrinsic motivation model was trained on the subjects' data, with image frames from the vehicle's dashboard as the input and the smile probability as the output. In further experiments, the framework improved safe exploration while also enabling efficient learning: compared with baselines, the researchers' intrinsic reward policy covered 46% more of the maze and collided with obstacles 29% less often.
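The article reports coverage and collision figures relative to baselines without defining how they were measured. Purely as an illustration, the sketch below assumes coverage is the number of distinct grid cells an agent's (x, y) trajectory visits and collision rate is the fraction of time steps flagged as collisions. The `cell_size` parameter and both metric definitions are assumptions, not the paper's.

```python
from typing import Iterable, Sequence, Tuple


def coverage(positions: Iterable[Tuple[float, float]], cell_size: float = 1.0) -> int:
    # Assumed metric: count distinct grid cells visited.
    # Floor division maps each position to the cell containing it.
    return len({(int(x // cell_size), int(y // cell_size)) for x, y in positions})


def collision_rate(collided: Sequence[bool]) -> float:
    # Assumed metric: fraction of time steps on which the agent hit an obstacle.
    return sum(collided) / len(collided) if collided else 0.0


def relative_change(new: float, baseline: float) -> float:
    # (new - baseline) / baseline; +0.46 reads as "46% more", -0.29 as "29% less".
    return (new - baseline) / baseline
```

Under these definitions, a relative change of +0.46 in coverage and -0.29 in collision rate would correspond to the reported figures. The article does not say whether the 29% is a relative reduction or a difference in percentage points.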

"Here we were not attempting to mimic affective processes, but rather to show that functions trained on affect-like signals can lead to improved performance," wrote the coauthors of the paper detailing the work. "In summary, we argue that such an intrinsically motivated learning framework inspired by affective mechanisms can be effective in increasing the coverage during exploration, decreasing the number [of] catastrophic failures, and that the garnered experiences can help us learn general representations for solving tasks including depth estimation, scene segmentation, and sketch-to-image translation."
