SoftBank Robotics today announced that its robot Pepper will now use emotion recognition AI from Affectiva to interpret and respond to human activity.
Pepper is about four feet tall, gets around on wheels, and has a tablet in the center of its chest. The humanoid robot made its debut in 2015 and was designed to interact with people. Cameras and microphones are used to help Pepper recognize human emotions, like hostility or joy, and respond appropriately with a smile or indications of sadness.
This type of intelligence likely comes in handy in the environments where Pepper operates, such as banks, hotels, and some Pizza Hut locations in Asia.
Affectiva’s emotion recognition AI is able to identify things like laughter in your voice or facial expressions of joy, disgust, surprise, fear, and contempt, as well as specific characteristics about a person, such as age, gender, and ethnicity.
Though the robot can already recognize some human emotions, Affectiva's technology will give Pepper the ability to understand more nuanced emotional states, such as the difference between a smile and a smirk.
“Understanding these complex states will enable Pepper to have more meaningful interactions with people and adapt its behavior to better reflect the way people interact with one another,” the company said in a statement about the partnership shared with VentureBeat.
Affectiva CEO Rana el Kaliouby believes emotion recognition, or affective computing, is essential in a variety of human-machine encounters, including interactions with home robots, AI assistants like Google Assistant and Alexa, and even autonomous vehicles.
In March, Affectiva launched emotion tracking for in-car cameras and semi-autonomous vehicle systems that can recognize whether people are angry or happy and assign drivers distraction scores.
Recognition of a person’s emotional or cognitive state could also be used by AI assistants or other forms of tech to help people lead happier, healthier, more productive lives, and could even assist in mental health treatment, el Kaliouby said.
Emotion detection from cameras and detection drawn from voices and other sounds work well together, but each is suited to different purposes.
“So, the face is very good at positive and negative expressions. The voice, however, is very good about the intensity of the emotions — we call it the arousal level — so we can identify arousal from your voice,” she told VentureBeat in an interview last year. “We can detect smiles through your facial expression, but then we can identify specifically when you’re laughing through voice.”
Understanding things like when a person is laughing at a joke can, for example, lead to a follow-up joke, and knowing when a person is angry could help a company take action to remedy the situation.