Affectiva, one in a series of companies to come out of MIT’s Media Lab whose work revolves around affective computing, used to be best known for sensing emotion in videos. It recently expanded into emotion detection in audio with the Speech API for companies making robots and AI assistants.
Affective computing, the use of machines to understand and respond to human emotion, has many practical uses. In addition to Affectiva, Media Lab nurtured Koko, a bot that detects words used on chat apps like Kik to recognize people who need emotional support, and Cogito, whose AI is used by the U.S. Department of Veterans Affairs to analyze the voices of military veterans with PTSD to determine if they need immediate help. Then there’s Jibo, a home robot that mimics human emotion on its five-inch LED face and that Time magazine recently declared one of the best inventions of 2017.
Instead of natural language processing, the Speech API private beta uses voice to recognize things like laughing, anger, and various forms of arousal, alongside voice volume, tone, speed, and pauses.
The combination of sentiment analysis of voice and face cues, Affectiva CEO Rana el Kaliouby said, makes it possible for technology to respond to human moods and emotions and be part of more humanlike interactions that improve lives. Her favorite example of this comes from the movie Her, in which Joaquin Phoenix’s character falls in love with his AI assistant Samantha, played by Scarlett Johansson.
“I think it’s very powerful that this operating system, because it knew that guy really, really well,” el Kaliouby said. “[He] kind of had this very negative outlook on life, and she was able to turn that around because she knew him so well, so she was able to … persuade him and motivate him to change his behavior, and I think that’s the true power of these conversational interfaces.”
Ask Alexa or a home robot today for a joke and they may tell you one, but they don’t know whether or not you found that joke funny. They haven’t learned how to react, and it’s the continued interpretation of those reactions that will remake human-machine interaction.
In essence, el Kaliouby argues, computers need empathy to recognize and respond in a natural way when they see human emotions demonstrated. Empathy, she says, is an intelligence that will lead to a future in which robots can enhance our humanity rather than take away from it.
“There’s a lot of ways these things [conversational agents] can persuade you to lead more productive, healthier, happier lives, but in my opinion they can’t get there unless they have empathy, and unless they can factor in the considerations of your social, emotional, and cognitive state. And you can’t do that without affective computing, or what we call artificial emotional intelligence,” el Kaliouby said.
“We need to build EQ in our AI systems because otherwise they’re not going to be as effective as they were designed to be,” she said.
VentureBeat spoke with el Kaliouby last month, shortly before the World Economic Forum’s global council for robotics and AI, where she joined other members of the business and AI community to discuss how to build ethics and morality into robots and AI systems.
El Kaliouby moved to the United States from Egypt in 2006 to take a postdoctoral position at MIT, where she was part of a project to give people on the autism spectrum real-time feedback on the emotions and expressions people demonstrate in conversations.
In our discussion, we talked about how interpretation and reaction to human emotion may fundamentally change the way humans and machines interact with one another, how voice analytics apply to health care, and what companies mean when they say they want to democratize AI.
This interview was edited for brevity and clarity.
VentureBeat: Affectiva is able to detect, correct me if I’m wrong, seven emotions in videos today?
El Kaliouby: The way we think about it is like facial expressions are the building blocks of different emotional states, so we can read over different tiny facial expressions, then combine these in different ways to represent seven different emotional states plus age, gender, and ethnicity. The key thing with the underlying facial expressions is that sometimes you’ll see somebody squint. That may not have an emotion associated, but it’s a very important facial expression and has a lot of meaning. Or maybe somebody smirks, and if they’re smirking, they’re kind of saying “Yeah, hmmm, I’m not persuaded,” and again that may not map into one of these seven emotional spaces, but it’s still a very representative expression.
VentureBeat: How does emotion detection in video translate to voice? Are those the same emotions you’re detecting?
El Kaliouby: They’re not, though there is some overlap. So the face is very good at positive and negative expressions. The voice, however, is very good about the intensity of the emotions — we call it the arousal level — so we can identify arousal from your voice. We can detect smiles through your facial expression, but then we can identify specifically when you’re laughing through voice. Another example is anger. People communicate anger of course through facial expressions, but in voice there’s a wider spectrum, like cold anger and hot anger and frustration and annoyance, and that entire spectrum is a lot clearer in the voice channel. So they overlap, but they kind of complement each other.
VentureBeat: Everybody’s emotional state is kind of different, so how do you form a baseline of each individual’s emotional state?
El Kaliouby: We factor that into the algorithm. So the clearest example of this is in the face world: Like, some people have wrinkles between their eyebrows, things you can fix with Botox, like “resting bitch face,” basically, and so we developed algorithms that subtract that.
Basically the algorithm first learns, “Oh, this is your neutral face,” and that’s your baseline mode. So if it sees enough of it, and then if it sees a deviation from that baseline, it can subtract that out, and you can do that using neural nets. Eventually, with enough data — like if Alexa has interacted with you every day for the past year — it should have enough information to build a very personalized model of you. We don’t do that yet at Affectiva, but I think that’s where the world will eventually go: superpersonalized models.
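The baseline idea described above can be illustrated with a minimal sketch. This is not Affectiva's algorithm, which el Kaliouby says uses neural nets; it swaps in a plain running average to show the "learn the neutral face, then subtract it" step. The class name, the feature name "brow_furrow", and the 0-to-1 score scale are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NeutralBaseline:
    """Running per-person average of expression scores (hypothetical helper)."""

    count: int = 0
    mean: Dict[str, float] = field(default_factory=dict)

    def update(self, scores: Dict[str, float]) -> None:
        """Fold one frame of scores into the running mean.

        Assumes every frame reports the same set of features.
        """
        self.count += 1
        for name, value in scores.items():
            previous = self.mean.get(name, 0.0)
            self.mean[name] = previous + (value - previous) / self.count

    def subtract(self, scores: Dict[str, float]) -> Dict[str, float]:
        """Return the scores with the learned neutral level removed."""
        return {
            name: value - self.mean.get(name, 0.0)
            for name, value in scores.items()
        }


# Someone whose resting face already shows a furrowed brow.
baseline = NeutralBaseline()
baseline.update({"brow_furrow": 0.4})
baseline.update({"brow_furrow": 0.6})

# A new frame is judged relative to that person's own neutral face.
adjusted = baseline.subtract({"brow_furrow": 0.9})
```

The point of the subtraction is that the same raw reading means different things for different people: a furrow that is resting for one person would be a strong signal for another.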
VentureBeat: Especially with first-time users, I’ve noticed that people can get really angry at AI assistants at times, and they can get pretty rude at times. What do you think of the idea of making emotional AI that sometimes gets offended and shuts off if you get too rude?
El Kaliouby: That’s interesting; in my mind, that kind of rebels. I don’t know if you want that, but it might, you know. I’m thinking of especially kids, like kids will say, “Alexa, you’re stupid!”
VentureBeat: Right, exactly.
El Kaliouby: So maybe Alexa should kind of rebel for a day.
VentureBeat: I guess it’s rebellion, but in another sense, it’s reinforcement of the social norm that you shouldn’t be mean to somebody who is being servile to you or helping you.
El Kaliouby: Yeah, I absolutely agree. So one reason I think we’re now dehumanizing each other is because we communicate primarily through digital.
A lot of our communication has now become digital, and it does not mimic the natural way we have evolved to communicate with each other, so it’s almost like we have this muscle, these social-emotional skills, and they’re atrophying, right? You look at young kids — you know how there’s all these articles about kids being in an accident and instead of jumping in to help, they’ll just stand and shoot video on their phone — you’ve got to wonder whatever happened to good old empathy right?
And I really think it’s disappearing because we’re not practicing these skills. And so, arguably, you need to be kind to your social robot, and you need to say please and thank you and all these good things. I think that maybe that brings back our humanity in a weird way.
VentureBeat: What are your thoughts on the use of emotional AI to analyze a person’s mental health?
El Kaliouby: I’m very excited about that. I got my start in this area by working on a National Science Foundation-funded project for autism. We built Google-like glasses that had a camera embedded in it, and kids on the spectrum would wear these glasses and it would give them real-time feedback on the emotions and social expressions of people they were talking to.
I actually like this example because it’s an example of where AI can broaden inclusion, because if you take the case of individuals on the spectrum, they usually don’t have equal access to job opportunities because they lack all these social intelligence skills, and that’s really key in the office or on any job. That’s one example. Another is around depression.
There has been some academic research showing that there are facial and vocal biomarkers of depressed patients, and they can use that to flag depression. So there’s a case to be made for using this technology to scale this, and when people are on their devices at home, you can collect all that data, build a baseline of your general mood, and if you deviate from that, it can flag some of these mental health biomarkers.
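The "build a baseline of your general mood and flag deviations" idea can be sketched as a simple statistical check. This is an illustrative assumption, not a clinical method or anything Affectiva has described: the function name, the daily mood scores, and the threshold of two standard deviations are all hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def deviates_from_baseline(
    history: Sequence[float], today: float, threshold: float = 2.0
) -> bool:
    """Flag a reading that sits far from a person's own past readings.

    history   -- earlier mood scores for one person (hypothetical scale)
    today     -- the newest score to check
    threshold -- how many standard deviations counts as a deviation
    """
    if len(history) < 2:
        # Not enough data to estimate a baseline yet.
        return False
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any change at all is a deviation.
        return today != history[0]
    return abs(today - mean(history)) / spread > threshold


past_week = [0.6, 0.5, 0.7, 0.6, 0.55]
flag_low_day = deviates_from_baseline(past_week, 0.1)
flag_typical_day = deviates_from_baseline(past_week, 0.6)
```

A check like this only says that a reading is unusual for that person. Deciding what the deviation means would still fall to the kind of human follow-up discussed in the interview.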
VentureBeat: Are you thinking about an approach where you’re able to flag something for a physician or provide clinical level data?
El Kaliouby: I think there’s opportunities for both. At Affectiva we’ve had conversations with pharmaceutical companies that wanted to add our facial and vocal biomarkers as independent measures for clinical trials.
Eventually you can imagine how this can be deployed for nurse avatars like the kind being made by Sense.ly. If they flag that it looks like you’re not doing very well, it can loop in an actual human being. So yeah, there’s a number of ways where these can eventually get deployed. None of this has been deployed at scale yet, though, so there’s still a lot of work that needs to be done.
VentureBeat: There’s Affectiva’s whole idea to sell emotional intelligence services to Alexa and third parties, but if an individual wants to use data accrued by services like Alexa to deliver these kinds of insights, is that a service you’d consider making available?
El Kaliouby: Could be right, I mean theoretically speaking. I was just writing this thought piece and imagining my daughter, she’s 14 now. If she gets a personal assistant, when she’s 30 years old that assistant will know her really well, and it will have a ton of data about my daughter. It could know her baseline and should be able to flag if Jana’s feeling really down or fatigued or stressed, and I imagine there’s good to be had from leveraging that data to flag mental health problems early on and get the right support for people. Yeah, I think that can very well happen.
To your point, I’m not quite sure the commercial path to do that. We’re very interested in the health care space and specifically mental health, but we haven’t been able to crack, like: What’s the path to commercialization? Is it partnering with pharma companies? Is it partnering with hospitals? Is it building our own app? Who knows, right? It’s not yet clear.
VentureBeat: It seems like as an industry AI is predicted to affect 30 to 40 percent of jobs out there, and that tends to impact specific cities and regions. I hear that argument on one side, then I hear about the great advances and efficiencies gained from people in the industry on the other side, and it doesn’t always feel like those two meet.
El Kaliouby: That’s another topic, reskilling. Some jobs are going to get eliminated, or maybe not entirely eliminated — they’ll change. Like we think about truck drivers: I don’t know what it looks like, maybe it won’t entirely go away, but it will change.
I think the same is going to happen to mental health clinicians and nurses, where it’s a combination of these nurse avatars, then you have a human overseer that manages all these AI systems. So that means if you’re a nurse today and you want to keep your job where you’re overseeing these nurse avatars, you better get some training on how to operate these AI systems, so I think there’s a big reskilling opportunity. I grew up in the Middle East, and I worry that AI increases the socioeconomic divide, as opposed to closing the gap.
VentureBeat: Yeah, I hear Fresno, California, for example, is expected to be the city most impacted by AI in the United States. Who the hell knew? But also, developing countries are expected to be disproportionately impacted, as well as those that haven’t seen as much industrialization in the past.
El Kaliouby: But I also wonder if we kind of prioritize that dialogue: Some countries, like I’m originally Egyptian, I wonder if we put that top of mind and we reskill people so they can become AI operators and AI trainers or whatever these new jobs are, right? Yeah, I don’t know if anybody is focused on that or thinking about that, but I feel like right now it could go either way. It could help close the gap or hugely increase that gap.
VentureBeat: To that question, the most salient way to put it may be to use a word used a lot in AI circles: democratization. That term has come up at other times in tech history when people wanted to spread adoption, but it’s not clear everyone has the same definition. What do you think it means, if successful and done right, to democratize AI?
El Kaliouby: I think the example that I’m probably most passionate about is in learning, in education. So again, I come from a part of the world where… I was lucky, I got to go to great schools in the Middle East, and I think that was the springboard for how I ended up where I am. I recognize that I’m fortunate, and not everybody in my community had access to these amazing educational opportunities. But we already know that a lot of learning is becoming hybrid, and there’s a lot of online digital learning happening and MOOCs and whatnot. So what if you could augment that with AI systems that could measure students’ engagement, then offer personalized [curricula] based on your learning style and preferences, and even suggest skills that it thinks you should be learning, and matches you with course content?
I think that could be done entirely in software; it could be something you have on your phone. It’s not that you don’t need brick and mortar and you don’t need school and you don’t need to train teachers — which is, again, using Egypt as an example, that’s like a long road to education reform. But online digital learning, that’s totally doable, and so you could democratize access to education if you leverage AI in the right way.