The way you walk says a lot about how you’re feeling at any given moment. When you’re downtrodden or depressed, for example, you’re more likely to slump your shoulders than when you’re contented or upset. Leveraging this somatic lexicon, researchers at the University of North Carolina at Chapel Hill and the University of Maryland recently investigated a machine learning method that can identify a person’s perceived emotion, valence (e.g., negative or positive), and arousal (calm or energetic) from their gait alone. The researchers claim this approach, which they believe is the first of its kind, achieved 80.07% accuracy in preliminary experiments.
“Emotions play a large role in our lives, defining our experiences and shaping how we view the world and interact with other humans,” wrote the coauthors. “Because of the importance of perceived emotion in everyday life, automatic emotion recognition is a critical problem in many fields, such as games and entertainment, security and law enforcement, shopping, human-computer interaction, and human-robot interaction.”
The researchers selected four emotions (happy, sad, angry, and neutral) for their tendency to “last an extended period” and their “abundance” in walking activity. Then they extracted gaits from multiple walking video corpora to identify affective features and extracted poses using a 3D pose estimation technique. Finally, they tapped a long short-term memory (LSTM) model, a type of recurrent neural network capable of learning long-term dependencies, to obtain features from pose sequences, which they combined with a random forest classifier (which aggregates the predictions of several individual decision trees) to classify examples into the aforementioned four emotion categories.
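To make that last step concrete, here is a minimal sketch of how learned features and hand-picked gait measurements might be joined and passed to an ensemble of decision trees. It is not the researchers' code: the LSTM is replaced by a placeholder vector, and the three "trees," their thresholds, and the feature layout are all hypothetical. It assumes only that the forest settles on a class by majority vote, which is the usual approach for classification.

```python
from collections import Counter

EMOTIONS = ("happy", "sad", "angry", "neutral")


def combine_features(deep_features, affective_features):
    """Concatenate two feature vectors into one classifier input."""
    return list(deep_features) + list(affective_features)


def forest_predict(trees, features):
    """Return the label that the largest number of trees vote for."""
    votes = Counter(tree(features) for tree in trees)
    label, _count = votes.most_common(1)[0]
    return label


# Hypothetical stand-ins for trained decision trees. A real forest would
# learn its splits from labeled gaits; these thresholds are invented.
def tree_head_tilt(f):
    return "sad" if f[2] > 0.5 else "happy"


def tree_speed(f):
    return "angry" if f[3] > 2.0 else "sad"


def tree_deep(f):
    return "neutral" if f[0] < 0.5 else "sad"


deep = [0.2, -0.1]      # placeholder for features an LSTM would produce
affective = [0.9, 1.5]  # placeholder values: head tilt, walking speed
features = combine_features(deep, affective)
prediction = forest_predict([tree_head_tilt, tree_speed, tree_deep], features)
print(prediction)  # sad (two of the three trees vote "sad")
```

In practice the trees would come from a library implementation trained on labeled data; the point here is only the shape of the pipeline, with features going in and a vote coming out.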
The features included things like shoulder posture, the distance between consecutive steps, and the area between the hands and neck. Head tilt angle was used to distinguish between happy and sad emotions, while more compact postures and “body expansion” identified negative and positive emotions, respectively. As for arousal, which the scientists note tends to correspond to increased movement, the model considered the magnitude of velocity, acceleration, and “movement jerks” of hands, feet, and head joints.
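For a sense of what such measurements involve, the sketch below computes two of them from 3D joint positions: the average magnitude of a joint's velocity, acceleration, and jerk, and the area of the triangle formed by the two hands and the neck. It assumes positions are sampled at a fixed interval and uses simple finite differences; the function names and the sample data are illustrative, and the researchers' exact formulas may differ.

```python
import math


def finite_difference(points, dt):
    """Approximate the time derivative of a sequence of 3D points."""
    return [
        tuple((b[i] - a[i]) / dt for i in range(3))
        for a, b in zip(points, points[1:])
    ]


def mean_magnitude(vectors):
    """Average Euclidean length of a list of 3D vectors."""
    return sum(math.sqrt(sum(c * c for c in v)) for v in vectors) / len(vectors)


def movement_features(joint_positions, dt):
    """Mean speed, acceleration, and jerk magnitude for one joint."""
    if len(joint_positions) < 4:
        raise ValueError("need at least 4 frames to estimate jerk")
    velocity = finite_difference(joint_positions, dt)
    acceleration = finite_difference(velocity, dt)
    jerk = finite_difference(acceleration, dt)
    return {
        "speed": mean_magnitude(velocity),
        "acceleration": mean_magnitude(acceleration),
        "jerk": mean_magnitude(jerk),
    }


def triangle_area(a, b, c):
    """Area of the triangle with 3D vertices a, b, c (e.g. hands and neck)."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cross = (
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    )
    return 0.5 * math.sqrt(sum(c * c for c in cross))


# Made-up example: a hand moving at constant speed along one axis,
# sampled every 0.5 seconds, so acceleration and jerk are zero.
hand = [(float(i), 0.0, 0.0) for i in range(5)]
print(movement_features(hand, dt=0.5))
print(triangle_area((0, 0, 0), (1, 0, 0), (0, 1, 0)))  # 0.5
```

A larger hands-and-neck triangle is one rough proxy for an expanded posture, while higher speed and jerk values suggest a more energetic walk.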
The AI system processed samples from Emotion Walk, or EWalk, a novel data set containing 1,384 gaits extracted from videos of 24 subjects walking around a university campus, both indoors and outdoors. Roughly 700 participants from Amazon Mechanical Turk labeled emotions, and the researchers used these labels to determine valence and arousal level.
In tests, the team reports that their emotion detection approach offered a 13.85% improvement over state-of-the-art algorithms and a 24.60% improvement over “vanilla” LSTMs that don’t consider affective features. That isn’t to say it’s foolproof — its accuracy is largely dependent on the precision of the 3D human pose estimation and gait extraction. But despite these limitations, the team believes their method will provide a strong foundation for studies involving additional activities and other emotion identification algorithms.
“Our approach is also the first approach to provide a real-time pipeline for emotion identification from walking videos by leveraging state-of-the-art 3D human pose estimation,” wrote the coauthors. “As part of future work, we would like to collect more data sets and address [limitations].”