A wearable heart rate monitor is one thing, but what about a system that’s able to estimate a person’s heartbeat from footage of their face alone? That’s what researchers at the Chinese Academy of Sciences set out to design in a preprint paper published on Arxiv.org. In it, they describe RhythmNet, an end-to-end trainable heart rate estimator that taps AI and photoplethysmography (PPG) — an optical technique that detects blood volume changes in skin tissue — to address challenges in head movement and variations in lighting.

As the researchers explain, PPG-based heart rate estimation is made possible by the fact that skin light absorption varies periodically with the blood volume pulse (BVP). Chromophores like hemoglobin in the microvasculature of the dermis and subcutis layers absorb a disproportionate amount of light, such that tiny color changes occur as blood pumps through underlying veins and arteries. These changes are invisible to the human eye, but they can be easily captured by RGB sensors like those embedded in wearables.
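To make the idea of a periodic color signal concrete, here is a minimal sketch of a classical frequency-domain estimate: take a skin color trace over time, find its strongest frequency within a plausible heart rate band, and convert that to beats per minute. This is not RhythmNet's learned approach. The function name `estimate_bpm`, the 0.7 to 4 Hz band, and the synthetic trace are all assumptions chosen for illustration.

```python
import numpy as np

def estimate_bpm(signal, fps, lo_hz=0.7, hi_hz=4.0):
    """Return the dominant frequency of `signal`, in beats per minute.

    Hypothetical helper: the band limits (42 to 240 bpm) are an
    illustrative assumption, not values taken from the paper.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                              # remove the constant offset
    power = np.abs(np.fft.rfft(x)) ** 2           # power spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)  # frequency of each bin, in Hz
    band = (freqs >= lo_hz) & (freqs <= hi_hz)    # keep plausible heart rates
    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz

# Synthetic 10-second trace at 30 fps: a 1.2 Hz pulse plus drift and noise.
fps = 30
t = np.arange(300) / fps
rng = np.random.default_rng(0)
trace = (0.5 * np.sin(2 * np.pi * 1.2 * t)   # the "pulse"
         + 0.02 * t                          # slow lighting drift
         + 0.1 * rng.normal(size=t.size))    # sensor noise
bpm = estimate_bpm(trace, fps)
print(f"Estimated heart rate: {bpm:.1f} bpm")
```

On real footage, head movement and lighting changes can easily swamp the pulse component, which is the problem the researchers set out to address.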

To train RhythmNet, the team created a large-scale, openly available multi-modal corpus called VIPL-HR that contains 2,378 visible light videos and 752 near-infrared videos of 107 subjects. The clips were captured with a combination of webcams and infrared sensors as well as smartphones, and they contain variations in head movements, head poses (with annotated yaw, pitch, and roll angles), illumination, and device usage.

Above: RGB cameras capture skin color changes that can be used to estimate heart rate.

RhythmNet consists of several components, including a face detector that localizes upwards of 81 facial landmarks given a video of a person’s face. A separate component performs alignment and skin segmentation to remove eye regions and other non-face areas, and then generates spatial-temporal maps from video frames 0.5 seconds apart to represent heart rate signals. The maps are fed into a machine learning model trained to predict heart rate from them, and the final estimate for a video is computed as the average of the rates predicted for its individual clips.
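The map-building and averaging steps can be sketched roughly as follows. This is an illustration under stated assumptions rather than the authors' implementation: it divides each aligned face crop into a fixed grid of blocks, averages the color in each block per frame, and rescales each block's signal to the range 0 to 1. The grid size, the scaling, and the function names `spatial_temporal_map` and `video_level_bpm` are hypothetical, and the face detection, skin segmentation, and learned model are left out.

```python
import numpy as np

def spatial_temporal_map(frames, grid=(5, 5)):
    """Build a (blocks, time, channels) map from aligned face crops.

    frames: array of shape (T, H, W, C). The 5x5 grid is an
    illustrative assumption, not a value from the article.
    """
    frames = np.asarray(frames, dtype=float)
    t, h, w, c = frames.shape
    rows, cols = grid
    bh, bw = h // rows, w // cols
    # Trim edges so the frame divides evenly into blocks.
    cropped = frames[:, :rows * bh, :cols * bw, :]
    blocks = cropped.reshape(t, rows, bh, cols, bw, c)
    means = blocks.mean(axis=(2, 4))              # (T, rows, cols, C)
    signals = means.reshape(t, rows * cols, c).transpose(1, 0, 2)
    # Min-max scale each block's signal over time, per channel.
    lo = signals.min(axis=1, keepdims=True)
    hi = signals.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)        # avoid dividing by zero
    return (signals - lo) / span

def video_level_bpm(clip_estimates):
    """Average the per-clip predictions into one estimate for the video."""
    return float(np.mean(clip_estimates))

# Random pixels stand in for 60 aligned face crops of 50x50 RGB.
rng = np.random.default_rng(0)
frames = rng.random((60, 50, 50, 3))
st_map = spatial_temporal_map(frames)

# Made-up per-clip predictions stand in for the model's output.
final_bpm = video_level_bpm([70.0, 72.0, 74.0])
print(st_map.shape, final_bpm)
```

Averaging over blocks and then over clips is one way to keep a single noisy region or a single badly lit stretch of video from dominating the final number.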

The researchers evaluated their system on two widely used databases, MAHNOB-HCI and MMSE-HR, as well as their own. They report that for most of the samples (71%) tested against VIPL-HR, RhythmNet achieved a heart rate estimation error lower than 5 beats per minute, and that its estimates correlated well with the ground truth between 47 beats per minute and 147 beats per minute. Moreover, they say that error rates on MAHNOB-HCI and MMSE-HR didn’t exceed 8.28 beats per minute, outperforming the previous work to which the model was compared.
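For readers who want to see how figures like these are typically derived, the sketch below computes a few common summary statistics from paired predictions and ground truth readings: mean absolute error, root mean square error, the share of samples within a tolerance, and the Pearson correlation. The article does not say which error metric the 8.28 figure refers to, so the choice of metrics here is an assumption, as are the function name `evaluate` and the sample numbers, which are made up.

```python
import numpy as np

def evaluate(pred_bpm, true_bpm, tolerance=5.0):
    """Summarize heart rate estimation error (hypothetical helper)."""
    pred = np.asarray(pred_bpm, dtype=float)
    true = np.asarray(true_bpm, dtype=float)
    err = pred - true
    return {
        "mae": float(np.mean(np.abs(err))),          # mean absolute error
        "rmse": float(np.sqrt(np.mean(err ** 2))),   # root mean square error
        # Fraction of samples with error strictly below the tolerance.
        "within_tolerance": float(np.mean(np.abs(err) < tolerance)),
        "pearson_r": float(np.corrcoef(pred, true)[0, 1]),
    }

# Made-up values for illustration only; not data from the paper.
true_bpm = [60.0, 75.0, 90.0, 110.0]
pred_bpm = [62.0, 74.0, 97.0, 108.0]
metrics = evaluate(pred_bpm, true_bpm)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```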

The team says it plans to investigate the effectiveness of its approach for other physiological measurement tasks, such as estimating breathing rate and blood pressure from video. It also hopes to develop a more robust heart rate estimation model that taps distribution learning and multi-task learning techniques.

“Heart rate is an important physiological signal that reflects the physical and emotional status of a person. Traditional heart rate measurements usually rely on contact monitors, which may cause inconvenience and discomfort,” the paper’s coauthors wrote. “[Our] proposed [system] achieves promising heart rate estimation accuracies in both within-database and cross-database testing scenarios [from the face alone].”