Pucker up, buttercup: Your Pixel knows when you’re about to smooch someone. In a blog post today, Google detailed the machine learning techniques underlying the camera app (Pixel Camera) that ships on its Pixel 3 smartphone, and announced a new feature — kiss detection — heading to the Photobooth mode in the latest release.
Architecting Photobooth — which is optimized for the Pixels’ front-facing cameras — wasn’t easy, according to Google senior software engineer Navid Shiee and Google AI staff research scientist Aseem Agarwala. The team had to solve a number of computational challenges, chiefly how to identify objectively “good” content and how to optimally time the camera shutter.
They tackled the first with twin AI models: one for facial expressions and another that detects kissing. Shiee, Agarwala, and colleagues worked with professional photographers to identify five expressions that should trigger capture — smiles, stuck-out tongues, kissy faces, puffed-out cheeks, and reactions of surprise — and trained the aforementioned models (both of which use MobileNets, a family of low-overhead computer vision models designed to run efficiently on mobile devices) to detect them. Interestingly, the kiss detection model is a variation of an AI system trained for Google’s eponymous Google Clips camera, fine-tuned specifically on kissing.
Shutter control was another matter. The team’s approach produces a “content score” by temporally analyzing the confidence values from the facial expression and kissing detection models, which serves as the first defense against snaps with closed eyes, talking, or motion blur. Then, it subjects each frame to “a more fine-grained analysis,” which outputs an overall score that considers both facial expression quality and the kiss score.
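The blog post doesn’t publish code, but the two-stage scoring it describes can be sketched roughly as follows. The class name, window size, gating threshold, and the 50/50 blend in the fine-grained score are all illustrative assumptions, not Google’s actual parameters:

```python
from collections import deque

# Hypothetical sketch of the two-stage frame scoring described above.
# Window size, gate threshold, and blend weights are assumptions.
class ContentScorer:
    def __init__(self, window=3, gate=0.5):
        self.window = deque(maxlen=window)  # recent per-frame confidences
        self.gate = gate                    # minimum smoothed score to keep a frame

    def coarse_score(self, expression_conf, kiss_conf):
        """Temporally smooth model confidences into a coarse content score."""
        self.window.append(max(expression_conf, kiss_conf))
        return sum(self.window) / len(self.window)

    def fine_score(self, expression_quality, kiss_conf):
        """Fine-grained per-frame score combining expression quality and kissing."""
        return 0.5 * expression_quality + 0.5 * kiss_conf

    def score_frame(self, expression_conf, kiss_conf, expression_quality):
        coarse = self.coarse_score(expression_conf, kiss_conf)
        if coarse < self.gate:  # first line of defense: reject weak frames early
            return 0.0
        return self.fine_score(expression_quality, kiss_conf)
```

Smoothing over a short window is what lets the coarse stage reject transient artifacts like blinks or motion blur before the costlier fine-grained analysis runs.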
Shiee and Agarwala note that, because the kissing detection model operates on the entire frame, its output can be used directly as a full-frame score value for kissing. (By contrast, the facial expressions model outputs a score for each identified expression.) To account for the variable number of faces present in each frame, Photobooth uses a separate system to compute an expression quality representation, weight each face (to ensure the background isn’t emphasized at the expense of the foreground), and calculate a single, global score for the quality of in-frame expressions.
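The per-face aggregation can be illustrated with a simple weighted average. Weighting by face area here is an assumption standing in for the attention mechanism the post describes, which emphasizes prominent foreground faces over background ones:

```python
def global_expression_score(faces):
    """Collapse per-face expression scores into one frame-level score.

    `faces` is a list of (expression_score, weight) tuples; using face area
    as the weight is an illustrative stand-in for the learned attention
    weighting, which favors foreground faces over background ones.
    """
    if not faces:
        return 0.0
    total_weight = sum(weight for _, weight in faces)
    return sum(score * weight for score, weight in faces) / total_weight
```

With a large smiling face (score 1.0, weight 4.0) and a small neutral background face (score 0.0, weight 1.0), the frame still scores 0.8 rather than being dragged to the unweighted mean of 0.5.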
This final score — a weighted combination of the attention-based facial expression score and the kiss score — is used to trigger the shutter, while a separate algorithm maintains a buffer of frames and snaps a shot only if a frame’s score surpasses those of the frames that follow it.
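The buffered peak-selection step can be sketched as follows; the lookahead length is an assumption, and a real implementation would operate on a streaming buffer rather than a completed list:

```python
def select_peaks(scores, lookahead=3):
    """Return indices of frames whose score beats the next `lookahead` frames.

    An illustrative sketch of peak selection over a rolling buffer: a frame
    is "captured" only when its score exceeds every later frame in view,
    i.e. it is a local maximum rather than the rising edge of a better shot.
    """
    peaks = []
    for i in range(len(scores) - 1):
        later = scores[i + 1 : i + 1 + lookahead]
        if later and all(scores[i] > s for s in later):
            peaks.append(i)
    return peaks
```

Comparing against frames *after* the candidate is the point: it prevents the camera from firing mid-rise, just before an even better expression peaks.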
The updated Photobooth is available now in the Pixel Camera, and joins other AI-driven features like Top Shot, Portrait mode, and Night Sight.
“We’re excited by the possibilities of automatic photography on camera phones,” they wrote. “As computer vision continues to improve, in the future we may generally trust smart cameras to select a great moment to capture. Photobooth is an example of how we can carve out a useful corner of this space — selfies and group selfies of smiles, funny faces, and kisses — and deliver a fun and useful experience.”