Did you miss a session from the Future of Work Summit? Head over to our Future of Work Summit on-demand library to stream.

Google AI researchers today said they used 2,000 “mannequin challenge” YouTube videos as a training data set to create an AI model capable of depth prediction from videos in motion. Applications of such an understanding could help developers craft augmented reality experiences in scenes shot with hand-held cameras and 3D video.

The mannequin challenge asked groups of people to basically act like time stood still while one person shoots video. In a paper titled “Learning the Depths of Moving People by Watching Frozen People,” researchers said this provides a dataset that helps detect depth of field in videos where the camera and people in the video are moving.

“While there is a recent surge in using machine learning for depth prediction, this work is the first to tailor a learning-based approach to the case of simultaneous camera and human motion,” research scientist Tali Dekel and engineer Forrester Cole said in a blog post today.

The approach outperforms state-of-the-art tools for making depth maps, Google researchers said.

“To the extent that people succeed in staying still during the videos, we can assume the scenes are static and obtain accurate camera poses and depth information by processing them with structure-from-motion (SfM) and multi-view stereo (MVS) algorithms,” the paper reads. “Because the entire scene, including the people, is stationary, we estimate camera poses and depth using SfM and MVS, and use this derived 3D data as supervision for training.”

To make the model, researchers trained a neural network capable of input from RGB images, a mask for human regions, and initial depth of non-human environments in video in order to produce a depth map and make human shape and pose predictions.

YouTube videos were also used as a dataset by University of California, Berkeley AI researchers last year to train AI models to dance the Gangnam style and perform acrobatic human feats like backflips.


VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative technology and transact. Our site delivers essential information on data technologies and strategies to guide you as you lead your organizations. We invite you to become a member of our community, to access:
  • up-to-date information on the subjects of interest to you
  • our newsletters
  • gated thought-leader content and discounted access to our prized events, such as Transform 2021: Learn More
  • networking features, and more
Become a member