MIT CSAIL uses AI to create 3D motion sculptures

You've heard of sculptures, and you've likely watched a 3D movie or two, but 3D motion sculptures probably don't ring a bell. The artistic hybrids were created by researchers at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Lab, Google Research, and the University of California, Berkeley, who together used an artificially intelligent (AI) system to generate a surrealist blend of movement and pose.

The system -- dubbed MoSculp -- is described in a paper ("MoSculp: Interactive Visualization of Shape and Time") that'll be presented next month at the User Interface Software and Technology (UIST) conference in Berlin, Germany. Xiuming Zhang, a PhD student and lead author on the paper, thinks it could be used to enable detailed studies of motion for athletes who want to improve their skills.

“Imagine that you have a video of Roger Federer serving a ball in a tennis match, and a video of yourself learning tennis,” Zhang said. “You could ... build motion sculptures of both scenarios [with MoSculp] to compare them and more comprehensively study where you need to improve.”

It's a multistep process. First, MoSculp detects a human body and its 2D pose. To estimate keypoints (an ankle, elbow, hip, etc.) in each frame, it taps OpenPose, a real-time library for multiperson keypoint detection maintained by Carnegie Mellon University's Perceptual Computing Lab. Next, it recovers a 3D body model that represents the person's overall shape and their poses across frames.
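To make the keypoint step concrete, here is a minimal sketch of reading per-frame 2D keypoints, assuming OpenPose's JSON output format (a "people" list in which each entry carries a flat "pose_keypoints_2d" array of x, y, confidence triples). Because MoSculp currently handles one subject, the sketch keeps the detection with the highest mean confidence; that selection rule and the helper names parse_frame and main_subject are hypothetical illustrations, not MoSculp's actual code.

```python
import json
from typing import List, Optional, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence)


def parse_frame(json_text: str) -> List[List[Keypoint]]:
    """Split each person's flat pose_keypoints_2d array into (x, y, c) triples.

    Assumes OpenPose-style JSON; this is an illustrative parser, not MoSculp code.
    """
    data = json.loads(json_text)
    people = []
    for person in data.get("people", []):
        flat = person["pose_keypoints_2d"]
        triples = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
        people.append(triples)
    return people


def main_subject(people: List[List[Keypoint]]) -> Optional[List[Keypoint]]:
    """Hypothetical single-subject rule: keep the detection with highest mean confidence."""
    if not people:
        return None
    return max(people, key=lambda kps: sum(c for _, _, c in kps) / len(kps))


frame_json = (
    '{"people": ['
    '{"pose_keypoints_2d": [10, 20, 0.9, 30, 40, 0.8]},'
    '{"pose_keypoints_2d": [5, 5, 0.1, 6, 6, 0.2]}]}'
)
subject = main_subject(parse_frame(frame_json))
print(subject)  # the first, high-confidence detection
```

Recovering the 3D body model from these 2D keypoints is a much larger optimization problem and is not sketched here.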

It sweeps this model through 3D space to create the initial motion sculpture but, as the researchers note, this model lacks texture and structural details, such as fine facial structure, hair, and clothes. The clever solution? Inserting the sculpture into the original video rather than mapping the 3D contents from the video to the scene.
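One way to picture "sweeping" a body through space is as the union of the space it occupies in every frame. The sketch below assumes each frame's body is available as a set of sampled 3D points and approximates the sculpture as a voxel union; the voxel representation, the 0.05 default voxel size, and the helper names are hypothetical choices for illustration, not how MoSculp builds its mesh.

```python
from typing import Iterable, List, Set, Tuple

Point3 = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def sample_segment(a: Point3, b: Point3, n: int = 10) -> List[Point3]:
    """Sample n >= 2 evenly spaced points on a limb segment from joint a to joint b."""
    return [tuple(a[k] + (b[k] - a[k]) * i / (n - 1) for k in range(3)) for i in range(n)]


def voxelize(points: Iterable[Point3], voxel_size: float) -> Set[Voxel]:
    """Map each 3D point to the integer index of the voxel that contains it."""
    return {
        (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        for x, y, z in points
    }


def sweep(frames: List[List[Point3]], voxel_size: float = 0.05) -> Set[Voxel]:
    """Union the occupied voxels of every frame: a crude 'motion sculpture'."""
    sculpture: Set[Voxel] = set()
    for pts in frames:
        sculpture |= voxelize(pts, voxel_size)
    return sculpture


# A hypothetical forearm whose wrist moves along x over three frames.
elbow = (0.0, 0.0, 0.0)
frames = [sample_segment(elbow, (1.0, float(t), 0.0), n=20) for t in range(3)]
print(len(sweep(frames, voxel_size=0.25)))
```

As the article notes, a swept body model like this carries no texture, hair, or clothing detail, which is why the system composites the sculpture back into the original video.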

To avoid rendering artifacts and handle occlusion correctly, MoSculp estimates a depth map of the person and sculpture in each frame, comparing the two to determine whether the person is closer to or farther from the camera than the sculpture. It then extracts foreground masks of the subject across all frames to refine the initial depth map.
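The depth comparison boils down to a per-pixel decision: draw the sculpture unless the person is both present at that pixel and closer to the camera. The sketch below assumes per-frame depth maps for the person and the rendered sculpture (infinity where the sculpture is absent) plus a boolean foreground mask, all as nested lists; it is a simplified illustration of depth-based compositing, not MoSculp's refinement procedure, and the function name composite is hypothetical.

```python
import math
from typing import List, Tuple

RGB = Tuple[int, int, int]


def composite(
    frame: List[List[RGB]],
    person_depth: List[List[float]],
    person_mask: List[List[bool]],
    sculpt_rgb: List[List[RGB]],
    sculpt_depth: List[List[float]],
) -> List[List[RGB]]:
    """Per-pixel z-test between the filmed person and the rendered sculpture."""
    out = []
    for r in range(len(frame)):
        row = []
        for c in range(len(frame[0])):
            s_d = sculpt_depth[r][c]
            covered = math.isfinite(s_d)  # sculpture renders at this pixel
            # Only pixels inside the foreground mask count as the person.
            person_in_front = person_mask[r][c] and person_depth[r][c] < s_d
            row.append(sculpt_rgb[r][c] if covered and not person_in_front else frame[r][c])
        out.append(row)
    return out


frame = [[(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4)]]
person_depth = [[2.0, 2.0, 2.0, 2.0]]
person_mask = [[True, True, False, False]]
sculpt_rgb = [[(9, 9, 9)] * 4]
sculpt_depth = [[1.0, 3.0, 5.0, math.inf]]
print(composite(frame, person_depth, person_mask, sculpt_rgb, sculpt_depth))
```

Restricting the comparison to masked pixels is why accurate foreground masks matter: a sloppy mask would let background depth values wrongly hide the sculpture.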

Here's how it works in practice: After a video is loaded into the system, MoSculp overlays the detected keypoints on the input frames, and users confirm them on a few randomly selected frames. (A built-in correction tool lets users make adjustments if necessary.) After correcting for "temporally inconsistent detections," it generates the motion sculpture and loads it into a custom interface.
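The article doesn't say how MoSculp corrects temporally inconsistent detections. A common generic technique is a median filter over time: if a joint's position in one frame jumps far from the median of its neighbors, replace it with that median. The sketch below applies this to a single joint's 2D track; the window of 2 frames on each side, the 20-pixel threshold, and the name fix_jumps are hypothetical, and this is a stand-in technique rather than the paper's method.

```python
import math
from statistics import median
from typing import List, Tuple

Point2 = Tuple[float, float]


def fix_jumps(track: List[Point2], window: int = 2, max_jump: float = 20.0) -> List[Point2]:
    """Replace keypoints that deviate from their temporal median by more than max_jump.

    The median is taken over frames t-window..t+window (clipped at the ends),
    including frame t itself, so one outlier cannot dominate a 5-frame window.
    """
    fixed = list(track)
    for t in range(len(track)):
        lo, hi = max(0, t - window), min(len(track), t + window + 1)
        mx = median(track[i][0] for i in range(lo, hi))
        my = median(track[i][1] for i in range(lo, hi))
        if math.hypot(track[t][0] - mx, track[t][1] - my) > max_jump:
            fixed[t] = (mx, my)
    return fixed


# A wrist track with one spurious detection in frame 2.
wrist = [(0, 0), (1, 0), (100, 100), (3, 0), (4, 0)]
print(fix_jumps(wrist))
```

A filter like this only catches isolated glitches; longer runs of bad detections are the kind of case the manual correction tool would still need to handle.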

From within MoSculp, users can navigate around the sculpture or print it using a 3D printer. Tools allow them to customize the material, body parts, scene background, lighting conditions, and other aesthetic features.

Currently, MoSculp only works with single-subject videos, but the team hopes to expand it to multiple people. In the future, they believe it could be used to study things like social disorders, team dynamics, and interpersonal interactions.

“Dance and highly skilled athletic motions often seem like ‘moving sculptures’, but they only create fleeting and ephemeral shapes,” said Courtney Brigham, communications lead at Adobe. “This work shows how to take motions and turn them into real sculptures with objective visualizations of movement, providing a way for athletes to analyze their movements for training, requiring no more equipment than a mobile camera and some computing time.”
