MIT CSAIL's AI can reconstruct hidden movement from video footage alone

Seeing around corners and through walls is old hat for AI and machine learning algorithms, which power systems (some of them laser-based) that produce images of scenes outside the line of sight. But what about the much harder task of reconstructing hidden objects without special equipment?

Researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) say they've developed exactly that. Their system, described in a preprint paper published this week, can reconstruct hidden videos from the shadows and reflections visible on an observed pile of clutter. With nothing more than a video camera recording a room, it can "see" around corners, recovering objects and live-action performances that fall outside the camera's field of view.

This builds on preliminary work published in 2017 and 2018, in which the system was limited by the amount of light in a scene and tripped up by changes in lighting conditions. According to the research team, the current iteration is more generalizable and more robust.

The secret sauce is an algorithm that predicts the way light travels in a scene (a phenomenon known as light transport). As it turns out, a pile of objects behaves like a pinhole camera, in that it blocks some light rays while allowing others to pass through, painting an image of the surroundings wherever the rays hit. But whereas a pinhole camera lets through just the right number of light rays to form a readable picture, a general pile of clutter produces an image that's scrambled beyond recognition.
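To make the pinhole analogy concrete, here is a minimal 1-D sketch. It assumes, as the light-transport framing suggests, that each observed pixel is a weighted sum of hidden-scene pixels, written obs = T · scene for some transport matrix T. The function names and the random "clutter" matrix are hypothetical illustrations, not the researchers' code: a pinhole corresponds to a T that routes each hidden pixel to exactly one observed pixel, while clutter lets many partial rays through to every observed pixel.

```python
import random

def apply_transport(T, scene):
    """Linear light transport: obs[i] = sum_j T[i][j] * scene[j]."""
    return [sum(t * s for t, s in zip(row, scene)) for row in T]

def pinhole_transport(n):
    """Hypothetical 1-D pinhole: each observed pixel sees exactly one
    hidden pixel, so the image comes out mirrored but readable."""
    return [[1.0 if j == n - 1 - i else 0.0 for j in range(n)] for i in range(n)]

def clutter_transport(n, seed=0, open_fraction=0.5):
    """Hypothetical clutter: each hidden-to-observed ray is either blocked
    or partially passed at random, so every observed pixel mixes light
    from many hidden pixels."""
    rng = random.Random(seed)
    return [[rng.random() if rng.random() < open_fraction else 0.0 for _ in range(n)]
            for _ in range(n)]

scene = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # a bright "object" at pixels 2-3

print(apply_transport(pinhole_transport(8), scene))
# [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0]  (mirrored copy of the scene)
print(apply_transport(clutter_transport(8), scene))  # scrambled mixture
```

Because the pinhole's T is a mirror-image permutation, its output is a flipped but readable copy of the scene, while the clutter's T smears the same bright object across many observed pixels. In the real problem T is unknown, which is what makes recovering the scene hard.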

To unscramble the image, the team looked for a scrambling pattern consistent with plausible real-world shadowing and shading, and they exploited the fact that neural networks naturally tend to produce "image-like" content even when they have never been trained to do so. To this end, the researchers' system trains two machine learning models simultaneously: one that produces the scrambling pattern and another that estimates the hidden video. The two are rewarded when their combined output reproduces the video recorded from the clutter, which drives them to explain the observations with plausible hidden data.
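The two-model setup amounts to factoring the recorded video into two unknowns whose product must match what the camera saw. The researchers use neural networks for both factors; the sketch below deliberately swaps in a much simpler, classic technique, nonnegative matrix factorization with Lee-Seung multiplicative updates, purely to show the structure of the problem. It assumes the observed video Z (pixels x frames) is approximately T · L, where T is a light-transport ("scrambling") matrix and L is the hidden video. All names, sizes, and data are hypothetical.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of A (m x k) and B (k x n)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(Z, T, L):
    """Squared Frobenius norm of Z - T @ L: how badly (T, L) explains Z."""
    TL = matmul(T, L)
    return sum((z - p) ** 2 for zr, pr in zip(Z, TL) for z, p in zip(zr, pr))

def random_matrix(rows, cols, rng):
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]

def factorize(Z, rank, iters=500, seed=1, eps=1e-9):
    """Blindly split Z into a transport matrix T (pixels x rank) and a hidden
    video L (rank x frames). Lee-Seung multiplicative updates keep both
    factors nonnegative and do not increase the squared error."""
    rng = random.Random(seed)
    T = random_matrix(len(Z), rank, rng)
    L = random_matrix(rank, len(Z[0]), rng)
    for _ in range(iters):
        # Update the hidden-video estimate with T held fixed.
        Tt = transpose(T)
        num, den = matmul(Tt, Z), matmul(matmul(Tt, T), L)
        L = [[l * n / (d + eps) for l, n, d in zip(lr, nr, dr)]
             for lr, nr, dr in zip(L, num, den)]
        # Update the transport estimate with L held fixed.
        Lt = transpose(L)
        num, den = matmul(Z, Lt), matmul(T, matmul(L, Lt))
        T = [[t * n / (d + eps) for t, n, d in zip(tr, nr, dr)]
             for tr, nr, dr in zip(T, num, den)]
    return T, L

rng = random.Random(0)
T_true = random_matrix(6, 3, rng)  # hypothetical transport: 6 observed px, 3 hidden px
L_true = random_matrix(3, 5, rng)  # hypothetical hidden video: 3 pixels x 5 frames
Z = matmul(T_true, L_true)         # what the camera records

err_before = frob_error(Z, *factorize(Z, rank=3, iters=0))
err_after = frob_error(Z, *factorize(Z, rank=3, iters=500))
print(err_before, err_after)
```

Fitting Z alone is not enough: many (T, L) pairs explain it equally well (for instance, scaling a column of T up and the matching row of L down by the same factor leaves the product unchanged). That ambiguity is why the article stresses the networks' built-in preference for image-like content, a constraint this plain factorization lacks.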

To test their method, the researchers piled up objects against one wall and either projected a video or physically moved around near the opposite wall. From the recorded footage, they were able to reconstruct videos that conveyed a general sense of the hidden motion.

The system is by no means perfect (the reconstructions took around two hours to generate), but the researchers believe it could one day benefit "many facets of society." Self-driving cars could better understand what's emerging from behind corners, for instance, and search-and-rescue robots could navigate dangerous or obstructed areas more robustly. Relatedly, in October, MIT CSAIL researchers presented ShadowCam, a system for autonomous vehicles that uses similar techniques to detect and classify shadows on the ground.

"You can achieve quite a bit with non-line-of-sight imaging equipment like lasers, but in our approach you only have access to the light that's naturally reaching the camera, and you try to make the most out of the scarce information in it," said Miika Aittala, an Nvidia research scientist and the lead researcher on the new technique. "Given the recent advances in neural networks, this seemed like a great time to visit some challenges that, in this space, were considered largely unapproachable before."
