Waymo is using AI to simulate autonomous vehicle camera data

Waymo says it's beginning to use AI to generate camera images for simulation from sensor data collected by its self-driving vehicles. A recent paper coauthored by company researchers, including principal scientist Dragomir Anguelov, describes the technique, SurfelGAN, which uses texture-mapped surface elements to reconstruct scenes and render camera views from new positions and orientations.

Autonomous vehicle companies like Waymo use simulation environments to train, test, and validate their systems before those systems are deployed to real-world cars. Simulators can be designed in many ways, such as simulating mid-level object representations, but basic simulators omit cues critical to scene understanding, like pedestrian gestures and blinking lights. More complex simulators, like Waymo's CarCraft, are computationally demanding because they attempt to model materials with high accuracy so that sensors like lidar and radar behave realistically.

In SurfelGAN, Waymo proposes a simpler, data-driven approach to simulating sensor data. Drawing on feeds from real-world lidar sensors and cameras, the system builds a reconstruction that preserves rich information about the 3D geometry, semantics, and appearance of all objects within the scene. From that reconstruction, SurfelGAN renders the simulated scene from various distances and viewing angles.
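To make "rendering from a new viewpoint" concrete, here is a minimal sketch, not Waymo's renderer: it assumes the reconstruction is just a list of colored 3D points, a pinhole camera with a hypothetical focal length, and a camera frame with x right, y down, and z forward. Moving the camera means transforming the points into the new camera frame and reprojecting them, keeping the nearest point per pixel. The function names `to_camera_frame` and `render_points` are illustrative only.

```python
import math

def to_camera_frame(point, cam_pos, cam_yaw):
    """Express a world point in a camera frame (x right, y down, z forward)
    for a camera at cam_pos, rotated by cam_yaw radians about the vertical y axis."""
    dx = point[0] - cam_pos[0]
    dy = point[1] - cam_pos[1]
    dz = point[2] - cam_pos[2]
    c, s = math.cos(cam_yaw), math.sin(cam_yaw)
    # Apply the inverse (transpose) of the camera's yaw rotation.
    return (c * dx - s * dz, dy, s * dx + c * dz)

def render_points(points, colors, cam_pos, cam_yaw, focal, width, height):
    """Splat each colored point into a single pixel with a pinhole model,
    keeping only the nearest point per pixel (a simple z-buffer)."""
    depth = [[math.inf] * width for _ in range(height)]
    image = [[None] * width for _ in range(height)]
    cx, cy = width / 2, height / 2
    for point, color in zip(points, colors):
        x, y, z = to_camera_frame(point, cam_pos, cam_yaw)
        if z <= 0:  # point is behind the camera
            continue
        u = math.floor(focal * x / z + cx)
        v = math.floor(focal * y / z + cy)
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z
            image[v][u] = color
    return image
```

Rendering the same points with `cam_yaw=0` and then `cam_yaw=math.pi` shows different points in view; a real surfel renderer would rasterize whole discs rather than single pixels.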

"We've developed a new approach that allows us to generate realistic camera images for simulation directly using sensor data collected by a self-driving vehicle," a Waymo spokesperson told VentureBeat via email. "In simulation, when a trajectory of a self-driving car and other agents (e.g. other cars, cyclists, and pedestrians) changes, the system generates realistic visual sensor data that helps us model the scene in the updated environment ... Parts of the system are in production."

SurfelGAN

SurfelGAN uses what's called a texture-enhanced surfel map, a compact, easy-to-construct scene representation that preserves sensor information while remaining computationally efficient. Surfels (short for "surface elements") represent objects as small discs that carry color and lighting information. Waymo's approach accumulates lidar scans, groups the resulting points into voxels (cells of a regular 3D grid), and converts them into surfel discs with colors estimated from camera data, after which the surfels are post-processed to address variations in lighting and pose.
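The voxel-to-surfel step can be sketched as follows, under loud assumptions: each lidar point already has an RGB color looked up from a camera image, each voxel yields one surfel centered on the mean of its points, and the disc radius is simply half the voxel size. The real system estimates richer per-surfel texture and disc orientation; this sketch keeps a single averaged color, and `build_surfels` is a hypothetical name.

```python
import math
from collections import defaultdict

def build_surfels(points, colors, voxel_size):
    """Group lidar points into voxels and emit one surfel per occupied voxel.

    points: list of (x, y, z) tuples in meters.
    colors: per-point (r, g, b) tuples, assumed already sampled from camera images.
    """
    buckets = defaultdict(list)
    for p, c in zip(points, colors):
        # Integer voxel index for each coordinate.
        key = tuple(math.floor(coord / voxel_size) for coord in p)
        buckets[key].append((p, c))
    surfels = []
    for key, members in buckets.items():
        n = len(members)
        center = tuple(sum(p[i] for p, _ in members) / n for i in range(3))
        color = tuple(sum(c[i] for _, c in members) / n for i in range(3))
        surfels.append({
            "voxel": key,
            "center": center,
            "color": color,
            "radius": voxel_size / 2,  # hypothetical choice of disc size
        })
    return surfels
```

Larger voxels give a more compact map at the cost of detail, which is the trade-off the representation is meant to balance.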

To handle dynamic objects like vehicles, SurfelGAN also uses annotations from the Waymo Open Dataset, Waymo's openly released corpus of self-driving vehicle sensor logs. Lidar points belonging to objects of interest are accumulated across scans so that, in simulation, Waymo can generate reconstructions of cars and pedestrians that can be placed at any location, albeit with imperfect geometry and texturing.
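One plausible way to accumulate an object across scans, sketched here with assumptions: each annotation is reduced to a top-down box pose `(cx, cy, yaw)`, points are moved into the box's local frame so that scans from different moments line up, and the resulting model is later transformed to any new pose. The Waymo Open Dataset's actual 3D box labels carry more fields; these helper names are hypothetical.

```python
import math

def world_to_box(p, box):
    """Express world point p in the local frame of box = (cx, cy, yaw)."""
    cx, cy, yaw = box
    dx, dy = p[0] - cx, p[1] - cy
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * dx + s * dy, -s * dx + c * dy, p[2])

def box_to_world(p, box):
    """Inverse of world_to_box: place a local point at box = (cx, cy, yaw)."""
    cx, cy, yaw = box
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * p[0] - s * p[1] + cx, s * p[0] + c * p[1] + cy, p[2])

def accumulate_object(frames):
    """frames: list of (points_in_world, box_pose) for one tracked object."""
    model = []
    for points, box in frames:
        model.extend(world_to_box(p, box) for p in points)
    return model

def place_object(model, new_box):
    """Drop the accumulated object model into the scene at a new pose."""
    return [box_to_world(p, new_box) for p in model]
```

Because every scan is expressed relative to the object's own box, the merged model is pose-independent, which is what allows the reconstruction to be reinserted anywhere; errors in the annotated boxes show up as the imperfect geometry the paragraph mentions.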

One module within SurfelGAN, a generative adversarial network (GAN), is responsible for converting surfel image renderings into realistic-looking images. In a standard GAN, generator models produce synthetic examples from random noise sampled from a distribution; in SurfelGAN, the generator's input is the surfel rendering instead. Those synthetic examples, along with real examples from a training data set, are fed to discriminators, which attempt to distinguish between the two. Both the generators and discriminators improve at their respective tasks until the discriminators can no longer tell the real examples from the synthesized ones with better than the 50% accuracy expected of chance.
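The "50% accuracy" end point can be read off the standard GAN losses. The sketch below uses the textbook binary cross-entropy objective with the common non-saturating generator loss; it is not SurfelGAN's exact training objective, and the inputs are hypothetical discriminator outputs (probabilities that an image is real).

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: D should output 1 on real images, 0 on synthetic ones."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: G wants D to label its outputs real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)
```

When the discriminator outputs 0.5 for everything, its loss is 2 ln 2 (about 1.386), the value of pure guessing; any loss below that means it can still separate real from synthetic images, and that separation is what drives the generator to keep improving.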

The SurfelGAN module can also train in an unsupervised fashion, meaning it infers patterns within the data without reference to known, labeled, or annotated outcomes. The discriminators' work informs that of the generator: every time the discriminators correctly identify a synthesized image, their feedback tells the generators how to adjust their output so that it looks more realistic in the future.
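The "feedback" from discriminator to generator is a gradient. The toy below is an illustration under strong simplifying assumptions, far from SurfelGAN's image networks: a one-parameter generator `G(z) = z + theta`, a fixed logistic discriminator `D(x) = sigmoid(w * x + b)`, and one gradient-descent step on the non-saturating loss. All names and values are hypothetical.

```python
import math

def sigmoid(u):
    return 1 / (1 + math.exp(-u))

def generator_step(theta, noise, w, b, lr):
    """One gradient-descent step on -mean(log D(G(z))) for the toy generator
    G(z) = z + theta and a fixed discriminator D(x) = sigmoid(w * x + b)."""
    # d/dtheta [-log sigmoid(w * (z + theta) + b)] = -(1 - D(G(z))) * w
    grad = -sum((1 - sigmoid(w * (z + theta) + b)) * w for z in noise) / len(noise)
    return theta - lr * grad
```

With `w=2.0, b=-2.0`, the discriminator confidently flags samples near zero as fake, so a step from `theta=0.0` pushes `theta` upward toward what it considers real. With `b=20.0`, the discriminator is already fooled and the update is vanishingly small: the generator learns most from the samples the discriminator catches.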

Promising results

Waymo conducted a series of experiments to evaluate SurfelGAN's performance, feeding it 798 training sequences, each consisting of 20 seconds of camera data (from five cameras) and lidar data, along with annotations for vehicles, pedestrians, and cyclists from the Waymo Open Dataset. The SurfelGAN team also created a new data set, the Waymo Open Dataset-Novel View, which contains no camera images; instead, it starts from the reconstructed scenes and renders surfel images from camera poses perturbed from the existing ones, producing one new surfel rendering for each frame in the original data set. (The perturbations were generated by applying random translations and yaw-angle rotations.)
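The pose perturbation can be sketched like this. The pose layout `(x, y, z, yaw)` and the bounds `max_shift` and `max_yaw` are assumptions for illustration; the article does not give the ranges Waymo used.

```python
import math
import random

def perturb_pose(pose, max_shift, max_yaw, rng):
    """Return a new camera pose with a random horizontal translation
    (meters) and a random yaw offset (radians).

    pose: (x, y, z, yaw); height z is left unchanged.
    """
    x, y, z, yaw = pose
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    dyaw = rng.uniform(-max_yaw, max_yaw)
    # Wrap the yaw angle back into [-pi, pi).
    new_yaw = (yaw + dyaw + math.pi) % (2 * math.pi) - math.pi
    return (x + dx, y + dy, z, new_yaw)
```

Calling this once per original frame, with a seeded `random.Random` for reproducibility, mirrors the "one new rendering per frame" setup; the surfel map is then rendered at each perturbed pose.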

Finally, Waymo collected 9,800 additional sequences of unannotated camera images, 100 frames each, and built a corpus dubbed the Dual-Camera-Pose Dataset (DCP) to measure the realism of SurfelGAN-generated images. DCP covers scenarios in which two vehicles observe the same scene at the same time; Waymo used data from the first vehicle to reconstruct the scene and render surfel images at the exact poses of the second vehicle, producing around 1,000 pairs for judging pixel-wise accuracy.
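Pixel-wise accuracy on such pairs can be measured in several ways; the article doesn't name the metric, so this sketch assumes a simple mean absolute (L1) error between the image rendered at the second vehicle's pose and the second vehicle's real camera image. Images are modeled as nested lists of grayscale intensities to keep it dependency-free.

```python
def mean_abs_pixel_error(rendered, real):
    """Average absolute per-pixel difference between two same-sized images."""
    if len(rendered) != len(real) or any(len(a) != len(b) for a, b in zip(rendered, real)):
        raise ValueError("images must have the same shape")
    total = 0.0
    count = 0
    for row_a, row_b in zip(rendered, real):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count

def dataset_error(pairs):
    """Average the per-pair error over (rendered, real) image pairs."""
    return sum(mean_abs_pixel_error(r, g) for r, g in pairs) / len(pairs)
```

Lower is better. Because both images share the exact same pose, any difference reflects rendering quality rather than viewpoint mismatch, which is what makes the two-vehicle setup useful.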

The paper's coauthors report that when SurfelGAN-generated images were fed to an off-the-shelf vehicle detector, the highest-quality synthesized images scored on par with real images on the detection metric. On DCP, SurfelGAN also improved on the raw surfel renderings, producing images closer to the real ones at a range of distances. Moreover, the researchers showed that images from SurfelGAN could boost the average precision (a standard detection metric that summarizes precision across all recall levels as detections are ranked by confidence) of a vehicle detector from 11.9% to 13%.
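For reference, here is a minimal, non-interpolated version of average precision. It assumes detections have already been matched to ground-truth boxes (in practice, by an overlap threshold) and reduced to `(confidence, is_true_positive)` pairs; benchmark implementations add interpolation and other details, so treat this as a sketch of the idea rather than the evaluation code behind the reported numbers.

```python
def average_precision(scored_detections, num_ground_truth):
    """Non-interpolated AP: rank detections by confidence, record precision
    at each true positive, and average over all ground-truth objects."""
    ranked = sorted(scored_detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / num_ground_truth if num_ground_truth else 0.0
```

For example, detections `[(0.9, True), (0.8, False), (0.7, True)]` against two ground-truth vehicles give precisions of 1 and 2/3 at the two hits, so AP is about 0.83. On this scale, the reported move from 11.9% to 13% is a 1.1-point absolute gain, roughly 9% relative.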

Waymo notes that SurfelGAN isn't perfect. For instance, it's sometimes unable to recover from broken geometry, resulting in unrealistic-looking vehicles. And in the absence of surfel cues, the AI exhibits high variance, especially when it tries to hallucinate patterns uncommon in the dataset, like tall buildings. Despite this, the company's researchers believe it's a strong foundation for future dynamic object modeling and video generation simulation systems.

"Simulation is a vital tool in the advancement of self-driving technology that allows us to pick and replay the most interesting and complex scenarios from our over 20 million autonomous miles on public roads," the spokesperson said. "In such scenarios, the ability to accurately simulate the vehicle sensors [using methods like SurfelGAN] is very important."