Waymo's AI uses vectors to predict pedestrian, cyclist, and driver behavior

Today, a week after its cars resumed testing on public roads and days after it raised $750 million in capital, Waymo took the wraps off an AI model it claims "significantly" improved its driverless systems' ability to predict the behavior of pedestrians, cyclists, and drivers. Called VectorNet, it ostensibly provides more accurate projections while requiring less compute compared with previous approaches.

Anticipating road agents' future positions is table stakes for driverless cars, which by definition must navigate challenging environments without any human supervision. As tragically illustrated by the March 2018 collision involving an autonomous Uber vehicle and a pedestrian walking a bicycle, perception is critical. Without it, self-driving cars can't reliably make decisions about how to respond in familiar -- or unfamiliar -- scenarios.

VectorNet aims to help predict the movements of road users by building representations to encode information from maps, including real-time trajectories. Waymo, like rivals Cruise and Aurora, collects high-definition, precise-to-the-centimeter maps of regions where its autonomous vehicles drive. Paired with sensor data, these provide context to the Waymo Driver, Waymo's full-stack driverless system. But the maps can't be incorporated into prediction models until they've been rendered as images and encoded with scene information, like traffic signs, lanes, and road boundaries.

That's where VectorNet comes in. Unlike the convolutional neural networks it replaced, which operated on computationally expensive pixel renderings of maps, VectorNet ingests each map and sensor input in the form of vectors (sketches made up of points, lines, and curves based on mathematical equations).

Waymo uses vectors to represent road features as points, polygons, and curves. Lane boundaries contain multiple points that form a spline (i.e., curves added together to make larger continuous curves), crosswalks are polygons comprising at least two points, and stop signs are represented by a single point. These geographic entities can be approximated by polylines (connected series of line segments) made up of points, along with their attributes, while moving agents can be estimated by polylines based on their motion trajectories.
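
To make the polyline idea concrete, here is a minimal Python sketch of how map features might be turned into vectors. It is an illustration only, not Waymo's code: the `Vector` fields, the `kind` labels, and the choice to encode a single-point feature as a zero-length vector are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Vector:
    """One line segment of a polyline (hypothetical layout, for illustration)."""
    start: Point
    end: Point
    kind: str         # assumed attribute label, e.g. "lane_boundary"
    polyline_id: int  # which polyline this segment belongs to


def polyline_to_vectors(points: Sequence[Point], kind: str, polyline_id: int) -> List[Vector]:
    """Connect consecutive points into segments."""
    if len(points) == 1:
        # Assumption: a point feature (such as a stop sign) becomes a zero-length vector.
        return [Vector(points[0], points[0], kind, polyline_id)]
    return [Vector(a, b, kind, polyline_id) for a, b in zip(points, points[1:])]


def polygon_to_vectors(points: Sequence[Point], kind: str, polyline_id: int) -> List[Vector]:
    """Close the ring by linking the last point back to the first."""
    return polyline_to_vectors(list(points) + [points[0]], kind, polyline_id)


# Sampled points along a lane boundary, a stop sign, and a crosswalk outline.
lane = polyline_to_vectors([(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)], "lane_boundary", 0)
stop_sign = polyline_to_vectors([(5.0, 5.0)], "stop_sign", 1)
crosswalk = polygon_to_vectors([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)], "crosswalk", 2)
print(len(lane), len(stop_sign), len(crosswalk))  # prints: 2 1 4
```

An agent's observed trajectory could be passed through `polyline_to_vectors` in the same way, with timestamps added as an extra attribute.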

Graph neural networks operate directly on graphs, which are mathematical objects consisting of nodes and edges. Within VectorNet, a hierarchical graph neural network, each vector is treated as a node, and data from the maps -- along with agents' trajectories -- is propagated through the network to a target node. A designated output node corresponding to the target agent is used to decode the trajectories.

VectorNet first obtains polyline-level information before passing it on to a graph to model higher-order interactions among the polylines. It computes objects' future trajectories and captures the relationships among vectors, like when a car enters an intersection or a pedestrian approaches a crosswalk, which allows for better prediction of agents' behaviors.
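
The two-stage flow described above can be sketched in a few lines of Python. This is a toy with no learned weights, and the specific operations are assumptions rather than anything confirmed in Waymo's announcement: element-wise max pooling stands in for the polyline-level step, and a dot-product attention average stands in for the interaction step. The function names and feature values are hypothetical.

```python
import math
from typing import Dict, List


def pool_polyline(vector_feats: List[List[float]]) -> List[float]:
    """Stage 1: summarize one polyline by taking the element-wise max over its vectors."""
    return [max(column) for column in zip(*vector_feats)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def interact(polyline_feats: Dict[str, List[float]], target: str) -> List[float]:
    """Stage 2: mix every polyline's summary into the target node.

    Weights come from dot-product similarity with the target's own summary.
    """
    query = polyline_feats[target]
    names = list(polyline_feats)
    scores = [sum(q * k for q, k in zip(query, polyline_feats[n])) for n in names]
    weights = softmax(scores)
    return [
        sum(w * polyline_feats[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]


# Hypothetical two-dimensional features for each vector in three polylines.
scene = {
    "target_car": pool_polyline([[0.2, 0.1], [0.4, 0.3]]),
    "crosswalk": pool_polyline([[0.9, 0.0], [0.7, 0.2]]),
    "pedestrian": pool_polyline([[0.1, 0.8]]),
}
context = interact(scene, "target_car")
print(context)
```

In a real model, a trajectory decoder would consume `context`, the feature at the target agent's node. Here it only shows how information from the other polylines reaches that node.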

To further boost VectorNet's capabilities and understanding of the world, thereby improving its predictions, Waymo trained the system to learn from context clues to make inferences about what could happen near a vehicle. Company researchers randomly masked out map features during training, such as a stop sign at a four-way intersection, and required VectorNet to complete the missing elements. In validation tests against Waymo's own data set and startup Argo AI's Argoverse, VectorNet achieved 18% better performance than ResNet-18 (a popular convolutional neural network) while using 29% of the parameters (variables) and consuming 20% of the computation, on average.
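
The masking step described above might look something like the following Python sketch. It assumes each map feature has already been summarized as a list of numbers, that masking means zeroing that list, and that the model is scored with a mean squared error on the hidden features. None of these details come from Waymo's announcement, and `mask_polylines` and `reconstruction_loss` are hypothetical names.

```python
import random
from typing import Dict, List, Tuple

Features = Dict[str, List[float]]


def mask_polylines(feats: Features, mask_prob: float, rng: random.Random) -> Tuple[Features, Features]:
    """Hide each feature with probability mask_prob.

    Returns the masked scene plus the original values the model must recover.
    """
    masked: Features = {}
    targets: Features = {}
    for name, feat in feats.items():
        if rng.random() < mask_prob:
            targets[name] = list(feat)
            masked[name] = [0.0] * len(feat)  # assumption: masking zeroes the feature
        else:
            masked[name] = list(feat)
    return masked, targets


def reconstruction_loss(predicted: Features, targets: Features) -> float:
    """Mean squared error over the hidden features only."""
    errors = [
        (p - t) ** 2
        for name, target in targets.items()
        for p, t in zip(predicted[name], target)
    ]
    return sum(errors) / len(errors) if errors else 0.0


# Hypothetical scene features at a four-way intersection.
scene = {"stop_sign": [1.0, 0.0], "lane": [0.3, 0.7], "crosswalk": [0.5, 0.5]}
masked_scene, hidden = mask_polylines(scene, mask_prob=0.5, rng=random.Random(0))
# A model would predict the hidden features from masked_scene. Scoring the
# masked input itself shows the loss an untrained "predict zeros" model gets.
print(reconstruction_loss(masked_scene, hidden))
```

The loss is added on top of the trajectory prediction objective, so the network is pushed to infer a missing stop sign from the lanes and agents around it.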

"These improvements enable us to make better predictions, creating a safer and smoother experience for our riders, and even parcels we carry on behalf of our local delivery partners," said Waymo in a statement. "This will be especially beneficial as we expand to more cities, where we will continue encountering new scenarios and behavior patterns. VectorNet will allow us to better adapt to these new areas, enabling us to learn more efficiently and effectively and helping us achieve our goal of delivering fully self-driving technology to more people in more places."

This is not the first time Waymo has used AI to expedite workloads like perception, data augmentation, and search.

In early April, the company detailed Progressive Population Based Augmentation (PPBA), a system it claims has improved the performance of its object detection systems while reducing the amount of data required to train them. Waymo collaborated with DeepMind on PBT (Population Based Training), which managed to reduce false positives by 24% in pedestrian, bicyclist, and motorcyclist recognition tasks while cutting training time and computational resources in half. And Waymo previously spotlighted Content Search, which draws on tech similar to that powering Google Photos and Google Image Search to let data scientists quickly locate almost any object in Waymo's driving history and logs.
