Uber's self-driving AI predicts the trajectories of pedestrians, vehicles, and cyclists

In a preprint paper, Uber researchers describe MultiNet, a system that detects obstacles and predicts their motion from autonomous vehicle lidar data. They say that unlike existing models, MultiNet reasons about the uncertainty in the behavior and movement of cars, pedestrians, and cyclists, using a model that first infers detections and predictions and then refines them to generate potential trajectories.

Anticipating the future states of obstacles is a challenging task, but it's key to preventing accidents on the road. Within the context of a self-driving vehicle, a perception system has to capture a range of trajectories other actors might take rather than a single likely trajectory. For example, an opposing vehicle approaching an intersection might continue driving straight or turn in front of an autonomous vehicle; in order to ensure safety, the self-driving vehicle needs to reason about these possibilities and adjust its behavior accordingly.

MultiNet takes lidar sensor data and high-definition street maps as input and jointly learns obstacle trajectories and their uncertainties. For vehicles (but not pedestrians or cyclists), a second stage then refines the result: it discards the first-stage trajectory predictions, keeps the inferred object centers and headings, normalizes them, and feeds them through an algorithm that produces the final future trajectory and uncertainty predictions.
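The article does not say how the normalization step works. One plausible reading, offered here only as an assumption, is that each vehicle's surroundings are re-expressed in a frame centered on the vehicle's inferred center with the x-axis along its inferred heading, so that the refinement stage sees every vehicle in the same pose. The minimal Python sketch below illustrates that idea; the function names `to_actor_frame` and `to_world_frame` are hypothetical and are not taken from the paper.

```python
import math


def to_actor_frame(points, center, heading):
    """Express world (x, y) points in a frame centered on the actor.

    Assumption for illustration: the actor frame has its origin at the
    inferred object center and its x-axis along the inferred heading
    (radians, measured counterclockwise from the world x-axis).
    """
    cx, cy = center
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    result = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        # Translate to the center, then rotate by -heading.
        result.append((cos_h * dx + sin_h * dy, -sin_h * dx + cos_h * dy))
    return result


def to_world_frame(points, center, heading):
    """Inverse of to_actor_frame: map actor-frame points back to the world."""
    cx, cy = center
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    result = []
    for x, y in points:
        # Rotate by +heading, then translate back to the center.
        result.append((cos_h * x - sin_h * y + cx, sin_h * x + cos_h * y + cy))
    return result


# A vehicle at (10, 5) facing "north" (+y). A point 2 m north of it
# lies 2 m straight ahead in the actor frame.
ahead = to_actor_frame([(10.0, 7.0)], center=(10.0, 5.0), heading=math.pi / 2)
```

Under this assumption, trajectories predicted in the actor frame would be mapped back with `to_world_frame` before being used downstream.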

To test MultiNet's performance, the researchers trained the system for a day on ATG4D, a data set containing sensor readings from 5,500 scenarios collected by Uber's autonomous vehicles across cities in North America using a roof-mounted lidar sensor. They report that MultiNet outperformed several baselines by a significant margin in prediction accuracy on all three obstacle types (vehicles, pedestrians, and cyclists). Concretely, modeling uncertainty led to improvements of 9% to 13%, and it allowed the system to reason about the inherent noise of future traffic movement.

"[In one case, an] actor approaching an intersection [made] a right-hand turn, where [a baseline system] incorrectly predicted that they will continue moving straight through the intersection. On the other hand, MultiNet predicted a very accurate turning trajectory with high certainty, while also allowing for the possibility of going-straight behavior," the researchers noted. "[Another] actor made an unprotected left turn towards the self-driving vehicle, which IntentNet mispredicted. Conversely, we see that MultiNet produced both possible modes, including a turning trajectory with large uncertainty due to the unusual shape of the intersection."
