McGill University researchers say they’ve developed a technique to train a remote-controlled, off-road car to drive over rough terrain using aerial and first-person imagery. Their hybrid approach accounts for terrain roughness and obstacles using on-board sensors, enabling it to generalize to environments with vegetation, rocks, and sandy trails.
The work is preliminary, but it might hold promise for autonomous vehicle companies that rely chiefly on camera footage to train their navigational AI. U.K.-based Wayve is in that camp, as are Tesla, Mobileye, and Comma.ai.
The researchers’ work combines elements of model-free and model-based AI training methods into a single graph to leverage the strength of both while offsetting their weaknesses. (As opposed to model-free methods, model-based methods have a software agent try to understand the world and create a model representing it, which sometimes leads to poor performance due to cascading errors.) Their model learns to navigate collision-free trajectories while favoring smooth terrain in a self-supervised fashion, such that the training data is labeled autonomously.
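To illustrate what autonomous labeling can look like in this setting, here is a minimal sketch in which sensor readings recorded during a drive are turned into terrain labels with no human annotation. The function name, the thresholds, and the use of vertical-acceleration variance as a roughness measure are assumptions made for illustration, not details taken from the researchers' work.

```python
from statistics import pvariance

# Hypothetical thresholds, chosen only for illustration.
ROUGHNESS_VARIANCE_THRESHOLD = 0.5   # variance of vertical acceleration, (m/s^2)^2
OBSTACLE_DISTANCE_THRESHOLD_M = 0.3  # lidar range treated as a near-collision


def label_segment(vertical_accel, min_lidar_range_m):
    """Return a terrain label for one short driving segment.

    vertical_accel: vertical acceleration samples (m/s^2) from an inertial sensor
    min_lidar_range_m: closest lidar return (meters) seen during the segment
    """
    # An obstacle label takes priority over any roughness estimate.
    if min_lidar_range_m < OBSTACLE_DISTANCE_THRESHOLD_M:
        return "obstacle"
    # A bumpy ride shows up as high variance in vertical acceleration.
    if pvariance(vertical_accel) > ROUGHNESS_VARIANCE_THRESHOLD:
        return "rough"
    return "smooth"


print(label_segment([0.1, -0.1, 0.05, -0.05], 2.0))
print(label_segment([2.0, -2.0, 1.5, -1.5], 2.0))
print(label_segment([0.1, -0.1, 0.05, -0.05], 0.2))
```

Because the labels come from the vehicle's own sensors, every image captured during a drive can be paired with a label automatically, which is what makes this style of training self-supervised.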
The researchers’ off-road vehicle is based on an electric, two-motor remote-controlled buggy with a mechanical brake. The buggy is wirelessly connected to an Intel i7 NUC computer running the open source Robot Operating System (ROS). It is equipped with a short-range lidar sensor, a forward-facing camera coupled with an inertial measurement unit, and a microcontroller that relays all sensor information to the NUC computer.
Before deploying the buggy on an all-terrain course, the team captured images of the course from an 80-meter height using a DJI Mavic Pro, and then they extracted 12-meter-by-9-meter patches of the images so that they could be oriented and centered. The images were taken at a resolution of 0.01 meters per pixel and were aligned within 0.1 meter, using four visual landmarks measured with the buggy.
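The reported figures imply a fixed patch size in pixels, and the arithmetic is easy to check. The sketch below converts the 12-meter-by-9-meter patch to pixels at 0.01 meters per pixel and computes crop bounds around a chosen center. Treating 12 meters as the row dimension, the function names, and the example image size are assumptions. Rotating the patch to match the vehicle's heading is left out.

```python
METERS_PER_PIXEL = 0.01     # resolution reported in the article
PATCH_SIZE_M = (12.0, 9.0)  # patch size reported in the article; (rows, cols) order is assumed


def patch_size_pixels(size_m=PATCH_SIZE_M, resolution=METERS_PER_PIXEL):
    """Convert a patch size in meters to whole pixels."""
    return tuple(int(round(side / resolution)) for side in size_m)


def patch_bounds(center_rc, image_shape, size_px):
    """Return (top, bottom, left, right) slice bounds for a patch centered at center_rc.

    Raises ValueError if the patch would extend beyond the image.
    """
    rows, cols = size_px
    top = center_rc[0] - rows // 2
    left = center_rc[1] - cols // 2
    bottom, right = top + rows, left + cols
    if top < 0 or left < 0 or bottom > image_shape[0] or right > image_shape[1]:
        raise ValueError("patch extends beyond the aerial image")
    return top, bottom, left, right


size_px = patch_size_pixels()
print(size_px)
# Hypothetical 2000 x 2000 pixel aerial image with the vehicle at its center.
print(patch_bounds((1000, 1000), (2000, 2000), size_px))
```

At this resolution each patch works out to 1,200 by 900 pixels, and the returned bounds can be used directly as slice indices into whatever image array holds the aerial photo.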
During training, the team’s model estimates terrain roughness using the inertial measurement unit while the lidar sensor measures the distance to obstacles. Given fused input images from the onboard camera and a local aerial view, a recent visual history, terrain class labels (e.g., “rough,” “smooth,” “obstacle”), and a sequence of steering commands, it predicts collision probabilities over a fixed horizon from which a policy or strategy can be derived.
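One simple way to derive a policy from such predictions is to score each candidate steering sequence by its expected cost and pick the cheapest. The sketch below does that under stated assumptions: the per-class penalties, the function names, and the hard-coded probabilities (which stand in for the output of a trained model) are all hypothetical and are not taken from the researchers' system.

```python
# Hypothetical penalties: hitting an obstacle is far worse than a bumpy ride.
COST = {"smooth": 0.0, "rough": 1.0, "obstacle": 10.0}


def expected_cost(predictions):
    """Sum the expected penalty over every future time step.

    predictions: one dict per time step, mapping terrain class -> probability
    """
    return sum(COST[cls] * prob for step in predictions for cls, prob in step.items())


def choose_steering(candidates):
    """Pick the steering sequence whose predicted outcome has the lowest expected cost.

    candidates: dict mapping a steering sequence (tuple of angles) to its predictions
    """
    return min(candidates, key=lambda seq: expected_cost(candidates[seq]))


# Made-up predictions for two candidate sequences over a two-step horizon.
candidates = {
    (0.0, 0.0): [
        {"smooth": 0.5, "rough": 0.3, "obstacle": 0.2},
        {"smooth": 0.4, "rough": 0.2, "obstacle": 0.4},
    ],
    (-0.3, -0.3): [
        {"smooth": 0.9, "rough": 0.1, "obstacle": 0.0},
        {"smooth": 0.8, "rough": 0.2, "obstacle": 0.0},
    ],
}
print(choose_steering(candidates))
```

With these made-up numbers, the straight-ahead sequence carries a high obstacle risk, so the turning sequence is selected. Weighting obstacles more heavily than rough ground is one way to get a vehicle that avoids collisions first and prefers smooth terrain second.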
In a real-world field trial, the researchers had the buggy drive at a speed of 6 kilometers per hour (~3.7 miles per hour) after training on 15,000 data samples collected over 5.25 kilometers (~3.3 miles). They report that the navigational model achieved a prediction accuracy of 60% to 78% using the forward ground camera and that when the aerial imagery was incorporated, accuracy increased by around 10% for trajectories with angle changes of 45 degrees or higher. The resulting policy drove on smooth terrain 90% of the time and reduced the proportion of rough terrain traversed by a factor of more than 6.1 compared with a model using only first-person imagery.