Researchers detail LaND, AI that learns from autonomous vehicle disengagements

UC Berkeley AI researchers say they've created a navigation system for autonomous vehicles operating in unseen, real-world environments that outperforms leading methods for delivery robots driving on sidewalks. Called LaND, for Learning to Navigate from Disengagements, the system studies disengagement events and then predicts when future disengagements will happen. The approach is meant to provide what the researchers call a needed shift in perspective about disengagements for the AI community.

A disengagement is any instance in which an autonomous system encounters challenging conditions and must turn control back over to a human operator. Disengagement events are a contested, and some say outdated, metric for measuring the capabilities of an autonomous vehicle system. AI researchers often treat disengagements as a signal for troubleshooting or debugging navigation systems for delivery robots on sidewalks or autonomous vehicles on roads, but LaND treats disengagements as part of its training data.

Doing so, according to engineers from Berkeley AI Research, allows the robot to learn from datasets collected naturally during the testing process. Other systems have learned directly from training data gathered from onboard sensors, but the researchers say that approach can require a lot of labeled data and be expensive.

"Our results demonstrate LaND can successfully learn to navigate in diverse, real world sidewalk environments, outperforming both imitation learning and reinforcement learning approaches," the paper reads. "Our key insight is that if the robot can successfully learn to execute actions that avoid disengagement, then the robot will successfully perform the desired task. Crucially, unlike conventional reinforcement learning algorithms, which use task-specific reward functions, our approach does not even need to know the task -- the task is specified implicitly through the disengagement signal. However, similar to standard reinforcement learning algorithms, our approach continuously improves because our learning algorithm reinforces actions that avoid disengagements."

LaND uses reinforcement learning, but rather than seeking a reward, it treats each disengagement event as a way to learn directly from sensor inputs such as camera images, while taking into account factors like steering angle and whether autonomy mode was engaged. The researchers detailed LaND in a paper and code published last week on the preprint repository arXiv.
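
The idea of choosing actions that avoid predicted disengagements can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `toy_model` is a hypothetical stand-in for a learned predictor (it is not the network from the paper), and the random-search planner in `plan_steering` is a simple substitute chosen for brevity, not necessarily the planning method the researchers used.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: (observation, steering sequence) -> one predicted
# disengagement probability per future step.
Model = Callable[[Sequence[float], Sequence[float]], List[float]]


def toy_model(observation: Sequence[float], steering: Sequence[float]) -> List[float]:
    """Toy predictor (an assumption, not a trained model).

    observation[0] is treated as the robot's lateral offset from the center of
    the sidewalk. Each steering command shifts that offset, and the predicted
    probability of disengagement grows as the robot drifts away from center.
    """
    offset = observation[0]
    probs = []
    for angle in steering:
        offset += angle
        probs.append(1.0 - math.exp(-offset * offset))
    return probs


def plan_steering(
    model: Model,
    observation: Sequence[float],
    horizon: int = 5,
    num_candidates: int = 200,
    max_angle: float = 0.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Pick the steering sequence with the lowest summed predicted risk.

    Starts from "drive straight" and keeps any randomly sampled sequence that
    the model scores as less likely to end in a disengagement.
    """
    rng = random.Random(seed)
    best_seq = [0.0] * horizon
    best_cost = sum(model(observation, best_seq))
    for _ in range(num_candidates):
        seq = [rng.uniform(-max_angle, max_angle) for _ in range(horizon)]
        cost = sum(model(observation, seq))
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq, best_cost


if __name__ == "__main__":
    # Robot starts one unit off-center; the planner searches for steering
    # commands that the toy model predicts are less likely to disengage.
    sequence, risk = plan_steering(toy_model, [1.0])
    print(sequence, risk)
```

Note that no task-specific reward appears anywhere in the sketch: the only quantity being minimized is the predicted chance of a human taking over, which mirrors the article's point that the task is specified implicitly through the disengagement signal.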

The team collected training data to build LaND by driving a Clearpath Jackal robot on the sidewalks of Berkeley. A human safety driver escorted the robot to reset its course or take over driving for a short period if the robot drove into a street, driveway, or other obstacle. In all, nearly 35,000 data points were collected and nearly 2,000 disengagements were produced during the LaND training on Berkeley sidewalks. Delivery robot startup Kiwibot also operates at UC Berkeley and on nearby sidewalks.
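
As a rough illustration of how disengagements might be turned into training labels, the Python sketch below marks a logged time step as a disengagement when autonomy mode was engaged at that step and switched off at the next one. The `Sample` record and its field names are hypothetical and are not taken from the researchers' released code or data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One logged time step (hypothetical schema for illustration)."""
    steering_angle: float
    autonomy_engaged: bool


def label_disengagements(samples: List[Sample]) -> List[bool]:
    """Return one label per sample: True where a human took over next.

    A step counts as a disengagement if autonomy was engaged at that step
    and is no longer engaged at the following step. The final sample has
    no successor, so it is always labeled False.
    """
    labels = []
    for i, sample in enumerate(samples):
        has_next = i + 1 < len(samples)
        took_over = has_next and not samples[i + 1].autonomy_engaged
        labels.append(bool(sample.autonomy_engaged and took_over))
    return labels


if __name__ == "__main__":
    log = [
        Sample(0.0, True),
        Sample(0.1, True),    # human takes over after this step
        Sample(-0.2, False),
        Sample(0.0, False),
        Sample(0.0, True),
    ]
    print(label_disengagements(log))
```

Labels of this kind come for free from ordinary test drives, which is the sense in which the data is collected naturally rather than annotated by hand.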

Compared with a deep reinforcement learning algorithm (Kendall et al.) and behavioral cloning, a common method of imitation learning, initial experiments showed that LaND traveled longer distances on sidewalks before disengaging.
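
One simple way to quantify "distance traveled before disengaging" is the average distance covered per disengagement. The Python sketch below computes that figure from a log of per-step distances and disengagement flags. The log format and the function name are assumptions made for illustration, and the paper's exact evaluation protocol may differ.

```python
from typing import List, Tuple


def mean_distance_per_disengagement(log: List[Tuple[float, bool]]) -> float:
    """Average distance traveled per disengagement.

    Each log entry is (distance covered during the step in meters, whether
    the step ended in a disengagement). Returns infinity if the run had no
    disengagements at all.
    """
    total_distance = sum(distance for distance, _ in log)
    disengagements = sum(1 for _, disengaged in log if disengaged)
    if disengagements == 0:
        return float("inf")
    return total_distance / disengagements


if __name__ == "__main__":
    # Made-up numbers for illustration only, not results from the paper.
    run = [(1.0, False), (2.0, True), (3.0, False), (4.0, True)]
    print(mean_distance_per_disengagement(run))
```

Under this measure, a higher number means the robot drove farther on its own between human interventions.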

In future work, the authors say LaND can be combined with existing navigation systems, particularly leading imitation learning methods that use data from experts for improved results. Investigating ways for the robot to alert its handlers when it needs human monitoring could also lower costs.

In other recent work focused on keeping training costs down for robotic systems, a group of UC Berkeley AI researchers in August created a simple method that uses an $18 reacher-grabber and a GoPro to collect training data for robotic grasping systems. Last year, Berkeley researchers including Pieter Abbeel, a coauthor of the LaND research, introduced Blue, a general purpose robot that costs a fraction of existing robot systems.
