In a study published this week on the preprint server arXiv.org, Google and University of California, Berkeley researchers propose a framework that combines learning-based perception with model-based controls to enable wheeled robots to autonomously navigate around obstacles. They say it generalizes well to avoiding unseen buildings and humans in both simulation and real-world environments and that it leads to better and more data-efficient behaviors than a purely learning-based approach.
As the researchers explain, autonomous robot navigation has the potential to enable many critical robot applications, from service robots that deliver food and medicine to logistical and search robots for rescue missions. In these applications, it’s imperative for robots to work safely among humans and to adjust their movements based on observed human behavior. For example, if a person is turning left, the robot should pass the human to the right to avoid cutting them off, and when a person is moving in the same direction as the robot, the robot should maintain a safe distance between itself and the person.
To this end, the researchers’ framework leverages a data set aptly dubbed the Human Active Navigation Dataset (HumANav), which consists of scans of 6,000 synthetic but realistic humans placed in office buildings. (Building mesh scans were sampled from the open source Stanford Large Scale 3D Indoor Spaces Dataset, but any textured building meshes are supported.) It allows users to manipulate the human agents within the building and provides photorealistic renderings via a standard camera, ensuring that important visual cues associated with human movement are present in images, such as the fact that when someone walks quickly their legs will be farther apart than if they’re moving slowly.
For the above-mentioned synthetic humans, the team turned to the SURREAL Dataset, which renders images of people in a variety of poses, genders, body shapes, and lighting conditions. The images come from real human motion capture data and contain a variety of actions, like running, jumping, dancing, acrobatics, and walking, with adjustable variables — including position, orientation, and angular speed.
After the framework generates waypoints and their associated trajectories, it renders the images recorded by the robot’s camera at each state along the trajectory and saves the trajectory, along with the optimal waypoint. The trajectory and waypoint are used to train a machine learning model that facilitates reasoning about human motion.
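As a rough illustration of the data-collection step described above, the following Python sketch pairs a rendered camera image at each state along a trajectory with the optimal waypoint. It is not the researchers' code: the `Sample` record, the `build_samples` helper, the `(x, y, heading)` state layout, and the `fake_render` stand-in for a photorealistic renderer are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumed robot state layout: (x, y, heading). The paper's actual state may differ.
State = Tuple[float, float, float]
# Placeholder type for a rendered camera frame.
Image = List[List[float]]


@dataclass
class Sample:
    """One hypothetical training record: what the camera sees, and where to go."""
    image: Image
    state: State
    waypoint: State


def build_samples(
    trajectory: Sequence[State],
    waypoint: State,
    render: Callable[[State], Image],
) -> List[Sample]:
    """Render an image at every state on the trajectory and attach the waypoint."""
    return [Sample(image=render(s), state=s, waypoint=waypoint) for s in trajectory]


def fake_render(state: State) -> Image:
    """Stand-in renderer that just encodes the state into a tiny 2x2 'image'."""
    x, y, heading = state
    return [[x, y], [heading, 0.0]]


trajectory = [(0.0, 0.0, 0.0), (0.5, 0.1, 0.2), (1.0, 0.3, 0.4)]
waypoint = (1.0, 0.3, 0.4)
samples = build_samples(trajectory, waypoint, fake_render)
```

In a real pipeline the renderer would produce the robot's camera view of the building and the human, and the resulting image and waypoint pairs would serve as supervised training data for the model.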
In experiments, the researchers generated 180,000 samples and trained a model — LB-WayPtNav-DH — on 125,000 of them in simulation. When deployed on a Turtlebot 2 robot without fine-tuning or additional training in two never-before-seen buildings, the model succeeded in 10 trials by “exhibiting behavior [that] takes into account the dynamic nature of the human agent.” Concretely, in one instance, it avoided a collision with a human by moving in the opposite direction, and in another, it took a larger turn radius around a corner to leave space for a person.
The team says their framework results in smoother trajectories than prior work and doesn’t require explicit state estimation or trajectory prediction of humans, leading to more reliable performance. Moreover, they say the agent can learn to reason about the dynamic nature of humans, taking into account people’s anticipated motions while planning its own path.
“In future work, it would be interesting to learn richer navigation behaviors in more complex and crowded scenes,” wrote the coauthors. “Dealing with noise in robot state estimation will be another interesting future direction.”
Google isn’t the only tech giant pursuing autonomous robot navigation research. Facebook recently released a simulator — AI Habitat — that can train AI agents embodying things like a home robot to operate in environments meant to mimic real-world apartments and offices. And in a paper published last December, Amazon researchers described a home robot that asks questions when it’s confused about where to go.