Robots don’t plan ahead as well as humans, but they’re becoming better at it. That’s the gist of a trio of academic papers Google’s robotics research division highlighted in a blog post this afternoon. Taken together, the authors say, they lay the groundwork for robots capable of navigating long distances by themselves.

“In the United States alone, there are three million people with a mobility impairment that prevents them from ever leaving their homes,” senior research scientist Aleksandra Faust and senior robotics software engineer Anthony Francis wrote. “[Machines could] improve the independence of people with limited mobility, for example, by bringing them groceries, medicine, and packages.”

How? In part by using reinforcement learning (RL), an AI training technique that employs rewards to drive agents toward goals. Faust, Francis, and colleagues combined RL with long-range planning to produce planner agents that can safely traverse short distances (up to 15 meters) without colliding with moving obstacles. They tapped AutoRL, a tool that automates the search for RL rewards and neural network architectures, to train those agents in a simulated environment. They then used the trained agents to build roadmaps: graphs whose nodes are locations and whose edges connect two nodes only if the agents can reliably travel between them.
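To make the roadmap idea concrete, here is a minimal Python sketch. It assumes 2D (x, y) node positions in meters and a hypothetical `can_traverse(a, b)` predicate standing in for “the trained agent reliably gets from a to b.” Only the 15-meter range comes from the article; the function names and the choice of directed edges are illustrative.

```python
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) in meters

# Range of the short-distance planners, per the article (up to 15 m).
MAX_EDGE_LENGTH_M = 15.0


def build_roadmap(
    nodes: List[Point],
    can_traverse: Callable[[Point, Point], bool],
) -> Dict[int, List[int]]:
    """Return an adjacency list over node indices.

    An edge i -> j is added only if the nodes are within planner range and the
    hypothetical `can_traverse(a, b)` check passes. Edges are directed because
    reaching b from a does not guarantee the reverse trip succeeds.
    """
    graph: Dict[int, List[int]] = {i: [] for i in range(len(nodes))}
    for i, a in enumerate(nodes):
        for j, b in enumerate(nodes):
            if i != j and math.dist(a, b) <= MAX_EDGE_LENGTH_M and can_traverse(a, b):
                graph[i].append(j)
    return graph


nodes = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (40.0, 0.0)]
# Toy stand-in: treat every in-range hop as reliable.
roadmap = build_roadmap(nodes, lambda a, b: True)
print(roadmap)  # {0: [1], 1: [0, 2], 2: [1], 3: []}
```

In a real system, `can_traverse` would be the expensive part, since it means actually running the learned policy between the two locations.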

It’s easier said than done. As the researchers point out, training agents with traditional RL approaches poses several challenges. Engineers must spend time iterating on and hand-tuning rewards, make poorly informed decisions about neural network architectures, and mitigate “catastrophic forgetting,” a phenomenon in which AI systems abruptly lose previously learned information upon learning new information.

AutoRL attempts to address these problems in two phases: reward search and neural network architecture search. During the first phase, it trains agents concurrently over several generations, each agent with a slightly different reward function. At the end of the phase, the reward that leads agents to their destination most often is selected. The architecture search phase essentially repeats the first, but it uses the selected reward to tune the network and optimizes for cumulative reward.
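The reward-search phase can be pictured as a simple generational search. The sketch below is not Google’s AutoRL code, which the article does not detail. It assumes the reward function is parameterized by a vector of weights and uses a hypothetical `success_rate(weights)` callback in place of actually training and evaluating an RL agent; candidates are also scored one after another rather than concurrently.

```python
import random
from typing import Callable, List

RewardWeights = List[float]


def reward_search(
    success_rate: Callable[[RewardWeights], float],
    initial: RewardWeights,
    generations: int = 10,
    population: int = 100,
    noise: float = 0.1,
    seed: int = 0,
) -> RewardWeights:
    """Generational search over reward-function weights (illustrative only).

    Each generation perturbs the previous generation's best weights into
    `population` slightly different reward functions, scores each with the
    hypothetical `success_rate` stand-in ("train an agent with this reward,
    then measure how often it reaches its goal"), and keeps the best so far.
    """
    rng = random.Random(seed)
    best = list(initial)
    best_score = success_rate(best)
    for _ in range(generations):
        parent = best  # every candidate in this generation derives from one parent
        for _ in range(population):
            candidate = [w + rng.gauss(0.0, noise) for w in parent]
            score = success_rate(candidate)
            if score > best_score:
                best, best_score = candidate, score
    return best


# Toy objective: success peaks at "ideal" weights unknown to the search.
target = [1.0, -0.5]


def toy_success_rate(w: RewardWeights) -> float:
    return 1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(w, target)))


best = reward_search(toy_success_rate, initial=[0.0, 0.0])
print(round(toy_success_rate(best), 2))
```

The default of 10 generations of 100 candidates mirrors the scale the article cites; the perturbation scheme itself is an assumption.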

Above: Automating reinforcement learning with reward and neural network architecture search.

Image Credit: Google

The process isn’t particularly efficient: AutoRL training over 10 generations of 100 agents requires five billion samples, or the equivalent of 32 years of training. But importantly, it’s automated. The models don’t experience catastrophic forgetting, and the resulting policies are of “higher quality” than those from prior work (up to 26 percent better in navigation tasks). They’re even robust enough to guide robots through unstructured environments they’ve never seen before.
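A quick back-of-the-envelope check shows how these figures fit together. Assuming the “32 years” refers to real-time robot experience at a constant step rate (an assumption; the article does not say), the numbers imply roughly five million samples per agent and about five samples per second:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year

total_samples = 5e9   # figure quoted in the article
agents = 10 * 100     # 10 generations x 100 agents
years = 32            # quoted training-time equivalent

samples_per_agent = total_samples / agents
# Assumes the 32 years is wall-clock experience at a constant rate.
implied_rate_hz = total_samples / (years * SECONDS_PER_YEAR)

print(f"{samples_per_agent:,.0f} samples per agent")  # 5,000,000 samples per agent
print(f"~{implied_rate_hz:.1f} samples per second")    # ~5.0 samples per second
```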

The policies AutoRL produces are great for local navigation, but what about long-range navigation? That’s where probabilistic roadmaps (PRMs) come in. They’re a subcategory of sampling-based planners (which approximate robot motions) that sample robot poses and connect them with “feasible transitions,” creating roadmaps tuned to the unique abilities and geometry of a robot. Combined with RL-based local planners, they allow robots to be trained once locally and subsequently adapted to different environments.
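The sampling half of a probabilistic roadmap is straightforward. The sketch below assumes the floor plan is a small character grid (`#` for obstacles, `.` for free space) and draws random (x, y, heading) poses from free cells; connecting those poses would then require a feasibility check, such as running the local planner between them. The grid encoding and function name are illustrative assumptions, not part of PRM-RL.

```python
import math
import random
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (x, y, heading in radians)


def sample_free_poses(
    grid: List[str], n: int, cell_size_m: float = 1.0, seed: int = 0
) -> List[Pose]:
    """Draw n random poses uniformly from the free cells of an occupancy grid.

    Assumed encoding: '.' marks free space and '#' marks an obstacle.
    """
    free_cells = [
        (r, c) for r, row in enumerate(grid) for c, ch in enumerate(row) if ch == "."
    ]
    if not free_cells:
        raise ValueError("floor plan has no free space")
    rng = random.Random(seed)
    poses: List[Pose] = []
    for _ in range(n):
        r, c = rng.choice(free_cells)
        x = (c + rng.random()) * cell_size_m  # random point inside the chosen cell
        y = (r + rng.random()) * cell_size_m
        theta = rng.uniform(-math.pi, math.pi)
        poses.append((x, y, theta))
    return poses


floor_plan = [
    "#####",
    "#...#",
    "#.#.#",
    "#...#",
    "#####",
]
poses = sample_free_poses(floor_plan, n=20, cell_size_m=0.5)
```

Sampling headings as well as positions matters because a robot’s ability to reach a location can depend on which way it is facing.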

“First, for each robot, we train a local planner policy in a generic simulated training environment,” Faust and Francis explained. “Next, we build a PRM with respect to that policy, called a PRM-RL, over a floor plan for the deployment environment. The same floor plan can be used for any robot we wish to deploy in the building in a one time per robot+environment setup.”
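Once a roadmap exists, a long-range trip reduces to a graph search followed by a series of short hops. The quote doesn’t specify how routes are queried, so the following assumes a standard Dijkstra search with straight-line edge costs; each consecutive pair of waypoints in the result would then be handed to the robot’s local RL planner. Node names and coordinates are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def shortest_waypoints(
    nodes: Dict[str, Point],
    edges: Dict[str, List[str]],
    start: str,
    goal: str,
) -> Optional[List[str]]:
    """Dijkstra over the roadmap; edge cost is straight-line distance (assumed)."""
    dist = {start: 0.0}
    prev: Dict[str, str] = {}
    frontier = [(0.0, start)]
    visited = set()
    while frontier:
        d, u = heapq.heappop(frontier)
        if u in visited:
            continue
        visited.add(u)
        if u == goal:
            # Walk predecessor links back to the start, then reverse.
            path = [u]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return path[::-1]
        for v in edges.get(u, []):
            nd = d + math.dist(nodes[u], nodes[v])
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(frontier, (nd, v))
    return None  # goal unreachable on this roadmap


nodes = {"lobby": (0, 0), "hall": (10, 0), "lab": (10, 12), "kitchen": (22, 0), "office": (20, 12)}
edges = {
    "lobby": ["hall"],
    "hall": ["lobby", "lab", "kitchen"],
    "lab": ["hall", "office"],
    "kitchen": ["hall", "office"],
    "office": ["lab", "kitchen"],
}
print(shortest_waypoints(nodes, edges, "lobby", "office"))  # ['lobby', 'hall', 'lab', 'office']
```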

The newest iteration of PRM-RL takes things a step further by replacing the hand-tuned models with AutoRL-trained local planners, which improves long-range navigation. Additionally, it adds simultaneous localization and mapping (SLAM) maps as a source for building the aforementioned roadmaps.

To evaluate PRM-RL, researchers at Google built a roadmap using floor maps of offices up to 200 times larger than the training environments, accepting only edges that succeeded in at least 90 percent of 20 trials. Over distances of 100 meters, PRM-RL’s success rate was 2 to 3 times that of baseline methods. And in real-world tests with multiple robots at real building sites, the machines were “very robust,” except near cluttered areas off the edge of the map.
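The edge-acceptance rule (at least 90 percent success over 20 trials, i.e. at least 18 successes) amounts to a Monte Carlo test. A minimal sketch, assuming a hypothetical `run_trial` callback that runs the local planner once along a candidate edge, for example in a noisy simulation, and reports whether it arrived:

```python
import random
from typing import Callable


def accept_edge(
    run_trial: Callable[[random.Random], bool],
    trials: int = 20,
    min_success: float = 0.9,
    seed: int = 0,
) -> bool:
    """Keep a roadmap edge only if the local planner succeeds often enough.

    `run_trial` is a hypothetical stand-in for one simulated traversal; it
    receives a seeded RNG so that noisy trials are reproducible.
    """
    rng = random.Random(seed)
    successes = sum(bool(run_trial(rng)) for _ in range(trials))
    return successes / trials >= min_success


# Hypothetical noisy planner that reaches the far node about 95% of the time.
print(accept_edge(lambda rng: rng.random() < 0.95))
```

Raising the trial count tightens the estimate of each edge’s reliability at the cost of more simulation time per candidate edge.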

“We can achieve this by development of easy-to-adapt robotic autonomy, including methods that can be deployed in new environments using information that is already available,” Faust and Francis wrote. “This is done by automating the learning of basic, short-range navigation behaviors with AutoRL and using these learned policies in conjunction with SLAM maps to build roadmaps … The result is a policy that once trained can be used across different environments and can produce a roadmap custom-tailored to the particular robot.”

VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact.