Google's AI learns how to navigate environments from limited data

Carnegie Mellon, Google, and Stanford researchers write in a paper that they've developed a framework for using weak supervision -- a form of AI training where the model learns from large amounts of limited, imprecise, or noisy data -- that enables robots to efficiently explore a challenging environment. By teaching the robots to reach only the areas of their surroundings that are relevant, the researchers say their approach speeds up training on various robot manipulation tasks.

The team's framework -- Weakly-Supervised Control (WSC) -- learns a representation space in which a software agent can generate its own goals and perform exploration. It incorporates reinforcement learning, a form of training that spurs agents to accomplish goals via rewards. But unlike traditional reinforcement learning, which requires hand-designed rewards that are costly to obtain, WSC frames the weakly supervised learning problem in a way that provides a form of supervision that scales with the collection of data -- and doesn't require labels in the reinforcement learning loop.
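To make the goal-generation idea concrete, here is a minimal sketch, not the researchers' implementation. It assumes the agent's observations are encoded as a short list of learned factors, that goals are sampled uniformly within per-factor bounds, and that the reward is the negative distance to the goal measured only over the factors a user has marked as relevant. The function names, bounds, and factor indices are all hypothetical.

```python
import math
import random


def sample_goal(bounds, rng):
    """Draw one goal by sampling each latent factor uniformly within its bounds."""
    return [rng.uniform(low, high) for low, high in bounds]


def latent_reward(z_state, z_goal, relevant_factors):
    """Negative Euclidean distance, measured only over the relevant factors."""
    squared = sum((z_state[i] - z_goal[i]) ** 2 for i in relevant_factors)
    return -math.sqrt(squared)


# Hypothetical 3-factor representation; the user marks factors 0 and 1 as relevant.
rng = random.Random(0)
bounds = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
relevant_factors = [0, 1]

z_goal = sample_goal(bounds, rng)
z_state = [0.5, 0.5, 0.5]  # stand-in for an encoder's output on the current image
print(latent_reward(z_state, z_goal, relevant_factors))
```

Because the reward is computed from the encoded state alone, no human label is needed at each step of the reinforcement learning loop. Ignoring the irrelevant factors keeps the agent from being rewarded for changes that don't matter to the task.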

In experiments, the researchers sought to determine whether weak supervision was necessary for learning a disentangled state representation -- i.e., a set of features in which each one captures a distinct factor of the environment, such as a door's angle. They evaluated several models on simulated vision-based, goal-conditioned manipulation tasks of varying complexity. In one environment, agents had to move a specific object to a goal location, while in another they had to open a door to match a goal angle.

The coauthors report that WSC learned more quickly than prior state-of-the-art goal-conditioned reinforcement learning methods, particularly as the complexity of the agents' various environments grew. Moreover, they say that WSC attained a higher correlation between latent goals and final states, indicating that it learned a more interpretable goal-conditioned policy.
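One way to read the correlation claim above: for each episode, compare the goal value the policy was given with the value the environment actually ended at, then measure how closely the two track each other. The sketch below computes a plain Pearson correlation in pure Python. The article doesn't specify the exact metric the coauthors used, and the numbers here are made up for illustration; they are not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only: one latent goal coordinate per episode, and the
# matching coordinate of the state the agent finished in.
latent_goal_x = [0.1, 0.4, 0.5, 0.9]
final_state_x = [0.12, 0.41, 0.47, 0.93]
print(pearson(latent_goal_x, final_state_x))
```

A value near 1 would mean that changing a goal coordinate moves the final state in a predictable way, which is the sense in which such a policy is easier to interpret.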

However, the researchers concede that WSC isn't without its limitations. It requires a user to indicate the factors relevant for downstream tasks, which might require expertise, and it uses weak supervision only during pretraining, which might produce representations that don't generalize to new interactions the agent encounters later. That said, they hope in future work to investigate other forms of weak supervision that can provide useful signals to agents, as well as other ways to leverage these labels.

"Given the promising results in increasingly complex environments, evaluating this approach with robots in real-world environments is an exciting future direction," wrote the coauthors. "Overall, we believe that our framework provides a new perspective on supervising the development of general-purpose agents acting in complex environments."