Researchers from Carnegie Mellon, Google, and Stanford describe in a paper a framework that uses weak supervision, a form of AI training in which the model learns from large amounts of limited, imprecise, or noisy data, to help robots efficiently explore challenging environments. By teaching robots to reach only the parts of their surroundings that are relevant to the task at hand, the researchers say, their approach speeds up training on a range of robot manipulation tasks.
The team's framework, Weakly-Supervised Control (WSC), learns a representation of the environment from which a software agent can generate its own goals and carry out exploration. It builds on reinforcement learning, a form of training that spurs agents to accomplish goals via rewards. But unlike conventional reinforcement learning, which requires hand-designed rewards that are costly to obtain, WSC frames the weakly supervised learning problem so that the supervision it provides scales with data collection, and it doesn't require labels inside the reinforcement learning loop.
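To make the idea of self-generated goals and label-free rewards more concrete, here is a minimal sketch of how an agent might propose goals in a learned latent space and score its own progress. It is not the authors' implementation. It assumes, hypothetically, an encoder that maps an observation to a latent vector whose leading dimensions hold the task-relevant factors a user has flagged, and it uses negative Euclidean distance as the reward; the names `sample_goal`, `latent_reward`, and `identity_encoder` are illustrative only.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def sample_goal(relevant_dims: int, low: float = -1.0, high: float = 1.0,
                rng: Optional[random.Random] = None) -> Vector:
    """Propose a self-generated goal by sampling only the task-relevant latent factors.

    Assumption (hypothetical): each relevant factor is normalized to [low, high].
    """
    rng = rng or random.Random()
    return [rng.uniform(low, high) for _ in range(relevant_dims)]


def latent_reward(encode: Callable[[Sequence[float]], Vector],
                  observation: Sequence[float], goal: Vector) -> float:
    """Label-free reward: negative Euclidean distance between the goal and the
    task-relevant slice of the encoded observation. The maximum value is 0."""
    z = encode(observation)[: len(goal)]  # keep only the user-flagged factors
    return 0.0 - math.dist(z, goal)


if __name__ == "__main__":
    # Hypothetical encoder: treats the raw observation as its own latent vector.
    def identity_encoder(obs: Sequence[float]) -> Vector:
        return list(obs)

    rng = random.Random(0)
    goal = sample_goal(relevant_dims=2, rng=rng)
    near = [goal[0], goal[1], 0.7]        # third entry stands in for an irrelevant factor
    far = [goal[0] + 1.0, goal[1], -0.3]  # relevant factors are one unit off
    print(latent_reward(identity_encoder, near, goal))  # zero distance: maximum reward of 0
    print(latent_reward(identity_encoder, far, goal))   # roughly -1.0
```

Because the reward only looks at the flagged dimensions, changes in irrelevant factors (the third entry above) do not affect it, which is one plausible way to read the article's point about reaching only relevant parts of the environment.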
In experiments, the researchers sought to determine whether weak supervision was necessary for learning a disentangled state representation, i.e., one whose individual features capture distinct factors of the environment that the agent's actions can influence. They evaluated several models on simulated, vision-based, goal-conditioned manipulation tasks of varying complexity. In one environment, agents had to move a specific object to a goal location; in another, they had to open a door to match a goal angle.
The coauthors report that WSC learned more quickly than prior state-of-the-art goal-conditioned reinforcement learning methods, particularly as the environments grew more complex. They also say WSC attained a higher correlation between latent goals and final states, indicating that it learned a more interpretable goal-conditioned policy.
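The correlation between latent goals and final states can be illustrated with a short computation. This sketch assumes Pearson correlation and assumes each episode yields one commanded latent goal value and one measured final-state value (for example, a door angle); the paper's exact metric may differ, and the inputs below are synthetic values chosen only to exercise the function, not results from the study.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (std_x * std_y)


if __name__ == "__main__":
    # Synthetic example: final states track the latent goals linearly,
    # so the coefficient should be very close to 1.0.
    latent_goals = [-1.0, -0.5, 0.0, 0.5, 1.0]
    final_states = [2.0 * g + 1.0 for g in latent_goals]
    print(pearson(latent_goals, final_states))
```

A value near 1 means that moving the latent goal moves the final state predictably, which is why a higher correlation is read as a more interpretable goal-conditioned policy.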
However, the researchers concede that WSC has limitations. It requires a user to indicate which factors are relevant to downstream tasks, which might require expertise, and it uses weak supervision only during pretraining, which might yield representations that don't generalize to new interactions the agent encounters. In future work, they hope to investigate other forms of weak supervision that can provide useful signals to agents, as well as other ways to leverage these labels.
“Given the promising results in increasingly complex environments, evaluating this approach with robots in real-world environments is an exciting future direction,” wrote the coauthors. “Overall, we believe that our framework provides a new perspective on supervising the development of general-purpose agents acting in complex environments.”