How might a two-armed robot go about accomplishing a task like opening a bottle? Invariably, it’ll need to hold the bottle’s base with one hand while grasping the cap with the other and twisting it off. That high-level sequence of steps is what’s known as a schema, and it’s thankfully uninfluenced by objects’ geometric and spatial states. As an added bonus, unlike reinforcement learning techniques that aim to solve tasks by learning a policy, schemas don’t require millions of examples ingested over the course of hours, weeks, or even months.
Recently, a team at Facebook AI Research sought to imbue two robotic Sawyer arms with the ability to select appropriate skills from a library to complete an objective. At each timestep, their agent had to decide which skill to use and what arguments to pass to it (e.g., the location to apply force, the amount of force, or the target pose to move to). Despite the complexity involved, the team says that their approach yielded improvements in learning efficiency, such that manipulation skills could be discovered within only a few hours of training.
The team’s key insight was that for many tasks, the learning process could be split into two parts: (1) learning a task schema and (2) learning a policy that chooses appropriate parameterizations for the different skills. They assert that this approach leads to faster learning, in part because data from different versions of a given task could be used to improve shared skills. Moreover, they say it allowed for the transfer of learned schemas among related tasks.
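To make that two-part split concrete, here is a minimal Python sketch. It is not the researchers' code: the skill names, the `SkillCall` type, and the hand-written `toy_parameter_policy` are all hypothetical stand-ins, and a real system would learn the policy rather than hard-code it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Observation = Sequence[float]
Arguments = Dict[str, float]


@dataclass(frozen=True)
class SkillCall:
    """One timestep's decision: which skill to run and with what arguments."""
    skill: str
    arguments: Arguments


# Part 1: the schema is a fixed sequence of skill names that does not
# depend on the state. These names are illustrative assumptions.
BOTTLE_SCHEMA: List[str] = ["reach", "grasp", "twist"]

# Part 2: a policy that maps (skill, observation) to that skill's arguments.
ParameterPolicy = Callable[[str, Observation], Arguments]


def toy_parameter_policy(skill: str, observation: Observation) -> Arguments:
    """Hypothetical stand-in for a learned policy.

    Treats the first three observation values as a target position and
    adds a fixed torque for the twist skill.
    """
    x, y, z = observation[:3]
    arguments = {"x": x, "y": y, "z": z}
    if skill == "twist":
        arguments["torque"] = 0.5
    return arguments


def run_schema(
    schema: Sequence[str],
    policy: ParameterPolicy,
    observations: Sequence[Observation],
) -> List[SkillCall]:
    """Pair each schema step with arguments chosen from that step's observation."""
    return [
        SkillCall(skill, policy(skill, obs))
        for skill, obs in zip(schema, observations)
    ]


calls = run_schema(
    BOTTLE_SCHEMA,
    toy_parameter_policy,
    [(0.1, 0.2, 0.3), (0.1, 0.2, 0.35), (0.1, 0.2, 0.4)],
)
```

In this sketch the order of skills never changes with the observations; only the arguments do, which is the part the policy would have to learn for each new task.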
“For example, suppose we have learned a good schema for picking up a long bar in simulation, where we have access to object poses, geometry information, [and more],” explained the coauthors of the paper detailing the work. “We can then reuse that schema for a related task such as picking up a tray in the real world from only raw camera observations, even though both the state space and the optimal parameterizations (e.g., grasp poses) differ significantly. As the schema is fixed, policy learning for this tray pickup task will be very efficient, since it only requires learning the (observation-dependent) arguments for each skill.”
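The reuse the coauthors describe can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than the paper's implementation: the schema contents, the state keys, and the "brightest pixel" heuristic that stands in for a learned image-based policy.

```python
from typing import Any, Callable, Dict, List, Tuple

Arguments = Dict[str, float]
Policy = Callable[[str, Any], Arguments]

# A schema assumed to have been learned once in simulation (names are illustrative).
PICKUP_SCHEMA: List[str] = ["reach", "grasp", "lift"]


def state_policy(skill: str, state: Dict[str, float]) -> Arguments:
    """Simulation side: read the object position directly from the state."""
    return {"x": state["object_x"], "y": state["object_y"]}


def image_policy(skill: str, image: List[List[float]]) -> Arguments:
    """Real-world side: toy stand-in that targets the brightest pixel."""
    row, col = max(
        ((r, c) for r in range(len(image)) for c in range(len(image[0]))),
        key=lambda rc: image[rc[0]][rc[1]],
    )
    return {"x": float(col), "y": float(row)}


def execute(
    schema: List[str], policy: Policy, observation: Any
) -> List[Tuple[str, Arguments]]:
    """Run a fixed schema, letting the policy choose each skill's arguments."""
    return [(skill, policy(skill, observation)) for skill in schema]


# Same schema, two different observation types and argument choices.
sim_plan = execute(PICKUP_SCHEMA, state_policy, {"object_x": 0.4, "object_y": 0.1})
real_plan = execute(PICKUP_SCHEMA, image_policy, [[0.0, 0.2], [0.9, 0.1]])
```

Both plans follow the identical skill sequence, so only the argument-choosing function has to be retrained when the observations switch from simulator state to camera images.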
The researchers gave the aforementioned two robotic arms a generic library of skills such as twisting, lifting, and reaching, which they had to apply to several lateral lifting, picking, opening, and rotating tasks involving varying objects, geometries, and initial poses. The schemas were learned in MuJoCo (a simulation environment) by training with low-dimensional input data like geometric and proprioceptive features (joint positions, joint velocities, end effector pose), and then transferred to visual inputs both in simulation and in the real world.
During experiments, the Sawyer arms, which were equipped with cameras and controlled by Facebook’s PyRobot open source robotics platform, were tasked with manipulating nine household objects (such as a rolling pin, soccer ball, glass jar, and T-wrench) that required two parallel-jaw grippers to interact with. The researchers say that, despite having to learn from raw camera images, the system learned to manipulate most items with over 90% success after roughly 2,000 skill executions, or around 4 to 10 hours of training.
“We have studied how to leverage state-independent sequences of skills to greatly improve the sample efficiency of model-free reinforcement learning,” wrote the coauthors. “Furthermore, we have shown experimentally that transferring sequences of skills learned in simulation to real-world tasks enables us to solve sparse-reward problems from images very efficiently, making it feasible to train real robots to perform complex skills such as bimanual manipulation.”