New framework can train a robotic arm on 6 grasping tasks in less than an hour

Advances in machine learning have given rise to a range of robotics capabilities, including grasping, pushing, pulling, and other object manipulation skills. However, general-purpose algorithms have to date been extremely sample-inefficient, which limits their applicability in the real world. To address this, researchers at the University of California, Berkeley developed the Framework for Efficient Robotic Manipulation (FERM), which combines several recent techniques to achieve what they claim is "extremely" sample-efficient training of robotic manipulation algorithms. The coauthors say that, given only 10 demonstrations and 15 to 50 minutes of real-world training time, a single robotic arm can use FERM to learn to reach, pick, move, and pull large objects, as well as flip a switch and open a drawer.

McKinsey pegs the robotics automation potential for production occupations at around 80%, and the pandemic is likely to accelerate this shift. A report by the Manufacturing Institute and Deloitte found that 4.6 million manufacturing jobs will need to be filled over the next decade, and challenges brought on by physical distancing measures and a sustained uptick in ecommerce activity have stretched some logistics operations to the limit. The National Association of Manufacturers says 53.1% of manufacturers anticipate a change in operations due to the health crisis, with 35.5% saying they're already facing supply chain disruptions.

FERM could help accelerate the shift toward automation by making "pixel-based" reinforcement learning more data-efficient. Reinforcement learning is a type of machine learning in which an agent learns to complete tasks by trial and error, guided by a reward signal; in the pixel-based variant, the agent works directly from camera images. As the researchers explain in a paper, FERM first collects a small number of demonstrations and stores them in a "replay buffer." An encoder, the part of the model that turns raw images into a compact representation, is then pretrained on the demonstration data in the replay buffer. Finally, a reinforcement learning algorithm trains on "augmented" images (randomly transformed copies of the camera frames), starting from the pretrained encoder and the initial demonstrations.
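To make the pipeline more concrete, here is a minimal Python sketch of two of its ingredients: a replay buffer seeded with demonstrations before any agent experience is added, and image augmentation applied whenever a training batch is sampled. The names (`ReplayBuffer`, `random_crop`, `Transition`), the 8x8 synthetic frames, and the choice of random cropping as the augmentation are illustrative assumptions, not FERM's actual code; the encoder pretraining and the reinforcement learning update are omitted.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for a camera frame: an H x W grid of pixel intensities.
Image = List[List[int]]


@dataclass
class Transition:
    obs: Image
    action: List[float]
    reward: float
    next_obs: Image


def random_crop(img: Image, size: int, rng: random.Random) -> Image:
    """Return a randomly positioned size x size window of img."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)   # randint is inclusive on both ends
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]


class ReplayBuffer:
    """Stores transitions: demonstrations first, agent experience later (assumed design)."""

    def __init__(self, seed: int = 0) -> None:
        self.storage: List[Transition] = []
        self.rng = random.Random(seed)

    def add(self, t: Transition) -> None:
        self.storage.append(t)

    def sample(self, batch_size: int, crop: int) -> List[Transition]:
        """Draw a batch without replacement and randomly crop every frame in it."""
        batch = self.rng.sample(self.storage, batch_size)
        return [
            Transition(
                obs=random_crop(t.obs, crop, self.rng),
                action=t.action,
                reward=t.reward,
                next_obs=random_crop(t.next_obs, crop, self.rng),
            )
            for t in batch
        ]


def make_frame(h: int = 8, w: int = 8) -> Image:
    # Synthetic frame whose pixel value encodes its (row, col) position.
    return [[r * w + c for c in range(w)] for r in range(h)]


# Seed the buffer with 10 synthetic "demonstration" transitions, then sample a batch.
buffer = ReplayBuffer(seed=42)
for _ in range(10):
    buffer.add(Transition(make_frame(), [0.0, 0.0, 0.0, 0.0], 0.0, make_frame()))
batch = buffer.sample(batch_size=4, crop=6)
```

Because each sampled frame is cropped at a fresh random offset, the learner rarely sees exactly the same pixels twice, which is one way augmentation stretches a small dataset further.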

According to the researchers, FERM is easy to set up: it requires only a robot, a graphics card, two cameras, a handful of demonstrations, and a reward function that guides the reinforcement learning algorithm toward a goal. In experiments, they say FERM enabled an xArm to learn six tasks within 25 minutes of training time (corresponding to 20 to 80 episodes of training) with an average success rate of 96.7%. The arm could even generalize to objects not seen during training or demonstrations and deal with obstacles blocking its way to goal positions.
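A sparse reward of the kind the researchers describe gives the agent a signal only when the goal is actually reached, with no partial credit for getting closer. The sketch below is a hypothetical reward for a reaching task; the function name, the 3D positions in metres, and the 2 cm tolerance are assumptions for illustration, not values from the paper.

```python
import math
from typing import Sequence


def sparse_reach_reward(
    gripper_pos: Sequence[float],
    goal_pos: Sequence[float],
    tolerance: float = 0.02,  # hypothetical success radius, in metres
) -> float:
    """Return 1.0 if the gripper is within `tolerance` of the goal, else 0.0."""
    distance = math.dist(gripper_pos, goal_pos)  # Euclidean distance (Python 3.8+)
    return 1.0 if distance <= tolerance else 0.0


print(sparse_reach_reward((0.30, 0.10, 0.05), (0.31, 0.10, 0.05)))  # 1 cm away -> 1.0
print(sparse_reach_reward((0.30, 0.10, 0.05), (0.40, 0.10, 0.05)))  # 10 cm away -> 0.0
```

Since most early attempts earn a reward of zero, sparse rewards give the learner far less guidance than distance-based ("shaped") rewards, which is part of what makes learning such tasks from pixels in under an hour notable.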

"To the best of our knowledge, FERM is the first method to solve a diverse set of sparse-reward robotic manipulation tasks directly from pixels in less than one hour," the researchers wrote. "Due to the limited amount of supervision required, our work presents exciting avenues for applying reinforcement learning to real robots in a quick and efficient manner."

Open source frameworks like FERM promise to advance the state of the art in robotic manipulation, but there remain questions about how to measure progress. As my colleague Khari Johnson writes, metrics used to measure progress in robotic grasping can vary based on the task. For robots operating in a mission-critical environment like space, for example, accuracy matters above all.

"Under certain circumstances, if we have nice objects and you have a very fast robot, you can get there [human picking rates]," roboticist Ken Goldberg told VentureBeat in a previous interview. "But they say humans are like 650 per hour; that's an amazing level. It's very hard to beat humans. We're very good. We've evolved over millions of years."
