Watch Google's AI teach a picker robot to assemble objects

Manipulating objects in a range of shapes isn't machines' forte, but it's a useful skill for any robot tasked with navigating the physical world. To advance the state of the art in this domain, researchers at Google, Stanford, and Columbia recently investigated a machine learning system dubbed Form2Fit, which aims to teach a picker robot with a suction arm the concept of assembling objects into kits.

"If robots could learn 'how things fit together,' then perhaps they could become more adaptable to new manipulation tasks involving objects they have never seen before, like reconnecting severed pipes, or building makeshift shelters by piecing together debris during disaster response scenarios," wrote research intern Kevin Zakka and robotics research scientist Andy Zeng in a blog post. "It helps to increase the efficiency with which we perform tasks, like assembling DIY furniture kits or packing gifts into a box."

As Zakka and Zeng explain, Form2Fit learns to recognize how objects correspond (or "fit") to each other mainly through trial and error. One component -- a two-stream matching algorithm -- infers three-dimensional point representations that communicate not only an object's geometry, but also its texture and contextual task-level knowledge. These descriptors are used to establish relationships between objects and their target locations. And because the point representations are orientation-sensitive, they imbue Form2Fit with the knowledge of how an object should be rotated before it's placed in its target location.
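To make the matching idea concrete, here is a minimal Python sketch, not the researchers' code. It assumes each candidate point already has a descriptor (a short vector of numbers), that the object-side descriptors are recomputed for a handful of discrete rotation angles, and that a "match" is simply the pair with the smallest Euclidean distance. The function name `best_match` and the dictionary layout are hypothetical.

```python
import math


def best_match(pick_desc_by_rotation, place_desc):
    """Return (distance, pick_point, place_point, angle_deg) for the closest pair.

    pick_desc_by_rotation: {angle_deg: {pick_point: descriptor}} (assumed layout)
    place_desc:            {place_point: descriptor}             (assumed layout)
    Descriptors are equal-length tuples of floats.
    """
    best = None
    for angle, pick_desc in pick_desc_by_rotation.items():
        for pick_point, d_pick in pick_desc.items():
            for place_point, d_place in place_desc.items():
                # Smaller distance means the two points "fit" better
                distance = math.dist(d_pick, d_place)
                if best is None or distance < best[0]:
                    best = (distance, pick_point, place_point, angle)
    return best
```

Because the descriptors change with rotation in this sketch, the angle that yields the closest pair doubles as the rotation to apply before placing the object.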

Two separate Form2Fit components generate valid pick and place candidates: a suction model and a planner model. The former ingests three-dimensional images of objects and predicts how likely the robot's suction arm is to succeed in picking them up. As for the planner model, it takes in images of the target location and outputs predictions of placement success, after which it integrates the output of all three of Form2Fit's components (including the matching algorithm) to produce the final pick location, place location, and rotation angle.
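One way to picture that final integration step is the short Python sketch below. It is an illustration under stated assumptions rather than the published planner: candidates whose predicted success falls under a hypothetical `threshold` are discarded, and among the survivors the pair with the smallest matching distance wins. The `match_distance` callback, which returns a distance and a rotation angle for a pick/place pair, is also a stand-in for the matching algorithm.

```python
def plan(pick_scores, place_scores, match_distance, threshold=0.5):
    """Choose a pick point, place point, and rotation (illustrative only).

    pick_scores / place_scores: {point: predicted success in [0, 1]}
    match_distance(pick, place): returns (distance, angle_deg); hypothetical callback
    threshold: assumed cutoff for keeping a candidate
    """
    picks = [p for p, score in pick_scores.items() if score >= threshold]
    places = [q for q, score in place_scores.items() if score >= threshold]

    best = None
    for p in picks:
        for q in places:
            distance, angle = match_distance(p, q)
            if best is None or distance < best[0]:
                best = (distance, p, q, angle)

    if best is None:
        return None  # no candidate pair survived the threshold
    _, pick, place, angle = best
    return {"pick": pick, "place": place, "rotation_deg": angle}
```

Filtering first and matching second keeps the arm from attempting a well-matched pair that it is unlikely to grasp or place successfully.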

The team created a training data set through a concept they call time-reversed disassembly, where the sequence of disassembling a kit becomes a valid assembly sequence when reversed over time. This allowed them to train Form2Fit through self-supervision: the robot takes apart a fully assembled kit by picking pieces at random, and the disassembly sequence is then reversed to learn how the kit should be put together.
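The reversal itself is simple enough to sketch in a few lines of Python. The example below is an illustration, not the team's pipeline: it assumes each recorded action is a pick point, a place point, and a rotation, and the `Step` type and `reverse_disassembly` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    pick: tuple                # where the piece was grasped
    place: tuple               # where the piece was dropped
    rotation_deg: float = 0.0  # rotation applied between pick and place


def reverse_disassembly(disassembly):
    """Turn a recorded disassembly sequence into assembly training labels.

    The last piece removed becomes the first piece to insert; each step's
    pick and place locations swap, and its rotation is undone.
    """
    return [
        Step(pick=s.place, place=s.pick, rotation_deg=-s.rotation_deg)
        for s in reversed(disassembly)
    ]
```

For instance, a step that lifted a piece out of its slot at (1, 1) and dropped it at (9, 9) after a 30 degree turn becomes an assembly label that picks at (9, 9), turns the piece back by 30 degrees, and places it at (1, 1).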

After training the robot overnight for 12 hours, the researchers report that it learned effective pick and place policies for a variety of objects, achieving 94% assembly success rates with kits in varying configurations and over 86% success rates when handling completely new objects and kits. Even when a policy was trained to assemble a kit only in one specific position and orientation, it still managed to handle random rotations and translations of the kit 90% of the time.

"While Form2Fit's results are promising, its limitations suggest directions for future work," note Zakka and Zeng. "In our experiments, we assume a 2D planar workspace to constrain the kit assembly task so that it can be solved by sequencing top-down picking and placing actions. This may not work for all cases of assembly -- for example, when a peg needs to be precisely inserted at a 45 degree angle. It would be interesting to expand Form2Fit to more complex action representations for 3D assembly."
