Robotics researchers propose AI that locates and safely moves items on shelves

A pair of new robotics studies from Google and the University of California, Berkeley propose ways of finding occluded objects on shelves and of solving "contact-rich" manipulation tasks, such as pushing objects across a table. The UC Berkeley research introduces Lateral Access maXimal Reduction of occupancY support Area (LAX-RAY), a system that predicts a target object's location even when only a portion of that object is visible. The Google-coauthored paper proposes Contact-aware Online COntext Inference (COCOI), which aims to encode the dynamics properties of physical objects, such as mass and friction, in a representation that a robot's control policy can readily use.

While researchers have explored the robotics problem of searching for objects in clutter for quite some time, settings like shelves, cabinets, and closets are a less-studied area, despite their wide applicability. (For example, a service robot at a pharmacy might need to find supplies in a medical cabinet.) Contact-rich manipulation problems are just as ubiquitous in the physical world, and humans have developed the ability to manipulate objects of various shapes and properties in complex environments. Robots, however, struggle with these tasks because they must interpret high-dimensional perceptual data and reason about physics.

The UC Berkeley researchers, working in the university's AUTOLab, focused on the challenge of finding occluded target objects in "lateral access environments," such as shelves. The LAX-RAY system comprises three lateral access mechanical search policies. Called "Uniform," "Distribution Area Reduction (DAR)," and "Distribution Entropy Reduction over 'n' steps (DER-n)," they compute actions that reveal occluded target objects stored on shelves. To test the performance of these policies, the coauthors used an open framework, the First Order Shelf Simulator (FOSS), to generate 800 random shelf environments of varying difficulty. They then deployed LAX-RAY on a physical shelf using a Fetch robot with an embedded depth-sensing camera, measuring whether the policies could locate objects accurately enough to have the robot push those objects.
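To make the idea behind an area-reduction policy more concrete, here is a toy sketch in Python. It is not LAX-RAY's implementation: it assumes a hypothetical one-dimensional shelf split into cells, with occluding objects in a front row and the target hidden somewhere in a back row. The "belief" is the set of target positions that are still fully hidden, its support area is the number of cells the target could occupy, and a greedy policy in the spirit of DAR picks the lateral push that would shrink that area the most if the target stays hidden. The names `SHELF_WIDTH`, `Occluder`, `dar_like_push`, and `uniform_push` are illustrative, not from the paper.

```python
import random
from dataclasses import dataclass

SHELF_WIDTH = 12  # hypothetical shelf width, in discrete lateral cells


@dataclass(frozen=True)
class Occluder:
    """A front-row object covering cells [left, left + width)."""
    left: int
    width: int

    def cells(self):
        return set(range(self.left, self.left + self.width))


def occluded_cells(occluders):
    """Back-row cells hidden from the front-facing camera."""
    hidden = set()
    for o in occluders:
        hidden |= o.cells()
    return hidden


def feasible_positions(belief, occluders, target_width):
    """Keep target placements (left edges) that would still be fully hidden."""
    hidden = occluded_cells(occluders)
    return {p for p in belief
            if all(c in hidden for c in range(p, p + target_width))}


def support_area(belief, target_width):
    """Number of cells the target could occupy under the current belief."""
    cells = set()
    for p in belief:
        cells |= set(range(p, p + target_width))
    return len(cells)


def legal_pushes(occluders, max_shift=3):
    """Lateral pushes (object index, signed shift) that stay on the shelf
    and do not sweep through another object."""
    pushes = []
    for i, o in enumerate(occluders):
        others = occluded_cells(occluders[:i] + occluders[i + 1:])
        for shift in range(-max_shift, max_shift + 1):
            new_left = o.left + shift
            if shift == 0 or new_left < 0 or new_left + o.width > SHELF_WIDTH:
                continue
            swept = set(range(min(o.left, new_left), max(o.left, new_left) + o.width))
            if not swept & others:
                pushes.append((i, shift))
    return pushes


def apply_push(occluders, push):
    """Return a new occluder list with one object shifted laterally."""
    i, shift = push
    moved = list(occluders)
    moved[i] = Occluder(occluders[i].left + shift, occluders[i].width)
    return moved


def dar_like_push(occluders, belief, target_width):
    """Greedy area reduction: choose the push that leaves the smallest
    support area, assuming the target is not revealed by it."""
    def remaining_area(push):
        after = feasible_positions(belief, apply_push(occluders, push), target_width)
        return support_area(after, target_width)
    return min(legal_pushes(occluders), key=remaining_area)


def uniform_push(occluders, rng):
    """Baseline: pick any legal push uniformly at random."""
    return rng.choice(legal_pushes(occluders))


occluders = [Occluder(1, 3), Occluder(6, 4)]
target_width = 2
all_positions = set(range(SHELF_WIDTH - target_width + 1))
belief = feasible_positions(all_positions, occluders, target_width)
print(sorted(belief), support_area(belief, target_width))  # [1, 2, 6, 7, 8] 7
print(dar_like_push(occluders, belief, target_width))      # (0, 2)
print(uniform_push(occluders, random.Random(0)))
```

In this example, pushing the left object two cells to the right rules out both candidate positions behind it and leaves a support area of 4 cells, the smallest of any legal push, whereas the uniform baseline may choose a push that eliminates little. The real system reasons over depth images and predicted distributions on the shelf rather than a single row of cells.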

The researchers say the DAR and DER-n policies outperformed the Uniform policy. In simulation, LAX-RAY achieved 87.3% accuracy, which translated to about 80% accuracy on the physical robot. In future work, the researchers plan to investigate more sophisticated depth models and the use of pushes parallel to the camera to create space for lateral pushes. They also hope to design pull actions that use pneumatically activated suction cups to lift and remove occluding objects from crowded shelves.

In the Google work, which had contributions from researchers at Alphabet's X, Stanford, and UC Berkeley, the coauthors designed a deep reinforcement learning method that takes multimodal data as input and uses a "deep representative structure" to capture contact-rich dynamics. COCOI draws on video footage and readings from a robot-mounted touch sensor to encode dynamics information into a representation. This allows a reinforcement learning algorithm to plan with "dynamics-awareness," which improves its robustness in difficult environments.
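The core pattern described here, an encoder that summarizes recent visual and touch readings into a "context" vector that then conditions the control policy, can be sketched in plain Python. This is a hypothetical illustration, not COCOI's architecture: the `ContextEncoder` and `ContextConditionedPolicy` classes, the mean pooling, the single tanh layers, and the feature sizes are all assumptions, and the weights are random rather than trained with reinforcement learning.

```python
import math
import random


def mean_pool(history):
    """Average a short history of equal-length feature vectors."""
    n = len(history)
    return [sum(step[j] for step in history) / n for j in range(len(history[0]))]


def random_matrix(rows, cols, rng):
    """Random weights; a trained system would learn these instead."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def tanh_layer(weights, x):
    """One dense layer with a tanh nonlinearity (no bias, for brevity)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


class ContextEncoder:
    """Hypothetical encoder: fuses pooled visual and tactile history into a
    latent meant to capture dynamics (mass, friction, ...) once trained."""

    def __init__(self, visual_dim, tactile_dim, latent_dim, seed=0):
        self.weights = random_matrix(latent_dim, visual_dim + tactile_dim, random.Random(seed))

    def __call__(self, visual_history, tactile_history):
        fused = mean_pool(visual_history) + mean_pool(tactile_history)
        return tanh_layer(self.weights, fused)


class ContextConditionedPolicy:
    """Hypothetical policy: maps (observation, context latent) to a bounded push command."""

    def __init__(self, obs_dim, latent_dim, action_dim, seed=1):
        self.weights = random_matrix(action_dim, obs_dim + latent_dim, random.Random(seed))

    def __call__(self, obs, context):
        return tanh_layer(self.weights, list(obs) + list(context))


encoder = ContextEncoder(visual_dim=4, tactile_dim=2, latent_dim=3)
policy = ContextConditionedPolicy(obs_dim=4, latent_dim=3, action_dim=2)

visual = [[0.2, 0.1, 0.0, 0.5], [0.3, 0.1, 0.1, 0.4]]
light_object_touch = [[0.1, 0.0], [0.2, 0.1]]  # small contact forces
heavy_object_touch = [[0.8, 0.3], [0.9, 0.4]]  # larger contact forces
obs = [0.0, 0.5, 0.2, 0.1]

action_light = policy(obs, encoder(visual, light_object_touch))
action_heavy = policy(obs, encoder(visual, heavy_object_touch))
print(action_light, action_heavy)
```

Because the same observation is paired with different contexts, the policy can issue different commands for a light object and a heavy one. In a trained system, the encoder and policy would be optimized together so that those differences reflect the object's actual dynamics rather than arbitrary weights.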

The researchers benchmarked COCOI by having both simulated and real-world robots push objects to target locations without knocking them over. This isn't as easy as it sounds: key information couldn't be easily extracted from third-person camera perspectives, and the task's dynamics properties weren't directly observable from raw sensor data. Moreover, the policy needed to be effective for objects with different appearances, shapes, masses, and friction properties.
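One way to make this evaluation setup concrete is to sample a new object's physical properties each episode and count a push as successful only if the object ends near its goal without tipping over. The sketch below is hypothetical: the object shapes, property ranges, 5 cm tolerance, and 30-degree tilt threshold are illustrative assumptions, not values from the COCOI paper.

```python
import math
import random
from dataclasses import dataclass

SHAPES = ("box", "cylinder", "bottle")  # hypothetical object set


@dataclass
class ObjectProps:
    shape: str
    mass_kg: float
    friction: float


def sample_object(rng):
    """Randomize per-episode dynamics so a policy cannot overfit to one object."""
    return ObjectProps(
        shape=rng.choice(SHAPES),
        mass_kg=rng.uniform(0.1, 2.0),
        friction=rng.uniform(0.2, 1.0),
    )


def push_succeeded(final_xy, goal_xy, tilt_rad,
                   tolerance_m=0.05, max_tilt_rad=math.radians(30)):
    """Success: the object ends within tolerance of the goal and is still upright."""
    return math.dist(final_xy, goal_xy) <= tolerance_m and abs(tilt_rad) < max_tilt_rad


rng = random.Random(42)
episode_objects = [sample_object(rng) for _ in range(3)]
print(episode_objects)
print(push_succeeded((0.52, 0.30), (0.50, 0.31), tilt_rad=0.05))  # True
print(push_succeeded((0.52, 0.30), (0.50, 0.31), tilt_rad=1.4))   # False
```

Tying success to both position and tilt captures why the task is harder than it looks: a fast, forceful push may reach the goal with a heavy, high-friction box but topple a light bottle.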

The researchers say COCOI outperformed a baseline "in a wide range of settings" and across varied dynamics properties. Eventually, they intend to extend their approach to pushing non-rigid objects, such as pieces of cloth.