Optical sensors such as cameras and lidar are a fundamental part of modern robotics platforms, but they share a common flaw: transparent objects like glass containers tend to confuse them. That’s because most of the algorithms analyzing data from those sensors assume all surfaces are Lambertian, meaning they reflect light evenly in all directions and so appear equally bright from every viewing angle. Transparent objects, by contrast, both refract and reflect light, which leaves the depth data invalid or full of noise.
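To make the Lambertian assumption concrete, here is a minimal Python sketch of the textbook Lambertian shading rule. It is not taken from any sensor's software; the function names `lambertian_brightness` and `_unit` are hypothetical and chosen for illustration only. The key point is that brightness depends on the angle between the surface normal and the light, and not on where the camera sits.

```python
import math


def _unit(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def lambertian_brightness(normal, light_dir, albedo=1.0):
    """Relative brightness of a Lambertian surface point.

    There is deliberately no viewing-direction argument: under the
    Lambertian assumption the point looks equally bright from any angle.
    """
    n = _unit(normal)
    l = _unit(light_dir)
    cos_theta = sum(a * b for a, b in zip(n, l))
    # Surfaces facing away from the light receive nothing.
    return albedo * max(0.0, cos_theta)


# Light hitting an upward-facing surface head-on versus at 45 degrees.
head_on = lambertian_brightness((0, 0, 1), (0, 0, 1))
oblique = lambertian_brightness((0, 0, 1), (1, 0, 1))
```

Glass breaks this model because what the camera sees through or off the surface changes with the viewing angle, which is exactly the dependence the function above leaves out.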
In search of a solution, a team of Google researchers collaborated with Columbia University and Synthesis AI, a data generation platform for computer vision, to develop ClearGrasp. It’s an algorithm capable of estimating accurate 3D data of transparent objects from RGB-D images, and importantly one that works with inputs from any standard RGB-D camera, using AI to reconstruct the depth of transparent objects and generalize to objects unseen during training.
As the researchers note, training sophisticated AI models usually requires large data sets, and because no corpus of transparent objects existed, they created their own. It contains more than 50,000 photorealistic renders with corresponding depth, edges, surface normals (which represent the local orientation of the surface), and more. Each image shows up to five transparent objects, either on a flat ground plane or inside a tote, with various backgrounds and lighting. A separate set of 286 real-world images with corresponding ground truth depth serves as a test set.
ClearGrasp comprises three machine learning models: a network that estimates surface normals, one that predicts occlusion boundaries (depth discontinuities), and one that masks transparent objects. The mask is used to remove all depth pixels belonging to transparent objects so that the correct depths can be filled in. An optimization module then extends the depth from known surfaces, using the predicted surface normals to guide the reconstruction’s shape. (The predicted occlusion boundaries help to maintain separation between distinct objects.)
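The mask-then-fill idea can be illustrated with a deliberately simplified Python sketch. This is not the researchers’ implementation: it works on a single scanline, assumes an orthographic camera, and replaces the optimization module with a greedy left-to-right pass. The function names `remove_masked_depth` and `fill_from_normals` and all the sample values are hypothetical.

```python
def remove_masked_depth(depth, mask):
    """Copy a depth scanline, dropping readings on transparent pixels."""
    return [None if m else d for d, m in zip(depth, mask)]


def fill_from_normals(depth, normals, boundary):
    """Fill missing depths left to right using slopes implied by normals.

    normals[i] is an (nx, nz) pair. Assuming an orthographic view, the
    depth slope at pixel i is -nx / nz. boundary[i] is True where a
    predicted occlusion boundary lies between pixel i - 1 and pixel i;
    depth is never extended across such a boundary.
    """
    filled = list(depth)
    for i in range(1, len(filled)):
        if filled[i] is not None:
            continue  # a trusted sensor reading, keep it
        if boundary[i] or filled[i - 1] is None:
            continue  # no trusted neighbour to extend from
        nx, nz = normals[i]
        filled[i] = filled[i - 1] - nx / nz
    return filled


# Sensor readings with two bogus values (5.0) on a transparent object.
raw_depth = [1.0, 1.0, 5.0, 5.0, 1.0]
mask = [False, False, True, True, False]
no_boundary = [False] * 5

cleaned = remove_masked_depth(raw_depth, mask)

# Normals pointing straight at the camera imply a flat surface.
flat = fill_from_normals(cleaned, [(0.0, 1.0)] * 5, no_boundary)

# Tilted normals imply a slope of 0.5 depth units per pixel.
tilted = fill_from_normals(cleaned, [(-0.5, 1.0)] * 5, no_boundary)

# An occlusion boundary before pixel 2 stops depth from leaking across.
blocked = fill_from_normals(
    cleaned, [(0.0, 1.0)] * 5, [False, False, True, False, False]
)
```

A greedy pass like this accumulates error and only uses neighbours on one side; solving for all missing depths jointly, as an optimization does, lets every known surface and every predicted normal constrain the result at once.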
In experiments, the researchers trained the models on their custom data set, as well as on real indoor scenes from the open-source Matterport3D and ScanNet corpora. They say that ClearGrasp reconstructed depth for transparent objects with much higher fidelity than the baseline methods, and that its output depth could be used directly as input to manipulation algorithms that use images. With a parallel-jaw gripper on a robot arm, the grasping success rate for transparent objects improved from 12% to 74%; with suction, it improved from 64% to 86%.
“ClearGrasp can benefit robotic manipulation by incorporating it into our pick and place robot’s control system, where we observe significant improvements in the grasping success rate of transparent plastic objects,” wrote study coauthors Shreeyak Sajjan, a Synthesis AI research engineer, and Andy Zeng, a Google research scientist. “A promising direction for future work is improving the domain transfer to real-world images by generating renders with physically-correct caustics and surface imperfections such as fingerprints … Enabling machines to better sense transparent surfaces would not only improve safety, but could also open up a range of new interactions in unstructured applications — from robots handling kitchenware or sorting plastics for recycling, to navigating indoor environments or generating AR visualizations on glass tabletops.”