Dex-Net AR uses Apple's ARKit to train robots to grasp objects

UC Berkeley AI researchers are using an iPhone X and Apple's ARKit to teach a robotic arm how to grasp an object. The work is part of Dex-Net AR, a pipeline that uses commodity smartphones for robotic grasping. ARKit creates point clouds from data generated by moving an RGB camera around an object for two minutes.

Robotic grasping is a robotics subfield focused on teaching a robot to pick up, move, manipulate, or grasp an object. The Dexterity Network, or Dex-Net, research project at UC Berkeley's Autolab dates back to 2017 and includes open source training data sets and pretrained models for robotic grasping in an ecommerce bin-picking scenario. How quickly robots can learn to grasp objects has a big impact on how automated warehouses like Amazon fulfillment centers can become.

In early experiments with eight objects in a laboratory, Dex-Net AR converted ARKit scans into depth maps that an ABB YuMi robot used to grasp objects with a success rate of 95%. Each scan creates a point cloud.
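To make the point cloud to depth map step concrete, here is a minimal Python sketch of one way such a conversion could work. It is not the researchers' code: the function name `point_cloud_to_depth_map`, the virtual top-down camera, and the `camera_height`, `resolution`, and `shape` parameters are all assumptions chosen for illustration.

```python
import numpy as np

def point_cloud_to_depth_map(points, camera_height, resolution=0.005, shape=(64, 64)):
    """Project an (N, 3) point cloud onto a top-down depth image.

    Hypothetical setup (an assumption, not from the paper): a virtual
    orthographic camera sits at z = camera_height above the origin and
    looks straight down. Pixels that receive no points are left at 0.
    """
    points = np.asarray(points, dtype=float)
    rows, cols = shape

    # Convert x, y coordinates (meters) to pixel indices centered on the image.
    col = np.floor(points[:, 0] / resolution + cols / 2).astype(int)
    row = np.floor(points[:, 1] / resolution + rows / 2).astype(int)

    # Discard points that fall outside the image bounds.
    inside = (row >= 0) & (row < rows) & (col >= 0) & (col < cols)
    depth_values = camera_height - points[inside, 2]

    # Keep the nearest surface (smallest depth) seen at each pixel.
    depth = np.full(shape, np.inf)
    np.minimum.at(depth, (row[inside], col[inside]), depth_values)
    depth[np.isinf(depth)] = 0.0
    return depth
```

A real pipeline would also need to handle camera intrinsics and fill holes between sparse points; this sketch only shows the basic projection.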

"As the camera moves through space, the density of the point cloud increases, better detecting and defining the object’s surfaces for grasping," a recently published paper detailing Dex-Net AR reads. "Dex-Net AR can generate grasps with accuracy similar to state-of-the-art systems that rely on expensive, industry grade depth sensors. Compared to depth camera systems that capture images from a fixed view, usually top-down, Dex-Net AR allows the user to move the smartphone camera all around the object, collecting three-dimensional point cloud data."

Dex-Net AR cleans up noise caused by estimation errors in ARKit point clouds using an outlier removal algorithm and a k-nearest neighbor algorithm. The Dex-Net grasp planner then evaluates how the robot should pick up the object.
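The article does not spell out the exact cleanup procedure, so the following Python sketch shows one common approach rather than the method used in Dex-Net AR: a statistical outlier filter built on k-nearest neighbor distances. The function name `remove_outliers_knn` and the `k` and `std_ratio` parameters are hypothetical.

```python
import numpy as np

def remove_outliers_knn(points, k=8, std_ratio=2.0):
    """Drop points whose neighborhood is unusually sparse.

    Assumed technique (not confirmed by the paper): for each point, compute
    the mean distance to its k nearest neighbors, then discard points whose
    mean distance exceeds the global mean by more than std_ratio standard
    deviations.
    """
    points = np.asarray(points, dtype=float)

    # Brute-force pairwise distances: O(N^2) memory, fine for small clouds.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=2)

    # After sorting each row, column 0 is the point's distance to itself.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    mean_knn = knn.mean(axis=1)

    threshold = mean_knn.mean() + std_ratio * mean_knn.std()
    keep = mean_knn <= threshold
    return points[keep], keep
```

For clouds with many thousands of points, a spatial index such as a k-d tree would be a more practical choice than the brute-force distance matrix used here.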

Since each ARKit scan took a fixed two minutes per object, in future efforts researchers will look for ways to scan objects more quickly. "[O]ne potential improvement is that we can try to bring down the amount of time in video capturing using a learning-based method to augment and complete the point cloud data given that only limited data are available," the paper reads. Researchers also plan to explore how to better utilize the iPhone X depth-sensing cameras to collect cleaner point cloud data.

Dex-Net AR was introduced last week at the International Conference on Robotics and Automation (ICRA). Other papers published at the conference include works that explore ideal ways to walk for lower body skeletons for humans and four-legged robots. A Stanford lab shared a multi-drone management system that utilizes public buses to reduce delivery costs and energy consumption. Google Brain, Intel AI Lab, and Autolab also introduced Motion2Vec, an AI model for robotic surgery trained using video observation.
