Few fields have been more thoroughly transformed by artificial intelligence (AI) than robotics. San Francisco-based startup OpenAI developed a model that directs mechanical hands to manipulate objects with state-of-the-art precision, and Softbank Robotics recently tapped sentiment analysis firm Affectiva to imbue its Pepper robot with emotional intelligence.

The latest advancement comes from researchers at the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory (CSAIL), who today in a paper (“Dense Object Nets: Learning Dense Visual Object Descriptors and Application to Robotic Manipulation”) detailed a computer vision system, dubbed Dense Object Nets (DON), that allows robots to inspect, visually understand, and manipulate objects they’ve never seen before.

The team plans to present its findings at the Conference on Robot Learning in Zürich, Switzerland, in October.

“Many approaches to manipulation can’t identify specific parts of an object across the many orientations that object may encounter,” PhD student Lucas Manuelli, a lead author on the paper, said in a blog post published on MIT CSAIL’s website. “For example, existing algorithms would be unable to grasp a mug by its handle, especially if the mug could be in multiple orientations, like upright, or on its side.”

Above: DON assists a robot arm in picking up a shoe.

Image Credit: MIT CSAIL

DON isn’t a control system. Rather, it’s a self-supervised deep neural network (layered algorithms that mimic the function of neurons in the brain) trained to generate descriptions of objects in the form of precise coordinates. After training, it can autonomously pick out frames of reference and, when presented with a novel object, map them together to visualize the object’s shape in three dimensions.

Object descriptors take just 20 minutes to learn, on average, according to the researchers, and they’re task-agnostic: they apply to both rigid objects (e.g., hats) and non-rigid objects (e.g., plush toys). In one round of training, the system learned a descriptor for hats after seeing only six different types.

Furthermore, the descriptors remain consistent despite differences in object color, texture, and shape, which gives DON a leg up on models that use RGB or depth data. Because the latter don’t have a consistent object representation and effectively look for “graspable” features, they can’t find such points on objects with even slight deformations.


Above: Visual representations of objects generated by DON.

Image Credit: MIT CSAIL

“In factories, robots often need complex part feeders to work reliably,” Manuelli said. “But a system like this that can understand objects’ orientations could just take a picture and be able to grasp and adjust the object accordingly.”

In tests, the team selected a pixel in a reference image for the system to autonomously identify. They then employed a Kuka arm to grasp objects in isolation (a caterpillar toy), objects within a given class (different kinds of sneakers), and objects in a clutter (a shoe in a spread of other shoes).
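To make the pixel-selection test concrete, here is a minimal sketch of how a chosen reference pixel might be located in a new image. It assumes that each pixel carries a descriptor vector and that the match is the pixel whose descriptor lies closest in Euclidean distance. The function name `find_best_match`, the toy 2x3 images, and the three-dimensional descriptor values are all made up for illustration; this is not the researchers' code.

```python
from math import dist


def find_best_match(reference_descriptor, descriptor_image):
    """Return ((row, col), distance) for the pixel in descriptor_image whose
    descriptor is closest (Euclidean distance) to reference_descriptor."""
    best_pixel, best_distance = None, float("inf")
    for row_index, row in enumerate(descriptor_image):
        for col_index, descriptor in enumerate(row):
            distance = dist(reference_descriptor, descriptor)
            if distance < best_distance:
                best_pixel, best_distance = (row_index, col_index), distance
    return best_pixel, best_distance


# Toy "descriptor images": 2 rows x 3 columns, one 3-D descriptor per pixel.
# The values are invented; a trained network would produce them from camera images.
reference_image = [
    [(0.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.1, 0.8, 0.3)],
]
new_image = [
    [(0.1, 0.8, 0.3), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.9, 0.1, 0.0)],
]

# A person picks a pixel in the reference image (say, a point on a shoe's tongue).
user_pixel = (0, 1)
target = reference_image[user_pixel[0]][user_pixel[1]]

# The same physical point is then looked up in the new view of the scene.
match_pixel, match_distance = find_best_match(target, new_image)
print(match_pixel, match_distance)
```

In this toy setup the selected point has moved to a different location in the second image, and the lookup recovers it from the descriptor alone. A real system would also need a distance threshold to reject cases where the point isn't visible at all.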

During one demonstration, the robotic arm managed to nab a hat out of a pile of similar hats, despite having never seen pictures of the hats in training data. In another, it grasped a caterpillar toy’s right ear from a range of configurations, demonstrating that it could distinguish left from right on symmetrical objects.


Above: Close-up shot of DON system and Kuka Robot grasping a cup.

Image Credit: Tom Buehler / MIT CSAIL

“We observe that for a wide variety of objects, we can acquire dense descriptors that are consistent across viewpoints and configurations,” the researchers wrote. “The variety of objects includes moderately deformable objects, such as soft plush toys, shoes, mugs, and hats, and can include very low-texture objects. Many of these objects were just grabbed from around the lab (including the authors’ and labmates’ shoes and hats), and we have been impressed with the variety of objects for which consistent dense visual models can be reliably learned with the same network architecture and training.”

The team thinks DON might be useful in industrial settings (think object-sorting warehouse robots), but it hopes to develop a more capable version that can perform tasks with a “deeper understanding” of corresponding objects.

“We believe Dense Object Nets are a novel object representation that can enable many new approaches to robotic manipulation,” the researchers wrote. “We are interested to explore new approaches to solving manipulation problems that exploit the dense visual information that learned dense descriptors provide and [to see] how these dense descriptors can benefit other types of robot learning, e.g. learning how to grasp, manipulate, and place a set of objects of interest.”
