AI robotic perception system can recognize objects by touching them

Humans learn a great deal about the world through touch. In fact, some studies show that kinesthetic learning -- a learning style in which students carry out physical activities rather than listen to a lecture or watch a demonstration -- can improve outcomes even for those who lean more strongly toward visual, auditory, or reading and writing learning.

Inspired by the power of touch, scientists at the University of California, Berkeley devised a perception framework for robots that relies principally on tactility as opposed to vision. Building on work published by Carnegie Mellon University researchers and others, they set out to design an AI system capable of recognizing whether a set of physical observations corresponds to a particular object.

"Humans naturally associate the appearance and material properties of objects across multiple modalities. Our perception is inherently multi-modal: when we see a soft toy, we imagine what our fingers would feel touching the soft surface, when we feel the edge of the scissors, we can picture them in our mind – not just their identity, but also their shape, rough size, and proportions," the coauthors wrote in a paper ("Learning to Identify Object Instances by Touch: Tactile Recognition via Multimodal Matching") published on the preprint server Arxiv.org. "In this work, we study how similar multi-modal associations can be learned by a robotic manipulator."

It was easier said than done. As the researchers explain, tactile sensors lack the "global view" that image sensors provide -- they register only the local surface properties at the point of contact. Moreover, their readings tend to be more difficult to interpret.

To get around these and other limitations, the team combined a high-resolution GelSight touch sensor, which generates readings with a camera that observes gel deformations made by contacts with objects, with a convolutional neural network, an AI system commonly applied to analyze visual imagery. They mounted two GelSight sensors on the fingers of a parallel jaw gripper, which they used to compile a dataset from the camera's observations and the tactile sensor's readings in cases where the gripper successfully got its fingers around target objects.

In total, they collected samples for 98 different objects, 80 of which (comprising 27,386 examples) they used to train the aforementioned neural network. (The remaining 18 objects, comprising 6,844 examples, were reserved for the test set.)
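Holding out whole objects, rather than individual examples, is what lets a test measure performance on objects never seen in training. The sketch below illustrates one way to make such a split in Python. The function name `split_by_object`, the toy data, and the random shuffling are assumptions for illustration; the article does not say how the researchers chose which 18 objects to hold out.

```python
import random
from collections import defaultdict

def split_by_object(examples, n_train_objects, seed=0):
    """Split (object_id, sample) pairs so that no object lands in both sets."""
    # Group samples by object so the split happens at the object level.
    by_object = defaultdict(list)
    for object_id, sample in examples:
        by_object[object_id].append(sample)

    # Shuffle object IDs reproducibly (an assumption, not the paper's procedure).
    object_ids = sorted(by_object)
    random.Random(seed).shuffle(object_ids)
    train_ids = set(object_ids[:n_train_objects])

    train = [(o, s) for o, s in examples if o in train_ids]
    test = [(o, s) for o, s in examples if o not in train_ids]
    return train, test

# Toy stand-in data: 98 hypothetical objects with 3 grasps each.
examples = [(f"object_{i:02d}", grasp) for i in range(98) for grasp in range(3)]
train, test = split_by_object(examples, n_train_objects=80)

train_objects = {o for o, _ in train}
test_objects = {o for o, _ in test}
print(len(train_objects), len(test_objects))  # 80 18
```

Because the two sets of objects are disjoint, any test example comes from an object the model has not trained on.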

So how did it fare? In tests, the AI system correctly deduced the identity of objects from tactile feel 64.3 percent of the time, including objects it hadn't encountered during training. Furthermore, the team claims it "outperformed" similar methods, as well as 11 human volunteers (UC Berkeley undergraduates) who, across 420 trials, were asked to identify objects by looking at the shape of their fingers as they held the objects in their hands.
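To make the idea of identifying an object by matching touch against candidate images concrete, here is a minimal Python sketch. It assumes each tactile reading and each candidate image has already been turned into a numeric feature vector, and it uses cosine similarity as the matching score. Both are stand-ins chosen for illustration: the article does not describe how the network scores a match, and the vectors, object names, and `identify_by_touch` function are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify_by_touch(tactile_features, image_features_by_object):
    """Return the candidate object whose image features best match the touch."""
    return max(
        image_features_by_object,
        key=lambda name: cosine_similarity(
            tactile_features, image_features_by_object[name]
        ),
    )

# Hypothetical feature vectors; a trained network would produce these.
candidates = {
    "soft_toy": [0.9, 0.1, 0.0],
    "scissors": [0.1, 0.9, 0.2],
    "mug": [0.0, 0.2, 0.9],
}
tactile_reading = [0.2, 0.8, 0.1]

best = identify_by_touch(tactile_reading, candidates)
print(best)  # scissors
```

Accuracy in this framing is simply the fraction of tactile readings for which the top-scoring candidate is the object that was actually grasped.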

The paper's authors say that there's room for improvement -- all images came from the same environment, and their work considered only individual grasps rather than "multiple" tactile interactions. Nonetheless, they contend that it's a promising first step toward perception systems that can, like humans, identify objects from touch alone.

"[E]xtending upon our proposed approach within a robotic manipulation framework is an exciting direction for future research: by enabling robots to recognize objects by touch, we can imagine robotic warehouses where robots retrieve objects from product images by feeling for them on shelves, robots in the home that can retrieve objects from hard-to-reach places, and perhaps a deeper understanding of object properties through multi-modal training," they wrote.