Robots that can learn to see by touch are within reach, claim researchers at MIT’s Computer Science and Artificial Intelligence Laboratory. Really. In a newly published paper that’ll be presented next week at the Conference on Computer Vision and Pattern Recognition in Long Beach, California, they describe an AI system capable of generating visual representations of objects from tactile signals, and of predicting tactility from snippets of visual data.

“By looking at the scene, our model can imagine the feeling of touching a flat surface or a sharp edge,” said Yunzhu Li, a CSAIL PhD student and lead author of the paper, who wrote it alongside MIT professors Russ Tedrake and Antonio Torralba and MIT postdoc Jun-Yan Zhu. “By blindly touching around, our [AI] model can predict the interaction with the environment purely from tactile feelings. Bringing these two senses together could empower the robot and reduce the data we might need for tasks involving manipulating and grasping objects.”

The team’s system employed generative adversarial networks (GANs), two-part neural networks consisting of generators that produce samples and discriminators that attempt to distinguish between the generated samples and real-world samples, to piece together visual images based on tactile data. Fed a tactile sample from VisGel, a corpus of more than 3 million visual/tactile data pairs drawn from 12,000 video clips of nearly 200 objects (like tools, fabrics, and household products), the model inferred the shape and material at the contact position and looked back to the reference image to “imagine” the interaction.


For example, given tactile data on a shoe, the model could determine where the shoe was most likely to be touched.
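To make the generator-versus-discriminator idea concrete, here is a minimal sketch of the two adversarial losses in plain Python. It is a toy illustration, not the researchers' model: the scalar "tactile" and "visual" values, the linear `generator`, the logistic `discriminator`, and all parameter names are assumptions made for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def generator(tactile, w, b):
    # Hypothetical generator: maps a scalar tactile reading to a
    # scalar "visual" value (a stand-in for a generated image).
    return w * tactile + b


def discriminator(tactile, visual, v):
    # Hypothetical discriminator: returns the probability that a
    # (tactile, visual) pair is real. v = (weight_t, weight_x, bias).
    return sigmoid(v[0] * tactile + v[1] * visual + v[2])


def discriminator_loss(real_pairs, fake_pairs, v, eps=1e-12):
    # Binary cross-entropy: real pairs should score near 1,
    # generated pairs should score near 0.
    total = 0.0
    for t, x in real_pairs:
        total -= math.log(discriminator(t, x, v) + eps)
    for t, x in fake_pairs:
        total -= math.log(1.0 - discriminator(t, x, v) + eps)
    return total / (len(real_pairs) + len(fake_pairs))


def generator_loss(fake_pairs, v, eps=1e-12):
    # The generator is rewarded when the discriminator scores
    # its outputs as real (the common non-saturating form).
    total = 0.0
    for t, x in fake_pairs:
        total -= math.log(discriminator(t, x, v) + eps)
    return total / len(fake_pairs)


# Toy data: real pairs have visual = +1; an untrained generator outputs -1.
tactile_readings = [0.0, 1.0, 2.0]
real_pairs = [(t, 1.0) for t in tactile_readings]
fake_pairs = [(t, generator(t, w=0.0, b=-1.0)) for t in tactile_readings]
```

In training, the two losses are minimized in alternation: a discriminator that separates real from generated pairs drives its own loss down and the generator's loss up, which in turn pushes the generator toward more realistic outputs.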

The reference images helped to encode details about the objects and the environment, enabling the machine learning model to self-improve. Deployed on a KUKA robot arm with a GelSight tactile sensor (which was designed by another group at MIT), the model compared the current frame with the reference image to identify the location and scale of the touch.
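The article does not say how the frame comparison is carried out, and in the researchers' system it is presumably learned rather than hand-coded. As a rough intuition only, the sketch below uses simple pixel differencing as a stand-in: it finds where a current frame differs from a reference frame and reports a bounding box, giving a position (location) and a size (scale). The function name `touch_region`, the threshold, and the grayscale list-of-lists image format are all assumptions for illustration.

```python
def touch_region(reference, current, threshold=10):
    """Return (top, left, height, width) of the changed area, or None.

    Both images are lists of rows of grayscale pixel values with the
    same dimensions. A pixel counts as changed when its absolute
    difference from the reference exceeds the threshold.
    """
    rows, cols = [], []
    for r, (ref_row, cur_row) in enumerate(zip(reference, current)):
        for c, (ref_px, cur_px) in enumerate(zip(ref_row, cur_row)):
            if abs(ref_px - cur_px) > threshold:
                rows.append(r)
                cols.append(c)
    if not rows:
        return None  # no visible change, so no touch detected
    top, left = min(rows), min(cols)
    return (top, left, max(rows) - top + 1, max(cols) - left + 1)


# Toy 5x5 scene: the reference is empty, and the current frame shows
# a small bright patch where the (hypothetical) gripper made contact.
reference = [[0] * 5 for _ in range(5)]
current = [row[:] for row in reference]
current[1][2] = 100
current[2][2] = 100
current[2][3] = 100

region = touch_region(reference, current)
```

A fixed threshold like this would be brittle under lighting changes or camera motion, which is one reason a learned comparison is the more plausible choice in a real system.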

The researchers note that the current data set only has examples of interactions in a controlled environment, and they say that some details, like the color and softness of objects, remain difficult for the system to infer. Still, they say their approach could lay the groundwork for more seamless human-robot integration in manufacturing settings, particularly concerning tasks that lack visual data, like when a light is off or when a worker has to reach into a container blindly.


“This is the first method that can convincingly translate between visual and touch signals,” said Andrew Owens, a postdoctoral researcher at the University of California at Berkeley. “Methods like this have the potential to be very useful for robotics, where you need to answer questions like ‘is this object hard or soft?’ or ‘if I lift this mug by its handle, how good will my grip be?’ This is a very challenging problem, since the signals are so different, and this model has demonstrated great capability.”
