MIT's AI model learns relationships among objects with minimal training data

Deep learning systems interpret the world by picking out statistical patterns in data. But statistical learning requires lots of data, and it isn't particularly adept at applying past knowledge to new situations. Symbolic AI, by contrast, records the chain of steps taken to reach a decision and can learn from less data than traditional methods.

A new study by researchers at MIT, the MIT-IBM Watson AI Lab, and DeepMind demonstrates the potential of applying symbolic AI to an image comprehension task. They say that in tests, their hybrid model learned object-related concepts like color and shape and then used that knowledge to suss out object relationships in a scene, with minimal training data and "no explicit programming."

"One way children learn concepts is by connecting words with images," said study lead author Jiayuan Mao in a statement. "A machine that can learn the same way needs much less data, and is better able to transfer its knowledge to new scenarios."

The team's model has three parts. A perception module translates images into an object-based representation. A language module extracts meaning from words and sentences and turns each question into a "symbolic program" (i.e., a set of instructions) that tells the AI how to answer it. A third module runs the symbolic program on the scene and spits out an answer, and the model is updated whenever that answer is wrong.
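To make that division of labor concrete, here is a minimal toy sketch of the third module: an executor that runs a symbolic program against an object-based scene. It is not the researchers' code. The scene is typed in by hand rather than produced by a perception network, the programs are written by hand rather than produced by the language module, and the class, the operation names (`filter`, `unique`, `query`, `count`), and the program format are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical object-based scene representation standing in for the
# perception module's output. The real model learns these features from
# pixels; here they are plain strings typed in by hand.
@dataclass(frozen=True)
class SceneObject:
    color: str
    shape: str
    material: str
    size: str

def filter_objects(objects, **attrs):
    """Keep only the objects whose attributes match every given value."""
    return [o for o in objects if all(getattr(o, k) == v for k, v in attrs.items())]

def unique(objects):
    """Resolve a definite reference such as 'the green cylinder'."""
    if len(objects) != 1:
        raise ValueError(f"expected exactly one object, found {len(objects)}")
    return objects[0]

def query(obj, attribute):
    """Read one attribute (e.g. color or material) off a single object."""
    return getattr(obj, attribute)

def count(objects):
    """Answer 'how many' questions."""
    return len(objects)

# Hypothetical operation table: each program step names one of these.
OPS = {"filter": filter_objects, "unique": unique, "query": query, "count": count}

def execute(program, scene):
    """Run a program given as (operation, arguments) steps; each step consumes the previous result."""
    result = scene
    for op, kwargs in program:
        result = OPS[op](result, **kwargs)
    return result

scene = [
    SceneObject("green", "cylinder", "metal", "large"),
    SceneObject("blue", "sphere", "rubber", "small"),
    SceneObject("red", "cube", "rubber", "large"),
]

# "What's the material of the green cylinder?" as a language module might parse it.
material_program = [
    ("filter", {"color": "green", "shape": "cylinder"}),
    ("unique", {}),
    ("query", {"attribute": "material"}),
]
# "How many rubber objects are there?"
count_program = [("filter", {"material": "rubber"}), ("count", {})]

print(execute(material_program, scene))  # metal
print(execute(count_program, scene))     # 2
```

Keeping the program separate from the scene is what lets one executor answer any question that can be expressed as a chain of these steps. Because this toy matches exact strings, though, it cannot learn anything; a system that updates itself when it answers incorrectly needs learned concept representations rather than hand-typed labels.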

The researchers trained it on images paired with related questions and answers from Stanford University's CLEVR image comprehension dataset. (For example: "What's the color of the object?" and "How many objects are both right of the green cylinder and have the same material as the small blue ball?") The questions grew progressively harder as the model learned: once it had mastered object-level concepts, it advanced to learning how objects and their properties relate to each other.
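Training that moves from easy questions to hard ones is a form of curriculum learning. The sketch below shows one simple way such a curriculum could be built; it is an illustration under stated assumptions, not the study's actual schedule. It assumes every training question already comes with a parsed program (a hypothetical list of operation names), puts any question whose program contains a relational step (the hypothetical `relate` operation) into a later lesson than object-level questions, and uses program length as a rough proxy for difficulty within a lesson.

```python
from dataclasses import dataclass

@dataclass
class Example:
    question: str
    program: list[str]  # hypothetical sequence of operation names

def lesson_of(example):
    """Assumed rule: lesson 0 = object-level concepts, lesson 1 = relational questions."""
    return 1 if "relate" in example.program else 0

def build_curriculum(examples):
    """Group examples into lessons, then order each lesson from short to long programs."""
    lessons = {}
    for ex in examples:
        lessons.setdefault(lesson_of(ex), []).append(ex)
    return [sorted(lessons[k], key=lambda ex: len(ex.program)) for k in sorted(lessons)]

examples = [
    Example(
        "How many objects are both right of the green cylinder and have the same "
        "material as the small blue ball?",
        ["filter", "unique", "relate", "filter", "unique", "same_material", "intersect", "count"],
    ),
    Example("What's the color of the object?", ["unique", "query_color"]),
    # A third, hypothetical question added for illustration.
    Example("What's the shape of the red object?", ["filter", "unique", "query_shape"]),
]

curriculum = build_curriculum(examples)
for i, lesson in enumerate(curriculum):
    print(f"lesson {i}:", [ex.question for ex in lesson])
```

Grouping by question type first and sorting by length second mirrors the order described above, where relational questions come only after object-level ones; a real schedule would also decide when the model has "mastered" a lesson, which this sketch leaves out.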

In experiments, it interpreted new scenes and concepts "almost perfectly," the researchers report, handily outperforming other bleeding-edge AI systems while training on just 5,000 images and 100,000 questions, compared with the 70,000 images and 700,000 questions used by those systems. The team leaves improving its performance on real-world photos, and extending it to video understanding and robotic manipulation, to future work.
