MIT researchers release Clevrer to advance visual reasoning and neurosymbolic AI

Researchers from Harvard University and the MIT-IBM Watson AI Lab have released Clevrer, a data set for evaluating AI models' ability to recognize causal relationships and carry out reasoning. A paper sharing initial findings about the CoLlision Events for Video REpresentation and Reasoning (Clevrer) data set was published this week at the entirely digital International Conference on Learning Representations (ICLR).

Clevrer builds on Clevr, a data set for analyzing the visual reasoning abilities of neural networks that was released in 2016 by a team from Stanford University and Facebook AI Research that included ImageNet creator Dr. Fei-Fei Li. Clevrer cocreators including Chuang Gan of the MIT-IBM Watson AI Lab and Pushmeet Kohli of DeepMind introduced the Neuro-Symbolic Concept Learner (NS-CL), a neurosymbolic model applied to Clevr, at ICLR one year ago. The Clevrer paper introduces a new model, Neuro-Symbolic Dynamic Reasoning (NS-DR).

"We present a systematic study of temporal and causal reasoning in videos. This profound and challenging problem deeply rooted to the fundamentals of human intelligence has just begun to be studied with 'modern' AI tools," the paper reads. "Our newly introduced Clevrer data set and the NS-DR model are preliminary steps toward this direction."

The data set includes 20,000 synthetic videos of objects colliding on a tabletop, generated with the Bullet physics simulator, along with a natural language data set of questions and answers about the objects in the videos. Its more than 300,000 questions and answers are categorized as descriptive, explanatory, predictive, or counterfactual.

MIT-IBM Watson AI Lab director David Cox told VentureBeat in an interview that he believes the data set can help researchers make progress toward hybrid AI that combines neural networks and symbolic AI. Cox said IBM Research will apply the approach to IT infrastructure management and to industrial settings such as factories and construction sites.

"I think this is actually going to be important for pretty much every kind of application," Cox said. "The very simple world that we're seeing are these balls moving around is really the first step on the journey to look at the world, understand that world, be able to make plans about how to make things happen in that world. So we think that's probably going to be across many domains, and indeed vision and robotics are great places to start."

The MIT-IBM Watson AI Lab was created three years ago to pursue disruptive advances toward broad AI. Some of its work, like ObjectNet, has highlighted the brittleness of deep learning success stories like ImageNet, but much of the lab's focus has been on combining neural networks with symbolic, or classical, AI.

Like neural networks, symbolic AI has been around for decades. Cox argues that just as neural networks had to wait for the right conditions (enough data and ample compute), symbolic AI was waiting for neural networks before it could experience a resurgence.

Cox says the two forms of AI complement each other well and together can produce more robust and reliable models that need less data and use energy more efficiently. In a conversation with VentureBeat at the start of the year, IBM Research director Dario Gil called neurosymbolic AI one of the top advances expected in 2020.

Rather than simply mapping inputs to outputs, as neural networks do, symbolic AI lets you represent knowledge or programs explicitly, whatever you want the outcome to be. Cox says this may lead to AI better equipped to solve real-world problems.
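To make the idea of an explicit program concrete, here is a toy Python sketch. It is not the lab's code or the actual NS-DR implementation, only an illustration under stated assumptions: a neural network is assumed to have already turned a video into a symbolic list of objects and collision events, and a question has already been parsed into a short program. The `SceneObject` and `Collision` types, their fields, the sample scene, and the operation names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical symbolic scene: in a neurosymbolic system, a neural perception
# module would produce structured facts like these from raw video frames.
@dataclass(frozen=True)
class SceneObject:
    obj_id: int
    color: str
    shape: str
    material: str

@dataclass(frozen=True)
class Collision:
    frame: int
    a: int  # obj_id of the first object
    b: int  # obj_id of the second object

scene_objects = [
    SceneObject(0, "red", "cube", "rubber"),
    SceneObject(1, "blue", "sphere", "metal"),
    SceneObject(2, "green", "cylinder", "metal"),
    SceneObject(3, "yellow", "sphere", "rubber"),
]
collisions = [Collision(40, 0, 1), Collision(85, 2, 3), Collision(110, 0, 3)]

def execute(program, objects, collisions):
    """Run a hypothetical symbolic program one step at a time."""
    state = objects
    for op, *args in program:
        if op == "filter":
            attr, value = args
            # Keep only objects whose attribute matches the requested value.
            state = [o for o in state if getattr(o, attr) == value]
        elif op == "collisions_involving":
            ids = {o.obj_id for o in state}
            # Keep collision events in which any selected object takes part.
            state = [c for c in collisions if c.a in ids or c.b in ids]
        elif op == "count":
            state = len(state)
        else:
            raise ValueError(f"unknown operation: {op}")
    return state

# "How many collisions involve a metal object?" as a parsed program.
program = [("filter", "material", "metal"), ("collisions_involving",), ("count",)]
print(execute(program, scene_objects, collisions))
```

Here the two metal objects (ids 1 and 2) take part in the collisions at frames 40 and 85, so the program prints 2. Because each step is explicit, a wrong answer can be traced to a specific operation, which is part of the appeal of pairing neural perception with symbolic reasoning.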

"Google has a river of data, Amazon has a river of data, and that's great, but the vast majority of problems are more like puzzles, and we think that to move forward and actually make AI live beyond the hype we need to build systems that can do that, that have a logical component, can flexibly reconfigure themselves, that can act on the environment and experiments, that can interpret that information, and define their own internal mental models of the world," Cox said.

The joint MIT-IBM Watson AI Lab was created in 2017 with a $240 million investment.
