Viewing scenes and making sense of them is something people do effortlessly every day. Whether it’s sussing out objects’ colors or gauging how far apart they are, it doesn’t take much conscious effort to recognize items’ attributes and apply knowledge to answer questions about them.
That’s patently untrue of most AI systems, which tend to reason rather poorly. But emerging techniques in visual recognition, language understanding, and symbolic program execution promise to imbue them with the ability to generalize to new examples, much like humans.
Scientists at the MIT-IBM Watson AI Lab, a joint 10-year, $240 million partnership to propel scientific breakthroughs in machine learning, are perfecting an approach they say might overcome longstanding barriers in AI model design. It marries deep learning with symbolist philosophies, which advocate representations and logical rules as intelligent machine cornerstones, to create programs that learn about the world through observation.
Here’s how Dario Gil, IBM Research vice president of AI and IBM Q, explained it to me in an interview last week: Imagine you’re given a photo of a scene depicting a collection of objects and tasked with classifying and describing each of them. A purely deep learning solution to the problem would require training a model on thousands of example questions, and that model could be tripped up by variations on those same questions.
“You need to decompose the problem into a variety of things,” said Gil. “You have a visual perception challenge — you have a question and you have to understand what those words mean — and then you have a logic reasoning part that you have to execute to solve this problem [as well].”
In contrast to purely deep learning solutions, symbolic reasoning approaches like the one described in a recent paper from MIT, IBM, and DeepMind leverage a neurosymbolic concept learner (NS-CL), an amalgamated model programmed to understand concepts like “objects” and “spatial relationships” in text. One component is set loose on a data set of scenes made up of objects, while another learns to map natural language questions to answers from corpora of question-answer pairs.
The framework can answer new questions about different scenes by recognizing visual concepts in those questions, making it highly scalable. As an added benefit, it requires far less data than deep learning approaches alone.
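To make the idea concrete, here is a minimal sketch of the "turn a question into a small symbolic program and run it against a scene" pattern described above. It is an illustration under loud assumptions, not the researchers' implementation: the `Obj` class, the operation names, and `run_program` are all hypothetical, and both the scene and the programs are written by hand, whereas NS-CL learns its visual concepts and its question parsing from data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    """One object in a scene (hypothetical stand-in for a perception module's output)."""
    name: str
    color: str
    shape: str
    x: float  # horizontal position; smaller means further left


# Hand-written scene; in a learned system these attributes would be inferred from pixels.
SCENE = [
    Obj("a", "red", "cube", 0.0),
    Obj("b", "blue", "sphere", 1.0),
    Obj("c", "red", "sphere", 2.0),
]


def run_program(program, scene):
    """Execute a list of (operation, argument) steps against a scene."""
    current = list(scene)
    for op, arg in program:
        if op == "filter_color":
            current = [o for o in current if o.color == arg]
        elif op == "filter_shape":
            current = [o for o in current if o.shape == arg]
        elif op == "left_of":
            # Requires exactly one selected object; keeps everything to its left.
            (anchor,) = current
            current = [o for o in scene if o.x < anchor.x]
        elif op == "count":
            return len(current)
        elif op == "query":
            # Requires exactly one selected object; returns the named attribute.
            (only,) = current
            return getattr(only, arg)
        else:
            raise ValueError(f"unknown operation: {op}")
    return current


# "How many red objects are there?" (program written by hand for illustration)
HOW_MANY_RED = [("filter_color", "red"), ("count", None)]

# "What shape is the object left of the blue sphere?"
SHAPE_LEFT_OF_BLUE_SPHERE = [
    ("filter_color", "blue"),
    ("filter_shape", "sphere"),
    ("left_of", None),
    ("query", "shape"),
]

print(run_program(HOW_MANY_RED, SCENE))               # 2
print(run_program(SHAPE_LEFT_OF_BLUE_SPHERE, SCENE))  # cube
```

Because the programs refer only to concepts such as color, shape, and position, the same program can be run unchanged against a different scene, which is one way to picture why this style of system can handle new questions about new scenes.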
“The data efficiency in solving the task essentially perfectly is [incredible],” said Gil. “[Y]ou can achieve the same accuracy with 1% of the training data, [which is good news for the] 99.99% of businesses that [don’t] have an overabundance of large amounts of labeled data.”
MIT and IBM’s work in symbolic reasoning is one of several recent efforts to inject AI with contextual knowledge about the world. In June, Salesforce researchers detailed an open source corpus — Common Sense Explanations (CoS-E) — for training and inference with a novel machine learning framework (Commonsense Auto-Generated Explanation, or CAGE), which they said improves performance on question-and-answer benchmarks by 10% over baselines and demonstrates an aptitude for reasoning in out-of-domain tasks.
According to Salesforce chief scientist Richard Socher, the work could lay the groundwork for more helpful, less frustrating AI assistants. Imagine a machine learning algorithm that intuitively “knows,” without having been explicitly taught, what happens when a ball is pushed off a table.
“It turns out that, despite all the recent breakthroughs over the last decade, it’s been historically really hard to capture commonsense knowledge in a form that algorithms can actually make useful,” Socher told VentureBeat in a previous phone interview. “The reason I’m so excited for [this research] is that [it’s the] first approach to capture commonsense knowledge, and it turns out that language models — simple models that read text and try to predict the next word and make sense of the future to autocomplete sentences — capture this commonsense knowledge.”
The emergence of more capable AI models has necessitated new benchmarks to measure their performance. To this end, Facebook AI Research, together with Google’s DeepMind, the University of Washington, and New York University, earlier this month introduced SuperGLUE, the successor to the General Language Understanding Evaluation (GLUE) benchmark. It assigns systems numerical scores based on how well they perform in nine English sentence understanding challenges, with a focus on tasks that have yet to be solved using state-of-the-art methods.
“Current question answering systems are focused on trivia-type questions, such as whether jellyfish have a brain. [SuperGLUE] goes further by requiring machines to elaborate with in-depth answers to open-ended questions, such as ‘How do jellyfish function without a brain?’” Facebook explained in a blog post.
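For a sense of how a multi-task benchmark can boil a system's results down to one number, here is a small sketch. It assumes an unweighted average over tasks, with any task that reports several metrics averaged first, which is how GLUE-style leaderboards commonly summarize results. The task names and scores are invented for illustration, and `overall_score` is a hypothetical helper, not part of any benchmark's tooling.

```python
from statistics import mean


def overall_score(task_scores):
    """Average per-task scores; a task with several metrics is averaged first."""
    per_task = [mean(metrics.values()) for metrics in task_scores.values()]
    return mean(per_task)


# Invented numbers for hypothetical tasks; not real benchmark results.
EXAMPLE_RESULTS = {
    "task_a": {"accuracy": 80.0},
    "task_b": {"accuracy": 70.0, "f1": 90.0},
    "task_c": {"accuracy": 60.0},
}

print(round(overall_score(EXAMPLE_RESULTS), 1))  # 73.3
```

Averaging within a task before averaging across tasks keeps a task that happens to report two metrics from counting twice in the final score.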
Artificial general intelligence (AGI), or a system that can perform any intellectual task that a human can, remains more or less a pipe dream. But if models and methods at the cutting edge are anything to go by, we might find ourselves engaging in meaningful conversation with an AI assistant sooner rather than later.
Thanks for reading,
AI staff writer