We’ve seen a big push in recent months to solve AI’s “big data problem.” And some interesting breakthroughs have begun to emerge that could make AI accessible to many more businesses and organizations.
What is the big data problem? It’s the challenge of getting enough data to enable deep learning, a very popular and promising AI technique that allows machines to find relationships and patterns in data by themselves. (For example, after being fed many images of cats, a deep learning program could create its own definition of what constitutes ‘cat’ and use that to identify future images as either ‘cat’ or ‘not cat’. If you change ‘cat’ to ‘customer,’ you can see why many companies are eager to test-drive this technology.)
Deep learning algorithms often require millions of training examples to perform their tasks accurately. But many companies and organizations don’t have access to such large caches of annotated data to train their models. Getting millions of pictures of cats is hard enough; how do you get millions of properly annotated customer profiles or, to take an application from the health care realm, millions of annotated heart failure events? On top of that, in many domains, data is fragmented and scattered, and consolidating and cleaning it for AI training takes tremendous effort and funding. In other fields, data is subject to privacy laws and other regulations, which may put it out of reach of AI engineers.
This is why AI researchers have been under pressure over the last few years to find workarounds for the enormous data requirements of deep learning. And it’s why there’s been a lot of interest in recent months as several promising solutions have emerged — two that would require less training data, and one that would allow organizations to create their own training examples.
Here’s an overview of those emerging solutions.
Hybrid AI models
For a good part of AI’s six-decade history, the field has been marked by a rivalry between symbolic and connectionist AI. Symbolists believe AI must be based on explicit rules coded by programmers. Connectionists argue that AI must learn through experience, the approach used in deep learning.
But more recently, researchers have found that by combining connectionist and symbolist models, they can create AI systems that require much less training data.
In a paper presented at the ICLR conference in May, researchers from MIT and IBM introduced the “Neuro-Symbolic Concept Learner,” an AI model that brings together rule-based AI and neural networks.
NSCL uses neural networks to extract features from images and compose a structured table of information (called “symbols” in AI jargon). It then uses a classic, rule-based program to answer questions and solve problems based on those symbols.
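To make that division of labor concrete, here is a minimal sketch in Python. It is not the NSCL implementation: the scene table is written by hand to stand in for what a neural network would extract from an image, and the names scene, filter_objects, count and run_program are hypothetical, chosen only for illustration.

```python
# Hypothetical stand-in for the neural stage: in a real system, a neural
# network would extract this table of symbols from the pixels of an image.
scene = [
    {"shape": "cube", "color": "red", "size": "large"},
    {"shape": "sphere", "color": "blue", "size": "small"},
    {"shape": "cube", "color": "blue", "size": "small"},
]

def filter_objects(objects, attribute, value):
    """Keep only the objects whose attribute matches the value."""
    return [obj for obj in objects if obj[attribute] == value]

def count(objects):
    """Return how many objects are left."""
    return len(objects)

def run_program(program, objects):
    """Run a sequence of rule-based steps over the symbol table."""
    result = objects
    for step, *args in program:
        result = step(result, *args)
    return result

# "How many blue cubes are there?" written as an explicit program.
question = [
    (filter_objects, "color", "blue"),
    (filter_objects, "shape", "cube"),
    (count,),
]
answer = run_program(question, scene)
print(answer)
```

In this toy version, the rule-based half needs no training at all. Only the stage that fills in the table would have to learn from examples, which is one way to see why such a hybrid could get by with less data.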
By combining the learning capabilities of neural nets and the reasoning power of rule-based AI, NSCL can adapt to new settings and problems with much less data. The researchers tested the AI model on CLEVR, a dataset for visual question answering (VQA). In VQA, an AI must answer questions about the objects and elements contained in a given picture.
AI models based purely on neural networks usually need a lot of training examples to solve VQA problems with decent accuracy. However, NSCL was able to master CLEVR with a fraction of the data.
Few-shot learning and one-shot learning
The traditional approach to cutting down on training data is transfer learning, the process of taking a pretrained neural network and fine-tuning it for a new task. For instance, an AI engineer can use AlexNet, an open-source image classifier trained on millions of images, and repurpose it for a new task by retraining it with domain-specific examples.
Transfer learning reduces the amount of training data required to create an AI model. But it might still require hundreds of examples, and the tuning process requires a lot of trial and error.
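The idea behind transfer learning can be sketched in a few lines of plain Python. This is a toy illustration under loud assumptions, not a recipe for fine-tuning AlexNet: pretrained_features is a hypothetical stand-in for the frozen layers of a pretrained network, the four training examples are made up, and the only part that learns is a small new layer placed on top.

```python
import math

def pretrained_features(x):
    """Hypothetical frozen feature extractor. In practice this would be the
    early layers of a pretrained network; its weights are never updated."""
    return [x[0] + x[1], x[0] - x[1]]

def predict(head, feats):
    """New task-specific layer: logistic regression on the frozen features."""
    z = sum(w * f for w, f in zip(head["w"], feats)) + head["b"]
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune(examples, epochs=200, lr=0.5):
    """Train only the new head with gradient descent on log loss."""
    head = {"w": [0.0, 0.0], "b": 0.0}
    for _ in range(epochs):
        for x, label in examples:
            feats = pretrained_features(x)
            error = predict(head, feats) - label
            head["w"] = [w - lr * error * f for w, f in zip(head["w"], feats)]
            head["b"] -= lr * error
    return head

# A handful of made-up, domain-specific examples: (input, label).
examples = [([2.0, 1.0], 1), ([1.5, 0.5], 1), ([0.5, 1.5], 0), ([1.0, 2.0], 0)]
head = fine_tune(examples)
predictions = [round(predict(head, pretrained_features(x))) for x, _ in examples]
print(predictions)
```

Because the feature extractor is reused as is, only three numbers are trained here. The values of epochs and lr are arbitrary choices for this sketch, and finding good ones for a real model is part of the trial and error mentioned above.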
In recent months, AI researchers have developed techniques that can learn new tasks from far fewer examples.
In May, Samsung’s research lab introduced Talking Heads, a face animation AI model that could perform few-shot learning. The Talking Heads system can animate the portrait of a previously unseen person by seeing only a few images of the subject.
After training on a large dataset of face videos, the AI learns to identify and extract facial landmarks from new images and manipulate them in natural ways without requiring many examples.
Another interesting project in few-shot learning is RepMet, an AI model developed jointly by researchers at IBM and Tel Aviv University. RepMet uses a specialized technique to fine-tune an image classifier to detect new types of objects with as few as one image per object.
This is useful in settings such as restaurants, where you want to be able to classify dishes that are unique to each restaurant without gathering too many pictures for training your AI models.
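One general way to classify from a single example is to compare a new image’s embedding with one stored reference embedding per class and pick the closest. The sketch below illustrates that nearest-reference idea only; it is not RepMet’s actual method. The dish names and embedding vectors are made up, and in a real system a pretrained network would produce the vectors from photos.

```python
import math

def distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(embedding, prototypes):
    """Return the label whose single reference embedding is closest."""
    return min(prototypes, key=lambda label: distance(embedding, prototypes[label]))

# Hypothetical embeddings: one reference image per dish.
prototypes = {
    "house ramen": [0.9, 0.1, 0.2],
    "truffle pizza": [0.1, 0.8, 0.3],
    "mango tart": [0.2, 0.2, 0.9],
}

new_photo = [0.8, 0.2, 0.1]  # made-up embedding of an unseen photo
label = classify(new_photo, prototypes)
print(label)
```

Adding a new dish to this toy classifier means adding one entry to prototypes, with no retraining.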
Generating training data with GANs
In some domains, training examples exist, but obtaining them poses virtually insurmountable challenges. An example is health care, where data is fragmented across different hospitals and contains sensitive patient information. This makes it even more difficult for AI scientists to obtain and handle data while also remaining in compliance with regulations such as GDPR and HIPAA.
To solve this problem, many researchers are getting help from generative adversarial networks (GANs), a technique invented by AI researcher Ian Goodfellow in 2014. GANs pit a generator and discriminator neural network against each other to create new data.
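The contest between the two networks comes down to two opposing objectives, which can be written out in a few lines of Python. This sketch shows only the standard loss functions, not a working GAN: there are no networks and no training loop, and the discriminator scores are made-up numbers.

```python
import math

def discriminator_loss(score_real, score_fake):
    """Low when the discriminator rates real data near 1 and fake data near 0."""
    return -math.log(score_real) - math.log(1.0 - score_fake)

def generator_loss(score_fake):
    """Low when the discriminator is fooled into rating fake data near 1."""
    return -math.log(score_fake)

# Made-up discriminator scores (probability that a sample is real).
early = {"real": 0.9, "fake": 0.1}  # generated data is easy to spot
later = {"real": 0.9, "fake": 0.6}  # generated data has become convincing

for name, s in (("early", early), ("later", later)):
    print(name,
          round(discriminator_loss(s["real"], s["fake"]), 3),
          round(generator_loss(s["fake"]), 3))
```

As the generated data becomes more convincing, the generator’s loss falls and the discriminator’s rises. Each network improves only at the other’s expense.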
GANs can also reduce the human effort required to gather annotated examples for training deep learning algorithms. Researchers at National Taiwan University recently created a GAN that generates electronic health records (EHRs) to train AI models. Since the generated EHRs are purely synthetic, AI engineers who use them to train their models won’t need to obtain special permissions or worry about running afoul of privacy laws.
More recently, researchers at Germany’s University of Lübeck introduced a new method that uses GANs to synthesize high-quality medical images such as CT scans and MRIs. The technique is memory-efficient, which means it doesn’t require the vast computing resources available only to large AI labs and big tech companies.
Many fear that with the rise of deep learning, companies and organizations that have access to vast amounts of data will dominate. It’s hard to predict how long it will take for less data-intensive AI models to move from research labs to commercially available products. But as these and similar projects emerge, there is more reason to hope that deep learning innovation will not remain limited to the likes of Google and Facebook.
Ben Dickson is a software engineer and the founder of TechTalks, a blog that explores the ways technology is solving and creating problems.