Google Brain chief: Deep learning takes at least 100,000 examples

While the current class of deep learning techniques is helping fuel the AI wave, one of the frequently cited drawbacks is that they require a lot of data to work. But how much is enough data?

“I would say pretty much any business that has tens or hundreds of thousands of customer interactions has enough scale to start thinking about using these sorts of things,” Jeff Dean, a senior fellow at Google, said in an onstage interview at the VB Summit in Berkeley, California. “If you only have 10 examples of something, it's going to be hard to make deep learning work. If you have 100,000 things you care about, records or whatever, that’s the kind of scale where you should really start thinking about these kinds of techniques.”

Dean knows a thing or two about deep learning: he’s head of the Google Brain team, a group of researchers focused on a wide-ranging set of problems in computer science and artificial intelligence. He’s been working with neural networks since the 1990s, when he wrote his undergraduate thesis on the subject.

In his view, machine learning techniques have an opportunity to impact virtually every industry, though the rate at which that happens will depend on the specific industry.

There are still plenty of hurdles that humans need to tackle before they can turn the data they have into machine intelligence. To be useful for machine learning, data needs to be processed, which can take time and, at least at first, require significant human intervention.

“There’s a lot of work in machine learning systems that is not actually machine learning,” Dean said. “And so you still have to do a lot of that. You have to get the data together, maybe you have to have humans label examples, and then you have to write some data processing pipeline to produce the dataset that you will then do machine learning on.”
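To make the steps Dean lists more concrete, here is a minimal Python sketch of that kind of non-machine-learning groundwork: gathering records, attaching human labels, cleaning the text, and splitting the result into a dataset. Everything in it is an assumption made for illustration, including the `build_dataset` function, the sample records, and the labels. It is not Google's pipeline, just one simple way the idea could look.

```python
import random


def build_dataset(raw_records, human_labels, test_fraction=0.2, seed=0):
    """Join raw records with human labels, clean them, and split train/test.

    Hypothetical helper for illustration only.
    """
    examples = []
    for record_id, text in raw_records.items():
        label = human_labels.get(record_id)
        if label is None:
            continue  # nobody has labeled this record yet, so skip it
        cleaned = " ".join(text.lower().split())  # normalize case and whitespace
        if cleaned:  # drop records that are empty after cleaning
            examples.append((cleaned, label))

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(examples)
    n_test = int(len(examples) * test_fraction)
    return examples[n_test:], examples[:n_test]


# Made-up customer interactions and human-provided labels.
raw_records = {
    1: "  Great  service ",
    2: "Never again",
    3: "",
    4: "Okay I guess",
    5: "Loved it",
    6: "Too slow",
}
human_labels = {1: "positive", 2: "negative", 3: "negative", 5: "positive", 6: "negative"}

train, test = build_dataset(raw_records, human_labels, test_fraction=0.25)
print(len(train), "training examples,", len(test), "test example(s)")
```

Even in this toy version, two of the six records never reach the dataset: one is empty and one has no label. Real projects face the same losses at a far larger scale, which is part of why the preparation work can dominate the effort.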

To simplify the process of creating machine learning systems, Google is turning to machine learning itself to determine the right system for solving a particular problem. It’s a tough task that isn’t anywhere near complete, but Dean said the team’s early work is promising.

One encouraging example of how this might work comes from a self-trained network that posted state-of-the-art results identifying images from the ImageNet dataset earlier this year. And Google-owned DeepMind just published a paper about a version of AlphaGo that appeared to have mastered the game of Go solely by playing against itself.
