DeepMind aims to marry deep learning and classic algorithms

Will deep learning really live up to its promise? We don't actually know. But if it's going to, it will have to assimilate how classical computer science algorithms work. This is what DeepMind is working on, and its success is important to the eventual uptake of neural networks in wider commercial applications.

Founded in 2010 with the goal of creating AGI -- artificial general intelligence, a general purpose AI that truly mimics human intelligence -- DeepMind is on the forefront of AI research. The company is also backed by industry heavyweights like Elon Musk and Peter Thiel.

Acquired by Google in 2014, DeepMind has made headlines for projects such as AlphaGo, a program that beat the world champion at the game of Go in a five-game match, and AlphaFold, which found a solution to a 50-year-old grand challenge in biology.

Now DeepMind has set its sights on another grand challenge: bridging the worlds of deep learning and classical computer science to enable deep learning to do everything. If successful, this approach could revolutionize AI and software as we know them.

Petar Veličković is a senior research scientist at DeepMind. His entry into computer science came through algorithmic reasoning and algorithmic thinking using classical algorithms. Since he started doing deep learning research, he has wanted to reconcile deep learning with the classical algorithms that initially got him excited about computer science.

Meanwhile, Charles Blundell is a research lead at DeepMind who is interested in getting neural networks to make much better use of the huge quantities of data they're exposed to. Examples include getting a network to tell us what it doesn't know, to learn much more quickly, or to exceed expectations.

When Veličković met Blundell at DeepMind, something new was born: a line of research that goes by the name of Neural Algorithmic Reasoning (NAR), after a position paper the duo recently published.

NAR traces the roots of the fields it touches upon and branches out to collaborations with other researchers. And unlike much pie-in-the-sky research, NAR has some early results and applications to show for itself.

Algorithms and deep learning: the best of both worlds

Veličković was in many ways the person who kickstarted the algorithmic reasoning direction in DeepMind. With his background in both classical algorithms and deep learning, he realized that there is a strong complementarity between the two of them. What one of these methods tends to do really well, the other one doesn't do that well, and vice versa.

"Usually when you see these kinds of patterns, it's a good indicator that if you can do anything to bring them a little bit closer together, then you could end up with an awesome way to fuse the best of both worlds, and make some really strong advances," Veličković said.

When Veličković joined DeepMind, Blundell said, their early conversations were a lot of fun because they have very similar backgrounds. They both share a background in theoretical computer science. Today, they both work a lot with machine learning, in which a fundamental question for a long time has been how to generalize -- how do you work beyond the data examples you've seen?

Algorithms are a really good example of something we all use every day, Blundell noted. In fact, he added, there aren't many algorithms out there. If you look at standard computer science textbooks, there's maybe 50 or 60 algorithms that you learn as an undergraduate. And everything people use to connect over the internet, for example, is using just a subset of those.

"There's this very nice basis for very rich computation that we already know about, but it's completely different from the things we're learning. So when Petar and I started talking about this, we saw clearly there's a nice fusion that we can make here between these two fields that has actually been unexplored so far," Blundell said.

The key thesis of NAR research is that algorithms possess fundamentally different qualities to deep learning methods. And this suggests that if deep learning methods were better able to mimic algorithms, then generalization of the sort seen with algorithms would become possible with deep learning.

To approach the topic for this article, we asked Blundell and Veličković to lay out the defining properties of classical computer science algorithms compared to deep learning models. Figuring out the ways in which algorithms and deep learning models are different is a good start if the goal is to reconcile them.

Deep learning can't generalize

For starters, Blundell said, algorithms in most cases don't change. Algorithms are comprised of a fixed set of rules that are executed on some input, and usually good algorithms have well-known properties. For any kind of input the algorithm gets, it gives a sensible output, in a reasonable amount of time. You can usually change the size of the input and the algorithm keeps working.

The other thing you can do with algorithms is you can plug them together. The reason algorithms can be strung together is because of this guarantee they have: Given some kind of input, they only produce a certain kind of output. And that means that we can connect algorithms, feeding their output into other algorithms' input and building a whole stack.

People have been looking at running algorithms in deep learning for a while, and it's always been quite difficult, Blundell said. As trying out simple tasks is a good way to debug things, Blundell referred to a trivial example: the input copy task. An algorithm whose task is to copy, where its output is just a copy of its input.

It turns out that this is harder than expected for deep learning. You can learn to do this up to a certain length, but if you increase the length of the input past that point, things start breaking down. If you train a network on the numbers 1-10 and test it on the numbers 1-1,000, many networks will not generalize.

Blundell explained, "They won't have learned the core idea, which is you just need to copy the input to the output. And as you make the process more complicated, as you can imagine, it gets worse. So if you think about sorting through various graph algorithms, actually the generalization is far worse if you just train a network to simulate an algorithm in a very naive fashion."

Fortunately, it's not all bad news.

"[T]here's something very nice about algorithms, which is that they're basically simulations. You can generate a lot of data, and that makes them very amenable to being learned by deep neural networks," he said. "But it requires us to think from the deep learning side. What changes do we need to make there so that these algorithms can be well represented and actually learned in a robust fashion?"

Of course, answering that question is far from simple.

"When using deep learning, usually there isn't a very strong guarantee on what the output is going to be. So you might say that the output is a number between zero and one, and you can guarantee that, but you couldn't guarantee something more structural," Blundell explained. "For example, you can't guarantee that if you show a neural network a picture of a cat and then you take a different picture of a cat, it will definitely be classified as a cat."

With algorithms, you could develop guarantees that this wouldn't happen. This is partly because the kind of problems algorithms are applied to are more amenable to these kinds of guarantees. So if a problem is amenable to these guarantees, then maybe we can bring across into the deep neural networks classical algorithmic tasks that allow these kinds of guarantees for the neural networks.

Those guarantees usually concern generalizations: the size of the inputs, the kinds of inputs you have, and their outcomes that generalize over types. For example, if you have a sorting algorithm, you can sort a list of numbers, but you could also sort anything you can define an ordering for, such as letters and words. However, that's not the kind of thing we see at the moment with deep neural networks.

Algorithms can lead to suboptimal solutions

Another difference, which Veličković noted, is that algorithmic computation can usually be expressed as pseudocode that explains how you go from your inputs to your outputs. This makes algorithms trivially interpretable. And because they operate over these abstractified inputs that conform to some preconditions and post-conditions, it's much easier to reason theoretically about them.

That also makes it much easier to find connections between different problems that you might not see otherwise, Veličković added. He cited the example of MaxFlow and MinCut as two problems that are seemingly quite different, but where the solution of one is necessarily the solution to the other. That's not obvious unless you study it from a very abstract lens.

"There's a lot of benefits to this kind of elegance and constraints, but it's also the potential shortcoming of algorithms," Veličković said. "That's because if you want to make your inputs conform to these stringent preconditions, what this means is that if data that comes from the real world is even a tiny bit perturbed and doesn't conform to the preconditions, I'm going to lose a lot of information before I can massage it into the algorithm."

He said that obviously makes the classical algorithm method suboptimal, because even if the algorithm gives you a perfect solution, it might give you a perfect solution in an environment that doesn't make sense. Therefore, the solutions are not going to be something you can use. On the other hand, he explained, deep learning is designed to rapidly ingest lots of raw data at scale and pick up interesting rules in the raw data, without any real strong constraints.

"This makes it remarkably powerful in noisy scenarios: You can perturb your inputs and your neural network will still be reasonably applicable. For classical algorithms, that may not be the case. And that's also another reason why we might want to find this awesome middle ground where we might be able to guarantee something about our data, but not require that data to be constrained to, say, tiny scalars when the complexity of the real world might be much larger," Veličković said.

Another point to consider is where algorithms come from. Usually what happens is you find very clever theoretical scientists, you explain your problem, and they think really hard about it, Blundell said. Then the experts go away and map the problem onto a more abstract version that drives an algorithm. The experts then present their algorithm for this class of problems, which they promise will execute in a specified amount of time and provide the right answer. However, because the mapping from the real-world problem to the abstract space on which the algorithm is derived isn't always exact, Blundell said, it requires a bit of an inductive leap.

With machine learning, it's the opposite, as ML just looks at the data. It doesn't really map onto some abstract space, but it does solve the problem based on what you tell it.

What Blundell and Veličković are trying to do is get somewhere in between those two extremes, where you have something that's a bit more structured but still fits the data, and doesn't necessarily require a human in the loop. That way you don't need to think so hard as a computer scientist. This approach is valuable because often real-world problems are not exactly mapped onto the problems that we have algorithms for -- and even for the things we do have algorithms for, we have to abstract problems. Another challenge is how to come up with new algorithms that significantly outperform existing algorithms that have the same sort of guarantees.

Why deep learning? Data representation

When humans sit down to write a program, it's very easy to get something that's really slow -- for example, that has exponential execution time, Blundell noted. Neural networks are the opposite. As he put it, they're extremely lazy, which is a very desirable property for coming up with new algorithms.

"There are people who have looked at networks that can adapt their demands and computation time. In deep learning, how one designs the network architecture has a huge impact on how well it works. There's a strong connection between how much processing you do and how much computation time is spent and what kind of architecture you come up with -- they're intimately linked," Blundell said.

Veličković noted that one thing people sometimes do when solving natural problems with algorithms is try to push them into a framework they've come up with that is nice and abstract. As a result, they may make the problem more complex than it needs to be.

"The traveling [salesperson], for example, is an NP complete problem, and we don't know of any polynomial time algorithm for it. However, there exists a prediction that's 100% correct for the traveling [salesperson], for all the towns in Sweden, all the towns in Germany, all the towns in the USA. And that's because geographically occurring data actually has nicer properties than any possible graph you could feed into traveling [salesperson]," Veličković said.

Before delving into NAR specifics, we felt a naive question was in order: Why deep learning? Why go for a generalization framework specifically applied to deep learning algorithms and not just any machine learning algorithm?

The DeepMind duo wants to design solutions that operate over the true raw complexity of the real world. So far, the best solution for processing large amounts of naturally occurring data at scale is deep neural networks, Veličković emphasized.

Blundell noted that neural networks have much richer representations of the data than classical algorithms do. "Even inside a large model class that's very rich and complicated, we find that we need to push the boundaries even further than that to be able to execute algorithms reliably. It's a sort of empirical science that we're looking at. And I just don't think that as you get richer and richer decision trees, they can start to do some of this process," he said.

Blundell then elaborated on the limits of decision trees.

"We know that decision trees are basically a trick: If this, then that. What's missing from that is recursion, or iteration, the ability to loop over things multiple times. In neural networks, for a long time people have understood that there's a relationship between iteration, recursion, and the current neural networks. In graph neural networks, the same sort of processing arises again; the message passing you see there is again something very natural," he said.

Ultimately, Blundell is excited about the potential to go further.

"If you think about object-oriented programming, where you send messages between classes of objects, you can see it's exactly analogous, and you can build very complicated interaction diagrams and those can then be mapped into graph neural networks. So it's from the internal structure that you get a richness that seems might be powerful enough to learn algorithms you wouldn't necessarily get with more traditional machine learning methods," Blundell explained.

Algorithms and deep learning: the best of both worlds

Deep learning can't generalize

Algorithms can lead to suboptimal solutions

Why deep learning? Data representation

More