
Demis Hassabis founded DeepMind with the goal of unlocking answers to some of the world’s toughest questions by recreating intelligence itself. His ambition remains just that — an ambition — but Hassabis and colleagues inched closer to realizing it this week with the publication of papers in Nature addressing two formidable challenges in biomedicine.

The first paper originated from DeepMind’s neuroscience team, and it advances the notion that a recent development in AI research, distributional reinforcement learning, might serve as a framework for understanding how the brain learns. The other paper focuses on DeepMind’s work on protein folding, which it first detailed in December 2018. Both follow on the heels of DeepMind’s work applying AI to the prediction of acute kidney injury (AKI) and to challenging game environments such as Go, shogi, chess, dozens of Atari games, and Activision Blizzard’s StarCraft II.

“It’s exciting to see how our research in [machine learning] can point to a new understanding of the learning mechanisms at play in the brain,” said Hassabis. “[Separately, understanding] how proteins fold is a long-standing fundamental scientific question that could one day be key to unlocking new treatments for a whole range of diseases — from Alzheimer’s and Parkinson’s to cystic fibrosis and Huntington’s — where misfolded proteins are believed to play a role.”


In the paper on dopamine, teams hailing from DeepMind and Harvard investigated whether the brain represents possible future rewards not as a single average but as a probability distribution — a mathematical function that provides the probabilities of occurrence of different outcomes. They found evidence of “distributional reinforcement learning” in recordings taken from the ventral tegmental area — the midbrain structure that governs the release of dopamine to the limbic and cortical areas — in mice. The evidence indicates that reward predictions are represented by multiple future outcomes simultaneously and in parallel.



The idea that AI systems mimic human biology isn’t new. A study conducted by researchers at Radboud University in the Netherlands found that recurrent neural networks (RNNs) can predict how the human brain processes sensory information, particularly visual stimuli. But, for the most part, those discoveries have informed machine learning rather than neuroscientific research.

In 2017, DeepMind built an anatomical model of the human brain with an AI algorithm that mimicked the behavior of the prefrontal cortex and a “memory” network that played the role of the hippocampus, resulting in a system that significantly outperformed most machine learning model architectures. More recently, DeepMind turned its attention to rational machinery, producing synthetic neural networks capable of applying humanlike reasoning skills and logic to problem-solving. And in 2018, DeepMind researchers conducted an experiment suggesting that the prefrontal cortex doesn’t rely on synaptic weight changes to learn rule structures, as once thought, but instead uses abstract model-based information directly encoded in dopamine.

Reinforcement learning and neurons

Reinforcement learning involves algorithms that learn behaviors using only rewards and punishments as teaching signals. Rewards reinforce whichever behaviors led to them, making those behaviors more likely in the future.

As the researchers point out, solving a problem requires understanding how current actions lead to future rewards. That’s where temporal difference (TD) learning algorithms come in: they attempt to predict the immediate reward plus their own reward prediction at the next moment in time. When that next moment arrives, bringing more information, the algorithm compares its new prediction against what it had expected. If the two differ, this “temporal difference” is used to adjust the old prediction toward the new one, so that the chain of predictions becomes more accurate over time.
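
To make the update concrete, here is a minimal tabular TD(0) sketch in Python. The three-state chain, the learning rate `alpha`, the discount factor `gamma`, and the function names are illustrative assumptions, not details from the DeepMind paper.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=1.0):
    """Move values[state] toward reward + gamma * values[next_state]; return the TD error."""
    # A next_state of None marks the end of an episode, whose value is zero.
    next_value = values[next_state] if next_state is not None else 0.0
    td_error = reward + gamma * next_value - values[state]  # the "temporal difference"
    values[state] += alpha * td_error
    return td_error


def run_chain(episodes=500, alpha=0.1, gamma=1.0):
    """Repeatedly walk a hypothetical chain A -> B -> C -> end that pays 1.0 at the end."""
    transitions = [("A", 0.0, "B"), ("B", 0.0, "C"), ("C", 1.0, None)]
    values = {"A": 0.0, "B": 0.0, "C": 0.0}
    for _ in range(episodes):
        for state, reward, next_state in transitions:
            td0_update(values, state, reward, next_state, alpha, gamma)
    return values


if __name__ == "__main__":
    print(run_chain())
```

Because each state’s prediction is pulled toward the next state’s prediction, the reward delivered at the end of the chain propagates back to state A over repeated episodes, and all three values approach 1.0.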


Above: When the future is uncertain, future reward can be represented as a probability distribution. Some possible futures are good (teal), others are bad (red).

Image Credit: DeepMind

Reinforcement learning techniques have been refined over time to make training more efficient, and one recently developed technique is distributional reinforcement learning.

Distributional reinforcement learning

The amount of future reward that will result from a particular action is often not a known quantity but instead involves some randomness. In such situations, a standard TD algorithm learns to predict the future reward that will be received on average, while a distributional reinforcement learning algorithm learns to predict the full distribution of possible rewards.
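
As a worked example, assume a hypothetical action whose reward is either 0 or 10 with equal probability. A standard TD learner’s prediction settles around the probability-weighted average:

```latex
\mathbb{E}[R] = \sum_k p_k \, r_k = 0.5 \times 0 + 0.5 \times 10 = 5
```

The average, 5, is a reward that never actually occurs; a distributional learner instead keeps track of the fact that outcomes cluster at 0 and 10.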

This is not unlike how dopamine neurons function in the brains of animals. Some of these neurons signal reward prediction errors: their firing rate (the rate at which they send electrical signals) rises when an animal receives more reward than expected and falls when it receives less. This is known as the reward prediction error theory of dopamine: a reward prediction error is computed, broadcast throughout the brain via the dopamine signal, and used to drive learning.


Above: Each row of dots corresponds to a dopamine cell, and each color corresponds to a different reward size.

Image Credit: DeepMind

Distributional reinforcement learning expands upon the canonical reward prediction error theory of dopamine. It was previously thought that reward predictions were represented only as a single quantity, supporting learning about the mean — or average — of stochastic (i.e., randomly determined) outcomes, but the work suggests that the brain in fact considers a multiplicity of predictions. “In the brain, reinforcement learning is driven by dopamine,” said DeepMind research scientist Zeb Kurth-Nelson. “What we found in our … paper is that each dopamine cell is specially tuned in a way that makes the population of cells exquisitely effective at rewiring those neural networks in a way that hadn’t been considered before.”

One of the simplest distributional reinforcement learning algorithms, distributional TD, assumes that reward-based learning is driven by a reward prediction error that signals the difference between received and anticipated rewards. Unlike traditional reinforcement learning, however, where the prediction is represented as a single quantity (the average over all potential outcomes, weighted by their probabilities), distributional reinforcement learning uses several predictions that vary in their degree of optimism about upcoming rewards.

A distributional TD algorithm learns this set of predictions by computing, for each predictor, a prediction error describing the difference between consecutive predictions. The predictors apply different transformations to their respective reward prediction errors, so that some selectively “amplify” or “overweight” positive errors while others overweight negative ones. Predictors that amplify positive errors learn more optimistic predictions, corresponding to the higher part of the distribution, while those that amplify negative errors learn more pessimistic ones. The result is a diversity of pessimistic and optimistic value estimates that together capture the full distribution of rewards.
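
One simple way to realize this kind of amplification is to scale positive and negative prediction errors differently for each predictor. The Python sketch below assumes a one-step task (each sampled reward ends the episode) and gives each predictor an asymmetry level `tau`: positive errors are scaled by `tau` and negative errors by `1 - tau`. The reward values, `tau` levels, learning rate, step count, and function name are illustrative assumptions rather than values from the paper.

```python
import random


def distributional_td(rewards, taus, lr=0.005, steps=100_000, seed=0):
    """Learn one reward prediction per asymmetry level in `taus`.

    Each step samples a reward (a one-step task, so there is no next-state
    prediction to bootstrap from). Predictor i scales positive errors by
    taus[i] and negative errors by 1 - taus[i], so predictors with
    tau > 0.5 drift above the mean (optimistic) and tau < 0.5 below it.
    """
    rng = random.Random(seed)
    preds = [0.0] * len(taus)
    for _ in range(steps):
        reward = rng.choice(rewards)
        for i, tau in enumerate(taus):
            delta = reward - preds[i]              # reward prediction error
            scale = tau if delta > 0 else 1.0 - tau
            preds[i] += lr * scale * delta         # asymmetric ("amplified") update
    return preds


if __name__ == "__main__":
    print(distributional_td([0.0, 10.0], [0.1, 0.5, 0.9]))
```

With rewards of 0 or 10 at equal odds and `tau` levels of 0.1, 0.5 and 0.9, the predictions settle near 1, 5 and 9: the balanced predictor tracks the mean, while the others spread toward the pessimistic and optimistic ends of the distribution. Under this particular scheme, each prediction converges to an expectile of the reward distribution.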

DeepMind dopamine

Above: As a population, dopamine cells encode the shape of the learned reward distribution: we can decode the distribution of rewards from their firing rates. The gray shaded area is the true distribution of rewards encountered in the task.

Image Credit: DeepMind

“For the last three decades, our best models of reinforcement learning in AI … have focused almost entirely on learning to predict the average future reward. But this doesn’t reflect real life,” said DeepMind research scientist Will Dabney. “[It is in fact possible] to predict the entire distribution of rewarding outcomes moment to moment.”

Distributional reinforcement learning is simple in its execution, but it’s highly effective when used with machine learning systems — it’s able to increase performance by a factor of two or more. That’s perhaps because learning about the distribution of rewards gives the system a more powerful signal for shaping its representation, making it more robust to changes in the environment or a given policy.

Distributional learning and dopamine

The study, then, sought to determine whether the brain uses a form of distributional TD. The team analyzed recordings of dopamine cells from 11 mice, made while the mice performed tasks in which they received rewards and other stimuli. Five mice were trained on a variable-probability task, while six were trained on a variable-magnitude task. On each trial, mice in the variable-probability group were exposed to one of four randomly chosen odors, followed by a squirt of water, an air puff, or nothing. (The first odor signaled a 90% chance of reward, while the second, third, and fourth odors signaled a 50% chance of reward, 10% chance of reward, and 90% chance of reward, respectively.)

Because dopamine cells change their firing rate to signal a prediction error, a cell should show no change in firing rate when it receives a reward of exactly the size it predicted. With that in mind, the researchers determined each cell’s reversal point (the reward size at which the cell’s firing rate didn’t change) and compared the reversal points across cells to see whether they differed.
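
One simple way to estimate a reversal point is to record a cell’s change in firing rate at several reward sizes and find where that change crosses zero. The Python sketch below uses linear interpolation between the two reward sizes that bracket the crossing. The function name, the assumption that responses are already baseline-subtracted, and the example numbers are all hypothetical, and the paper’s own fitting procedure may differ.

```python
def reversal_point(reward_sizes, responses):
    """Estimate the reward size at which a cell's response crosses zero.

    reward_sizes must be sorted ascending; responses are the cell's changes in
    firing rate relative to baseline at those sizes. Returns None if the
    response never changes sign within the tested range.
    """
    points = list(zip(reward_sizes, responses))
    for (r0, y0), (r1, y1) in zip(points, points[1:]):
        if y0 == 0:
            return r0
        if y0 * y1 <= 0:  # y1 is zero or has the opposite sign: crossing in [r0, r1]
            return r0 + (-y0) * (r1 - r0) / (y1 - y0)
    if points and points[-1][1] == 0:
        return points[-1][0]
    return None
```

For example, `reversal_point([1, 2, 4, 8], [-3.0, -1.0, 1.0, 5.0])` returns 3.0, because the response changes sign between reward sizes 2 and 4.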

They found that some cells predicted large amounts of reward while others predicted little, with differences far larger than random variability could explain. They saw similar diversity when they measured how strongly each cell amplified positive versus negative prediction errors. And the same cells that amplified their positive prediction errors had higher reversal points, indicating that they were tuned to expect larger rewards.
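
The amplification of positive versus negative errors can be summarized as an asymmetry: the slope of a cell’s response above its reversal point (`alpha_plus`) relative to the slope below it (`alpha_minus`), combined as `alpha_plus / (alpha_plus + alpha_minus)`. The sketch below assumes responses are linear on each side of the reversal point and fits each slope by least squares through that point; the names and example data are hypothetical. Under a distributional TD model, a cell whose asymmetry exceeds 0.5 would be expected to have a reversal point above the mean reward, which matches the pattern the researchers describe.

```python
def asymmetry(reward_sizes, responses, reversal):
    """Return alpha_plus / (alpha_plus + alpha_minus) for one cell.

    alpha_plus and alpha_minus are least-squares slopes of lines forced through
    (reversal, 0), fitted separately to reward sizes above and below the
    reversal point. Needs at least one reward size on each side.
    """
    def slope(points):
        num = sum((r - reversal) * y for r, y in points)
        den = sum((r - reversal) ** 2 for r, _ in points)
        return num / den

    pairs = list(zip(reward_sizes, responses))
    alpha_plus = slope([(r, y) for r, y in pairs if r > reversal])   # better-than-expected side
    alpha_minus = slope([(r, y) for r, y in pairs if r < reversal])  # worse-than-expected side
    return alpha_plus / (alpha_plus + alpha_minus)
```

With responses generated from `alpha_plus = 2` and `alpha_minus = 0.5` around a reversal point of 3, `asymmetry([1, 2, 4, 6], [-1.0, -0.5, 2.0, 6.0], 3)` recovers 0.8; a cell that responds equally steeply on both sides scores 0.5.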


Above: Complex 3D shapes emerge from a string of amino acids.

Image Credit: DeepMind

In a final experiment, the researchers attempted to decode the reward distribution from the firing rates of the dopamine cells. They report success: by performing inference, they reconstructed a distribution that matched the actual distribution of rewards in the task the mice were performing.

“As the work examines ideas that originated within AI, it’s tempting to focus on the flow of ideas from AI to neuroscience. However, we think the results are equally important for AI,” said DeepMind director of neuroscience research Matt Botvinick. “When we’re able to demonstrate that the brain employs algorithms like those we are using in our AI work, it bolsters our confidence that those algorithms will be useful in the long run — that they will scale well to complex real-world problems and interface well with other computational processes. There’s a kind of validation involved: If the brain is doing it, it’s probably a good idea.”

Protein folding

The second of the two papers details DeepMind’s work in the area of protein folding, which began over two years ago. As the researchers note, the ability to predict a protein’s shape is fundamental to understanding how it performs its function in the body. This has implications beyond health and could help with a number of social challenges, like managing pollutants and breaking down waste.

The recipes for proteins (large molecules made of amino acids that are the fundamental building blocks of tissues, muscles, hair, enzymes, antibodies, and other essential parts of living organisms) are encoded in DNA. These genetic definitions dictate each protein’s three-dimensional structure, which in turn determines its capabilities. Antibody proteins are shaped like a “Y,” for example, enabling them to latch onto viruses and bacteria, while collagen proteins are shaped like cords, which transmit tension between cartilage, bones, skin, and ligaments.


But the shape a protein folds into, a process that takes milliseconds, is notoriously difficult to determine from the corresponding genetic sequence alone. DNA contains information only about chains of amino acid residues, not about those chains’ final form. In fact, scientists estimate that because of the astronomical number of possible interactions between amino acids, it would take longer than 13.8 billion years (roughly the age of the universe) to enumerate all the possible configurations of a typical protein before identifying the right structure, an observation known as Levinthal’s paradox.
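
The scale of that search is easy to check with back-of-the-envelope arithmetic. The sketch below assumes illustrative numbers often used to explain Levinthal’s paradox (a 100-residue chain, three possible conformations per residue, and 10^13 conformations sampled per second); none of these values come from the article, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9


def brute_force_years(residues, states_per_residue=3, samples_per_second=1e13):
    """Years needed to try every backbone conformation one at a time.

    Assumes each residue independently adopts one of `states_per_residue`
    conformations and that conformations are sampled at a fixed rate.
    """
    conformations = states_per_residue ** residues  # exact integer count
    return conformations / samples_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = brute_force_years(100)
    print(f"{years:.2e} years, {years / AGE_OF_UNIVERSE_YEARS:.1e} times the age of the universe")
```

Under those assumptions, `brute_force_years(100)` comes out on the order of 10^27 years, vastly longer than the age of the universe, while real proteins fold in milliseconds.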

Instead of relying on experimental methods for determining protein structure, such as X-ray crystallography, nuclear magnetic resonance, and cryo-electron microscopy, the DeepMind team pioneered a machine learning system dubbed AlphaFold. It predicts the distance between every pair of amino acids and the twisting (torsion) angles of the chemical bonds that connect them, and combines these predictions into a score. A separate optimization step then refines the proposed structure through gradient descent (a mathematical method of iteratively improving the structure to better match the predictions), using all distances in aggregate to estimate how close the proposed structure is to the right answer.
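
The gradient descent step can be illustrated with a much simpler problem: moving a set of points in 3D until their pairwise distances match a table of “predicted” distances. This is only an analogy under stated assumptions. AlphaFold’s actual score is built from predicted distance distributions and torsion angles, whereas this pure-Python sketch moves hypothetical points directly and uses a plain squared-error loss; all names are illustrative.

```python
import math
import random


def pairwise_distances(coords):
    """Euclidean distance between every pair of 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_loss(coords, target):
    """Sum over pairs i < j of (current distance - predicted distance)^2."""
    d = pairwise_distances(coords)
    n = len(coords)
    return sum((d[i][j] - target[i][j]) ** 2 for i in range(n) for j in range(i + 1, n))


def refine(coords, target, steps=500, lr=0.01):
    """Plain gradient descent on the coordinates to reduce distance_loss."""
    x = [list(p) for p in coords]  # copy so the caller's structure is untouched
    n = len(x)
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                dij = math.dist(x[i], x[j])
                if i == j or dij == 0.0:
                    continue  # no self-term; gradient undefined for coincident points
                coeff = 2.0 * (dij - target[i][j]) / dij
                for k in range(3):
                    grad[i][k] += coeff * (x[i][k] - x[j][k])
        for i in range(n):
            for k in range(3):
                x[i][k] -= lr * grad[i][k]
    return x


if __name__ == "__main__":
    rng = random.Random(1)
    true_coords = [[rng.gauss(0, 3) for _ in range(3)] for _ in range(6)]  # hypothetical "native" structure
    target = pairwise_distances(true_coords)  # stand-in for predicted distances
    start = [[rng.gauss(0, 3) for _ in range(3)] for _ in range(6)]
    print(distance_loss(start, target), "->", distance_loss(refine(start, target), target))
```

Gradient descent only guarantees local improvement, so different starting structures can end in different local minima. Distances also cannot distinguish a structure from its mirror image, so a fitted structure may be the reflection of the true one.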

The most successful protein folding prediction approaches thus far have leveraged what’s known as fragment assembly, where a structure is created through a sampling process that minimizes a statistical potential derived from structures in the Protein Data Bank. (As its name implies, the Protein Data Bank is a freely accessible repository of information about the 3D structures of proteins, nucleic acids, and other complex assemblies.) In fragment assembly, a structure hypothesis is modified repeatedly, typically by changing the shape of a short section while retaining changes that lower the potential, ultimately leading to low-potential structures.
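
The fragment-replacement loop can be sketched in a few lines of Python. This toy version assumes a structure is just a list of angles, draws each replacement “fragment” at random instead of from a library of Protein Data Bank fragments, and keeps a change only if it lowers a caller-supplied potential. Production fragment-assembly methods typically use more elaborate acceptance rules, such as Monte Carlo sampling, and the function and potential here are hypothetical.

```python
import math
import random


def fragment_assembly(potential, n_angles, fragment_len=3, iterations=5000, seed=0):
    """Greedy fragment-replacement search; returns (structure, start_energy, final_energy)."""
    rng = random.Random(seed)
    structure = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
    start_energy = best = potential(structure)
    for _ in range(iterations):
        # Pick a short window and swap in a new "fragment" of random angles.
        start = rng.randrange(n_angles - fragment_len + 1)
        candidate = structure[:]
        candidate[start:start + fragment_len] = [
            rng.uniform(-math.pi, math.pi) for _ in range(fragment_len)
        ]
        energy = potential(candidate)
        if energy < best:  # keep only changes that lower the potential
            structure, best = candidate, energy
    return structure, start_energy, best


if __name__ == "__main__":
    targets = [0.5, -1.0, 2.0, 0.0, 1.5, -2.5]  # hypothetical minimum-energy angles

    def toy_potential(angles):
        return sum(1 - math.cos(a - t) for a, t in zip(angles, targets))

    _, e0, e1 = fragment_assembly(toy_potential, len(targets))
    print(f"potential {e0:.3f} -> {e1:.3f}")
```

Because only improvements are kept, the potential can never increase, but a purely greedy search like this one can stall in a local minimum, which is one reason real methods accept some uphill moves.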

With AlphaFold, DeepMind’s research team focused on the problem of modeling target shapes from scratch without drawing on solved proteins as templates. Using the aforementioned scoring functions, they searched the protein landscape to find structures that matched their predictions and replaced pieces of the protein structure with new protein fragments. They also trained a generative system to invent new fragments, which they used along with gradient descent optimization to improve the score of the structure.

The models were trained on structures extracted from the Protein Data Bank: 31,247 domains, split into training and test sets of 29,427 and 1,820 domains, respectively. (The results in the paper reflect a test subset containing 377 domains.) Training was split across eight graphics cards, and it took about five days to complete 600,000 steps.
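
The reported figures are internally consistent: the training and test sets sum to the total number of domains, and, assuming the roughly five days were spent entirely on those 600,000 steps, the implied throughput is about 1.4 training steps per second across the eight cards:

```latex
29{,}427 + 1{,}820 = 31{,}247
\qquad
\frac{600{,}000\ \text{steps}}{5 \times 86{,}400\ \text{s}} \approx 1.4\ \text{steps/s}
```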

The fully trained networks predicted the distance between every pair of amino acids from the genetic sequence they took as input. A sequence of 900 amino acids translated to about 400,000 predictions.
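
The “about 400,000” figure follows from counting unordered pairs of residues. A sequence of n amino acids has n(n-1)/2 distinct pairs, so for a 900-residue sequence:

```latex
\binom{n}{2} = \frac{n(n-1)}{2},
\qquad
\binom{900}{2} = \frac{900 \times 899}{2} = 404{,}550
```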


Above: The top row shows the distance matrices for three proteins, where the brightness of each pixel represents the distance between the corresponding pair of amino acids in the protein’s sequence. The bottom row shows the average of AlphaFold’s predicted distances.

Image Credit: DeepMind

AlphaFold participated in the December 2018 Critical Assessment of protein Structure Prediction competition (CASP13), which has been held every two years since 1994 and gives groups an opportunity to test and validate their protein structure prediction methods. Predictions are assessed on protein structures that have been solved experimentally but not yet published, which shows whether methods generalize to new proteins.

AlphaFold won CASP13 by predicting the most accurate structure for 24 out of 43 proteins. DeepMind contributed five submissions chosen from eight structures produced by three different variations of the system, all of which used potentials based on the AI model’s distance predictions, and some of which drew on structures generated by the gradient descent system. DeepMind reports that AlphaFold performed particularly well in the free modeling category, which requires building models where no similar template exists. It achieved a summed z-score (a measure of how a system performs relative to the average of all entrants) of 52.8 in this category, ahead of 36.6 for the next-best model.
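
A summed z-score can be computed by standardizing each group’s score on every target against all groups’ scores for that target, then adding up each group’s z-scores. The sketch below shows only that basic calculation with hypothetical inputs and a hypothetical function name; CASP assessors may apply additional conventions (for example, in how outliers and negative z-scores are handled) that it omits.

```python
from statistics import mean, pstdev


def summed_z_scores(scores_by_target):
    """scores_by_target maps target -> {group: score}; returns {group: summed z-score}."""
    totals = {}
    for scores in scores_by_target.values():
        values = list(scores.values())
        mu, sigma = mean(values), pstdev(values)
        for group, score in scores.items():
            # z = how many standard deviations this group sits above the field on this target
            z = (score - mu) / sigma if sigma > 0 else 0.0
            totals[group] = totals.get(group, 0.0) + z
    return totals
```

Because the z-scores for each target sum to zero across groups, a large positive total means a group beat the field consistently across many targets.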

“The 3D structure of a protein is probably the single most useful piece of information scientists can obtain to help understand what the protein does and how it works in cells,” wrote head of the UCL bioinformatics group David Jones, who advised the DeepMind team on parts of the project. “Experimental techniques to determine protein structures are time-consuming and expensive, so there’s a huge demand for better computer algorithms to calculate the structures of proteins directly from the gene sequences which encode them, and DeepMind’s work on applying AI to this long-standing problem in molecular biology is a definite advance. One eventual goal will be to determine accurate structures for every human protein, which could ultimately lead to new discoveries in molecular medicine.”
