Deep reinforcement learning (DRL) is an exciting area of AI research, with potential applicability to a variety of problem areas. Some see DRL as a path to artificial general intelligence, or AGI, because of how it mirrors human learning by exploring and receiving feedback from environments. Recent successes of DRL agents besting human video game players, the well-publicized defeat of a Go grandmaster at the hands of DeepMind’s AlphaGo, and demonstrations of bipedal agents learning to walk in simulation have all contributed to the general sense of enthusiasm about the field.

Unlike supervised machine learning, which trains models on known-correct answers, reinforcement learning trains a model by having an agent interact with an environment. When the agent’s actions produce desired results, it gets positive feedback. For example, the agent gets a reward for scoring a point or winning a game. Put simply, researchers reinforce the agent’s good behaviors.
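To make that feedback loop concrete, here is a minimal sketch in Python. It is not taken from any of the systems mentioned in this article: the two-armed "slot machine" environment, its payout probabilities, and the function names `pull` and `train` are all assumptions chosen for illustration. The agent tries actions, receives rewards, and gradually shifts toward the action that pays off more often.

```python
import random


def pull(arm, rng):
    """Hypothetical environment: arm 1 pays off more often than arm 0."""
    win_probability = [0.2, 0.8][arm]
    return 1.0 if rng.random() < win_probability else 0.0


def train(steps=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    values = [0.0, 0.0]  # the agent's running estimate of each action's reward
    counts = [0, 0]      # how many times each action has been tried
    for _ in range(steps):
        # Explore occasionally; otherwise exploit the best-looking action.
        if rng.random() < epsilon:
            arm = rng.randrange(2)
        else:
            arm = 0 if values[0] >= values[1] else 1
        reward = pull(arm, rng)
        counts[arm] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts


values, counts = train()
print("estimated rewards:", values)
print("times each action was chosen:", counts)
```

Nobody tells the agent which arm is "correct." It only sees rewards, and the better-paying action ends up being chosen far more often.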


One of the key challenges in applying DRL to non-trivial problems is constructing a reward function that encourages desired behaviors without undesirable side effects. Get this wrong and all kinds of bad things can happen, including cheating behaviors. (Think of rewarding a robot maid on some visual measure of room cleanliness, only to teach the bot to sweep dirt under the furniture.)
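The robot maid scenario can be sketched in a few lines of Python. Everything here is hypothetical: the `Room` fields, the `clean` and `hide` actions, and both scoring functions are assumptions invented to illustrate the mismatch between what gets rewarded and what was actually wanted.

```python
from dataclasses import dataclass


@dataclass
class Room:
    visible_dirt: int  # dirt a camera can see
    hidden_dirt: int   # dirt swept under the furniture


def clean(room):
    """Actually remove one unit of visible dirt."""
    removed = min(1, room.visible_dirt)
    return Room(room.visible_dirt - removed, room.hidden_dirt)


def hide(room):
    """Sweep all visible dirt under the furniture in one move."""
    return Room(0, room.hidden_dirt + room.visible_dirt)


def visual_reward(room):
    """Naive reward: only what the camera sees counts."""
    return -room.visible_dirt


def true_cleanliness(room):
    """What the designer actually wanted: no dirt anywhere."""
    return -(room.visible_dirt + room.hidden_dirt)


start = Room(visible_dirt=5, hidden_dirt=0)
after_clean = clean(start)
after_hide = hide(start)

print("visual reward  clean:", visual_reward(after_clean), " hide:", visual_reward(after_hide))
print("true objective clean:", true_cleanliness(after_clean), " hide:", true_cleanliness(after_hide))
```

Under the visual reward, hiding the dirt scores better than cleaning it, even though the room is no cleaner. An agent optimizing that reward has every incentive to cheat.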

It’s worth noting that while deep reinforcement learning (“deep” referring to the fact that the underlying model is a deep neural network) is still a relatively new field, reinforcement learning has been around since the 1970s or earlier, depending on how you count. As Andrej Karpathy points out in his 2016 blog post, pivotal DRL work such as the AlphaGo paper and the Atari Deep Q-Learning paper is based on algorithms that have been around for a while, with deep learning swapped in for other methods of function approximation. That use of deep learning is, of course, enabled by the explosion in inexpensive compute power we’ve seen over the past 20+ years.
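For a sense of what one of those long-standing algorithms looks like, here is a minimal sketch of tabular Q-learning in Python. The five-cell corridor environment, the hyperparameters, and the function names are assumptions made for illustration, not anything from the papers mentioned above. The "function approximator" here is just a lookup table, `q`. Roughly speaking, deep Q-learning keeps the same update rule but replaces that table with a deep neural network.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # action 0 moves left, action 1 moves right


def step(state, action_index):
    """Hypothetical environment dynamics: walls at both ends, reward at the goal."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Behave randomly; Q-learning is off-policy, so it can still
            # learn the value of acting greedily.
            action = rng.randrange(2)
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = q_learning()
greedy_policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print("greedy action per non-goal cell (1 = right):", greedy_policy)
```

A table works for five states. It does not work for the raw pixels of an Atari screen, which is where a learned approximator comes in.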

The promise of DRL, along with Google’s 2014 acquisition of DeepMind for $500 million, has led to a number of startups hoping to capitalize on this technology. I’ve interviewed Bonsai founder Mark Hammond for the This Week in Machine Learning & AI podcast (disclosure: Bonsai is a client of mine). That company offers a development platform for applying deep reinforcement learning to a variety of industrial use cases. I spoke with University of California at Berkeley’s Pieter Abbeel on the topic as well. He’s since founded Embodied Intelligence, a still-stealthy startup looking to apply VR and DRL to robotics.

Osaro, backed by Jerry Yang, Peter Thiel, Sean Parker, and other boldface-named investors, is also looking to apply DRL in the industrial space. Meanwhile, another startup is seeking to best traditional hedge funds by applying DRL to algorithmic trading, and DeepVu is addressing the challenge of managing complex enterprise supply chains.

As a result of increased interest in DRL, we’ve also seen the creation of new open source toolkits and environments for training DRL agents. Most of these frameworks are essentially special-purpose simulation tools or interfaces thereto. Here are some of the ones I’m tracking.

OpenAI Gym

OpenAI Gym is a popular toolkit for developing and comparing reinforcement learning models. Its simulator interface supports a variety of environments, including classic Atari games as well as robotics and physics simulators like MuJoCo and the DARPA-funded Gazebo. Like other DRL toolkits, it offers APIs to feed observations and rewards back to agents.
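The sketch below illustrates the general shape of that kind of API without importing Gym itself. It assumes the classic convention in which `reset()` returns an initial observation and `step(action)` returns an observation, a reward, a done flag, and an info dictionary. Newer releases may differ in the details, so check the documentation for the version you use. The `CounterEnv` class and `run_episode` helper are hypothetical names invented for this example.

```python
class CounterEnv:
    """Hypothetical toy environment exposing a Gym-style reset/step interface."""

    def __init__(self, target=3, max_steps=20):
        self.target = target
        self.max_steps = max_steps
        self.position = 0
        self.steps_taken = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.position = 0
        self.steps_taken = 0
        return self.position

    def step(self, action):
        """Apply an action; return (observation, reward, done, info)."""
        # Action 1 increments the counter; any other action decrements it.
        self.position += 1 if action == 1 else -1
        self.steps_taken += 1
        reached = self.position == self.target
        reward = 0.0 if reached else -1.0  # each wasted step costs a point
        done = reached or self.steps_taken >= self.max_steps
        return self.position, reward, done, {}


def run_episode(env, policy):
    """Drive one episode, feeding observations to the policy and summing rewards."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total_reward += reward
    return total_reward, env.steps_taken


total_reward, steps = run_episode(CounterEnv(), lambda observation: 1)
print("total reward:", total_reward, "steps:", steps)
```

Because the agent only ever sees `reset` and `step`, the same training code can in principle be pointed at an Atari game or a physics simulator that exposes the same interface.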

DeepMind Lab

DeepMind Lab is a 3D learning environment based on the Quake III first-person shooter video game, offering up navigation and puzzle-solving tasks for learning agents. DeepMind recently added DMLab-30, a collection of new levels, and introduced its new IMPALA distributed agent training architecture.


Psychlab

Another DeepMind toolkit, open-sourced earlier this year, Psychlab extends DeepMind Lab to support cognitive psychology experiments like searching an array of items for a specific target or detecting changes in an array of items. Researchers can then compare the performance of human and AI agents on these tasks.


House3D

A collaboration between UC Berkeley and Facebook AI researchers, House3D offers over 45,000 simulated indoor scenes with realistic room and furniture layouts. The primary task covered in the paper that introduced House3D was “concept-driven navigation,” such as training an agent to navigate to a room in a house given only a high-level descriptor like “dining room.”

Unity Machine Learning Agents

Under the stewardship of vice president of AI and ML Danny Lange, game engine developer Unity is making an effort to incorporate cutting-edge AI technology into its platform. Unity Machine Learning Agents, released in September 2017, is an open source Unity plugin that enables games and simulations running on the platform to serve as environments for training intelligent agents.


Ray

While the other tools listed here focus on DRL training environments, Ray is more about the infrastructure of DRL at scale. Developed by Ion Stoica and his team at the Berkeley RISELab, Ray is a framework for efficiently running Python code on clusters and large multi-core machines, specifically targeted at providing a low-latency distributed execution framework for reinforcement learning.
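To give a feel for the kind of workload involved, here is a small sketch of the fan-out-and-gather pattern that DRL training leans on: run many independent simulated episodes at once, then collect the results. This is not Ray's API. It uses only Python's standard-library `concurrent.futures` as a stand-in, and the `rollout` function with its random "rewards" is a hypothetical placeholder for a real simulation.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def rollout(seed, horizon=50):
    """Hypothetical rollout: sum of random rewards over one simulated episode."""
    rng = random.Random(seed)  # each rollout gets its own reproducible generator
    return sum(rng.random() for _ in range(horizon))


def parallel_rollouts(seeds, workers=4):
    """Fan the rollouts out to a pool of workers and gather the returns in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(rollout, seeds))


seeds = list(range(8))
returns = parallel_rollouts(seeds)
print("episode returns:", [round(r, 2) for r in returns])
```

Threads on a single machine only go so far, particularly for CPU-bound Python code. Spreading this same pattern across many cores and many machines with low overhead is the problem a framework like Ray sets out to address.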

The advent of all these tools and platforms will make DRL more accessible to developers and researchers. They’ll need all the help they can get, though, because deep reinforcement learning can be challenging to put into practice. A recent critique by Google engineer Alex Irpan, in his provocatively titled article “Deep Reinforcement Learning Doesn’t Work Yet,” explains why. Irpan cited the large amount of data required by DRL, the fact that most approaches to DRL don’t take advantage of prior knowledge about the systems and environments involved, and the aforementioned difficulty in coming up with an effective reward function, among other issues.

I expect deep reinforcement learning to continue to be a hot topic in the AI field, from both the research and applied perspectives, for some time. It has shown great promise at handling complex, multifaceted, and sequential decision-making problems, which makes it useful not just for industrial systems and gaming, but in fields as varied as marketing, advertising, finance, education, and even data science itself.

This story originally appeared in the This Week in Machine Learning & AI newsletter. Copyright 2018.

Sam Charrington is host of the podcast This Week in Machine Learning & AI (TWiML & AI) and founder of CloudPulse Strategies.