Reinforcement learning, an artificial intelligence (AI) technique that uses rewards (or punishments) to drive agents toward specific goals, trained the systems that defeated Go world champions and mastered Valve’s Dota 2. It’s also a core part of Google subsidiary DeepMind’s deep Q-network (DQN), which can distribute learning across multiple workers in pursuit of goals such as “superhuman” performance in Atari 2600 games. The trouble is, reinforcement learning agents can take a long time to master a goal, and the frameworks used to build them tend to be inflexible and aren’t always stable.
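To make the reward-driven idea concrete, here is a minimal sketch of tabular Q-learning, the update rule that DQN scales up by replacing the lookup table with a neural network. The corridor environment, the function names (`step`, `train`, `greedy_policy`), and the hyperparameters are hypothetical and chosen only for illustration; this is not code from Google’s framework.

```python
import random

# Hypothetical toy environment: a 1-D corridor of N_STATES cells.
# The agent starts in cell 0; reaching the last cell gives reward +1 and ends the episode.
N_STATES = 6
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, 0.0, False


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn Q-values with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    # q[s][i] estimates the discounted return of taking ACTIONS[i] in state s.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                # Break ties randomly so early episodes explore both directions.
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Bellman target: the reward plus the discounted best value of the
            # next state (no bootstrapping past a terminal state).
            target = reward if done else reward + gamma * max(q[next_state])
            # Nudge the estimate toward the target by step size alpha.
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Return the highest-valued action for every non-terminal state."""
    return [
        ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
        for s in range(N_STATES - 1)
    ]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

Only the reward signal tells the agent where the goal is: the value of reaching the last cell propagates backward through the table one update at a time, which also illustrates why reinforcement learning can need many episodes to master even a simple task.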

That’s why Google is proposing an alternative: an open source reinforcement learning framework based on TensorFlow, its machine learning library. It’s available on GitHub starting today.

“Inspired by one of the main components in reward-motivated behavior in the brain and reflecting the strong historical connection between neuroscience and reinforcement learning research, this platform aims to enable the kind of speculative research that can drive radical discoveries,” Pablo Samuel Castro and Marc G. Bellemare, researchers on the Google Brain Team, wrote in a blog post. “This release also includes a set of colabs that clarify how to use our framework.”

Castro, Bellemare, and their Google Brain colleagues designed the framework around three tenets: flexibility, stability, and reproducibility.

[Image: A visualization of AI agents trained using reinforcement learning. Credit: Google]

To that end, the framework includes a compact, well-documented codebase (15 Python files) focused on the Arcade Learning Environment, a platform for evaluating AI with video games, along with four agents: the aforementioned DQN, C51, a simplified variant of the Rainbow agent, and the Implicit Quantile Network. In the interest of reproducibility, the code comes with full test coverage and training data (in JSON and Python pickle formats) for the 60 games supported by the Arcade Learning Environment, and it follows best practices for standardizing results in empirical evaluations.

Alongside the release of the reinforcement framework, Google is launching a website that allows developers to quickly visualize training runs for multiple agents. It’s also making available trained models, raw statistics logs, and TensorFlow event files for plotting with TensorBoard, the Mountain View company’s suite of visualization tools for TensorFlow programs.

“Our hope is that our framework’s flexibility and ease-of-use will empower researchers to try out new ideas, both incremental and radical,” Bellemare and Castro wrote. “We are already actively using it for our research and finding it is giving us the flexibility to iterate quickly over many ideas. We’re excited to see what the larger community can make of it.”