Reinforcement learning (RL) — an artificial intelligence (AI) training technique that uses rewards or punishments to drive agents toward goals — has a problem: It doesn’t result in highly generalizable models. Trained agents struggle to transfer their experience to new environments. It’s a well-understood limitation of RL, but one that hasn’t prevented data scientists from benchmarking their systems within the environments on which they were trained. That makes overfitting — a modeling error that occurs when a function is too closely fit to a dataset — challenging to quantify.
Nonprofit AI research company OpenAI is taking a stab at the problem with an AI training environment — CoinRun — that provides a metric for an agent’s ability to transfer its experience to unfamiliar scenarios. It’s basically like a classic platformer, complete with enemies, objectives, and stages of varying difficulty,
It follows on the heels of the launch of OpenAI’s Spinning Up, a program designed to teach anyone deep reinforcement learning.
“CoinRun strikes a desirable balance in complexity: the environment is much simpler than traditional platformer games like Sonic the Hedgehog, but it still poses a worthy generalization challenge for state of the art algorithms,” OpenAI wrote in a blog post. “The levels of CoinRun are procedurally generated, providing agents access to a large and easily quantifiable supply of training data.”
An exclusive invite-only evening of insights and networking, designed for senior enterprise executives overseeing data stacks and strategies.
As OpenAI explains, prior work in reinforcement learning environments has focused on procedurally generated mazes, community projects like the General Video Game AI framework, and games like Sonic the Hedgehog, with generalization measured by training and testing agents on different sets of levels. CoinRun, by contrast, offers agents a single reward at the end of each level.
AI agents have to contend with stationary and moving obstacles, collision with which results in immediate death. It’s game over when the aforementioned coin is collected, or after 1,000 time steps.
As if that weren’t enough, OpenAI developed two additional environments to investigate overfitting: CoinRun-Platforms and RandomMazes. The first contains several coins randomly scattered across platforms, forcing agents to actively explore levels and occasionally do some backtracking. RandomMazes, meanwhile, is a simple maze navigation task.
To validate CoinRun, CoinRun-Platforms, and RandomMazes, OpenAI trained 9 agents, each with a different number of training levels. The first 8 trained on sets ranging from 100 to 16,000 levels, and the final agent trained on an unrestricted set of levels — roughly 2 million in practice — so that it never saw the same one twice.
The agents experienced overfitting at 4,000 training levels, and even at 16,000 training levels; the best-performing agents turned out to be the ones trained with the unrestricted set of levels. And in CoinRun-Platforms and RandomMazes, the agents strongly overfit in all cases.
The results provide valuable insight into the challenges underlying generalization in reinforcement learning, OpenAI said.
“Using the procedurally generated CoinRun environment, we can precisely quantify such overﬁtting,” the company wrote. “With this metric, we can better evaluate key architectural and algorithmic decisions. We believe that the lessons learned from this environment will apply in more complex settings, and we hope to use this benchmark, and others like it, to iterate towards more generalizable agents.”
VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.