How AI trained to beat Atari games could impact robotics and drug design

In 2018, Uber AI Labs introduced Go-Explore, a family of algorithms that beat the Atari game Montezuma's Revenge, a commonly accepted reinforcement learning challenge. Last year, Go-Explore was used to beat text-based games.

Now researchers from OpenAI and Uber AI Labs say Go-Explore has solved all previously unsolved games in the Atari 2600 benchmark from the Arcade Learning Environment, a collection of more than 50 games, including Pitfall and Pong. Go-Explore also quadruples the state-of-the-art score on Montezuma's Revenge.

Training agents to navigate complex environments has long been considered a challenge for reinforcement learning. Success in this area has produced some major machine learning milestones, such as DeepMind's AlphaGo and OpenAI's Dota 2 bots beating human champions.

Researchers envision recent Go-Explore advances being applied not only to language models but also to drug design and to robots trained to navigate the world safely. In simulations, a robotic arm successfully picked up an object and put it on one of four shelves, two of which are behind doors with latches. The ability to complete this task, they say, proves the policy approach is not simply leveraging the ability to restore a previously held state in a reinforcement learning environment, but is a "function of its overall design."

"The insights presented in this work extend broadly; the simple decomposition of remembering previously found states, returning to them, and then exploring from them appears to be especially powerful, suggesting it may be a fundamental feature of learning in general. Harnessing these insights, either within or outside of the context of Go-Explore, may be essential to improve our ability to create generally intelligent agents," reads a paper on the research published last week in Nature.

Researchers theorize that exploration is hard in part because agents in reinforcement learning environments forget how to reach places they have previously visited (known as detachment) and generally fail to return to a state before exploring from it (known as derailment).

"To avoid detachment, Go-Explore builds an 'archive' of the different states it has visited in the environment, thus ensuring that states cannot be forgotten. Starting from an archive containing only the initial state, it builds this archive iteratively," the paper reads. "By first returning before exploring, Go-Explore avoids derailment by minimizing exploration when returning (thus minimizing failure to return) after which it can focus purely on exploration."
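The loop the paper describes (remember states in an archive, return to one, then explore from it) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `ChainEnv` corridor, its `restore` method, the `go_explore` function, the visit-count weighting used to pick a cell, and the rule of keeping the shortest known route to each cell are all hypothetical choices made for this example. Returning is done here by restoring a saved simulator state, which only works in environments that allow it.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: a corridor of positions 0..length."""

    def __init__(self, length=10):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def restore(self, state):
        # Assumption: the simulator lets us jump straight back to a saved state.
        self.pos = state
        return self.pos

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the corridor.
        self.pos = max(0, min(self.length, self.pos + action))
        return self.pos


def go_explore(env, iterations=500, explore_steps=10, seed=0):
    rng = random.Random(seed)
    start = env.reset()
    # The archive starts with only the initial state and grows iteratively.
    # Each cell stores a restorable state, the actions that reach it, and a visit count.
    archive = {start: {"state": start, "actions": [], "visits": 0}}

    for _ in range(iterations):
        # 1. Select a cell from the archive, favoring rarely chosen ones
        #    (this weighting is an assumption of the sketch).
        cells = list(archive)
        weights = [1.0 / (archive[c]["visits"] + 1) ** 0.5 for c in cells]
        cell = rng.choices(cells, weights=weights, k=1)[0]
        entry = archive[cell]
        entry["visits"] += 1

        # 2. Return to that cell without exploring on the way (avoids derailment).
        env.restore(entry["state"])
        actions = list(entry["actions"])

        # 3. Explore from it with random actions, archiving what is found
        #    so that it cannot be forgotten (avoids detachment).
        for _ in range(explore_steps):
            action = rng.choice((-1, 1))
            state = env.step(action)
            actions.append(action)
            known = archive.get(state)
            if known is None or len(actions) < len(known["actions"]):
                archive[state] = {
                    "state": state,
                    "actions": list(actions),
                    "visits": known["visits"] if known else 0,
                }
    return archive


if __name__ == "__main__":
    env = ChainEnv(length=10)
    found = go_explore(env)
    print(f"cells discovered: {sorted(found)}")
```

Because every archived cell carries the action sequence that reaches it, any discovered state can be revisited later by replaying those actions from a reset, even in a simulator that does not support restoring state.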

Jeff Clune, who cofounded Uber AI Labs in 2017 before moving to OpenAI last year, told VentureBeat in an interview that same year that catastrophic forgetting is the Achilles' heel of deep learning. Solving this problem, he said at the time, could offer humans a faster path to artificial general intelligence (AGI).

In other recent news, OpenAI shared more details about its multimodal model CLIP this week, and the AI Index, compiled in part by former OpenAI policy director Jack Clark, was released on Wednesday. The annual index chronicles AI performance progress, as well as trends in startup investment, education, diversity, and policy.
