For decades, games have served as benchmarks for artificial intelligence (AI).
In 1996, IBM famously set loose Deep Blue on chess, and it became the first program to defeat a reigning world champion (Garry Kasparov) in a game under regular time controls. But things really kicked into gear in 2013, the year Google subsidiary DeepMind demonstrated an AI system that could play Pong, Breakout, Space Invaders, Seaquest, Beamrider, Enduro, and Q*bert at superhuman levels. In March 2016, DeepMind’s AlphaGo won a five-game match of Go against Lee Sedol, one of the highest-ranked players in the world, four games to one. And only a year later, an improved version of the system (AlphaZero) handily defeated champions at chess, a Japanese variant of chess called shogi, and Go.
These advances aren’t merely improving game design, according to folks like DeepMind cofounder Demis Hassabis. Rather, they’re informing the development of systems that might one day diagnose illnesses, predict complicated protein structures, and segment CT scans. “AlphaZero is a stepping stone for us all the way to general AI,” Hassabis told VentureBeat in a recent interview. “The reason we test ourselves and all these games is … that [they’re] a very convenient proving ground for us to develop our algorithms. … Ultimately, [we’re developing algorithms that can be] translate[ed] into the real world to work on really challenging problems … and help experts in those areas.”
With that in mind, and with 2019 fast approaching, we’ve taken a look back at some of 2018’s AI in games highlights. Here they are for your reading pleasure, in no particular order.
In Montezuma’s Revenge, a 1984 platformer from publisher Parker Brothers for the Atari 2600, Apple II, Commodore 64, and a host of other platforms, players assume the role of intrepid explorer Panama Joe as he spelunks across Aztec emperor Montezuma II’s labyrinthine temple. The stages, of which there are 99 across three levels, are filled with obstacles like laser gates, conveyor belts, ropes, ladders, disappearing floors, and fire pits — not to mention skulls, snakes, spiders, torches, and swords. The goal is to reach the Treasure Chamber and rack up points along the way by finding jewels, killing enemies, and revealing keys that open doors to hidden stages.
Montezuma’s Revenge has a reputation for being difficult (the first level alone consists of 24 rooms), and AI systems have long had a particularly tough go of it. DeepMind’s groundbreaking Deep Q-Network (DQN) of 2015, which surpassed human experts on Breakout, Enduro, and Pong, scored 0 percent of the average human score of 4,700 in Montezuma’s Revenge.
Researchers pin the blame on the game’s “sparse rewards.” Completing a stage requires learning complex tasks with infrequent feedback. As a result, even the best-trained AI agents tend to maximize rewards in the short term rather than work toward a big-picture goal, for example by hitting an enemy repeatedly instead of climbing a rope close to the exit. But some AI systems this year managed to avoid that trap.
In a paper published on the preprint server arXiv.org in May (“Playing hard exploration games by watching YouTube”), DeepMind described a machine learning model that could, in effect, learn to master Montezuma’s Revenge from YouTube videos. After “watching” clips of expert players, and using a method that mapped game state observations into a common embedding space, it completed the first level with a score of 41,000.
In a second paper published online the same month (“Observe and Look Further: Achieving Consistent Performance on Atari”), DeepMind scientists proposed improvements to the aforementioned Deep Q-Network that increased its stability and capability. Most importantly, they enabled the algorithm to account for reward signals of “varying densities and scales,” extending its agents’ effective planning horizon. Additionally, they used human demonstrations to augment agents’ exploration process.
In the end, the improved algorithm achieved a score of 38,000 on the game’s first level.
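To make the idea of handling rewards of “varying densities and scales” a little more concrete, here is a minimal Python sketch of one way to compress large values while leaving small ones nearly untouched. The function name `squash`, its exact form, and the constant `eps` are assumptions chosen for illustration; the text above does not spell out the formula the researchers used.

```python
import math

def squash(x, eps=0.01):
    """Compress large magnitudes while staying strictly increasing.

    Assumed form for illustration: sign(x) * (sqrt(|x| + 1) - 1) + eps * x.
    Small values pass through almost linearly; large values grow far more
    slowly, so a 10,000-point reward no longer dwarfs a 1-point reward.
    """
    return math.copysign(1.0, x) * (math.sqrt(abs(x) + 1.0) - 1.0) + eps * x

for reward in (1.0, 100.0, 10_000.0):
    print(reward, round(squash(reward), 3))
```

The appeal of a function like this over simply clipping rewards is that it keeps the ordering of rewards intact, so an agent can still tell a big payoff from a small one.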
In June, OpenAI, a nonprofit, San Francisco-based AI research company backed by Elon Musk, Reid Hoffman, and Peter Thiel, shared in a blog post a method for training a Montezuma’s Revenge-beating AI system. Its novelty lay in using human demonstrations to “restart” agents: AI player characters began near the end of a demonstration and, as they improved, restarted from progressively earlier points along the human player’s trajectory. This exposed them to parts of the game that humans had already cleared, and helped them achieve a score of 74,500.
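The restart schedule described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not OpenAI’s code: the “game” is a hypothetical one-dimensional corridor with a single reward at the far end, the demonstration is just the list of positions a human walked through, and `biased_walk` is a made-up, fixed stand-in for a policy that a real system would keep training at every stage.

```python
import random

def biased_walk(pos, rng):
    """Hypothetical stand-in for a trained policy: steps right 80% of the time."""
    return 1 if rng.random() < 0.8 else -1

def run_episode(start, goal, policy, max_steps, rng):
    """Play one episode from `start`; the only reward is for reaching `goal`."""
    pos = start
    for _ in range(max_steps):
        pos = max(0, pos + policy(pos, rng))
        if pos == goal:
            return 1  # sparse reward, paid only at the very end
    return 0

def backward_curriculum(demo, policy, episodes=200, threshold=0.5,
                        max_steps=40, seed=0):
    """Move the restart point backward along the demonstration, one step at a
    time, for as long as the agent keeps succeeding often enough."""
    rng = random.Random(seed)
    goal = demo[-1]
    history = []
    for start_idx in range(len(demo) - 2, -1, -1):
        wins = sum(run_episode(demo[start_idx], goal, policy, max_steps, rng)
                   for _ in range(episodes))
        rate = wins / episodes
        history.append((start_idx, rate))
        if rate < threshold:
            break  # a real system would train more here rather than stop
    return history

demo = list(range(11))  # positions visited by the (hypothetical) human player
for start_idx, rate in backward_curriculum(demo, biased_walk):
    print(f"restart at demo step {start_idx}: success rate {rate:.2f}")
```

Because the agent always starts close to a reward it already knows how to reach, each stage only asks it to solve a short stretch of new territory.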
In August, building on its previous work, OpenAI described in a paper (“Large-Scale Study of Curiosity-Driven Learning”) a model that could best most human players. The top-performing version found 22 of the 24 rooms in the first level, and occasionally discovered all 24.
What set it apart was a reinforcement learning technique called Random Network Distillation (RND), which used a bonus reward that incentivized agents to explore areas of the game map they normally wouldn’t have visited. RND also addressed another common issue in reinforcement learning schemes, the so-called noisy TV problem, in which an AI agent becomes stuck looking for patterns in random data.
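For readers who want a feel for how such a bonus can be computed, below is a minimal, pure-Python sketch of the idea as it is commonly described: a fixed, randomly initialized “target” network is never trained, a second “predictor” network learns to imitate it on the states the agent visits, and the prediction error serves as the bonus. Everything here is a simplifying assumption for illustration (single-layer networks, one-hot “room” observations, the class and method names), not OpenAI’s implementation.

```python
import math
import random

class RandomNetworkDistillation:
    def __init__(self, obs_dim, feat_dim=8, lr=0.1, seed=0):
        rng = random.Random(seed)
        # Fixed random target network: initialized once, never updated.
        self.target_w = [[rng.gauss(0, 1) for _ in range(feat_dim)]
                         for _ in range(obs_dim)]
        # Predictor network: trained to match the target's output.
        self.pred_w = [[0.0] * feat_dim for _ in range(obs_dim)]
        self.lr = lr

    @staticmethod
    def _project(obs, weights):
        return [sum(o * row[j] for o, row in zip(obs, weights))
                for j in range(len(weights[0]))]

    def _error(self, obs):
        target = [math.tanh(v) for v in self._project(obs, self.target_w)]
        pred = self._project(obs, self.pred_w)
        return [p - t for p, t in zip(pred, target)]

    def bonus(self, obs):
        """Exploration bonus: squared prediction error on this observation."""
        return sum(e * e for e in self._error(obs))

    def update(self, obs):
        """One gradient step pulling the predictor toward the target."""
        err = self._error(obs)
        for i, o in enumerate(obs):
            for j, e in enumerate(err):
                self.pred_w[i][j] -= self.lr * o * e

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

rnd = RandomNetworkDistillation(obs_dim=4)
familiar_room, novel_room = one_hot(0, 4), one_hot(1, 4)
bonus_before = rnd.bonus(familiar_room)
for _ in range(50):  # the agent keeps revisiting the same room
    rnd.update(familiar_room)
bonus_after = rnd.bonus(familiar_room)
print(bonus_before, bonus_after, rnd.bonus(novel_room))
```

The bonus for the familiar room shrinks toward zero while the unvisited room keeps its full bonus. Because the target is a deterministic function of the observation, there is nothing inherently unpredictable for the agent to chase, which is one intuition for why this setup sidesteps the noisy TV problem.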
“Curiosity drives the agent to discover new rooms and find ways of increasing the in-game score, and this extrinsic reward drives it to revisit those rooms later in the training,” OpenAI explained in a blog post. “Curiosity gives us an easier way to teach agents to interact with any environment, rather than via an extensively engineered task-specific reward function that we hope corresponds to solving a task.”
On average, OpenAI’s agents scored 10,000 over nine runs with a best mean return of 14,500. A longer-running test yielded a run that hit 17,500.
OpenAI and DeepMind aren’t the only ones that managed to craft skilled Montezuma’s Revenge-playing AI this year. In a paper and accompanying blog post published in late November, researchers at San Francisco ride-sharing company Uber unveiled Go-Explore, a family of so-called quality diversity AI models capable of posting scores of over 2,000,000 and average scores over 400,000. In testing, the models were able to “reliably” solve the entire game up to level 159 and reach an average of 37 rooms.
To reach those sky-high numbers, the researchers implemented an innovative training method consisting of two parts: exploration and robustification. In the exploration phase, Go-Explore built an archive of different game states, called cells, along with the trajectories that lead to them and the scores those trajectories achieve. It repeatedly chose a cell from the archive, returned to it, explored from it, and, for every cell it visited along the way, swapped in the new trajectory if it was better (i.e., the score was higher).
This “exploration” stage conferred several advantages. Thanks to the aforementioned archive, Go-Explore was able to remember and return to “promising” areas for exploration. By first returning to cells (by loading the game state) before exploring from them, it avoided over-exploring easily reached places. And because Go-Explore was able to visit all reachable states, it was less susceptible to deceptive reward functions.
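The choose, return, explore, and update loop lends itself to a short Python sketch. The version below is a toy under stated assumptions rather than Uber’s code: the environment is a hypothetical deterministic corridor whose state is a (position, furthest position reached) pair, a cell is simply the position, cells are chosen uniformly at random, and exploration means taking random actions.

```python
import random

ACTIONS = (-1, 1)

def step(state, action, length):
    """Deterministic toy corridor; the score is the furthest position reached."""
    pos, best = state
    pos = min(length - 1, max(0, pos + action))
    return (pos, max(best, pos))

def go_explore(length=30, iterations=300, explore_steps=5, seed=0):
    rng = random.Random(seed)
    start = (0, 0)
    # Archive: cell -> (score, saved state, action trajectory that reaches it)
    archive = {0: (0, start, [])}
    for _ in range(iterations):
        cell = rng.choice(sorted(archive))    # 1. choose a cell
        _, state, saved = archive[cell]       # 2. return by loading its state
        trajectory = list(saved)
        for _ in range(explore_steps):        # 3. explore from it
            action = rng.choice(ACTIONS)
            state = step(state, action, length)
            trajectory.append(action)
            new_cell, score = state
            known = archive.get(new_cell)
            # 4. keep a new cell, or a better route to a known one
            #    (higher score, or the same score in fewer steps)
            if (known is None or score > known[0]
                    or (score == known[0] and len(trajectory) < len(known[2]))):
                archive[new_cell] = (score, state, list(trajectory))
    return archive

archive = go_explore()
print(f"cells discovered: {len(archive)}, furthest cell: {max(archive)}")
```

Restoring a saved state in step 2 is what lets the search spend its effort at the frontier instead of re-learning how to get there on every attempt.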
The robustification step, meanwhile, acted as a shield against noise. If the solutions found during exploration were not robust to noise, Go-Explore used an imitation learning algorithm to distill them into a deep neural network that was.
“Go-Explore’s max score is substantially higher than the human world record of 1,219,200, achieving even the strictest definition of ‘superhuman performance,’” the team said. “This shatters the state of the art on Montezuma’s Revenge both for traditional RL algorithms and imitation learning algorithms that were given the solution in the form of a human demonstration.”