OpenAI Five — the AI-imbued bot designed by San Francisco, California-based research organization OpenAI that defeated a professional esports team at Valve’s Dota 2 earlier this month — became publicly playable last week via Arena, a mode that allowed players to challenge its game-playing abilities for themselves. So after a whopping 42,729 cooperative and competitive matches (688 of which were played against as many as 1,583 players simultaneously) between April 18 and April 21, how’d OpenAI Five perform? Impressively, claims OpenAI: It managed to win 4,075 games for a victory rate of 99.4%, which compared pretty favorably to the 24% win rate of human teams with at least 1 win.
In fact, it took 459 games for the first human player (out of the 39,356 total across 225 countries who played against OpenAI Five) to eke out a win — six hours and nine minutes into Arena’s launch.
“Arena was a massive-scale experiment to test whether OpenAI is exploitable, given the entire Internet trying to break it. The Dota community teamed up, cataloging every weakness. While Five has more to learn, no one was able to find the kinds of easy-to-execute exploits that human-programmed game bots suffer from,” said OpenAI CTO Greg Brockman. “This lets us have more confidence that future AI systems we deploy in the wild will be able to be made robust and hard to subvert. And perhaps even more importantly — we learned the value of having a community of people excited to pore over a system we’ve built in order to truly understand the limits and impacts of what we’ve built.”
OpenAI Five stood out in other ways. Only four human teams were victorious in two games in a row against it, a measly three teams won three in a row (one team won 10), and just 115 players with an average solo matchmaking rating of 6,500 (63% of them Immortal players with a rank of 500) beat the bot in competitive mode. Collectively, OpenAI Five played 10.7 years' worth of game time facing human opponents, roughly the amount of data it generates every 12 minutes of training by self-play. (Players spent an average of two and a half hours playing against it, and one person spent nearly 30 hours.) And it attracted quite an audience: 486,000 Twitch users viewed OpenAI Five Arena games, watching streams for an average of 7 minutes.
OpenAI says it plans to use the results to investigate to what extent OpenAI Five is making “macro-level decisions” versus relying on snap judgment and opportunistic plays.
“Arena was an immense source of anticipation and fear in the minds of many of us on the team. On one end of the spectrum we are absolutely sitting at the edge of our seat awaiting some incredible strategy to unfold, a missed blind spot pop up, or witness meaningful cooperation between AI and humans,” said OpenAI Five team researcher Jonathan Raiman. “At the other end, our focus on the team had been to reach the highest levels of play at Dota 2 and hold our own against professional players, so it was a massive shift from our competitive mindset to this world of investigation, external scrutiny, and robustness to Internet scale. I’m deeply thrilled and humbled by the public reaction. It’s a strong validation of what can be done with scaled up reinforcement learning, willingness to validate ideas in the real world, and a glimpse into what large scale AI deployments will be in the future.”
How OpenAI tackled Dota 2
Valve’s Dota 2 — a follow-up to Defense of the Ancients (DotA), a community-created mod for Blizzard’s Warcraft III: Reign of Chaos — is what’s known as a multiplayer online battle arena, or MOBA. Two groups of five players, each of which is given a base to occupy and defend, attempt to destroy a structure — the Ancient — at the opposing team’s base. Player characters (heroes) have a distinct set of abilities, and collect experience points and items that unlock new attacks and defensive moves.
It’s more complex than it sounds. The average match contains 80,000 individual frames, during which each character can perform dozens of actions chosen from roughly 170,000 possibilities. Heroes on the board finish an average of 10,000 moves each frame, contributing to the game’s more than 20,000 total dimensions. And each of those heroes, of which there are over 100, can pick up or purchase hundreds of in-game items.
OpenAI Five isn’t able to handle the full game yet — it can only play 18 out of the 115 different heroes, and it can’t use abilities like summons and illusions. And in a somewhat controversial design decision, OpenAI’s engineers opted not to have it read pixels from the game to retrieve information (like human players do). It uses Dota 2’s bot API instead, obviating the need for it to search the map to check where its team might be, check if a spell is ready, or estimate an enemy’s health or distance.
That said, it’s able to draft a team entirely on its own that takes into account the opposing side’s choices.
OpenAI has been chipping away at the Dota 2 dilemma for a while now, and demoed an early iteration of its MOBA-playing bot — one that beat one of the world’s top players, Danil “Dendi” Ishutin, in a 1-on-1 match — in August 2017. It kicked things up a notch in June with OpenAI Five, an improved system capable of playing five-on-five matches that managed to beat a team of OpenAI employees, a team of audience members, a Valve employee team, an amateur team, and a semi-pro team.
In early August, it won two out of three matches against a team ranked in the 99.95th percentile. During the first of the two matches, OpenAI Five started and finished strongly, preventing its human opponents from destroying any of its defensive towers. The second match was a tad less one-sided — the humans took out one of OpenAI Five’s towers — but the AI emerged victorious nonetheless. Only in the third match did the human players eke out a victory.
OpenAI Five consists of five single-layer, 4,096-unit long short-term memory (LSTM) networks — a type of recurrent neural network that can “remember” values over an arbitrary length of time — each assigned to a single hero. (That’s up from 1,024-unit LSTMs in previous versions.) The networks are trained using a deep reinforcement learning model that incentivizes their self-improvement with rewards. In OpenAI Five’s case, those rewards are kills, deaths, assists, last hits, net worth, and other stats that track progress in Dota 2.
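A reward of this shape can be sketched as a weighted sum of how much each tracked stat changes from frame to frame. OpenAI has not published its exact coefficients, so the stat names and weights below are illustrative assumptions, not the actual values:

```python
# Hypothetical shaped-reward sketch: a weighted sum of per-frame stat changes.
# The stat names mirror those mentioned in the article; the weights are
# illustrative assumptions, not OpenAI Five's actual coefficients.
REWARD_WEIGHTS = {
    "kills": 1.0,
    "deaths": -1.0,
    "assists": 0.5,
    "last_hits": 0.1,
    "net_worth": 0.002,
}

def shaped_reward(prev_stats, curr_stats):
    """Reward = weighted sum of how much each tracked stat changed this frame."""
    return sum(
        w * (curr_stats[k] - prev_stats[k]) for k, w in REWARD_WEIGHTS.items()
    )
```

The point of shaping like this is to give the agent a dense learning signal: instead of waiting for a win or loss at the end of a 40-minute match, every kill, last hit, or gold gain nudges the policy immediately.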
OpenAI’s training framework — Rapid — consists of two parts: a set of rollout workers that run a copy of Dota 2 and an LSTM network, and optimizer nodes that perform synchronous gradient descent (an essential step in machine learning) across a fleet of graphics cards. As the rollout workers gain experience, they inform the optimizer nodes, and another set of workers compare the trained LSTM networks (agents) to reference agents.
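The rollout-worker/optimizer split described above can be sketched with a toy model. Everything here is a stand-in (a one-parameter "policy" and a quadratic loss), chosen only to show the synchronous pattern: each worker computes a gradient over its own batch of experience, and the optimizer averages those gradients before taking a step.

```python
# Toy sketch of the Rapid-style split: rollout workers gather experience, and
# a synchronous optimizer averages their gradients before descending.
# The 1-parameter "policy" and quadratic loss are stand-ins, not OpenAI's model.

def rollout_worker(theta, batch):
    """Gradient of a toy loss (theta - target)^2, averaged over this
    worker's batch of experience."""
    return sum(2 * (theta - target) for target in batch) / len(batch)

def optimizer_step(theta, grads, lr=0.1):
    """Synchronous step: average gradients from all workers, then descend."""
    avg_grad = sum(grads) / len(grads)
    return theta - lr * avg_grad

theta = 0.0
worker_batches = [[1.0, 1.2], [0.8, 1.0]]  # each worker's experience
for _ in range(200):
    grads = [rollout_worker(theta, batch) for batch in worker_batches]
    theta = optimizer_step(theta, grads)
```

Because every step waits for all workers before updating, each worker always starts its next rollout from the same, freshest parameters; that synchrony is what distinguishes this design from asynchronous schemes.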
To self-improve, OpenAI Five plays 180 years’ worth of games every day — 80% against itself and 20% against past selves — on 256 Nvidia Tesla P100 graphics cards and 128,000 processor cores on Google’s Cloud Platform. Months ago, when OpenAI kicked off training, the AI-controlled Dota 2 heroes “walked aimlessly around the map.” But it wasn’t long before the AI mastered basics like lane defense and farming, and soon after nailed advanced strategies like rotating heroes around the map and stealing items from opponents.
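The 80/20 opponent mix above amounts to a simple sampling rule: most games pit the agent against its current self, while a minority draw from a pool of frozen past snapshots (which keeps the agent from overfitting to its latest strategy). The function below is an illustrative sketch, not OpenAI's actual scheduling code:

```python
import random

# Sketch of the self-play opponent mix the article describes: 80% of games
# against the current agent, 20% against snapshots of past selves.
# Names and structure are illustrative assumptions.

def pick_opponent(current, past_snapshots, rng, past_fraction=0.2):
    """Choose the current agent ~80% of the time, a past snapshot ~20%."""
    if past_snapshots and rng.random() < past_fraction:
        return rng.choice(past_snapshots)
    return current

rng = random.Random(0)
picks = [pick_opponent("current", ["v1", "v2", "v3"], rng) for _ in range(10000)]
past_share = sum(p != "current" for p in picks) / len(picks)
```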
“People used to think that this kind of thing was impossible using today’s deep learning,” Brockman told VentureBeat in an interview last year. “But it turns out that these networks [are] able to play at the professional level in terms of some of the strategies they discover … and really do some long-term planning. The shocking thing to me is that it’s using algorithms that are already here, that we already have, that people said were flawed in very specific ways.”
Fully trained OpenAI Five agents are surprisingly sophisticated. Despite being unable to communicate with each other (a “team spirit” hyperparameter value determines how much or how little each agent prioritizes individual rewards over the team’s reward), they’re masters of projectile avoidance and experience-point sharing, and even of advanced tactics like “creep blocking,” in which a hero physically blocks the path of a hostile creep (a basic unit in Dota 2) to slow its progress.
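One natural reading of the “team spirit” hyperparameter is as a linear blend between each agent's own reward and the team's mean reward. The blending rule below is an illustrative assumption consistent with the article's description, not published OpenAI code:

```python
# Sketch of a "team spirit" blend: each agent's effective reward interpolates
# between its individual reward and the team's mean reward.
# This exact rule is an illustrative assumption.

def blend_rewards(individual_rewards, team_spirit):
    """team_spirit = 0 -> purely selfish; team_spirit = 1 -> purely team mean."""
    team_mean = sum(individual_rewards) / len(individual_rewards)
    return [
        (1 - team_spirit) * r + team_spirit * team_mean
        for r in individual_rewards
    ]
```

At team_spirit near 1, an agent that sacrifices itself so a teammate can score still sees a positive reward, which is how cooperative behavior can emerge without any explicit communication channel.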
Dota 2 players are already studying OpenAI Five’s styles of play, some of which are surprisingly creative. (In one match, the bots adopted a mechanic that allowed their heroes to quickly recharge a certain weapon by staying out of range of enemies.) As for OpenAI, it’s applying some of the insights gleaned from the project to other fields: Last February, it released Hindsight Experience Replay (HER), an open source algorithm that effectively helps robots learn from failure, and later in the year it published research on a self-learning robotics system that can manipulate objects with humanlike dexterity.
Brockman said that while this summer’s matches were the final public demonstration, OpenAI will “continue to work” on OpenAI Five.
“The beauty of this technology is that it doesn’t even know it’s [playing] Dota … It’s about letting people connect the strange, exotic but still very tangible intelligences that are created … modern AI technology,” he said. “Games have really been the benchmark [in AI research] … These complex strategy games are the milestone that we … have all been working towards because they start to capture aspects of the real world.”