Computers are getting more sophisticated than ever at understanding and playing complicated games. DeepMind, one of the leaders in artificial intelligence, proved that once again today with its latest A.I. agent called AlphaStar. During a livestream, this program took on two StarCraft II pros in a series of five matches for each, and AlphaStar swept all 10 matches.
StarCraft II pros Dario “TLO” Wünsch and Grzegorz “MaNa” Komincz are two of the top players in the world. But neither could handle neural-network-powered AlphaStar. Blizzard opened up StarCraft II to A.I. researchers last year, and that has resulted in huge leaps in computer performance.
DeepMind has already mastered chess and go with AlphaZero and AlphaGo, respectively. And those games are so complicated that no computer on Earth could brute-force calculate every possible match in those games. But a real-time strategy video game like StarCraft II is exponentially more complicated in terms of what is possible in every moment. And this reveals the power of deep learning. Something like AlphaStar doesn’t have to learn every possible match in StarCraft to understand it. Instead, it focuses on winning strategies.
How AlphaStar learns
AlphaStar is such a big deal because of the way it learns. It combines multiple techniques, and DeepMind walked through how they work.
“We take many replays from pro and players, and we try to get AlphaStar to understand by looking at a situation that human player is in,” DeepMind research co-lead Oriol Vinyals said. “And then we try to get it to imitate those moves.”
DeepMind doesn’t just use pro games either. The company also looks at public matches from players who have a high matchmaking rating.
But the imitation training only creates the most basic iteration of AlphaStar. DeepMind says this version 0.1 agent is equivalent to a platinum-level ladder player.
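To make the imitation step concrete, here is a minimal sketch of the idea of copying what humans did in a given situation. It is not DeepMind's method: AlphaStar trains a neural network on replays, while this toy version just builds a lookup table. The function name, the situation labels, and the actions are all hypothetical.

```python
from collections import Counter, defaultdict

def fit_imitation_policy(replays):
    """Map each observed situation to the action humans chose most often."""
    counts = defaultdict(Counter)
    for situation, action in replays:
        counts[situation][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Hypothetical (situation, action) pairs standing in for real replay data.
replays = [
    ("enemy_scout_in_base", "pull_worker_to_chase"),
    ("enemy_scout_in_base", "pull_worker_to_chase"),
    ("enemy_scout_in_base", "ignore"),
    ("minerals_over_400", "build_gateway"),
]
policy = fit_imitation_policy(replays)
print(policy["enemy_scout_in_base"])
```

A table like this can only repeat situations it has already seen, which is one reason a real system would use a model that generalizes.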
To prep AlphaStar for a pro fight, DeepMind had to use its neural-network training.
The AlphaStar League
How do you get better at something? Study and practice. AlphaStar nailed the studying part with the imitation learning. For the practice, however, DeepMind set up what it called the AlphaStar League. This is a neural-network training program where different versions of AlphaStar would play each other over and over nonstop for a week.
This is the heart of modern machine learning. DeepMind sets a goal for the A.I. programs, such as “win the match.” Each agent, as these programs are called, then makes decisions to accomplish that goal. The A.I. that wins gets to continue on in the AlphaStar League.
But the training goes deeper than that. DeepMind also increases the possibility for mutations from one generation of AlphaStar to the next by setting certain agents to try to win while favoring a certain unit type, for example.
DeepMind sets its AlphaStar agents both to mutate randomly and to take on the characteristics of the agents that are winning the most. This process works so well because the A.I. is capable of playing many matches in quick succession. At the end of a week or two of training, AlphaStar has played 200 years’ worth of StarCraft II.
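The league loop described above can be sketched as a tiny population-based training run. This is a simplified illustration under stated assumptions, not DeepMind's implementation: each agent is reduced to a single made-up "skill" number, a match is a weighted coin flip, and the favored unit is only a label that winners pass on. All names here are hypothetical.

```python
import random

def play_match(a, b, rng):
    """Toy match: the agent with the higher skill wins more often."""
    p_a = a["skill"] / (a["skill"] + b["skill"])
    return a if rng.random() < p_a else b

def mutate(agent, rng):
    """Copy a winner, nudging its skill by a small random amount."""
    return {
        "skill": max(0.01, agent["skill"] + rng.gauss(0, 0.1)),
        "favored_unit": agent["favored_unit"],
    }

def run_league(population, generations, rng):
    for _ in range(generations):
        rng.shuffle(population)
        # Pair agents up; only the winner of each pairing continues.
        winners = [
            play_match(population[i], population[i + 1], rng)
            for i in range(0, len(population) - 1, 2)
        ]
        # Mutated copies of the winners refill the league.
        population = winners + [mutate(w, rng) for w in winners]
    return population

rng = random.Random(0)
units = ["stalker", "zealot", "immortal", "phoenix"]  # illustrative labels
league = [{"skill": 1.0, "favored_unit": units[i % 4]} for i in range(8)]
final = run_league(league, generations=50, rng=rng)
print(len(final), sorted({a["favored_unit"] for a in final}))
```

Because agents play in software, thousands of such generations can run back to back, which is how a week or two of training can add up to many years of game time.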
But doesn’t the computer cheat?
DeepMind knew that some StarCraft players are skeptical about a computer-controlled opponent. It brought in StarCraft experts to talk about the matches and to ask the questions that the community would want answers to. Those experts focused on how AlphaStar actually plays and perceives the game. For example, can it see through the fog of war that looks like a veil to human players? Or is it just spamming key presses a thousand times faster than human hands could physically move?
But DeepMind said that it tried to keep things level. It limits AlphaStar’s actions-per-minute (APM) to ensure the computer isn’t winning through sheer force of speed.
“Overall, AlphaStar uses considerably fewer APMs than a human pro,” DeepMind co-lead David Silver said. “That indicates that it’s winning by not clicking insanely but by doing something much smarter than that.”
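One common way to cap actions per minute is a sliding-window rate limiter. The sketch below shows that general technique. DeepMind did not describe its mechanism in this detail, so the class name, the cap value, and the 60-second window are assumptions.

```python
from collections import deque

class ApmLimiter:
    """Allow at most max_actions within any sliding window of window_seconds."""

    def __init__(self, max_actions, window_seconds=60.0):
        self.max_actions = max_actions  # hypothetical cap, not AlphaStar's
        self.window = window_seconds
        self.times = deque()

    def try_act(self, now):
        # Forget actions that have fallen out of the sliding window.
        while self.times and now - self.times[0] >= self.window:
            self.times.popleft()
        if len(self.times) >= self.max_actions:
            return False  # over the cap: the agent must wait
        self.times.append(now)
        return True

limiter = ApmLimiter(max_actions=3)
results = [limiter.try_act(t) for t in (0.0, 1.0, 2.0, 3.0)]
print(results)
```

In this example the fourth action is refused, and a new one is accepted only once the oldest action is a full minute old.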
AlphaStar also doesn’t have a superhuman reaction time.
“We measured how quickly it reacts to things,” Silver said. “If you measure the time between when AlphaStar perceives the game. From when it observes what’s going on, then has to process it, and then communicate what it chooses back to the game. That time is actually closer to 350ms. That’s on the slow side of human players.”
Finally, DeepMind explained how AlphaStar visualizes the world of the game. It’s not looking at the code, but it’s also not moving the camera around like a human player. Instead, it is looking at the map zoomed all the way out, but it cannot see through the fog of war or anything like that. It can only see parts of the map where it has units. But DeepMind says that AlphaStar is still splitting up its economy of attention in the same way that a human player is.
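The rule that AlphaStar can only see parts of the map where it has units can be illustrated with a simple visibility mask. This is a toy grid model, not how StarCraft II or AlphaStar represents the map; the function names, the sight radius, and the map contents are hypothetical.

```python
def visible_cells(width, height, units, sight_radius):
    """Return the set of (x, y) cells within sight_radius of any own unit."""
    visible = set()
    for ux, uy in units:
        for x in range(max(0, ux - sight_radius), min(width, ux + sight_radius + 1)):
            for y in range(max(0, uy - sight_radius), min(height, uy + sight_radius + 1)):
                if (x - ux) ** 2 + (y - uy) ** 2 <= sight_radius ** 2:
                    visible.add((x, y))
    return visible

def observe(world, visible):
    """Hide everything outside the visible set (None marks fog of war)."""
    return {cell: (thing if cell in visible else None) for cell, thing in world.items()}

# Hypothetical 10x10 map with one friendly unit at (1, 1).
world = {(1, 1): "own_probe", (2, 1): "enemy_zergling", (8, 8): "enemy_base"}
vis = visible_cells(10, 10, units=[(1, 1)], sight_radius=2)
obs = observe(world, vis)
print(obs)
```

Here the nearby enemy unit shows up in the observation, while the distant base stays hidden until a friendly unit moves close enough to reveal it.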
AlphaStar did lose one match
The livestream primarily focused on the five-game matches that AlphaStar played against TLO and MaNa a few weeks ago. But DeepMind did let MaNa get a rematch live in front of the audience watching on YouTube and Twitch. And this is when MaNa got his revenge with a win against the machine.
But the live match of MaNa vs. AlphaStar had some variations compared to the last time they played. DeepMind used a new prototype version of AlphaStar that actually uses the exact same camera view as the players. This means that AlphaStar can’t just sit at a zoomed-out perspective; it has to get in close to the action to see the details of the fight.
This version of AlphaStar also didn’t have as much time to train. So instead of playing through 200 years of an AlphaStar league, it played through something closer to 20 years. But even with that “limited” experience, it still showed off strategies that shocked everyone watching.
“The way AlphaStar played the matchup was not anything like I had experience with,” said MaNa. “It was a different kind of StarCraft. It was a great chance to learn something new from an A.I.”
And that’s one of the things that DeepMind is proudest of: a pro player could take away new strategy ideas by playing against a computer, which is not something that anyone would have considered possible before.
“At the end of the day, playing against A.I. is great,” said Vinyals. “But because of the way we train AlphaStar, some of the moves — like oversaturating probes — maybe this could challenge some of the wisdom that has spread among top players.”