Google-owned DeepMind put out a new paper that outlines how the team took the machine learning system that created AlphaGo and built a new system that tackled chess and shogi, beating the top programs at each game. The program, called AlphaZero, also beat its predecessor, AlphaGo Zero.
It was a logical next move for DeepMind. Chess and shogi (a chess-like board game that originated in Japan) are both games in which computer programs have already beaten top human players. AlphaZero beat both Stockfish, which is at the top of the game in chess, and Elmo, the best program at playing shogi.
The program was trained solely by playing against itself, through a process known as reinforcement learning, with no prior knowledge beyond the rules of each game, such as how each piece is allowed to move. While AlphaGo (including AlphaGo Zero, which relied on self-play reinforcement learning for training) was built specifically for Go, AlphaZero was designed to be far more flexible.
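To make "learning only from self-play" concrete, here is a minimal sketch on a toy game. It assumes a one-pile Nim variant (players alternately take 1 to 3 stones; whoever takes the last stone wins) that does not appear in DeepMind's work, and it uses a simple tabular learner with a one-step negamax backup (Q-learning with a step size of 1, which is safe here because the game is deterministic). This is not AlphaZero's method, which pairs a deep neural network with Monte Carlo tree search. The names `train_by_self_play` and `best_move` are hypothetical. Both sides read and write the same value table, so the agent is literally playing itself, and it is told nothing except the legal moves and who wins.

```python
import random

# Hypothetical toy game (not from the paper): one-pile Nim.
# Players alternately remove 1-3 stones; whoever takes the last stone wins.
MAX_PILE = 10
ACTIONS = (1, 2, 3)


def legal_moves(pile):
    """The only game knowledge the learner gets: which moves are allowed."""
    return [a for a in ACTIONS if a <= pile]


def train_by_self_play(episodes=5000, epsilon=0.3, seed=0):
    """Learn q[(pile, move)] for the player to move, purely by self-play.

    Both sides share one table, so the agent plays against itself.
    Values are from the mover's point of view: +1 = win, -1 = loss.
    """
    rng = random.Random(seed)
    q = {(p, a): 0.0 for p in range(1, MAX_PILE + 1) for a in legal_moves(p)}
    for _ in range(episodes):
        pile = MAX_PILE
        while pile > 0:
            moves = legal_moves(pile)
            if rng.random() < epsilon:
                move = rng.choice(moves)  # explore a random legal move
            else:
                move = max(moves, key=lambda a: q[(pile, a)])  # play greedily
            nxt = pile - move
            if nxt == 0:
                target = 1.0  # taking the last stone wins
            else:
                # Negamax: the opponent moves next, so this move is worth
                # the negation of the best value available to them.
                target = -max(q[(nxt, b)] for b in legal_moves(nxt))
            q[(pile, move)] = target  # step size 1: the game is deterministic
            pile = nxt
    return q


def best_move(q, pile):
    """Greedy move under the learned table."""
    return max(legal_moves(pile), key=lambda a: q[(pile, a)])


q = train_by_self_play()
print([best_move(q, p) for p in (5, 6, 7, 9)])
```

After training, the greedy move from a pile of n stones should be n mod 4 whenever that is nonzero, leaving the opponent a multiple of four. That is the known winning strategy for this game, and the learner arrives at it without being told.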
That general-purpose architecture could provide a blueprint for developing future AI systems, both for playing games and for solving other problems with clear rules and objectives, such as designing medicines.
DeepMind trained three separate instances of AlphaZero, one each for Go, shogi, and chess. The chess system played 44 million games against itself, while the shogi system played 24 million games and the Go system played through 21 million games.
AlphaZero’s dominance wasn’t assured. There are a number of key differences between Go and the two other games DeepMind selected. Both chess and shogi restrict how different pieces can move, and unlike a Go board, neither board can be rotated without changing the game. What’s more, a piece captured in shogi switches sides: the player who captured it can later drop it back onto the board.
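The rotation point can be made concrete. A square board has eight rotations and mirror images (the symmetry group of the square), and in Go all eight versions of a position are strategically equivalent. In chess and shogi that is not true: pawns, for example, only move toward the opponent, so a rotated position is generally a different game, often an illegal one. The sketch below assumes a toy board encoding (a tuple of row tuples holding ".", "B" or "W") and uses hypothetical helper names; it simply enumerates the eight variants of a position.

```python
from typing import List, Tuple

Board = Tuple[Tuple[str, ...], ...]  # hypothetical encoding: rows of "." / "B" / "W"


def rotate(board: Board) -> Board:
    """Rotate a square board 90 degrees clockwise."""
    return tuple(zip(*board[::-1]))


def reflect(board: Board) -> Board:
    """Mirror a board left to right."""
    return tuple(row[::-1] for row in board)


def symmetries(board: Board) -> List[Board]:
    """Return all eight rotations and reflections of a square board."""
    variants = []
    for _ in range(4):
        variants.append(board)           # the current rotation
        variants.append(reflect(board))  # its mirror image
        board = rotate(board)
    return variants


# A 3x3 position: a black stone in a corner with a white stone beside it.
position = (
    ("B", "W", "."),
    (".", ".", "."),
    (".", ".", "."),
)
print(len(set(symmetries(position))))  # prints 8: eight distinct but equivalent Go positions
```

A position with a single stone in the center would produce only one distinct variant, since every rotation and reflection leaves it unchanged.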
The core algorithm also had to change. Because the modern game of Go doesn’t allow draws, AlphaGo Zero optimized for the probability of winning; AlphaZero instead optimizes for the expected outcome of a game, which takes draws into account in chess.
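A small worked example shows why the objective matters once draws are possible. Assume two hypothetical candidate moves with made-up outcome probabilities: a sharp line that wins 40% of the time and loses otherwise, and a solid line that wins 10% of the time and draws the rest. Scoring a win as +1, a draw as 0 and a loss as -1 (the scoring DeepMind describes for AlphaZero's game outcomes), a win-probability objective picks the sharp line, while an expected-outcome objective picks the solid one. The class, function and move names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutcomeProbs:
    """Hypothetical probabilities of each result for the player choosing a move."""
    win: float
    draw: float
    loss: float


def win_probability(p: OutcomeProbs) -> float:
    # Ignores draws entirely: a draw is worth the same as a loss.
    return p.win


def expected_outcome(p: OutcomeProbs) -> float:
    # Values a win at +1, a draw at 0 and a loss at -1.
    return p.win * 1 + p.draw * 0 + p.loss * -1


moves = {
    "sharp_line": OutcomeProbs(win=0.4, draw=0.0, loss=0.6),
    "solid_line": OutcomeProbs(win=0.1, draw=0.9, loss=0.0),
}

print(max(moves, key=lambda m: win_probability(moves[m])))   # prints sharp_line
print(max(moves, key=lambda m: expected_outcome(moves[m])))  # prints solid_line
```

The sharp line's expected outcome is negative (0.4 - 0.6 = -0.2), while the solid line's is +0.1, so an agent that can settle for a draw prefers the safer move.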
Some interesting trends emerged from the system’s testing: AlphaZero did not lose a single game in a 100-game chess match against Stockfish. When playing white, it won 25 times and drew 25 times. When playing black, it won three times and drew 47 times. (That’s not unusual: there is a significant first-move advantage in chess.)
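The match record can be turned into a score using the standard chess convention of one point per win and half a point per draw (the convention is not stated in the article). A quick tally makes the first-move advantage visible: AlphaZero scored noticeably more with white than with black.

```python
# AlphaZero's record against Stockfish as reported in the article: (wins, draws, losses).
RESULTS = {"white": (25, 25, 0), "black": (3, 47, 0)}


def score(wins: int, draws: int, losses: int) -> float:
    """Standard chess scoring: 1 point per win, 0.5 per draw; losses score nothing."""
    return wins + 0.5 * draws


for colour, record in RESULTS.items():
    print(f"{colour}: {score(*record)} / {sum(record)}")
# white: 37.5 / 50
# black: 26.5 / 50

total = sum(score(*record) for record in RESULTS.values())
print(f"total: {total} / {sum(sum(r) for r in RESULTS.values())}")
# total: 64.0 / 100
```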
AlphaZero also learned some of the most popular chess openings through self-play. That isn’t necessarily surprising, given the limited number of possible moves at the start of a game compared with later on, but it shows how quickly a computer can pick up knowledge about chess that humans accumulated over the course of years.
AlphaZero’s games against Elmo were more lopsided, but they also exposed some weaknesses: the DeepMind system lost five games as white and three as black. Shogi is a harder game than chess for a computer, because its larger board (9×9 rather than 8×8) leads to higher computational complexity.
Go was the closest contest. AlphaZero won more games than it lost whether it played first or second, but its predecessor, AlphaGo Zero, still picked up 19 wins playing first and 21 wins playing second.
It’s unclear whether we’ll get to see how AlphaZero measures up against human competitors. Because Elmo and Stockfish have beaten top human players, DeepMind felt comfortable calling the system’s performance superhuman. Earlier this year, the company said that AlphaGo would retire from playing against people, after handily defeating a set of flesh-and-blood competitors.