DeepMind, a division of Google that’s focused on advancing artificial intelligence research, unveiled a new version of its AlphaGo program today that learned the game solely by playing itself.
Called AlphaGo Zero, the system works by learning from the outcomes of its self-play games, using a machine learning technique called reinforcement learning. As Zero was continuously trained, the system began learning advanced concepts in the game of Go on its own and picking out certain advantageous positions and sequences.
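To make the idea of learning from self-play outcomes concrete, here is a minimal sketch on a far simpler game. It is not DeepMind's code or method: it swaps Go for a toy stone-taking game (players alternately remove one to three stones from a pile, and whoever takes the last stone wins) and uses a plain lookup table. The function names, the game, and all parameters (`games`, `max_pile`, `epsilon`, `seed`) are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

MAX_TAKE = 3  # assumed toy rule: a move removes 1, 2 or 3 stones


def legal_moves(stones):
    return range(1, min(MAX_TAKE, stones) + 1)


def choose_move(q, stones, epsilon, rng):
    """Epsilon-greedy: usually the best-known move, sometimes a random one."""
    moves = list(legal_moves(stones))
    if rng.random() < epsilon:
        return rng.choice(moves)
    return max(moves, key=lambda m: q[(stones, m)])


def train(games=50_000, max_pile=10, epsilon=0.1, seed=0):
    """Learn move values purely from the win/loss outcomes of self-play."""
    rng = random.Random(seed)
    q = defaultdict(float)     # (stones, move) -> average outcome for the mover
    visits = defaultdict(int)  # (stones, move) -> number of outcomes averaged
    for _ in range(games):
        stones = rng.randint(1, max_pile)
        history = []           # moves in order; the two sides alternate
        while stones > 0:
            move = choose_move(q, stones, epsilon, rng)
            history.append((stones, move))
            stones -= move
        # Whoever made the last move won. Walk backwards through the game,
        # flipping the sign each ply so every move is scored for its mover.
        outcome = 1.0
        for key in reversed(history):
            visits[key] += 1
            q[key] += (outcome - q[key]) / visits[key]
            outcome = -outcome
    return q


def best_move(q, stones):
    """Greedy move according to the learned table."""
    return max(legal_moves(stones), key=lambda m: q[(stones, m)])


if __name__ == "__main__":
    table = train()
    for pile in range(1, 11):
        print(pile, "stones -> take", best_move(table, pile))
```

The program is never told a strategy for the game. It only sees who won each game it played against itself, and the table gradually favors the moves that tended to lead to wins.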
After three days of training, the system was able to beat AlphaGo Lee, DeepMind’s software that defeated top Korean player Lee Sedol last year, 100 games to zero. After roughly 40 days of training — which translates to 29 million self-play games — AlphaGo Zero was able to defeat AlphaGo Master (which defeated world champion Ke Jie earlier this year) 89 games to 11.
The results show that there’s still plenty to learn in the field of artificial intelligence about how effective different techniques are. AlphaGo Master was built using many of the same approaches as AlphaGo Zero, but it trained on human data first before moving on to self-play games.
One interesting note is that while AlphaGo Zero picked up on several key concepts during its weeks of training, it did not learn them the way many human players do. Sequences of “laddered” stones, played in a staircase-like pattern across the board, are one of the first things that humans learn when practicing the game. Zero only understood that concept later in its training, according to the paper DeepMind published in the journal Nature.
In addition, AlphaGo Zero is far more power-efficient than many of its predecessors. AlphaGo Lee required several machines and 48 of Google’s Tensor Processing Unit (TPU) machine learning accelerator chips. AlphaGo Fan, an earlier version of the system, required 176 GPUs. AlphaGo Zero and AlphaGo Master each require only a single machine with four TPUs.
What remains to be seen is how well these techniques and concepts generalize to problems outside the realm of Go. While AlphaGo’s effectiveness in human games and against itself has shown that there’s room for AI to surpass our capacity in tasks that we think are far too difficult, the robot overlords aren’t here yet.