Background: In December 2014, Christopher Clark and Amos Storkey of the University of Edinburgh published their paper "Teaching Deep Convolutional Neural Networks to Play Go" on arXiv. It led to an intense discussion on the computer-go mailing list. The network reached an 87% win rate against GNU Go, and 14% against Fuego 1.1 with 10 seconds per move on two threads.
I randomly picked two games to comment on, one win each for Fuego 1.1 and the network.
First, a comment about Fuego 1.1 versus current Fuego. 1.1 is the latest "official" release of Fuego, from 2011, and is popular as a sparring partner in research projects. Fuego has been in continuous development since then, and the current svn version includes many features that make it play more human-like. In contrast, version 1.1 is a typical early Monte Carlo program: strong at fighting, but lacking any knowledge of opening strategy, shapes, or patterns larger than 3x3. The network excels in exactly these areas.
The typical pattern of these games is that the network gets far ahead in the opening, then it becomes a matter of whether it survives the fighting and wins on territory or Fuego captures something big. Once Fuego draws ahead, it will start "playing safe" in typical Monte Carlo fashion and often win by a small margin.
In the opening, Black plays good, normal moves while White plays some semi-random
moves typical of the early Monte Carlo programs, such as moves 10 and 12. White's
invasion at 18 and tenuki (playing elsewhere) is also bad, and Black attacks
in good style while White makes a heavy group. However, Black overextends a bit
and faces a dilemma at move 37. Black correctly strengthens the corner, but
allows White to fight back in the center with a cut at 38. With 52, White tries
to fight in the corner, but Black defends strongly. It is interesting to see
a move such as 63 being generated by the network. Move 68 by Fuego is very bad
and Black easily wins this fight. Move 72 is also a horrible blunder - of course
White must atari at L7 first. Black makes no really bad moves in this game.
The only criticism is that after move 148, Black should answer at A15
to keep the white stones dead. It is also impressive that Black has some
understanding of semeai (capturing races). After White takes a liberty at
move 184, Black correctly answers. I have not seen a learned evaluation function
do this before.
Judging only from this game, the network is much stronger than Fuego 1.1 and should win almost 100% of its games. However, the result of Clark and Storkey shows only 14% wins. The next game will give an idea why.
The opening follows a similar pattern, with "cosmic style" moves by Fuego 1.1
and good, normal opening play by the network. Tenuki at move 21 is bad.
After move 50, surprisingly, both players leave the double atari situation
in the center open for a while.
Black's invasion at move 61 is a bit desperate, 67 is bad, and the
network beautifully kills the invaders.
Well, not quite. After move 73, Black has only one eye and not enough liberties. Simply playing an outside move such as T6 would win very easily for White. However, White cuts on the inside first, and only then plays T6. This is a horrible blunder which forces Black to make two eyes and leaves White split into two weak groups. However, Black overplays its attack and by move 90 has lost three key stones. With moves 91 and 99 Black tries to save these stones, leaving a typical beginner textbook problem: White can throw in at P8 and capture Black in a "snapback". However, the network instead plays a "shape" move that lets Black get away, and all the white stones die.
After this, Fuego "knows" it is completely winning, plays many ultra-safe moves, and is a bit more than 5 points ahead at the end. Something completely ridiculous happens at the very end. White 248 puts its dead stones into self-atari, Black ignores it, White starts a massive ko with move 252 which could reverse the game, but at move 254 the network plays a neutral move instead of a ko threat. Both programs pass in the unresolved situation, apparently believing that the white stones in atari cannot be saved.
The evaluation learned by the network is clearly very promising. But it also has some huge holes: it has no idea of even very basic tactics if it has not seen them in its training data. No human player past a beginner would try to save the stones with moves 91 and 99 in the second game, as Fuego 1.1 did. But apparently the network had no means of learning the simple refutation.
I believe that using this evaluation as an initial bias for nodes in an MCTS tree would work very well. I also think that training could and should be augmented with game records containing some fraction of bad opponent moves, to cover refutations of plays that never show up in strong play.
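To make the "initial bias" idea concrete, here is a minimal sketch of one common way to do it: use the network's move probabilities as priors in the MCTS child-selection formula, so unvisited moves the network likes are explored first. This is not from Clark and Storkey's paper; the class, function names, and the PUCT-style formula are illustrative assumptions on my part.

```python
import math

class Node:
    """One MCTS node; `prior` is a hypothetical network probability
    for the move leading to this node (an assumption, not the paper's API)."""
    def __init__(self, prior):
        self.prior = prior
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}  # move string -> Node

    def q(self):
        # Mean value of playouts through this node; 0 if never visited.
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    """Pick the child maximizing Q + U, where the exploration term U
    is scaled by the network prior (PUCT-style selection)."""
    total = sum(ch.visits for ch in node.children.values())
    best_move, best_score = None, -float("inf")
    for move, ch in node.children.items():
        u = c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)
        score = ch.q() + u
        if score > best_score:
            best_move, best_score = move, score
    return best_move

# With no visits yet, the priors alone order the moves: the network's
# favorite candidate is tried first, which is exactly the desired bias.
root = Node(prior=1.0)
root.children = {"D4": Node(0.6), "K10": Node(0.3), "T1": Node(0.1)}
print(select_child(root))  # prints "D4"
```

As visits accumulate, the `1 + ch.visits` denominator shrinks the prior's influence, so tactical results from playouts can eventually override an opening-style preference that turns out to be a blunder.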