
Backgammon

In backgammon, the uncertainty of the dice rolls makes the brute-force approach infeasible by raising the branching factor to several hundred moves (21 possible dice combinations, each with about 20 legal moves). The backgammon program TD-Gammon [26] [27] uses temporal difference (TD) learning to teach itself to play backgammon at a world-championship level. The TD-Gammon neural network is trained by self-play simulations. During training, TD-Gammon considers each of the roughly 20 ways it can play its dice roll and the positions that would result, then chooses the move that leads to the position with the highest estimated value. This method is used even at the start of training, when the network's strategy is random. After playing about 300,000 games against itself, TD-Gammon 0.0, with essentially zero backgammon knowledge, learned to play approximately as well as the best previous backgammon computer program. Self-play training refined with some initial backgammon knowledge produced a program that played at a world-class level.
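The ideas in the paragraph above can be illustrated with a small Python sketch. It is not TD-Gammon's code: the real program uses a neural network as its value estimator, whereas this sketch assumes a plain lookup table and the simplest one-step TD update. The names `distinct_rolls`, `choose_move`, `td0_update`, `apply_move` and `value` are hypothetical and introduced only for illustration.

```python
from itertools import combinations_with_replacement


def distinct_rolls():
    """Return the unordered outcomes of rolling two six-sided dice."""
    return list(combinations_with_replacement(range(1, 7), 2))


def roll_probability(roll):
    """A double has probability 1/36; any other unordered roll has 2/36."""
    a, b = roll
    return 1 / 36 if a == b else 2 / 36


def choose_move(position, legal_moves, apply_move, value):
    """Greedy selection: pick the move whose resulting position has the
    highest estimated value. `apply_move` and `value` are supplied by the
    caller (assumed helpers, not part of any real library)."""
    return max(legal_moves, key=lambda m: value(apply_move(position, m)))


def td0_update(values, state, next_state, alpha=0.1):
    """One-step TD update on a lookup table (an assumed simplification):
    move the estimate for `state` a fraction `alpha` of the way toward
    the estimate for the position that followed it."""
    values[state] += alpha * (values[next_state] - values[state])
    return values[state]
```

With these definitions, `len(distinct_rolls())` gives the 21 dice combinations mentioned in the text, and multiplying by about 20 legal plays per roll gives a branching factor of roughly 420.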

Later versions of the program were augmented with a selective two-ply or three-ply search procedure. A ply is a single playing action, that is, one move by one player. To select moves, these programs look ahead to consider the opponent's possible dice rolls and moves. Assuming that the opponent always takes the move that appears immediately best for the opponent, the expected value of each candidate move is computed and the best move is selected. The second ply of search is conducted only for candidate moves that ranked highly after the first ply. This selective search procedure affects only move selection; the learning process proceeds exactly as before.
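A minimal sketch of the selective two-ply procedure described above follows. It assumes that `value` scores a position from the searching player's point of view, so the opponent's immediately best reply is the one that minimizes it. The function name, the callables `apply_move` and `opponent_moves`, and the cutoff `top_k` are all hypothetical; the actual pruning rule used by the program is not given in the text.

```python
def two_ply_choice(position, my_moves, apply_move, opponent_moves,
                   value, rolls, top_k=3):
    """Pick a move by selective two-ply search.

    rolls is a list of (roll, probability) pairs for the opponent's dice.
    opponent_moves(position, roll) returns the opponent's legal replies.
    """
    # First ply: rank every candidate by the value of the position it reaches.
    ranked = sorted(my_moves,
                    key=lambda m: value(apply_move(position, m)),
                    reverse=True)
    # Selective step: only the highest-ranked candidates get a second ply.
    candidates = ranked[:top_k]

    def expected_value(move):
        after = apply_move(position, move)
        total = 0.0
        for roll, prob in rolls:
            replies = opponent_moves(after, roll)
            if replies:
                # The opponent is assumed to take the reply that looks
                # immediately best for them, i.e. worst for us.
                outcome = min(value(apply_move(after, r)) for r in replies)
            else:
                # No legal reply: the position is unchanged.
                outcome = value(after)
            total += prob * outcome
        return total

    # Second ply: choose the candidate with the best expected value.
    return max(candidates, key=expected_value)
```

Setting `top_k=1` reduces this to the one-ply greedy choice, which makes it easy to compare the two selection rules on the same position.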

Simulations are also used in backgammon to perform "rollouts" of certain positions. Rollouts are now generally regarded as the best available estimates of the equity of a given position. A single simulation consists of generating a series of dice rolls, playing through to the end of the game, and recording the outcome; the rollout estimate is the average outcome over many such simulated games.
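The rollout idea can be sketched in Python on a deliberately simplified game. The code below does not play backgammon: it assumes a toy race in which each side only has a pip count, the players alternate rolls, and the first to bring their count to zero wins one point. The functions `play_race` and `rollout_equity` are hypothetical names used only for this illustration.

```python
import random


def play_race(my_pips, opp_pips, rng):
    """Play one toy race to the end and return the outcome for the
    player on roll: +1 for a win, -1 for a loss."""
    pips = [my_pips, opp_pips]
    player = 0  # index 0 is the player on roll
    while True:
        a, b = rng.randint(1, 6), rng.randint(1, 6)
        # Doubles are played four times, as in backgammon.
        total = 4 * a if a == b else a + b
        pips[player] -= total
        if pips[player] <= 0:
            return 1 if player == 0 else -1
        player = 1 - player


def rollout_equity(my_pips, opp_pips, n_games=2000, seed=0):
    """Estimate equity as the average outcome over n_games simulations."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    total = sum(play_race(my_pips, opp_pips, rng) for _ in range(n_games))
    return total / n_games
```

The estimate becomes less noisy as `n_games` grows, which is why rollouts of real positions typically use a large number of simulated games.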


Lourdes Pena
1999-09-10