8.1 Self-Play Simulations

Self-play simulations offer a convenient way to compare two or more versions of the program. In addition to verifying that a certain enhancement has a beneficial effect, it is possible to quantify the contribution made by each new component to the system. Since all participants in the simulated game are versions of the program, play can proceed at a rapid pace, and results can be based on samples large enough to be statistically significant.

The self-play simulations use the duplicate tournament system described in [2], based on the same principle as duplicate bridge. Since each hand can be played with no memory of the cards dealt in preceding hands, it is possible to replay the same deal, but with the participants holding a different set of hole cards each time. This system simulates a ten-player game. Each hand is replayed ten times (ten trials), shuffling the seating arrangement so that every participant has the opportunity to play each set of hole cards once, and no two players are seated in the same relative position more than once (so, for instance, each player will play directly behind each other player exactly once). Each set of hole cards stays in the same position in the betting order, so the cards belonging to the small blind, for example, are identical in every trial. The seating permutations are listed in Table 8.1 (T = 10).

Table 8.1: Seating assignments for tournament play (reproduced from [2])

	Seat Number for Each Player
Round	1	2	3	4	5	6	7	8	9	T
1	1	2	3	4	5	6	7	8	9	T
2	2	4	6	8	T	1	3	5	7	9
3	3	6	9	1	4	7	T	2	5	8
4	4	8	1	5	9	2	6	T	3	7
5	5	T	4	9	3	8	2	7	1	6
6	6	1	7	2	8	3	9	4	T	5
7	7	3	T	6	2	9	5	1	8	4
8	8	5	2	T	7	4	1	9	6	3
9	9	7	5	3	1	T	8	6	4	2
T	T	9	8	7	6	5	4	3	2	1
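
The properties claimed for this seating can be checked mechanically. The following is a minimal sketch, not the procedure used in [2]: the entries of Table 8.1 happen to coincide with (round × player) mod 11, which is an observation about the table rather than a construction stated in the source. The function names are hypothetical, and "directly behind" is assumed to mean the next seat in betting order, with no wrap-around from seat T to seat 1.

```python
# Table 8.1 transcribed row by row (rounds 1..T); "T" is written as 10.
# Column index = player, entry = seat occupied in that round.
TABLE_8_1 = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [2, 4, 6, 8, 10, 1, 3, 5, 7, 9],
    [3, 6, 9, 1, 4, 7, 10, 2, 5, 8],
    [4, 8, 1, 5, 9, 2, 6, 10, 3, 7],
    [5, 10, 4, 9, 3, 8, 2, 7, 1, 6],
    [6, 1, 7, 2, 8, 3, 9, 4, 10, 5],
    [7, 3, 10, 6, 2, 9, 5, 1, 8, 4],
    [8, 5, 2, 10, 7, 4, 1, 9, 6, 3],
    [9, 7, 5, 3, 1, 10, 8, 6, 4, 2],
    [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
]


def seat(round_no, player):
    """Observed pattern (an assumption, not from the source): r * p mod 11."""
    return (round_no * player) % 11


def each_seat_once(table):
    """Every player occupies each of the ten seats exactly once."""
    seats = list(range(1, 11))
    return all(
        sorted(row[p] for row in table) == seats for p in range(10)
    )


def follows_exactly_once(table):
    """For every ordered pair (p, q), q sits in the seat right after p once."""
    for p in range(10):
        for q in range(10):
            if p == q:
                continue
            rounds = sum(1 for row in table if row[q] - row[p] == 1)
            if rounds != 1:
                return False
    return True


matches_formula = all(
    TABLE_8_1[r - 1][p - 1] == seat(r, p)
    for r in range(1, 11)
    for p in range(1, 11)
)
print(matches_formula, each_seat_once(TABLE_8_1), follows_exactly_once(TABLE_8_1))
```

All three checks hold for the table as printed, so each player holds each set of hole cards once and acts immediately after every other player exactly once.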

This arrangement greatly reduces the "luck element" of the game, since each player will have the same number of good and bad hands. Differences in the players' performance will therefore depend more strongly on the quality of the decisions made in each situation. This large reduction in natural variance means that meaningful results can be obtained with a much smaller number of trials than in a typical game setting.
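
The variance reduction can be illustrated with a deliberately simplified toy model. This is a sketch under strong assumptions, not a poker simulation: each hand's result is modelled as a fixed per-player skill plus a random luck value attached to the seat (standing in for the hole cards), and seats are assigned with the (round × player) mod 11 pattern observed in Table 8.1. All names and numbers are hypothetical.

```python
import random

PLAYERS = 10


def seat(round_no, player):
    # Pattern observed in Table 8.1 (rounds and players numbered 1..10).
    return (round_no * player) % 11


def group_margin(totals, hands):
    """Per-hand average of players 1-5 minus that of players 6-10."""
    first = sum(totals[:5]) / (5 * hands)
    second = sum(totals[5:]) / (5 * hands)
    return first - second


def simulate(deals, skill, duplicate, rng):
    """Play `deals` blocks of ten hands; return the measured group margin."""
    totals = [0.0] * PLAYERS
    for _ in range(deals):
        luck = []
        for round_no in range(1, PLAYERS + 1):
            # Duplicate play reuses one deal for all ten rounds;
            # ordinary play draws fresh cards (luck) for every hand.
            if round_no == 1 or not duplicate:
                luck = [rng.gauss(0.0, 1.0) for _ in range(PLAYERS)]
            for player in range(1, PLAYERS + 1):
                s = seat(round_no, player)
                totals[player - 1] += skill[player - 1] + luck[s - 1]
    return group_margin(totals, deals * PLAYERS)


# Hypothetical skills: players 1-5 are better by 0.05 units per hand.
skill = [0.05] * 5 + [0.0] * 5
true_margin = 0.05

dup_margin = simulate(200, skill, duplicate=True, rng=random.Random(1))
ind_margin = simulate(200, skill, duplicate=False, rng=random.Random(1))

print("duplicate error:  ", abs(dup_margin - true_margin))
print("independent error:", abs(ind_margin - true_margin))
```

In this toy model every player receives each luck value exactly once per deal, so the luck cancels completely and the duplicate estimate equals the true margin up to rounding. In real play the decisions interact with the cards, so the reduction is large but not total.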

There are many ways to use self-play simulation to test different versions of the program. One simple application would be to play five copies of a new version against five copies of an older version, differing only in the addition of one new feature. If the new component has improved the program (against itself), then the newer version should win against the older version. The average margin of victory, in terms of expected number of small bets per hand, can also give a preliminary indication of the relative value of the new enhancement.
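
As a rough illustration of the margin calculation for a five-against-five experiment, the sketch below converts total winnings into small bets per hand and takes the difference between the two groups' averages. The function names, the winnings, the small bet size, and the number of hands are all invented for the example and are not results from the source.

```python
def small_bets_per_hand(total_winnings, small_bet, hands):
    """Convert one player's total winnings (in chips) to small bets per hand."""
    return total_winnings / small_bet / hands


def average_margin(new_winnings, old_winnings, small_bet, hands):
    """Mean rate of the new-version copies minus that of the old-version copies."""
    new_rate = sum(
        small_bets_per_hand(w, small_bet, hands) for w in new_winnings
    ) / len(new_winnings)
    old_rate = sum(
        small_bets_per_hand(w, small_bet, hands) for w in old_winnings
    ) / len(old_winnings)
    return new_rate - old_rate


# Hypothetical totals after 5000 hands with a small bet of 10 chips.
# The game is zero-sum, so the two groups' totals cancel.
new_winnings = [1200, 950, 1400, 800, 1150]
old_winnings = [-1000, -1300, -900, -1250, -1050]

margin = average_margin(new_winnings, old_winnings, small_bet=10, hands=5000)
print(f"margin: {margin:.3f} small bets per hand")
```

With these made-up figures each new-version copy averages +0.022 small bets per hand and each old-version copy averages -0.022, for a margin of 0.044.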

However, one must be careful when drawing conclusions from self-play experiments. It is important not to over-interpret the results of one simulation [1]. With the above format, there are limits to how much can be concluded from a single experiment, since it is representative of only one particular type of game and style of opponent. It is quite possible that the same feature will perform much worse (or much better) in a game against human opposition, for example. A wider variety of testing, such as changing the context of the simulated game, is necessary to get an accurate assessment of the new feature.

Moreover, most versions of Loki are very similar and have fairly conservative styles. It is quite possible that the consequences of each change would be different against a field of opponents who employ different playing styles. For example, against several human players, the effect of the weighting function may be much bigger than that of hand potential. Interdependencies among the players can also affect results. The second-best player may finish first overall if it can exploit a particular bad player more than the best player can.