Design

One goal of this research project was to design a series of self-play poker tournaments that would yield statistically significant evidence that each enhancement improves Loki-2's performance under a range of playing conditions, such as those typically encountered against human opponents. This section describes the experimental design used to meet that goal.

Each self-play tournament pits two versions of Loki against each other: eight copies of a control version and two copies of a modified version. To reduce the "luck" factor of the game, and hence the variance, the tournaments follow the pattern of duplicate bridge tournaments described in [2] and [19]. Each deal is played ten times, with the seating order changed each time so that 1) every player holds every set of hidden cards exactly once, and 2) every player is seated in a different position relative to all opponents. A tournament consists of 2,500 different deals (i.e., 25,000 games or trials).
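
The paragraph does not spell out the seat permutations used. The following is a minimal sketch that assumes a simple cyclic rotation, in which player p takes seat (p + k) mod 10 on the k-th replay of a deal. It checks the first property: each player holds every set of hidden cards exactly once. It also reproduces the 2,500 × 10 = 25,000 game count. The names `cyclic_schedule` and `every_player_holds_every_hand_once` are hypothetical. A cyclic rotation keeps the players' relative order around the table fixed, so it does not satisfy the second property. A schedule meeting that property would need a different permutation than the one assumed here.

```python
# Hypothetical sketch of a duplicate seating schedule for one deal.
NUM_PLAYERS = 10        # eight control copies plus two modified copies
REPLAYS_PER_DEAL = 10   # each deal is played ten times
NUM_DEALS = 2_500       # distinct deals per tournament


def cyclic_schedule(n: int = NUM_PLAYERS) -> list[list[int]]:
    """schedule[k][p] is the seat (and hence hidden-card set) of player p in replay k.

    Assumed rotation rule: player p takes seat (p + k) mod n in replay k.
    """
    return [[(p + k) % n for p in range(n)] for k in range(n)]


def every_player_holds_every_hand_once(schedule: list[list[int]]) -> bool:
    """Check property 1: each replay seats everyone exactly once, and across all
    replays each player receives every seat (hidden-card set) exactly once."""
    n = len(schedule)
    seats = list(range(n))
    replays_ok = all(sorted(replay) == seats for replay in schedule)
    players_ok = all(sorted(replay[p] for replay in schedule) == seats for p in range(n))
    return replays_ok and players_ok


total_games = NUM_DEALS * REPLAYS_PER_DEAL  # 2,500 deals x 10 replays
print(every_player_holds_every_hand_once(cyclic_schedule()), total_games)  # True 25000
```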

A player's playing style is defined by two characteristics. The first is the percentage of hands played (e.g., liberal-loose or conservative-tight). The second is the frequency of raising when active (e.g., aggressive or passive). Players are classified with a two-character notation. The first letter describes the percentage of hands played, ranging from tight (T) to loose (L). The second letter describes the raising frequency, ranging from passive (P) to aggressive (A). The two characteristics are independent of each other. For example, a conservative/aggressive (T/A) player plays few hands (folding most hands preflop) but bets and raises often when active.
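
The two-letter classification can be illustrated with a small function that derives each letter independently from observed statistics. This is a sketch under stated assumptions. The text gives no numeric boundaries, so `LOOSE_THRESHOLD` and `AGGRESSIVE_THRESHOLD` are hypothetical values. `PlayerStats` and `classify` are illustrative names, not part of Loki.

```python
from dataclasses import dataclass

# Hypothetical cut-offs: the text gives no numeric boundaries between styles.
LOOSE_THRESHOLD = 0.30       # fraction of hands played above which a player counts as loose
AGGRESSIVE_THRESHOLD = 0.40  # fraction of active decisions that are bets or raises


@dataclass
class PlayerStats:
    hands_dealt: int
    hands_played: int       # hands not folded preflop
    active_decisions: int   # betting decisions made while still in a hand
    bets_and_raises: int    # how many of those decisions were bets or raises


def classify(stats: PlayerStats) -> str:
    """Return a two-letter style such as "T/A" (tight/aggressive)."""
    played = stats.hands_played / stats.hands_dealt
    raising = (stats.bets_and_raises / stats.active_decisions
               if stats.active_decisions else 0.0)
    # The two letters are computed independently of each other.
    tightness = "L" if played > LOOSE_THRESHOLD else "T"
    aggression = "A" if raising > AGGRESSIVE_THRESHOLD else "P"
    return f"{tightness}/{aggression}"


print(classify(PlayerStats(hands_dealt=100, hands_played=15,
                           active_decisions=40, bets_and_raises=25)))  # T/A
```

The text describes each characteristic as a range from tight to loose and from passive to aggressive. Collapsing each range to a single threshold is a simplification; a finer-grained scheme could use more than two levels per letter.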

To test an enhancement, a given version of the program is first played against an otherwise identical program that includes the new feature, in a homogeneous field (all players have the same playing style). For example, one can play eight conservative/aggressive base Loki-1 players against two conservative/aggressive Loki-2 players augmented with the PT function betting strategy. Second, the enhancement is tested in combination with other changes. Third, the modification is tested against opponents with different playing styles.
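
The first and third testing steps differ only in how the ten-player field is composed. A minimal sketch of building both kinds of field follows. `Entrant`, `build_field`, and `is_homogeneous` are hypothetical names, and the mix of styles in the second field is an assumed example. The second step, combining enhancements, changes the modified program itself rather than the field, so the sketch does not model it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entrant:
    version: str  # "Loki-1" (control) or "Loki-2" (with the enhancement)
    style: str    # two-letter playing style, e.g. "T/A"


def build_field(control_styles: list[str], modified_style: str,
                control: str = "Loki-1", modified: str = "Loki-2") -> list[Entrant]:
    """Eight control players plus two copies of the modified program."""
    if len(control_styles) != 8:
        raise ValueError("expected styles for exactly eight control players")
    return ([Entrant(control, s) for s in control_styles]
            + [Entrant(modified, modified_style)] * 2)


def is_homogeneous(field: list[Entrant]) -> bool:
    """True when every entrant shares the same playing style."""
    return len({e.style for e in field}) == 1


# Step 1: homogeneous field, everyone conservative/aggressive.
homogeneous = build_field(["T/A"] * 8, "T/A")
# Step 3: the same modified program against opponents of mixed styles (hypothetical mix).
mixed = build_field(["T/A", "T/P", "L/A", "L/P"] * 2, "T/A")
print(is_homogeneous(homogeneous), is_homogeneous(mixed))  # True False
```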

To measure the impact of each enhancement on the program's performance, we use the average number of small bets won per hand (sb/hand), a metric also used by some human players. For instance, consider a $10-$20 Hold'em game, where small bets are $10 and big bets are $20. At 30 hands per hour, a player with an improvement of +0.20 sb/hand earns an extra $60 per hour (0.20 × $10 × 30). Anything above +0.05 sb/hand is considered a large improvement. Results of these self-play experiments must be interpreted with caution, since any feature could perform worse (or better) against human opposition [1]. The main purpose of these experiments is to weed out bad ideas. Ultimately, the only performance metric that matters is how Loki plays against humans. Since that data is difficult (and expensive) to obtain, most of our experimentation must first be done in self-play.
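
The sb/hand metric and its conversion to money per hour reduce to two one-line formulas. The sketch below reproduces the $10-$20 example from the paragraph. The function names are hypothetical, and the 30 hands-per-hour default is taken from the text.

```python
from statistics import mean


def sb_per_hand(winnings: list[float], small_bet: float) -> float:
    """Average amount won per hand, measured in small bets."""
    return mean(winnings) / small_bet


def extra_dollars_per_hour(sb_improvement: float, small_bet: float,
                           hands_per_hour: float = 30) -> float:
    """Convert an sb/hand improvement into money per hour of play."""
    return sb_improvement * small_bet * hands_per_hour


# $10-$20 Hold'em: small bet = $10, about 30 hands per hour.
print(extra_dollars_per_hour(0.20, small_bet=10))  # 60.0
print(sb_per_hand([10, -10, 20, 0], small_bet=10))  # 0.5
```

By the same arithmetic, the +0.05 sb/hand threshold for a large improvement corresponds to about $15 per hour in this game.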

Lourdes Pena
1999-09-10