When it is Loki-2's turn to act, it invokes the simulation routine to estimate the EV of calling and of raising. Folding is considered to have zero EV, because it leads to no further profit or loss. The simulation routine plays out Loki-2's hand a specified number of times (trials). Each trial is actually played out twice: once to consider the consequences of a check/call and once to consider a bet/raise. For each case, the amount of money won or lost is determined and averaged with the corresponding results of all the other trials. At the end of the simulation, the averages of the two sets of trials are taken as the EVs of the corresponding actions.
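The trial loop described above can be sketched as follows. This is only an illustration under stated assumptions, not Loki-2's actual code: the names `estimate_evs`, `deal` and `play_out` are hypothetical, and the sketch assumes that both play-outs of a trial share the same dealt scenario.

```python
import random
from typing import Callable, Dict


def estimate_evs(
    deal: Callable[[random.Random], object],
    play_out: Callable[[object, str, random.Random], float],
    trials: int,
    seed: int = 0,
) -> Dict[str, float]:
    """Average the net winnings of 'call' and 'raise' over a number of trials.

    `deal` and `play_out` are hypothetical callbacks: `deal` sets up one trial
    (for example, the opponents' hands) and `play_out` returns the amount won
    or lost when the first action is forced to be `first_action`.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    rng = random.Random(seed)
    totals = {"call": 0.0, "raise": 0.0}
    for _ in range(trials):
        scenario = deal(rng)
        # Each trial is played out twice, once per candidate first action.
        for first_action in totals:
            totals[first_action] += play_out(scenario, first_action, rng)
    evs = {action: total / trials for action, total in totals.items()}
    evs["fold"] = 0.0  # folding has zero EV by definition
    return evs
```

Passing the two steps in as callbacks keeps the averaging logic separate from the details of how hands are dealt and played out.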
Simulation is analogous to a selective expansion of some branches of a game tree. Since time constraints prevent all the branches of the game tree from being expanded, the information obtained from a simulation needs to be maximized. The ``perfect'' simulation would examine only the real game state (complete information about the opponents' hands, played out over all possible combinations of future community cards). However, the ``perfect'' simulation is impossible without knowing the opponents' cards, and an accurate estimate may be found without looking at all possible outcomes of future cards. One can try to approximate the EVs obtained by the ``perfect'' simulation by expanding and evaluating the nodes which are most likely to occur. In poker, not all opponent hands are equally likely. For example, a player who has been raising the stakes is more likely to have a strong hand than a player who has just called every bet. To consider the opponents' hands in proportion to their underlying probability distribution, Loki-2 uses the information gathered by the opponent modeling module. At the beginning of every trial, Loki-2 randomly generates a hand for each opponent based on the weight table of that opponent. Random generation is used for the opponents' hands because it is simple to implement.
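A minimal sketch of weighted hand generation follows. The representation is an assumption made for illustration: each weight table is taken to be a dictionary from two-card hands to non-negative weights, and hands that collide with cards already in use are skipped. The names `sample_hand` and `deal_opponents` are hypothetical.

```python
import random
from typing import Dict, Iterable, List, Set, Tuple

Hand = Tuple[str, str]  # e.g. ("As", "Ks"); the card encoding is an assumption


def sample_hand(weights: Dict[Hand, float], used: Set[str], rng: random.Random) -> Hand:
    """Draw one hand with probability proportional to its weight."""
    candidates = [
        (hand, w) for hand, w in weights.items()
        if w > 0 and not (set(hand) & used)  # skip hands using cards already dealt
    ]
    if not candidates:
        raise ValueError("no hand is consistent with the cards already in use")
    hands, ws = zip(*candidates)
    return rng.choices(hands, weights=ws, k=1)[0]


def deal_opponents(
    tables: List[Dict[Hand, float]],
    known_cards: Iterable[str],
    rng: random.Random,
) -> List[Hand]:
    """Generate one hand per opponent, each from that opponent's weight table."""
    used = set(known_cards)
    dealt = []
    for table in tables:
        hand = sample_hand(table, used, rng)
        used.update(hand)  # later opponents cannot hold the same cards
        dealt.append(hand)
    return dealt
```

In this sketch a hand with a higher weight is drawn proportionally more often, so the trials concentrate on the holdings that the opponent model considers most plausible.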
Loki-2's first betting action is predetermined to be either call or raise. Every time it is a player's turn to act inside the simulation, one of three alternatives is chosen (fold, check/call, bet/raise). Since the choice is strongly correlated with the quality of the cards that the player holds, Loki-2 can use the PT generation routine to obtain the likelihood that the player will fold, check/call, or bet/raise. Thus, when a player (an opponent, or Loki-2 after its first action) has to act in the simulation, the PT generation function is called with the player's hand and the current state of the simulated game. The player's action is then randomly selected according to the probability distribution defined by the returned triple, and the simulation proceeds.
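Selecting an action from the returned triple amounts to sampling from a three-outcome distribution. The sketch below assumes the triple is ordered (fold, check/call, bet/raise), as in the text above; the function name `choose_action` is hypothetical, and the PT generation routine itself is not reproduced here.

```python
import random
from typing import Sequence

ACTIONS = ("fold", "call", "raise")


def choose_action(triple: Sequence[float], rng: random.Random) -> str:
    """Pick an action at random according to a (fold, call, raise) triple."""
    if len(triple) != 3 or any(p < 0 for p in triple) or sum(triple) <= 0:
        raise ValueError("expected three non-negative weights with a positive sum")
    # random.choices normalizes the weights, so the triple need not sum to 1.
    return rng.choices(ACTIONS, weights=list(triple), k=1)[0]
```

For example, a triple of (0.0, 0.2, 0.8) would make the simulated player raise in roughly four out of five visits to that decision point and never fold.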
As more trials are performed, if the EV of one betting action exceeds the alternatives by a statistically significant margin, one can say that this action is an obvious move and the simulation can be stopped early, with full knowledge of the statistical validity of this decision. We currently define an obvious move as any action where the separation between the EV of the best action and the EV of the second best action is greater than the sum of the standard deviations of the EVs. This criterion for defining an obvious move is extremely conservative, since the separation between the ``best'' decision and the next one is usually not more than two small bets, and the average standard deviation of the EVs is six small bets for calling and eight small bets for raising. As a result, fewer than 5% of actions are declared obvious moves. Given the real-time nature of the game, more liberal criteria for distinguishing obvious moves need to be tested to produce more frequent cutoffs while retaining the same statistical validity.
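The obvious-move test can be sketched as below. Two points are assumptions rather than statements from the text: the standard deviation is taken to be the sample standard deviation of the per-trial outcomes, and the sum is taken over the best and second best actions only. Folding is treated as having an EV of zero with no spread. The name `obvious_move` is hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional


def obvious_move(outcomes: Dict[str, List[float]]) -> Optional[str]:
    """Return the best action if it is an obvious move, otherwise None.

    `outcomes` maps 'call' and 'raise' to the amounts won or lost so far.
    """
    stats = {"fold": (0.0, 0.0)}  # (EV, standard deviation)
    for action, results in outcomes.items():
        if len(results) < 2:
            return None  # too few trials to estimate a spread
        stats[action] = (mean(results), stdev(results))
    ranked = sorted(stats, key=lambda a: stats[a][0], reverse=True)
    best, second = ranked[0], ranked[1]
    separation = stats[best][0] - stats[second][0]
    spread = stats[best][1] + stats[second][1]
    return best if separation > spread else None
```

With the figures quoted above (a separation of about two small bets against standard deviations of six and eight small bets), this test would rarely succeed, which is consistent with the criterion being described as conservative.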
The interactions between the opponent modeling module (Opponent Modeler), the PT generation routine (PT Generator) and the simulation module (Simulator) are shown in Figure 5.1. In the diagram, squares are system components and rounded rectangles are data structures. Data flows along the arrows between components. The square corresponding to the Simulator also illustrates the major steps inside the simulation process. The dashed square around the Opponent Modeler and the PT Generator indicates that their interaction occurs before the simulation starts.
When the simulation returns the EVs for check/call and bet/raise (with zero for fold), the current version of Loki-2 simply chooses the action with the greatest expectation. If two actions have the same EV, the program opts for the more aggressive one (call over fold; raise over call). However, against human opposition, a better strategy would be to randomize the selection among betting actions whose EVs are close in value, to increase unpredictability.
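Both selection rules can be sketched briefly. The first function follows the rule stated above (greatest EV, ties broken toward the more aggressive action). The second is only one possible reading of the suggested randomization: the `tolerance` parameter and the uniform choice among close actions are assumptions, and both function names are hypothetical.

```python
import random
from typing import Dict

AGGRESSION = {"fold": 0, "call": 1, "raise": 2}


def select_action(evs: Dict[str, float]) -> str:
    """Choose the action with the greatest EV, preferring aggression on ties."""
    return max(evs, key=lambda action: (evs[action], AGGRESSION[action]))


def select_action_randomized(
    evs: Dict[str, float], tolerance: float, rng: random.Random
) -> str:
    """Choose uniformly among actions whose EV is within `tolerance` of the best."""
    best = max(evs.values())
    close = [action for action in evs if best - evs[action] <= tolerance]
    return rng.choice(close)
```

A uniform choice is the simplest option; weighting the close actions by their EVs would be an equally plausible variant.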