The simulation-based approach used in Loki-2 has experimentally proven superior to Loki-1's static approach. This should not be surprising, since the simulation approach essentially uses a selective search to augment and refine the static evaluation function. Barring a serious misconception (or bad luck on a limited sample size), playing out relevant scenarios can only be expected to improve the values obtained by a heuristic (i.e., by the static evaluation function), resulting in a more accurate estimate.
Selective sampling simulations discover information that improves the values obtained by the static evaluation function. In both Loki-1's betting strategy and the PT generation function, actions are taken based on the hand evaluation, and a simulation increases the accuracy of that evaluation. The fraction of trials in which our hand is stronger than those assigned to the opponents refines the estimate of hand strength. The fraction of trials in which our hand becomes the best one, or is overtaken, once the next cards are dealt refines the calculation of hand potential. In addition, a simulation yields information about subtler implications that are difficult to capture in a static betting strategy.
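As a toy illustration of these two refinements (not Loki's actual hand evaluator), the sketch below represents hands as numeric ranks and opponent models as samplers. Hand strength is the fraction of trials in which our rank beats a sampled opponent rank; positive potential is the fraction in which we are behind now but pull ahead after a sampled next-card improvement. All names and the rank abstraction are hypothetical.

```python
import random

def estimate_hand_strength(our_rank, sample_opp, trials=5000):
    """Fraction of trials in which our hand outranks the opponent's
    sampled hand -- the simulation estimate of hand strength.
    Ranks are toy numbers standing in for a real hand evaluator."""
    return sum(our_rank > sample_opp() for _ in range(trials)) / trials

def estimate_positive_potential(our_rank, sample_opp, sample_next, trials=5000):
    """Fraction of trials in which we are behind now but pull ahead
    once a sampled next card improves our rank (toy model)."""
    hits = 0
    for _ in range(trials):
        opp = sample_opp()
        if our_rank <= opp and our_rank + sample_next() > opp:
            hits += 1
    return hits / trials

random.seed(0)
# Uniform ranks stand in for an opponent model here; Loki would
# instead sample according to its weight tables.
hs = estimate_hand_strength(0.7, random.random)
ppot = estimate_positive_potential(0.7, random.random,
                                   lambda: random.uniform(0.0, 0.2))
```

Under this uniform toy model, `hs` converges near 0.7 and `ppot` near 0.1; the point is only that both quantities emerge from counting trial outcomes, exactly as the text describes.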
By performing simulations, Loki-2 is able to find game strategies that are not specified in the knowledge contained in its evaluation function. For example, a player holding a strong hand can feign weakness by checking when first to act in a betting round. The opponents will likely bet their hands (believing the player's hand is weak), and a subsequent raise will then collect more money than betting first would have. This strategy is known as check-raising. Loki-2 check-raises its opponents without having the explicit knowledge to do so. Selective sampling simulations uncover the benefits of complex strategies such as check-raising without any additional expert knowledge being provided to the program.
The use of available information about the game to bias the sampling of the game tree is the key difference between selective sampling simulations and Monte Carlo simulations. Selective sampling is context sensitive. In Loki-2 the opponents' weight tables are used to influence the selection of hole cards for each opponent. The sample is taken in accordance with the underlying probability distribution of the opponents' hands rather than assuming uniform or other fixed probability distributions. Although Monte Carlo techniques may eventually converge to the right answer, selective sampling converges faster and with less variance. This is essential in a real-time game like poker.
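The contrast between the two sampling schemes can be sketched as follows. The weight table below is hypothetical: it stands in for the relative likelihoods Loki infers for an opponent's hole cards from their observed actions. Selective sampling draws hands in proportion to those weights, whereas plain Monte Carlo treats every candidate hand as equally likely.

```python
import random
from collections import Counter

# Hypothetical weight table: relative likelihood that an opponent
# holds each candidate hole-card pair, inferred from their actions.
weight_table = {"AA": 0.90, "KQs": 0.60, "T9s": 0.35, "72o": 0.05}

def sample_hole_cards_selective(weights):
    """Selective sampling: draw a hand in proportion to its weight."""
    hands, w = zip(*weights.items())
    return random.choices(hands, weights=w, k=1)[0]

def sample_hole_cards_uniform(weights):
    """Plain Monte Carlo: every candidate hand is equally likely."""
    return random.choice(list(weights))

random.seed(1)
counts = Counter(sample_hole_cards_selective(weight_table)
                 for _ in range(10000))
```

Because trials concentrate on the hands the opponent plausibly holds, estimates built from `sample_hole_cards_selective` reflect the underlying distribution directly, which is the source of the faster, lower-variance convergence the text mentions.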
Finally, another benefit of the simulation-based framework is that the simulation can be terminated early based on the statistically well-defined concept of an obvious move, instead of using an ad hoc technique as is frequently done in alpha-beta-based programs.
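One way such a statistical cutoff might look (a minimal sketch under assumed names, not Loki's actual criterion): keep a running mean and standard error of the sampled payoff for each candidate action, and stop as soon as one action's lower confidence bound exceeds every rival's upper bound.

```python
import math
import random

def obvious_move(action_samplers, min_trials=30, max_trials=2000, z=2.0):
    """Stop the simulation early when one action's EV estimate is
    statistically separated: its mean minus z standard errors exceeds
    every other action's mean plus z standard errors."""
    stats = {a: [0.0, 0.0, 0] for a in action_samplers}  # sum, sum_sq, n
    for t in range(max_trials):
        for a, sampler in action_samplers.items():
            x = sampler()
            s = stats[a]
            s[0] += x; s[1] += x * x; s[2] += 1
        if t + 1 >= min_trials:
            bounds = {}
            for a, (sm, sq, n) in stats.items():
                mean = sm / n
                var = max(sq / n - mean * mean, 0.0)
                se = math.sqrt(var / n)
                bounds[a] = (mean - z * se, mean + z * se)
            for a, (lo, _hi) in bounds.items():
                if all(lo > hi for b, (_lo, hi) in bounds.items() if b != a):
                    return a, t + 1  # obvious move: cut off early
    # No early separation: fall back to the highest-mean action.
    return max(stats, key=lambda a: stats[a][0] / stats[a][2]), max_trials

random.seed(2)
# Toy payoff distributions; "raise" has the higher true EV.
action, n_trials = obvious_move({
    "raise": lambda: random.gauss(1.0, 1.0),
    "call": lambda: random.gauss(0.0, 1.0),
})
```

With clearly separated payoffs, the cutoff typically fires long before `max_trials`, which is what makes this kind of stopping rule attractive for a real-time setting.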