next up previous contents
Next: Bibliography Up: Dealing with Imperfect Information Previous: 8.5 Summary   Contents

9. Conclusions and Future Work

Loki successfully demonstrates beneficial opponent modeling in a high-performance game-playing program. In closed self-play experiments, modeling was clearly beneficial, and the results from IRC play are also promising. However, it does not necessarily follow that it will be equally successful in games against strong human players. Humans can be very good at opponent modeling, and they are less predictable than the players in these experiments.

In our self-play experiments, we have not yet investigated modeling opponents who vary their strategy over time. There are also many other interesting questions to be addressed. Our method was a first approximation based on intuition, and the major benefits came from the introduction of the weight array (and the re-weighting). The enumeration algorithms for hand evaluation are well suited to this expression of opponent modeling, making the weight array a very useful asset to the evaluation system.
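The fit between the weight array and the enumeration algorithms can be sketched as follows: each possible opponent holding is simply scaled by its weight while we enumerate, yielding a weighted hand strength. This is only an illustrative sketch, not Loki's implementation; `rank_hand` is a toy stand-in evaluator (a real one ranks the best five-card poker hand), and the dictionary-based weight array is hypothetical.

```python
from itertools import combinations

def rank_hand(cards):
    # Toy evaluator: higher card sum wins.  A real evaluator would rank
    # the best five-card poker hand; this stand-in only preserves the
    # shape of the enumeration.
    return sum(cards)

def weighted_hand_strength(our_hole, board, deck, weight):
    """Enumerate every two-card opponent holding from the remaining deck,
    scale each by its weight-array entry (the belief that the opponent
    holds it), and return the weighted fraction of holdings we beat,
    counting ties as half."""
    ahead = tied = behind = 0.0
    our_rank = rank_hand(our_hole + board)
    for opp in combinations(deck, 2):
        w = weight.get(frozenset(opp), 1.0)  # default 1.0 = no information
        opp_rank = rank_hand(list(opp) + board)
        if our_rank > opp_rank:
            ahead += w
        elif our_rank == opp_rank:
            tied += w
        else:
            behind += w
    total = ahead + tied + behind
    return (ahead + 0.5 * tied) / total if total else 0.0
```

Because the weights enter only as multiplicative factors, adding opponent modeling to the enumeration costs almost nothing beyond the enumeration itself.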

The overall performance was hampered by the ad hoc betting strategy. In fact, many aspects of Loki were a tradeoff between usefulness and correctness: in many places we selected the simple (and cost-effective) approach because it yields a reasonable approximation. Provided the error is not one-sided, the individual errors should tend to cancel. We have not actually measured the error in our many approximations, but doing so is not worth the effort until it becomes a limiting aspect of play. Since we plan to replace the betting strategy with something less dependent on expert rules, it is not worth examining the benefits of its particular features. Similarly, we feel that Bot Warz or general IRC play could have produced better results had we put more time into the betting strategy. However, this would have amounted to tweaking artificial parameters without general applicability.

The betting strategy should also use the opponent modeling information. A good approach might be to run simulations to the end of the game using the weight array to randomly select ``reasonable" opponent hands (and to weight the results as in our enumeration techniques). The specific opponent information could then be used to predict opponent actions in the simulations, resulting in estimates of the expected value for each of our options. Presumably, strategies such as check-raising or bluffing would emerge naturally. For example, bluffing may turn out to be the best action in a situation where we recognize that our opponent is likely to fold.
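The proposed simulation might be sketched as follows. All names here are hypothetical: `sample_opponent_outcome` stands for a caller-supplied routine that plays one sampled game to the end and returns our payoff, and the weight array is expressed as a simple dictionary from opponent holdings to weights.

```python
import random

def simulate_ev(actions, weight, sample_opponent_outcome, trials=1000, rng=None):
    """Estimate the expected value of each candidate action by sampling
    'reasonable' opponent holdings in proportion to the weight array and
    playing each sampled game to the end."""
    rng = rng or random.Random(0)
    holdings = list(weight)
    weights = [weight[h] for h in holdings]
    ev = {a: 0.0 for a in actions}
    for _ in range(trials):
        # Sample one opponent holding, biased by our beliefs about it.
        opp = rng.choices(holdings, weights=weights)[0]
        for a in actions:
            ev[a] += sample_opponent_outcome(a, opp)  # payoff of action a
    return {a: total / trials for a, total in ev.items()}
```

Under this scheme, a bluff would surface naturally: if the sampled opponents fold often enough, betting a weak hand simply shows the highest average payoff.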

The specific opponent modeling program (SOM) was hampered by the crude method used for collecting and applying observed statistics. Much of the relevant context was ignored for simplicity, such as the previous action taken by a player. A more sophisticated method for observing and utilizing opponent behavior would allow for a more flexible and accurate opponent model.

The re-weighting system could also be improved, for example with inverse re-weightings for passive actions such as checking and calling. Presently, every witnessed action leads to the opponent's average hand getting ``stronger". If we considered upper thresholds on actions that imply some weakness, such as checking and calling, we could re-weight their weight array appropriately. Specific modeling could also observe variance, that is, how consistently the opponent behaves. This information could be used in the re-weighting function in place of the simple linear function we use (with a fixed $\sigma$).
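The linear re-weighting and the proposed inverse variant for passive actions might be sketched as follows. The threshold `mu`, the half-width `sigma`, and the clipping bounds are illustrative values, not Loki's actual constants.

```python
def reweight(strength, mu, sigma=0.4):
    """Linear re-weighting for an aggressive action: hands weaker than
    the threshold mu are scaled toward zero, along a linear ramp of
    half-width sigma, clipped to [0.01, 1.0]."""
    return min(1.0, max(0.01, (strength - (mu - sigma)) / (2 * sigma)))

def inverse_reweight(strength, mu, sigma=0.4):
    """Proposed inverse variant for a passive action (check/call): hands
    *stronger* than the upper threshold mu are scaled down instead."""
    return min(1.0, max(0.01, ((mu + sigma) - strength) / (2 * sigma)))
```

Applying `inverse_reweight` after a check would let the model conclude that the opponent's hand is probably not strong, rather than always shifting the weight array upward.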

Poker is a complex game. Strong play requires the player to handle all aspects of the game adequately, and to excel in several. Developing Loki seems to be a cumulative process. We improve one component of the program until it becomes apparent that another aspect is the performance bottleneck. That problem is then tackled until it is no longer the limiting factor, and a new weakness in the program's play is revealed. We have made an initial foray into opponent modeling and are pleased with the results, although the subject is far from exhausted.

Wherever possible, the project should work toward removing the human expert knowledge it relies on. Betting strategy is clearly a major component that needs to be addressed, but there are other candidates, such as more sophisticated opponent modeling. Eventually, more sophisticated simulations for learning good pre-flop play could be based on Loki's post-flop playing ability.

Concepts such as hand strength and potential are appropriate for any poker variant. While parts of our implementation, such as the weight array, may be specific to Texas Hold'em, our ideas map easily to other variants.

Is it possible to build a program which is the best poker player in the world? Certainly we can construct a program which has a very strong mathematical basis and runs within the real-time constraints. It is also clear that some form of opponent modeling (in addition to other advanced features) is necessary to beat the better players. However, it is not clear how difficult it will be to build and maintain opponent models that are sufficiently detailed and context sensitive. While we are probably close to a program which can win money in most typical low-limit casino games, we are far from the lofty goal of being the best in the world.

Denis Papp