Monte Carlo Sampling for Regret Minimization in Extensive Games

Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. Monte Carlo Sampling for Regret Minimization in Extensive Games. In Advances in Neural Information Processing Systems 22 (NIPS), pp. 1078–1086, 2009. A longer version is available as a University of Alberta Technical Report, TR09-15. An earlier version appeared in the COLT Workshop on On-Line Learning with Limited Feedback (2009).

Download

[PDF] 

Abstract

Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain-independent CFR sample-based algorithms called Monte Carlo counterfactual regret minimization (MCCFR) of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR on expectation. Then, we introduce two sampling schemes: outcome sampling and external sampling, showing that both have bounded overall regret with high probability. Thus, they can compute an approximate equilibrium using self-play. Finally, we prove a new tighter bound on the regret for the original CFR algorithm and relate this new bound to MCCFR's bounds. We show empirically that, although the sample-based algorithms require more iterations, their lower cost per iteration can lead to dramatically faster convergence in various games.

Errata

Supplemental material. There was an error in the original paper appearing in the NIPS proceedings. This version has been corrected. For details, please see the errata on the NIPS 2009 Electronic Proceedings.

Additional Information

Supplemental material. There was an error in the original paper appearing in the NIPS proceedings. This version has been corrected. For details, please see the errata on the NIPS 2009 Electronic Proceedings.

BibTeX

@InProceedings(09nips-mccfr-w-trws,
  Author = "Marc Lanctot and Kevin Waugh and Martin Zinkevich and Michael Bowling",
  Title = "{M}onte {C}arlo Sampling for Regret Minimization in Extensive Games",
  Booktitle = "Advances in Neural Information Processing Systems 22 (NIPS)",
  Year = "2009",
  Note = "A longer version is available as a University of Alberta Technical Report, TR09-15. An earlier version appeared in the {\em COLT Workshop on On-Line Learning with Limited Feedback} (2009)",
  Pages = {1078--1086},
  AcceptRate = "24\%",
  AcceptNumbers = "263 of 1105"
)

Generated by bib2html.pl (written by Patrick Riley) on Fri Feb 13, 2015 15:54:27