Generalized Sampling and Variance in Counterfactual Regret Minimization

Richard Gibson, Marc Lanctot, Neil Burch, Duane Szafron, and Michael Bowling. Generalized Sampling and Variance in Counterfactual Regret Minimization. In Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (AAAI), pp. 1355–1361, 2012. A longer version is available as a University of Alberta Technical Report, TR12-02.

Download

[PDF] 

Abstract

In large extensive form games with imperfect information, Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing approximate Nash equilibria. While the base algorithm performs a full tree traversal on each iteration, Monte Carlo CFR (MCCFR) reduces the per iteration time cost by traversing just a sampled portion of the tree. On the other hand, MCCFR’s sampled values introduce variance, and the effects of this variance were previously unknown. In this paper, we generalize MCCFR by considering any generic estimator of the sought values. We show that any choice of an estimator can be used to probabilistically minimize regret, provided the estimator is bounded and unbiased. In addition, we relate the variance of the estimator to the convergence rate of an algorithm that calculates regret directly from the estimator. We demonstrate the application of our analysis by defining a new bounded, unbiased estimator with empirically lower variance than MCCFR estimates. Finally, we use this estimator in a new sampling algorithm to compute approximate equilibria in Goofspiel, Bluff, and Texas hold’em poker. Under each of our selected sampling schemes, our new algorithm converges faster than MCCFR.

BibTeX

@InProceedings(12aaai-probing-w-tr,
  Title = "Generalized Sampling and Variance in Counterfactual Regret Minimization",
  Author = "Richard Gibson and Marc Lanctot and Neil Burch and Duane Szafron and Michael Bowling",
  Booktitle = "Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (AAAI)",
  Pages = "1355--1361",
  Note = "A longer version is available as a University of Alberta Technical Report, TR12-02.",
  Year = "2012",
  AcceptRate = "26\%",
  AcceptNumbers = "294 of 1129"
)

Generated by bib2html.pl (written by Patrick Riley) on Fri Feb 13, 2015 15:54:27