Publications

  1. AIVAT: A New Variance Reduction Technique for Agent Evaluation in Imperfect Information Games

    Authors
    Neil Burch, Martin Schmid, Matej Moravcik, Dustin Morrill, and Michael Bowling
    Purpose
    Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)
    Date
    February 2018
    Abstract
    Evaluating agent performance when outcomes are stochastic and agents use randomized strategies can be challenging when there is limited data available. The variance of sampled outcomes may make the simple approach of Monte Carlo sampling inadequate. This is the case for agents playing heads-up no-limit Texas hold’em poker, where man-machine competitions typically involve multiple days of consistent play by multiple players, but still can (and sometimes did) result in statistically insignificant conclusions. In this paper, we introduce AIVAT, a low variance, provably unbiased value assessment tool that exploits an arbitrary heuristic estimate of state value, as well as the explicit strategy of a subset of the agents. Unlike existing techniques which reduce the variance from chance events, or only consider game ending actions, AIVAT reduces the variance both from choices by nature and by players with a known strategy. The resulting estimator produces results that significantly outperform previous state-of-the-art techniques. It was able to reduce the standard deviation of a Texas hold’em poker man-machine match by 85% and consequently requires 44 times fewer games to draw the same statistical conclusion. AIVAT enabled the first statistically significant AI victory against professional poker players in no-limit hold’em. Furthermore, the technique was powerful enough to produce statistically significant results versus individual players, not just an aggregate pool of the players. We also used AIVAT to analyze a short series of AI vs human poker tournaments, producing statistically significant results with as few as 28 matches.
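    The estimator above is, at heart, a control variate: subtract a zero-mean correction built from a heuristic estimate of state value. The toy below is a hedged sketch on an invented one-card game (all names and numbers are illustrative) showing only the chance-node correction; the full AIVAT also adds corrections at the decision points of players whose strategy is known.

      # Toy control-variate correction in the spirit of AIVAT: subtract the
      # heuristic value of the chance outcome, add back its expectation.
      # The correction has zero mean, so the estimate stays unbiased.
      import random
      import statistics

      random.seed(0)
      CARDS = range(10)

      def play_hand(card):
          # winnings = luck from the dealt card + a small skill edge + noise
          return (card - 4.5) + 0.1 + random.gauss(0, 1)

      def heuristic_value(card):
          # any fixed estimate of state value works; here, the luck component
          return card - 4.5

      mean_heuristic = sum(heuristic_value(c) for c in CARDS) / len(CARDS)

      naive, corrected = [], []
      for _ in range(2000):
          card = random.choice(CARDS)
          outcome = play_hand(card)
          naive.append(outcome)
          corrected.append(outcome - heuristic_value(card) + mean_heuristic)

      print(statistics.mean(naive), statistics.stdev(naive))          # noisy
      print(statistics.mean(corrected), statistics.stdev(corrected))  # same mean, far less spread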
  2. Time and Space: Why Imperfect Information Games are Hard

    Authors
    Neil Burch
    Purpose
    Ph.D. Dissertation
    Date
    December 2017
    Abstract
    Decision-making problems with two agents can be modeled as two player games, and a Nash equilibrium is the basic solution concept describing good play in adversarial games. Computing this equilibrium solution for imperfect information games, where players have private, hidden information, is harder than solving perfect information games. Both domains may technically be classified as easy, with algorithms that require polynomial time and space, but the difference in degree means that for extremely large games, both computation time and storage requirements may exceed available resources. In this thesis, I present four main contributions towards fast and memory efficient techniques for solving extensive-form games, which describe a class of imperfect information games with sequential decision making. First, the thesis introduces an analysis of counterfactual regret minimisation (CFR), an algorithm for solving extensive-form games, and presents tighter regret bounds that describe the rate of progress. CFR is a popular, state-of-the-art algorithm, and the improved bounds give some indication as to why the algorithm has good practical performance. Second, I describe and analyse a game-solving algorithm, CFR+, which has faster empirical performance than CFR, and compare it to a number of theoretically faster algorithms. We wrote and released an efficient, distributed implementation of CFR+ to solve heads-up limit Texas hold’em, making it the first competitively played imperfect information game to be solved. Third, the thesis presents a series of theoretical tools for using decomposition, and creating algorithms which operate on small portions of a game at a time. I describe an algorithm, CFR-D, which can solve games without ever storing a complete strategy, and an algorithm for re-solving a portion of a game to complete an incomplete equilibrium within the full game. Both algorithms have a formal guarantee of correctness. Finally, I describe continual re-solving, an algorithm which is an imperfect information analogue to heuristic search in perfect information games. Continual re-solving uses decomposition to play games by doing depth-limited solving with a heuristic evaluation, with a theoretical bound on solution quality. We implemented continual re-solving in the game of heads-up no-limit Texas hold’em, and beat a group of 33 professional players in a human trial.
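    Several of the thesis contributions build on regret matching: play each action in proportion to the positive part of its accumulated regret, and in zero-sum self-play the average strategies approach an equilibrium. A minimal matrix-game sketch (rock-paper-scissors, chosen only for illustration; the thesis works in the vastly larger sequential setting):

      # Regret matching in self-play on rock-paper-scissors. The average
      # strategies converge to the uniform Nash equilibrium.
      import numpy as np

      A = np.array([[0, -1, 1],
                    [1, 0, -1],
                    [-1, 1, 0]], dtype=float)  # row player's payoffs

      def strategy(regret):
          pos = np.maximum(regret, 0.0)
          return pos / pos.sum() if pos.sum() > 0 else np.ones(len(regret)) / len(regret)

      r1 = np.zeros(3); r2 = np.zeros(3)   # cumulative regrets
      s1 = np.zeros(3); s2 = np.zeros(3)   # cumulative strategies (for the average)
      for _ in range(20000):
          p1, p2 = strategy(r1), strategy(r2)
          u1, u2 = A @ p2, -A.T @ p1       # expected value of each pure action
          r1 += u1 - p1 @ u1               # regret for not having played each action
          r2 += u2 - p2 @ u2
          s1 += p1; s2 += p2

      print(s1 / s1.sum(), s2 / s2.sum())  # both approach [1/3, 1/3, 1/3]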
  3. Heads-Up Limit Hold'em Poker Is Solved

    Authors
    Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin
    Purpose
    Communications of the ACM
    Date
    November 2017
    Abstract
    Poker is a family of games that exhibit imperfect information, where players do not have full knowledge of past events. While many perfect information games have been solved (e.g., Connect-Four and checkers), no nontrivial imperfect information game played competitively by humans has previously been solved. In this paper, we announce that the smallest variant of poker in play, heads-up limit Texas hold'em, is now essentially weakly solved. Furthermore, this computation formally proves the common wisdom that the dealer in the game holds a significant advantage. This result was enabled by a new algorithm, CFR+, which is capable of solving extensive-form games three orders of magnitude larger than previously possible. This paper is an extended version of the original 2015 Science article, with additional results showing Cepheus' in-game performance against computer and human opponents.
  4. DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker

    Authors
    Matej Moravcik, Martin Schmid, Neil Burch, Viliam Lisy, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, and Michael Bowling
    Purpose
    Science
    Date
    March 2017
    Abstract
    Artificial intelligence has seen several breakthroughs in recent years, with games often serving as milestones. A common feature of these games is that players have perfect information. Poker is the quintessential game of imperfect information, and a longstanding challenge problem in artificial intelligence. We introduce DeepStack, an algorithm for imperfect information settings. It combines recursive reasoning to handle information asymmetry, decomposition to focus computation on the relevant decision, and a form of intuition that is automatically learned from self-play using deep learning. In a study involving 44,000 hands of poker, DeepStack defeated with statistical significance professional poker players in heads-up no-limit Texas hold’em. The approach is theoretically sound and is shown to produce strategies that are more difficult to exploit than prior approaches.
  5. AIVAT: A New Variance Reduction Technique for Agent Evaluation in Imperfect Information Games

    Authors
    Neil Burch, Martin Schmid, Matej Moravcik, and Michael Bowling
    Purpose
    Proceedings of the Thirty-First Conference on Artificial Intelligence (AAAI) Workshop on Computer Poker and Imperfect Information Games
    Date
    February 2017
    Abstract
    Evaluating agent performance when outcomes are stochastic and agents use randomized strategies can be challenging when there is limited data available. The variance of sampled outcomes may make the simple approach of Monte Carlo sampling inadequate. This is the case for agents playing heads-up no-limit Texas hold'em poker, where man-machine competitions have involved multiple days of consistent play and still not resulted in statistically significant conclusions even when the winner's margin is substantial. In this paper, we introduce AIVAT, a low variance, provably unbiased value assessment tool that uses an arbitrary heuristic estimate of state value, as well as the explicit strategy of a subset of the agents. Unlike existing techniques which reduce the variance from chance events, or only consider game ending actions, AIVAT reduces the variance both from choices by nature and by players with a known strategy. The resulting estimator in no-limit poker can reduce the number of hands needed to draw statistical conclusions by more than a factor of 10.
  6. Equilibrium Approximation Quality of Current No-Limit Poker Bots

    Authors
    Viliam Lisy and Michael Bowling
    Purpose
    Proceedings of the Thirty-First Conference on Artificial Intelligence (AAAI) Workshop on Computer Poker and Imperfect Information Games
    Date
    February 2017
    Abstract
    Approximating a Nash equilibrium is currently the best performing approach for creating poker-playing programs. While for the simplest variants of the game, it is possible to evaluate the quality of the approximation by computing the value of the best response strategy, this is currently not computationally feasible for larger variants of the game, such as heads-up no-limit Texas hold'em. In this paper, we present a simple and computationally inexpensive Local Best Response method for computing an approximate lower bound on the value of the best response strategy. Using this method, we show that existing poker-playing programs, based on solving abstract games, are remarkably poor Nash equilibrium approximations.
  7. Online Agent Modelling in Human-Scale Problems

    Authors
    Nolan Bard
    Purpose
    Ph.D. Dissertation
    Date
    March 2016
    Abstract
    The world is replete with problems involving interactions between multiple agents. While humans have historically been the primary actors in these multiagent domains, the rise of artificial intelligence has been driving computer agents to the forefront of many multiagent problems. Even now, computer agents have been deployed for everything from relatively innocuous tasks, like automated phone systems and elevator control, to potentially life altering roles, such as autonomous vehicles and stock trading agents. This ubiquity makes computer agents central actors with a very real impact on the day-to-day lives of people around the world. Moreover, the growing reach of autonomous computer agents has made coping with the presence of other decision makers a key challenge for artificial intelligence. In multiagent environments, agents must interact with other agents that may have unknown goals and capabilities. Consequently, an agent's ideal behaviour depends on how the other agents act. Learning about other agents and adapting to their behaviour is essential for such an ideal agent and is the central task of agent modelling. In this dissertation, I present a body of work focused on the problem of modelling agents during interaction in complex human-scale domains. In particular, we examine massive stochastic imperfect information domains where adaptation must occur in real-time from limited observations.
  8. Using Regret Estimation to Solve Games Compactly

    Authors
    Dustin Morrill
    Purpose
    M. Sc. Thesis
    Date
    February 2016
    Abstract
    Game theoretic solution concepts, such as Nash equilibrium strategies that are optimal against worst case opponents, provide guidance in finding desirable autonomous agent behaviour. In particular, we wish to approximate solutions to complex, dynamic tasks, such as negotiation or bidding in auctions. Computational game theory investigates effective methods for computing such strategies. Solving human-scale games, however, is currently an intractable problem. Counterfactual Regret Minimization (CFR) is a regret-minimizing, online learning algorithm that dominates the Annual Computer Poker Competition (ACPC) and lends itself readily to various sampling and abstraction techniques. Abstract games are created to mirror the strategic elements of an original game in a more compact representation. The abstract game can be solved and the abstract game solution can be translated back into the full game. But crafting an abstract game requires domain-specific knowledge, and an abstraction can interact with the game solving process in unintuitive and harmful ways. For example, abstracting a game can create pathologies where solutions to more granular abstractions can be more exploitable against a worst-case opponent in the full game than those derived from simpler abstractions. An abstraction that could be dynamically changed and informed by the solution process could produce better solutions more consistently. We suggest that such abstractions can be largely subsumed by a regressor on game features that estimates regret during CFR. Replacing abstraction with a regressor allows the memory required to approximate a solution to a game to be proportional to the complexity of the regressor rather than the size of the game itself. Furthermore, the regressor essentially becomes a tunable, compact, and dynamic abstraction of the game that is informed by and adapts to the particular solution being computed. These properties will allow this technique to scale to previously intractable domains. We call this new algorithm Regression CFR (RCFR). In addition to showing that this approach is theoretically and practically sound, we improve RCFR by combining it with regret-matching+. Experiments involving two small poker games show that RCFR and its extension, RCFR+, can approximately solve games with regressors that are drastically less complex than the game itself. In comparisons with traditional static abstractions of similar complexity, RCFR variants tend to produce less exploitable strategies.
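    The mechanical change RCFR makes is small: the regret table becomes the training target of a regressor, and regret matching is applied to the regressor's predictions. The hedged toy below shows only that mechanic; the "game" is rock-paper-scissors and the features are one-hot, so the regressor is exact and the loop degenerates to tabular CFR, whereas the thesis's point is to use compact features shared across many decision points.

      # RCFR-style loop: policy from predicted regrets, regressor refit to
      # the cumulative regrets observed so far. All structure is illustrative.
      import numpy as np

      A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
      feats = np.eye(3)   # one-hot stand-in; real RCFR uses compact game features

      def match(predicted):
          pos = np.maximum(predicted, 0.0)
          return pos / pos.sum() if pos.sum() > 0 else np.ones(3) / 3

      w1 = np.zeros(3); w2 = np.zeros(3)               # regressor weights
      targets1 = np.zeros(3); targets2 = np.zeros(3)   # regression targets
      avg1 = np.zeros(3); avg2 = np.zeros(3)
      for _ in range(5000):
          p1, p2 = match(feats @ w1), match(feats @ w2)  # policy from *predicted* regrets
          u1, u2 = A @ p2, -A.T @ p1
          targets1 += u1 - p1 @ u1
          targets2 += u2 - p2 @ u2
          w1, *_ = np.linalg.lstsq(feats, targets1, rcond=None)  # refit regressors
          w2, *_ = np.linalg.lstsq(feats, targets2, rcond=None)
          avg1 += p1; avg2 += p2

      print(avg1 / avg1.sum(), avg2 / avg2.sum())  # approaches the uniform equilibrium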
  9. Counterfactual Regret Minimization in Sequential Security Games

    Authors
    Viliam Lisy, Trevor Davis, and Michael Bowling
    Purpose
    Proceedings of the Thirtieth Conference on Artificial Intelligence (AAAI)
    Date
    February 2016
    Abstract
    Many real world security problems can be modelled as finite zero-sum games with structured sequential strategies and limited interactions between the players. An abstract class of games unifying these models is the normal-form games with sequential strategies (NFGSS). We show that all games from this class can be modelled as well-formed imperfect-recall extensive-form games and consequently can be solved by counterfactual regret minimization. We propose an adaptation of the CFR+ algorithm for NFGSS and compare its performance to the standard methods based on linear programming and incremental game generation. We validate our approach on two security-inspired domains. We show that with a negligible loss in precision, CFR+ can compute a Nash equilibrium with five times less computation than its competitors.
  10. Robust Strategies and Counter-Strategies: From Superhuman to Optimal Play

    Authors
    Michael Bradley Johanson
    Purpose
    Ph.D. Dissertation
    Date
    January 2016
    Abstract
    Games have been used as a testbed for artificial intelligence research since the earliest conceptions of computing itself. The twin goals of defeating human professional players at games, and of solving games outright by creating an optimal computer agent, have helped to drive practical research in this field. Deep Blue defeating Kasparov at chess and Chinook solving the game of checkers serve as milestone events in the popular understanding of artificial intelligence. However, imperfect information games present new challenges and require new research. The Abstraction-Solving-Translation procedure for approaching such games involves abstracting a game down to a tractable size, solving the abstract game to produce a strong abstract strategy, and translating its decisions into the real game as needed. Related challenges include principled evaluation of the resulting computer agents, and using opponent models to improve in-game performance against imperfect adversaries. The papers presented in this thesis encompass the complete end-to-end task of creating strong agents for extremely large games by using the Abstraction-Solving-Translation procedure, and we present a body of research that has made contributions to each step of this task. We use the game of poker as a testbed domain to validate our research, and present two milestone accomplishments reached as a result: the first victory of a computer agent over human professionals in a meaningful poker match, and the first solution to any imperfect information game played competitively by humans.
  11. Using Response Functions for Strategy Training and Evaluation

    Authors
    Trevor Davis
    Purpose
    M. Sc. Thesis
    Date
    June 2015
    Abstract
    Extensive-form games are a powerful framework for modeling sequential multiagent interactions. In extensive-form games with imperfect information, Nash equilibria are generally used as a solution concept, but computing a Nash equilibrium can be intractable in large games. Instead, a variety of techniques are used to find strategies that approximate Nash equilibria. Traditionally, an approximate Nash equilibrium strategy is evaluated by measuring the strategy’s worst-case performance, or exploitability. However, because exploitability fails to capture how likely the worst-case is to be realized, it provides only a limited picture of strategy strength, and there is extensive empirical evidence showing that exploitability can correlate poorly with one-on-one performance against a variety of opponents. In this thesis, we introduce a class of adaptive opponents called pretty-good responses that exploit a strategy but only have limited exploitative power. By playing a strategy against a variety of counter-strategies created with pretty-good responses, we get a more complete picture of strategy strength than that offered by exploitability alone. In addition, we show how standard no-regret algorithms can be modified to learn strategies that are strong against adaptive opponents. We prove that this technique can produce optimal strategies for playing against pretty-good responses. We empirically demonstrate the effectiveness of the technique by finding static strategies that are strong against Monte Carlo opponents who learn by sampling our strategy, including the UCT Monte Carlo tree search algorithm.
  12. Solving Heads-up Limit Texas Hold'em

    Authors
    Oskari Tammelin, Neil Burch, Michael Johanson, and Michael Bowling
    Purpose
    Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), 2015
    Date
    May 2015
    Abstract
    Cepheus is the first computer program to essentially solve a game of imperfect information that is played competitively by humans. The game it plays is heads-up limit Texas hold’em poker, a game with over 10^14 information sets, and a challenge problem for artificial intelligence for over 10 years. Cepheus was trained using a new variant of Counterfactual Regret Minimization (CFR), called CFR+, using 4800 CPUs running for 68 days. In this paper we describe the engineering details required to make this computation a reality. We also prove the theoretical soundness of CFR+ and its component algorithm, regret-matching+. We further give a hint towards understanding the success of CFR+ by proving a tracking regret bound for this new regret matching algorithm. We present results showing the contribution of the algorithmic components and the engineering choices to the success of CFR+.
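    Regret-matching+, the component algorithm named above, changes one line of ordinary regret matching: cumulative regrets are clipped at zero after every update, so actions that have looked bad for a long time are forgiven quickly once they start doing well. A minimal matrix-game sketch follows; CFR+ additionally uses alternating updates and linearly weighted averaging, both omitted here for simplicity.

      # Regret-matching+ in self-play on rock-paper-scissors.
      import numpy as np

      A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)

      def match(q):
          s = q.sum()
          return q / s if s > 0 else np.ones(len(q)) / len(q)

      q1 = np.zeros(3); q2 = np.zeros(3)   # clipped cumulative regrets, always >= 0
      avg1 = np.zeros(3); avg2 = np.zeros(3)
      for _ in range(10000):
          p1, p2 = match(q1), match(q2)
          u1, u2 = A @ p2, -A.T @ p1
          q1 = np.maximum(q1 + u1 - p1 @ u1, 0.0)   # the "+" step: clip at zero
          q2 = np.maximum(q2 + u2 - p2 @ u2, 0.0)
          avg1 += p1; avg2 += p2

      print(avg1 / avg1.sum(), avg2 / avg2.sum())   # approaches the uniform equilibrium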
  13. Decision-theoretic Clustering of Strategies

    Authors
    Nolan Bard, Deon Nicholas, Csaba Szepesvári, and Michael Bowling
    Purpose
    Proceedings of the Fourteenth International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS-15)
    Date
    May 2015
    Abstract
    Clustering agents by their behaviour can be crucial for building effective agent models. Traditional clustering typically aims to group entities together based on a distance metric, where a desirable clustering is one where the entities in a cluster are spatially close together. Instead, one may desire to cluster based on actionability, or the capacity for the clusters to suggest how an agent should respond to maximize their utility with respect to the entities. Segmentation problems examine this decision-theoretic clustering task. Although finding optimal solutions to these problems is computationally hard, greedy-based approximation algorithms exist. However, in settings where the agent has a combinatorially large number of candidate responses whose utilities must be considered, these algorithms are often intractable. In this work, we show that in many cases the utility function can be factored to allow for an efficient greedy algorithm even when there are exponentially large response spaces. We evaluate our technique theoretically, proving approximation bounds, and empirically using extensive-form games by clustering opponent strategies in toy poker games. Our results demonstrate that these techniques yield dramatically improved clusterings compared to a traditional distance-based clustering approach in terms of both subjective quality and utility obtained by responding to the clusters.
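    The segmentation objective described above admits a standard greedy approximation: repeatedly add the response whose marginal gain in total utility is largest, crediting each opponent with the best response chosen so far. The sketch below uses a small random utility matrix (invented data) and does not attempt the paper's actual contribution, which is keeping this greedy step tractable when the response space is combinatorially large.

      # Greedy selection for a segmentation-style objective. With
      # non-negative utilities the objective is monotone submodular, so
      # greedy is within a (1 - 1/e) factor of optimal.
      import numpy as np

      rng = np.random.default_rng(1)
      U = rng.uniform(size=(6, 10))   # utility of response i vs opponent j (made up)

      def greedy_segmentation(U, k):
          chosen = []
          best = np.zeros(U.shape[1])  # best chosen response's utility per opponent
          for _ in range(k):
              gains = (np.maximum(U, best) - best).sum(axis=1)
              i = int(np.argmax(gains))
              chosen.append(i)
              best = np.maximum(best, U[i])
          return chosen, best.sum()

      picks, value = greedy_segmentation(U, k=3)
      print(picks, value)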
  14. Heads-up Limit Hold'em Poker is Solved

    Authors
    Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin
    Purpose
    Science
    Date
    January 2015
    Abstract
    Poker is a family of games that exhibit imperfect information, where players do not have full knowledge of past events. Whereas many perfect-information games have been solved (e.g., Connect Four and checkers), no nontrivial imperfect-information game played competitively by humans has previously been solved. Here, we announce that heads-up limit Texas hold'em is now essentially weakly solved. Furthermore, this computation formally proves the common wisdom that the dealer in the game holds a substantial advantage. This result was enabled by a new algorithm, CFR+, which is capable of solving extensive-form games orders of magnitude larger than previously possible.
  15. Solving Games with Functional Regret Estimation

    Authors
    Kevin Waugh, Dustin Morrill, J. Andrew Bagnell, and Michael Bowling
    Purpose
    Proceedings of the Twenty-Ninth Conference on Artificial Intelligence (AAAI)
    Date
    January 2015
    Abstract
    We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function approximation and regret of the algorithm. A corollary is that the method is guaranteed to converge to a Nash equilibrium in self-play so long as the regrets are ultimately realizable by the function approximator. Our technique can be understood as a principled generalization of existing work on abstraction in large games; in our work, both the abstraction as well as the equilibrium are learned during self-play. We demonstrate empirically that the method achieves higher quality strategies than state-of-the-art abstraction techniques given the same resources.
  16. Automated Abstraction of Large Action Spaces in Imperfect Information Extensive-Form Games

    Authors
    John Hawkin
    Purpose
    Ph.D. Dissertation
    Date
    August 2014
    Abstract
    An agent in an adversarial, imperfect information environment must sometimes decide whether or not to take an action and, if they take the action, must choose a parameter value associated with that action. Examples include choosing to buy or sell some amount of resources or choosing whether or not to move combined with a distance for that movement. This problem can be expanded to allow for mixing between multiple actions each with distinct associated parameter values and can be expressed as an imperfect information extensive form game. This dissertation describes a new automated method of abstracting the action space of such decision problems. It presents new algorithms for implementing the method, provides some theory about how the method relates to Nash equilibria in a small poker game and assesses the method using several poker games. One of these algorithms was used in the creation of an agent that won one of the divisions of the 2012 Annual Computer Poker Competition. An improvement upon this algorithm produced an action abstraction of two-player no-limit Texas hold’em poker that outperforms a state-of-the-art action abstraction while requiring less than 40% of the memory. The resulting agent had the best overall results in a round robin tournament of six top two-player no-limit Texas hold’em agents.
  17. Solving Imperfect Information Games Using Decomposition

    Authors
    Neil Burch, Michael Johanson and Michael Bowling
    Purpose
    Proceedings of the Twenty-Eighth Conference on Artificial Intelligence (AAAI-14)
    Date
    July 2014
    Abstract
    Decomposition, i.e., independently analyzing possible subgames, has proven to be an essential principle for effective decision-making in perfect information games. However, in imperfect information games, decomposition has proven to be problematic. To date, all proposed techniques for decomposition in imperfect information games have abandoned theoretical guarantees. This work presents the first technique for decomposing an imperfect information game into subgames that can be solved independently, while retaining optimality guarantees on the full-game solution. We can use this technique to construct theoretically justified algorithms that make better use of information available at run-time, overcome memory or disk limitations at run-time, or make a time/space trade-off to overcome memory or disk limitations while solving a game. In particular, we present an algorithm for subgame solving which guarantees performance in the whole game, in contrast to existing methods which may have unbounded error. In addition, we present an offline game solving algorithm, CFR-D, which can produce a Nash equilibrium for a game that is larger than available storage.
  18. Using Response Functions to Measure Strategy Strength

    Authors
    Trevor Davis, Neil Burch, and Michael Bowling
    Purpose
    Proceedings of the Twenty-Eighth Conference on Artificial Intelligence (AAAI-14)
    Date
    July 2014
    Abstract
    Extensive-form games are a powerful tool for representing complex multi-agent interactions. Nash equilibrium strategies are commonly used as a solution concept for extensive-form games, but many games are too large for the computation of Nash equilibria to be tractable. In these large games, exploitability has traditionally been used to measure deviation from Nash equilibrium, and thus strategies aim to achieve minimal exploitability. However, while exploitability measures a strategy's worst-case performance, it fails to capture how likely that worst-case is to be observed in practice. In fact, empirical evidence has shown that a less exploitable strategy can perform worse than a more exploitable strategy in one-on-one play against a variety of opponents. In this work, we propose a class of response functions that can be used to measure the strength of a strategy. We prove that standard no-regret algorithms can be used to learn optimal strategies for a scenario where the opponent uses one of these response functions. We demonstrate the effectiveness of this technique in Leduc poker against opponents that use the UCT Monte Carlo tree search algorithm.
  19. Asymmetric Abstractions for Adversarial Settings

    Authors
    Nolan Bard, Michael Johanson and Michael Bowling
    Purpose
    Proceedings of the Thirteenth International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS)
    Date
    May 2014
    Abstract
    In multiagent domains, an agent's beliefs about how other agents will or could act play a significant role in its own behaviour. In large domains where it is infeasible to uniquely represent every possible decision an agent will face, abstraction is often used to collapse the state and action space to make the problem tractable. By abstracting other agents' views of the environment, the agent makes assumptions about how other agents act. Incorrect abstraction choices can yield less than ideal performance as other agents may, in reality, use coarser or finer abstraction than they were modelled with. The standard approach when abstracting is to use symmetric abstraction: where all agents are assumed to distinguish states in the same way. This work examines the benefits and potential pitfalls of using asymmetric abstractions in two-player zero-sum extensive-form games. Using the domain of two-player limit Texas hold'em poker, we investigate the performance of strategies using both symmetric and asymmetric abstractions in terms of in-game utility and worst-case utility in the real game. Furthermore, we show that combining asymmetric abstractions with robust counter-strategy techniques can produce counter-strategies which dominate their symmetric abstraction counterparts in terms of both exploitative power and worst-case utility.
  20. The Baseline Approach to Agent Evaluation

    Authors
    Joshua Davidson
    Purpose
    M. Sc. Thesis
    Date
    Spring 2014
    Abstract
    Efficient, unbiased estimation of agent performance is essential for drawing statistically significant conclusions in multi-agent domains with high outcome variance. Naïve Monte Carlo estimation is often insufficient, as it can require a prohibitive number of samples, especially when evaluating slow-acting agents. Classical variance reduction techniques typically require careful encoding of domain knowledge or are intrinsically complex. In this work, we introduce the baseline method of creating unbiased estimators for zero-sum, multi-agent high-variance domains. We provide two examples of estimators created using this approach, one that leverages computer agents in self-play, and another that utilizes existing player data. We show empirically that these baseline estimators are competitive with state-of-the-art techniques for efficient evaluation in variants of computer poker, a zero-sum domain with notably high outcome variance. Additionally, we demonstrate how simple, yet effective, baseline estimators can be created and deployed in domains where efficient evaluation techniques are currently non-existent.
  21. Regret Minimization in Games and the Development of Champion Multiplayer Computer Poker-Playing Agents

    Authors
    Richard Gibson
    Purpose
    Ph.D. Dissertation
    Date
    Spring 2014
    Abstract
    Recently, poker has emerged as a popular domain for investigating decision problems under conditions of uncertainty. Unlike traditional games such as checkers and chess, poker exhibits imperfect information, varying utilities, and stochastic events. Because of these complications, decisions at the poker table are more analogous to the decisions faced by humans in everyday life. In this dissertation, we investigate regret minimization in extensive-form games and apply our work in developing champion computer poker agents. Counterfactual Regret Minimization (CFR) is the current state-of-the-art approach to computing capable strategy profiles for large extensive-form games. Our primary focus is to advance our understanding and application of CFR in domains with more than two players. We present four major contributions. First, we provide the first set of theoretical guarantees for CFR when applied to games that are not two-player zero-sum. We prove that in such domains, CFR eliminates strictly dominated plays. In addition, we provide a modification of CFR that is both more efficient and can lead to stronger strategies than were previously possible. Second, we provide new regret bounds for CFR, present three new CFR sampling variants, and demonstrate their efficiency in several different domains. Third, we prove the first set of sufficient conditions that guarantee CFR will minimize regret in games with imperfect recall. Fourth, we generalize three previous game tree decomposition methods, present a new decomposition method, and demonstrate their improvement empirically over standard techniques. Finally, we apply the work in this thesis to construct three-player Texas hold'em agents and enter them into the Annual Computer Poker Competition. Our agents won six out of the seven three-player events that we entered from the 2010, 2011, 2012, and 2013 computer poker competitions.
  22. Automating Collusion Detection in Sequential Games

    Authors
    Parisa Mazrooei, Christopher Archibald, and Michael Bowling
    Purpose
    Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence
    Date
    July 2013
    Abstract
    Collusion is the practice of two or more parties deliberately cooperating to the detriment of others. While such behavior may be desirable in certain circumstances, in many it is considered dishonest and unfair. If agents otherwise hold strictly to the established rules, though, collusion can be challenging to police. In this paper, we introduce an automatic method for collusion detection in sequential games. We achieve this through a novel object, called a collusion table, that captures the effects of collusive behavior, i.e., advantage to the colluding parties, without assuming any particular pattern of behavior. We show the effectiveness of this method in the domain of poker, a popular game where collusion is prohibited.
  23. Evaluating State-Space Abstractions in Extensive-Form Games

    Authors
    Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling
    Purpose
    Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS-13)
    Date
    May 2013
    Abstract
    Efficient algorithms exist for finding optimal policies in extensive-form games. However, human-scale problems are typically so large that this computation remains infeasible with modern computing resources. State-space abstraction techniques allow for the derivation of a smaller and strategically similar abstract domain, in which an optimal strategy can be computed and then used as a suboptimal strategy in the real domain. In this paper, we consider the task of evaluating the quality of an abstraction, independent of a specific abstract strategy. In particular, we use a recent metric for abstraction quality and examine imperfect recall abstractions, in which agents "forget" previously observed information to focus the abstraction effort on more recent and relevant state information. We present experimental results in the domain of Texas hold'em poker that validate the use of distribution-aware abstractions over expectation-based approaches, demonstrate that the new metric better predicts tournament performance, and show that abstractions built using imperfect recall outperform those built using perfect recall in terms of both exploitability and one-on-one play.
  24. Online Implicit Agent Modelling

    Authors
    Nolan Bard, Michael Johanson, Neil Burch, and Michael Bowling
    Purpose
    Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS-13)
    Date
    May 2013
    Abstract
    The traditional view of agent modelling is to infer the explicit parameters of another agent's strategy (i.e., their probability of taking each action in each situation). Unfortunately, in complex domains with high dimensional strategy spaces, modelling every parameter often requires a prohibitive number of observations. Furthermore, given a model of such a strategy, computing a response strategy that is robust to modelling error may be impractical to compute online. Instead, we propose an implicit modelling framework where agents aim to estimate the utility of a fixed portfolio of pre-computed strategies. Using the domain of heads-up limit Texas hold'em poker, this work describes an end-to-end approach for building an implicit modelling agent. We compute robust response strategies, show how to select strategies for the portfolio, and apply existing variance reduction and online learning techniques to dynamically adapt the agent's strategy to its opponent. We validate the approach by showing that our implicit modelling agent would have won the heads-up limit opponent exploitation event in the 2011 Annual Computer Poker Competition.
  25. Rating Players in Games with Real-Valued Outcomes (Extended Abstract)

    Authors
    Christopher Archibald, Neil Burch, Michael Bowling, and Matthew Rutherford
    Purpose
    Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS-13)
    Date
    May 2013
    Abstract
    Game-theoretic models typically associate outcomes with real-valued utilities, and rational agents are expected to maximize their expected utility. Currently fielded agent rating systems, which aim to order a population of agents by strength, focus exclusively on games with discrete outcomes, e.g., win-loss in two-agent settings or an ordering in the multi-agent setting. These rating systems are not well-suited for domains where the absolute magnitude of utility rather than just the relative value is important. We introduce the problem of rating agents in games with real-valued outcomes and survey applicable existing techniques for rating agents in this setting. We then propose a novel rating system and an extension for all of these rating systems to games with more than two agents, showing experimentally the advantages of our proposed system.
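    One concrete way to see what rating with real-valued outcomes demands is a least-squares model in which the predicted margin between two agents is the difference of their ratings. To be clear, this is a natural baseline for the problem the paper poses, not necessarily the paper's proposed system, and the margins below are invented.

      # Least-squares ratings from pairwise real-valued margins.
      import numpy as np

      agents = ["A", "B", "C"]
      margins = {("A", "B"): 0.30, ("A", "C"): 0.90, ("B", "C"): 0.45}  # row minus column

      rows, y = [], []
      for (i, j), m in margins.items():
          row = np.zeros(len(agents))
          row[agents.index(i)], row[agents.index(j)] = 1.0, -1.0
          rows.append(row)
          y.append(m)
      rows.append(np.ones(len(agents)))  # pin the mean rating at zero
      y.append(0.0)

      ratings, *_ = np.linalg.lstsq(np.array(rows), np.array(y), rcond=None)
      print(dict(zip(agents, np.round(ratings, 3))))  # differences fit the margins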
  26. Baseline: Practical Control Variates for Agent Evaluation in Zero-Sum Domains

    Authors
    Joshua Davidson, Christopher Archibald, and Michael Bowling
    Purpose
    Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS-13)
    Date
    May 2013
    Abstract
    Agent evaluation in stochastic domains can be difficult. The commonplace approach of Monte Carlo evaluation can involve a prohibitive number of simulations when the variance of the outcome is high. In such domains, variance reduction techniques are necessary, but these techniques require careful encoding of domain knowledge. This paper introduces baseline as a simple approach to creating low variance estimators for zero-sum multi-agent domains with high outcome variance. The baseline method leverages the self play of any available agent to produce a control variate for variance reduction, avoiding the extra complexity inherent in traditional approaches. The baseline method is also applicable in situations where existing techniques either require extensive implementation overhead or simply cannot be applied. Experimental variance reduction results are shown for both cases using the baseline method. Baseline is shown to surpass state-of-the-art techniques in three-player computer poker and is competitive in two-player computer poker games. Baseline also shows variance reduction in human poker and in a mock Ad Auction tournament from the Trading Agent Competition, domains where variance reduction methods are not typically employed.
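    The mechanics of the baseline estimator fit in a few lines: replay the same chance outcomes in self-play with whatever agent is available and subtract that result as a control variate. In a symmetric zero-sum game, duplicate-cards self-play has expected value zero, which keeps the corrected estimate unbiased. The game below is invented and one-dimensional, purely to show the variance effect.

      # Baseline-style control variate: same "cards", self-play correction.
      import random
      import statistics

      random.seed(0)

      def play(luck, edge):
          return luck + edge + random.gauss(0, 1)  # luck + skill edge + noise

      naive, corrected = [], []
      for _ in range(4000):
          luck = random.gauss(0, 3)            # high-variance chance outcome
          outcome = play(luck, edge=0.05)      # agent under evaluation
          baseline = play(luck, edge=0.0)      # available agent in self-play, same luck
          naive.append(outcome)
          corrected.append(outcome - baseline) # luck cancels; the skill edge survives

      print(statistics.mean(naive), statistics.stdev(naive))
      print(statistics.mean(corrected), statistics.stdev(corrected))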
  27. A Parameterized Family of Equilibrium Profiles for Three-Player Kuhn Poker

    Authors
    Duane Szafron, Richard Gibson, and Nathan Sturtevant
    Purpose
    Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS-13)
    Date
    May 2013
    Abstract
    This paper presents a parameterized family of equilibrium strategy profiles for three-player Kuhn poker. This family illustrates an important feature of three-player equilibrium profiles that is not present in two-player equilibrium profiles - the ability of one player to transfer utility to a second player at the expense of the third player, while playing a strategy in the profile family. This family of strategy profiles was derived analytically and the proof that the members of this family are equilibrium profiles is an analytic one. In addition, the problem of selecting a robust strategy from an equilibrium profile is discussed.
  28. Measuring the Size of Large No-Limit Poker Games

    Authors
    Michael Johanson
    Purpose
    Technical Report TR13-01, Department of Computing Science, University of Alberta
    Date
    March 2013
    Abstract
    In the field of computational game theory, games are often compared in terms of their size. This can be measured in several ways, including the number of unique game states, the number of decision points, and the total number of legal actions over all decision points. These numbers are either known or estimated for a wide range of classic games such as chess and checkers. In the stochastic and imperfect information game of poker, these sizes are easily computed in "limit" games which restrict the players' available actions, but until now had only been estimated for the more complicated "no-limit" variants. In this paper, we describe a simple algorithm for quickly computing the size of two-player no-limit poker games, provide an implementation of this algorithm, and present for the first time precise counts of the number of game states, information sets, actions and terminal nodes in the no-limit poker games played in the Annual Computer Poker Competition.
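    The counting itself can be sketched as a memoized recursion over betting states. The toy below counts betting sequences for a single no-limit round with tiny stacks and simplified rules (fold is always counted as legal, and only full min-raises are allowed); the report's algorithm does the analogous count exactly for the full four-round ACPC games.

      # Count betting sequences in one simplified no-limit round.
      from functools import lru_cache

      STACK = 20   # chips per player (toy size; the ACPC game uses 20000)
      BLIND = 1    # both players post 1 chip so the toy stays symmetric

      @lru_cache(maxsize=None)
      def count(mine, theirs, min_raise, first):
          # mine/theirs: chips committed so far by the actor and the opponent
          seqs = 1                                          # fold ends a sequence
          if first and mine == theirs:
              seqs += count(theirs, mine, min_raise, False) # check: opponent acts
          else:
              seqs += 1                                     # call closes the round
          for raise_to in range(theirs + min_raise, STACK + 1):
              # raise to `raise_to` chips total; the new minimum is this increment
              seqs += count(theirs, raise_to, raise_to - theirs, False)
          return seqs

      print(count(BLIND, BLIND, BLIND, True))               # sequences in the toy round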
  29. Collusion Detection in Sequential Games

    Authors
    Parisa Mazrooei
    Purpose
    M. Sc. Thesis
    Date
    Fall 2012
    Abstract
    Collusion is the deliberate cooperation of two or more parties to the detriment of others. While this behaviour can be highly profitable for colluders (for example, in auctions and online games), it is considered illegal and unfair in many sequential decision-making domains and presents many challenging problems in these systems. In this thesis we present an automatic collusion detection method for extensive form games. This method uses a novel object, called a collusion table, that aims to capture the effects of collusive behaviour on the utility of players without committing to any particular pattern of behaviour. We also introduce a general method for developing collusive agents which was necessary to create a dataset of labelled colluding and normal agents. The effectiveness of our collusion detection method is demonstrated experimentally. Our results show that this method provides promising accuracy, detecting collusion by both strong and weak agents.
  30. Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions

    Authors
    Richard Gibson, Neil Burch, Marc Lanctot, and Duane Szafron
    Purpose
    Proceedings of the Twenty-Sixth Conference on Advances in Neural Information Processing Systems (NIPS-2012)
    Date
    December 2012
    Abstract
    Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing strategies in extensive-form games. The Monte Carlo CFR (MCCFR) variants reduce the per iteration time cost of CFR by traversing a smaller, sampled portion of the tree. The previous most effective instances of MCCFR can still be very slow in games with many player actions since they sample every action for a given player. In this paper, we present a new MCCFR algorithm, Average Strategy Sampling (AS), that samples a subset of the player's actions according to the player's average strategy. Our new algorithm is inspired by a new, tighter bound on the number of iterations required by CFR to converge to a given solution quality. In addition, we prove a similar, tighter bound for AS and other popular MCCFR variants. Finally, we validate our work by demonstrating that AS converges faster than previous MCCFR algorithms in both no-limit poker and Bluff.
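    The heart of Average Strategy Sampling is its per-action rule: each of the acting player's actions is sampled independently with a probability that grows with that action's cumulative average-strategy weight, floored at an exploration parameter. The sketch below paraphrases that rule; treat the normalization and the constants (epsilon, beta, tau) as placeholders rather than the published tuning, and note that all of the surrounding MCCFR machinery is omitted.

      # Average-strategy-style action sampling (paraphrased rule).
      import random

      def sample_actions(cum_strategy, epsilon=0.05, beta=1000.0, tau=1.0):
          # cum_strategy[a]: cumulative average-strategy weight of action a
          total = sum(cum_strategy)
          chosen = []
          for a, s in enumerate(cum_strategy):
              rho = max(epsilon, (beta + tau * s) / (beta + total))
              if random.random() < min(1.0, rho):
                  chosen.append(a)   # walk this action's subtree this iteration
          return chosen

      random.seed(0)
      print(sample_actions([50000.0, 1200.0, 0.0]))  # favours the often-played action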
  31. Finding Optimal Abstract Strategies in Extensive-Form Games

    Authors
    Michael Johanson, Nolan Bard, Neil Burch, and Michael Bowling
    Purpose
    Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (AAAI-12)
    Date
    July 2012
    Abstract
    Extensive-form games are a powerful model for representing interactions between agents. Nash equilibrium strategies are a common solution concept for extensive-form games and, in two-player zero-sum games, there are efficient algorithms for calculating such strategies. In large games, this computation may require too much memory and time to be tractable. A standard approach in such cases is to apply a lossy state-space abstraction technique to produce a smaller abstract game that can be tractably solved, while hoping that the resulting abstract game equilibrium is close to an equilibrium strategy in the unabstracted game. Recent work has shown that this assumption is unreliable, and an arbitrary Nash equilibrium in the abstract game is unlikely to be even near the least suboptimal strategy that can be represented in that space. In this work, we present for the first time an algorithm which efficiently finds optimal abstract strategies -- strategies with minimal exploitability in the unabstracted game. We use this technique to find the least exploitable strategy ever reported for two-player limit Texas hold'em.
  32. Using Sliding Windows to Generate Action Abstractions in Extensive-Form Games

    Authors
    John Hawkin, Robert C. Holte, and Duane Szafron
    Purpose
    Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (AAAI-12)
    Date
    July 2012
    Abstract
    In extensive-form games with a large number of actions, careful abstraction of the action space is critically important to performance. In this paper we extend previous work on action abstraction using no-limit poker games as our test domains. We show that in such games it is no longer necessary to choose, a priori, one specific range of possible bet sizes. We introduce an algorithm that adjusts the range of bet sizes considered for each bet individually in an iterative fashion. This flexibility results in a substantially improved game value in no-limit Leduc poker. When applied to no-limit Texas Hold'em our algorithm produces an action abstraction that is about one third the size of a state-of-the-art hand-crafted action abstraction, yet has a better overall game value.
  33. Generalized Sampling and Variance in Counterfactual Regret Minimization

    Authors
    Richard Gibson, Marc Lanctot, Neil Burch, Duane Szafron, and Michael Bowling
    Purpose
    Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (AAAI-12)
    Date
    July 2012
    Abstract
    In large extensive form games with imperfect information, Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing approximate Nash equilibria. While the base algorithm performs a full tree traversal on each iteration, Monte Carlo CFR (MCCFR) reduces the per iteration time cost by traversing just a sampled portion of the tree. On the other hand, MCCFR's sampled values introduce variance, and the effects of this variance were previously unknown. In this paper, we generalize MCCFR by considering any generic estimator of the sought values. We show that any choice of an estimator can be used to probabilistically minimize regret, provided the estimator is bounded and unbiased. In addition, we relate the variance of the estimator to the convergence rate of an algorithm that calculates regret directly from the estimator. We demonstrate the application of our analysis by defining a new bounded, unbiased estimator with empirically lower variance than MCCFR estimates. Finally, we use this estimator in a new sampling algorithm to compute approximate equilibria in Goofspiel, Bluff, and Texas hold'em poker. Under each of our selected sampling schemes, our new algorithm converges faster than MCCFR.
  34. Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization

    Authors
    Michael Johanson, Nolan Bard, Marc Lanctot, Richard Gibson, and Michael Bowling
    Purpose
    Autonomous Agents and Multiagent Systems 2012 (AAMAS-12)
    Date
    June 2012
    Abstract
    Recently, there has been considerable progress towards algorithms for approximating Nash equilibrium strategies in extensive games. One such algorithm, Counterfactual Regret Minimization (CFR), has proven to be effective in two-player zero-sum poker domains. While the basic algorithm is iterative and performs a full game traversal on each iteration, sampling based approaches are possible. For instance, chance-sampled CFR considers just a single chance outcome per traversal, resulting in faster but less precise iterations. While more iterations are required, chance-sampled CFR requires less time overall to converge. In this work, we present new sampling techniques that consider sets of chance outcomes during each traversal to produce slower, more accurate iterations. By sampling only the public chance outcomes seen by all players, we take advantage of the imperfect information structure of the game to (i) avoid recomputation of strategy probabilities, and (ii) achieve an algorithmic speed improvement, performing O(n^2) work at terminal nodes in O(n) time. We demonstrate that this new CFR update converges more quickly than chance-sampled CFR in the large domains of poker and Bluff.
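    The O(n^2)-to-O(n) speedup at terminal nodes comes from ordering hands by strength and keeping running sums of the opponent's reach probabilities, so each hand's showdown value against the entire opponent range is read off in constant time. A sketch of just that trick, ignoring the card-removal effects a real poker implementation must handle:

      # O(n) showdown evaluation against a whole range via prefix sums.
      import numpy as np

      rng = np.random.default_rng(0)
      n = 8
      opp_reach = rng.uniform(size=n)   # opponent's weight on hand j (sorted by strength)

      prefix = np.concatenate(([0.0], np.cumsum(opp_reach)))
      total = prefix[-1]
      values = np.empty(n)
      for i in range(n):
          weaker = prefix[i]                 # mass of opponent hands we beat (+1)
          stronger = total - prefix[i + 1]   # mass of hands that beat us (-1)
          values[i] = weaker - stronger

      # agrees with the naive O(n^2) pairwise computation
      naive = np.array([sum(np.sign(i - j) * opp_reach[j] for j in range(n))
                        for i in range(n)])
      assert np.allclose(values, naive)
      print(values)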
  35. On Strategy Stitching in Large Extensive Form Multiplayer Games

    Authors
    Richard Gibson and Duane Szafron
    Purpose
    Proceedings of the Twenty-Fifth Conference on Neural Information Processing Systems (NIPS 2011)
    Date
    December 2011
    Abstract
    Computing a good strategy in a large extensive form game often demands an extraordinary amount of computer memory, necessitating the use of abstraction to reduce the game size. Typically, strategies from abstract games perform better in the real game as the granularity of abstraction is increased. This paper investigates two techniques for stitching a base strategy in a coarse abstraction of the full game tree, to expert strategies in fine abstractions of smaller subtrees. We provide a general framework for creating static experts, an approach that generalizes some previous strategy stitching efforts. In addition, we show that static experts can create strong agents for both 2-player and 3-player Leduc and Limit Texas Hold'em poker, and that a specific class of static experts can be preferred among a number of alternatives. Furthermore, we describe a poker agent that used static experts and won the 3-player events of the 2010 Annual Computer Poker Competition.
  36. Automated Action Abstraction of Imperfect Information Extensive-Form Games

    Authors
    John Hawkin, Robert Holte, and Duane Szafron
    Purpose
    Proceedings of the Twenty-Fifth Conference on Artificial Intelligence (AAAI-11)
    Date
    August 2011
    Abstract
    Multi-agent decision problems can often be formulated as extensive-form games. We focus on imperfect information extensive-form games in which one or more actions at many decision points have an associated continuous or many-valued parameter. A stock trading agent, in addition to deciding whether to buy or not, must decide how much to buy. In no-limit poker, in addition to selecting a probability for each action, the agent must decide how much to bet for each betting action. Selecting values for these parameters makes these games extremely large. Two-player no-limit Texas Hold'em poker with stacks of 500 big blinds has approximately 10^71 states, which is more than 10^50 times more states than two-player limit Texas Hold'em. The main contribution of this paper is a technique that abstracts a game's action space by selecting one, or a small number, of the many values for each parameter. We show that strategies computed using this new algorithm for no-limit Leduc poker exhibit significant utility gains over epsilon-Nash equilibrium strategies computed with standard, hand-crafted parameter value abstractions.
  37. Accelerating Best Response Calculation in Large Extensive Games

    Authors
    Michael Johanson, Kevin Waugh, Michael Bowling, and Martin Zinkevich
    Purpose
    Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI-11)
    Date
    July 2011
    Abstract
    One fundamental evaluation criterion of an AI technique is its performance in the worst case. For static strategies in extensive games, this can be computed using a best response computation. Conventionally, this requires a full game tree traversal. For very large games, such as poker, that traversal is infeasible to perform on modern hardware. In this paper, we detail a general technique for best response computations that can often avoid a full game tree traversal. Additionally, our method is specifically well-suited for parallel environments. We apply this approach to computing the worst-case performance of a number of strategies in heads-up limit Texas hold'em, which, prior to this work, was not possible. We explore these results thoroughly as they provide insight into the effects of abstraction on worst-case performance in large imperfect information games. This is a topic that has received much attention, but could not previously be examined outside of toy domains.
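    In a one-shot matrix game the quantity being computed is easy to state: a best response is a max over pure actions, and exploitability is what a best-responding opponent earns against each half of a strategy pair. The paper's contribution is making the analogous tree-walk feasible in full-size poker; none of that machinery appears in this toy.

      # Best response and exploitability in a matrix game (rock-paper-scissors).
      import numpy as np

      A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)  # row payoffs

      def exploitability(p_row, p_col):
          br_vs_row = np.max(-A.T @ p_row)   # column player's best response value
          br_vs_col = np.max(A @ p_col)      # row player's best response value
          return br_vs_row + br_vs_col       # zero exactly at a Nash equilibrium

      uniform = np.ones(3) / 3
      print(exploitability(uniform, uniform))                      # 0.0
      print(exploitability(np.array([0.5, 0.25, 0.25]), uniform))  # 0.25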
  38. Using Counterfactual Regret Minimization to Create Competitive Multiplayer Poker Agents

    Authors
    Nick Abou Risk and Duane Szafron
    Purpose
    Autonomous Agents and Multiagent Systems 2010 (AAMAS-10)
    Date
    May 2010
    Abstract
    Games are used to evaluate and advance Multiagent and Artificial Intelligence techniques. Most of these games are deterministic with perfect information (e.g. Chess and Checkers). A deterministic game has no chance element and in a perfect information game, all information is visible to all players. However, many real-world scenarios with competing agents are stochastic (non-deterministic) with imperfect information. For two-player zero-sum perfect recall games, a recent technique called Counterfactual Regret Minimization (CFR) computes strategies that are provably convergent to an epsilon-Nash equilibrium. A Nash equilibrium strategy is useful in two-player games since it maximizes its utility against a worst-case opponent. However, for multiplayer (three or more player) games, we lose all theoretical guarantees for CFR. Nevertheless, we believe that CFR-generated agents may perform well in multiplayer games. To test this hypothesis, we used this technique to create several 3-player limit Texas Hold'em poker agents and two of them placed first and second in the 3-player event of the 2009 AAAI/IJCAI Computer Poker Competition. We also demonstrate that good strategies can be obtained by grafting sets of two-player subgame strategies to a 3-player base strategy after one of the players is eliminated.
  39. Strategy Grafting in Extensive Games

    Authors
    Kevin Waugh, Nolan Bard, and Michael Bowling
    Purpose
    Advances in Neural Information Processing Systems 22 (NIPS-09)
    Date
    December 2009
    Abstract
    Extensive games are often used to model the interactions of multiple agents within an environment. Much recent work has focused on increasing the size of an extensive game that can be feasibly solved. Despite these improvements, many interesting games are still too large for such techniques. A common approach for computing strategies in these large games is to first employ an abstraction technique to reduce the original game to an abstract game that is of a manageable size. This abstract game is then solved and the resulting strategy is played in the original game. Most top programs in recent AAAI Computer Poker Competitions use this approach. The trend in this competition has been that strategies found in larger abstract games tend to beat strategies found in smaller abstract games. These larger abstract games have more expressive strategy spaces and therefore contain better strategies. In this paper we present a new method for computing strategies in large games. This method allows us to compute more expressive strategies without increasing the size of abstract games that we are required to solve. We demonstrate the power of the approach experimentally in both small and large games, while also providing a theoretical justification for the resulting improvement.
    Hide Abstract.
  40. State Translation in No-Limit Poker

    Authors
    David Paul Schnizlein
    Purpose
    M. Sc. Thesis
    Date
    Fall 2009
    Show Abstract.
    One way to create a champion level poker agent is to compute a Nash Equilibrium in an abstract version of the poker game. The resulting strategy is then used to play in the full game. With this approach, translation is required between the full and abstract games in order to use the abstract strategy. In limit poker this translation step is defined when the abstraction is chosen. However, when considering no-limit poker the translation process becomes more complicated. We formally describe the process of translation and investigate its consequences. We examine how the current method, hard translation, can result in exploitable agents and introduce a new probabilistic method, soft translation, that produces more robust players. We also investigate how switching between strategies with different underlying abstractions affects the performance of an agent.
    Hide Abstract.
  41. Using Counterfactual Regret Minimization to Create a Competitive Multiplayer Poker Agent

    Authors
    Nicholas Abou Risk
    Purpose
    M. Sc. Thesis
    Date
    Fall 2009
    Show Abstract.
    Games have been used to evaluate and advance techniques in the field of Artificial Intelligence since before computers were invented. Many of these games have been deterministic perfect information games (e.g. Chess and Checkers). A deterministic game has no chance element and in a perfect information game, all information is visible to all players. However, many real-world scenarios involving competing agents can be more accurately modeled as stochastic (non-deterministic), imperfect information games, and this dissertation investigates such games. Poker is one such game played by millions of people around the world; it will be used as the testbed of the research presented in this dissertation. For a specific set of games, two-player zero-sum perfect recall games, a recent technique called Counterfactual Regret Minimization (CFR) computes strategies that are provably convergent to an epsilon-Nash equilibrium. A Nash equilibrium strategy is very useful in two-player games as it maximizes its utility against a worst-case opponent. However, once we move to multiplayer games, we lose all theoretical guarantees for CFR. Furthermore, we have no theoretical guarantees about the performance of a strategy from a multiplayer Nash equilibrium against two arbitrary opponents. Despite the lack of theoretical guarantees, my thesis is that CFR-generated agents may perform well in multiplayer games. I created several 3-player limit Texas Hold'em Poker agents and the results of the 2009 Computer Poker Competition demonstrate that these are the strongest 3-player computer Poker agents in the world. I also contend that a good strategy can be obtained by grafting a set of two-player subgame strategies to a 3-player base strategy when one of the players is eliminated.
    Hide Abstract.
  42. Abstraction In Large Extensive Games

    Authors
    Kevin Waugh
    Purpose
    M. Sc. Thesis
    Date
    Fall 2009
    Show Abstract.
    Extensive games model scenarios where multiple agents interact with a potentially stochastic environment. In the case of two-player zero-sum games, we have efficient techniques for computing solutions. These solutions are optimal in that they guarantee the player the highest expected reward against a worst-case opponent. Unfortunately, there are many interesting two-player zero-sum games that are too large for the state-of-the-art solvers. For example, heads-up Texas Hold'em has 10^18 game states and requires over two petabytes of storage to record a single strategy. A popular approach for tackling these large games is to use an abstraction technique to create a smaller game that models the original game. A solution to the smaller abstract game can be computed and is presumed good in the original game. A common assumption is that the more accurately the abstract game models the original game the stronger the resulting strategy will be. There is substantial empirical evidence to support this assumption and, prior to this work, it had not been questioned. In this thesis, we show that this assumption is not true. That is, using a larger, more accurate, abstract game can lead to weaker strategies. To show this, we must first formalize what it means to abstract a game as well as what it means for an abstraction to be bigger or more informed. We then show that the assumption fails to hold under two criteria for evaluating a strategy. The first is exploitability, which measures performance against a worst-case adversary. The second is a new evaluation criterion called the domination value, which essentially measures how many mistakes (actions that one should never take) a strategy makes. Despite these abstraction pathologies, we argue that solutions to the larger abstract games tend to make fewer mistakes. This leads us to develop a new technique, called strategy grafting, that can be used to augment a strong base strategy. It does this by decomposing an extensive game into multiple sub-games. These sub-games are then solved separately, but with shared knowledge of the underlying base strategy. This technique allows us to make effective use of much larger strategy spaces than previously possible. We provide a weak theoretical guarantee on the quality of this new technique, but we show in practice that it does quite well even when the conditions on our theoretical results do not hold. Furthermore, when evaluated under our new criterion, we notice that grafted strategies tend to make fewer mistakes than their base strategy.
    Hide Abstract.
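    A schematic sketch of the grafting recipe described above: each sub-game is re-solved for the grafting player while play everywhere else stays pinned to the base strategy, and the solved pieces are stitched back in. The dictionary representation and the `solve_subgame` callable (for instance, a CFR solver restricted to the sub-game) are placeholder assumptions, not the thesis's implementation.

      def graft(base, subgames, solve_subgame):
          """Strategy grafting, schematically.

          base: dict mapping information sets to action distributions.
          subgames: iterable of disjoint sets of information sets.
          solve_subgame(infosets, base): placeholder solver that optimizes
              only `infosets` while holding `base` fixed everywhere else,
              returning a dict for just those information sets.
          """
          grafted = dict(base)
          for infosets in subgames:
              # Each piece is solved with shared knowledge of the base strategy.
              grafted.update(solve_subgame(infosets, base))
          return grafted
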
  43. Probabilistic State Translation in Extensive Games with Large Action Sets

    Authors
    David Schnizlein, Michael Bowling, and Duane Szafron
    Purpose
    Proceedings of the 2009 International Joint Conference on Artificial Intelligence (IJCAI)
    Date
    2009
    Show Abstract.
    Equilibrium or near-equilibrium solutions to very large extensive form games are often computed by using abstractions to reduce the game size. A common abstraction technique for games with a large number of available actions is to restrict the number of legal actions in every state. This method has been used to discover equilibrium solutions for the game of no-limit heads-up Texas Hold'em. When using a solution to an abstracted game to play one side in the un-abstracted (real) game, the real opponent actions may not correspond to actions in the abstracted game. The most popular method for handling this situation is to translate opponent actions in the real game to the closest legal actions in the abstracted game. We show that this approach can result in a very exploitable player and propose an alternative solution. We use probabilistic mapping to translate a real action into a probability distribution over actions, whose weights are determined by a similarity metric. We show that this approach significantly reduces the exploitability when using an abstract solution in the real game.
    Hide Abstract.
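    A minimal sketch of soft translation as described above: a real bet is mapped to a probability distribution over the bracketing abstract bet sizes rather than deterministically to the closest one. The linear interpolation used as the similarity weighting here is an illustrative assumption; the paper defines its own similarity metric.

      import bisect
      import random

      def soft_translate(real_bet, abstract_bets):
          """Map a real bet size to a distribution over abstract bet sizes.

          Returns a dict {abstract_bet: probability}. Bets outside the
          abstract range collapse to the nearest endpoint; bets in between
          are split between the two bracketing sizes (hard translation would
          instead deterministically pick one of them).
          """
          sizes = sorted(abstract_bets)
          if real_bet <= sizes[0]:
              return {sizes[0]: 1.0}
          if real_bet >= sizes[-1]:
              return {sizes[-1]: 1.0}
          hi = bisect.bisect_left(sizes, real_bet)
          lo = hi - 1
          a, b = sizes[lo], sizes[hi]
          w_hi = (real_bet - a) / (b - a)   # illustrative similarity weight
          return {a: 1.0 - w_hi, b: w_hi}

      # Example: the abstract game only knows pot-sized (1.0) and all-in (10.0) bets.
      dist = soft_translate(4.0, [1.0, 10.0])
      action = random.choices(list(dist), weights=dist.values())[0]

    Because the mapping is randomized, an opponent can no longer choose bet sizes that are deterministically mis-translated, which is the source of the exploitability reduction the paper reports.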
  44. A Practical Use of Imperfect Recall

    Authors
    Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling
    Purpose
    Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA)
    Date
    2009
    Show Abstract.
    Perfect recall is the common and natural assumption that an agent never forgets. As a consequence, the agent can always condition its choice of action on any prior observations. In this paper, we explore relaxing this assumption. We observe the negative impact this relaxation has on algorithms: some algorithms are no longer well-defined, while others lose their theoretical guarantees on the quality of a solution. Despite these disadvantages, we show that removing this restriction can provide considerable empirical advantages when modeling extremely large extensive games. In particular, it allows fine granularity of the most relevant observations without requiring decisions to be contingent on all past observations. In the domain of poker, this improvement enables new types of information to be used in the abstraction. By making use of imperfect recall and new types of information, our poker program was able to win the limit equilibrium event as well as the no-limit event at the 2008 AAAI Computer Poker Competition. We show experimental results to verify that our programs using imperfect recall are indeed stronger than their perfect recall counterparts.
    Hide Abstract.
  45. Data Biased Robust Counter Strategies

    Authors
    Michael Johanson and Michael Bowling
    Purpose
    Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS)
    Date
    2009
    Show Abstract.
    The problem of exploiting information about the environment while still being robust to inaccurate or incomplete information arises in many domains. Competitive imperfect information games where the goal is to maximally exploit an unknown opponent's weaknesses are an example of this problem. Agents for these games must balance two objectives. First, they should aim to exploit data from past interactions with the opponent, seeking a best-response counter strategy. Second, they should aim to minimize losses since the limited data may be misleading or the opponent's strategy may have changed, suggesting an opponent-agnostic Nash equilibrium strategy. In this paper, we show how to partially satisfy both of these objectives at the same time, producing strategies with favourable tradeoffs between the ability to exploit an opponent and the capacity to be exploited. Like a recently published technique, our approach involves solving a modified game; however, the result is more generally applicable and even performs well in situations with very limited data. We evaluate our technique in the game of two-player, Limit Texas Hold'em.
    Hide Abstract.
  46. Abstraction Pathologies in Extensive Games

    Authors
    Kevin Waugh, David Schnizlein, Michael Bowling, and Duane Szafron
    Purpose
    Autonomous Agents and Multiagent Systems 2009 (AAMAS-09)
    Date
    2009
    Show Abstract.
    Extensive games can be used to model many scenarios in which multiple agents interact with an environment. There has been considerable recent research on finding strong strategies in very large, zero-sum extensive games. The standard approach in such work is to employ abstraction techniques to derive a more tractably sized game. An extensive game solver is then employed to compute an equilibrium in that abstract game, and the resulting strategy is presumed to be strong in the full game. Progress in this line of research has focused on solving larger abstract games, which more closely resemble the full game. However, there is an underlying assumption that by abstracting less, and solving a larger game, an agent will have a stronger strategy in the full game. In this work we show that this assumption is not true in general. Refining an abstraction can actually lead to a weaker strategy. We show examples of these abstraction pathologies in a small game of poker that can be analyzed exactly. These examples show that pathologies arise when abstracting both chance nodes as well as a player's actions. In summary, this paper shows that the standard approach to finding strong strategies for large extensive games rests on shaky ground.
    Hide Abstract.
  47. Strategy Evaluation in Extensive Games with Importance Sampling

    Authors
    Michael Bowling, Michael Johanson, Neil Burch and Duane Szafron
    Purpose
    Proceedings of the 25th Annual International Conference on Machine Learning (ICML)
    Date
    2008
    Show Abstract.
    Typically agent evaluation is done through Monte Carlo estimation. However, stochastic agent decisions and stochastic outcomes can make this approach inefficient, requiring many samples for an accurate estimate. We present a new technique that can be used to simultaneously evaluate many strategies while playing a single strategy in the context of an extensive game. This technique is based on importance sampling, but utilizes two new mechanisms for significantly reducing variance in the estimates. We demonstrate its effectiveness in the domain of poker, where stochasticity makes traditional evaluation problematic.
    Hide Abstract.
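    The core off-policy idea, sketched on a toy one-decision game: outcomes recorded while playing one strategy are reweighted by the ratio of trajectory probabilities to give an unbiased estimate for a different strategy. This is only the plain importance-sampling estimator; the paper's two additional variance-reduction mechanisms are not shown.

      import random

      def estimate_value(episodes, target, behavior):
          """Off-policy estimate of the target strategy's expected payoff
          from episodes played by the behavior strategy. Each episode is an
          (actions, payoff) pair; a strategy maps an action to its probability."""
          total = 0.0
          for actions, payoff in episodes:
              weight = 1.0
              for a in actions:
                  weight *= target(a) / behavior(a)   # likelihood ratio
              total += weight * payoff
          return total / len(episodes)

      # Toy game: 'safe' pays 1; 'risky' pays 0 or 4 on a fair coin flip.
      def play(policy):
          a = 'risky' if random.random() < policy('risky') else 'safe'
          payoff = 1.0 if a == 'safe' else random.choice([0.0, 4.0])
          return ([a], payoff)

      behavior = lambda a: 0.5                          # plays 50/50
      target = lambda a: 0.9 if a == 'risky' else 0.1   # strategy to evaluate
      episodes = [play(behavior) for _ in range(100000)]
      print(estimate_value(episodes, target, behavior)) # about 0.9*2 + 0.1*1 = 1.9
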
  48. Using State Estimation for Dynamic Agent Modelling

    Authors
    Nolan Bard
    Purpose
    M. Sc. Thesis
    Date
    March 2008
    Show Abstract.
    Agent modelling is a challenging problem in many modern artificial intelligence applications. The agent modelling task is especially difficult when handling stochastic choices, deliberately hidden information, dynamic agents, and the need for fast learning. State estimation techniques, such as Kalman filtering and particle filtering, have addressed many of these challenges, but have received little attention in the agent modelling literature. This thesis explores the use of particle filtering for modelling dynamic opponents in Kuhn poker, a simplified version of Texas Hold'em poker. We demonstrate effective modelling both against static opponents as well as dynamic opponents, when the dynamics are known. We then examine an application of Rao-Blackwellized particle filtering for doing dual estimation, inferring both the opponent's state and a model of its dynamics. Finally, we examine the robustness of the approach to incorrect beliefs about the opponent and compare it to previous work on opponent modelling in Kuhn poker.
    Hide Abstract.
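    A bootstrap particle filter over a single hidden opponent parameter, in the spirit of the thesis: here the hidden state is a drifting bluff frequency. The Gaussian drift model and Bernoulli observation likelihood below are illustrative assumptions, not the thesis's models.

      import random

      def step(particles, observed_bluff, drift_sd=0.02):
          """One propagate-weight-resample cycle of a bootstrap particle filter.

          particles: list of candidate bluff frequencies in [0, 1].
          observed_bluff: True if the opponent was seen bluffing this hand.
          """
          # Propagate: assume the opponent's tendency drifts by Gaussian noise.
          particles = [min(1.0, max(0.0, p + random.gauss(0.0, drift_sd)))
                       for p in particles]
          # Weight: likelihood of the observation under each particle.
          weights = [p if observed_bluff else 1.0 - p for p in particles]
          if sum(weights) == 0.0:
              return particles          # degenerate case: keep the prior
          # Resample: particles survive in proportion to their weights.
          return random.choices(particles, weights=weights, k=len(particles))

      particles = [random.random() for _ in range(1000)]   # uniform prior
      for obs in [True, False, True, True]:                # observed showdowns
          particles = step(particles, obs)
      estimate = sum(particles) / len(particles)           # posterior mean
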
  49. Regret Minimization in Games with Incomplete Information

    Authors
    Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione
    Purpose
    Advances in Neural Information Processing Systems 20 (NIPS)
    Date
    2007
    Show Abstract.
    Extensive games are a powerful model of multiagent decision-making scenarios with incomplete information. Finding a Nash equilibrium for very large instances of these games has received a great deal of recent attention. In this paper, we describe a new technique for solving large games based on regret minimization. In particular, we introduce the notion of counterfactual regret, which exploits the degree of incomplete information in an extensive game. We show how minimizing counterfactual regret minimizes overall regret, and therefore in self-play can be used to compute a Nash equilibrium. We demonstrate this technique in the domain of poker, showing we can solve abstractions of limit Texas Hold'em with as many as 10^12 states, two orders of magnitude larger than previous methods.
    Hide Abstract.
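    At each information set, CFR plays by regret matching on accumulated counterfactual regrets, and the average strategy converges to equilibrium in self-play. The sketch below is the widely used Kuhn poker formulation of the algorithm, a toy-scale stand-in for the Texas Hold'em abstractions solved in the paper.

      import random

      ACTIONS = "pb"   # pass, bet

      class Node:
          def __init__(self):
              self.regret_sum = [0.0, 0.0]
              self.strategy_sum = [0.0, 0.0]

          def strategy(self, reach):
              """Regret matching: play in proportion to positive cumulative regret."""
              pos = [max(r, 0.0) for r in self.regret_sum]
              total = sum(pos)
              s = [p / total for p in pos] if total > 0 else [0.5, 0.5]
              for i in range(2):
                  self.strategy_sum[i] += reach * s[i]
              return s

          def average(self):
              total = sum(self.strategy_sum)
              return [x / total for x in self.strategy_sum] if total > 0 else [0.5, 0.5]

      nodes = {}

      def cfr(cards, history, p0, p1):
          """Returns the counterfactual value for the player to act."""
          player = len(history) % 2
          if len(history) >= 2:                      # terminal states
              higher = cards[player] > cards[1 - player]
              if history[-1] == "p":                 # showdown after checks, or a fold
                  return (1 if higher else -1) if history == "pp" else 1
              if history[-2:] == "bb":               # bet was called: showdown for 2
                  return 2 if higher else -2
          info = str(cards[player]) + history
          node = nodes.setdefault(info, Node())
          s = node.strategy(p0 if player == 0 else p1)
          util = [0.0, 0.0]
          for i, a in enumerate(ACTIONS):
              if player == 0:
                  util[i] = -cfr(cards, history + a, p0 * s[i], p1)
              else:
                  util[i] = -cfr(cards, history + a, p0, p1 * s[i])
          node_util = s[0] * util[0] + s[1] * util[1]
          for i in range(2):                         # counterfactual regret update
              node.regret_sum[i] += (p1 if player == 0 else p0) * (util[i] - node_util)
          return node_util

      cards = [1, 2, 3]
      for _ in range(100000):
          random.shuffle(cards)
          cfr(cards, "", 1.0, 1.0)
      # nodes["1"].average() converges to a mixed bluffing strategy with the lowest card.
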
  50. Computing Robust Counter-Strategies

    Authors
    Michael Johanson, Martin Zinkevich, and Michael Bowling
    Purpose
    Advances in Neural Information Processing Systems 20 (NIPS)
    Date
    2007
    Show Abstract.
    Adaptation to other initially unknown agents often requires computing an effective counter-strategy. In the Bayesian paradigm, one must find a good counter-strategy to the inferred posterior of the other agents' behavior. In the experts paradigm, one may want to choose experts that are good counter-strategies to the other agents' expected behavior. In this paper we introduce a technique for computing robust counter-strategies for adaptation in multiagent scenarios under a variety of paradigms. The strategies can take advantage of a suspected tendency in the decisions of the other agents, while bounding the worst-case performance when the tendency is not observed. The technique involves solving a modified game, and therefore can make use of recently developed algorithms for solving very large extensive games. We demonstrate the effectiveness of the technique in two-player Texas Hold'em. We show that the computed poker strategies are substantially more robust than best response counter-strategies, while still exploiting a suspected tendency. We also compose the generated strategies in an experts algorithm showing a dramatic improvement in performance over using simple best responses.
    Hide Abstract.
  51. Robust Strategies and Counter-Strategies: Building a Champion Level Computer Poker Player

    Authors
    Michael Johanson
    Purpose
    M. Sc. Thesis
    Date
    October 2007
    Show Abstract.
    Poker is a challenging game with strong human and computer players. In this thesis, we will explore four approaches towards creating a computer program capable of challenging these poker experts. The first approach is to approximate a Nash equilibrium strategy which is robust against any opponent. The second approach is to find an exploitive counter-strategy to an opponent. We will show that these counter-strategies are brittle: they can lose to arbitrary other opponents. The third approach is a compromise of the first two, to find robust counter-strategies. The fourth approach is to combine several of these agents into a team, and learn during a game which to use. As proof of the value of these techniques, we have used the resulting poker programs to win an event in the 2007 AAAI Computer Poker Competition and play competitively against two human poker professionals in the First Man-Machine Poker Championship.
    Hide Abstract.
  52. Particle Filtering for Dynamic Agent Modelling in Simplified Poker

    Authors
    Nolan Bard and Michael Bowling
    Purpose
    Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI)
    Date
    2007
    Show Abstract.
    Agent modelling is a challenging problem in many modern artificial intelligence applications. The agent modelling task is especially difficult when handling stochastic choices, deliberately hidden information, dynamic agents, and the need for fast learning. State estimation techniques, such as Kalman filtering and particle filtering, have addressed many of these challenges, but have received little attention in the agent modelling literature. This paper looks at the use of particle filtering for modelling a dynamic opponent in Kuhn poker, a simplified version of Texas Hold'em poker. We demonstrate effective modelling both against static opponents as well as dynamic opponents, when the dynamics are known. We then examine an application of Rao-Blackwellized particle filtering for doing dual estimation, inferring both the opponent's state as well as a model of its dynamics. Finally, we examine the robustness of the approach to incorrect beliefs about the opponent and compare it to previous work on opponent modelling in Kuhn poker.
    Hide Abstract.
  53. A New Algorithm for Generating Equilibria in Massive Zero-Sum Games

    Authors
    Martin Zinkevich, Michael Bowling, and Neil Burch
    Purpose
    Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI)
    Date
    2007
    Show Abstract.
    In normal scenarios, computer scientists often consider the number of states in a game to capture the difficulty of learning an equilibrium. However, players do not see games in the same light: most consider Go or Chess to be more complex than Monopoly. In this paper, we discuss a new measure of game complexity that links existing state-of-the-art algorithms for computing approximate equilibria to a more human measure. In particular, we consider the range of skill in a game, i.e. how many different skill levels exist. We then modify existing techniques to design a new algorithm to compute approximate equilibria whose performance can be captured by this new measure. We use it to develop the first near-Nash equilibrium for a four-round abstraction of poker, and show that it would have been able to handily win the bankroll competition from last year's AAAI poker competition.
    Hide Abstract.
  54. Postgame Analysis of Poker Decisions

    Authors
    Morgan Kan
    Purpose
    M. Sc. Thesis
    Date
    January 2007
    Show Abstract.
    In Artificial Intelligence research, evaluation is a recurring theme. A newly crafted game-playing program is interesting if it can be shown to be better by some measure. Usually, this is done by running a series of matches against other programs to generate a statistically significant result. In some games, particularly ones with imperfect information or stochastic elements, it may be infeasible to play enough games to achieve statistical significance. A new tool called DIVAT is presented for analyzing the quality of decisions in the game of poker. Instead of using the net monetary outcome of each game to evaluate the player, the tool focuses on the skill elements of the game. The expected values of the player's actions are compared to a reasonable baseline policy. The effect of stochasticity and imperfect information is greatly reduced, resulting in an unbiased low-variance estimator for the game of poker.
    Hide Abstract.
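    Schematically, this style of assessment scores decisions rather than money: each decision's expected value is compared with what a fixed baseline policy would have earned from the same state. The `ev` oracle and `baseline` policy below are placeholders; the real DIVAT baseline is a carefully constructed quasi-equilibrium betting sequence.

      def skill_score(decisions, ev, baseline):
          """Sum of per-decision EV gains over a fixed baseline policy.

          decisions: iterable of (state, action_taken) pairs for one player.
          ev(state, action): hindsight expected value of the action in that state.
          baseline(state): the action the baseline policy would take.
          Because the same chance outcomes feed both terms, their difference
          is far less noisy than the raw monetary result.
          """
          return sum(ev(s, a) - ev(s, baseline(s)) for s, a in decisions)
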
  55. Algorithms and Assessment in Computer Poker

    Authors
    Darse Billings
    Purpose
    Ph.D. Dissertation
    Date
    September 2006
    Show Abstract.
    The game of poker offers a clean well-defined domain in which to investigate some truly fundamental issues in computing science, such as how to handle deliberate misinformation, and how to make intelligent guesses based on partial knowledge. In the taxonomy of games, poker lies at the opposite end of the spectrum from well-studied board games like checkers and chess. Poker is a multi-player game with stochasticity (random events occurring over a known probability distribution), imperfect information (some information is hidden from each player), and partially observable outcomes (some information might never be known). Consequently, the use of deception, opponent modeling, and coping with uncertainty are indispensable elements of high-level strategic play. Traditional methods for computer game-playing are incapable of handling these properties. Writing a program to play a skillful game of poker is a challenging proposition from an engineering perspective as well, since each decision must be made quickly (typically about one second). A major theme of this dissertation is the evolution of architectures for poker-playing programs that has occurred since the research began in 1992. Four distinct approaches we address are: knowledge-based systems, simulation, game-theoretic methods, and adaptive imperfect information game-tree search. Each phase of the research explored the strengths and weaknesses of the corresponding architectures, both in theory and in practice. The important problem of obtaining an accurate assessment of performance is also addressed. The creation of a powerful tool for this purpose, called DIVAT, is discussed in detail. The academic papers arising from each of these studies constitute the core chapters of this thesis. The conclusion discusses some of the advantages and shortcomings of each approach, along with the most valuable lessons learned in the process. Although the goal of surpassing all human players has not yet been attained, the path has been cleared. The best poker programs are beginning to pose a serious threat, even in this most "human" of games. As programs continue to improve, they provide new insights into poker strategy, and valuable lessons on how to handle various forms of uncertainty.
    Hide Abstract.
  56. Optimal Unbiased Estimators For Evaluating Agent Performance

    Authors
    Martin Zinkevich, Michael Bowling, Nolan Bard, Morgan Kan, and Darse Billings
    Purpose
    Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI)
    Date
    2006
    Show Abstract.
    Evaluating the performance of an agent or group of agents can be, by itself, a very challenging problem. The stochastic nature of the environment plus the stochastic nature of agents' decisions can result in estimates with intractably large variances. This paper examines the problem of finding low variance estimates of agent performance. In particular, we assume that some agent-environment dynamics are known, such as the random outcome of drawing a card or rolling a die. Other dynamics are unknown, such as the reasoning of a human or other black-box agent. Using the known dynamics, we describe the complete set of all unbiased estimators, that is, for any possible unknown dynamics the estimate's expectation is always the agent's expected utility. Then, given a belief about the unknown dynamics, we identify the unbiased estimator with minimum variance. If the belief is correct our estimate is optimal, and if the belief is wrong it is at least unbiased. Finally, we apply our unbiased estimator to the game of poker, demonstrating dramatically reduced variance and faster evaluation.
    Hide Abstract.
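    The flavour of the construction can be shown with a simple control variate: when a chance event's distribution is known, subtracting its centered outcome from the result stays unbiased for any unknown agent behaviour while removing the luck term. The dice game below is a toy assumption, not the paper's poker estimator.

      import random

      def estimates(n=100000):
          """Naive vs. luck-corrected estimates of a player's expected score.

          Score = die roll (pure luck, known mean 3.5) + 1 if the player makes
          the right choice (skill, probability 0.8 here). Both estimators are
          unbiased for 3.5 + 0.8; the corrected one has far lower variance.
          """
          naive, corrected = [], []
          for _ in range(n):
              roll = random.randint(1, 6)
              skill = 1.0 if random.random() < 0.8 else 0.0
              outcome = roll + skill
              naive.append(outcome)
              corrected.append(outcome - (roll - 3.5))   # subtract centered luck
          return sum(naive) / n, sum(corrected) / n

      print(estimates())   # both are about 4.3; the corrected samples vary far less
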
  57. A Tool for the Direct Assessment of Poker Decisions

    Authors
    Darse Billings, Morgan Kan
    Purpose
    The International Computer Games Association Journal, vol 29(3), pp. 119-142, October 2006
    Date
    2006
    Show Abstract.
    The element of luck permeates every aspect of the game of poker. Random stochastic outcomes introduce a large amount of noise that can make it very difficult to distinguish a good player from a bad one, much less to quantify small differences in skill. Good methods for directly assessing the quality of decisions exist for other stochastic games, including backgammon and blackjack. However, those are perfect information games, where the notion of an objectively best move is well-defined, which unfortunately is not the case for imperfect information games, in general. The Ignorant Value Assessment Tool, DIVAT, uses perfect knowledge (hindsight) analysis to quantify the value of each player decision made in a game of two-player Limit Texas Hold'em. Comparisons are made against a realistic baseline betting sequence, which is based on quasi-equilibrium policies and game-theoretic invariant frequencies for raising and folding. Much of the relevant context involved in each decision is simply ignored, out of necessity; but enough is retained to provide a reasonably accurate yardstick, which is then applied equally to all players. The frequency and magnitude of expected value differences between player actions, relative to the baseline, provides a statistically unbiased low-variance estimate of the long-term expected outcome between the players.
    Hide Abstract.
  58. The Effectiveness of Opponent Modelling in a Small Imperfect Information Game

    Authors
    Bret Hoehn
    Purpose
    M. Sc. Thesis
    Date
    Spring 2006
    Show Abstract.
    Opponent modelling is an important issue in games programming today. Programs which do not perform opponent modelling are unlikely to take full advantage of the mistakes made by an opponent. Additionally, programs which do not adapt over time become less of a challenge to players, causing these players to lose interest. While opponent modelling can be a difficult challenge in perfect information games, where the full state of the game is known to all players at all times, it becomes an even more difficult task in games of imperfect information, where players are not always able to observe the actual state of the game. This thesis studies the problem of opponent modelling in Kuhn Poker, a small imperfect information game that contains several properties that make real-world poker games interesting. Two basic types of opponent modelling are studied, explicit modelling and implicit modelling, and their effectiveness is compared.
    Hide Abstract.
  59. Opponent Modelling and Search in Poker

    Authors
    Terence Schauenberg
    Purpose
    M. Sc. Thesis
    Date
    2006
    Show Abstract.
    Poker is a challenging domain that contains both elements of chance and imperfect information. Though progress has been made in the domain, there is still one major stumbling block on the way to creating a world-class calibre computer player. This is the task of learning how an opponent plays (i.e., opponent modelling) and subsequently coming up with a counter-strategy that can exploit that information. The work in this thesis explores this task. A program is implemented that models the opponent through game play and then plans an exploitive counter-strategy using expectimax search. This program is evaluated using two different heads-up limit poker variations: a small-scale variation called Leduc Hold'em, and a full-scale one called Texas Hold'em.
    Hide Abstract.
  60. Effective Short-Term Opponent Exploitation in Simplified Poker

    Authors
    Bret Hoehn, Finnegan Southey, Robert C. Holte, and Valeriy Bulitko
    Purpose
    AAAI'05
    Date
    2005
    Show Abstract.
    Uncertainty in poker stems from two key sources, the shuffled deck and an adversary whose strategy is unknown. One approach is to find a pessimistic game theoretic solution (i.e. a Nash equilibrium), but human players have idiosyncratic weaknesses that can be exploited if a model of their strategy can be learned by observing their play. However, games against humans last for at most a few hundred hands so learning must be fast to be effective. We explore two approaches to opponent modelling in the context of Kuhn poker, a small game for which game theoretic solutions are known. Parameter estimation and expert algorithms are both studied. Experiments demonstrate that, even in this small game, convergence to maximally exploitive solutions in a small number of hands is impractical, but that good (i.e. better than Nash or breakeven) performance can be achieved in a short period of time. Finally, we show that amongst a set of strategies with equal game theoretic value, in particular the set of Nash equilibrium strategies, some are preferable because they speed learning of the opponent's strategy by exploring it more effectively.
    Hide Abstract.
  61. Bayes' Bluff: Opponent Modelling in Poker

    Authors
    Finnegan Southey, Michael Bowling, Bryce Larson, Carmelo Piccione, Neil Burch, Darse Billings, and Chris Rayner
    Purpose
    Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI)
    Date
    2005
    Show Abstract.
    Poker is a challenging problem for artificial intelligence, with non-deterministic dynamics, partial observability, and the added difficulty of unknown adversaries. Modelling all of the uncertainties in this domain is not an easy task. In this paper we present a Bayesian probabilistic model for a broad class of poker games, separating the uncertainty in the game dynamics from the uncertainty of the opponent's strategy. We then describe approaches to two key subproblems: (i) inferring a posterior over opponent strategies given a prior distribution and observations of their play, and (ii) playing an appropriate response to that distribution. We demonstrate the overall approach on a reduced version of poker using Dirichlet priors and then on the full game of Texas hold'em using a more informed prior. We demonstrate methods for playing effective responses to the opponent, based on the posterior.
    Hide Abstract.
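    Subproblem (i) is conjugate when each opponent decision point is modelled as a multinomial with a Dirichlet prior, as in the paper's reduced-game experiments. A minimal sketch of the posterior update and the resulting predictive strategy:

      from collections import Counter

      def posterior(prior_alpha, observations):
          """Dirichlet-multinomial update for one opponent decision point.

          prior_alpha: dict mapping action -> Dirichlet pseudo-count.
          observations: iterable of actions seen at this decision point.
          """
          counts = Counter(observations)
          return {a: prior_alpha[a] + counts.get(a, 0) for a in prior_alpha}

      def predictive(alpha):
          """Posterior-mean strategy, i.e. the predictive action distribution."""
          total = sum(alpha.values())
          return {a: v / total for a, v in alpha.items()}

      alpha = posterior({"fold": 1, "call": 1, "raise": 1},
                        ["call", "call", "raise"])
      print(predictive(alpha))   # {'fold': 1/6, 'call': 3/6, 'raise': 2/6}
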
  62. Game Tree Search with Adaptation in Stochastic Imperfect Information Games

    Authors
    Darse Billings, Aaron Davidson, Terence Schauenberg, Neil Burch, Michael Bowling, Robert Holte, Jonathan Schaeffer, and Duane Szafron
    Purpose
    Computers and Games (CG)
    Date
    2004
    Show Abstract.
    Building a high-performance poker-playing program is a challenging project. The best program to date, PsOpti, uses game theory to solve a simplified version of the game. Although the program plays reasonably well, it is oblivious to the opponent's weaknesses and biases. Modeling the opponent to exploit predictability is critical to success at poker. This paper introduces Vexbot, a program that uses a game tree search algorithm to compute the expected value of each betting option, and does real-time opponent modeling to improve its evaluation function estimates. The result is a program that defeats PsOpti convincingly, and poses a much tougher challenge for strong human players.
    Hide Abstract.
  63. Approximating Game-Theoretic Optimal Strategies for Full-scale Poker

    Authors
    Darse Billings, Neil Burch, Aaron Davidson, Robert Holte, Jonathan Schaeffer, Terence Schauenberg, and Duane Szafron
    Purpose
    Proceedings of the 2003 International Joint Conference on Artificial Intelligence
    Date
    2003
    Show Abstract.
    The computation of the first complete approximations of game-theoretic optimal strategies for full-scale poker is addressed. Several abstraction techniques are combined to represent the game of 2-player Texas Hold'em, having size O(10^18), using closely related models each having size O(10^7). Despite the reduction in size by a factor of 100 billion, the resulting models retain the key properties and structure of the real game. Linear programming solutions to the abstracted game are used to create substantially improved poker-playing programs, able to defeat strong human players and be competitive against world-class opponents.
    Hide Abstract.
  64. The Challenge of Poker

    Authors
    Darse Billings, Aaron Davidson, Jonathan Schaeffer, and Duane Szafron
    Purpose
    Artificial Intelligence Journal, vol 134(1-2)
    Date
    2002
    Show Abstract.
    Poker is an interesting test-bed for artificial intelligence research. It is a game of imperfect information, where multiple competing agents must deal with probabilistic knowledge, risk assessment, and possible deception, not unlike decisions made in the real world. Opponent modeling is another difficult problem in decision-making applications, and it is essential to achieving high performance in poker. This paper describes the design considerations and architecture of the poker program Poki. In addition to methods for hand evaluation and betting strategy, Poki uses learning techniques to construct statistical models of each opponent, and dynamically adapts to exploit observed patterns and tendencies. The result is a program capable of playing reasonably strong poker, but there remains considerable research to be done to play at a world-class level.
    Hide Abstract.
  65. Opponent Modeling in Poker: Learning and Acting in a Hostile and Uncertain Environment

    Authors
    Aaron Davidson
    Purpose
    M. Sc. Thesis
    Date
    2002
    Show Abstract.
    Artificial intelligence research has had great success in many classic games such as chess, checkers, and Othello. In these perfect-information domains, alpha-beta search is sufficient to achieve a high level of play. However, artificial intelligence research has long been criticized for focusing on deterministic domains of perfect information -- many problems in the real world exhibit properties of imperfect or incomplete information and non-determinism. Poker, the archetypal game studied by game-theorists, is a challenging domain containing both imperfect information and non-deterministic elements. The random shuffling of cards provides a great deal of uncertainty. The cards held by an opponent or in the remaining deck provide the imperfect information. A poker player has only a small incomplete description of the entire game-state. Furthermore, opponents play deceptively in order to hide information about the cards they hold. These properties prevent traditional game-tree search methods from working. In order to build a high-performance poker playing agent, a multitude of difficult problems must be solved in machine learning, search and simulation, and belief management. Poki, a state-of-the-art poker-playing program, is discussed, with focus on the problem of opponent modelling. Several methods of opponent modelling are evaluated, including neural networks and decision trees. Poki can currently play poker at the level of an average human player. We introduce a new method of directly searching imperfect information game trees, called miximax. Like minimax search, we do a search of the game-tree. Since the objective best moves of the opponent cannot be determined (we do not know their cards), we cannot choose a best move in the search. Instead, we use our knowledge of the opponent to choose their mixed strategy. Instead of using a min, we use a mix of their choices. The result is a reliable estimate of the expected value of our choices.
    Hide Abstract.
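    The miximax idea on a toy tree: maximize at our decision nodes, and at opponent nodes, where no objective best move exists, take an expectation under the opponent model's mixed strategy. The tree encoding and the fixed 70/30 call model below are illustrative assumptions.

      def miximax(node, model):
          """EV of a game-tree node: max at our nodes, a model-weighted mix
          (instead of an adversarial min) at opponent nodes."""
          if not isinstance(node, dict):          # leaf: a payoff for us
              return node
          values = {a: miximax(child, model)
                    for a, child in node["kids"].items()}
          if node["who"] == "us":
              return max(values.values())
          probs = model(node)                     # opponent's mixed strategy
          return sum(probs[a] * v for a, v in values.items())

      # We bet or check; a loose opponent calls 70% of the time after a bet.
      tree = {"who": "us", "kids": {
          "bet":   {"who": "opp", "kids": {"call": 2.0, "fold": 1.0}},
          "check": 0.5,
      }}
      model = lambda node: {"call": 0.7, "fold": 0.3}
      print(miximax(tree, model))   # bet: 0.7*2 + 0.3*1 = 1.7 > 0.5, so we bet
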
  66. Improved Opponent Modeling in Poker

    Authors
    Aaron Davidson, Darse Billings, Jonathan Schaeffer, and Duane Szafron
    Purpose
    Proceedings of the 2000 International Conference on Artificial Intelligence (ICAI'2000)
    Date
    2000
    Show Abstract.
    The game of poker has many properties that make it an interesting topic for artificial intelligence (AI). It is a game of imperfect information, which relates to one of the most fundamental problems in computer science: how to handle knowledge that may be erroneous or incomplete. Poker is also one of the few games to be studied where deriving an accurate understanding of each opponent's style is an essential element to success. In developing a strong poker program, the opponent modeling method has always been a central component of the system. As other aspects of the program were improved, the techniques for modeling once again became a limiting factor to the overall level of play. As a result, the topic has been revisited. This paper reports on recent progress achieved by improved statistical methods, which were suggested by experiments using artificial neural networks.
    Hide Abstract.
  67. Probabilities and Simulations in Poker

    Authors
    Lourdes Peña
    Purpose
    M. Sc. Thesis
    Date
    September 10, 1999
    Show Abstract.
    Poker is an imperfect information game that requires decision-making under conditions of uncertainty, much like many real-world applications. Strong poker players have to skillfully deal with multiple opponents, risk management, opponent modeling, deception and unreliable information. These features make poker an interesting area for Artificial Intelligence research. This thesis describes work done on improving the knowledge representation, betting strategy, and opponent modeling of Loki, a poker-playing program at the University of Alberta. First, a randomized betting strategy that returns a probability triple is introduced. A probability triple is a probabilistic representation of betting decisions that indicates the likelihood of each betting action occurring in a given situation. Second, real-time simulations are used to compute the expected values of betting decisions. These simulations use selective sampling to maximize the information obtained with each simulation trial. Experimental results show that each of these enhancements represents a major advance in the strength of Loki.
    Hide Abstract.
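    A probability triple is simply a distribution over the three betting actions from which the agent samples, rather than a deterministic recommendation. A minimal sketch of the representation (the thesis embeds it throughout Loki's architecture):

      import random

      def act(triple):
          """Sample a betting action from a (fold, call, raise) probability triple."""
          fold, call, raise_ = triple
          assert abs(fold + call + raise_ - 1.0) < 1e-9
          return random.choices(["fold", "call", "raise"], weights=triple)[0]

      print(act((0.0, 0.7, 0.3)))   # never folds; calls 70%, raises 30%

    Randomizing the final action in this way also makes the agent's play harder to predict than a deterministic rule producing the same average behaviour.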
  68. Using Probabilistic Knowledge and Simulation to Play Poker

    Authors
    Darse Billings, Lourdes Pena, Jonathan Schaeffer, and Duane Szafron
    Purpose
    Proceedings of the Sixteenth National Conference of the American Association for Artificial Intelligence (AAAI)
    Date
    1999
    Show Abstract.
    Until recently, artificial intelligence researchers who use games as their experimental testbed have concentrated on games of perfect information. Many of these games have been amenable to brute-force search techniques. In contrast, games of imperfect information, such as bridge and poker, contain hidden information making similar search techniques impractical. This paper describes recent progress in developing a high-performance poker-playing program. The advances come in two forms. First, we introduce a new betting strategy that returns a probabilistic betting decision, a probability triple, that gives the likelihood of a fold, call or raise occurring in a given situation. This component unifies all the expert knowledge used in the program, does a better job of representing the type of decision making needed to play strong poker, and improves the way information is propagated throughout the program. Second, real-time simulations are used to compute the expected values of betting decisions. The program generates an instance of the missing data, subject to any constraints that have been learned, and then simulates the rest of the game to determine a numerical result. By repeating this a sufficient number of times, a statistically meaningful sample is used in the program's decision-making process. Experimental results show that these enhancements each represent major advances in the strength of computer poker programs.
    Hide Abstract.
  69. Learning to Play Strong Poker

    Authors
    Jonathan Schaeffer, Darse Billings, Lourdes Pena, and Duane Szafron
    Purpose
    International Conference on Machine Learning 1999 Workshop Machine Learning in Game Playing (ICML)
    Date
    1999
    Show Abstract.
    Poker is an interesting test-bed for artificial intelligence research. It is a game of imperfect knowledge, where multiple competing agents must deal with risk management, opponent modeling, unreliable information, and deception, much like decision-making applications in the real world. Opponent modeling is one of the most difficult problems in decision-making applications and in poker it is essential to achieving high performance. This paper describes and evaluates the implicit and explicit learning in the poker program Loki. Loki implicitly "learns" sophisticated strategies by selectively sampling likely cards for the opponents and then simulating the remainder of the game. The program has explicit learning for observing its opponents, constructing opponent models and dynamically adapting its play to exploit patterns in the opponents' play. The result is a program capable of playing reasonably strong poker, but there remains considerable research to be done to play at a world-class level.
    Hide Abstract.
  70. Using Selective-Sampling Simulations in Poker

    Authors
    Darse Billings, Denis Papp, Lourdes Pena, Jonathan Schaeffer, and Duane Szafron
    Purpose
    Proceedings of the AAAI Spring Symposium on Search Techniques for Problem Solving under Uncertainty and Incomplete Information
    Date
    1999
    Show Abstract.
    Until recently, AI research that used games as an experimental testbed has concentrated on perfect information games. Many of these games have been amenable to so-called brute-force search techniques. In contrast, games of imperfect information, such as bridge and poker, contain hidden knowledge making similar search techniques impractical. This paper describes work being done on developing a world-class poker-playing program. Part of the program's playing strength comes from real-time simulations. The program generates an instance of the missing data, subject to any constraints that have been learned, and then searches the game tree to determine a numerical result. By repeating this a sufficient number of times, a statistically meaningful sample can be obtained to be used in the program's decision-making process. For constructing programs to play two-player deterministic perfect information games, there is a well-defined framework based on the alpha-beta search algorithm. For imperfect information games, no comparable framework exists. In this paper we propose selective sampling simulations as a general-purpose framework for building programs to achieve high performance in imperfect information games.
    Hide Abstract.
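    The proposed framework reduced to its skeleton: repeatedly instantiate the hidden cards from a biased, constraint-aware distribution, simulate the remainder of the game, and average the outcomes. The `sample_hidden` and `rollout` callables are placeholders supplied by the program; biasing `sample_hidden` toward likely opponent holdings is what distinguishes selective sampling from uniform Monte Carlo.

      def selective_sampling_ev(state, sample_hidden, rollout, trials=1000):
          """Estimate the EV of `state` by selective-sampling simulation.

          sample_hidden(state): draws one plausible assignment of the hidden
              cards, weighted by whatever has been learned about the opponents.
          rollout(state, hidden): simulates the remainder of the game with the
              hidden cards instantiated and returns its numerical outcome.
          """
          total = 0.0
          for _ in range(trials):
              hidden = sample_hidden(state)
              total += rollout(state, hidden)
          return total / trials
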
  71. Opponent Modeling in Poker

    Authors
    Darse Billings, Denis Papp, Jonathan Schaeffer, and Duane Szafron
    Purpose
    Proceedings of the Fifteenth National Conference of the American Association for Artificial Intelligence (AAAI)
    Date
    1998
    Show Abstract.
    Poker is an interesting test-bed for artificial intelligence research. It is a game of imperfect knowledge, where multiple competing agents must deal with risk management, agent modeling, unreliable information and deception, much like decision-making applications in the real world. Agent modeling is one of the most difficult problems in decision-making applications and in poker it is essential to achieving high performance. This paper describes and evaluates Loki, a poker program capable of observing its opponents, constructing opponent models and dynamically adapting its play to best exploit patterns in the opponents' play.
    Hide Abstract.
  72. Poker as a Testbed for Machine Intelligence Research

    Authors
    Darse Billings, Denis Papp, Jonathan Schaeffer, and Duane Szafron
    Purpose
    Proceedings of AI'98, (Canadian Society for Computational Studies in Intelligence)
    Date
    1998
    Show Abstract.
    For years, games researchers have used chess, checkers and other board games as a testbed for machine intelligence research. The success of world-championship-caliber programs for these games has resulted in a number of interesting games being overlooked. Specifically, we show that poker can serve as a better testbed for machine intelligence research related to decision making problems. Poker is a game of imperfect knowledge, where multiple competing agents must deal with risk management, agent modeling, unreliable information and deception, much like decision-making applications in the real world. The heuristic search and evaluation methods successfully employed in chess are not helpful here. This paper outlines the difficulty of playing strong poker, and describes our first steps towards building a world-class poker-playing program.
    Hide Abstract.
  73. Dealing with Imperfect Information in Poker

    Authors
    Denis Papp
    Purpose
    M. Sc. Thesis
    Date
    November 30, 1998
    Show Abstract.
    Poker provides an excellent testbed for studying decision-making under conditions of uncertainty. There are many benefits to be gained from designing and experimenting with poker programs. It is a game of imperfect knowledge, where multiple competing agents must understand estimation, prediction, risk management, deception, counter-deception, and agent modeling. New evaluation techniques for estimating the strength and potential of a poker hand are presented. This thesis describes the implementation of a program that successfully handles all aspects of the game, and uses adaptive opponent modeling to improve performance.
    Hide Abstract.
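    One of the evaluation statistics the thesis develops, hand strength, can be computed by direct enumeration: the fraction of possible opponent holdings currently beaten, counting ties as half. The `rank` argument below is a placeholder for a standard poker hand evaluator, and the card representation is left to the caller.

      from itertools import combinations

      def hand_strength(hole, board, deck, rank):
          """Probability our current hand beats one random opponent holding.

          hole:  our two hole cards.        board: the visible community cards.
          deck:  remaining unseen cards.    rank:  hand evaluator (higher = better).
          """
          ours = rank(hole + board)
          ahead = tied = behind = 0
          for opp in combinations(deck, 2):       # enumerate opponent holdings
              theirs = rank(list(opp) + board)
              if ours > theirs:
                  ahead += 1
              elif ours == theirs:
                  tied += 1
              else:
                  behind += 1
          total = ahead + tied + behind
          return (ahead + tied / 2) / total
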
  74. Computer Poker

    Authors
    Darse Billings
    Purpose
    M. Sc. Thesis
    Date
    August 30, 1995
    Show Abstract.
    Games are an interesting and challenging domain for computer science research, having the nice characteristics of a clearly defined set of rules and a specific goal. Developing a program to play a strategic game well often involves the application of theoretical concepts to practical situations, and the relative success of that endeavour can be measured with quantifiable results. The game of poker is logistically simple yet strategically complex, and offers many properties not exhibited by chess, checkers, and most other well-studied games. Most importantly, poker is a non-deterministic game with imperfect (hidden) information. Handling unreliable or incomplete information is a fundamental problem in computer science, and poker provides an excellent domain for investigating problems of decision making under conditions of uncertainty. Somewhat surprisingly, the potential benefits of studying poker have been largely overlooked by computer scientists and game researchers. In this essay we survey what resources are available for academic researchers, and lay some foundations for the scientific exploration of this fascinating game.
    Hide Abstract.