# jemdoc: addcss{rbh.css}, addcss{jacob.css}
= 2-player 0-sum imperfect information games
~~~
{}{raw}
rps
mmx
lp
nash
cfr
~~~
~~~
{}{raw}
rock paper scissors
~~~
~~~
- play with a classmate
- is your strategy equiprobabe ?
-- yes: why ?
-- no: what is your strategy ? how did you model your opponent ?
~~~
~~~
{rps payoff matrix}{}
two players: row, column
payoffs for row column's strategies
r p s
r 0 -1 1
row's strategies p 1 0 -1
s -1 1 0
~~~
~~~
{}{raw}
minimax
~~~
~~~
- minimax value:
the minimum, over all player\/opponent strategy pairs,
of the maximum the opponent can score
- for 2-player 0-sum games ({{so op't score \+ player score = 0}})
-- if each player plays a fixed strategy for the whole game,
minimax value is easily computed
- e.g. assume row plays equiprobable: rps each 1\/3
- assume col plays rps resp. probs y1 y2 y3
- then expected row payoff {{1/3(y2-y3) + 1/3(y1-y3) + 1/3(y2-y1) = 0}}
- so row's minimax value is at least 0 (some other row strategy might do better)
- but, by symmetry (payoff matrix is symmetric), col's expected value
is 0 by playing equiprobable
- so, for this game, row's minimax value is 0, and can be achieved
by playing equiprobable
~~~
~~~
{}{raw}
2p 0sum minimax via linear programming
~~~
~~~
{}{}
- assume col knows row's strategy
- then, for this row-strat, some pure col strategy is best for col
- so row wants to pick a mixed strategy that maximizes
the minimum score row gets, over all pure col strats
- so row wants to pick x1 x2 x3 to maximize
min x2 - x3, (row payoff against col strat r)
x3 - x1, (" " " " " p)
x1 - x2 (" " " " " s)
- so row wants to solve the LP
max z st
z <= x2 - x3
z <= -x1 + x3
z <= x1 - x2
x1 + x2 + x3 = 1
x1,x2,c3 >= 0
- similarly, col wants to solve an LP that turns out to be the dual of the above
- notice that the above LP has a feasible solution and z is bounded
- so the duality theorem of LP tells us that each problem has the same solution
- we have already seen that (1/3 1/3 1/3) is the primal solution
~~~
~~~
{}{raw}
nash equilibrium
~~~
~~~
- [https://en.wikipedia.org/wiki/Nash_equilibrium wiki]
- for 2-player 0-sum games, the above LP result implies
that the minimax solution (assuming each player has a fixed mixed
strategy for the whole game) is the only Nash equilibrium
- [https://en.wikipedia.org/wiki/Nash_equilibrium\#Prisoner.27s_dilemma prisoner's dilemma]
-- if player's trust each other, they can cooperate and do better than the minimax value
-- but this combination of strategies is not a Nash equilibrium
- [https://en.wikipedia.org/wiki/Nash_equilibrium\#Occurrence sufficient conditions]
~~~
~~~
{}{raw}
counterfactual regret
~~~
~~~
- [http://modelai.gettysburg.edu/2013/cfr/ intro]
- [http://webdocs.cs.ualberta.ca/~hayward/670gga/jem/burch.pdf Burch slides]
- [http://jeskola.net/cfr/ tammelin's cfr (with demo)]
~~~