so far we have discussed only 2-player alternate-turn games (Black move, White move, Black move, White move) such as go, ttt, checkers, chess, hex …
let's talk about the classic 2-player simultaneous-turn game: rock-paper-scissors
typically, when humans play, each player guesses the opponent's next move and then plays that move's killer
consider a 3-game match between Alice and Bob
Alice knows that Bob likes to play rock, so she plays paper
game 1: A paper, B rock, A-B 1-0
Bob now thinks that Alice might play paper again, so he plays scissors
game 2: A rock, B scissors, A-B 2-0
Bob thinks that Alice might play rock again, so he plays paper
game 3: A scissors, B paper, A-B 3-0
what happened in that A-B match?
here is Alice's model of Bob's behaviour (her prediction of Bob's next move based on his previous moves):
in game 1 he plays rock
after a game that he loses, Bob will play the move that killed Alice's winning move
Alice's opponent model was perfect, so Alice won all games
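the match above can be sketched in a few lines of python (a minimal sketch; the dict and function names are mine, not from the notes, and the code just encodes the model stated above: game 1 Bob plays rock, and after a loss Bob plays the killer of Alice's winning move):

```python
# which move beats which: KILLER[x] is the move that beats x
KILLER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def predicted_bob_move(history):
    # Alice's model of Bob: rock in game 1; after a loss,
    # the killer of Alice's last (winning) move
    if not history:
        return "rock"
    alice_last, _ = history[-1]
    return KILLER[alice_last]

def play_match(n_games=3):
    history = []
    for _ in range(n_games):
        b = predicted_bob_move(history)  # Alice predicts Bob's move ...
        a = KILLER[b]                    # ... and plays its killer
        history.append((a, b))
    return history

# since the model is perfect, Alice wins every game: 3-0
```

running play_match() replays the whole match, and changing predicted_bob_move is all it takes to model a different Bob.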
what happens if Bob realizes what Alice is doing, namely, if he learns Alice's opponent model?
answer: Bob can change his strategy to one that exploits Alice's strategy
for a hilarious illustration of adaptive opponent modelling in a 2-player game, see the Sicilian poisoned chalice scene in The Princess Bride
the downside of basing your strategy on an opponent model is that you can lose badly if your model is inaccurate
is there a way to avoid opponent modelling altogether and just be conservative? can we use the minimax principle? what strategy should Alice follow if she wants to maximize her worst-case expected score over all possible opponent strategies? is this possible?
yes! here's how
instead of playing deterministically (for each given Bob-game history, play a particular move), Alice will play stochastically
before the match, Alice picks probability values r, p, s (each is positive, their sum is one)
then, in each game, she plays rock with prob. r, paper with prob. p, scissors with prob. s
assume r p s = .5 .3 .2 respectively
whenever Bob plays rock
A plays rock .5, result draw, expected payoff (wins - losses) for Alice is .5*0 = 0
A plays paper .3, result win, expected payoff for Alice is .3*1 = .3
A plays scissors .2, result loss, expected payoff for Alice is .2*-1 = -.2
so A's total expected payoff when Bob plays rock is 0 + .3 - .2 = .1
exercise: what is A's expected payoff when Bob plays paper?
exercise: what is A's expected payoff when Bob plays scissors?
conclusion: with r p s = .5 .3 .2, Alice will have a positive expected payoff if Bob plays rock all the time
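the calculation above can be sketched in python (a minimal sketch; the PAYOFF table and function names are mine, not from the notes):

```python
# Alice's payoff when she plays the first move and Bob plays the second
PAYOFF = {
    ("rock", "rock"): 0,      ("rock", "paper"): -1,     ("rock", "scissors"): 1,
    ("paper", "rock"): 1,     ("paper", "paper"): 0,     ("paper", "scissors"): -1,
    ("scissors", "rock"): -1, ("scissors", "paper"): 1,  ("scissors", "scissors"): 0,
}

def expected_payoff(strategy, bob_move):
    # strategy maps each move to the probability Alice plays it
    return sum(prob * PAYOFF[(move, bob_move)] for move, prob in strategy.items())

alice = {"rock": 0.5, "paper": 0.3, "scissors": 0.2}
expected_payoff(alice, "rock")  # 0.5*0 + 0.3*1 + 0.2*(-1), i.e. .1 (up to float rounding)
```

running it with "paper" or "scissors" answers the two exercises, and swapping in the uniform strategy 1/3 1/3 1/3 reproduces the 0 payoffs below.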
assume r p s = 1/3 1/3 1/3 respectively
exercise: what is A's expected payoff when Bob plays rock? (ans: 0)
exercise: what is A's expected payoff when Bob plays paper? (ans: 0)
exercise: what is A's expected payoff when Bob plays scissors? (ans: 0)
conclusion: with r p s = 1/3 1/3 1/3 Alice will have a 0 expected payoff
assume Alice plays strategy S = 1/3 1/3 1/3
with S Alice has expected payoff 0 for each move of Bob
so with S Alice has expected payoff 0 against every Bob strategy (any Bob strategy is a probability-weighted mix of the three moves, so its payoff is a weighted average of 0s)
also, rock-paper-scissors is fair: each player has the same move options and payoffs
so Bob has a strategy T = 1/3 1/3 1/3
against T, every Alice strategy has expected payoff 0 for Bob (and so also 0 for Alice, because rock-paper-scissors is a zero-sum game)
so Alice cannot do better than expected payoff 0 against every possible Bob strategy
her minimax score for this game is at least 0 (by following S)
her minimax score for this game is less than or equal to 0 (because Bob can play T)
so Alice's minimax score for rock-paper-scissors is this: expected payoff 0
strategy S gets this score for Alice
the entries of this matrix show Alice's payoff in rock-paper-scissors

                      Bob's moves
                    R     P     S
  Alice's     R     0    -1     1
  moves       P     1     0    -1
              S    -1     1     0
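the matrix makes it easy to check the minimax argument numerically: Alice's expected payoff when she mixes with probabilities a and Bob mixes with probabilities b is the sum over all move pairs of a[i]*M[i][j]*b[j] (a minimal sketch; the variable names are mine, with moves ordered rock, paper, scissors):

```python
# Alice's payoff matrix, rows = Alice's move, columns = Bob's move
M = [[ 0, -1,  1],
     [ 1,  0, -1],
     [-1,  1,  0]]

def expected_payoff(a, b):
    # a, b: probability vectors over (rock, paper, scissors)
    return sum(a[i] * M[i][j] * b[j] for i in range(3) for j in range(3))

uniform = [1/3, 1/3, 1/3]
# against Bob's uniform strategy T, every Alice strategy scores 0
for alice in ([1, 0, 0], [0.5, 0.3, 0.2], uniform):
    assert abs(expected_payoff(alice, uniform)) < 1e-12
```

each row of M sums to 0, which is exactly why mixing uniformly wipes out any opponent's edge.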