recall minimax
for a game-tree, find root minimax value from leaf values
why minimax?
it gives best worst-case score, i.e. against all possible opponent strategies
now let's pick a game and use minimax to solve (find minimax value) arbitrary states
need a game with relatively small state space, say tic-tac-toe
ancient, many variants: terni lappilli 3-men's morris
Est genus in totidem tenui ratione redactum scriptula, quot menses lubricus annus habet; Parua tabella capit ternos utrimque lapillos, in qua uicisse est continuasse suos. There is another game divided into as many parts as there are months in the year; A small board has three pieces on either side, the winner must get all the pieces in a straight line.
usual minimax pseudocode
good coding practise: avoid code duplication
do we need 2 cases (p1 move? p2 move?)
negamax: at each code, compute minimax value for player to move
now only 1 case
if score for other player needed, use this:
score_for_p2(node) == negative( score_for_p1(node) )
so, in negamax format, at each node:
select move that maximizes neg( score(child) )
warning
when using negamax, make sure that leaf scores are for player-to-move (minimax assumes all node scores are for 1st-player, so maximizer)
to convert leaf scores in a minimax tree to equivalent leaf scores in a negamax tree:
negate leaf scores whose distance-to-root is odd
leaf scores whose distance-to-root is even do not need to be changed (because their player-to-move is max)
def eval(s): for player who would move next, return score of s def negamax(s): if terminal(s): return eval(s) else: return max(for all children c of s, -negamax(c))
at each node, score is for *** player-to-move *** p1 to move * / | \ / | \ p2 to move * * * / | \ / \ / | \ p1 to move * * * * * * * * p1 score 2 3 -1 1 6 -4 2 9 p1 score 1 / | \ / | \ p2 score 1 -1 4 / | \ / \ / | \ p1 score * * * * * * * * 2 3 -1 1 6 -4 2 9
for x to move, x . o draw (part of) the x . . game tree o . . then draw (part of) the minimax tree then draw the proof tree, also called solution tree (show best move winner)
x . o x . . o . . xxo x.o x.o x.o x.o x.. xx. x.x x.. x.. o.. o.. o.. ox. o.x xxo xxo xxo xxo x.o x.o x.o x.o xoo x.o x.o x.o xo. x.o x.. x.. xx. xx. xx. xx. x.x xox x.x x.x ... o.. o.. oo. o.o o.. o.. o.. o.. o.. o.. oo. o.o xxo xxo xxo xxo xxo xxo xxo xxo xxo xxo x.o x.o xxo x.o x.o xx. x.x x.. xx. x.x x.. xx. xxx xx. ... o.. ox. o.x oo. oo. oox o.o o.o oxo o.. o.. o.x (two more levels)
x . o this state has x . . x minimax o . . value 1 o: 1 -1 1 1 1 xxo x.o x.o x.o x.o x.. xx. x.x x.. x.. o.. o.. o.. ox. o.x x: -1 0 0 0 1 1 1 1 1 -1 1 1 xxo xxo xxo xxo xoo x.o x.o x.o xoo x.o x.o x.o xo. x.o x.. x.. xx. xxo xx. xx. x.x xox x.x x.x ... o.. o.. oo. o.o o.. o.. oo. o.o o.. o.. oo. o.o o: 0 1 1 1 1 1 1 1 1 -1 0 -1 xxo xxo xxo xxo xxo xxo xxo xxo xxo xoo xoo xoo xxo x.o x.o xx. x.x x.. xx. x.x x.. xxx xx. xx. o.. ox. o.x oo. oo. oox o.o o.o oxo o.. ox. o.x (two more levels)
for winner, x . o show winning move x . . o . . | for opponent, x . o show all moves x x . o . . / / \ \ / / \ \ x o o x . o x . o x . o x x . x x o x x . x x . o . . o . . o o . o . o | | | | x o o x . o x . o x . o x x x x x o x x x x x x o . . o . x o o . o . o
def negamax(psn, ptm): # position, player-to-move if psn.has_win(ptm): return 1 # previous move created win L = psn.legal_moves() if len(L) == 0: return 0 # board full, so draw so_far = -1 # best score so far for cell in L: psn.brd[cell] = ptm nmx = negamax(psn, opponent(ptm)) so_far = max(so_far,-nmx) psn.brd[cell] = Cell.e # reset brd by erasing cell return so_far # solves 3x3 ttt in 740 170 calls # how many nodes in search tree? # root 1 node at level 0 # at most 9 nodes at level 1 # at most 9*8 nodes at level 2 # at most 9*8*7 nodes at level 3 # ... # at most 9! nodes at level 9 # sum: at most 986 410 nodes # recall: 9! = 362 888 # can we do better ?
ttt search space is not a tree: it has cycles
ttt search space is a graph (a.k.a. directed-acyclic-graph, or dag)
(acyclic because we direct arcs top-down: a directed cycle would have all arcs directed up, or all directed down, impossible because we start and end at one position)
(algorithms on trees and dags are usually faster than algorithms on graphs that allow directed cycles)
move sequences that yield the same position are called transpositions
x o . o . x o x . / \ x o . x o o o o x o . x o x . o x . | | x o . x o o o o x o . x o x x o x x \ / x o o o o x o x x
how many ttt states need to be examined to find minimax value of corner opening move?
a) about 8! b) far fewer than 8!
answer b) why?
nodes can be pruned
win detection: in child minimax for loop, abort if winning child found (prune remaining siblings)
solve ttt with 129 988 nodes
early draw detection: return draw once neither player can win
save minimax values in transposition table, by position
before solving position
is position in table (transposition) ?
is isomorphic position in table (symmetry) ?
how big is ttt search space?
root, 9 children, 9*8 grandchildren, 9*8*7 greatgrandchildren, …
total number nodes < 1 + 9 + 9*8 + 9*8*7 + ... + 9*8*7*6*5*4*3*2*1 = 986 410
why < ?
some terminal nodes not at max depth (because win is a leaf node)
transpositions
if we treat search space as a tree (ignore transpositions, allow a position in multiple nodes) then above is a reasonable estimate
how much of a space reduction do we get if allow each position to appear in at most one node?
number different possible positions = 39 = 19 683
each of 9 board positions is empty/ x / o
represent each location with 0 / 1 / 2
so each board position is represented by a 9-digit base 3 number
good news: many of these positions are not reachable
also good news: many of these states isomorphic
from root state: only 3 non-iso moves, so expect 6 500 nodes
alphabeta search from root examines 3025 non-iso states
eg. this position represented by this number: x . o 1 0 2 x o . 1 2 0 102 120 010 (base 3) = 8 427 (decimal) . x . 0 1 0 an unreachable ttt position: x . o x . o x . o
recall min/max format: p1 max, p2 min, show p1 values a / \ b c / \ /|\ d e f g h 7 5 3 ? ? assume dfs with children explored left-to-right * / \ 5 <=3 / \ /|\ 7 5 3 ? ? after examining node we know ... d val(b) <= 7 e val(b) = min(7,5) = 5 c val(c) <= 3 now, no need to learn more about val(c) ... val(c) <=3 < val(b) ... p1 prefers b to c ... so prune remaining children of c from our search
recall negamax format: value(t) is for *** player to move from t *** p1_score( t ) = - p2_score( t ) def abnega(v, alpha, beta): if terminal(v): return eval(v) so_far = neg_infinity # best score so far for each child c of v: so_far = max(so_far, - abnega( c, - beta, - alpha) ) alpha = max( alpha, so_far ) if alpha >= beta: break return so_far abnega(root, neg_infinity, infinity)
A / \ B C / \ /|\ D E F G H 7 5 3 1 9 directory /games-puzzles-algorithms/simple/alphabeta python3 alphabeta.py < t1.in A -999 999 -999 B -999 999 -999 D -999 999 -999 D leaf value 7 D now best child of B D improved alpha( B ) to -7 E -999 7 -999 E leaf value 5 E now best child of B E improved alpha( B ) to -5 B -5 999 -5 B now best child of A B improved alpha( A ) to 5 C -999 -5 -999 F 5 999 -999 F leaf value 3 F now best child of C F improved alpha( C ) to -3 alpha >= beta, prune remaining children of C C -3 -5 -3 A 5 999 5
αβ search with no enhancements can be weak
good news: many enhancements strengthen αβ
eg. move ordering: to maximize pruning, consider children in order from strongest to weakest
child strength not known apriori; use heuristic to guess
a limited form of move-ordering: threat-search
check for wins or lose-threats (forced moves) first
win-threat?
does player have winning move? if yes, make it
if player has no win-threat: lose-threat?
does opponent have next-move-win threat?
if yes, the player must (try to) block all such threats
tictactoe alphabeta revisions:
stop search once win for ptm found
before starting search, order children: win-threats (with this move, can we win?), lose-threats (with their next move, can opponent win?)
result:
if we have winning move, make it
elif opponent can win, block such moves (in ttt can block only 1)
else find best other move
heuristic + limited depth αβ search basis of many strong chess programs
a b c x to play 1 x o . 2 o x . 3 . . . from this position, before start/resume minimax search, check for winning moves for x c1 no c2 no a3 no b3 no c3 yes make move: x plays c3 (and wins)
a b c o to play 1 x . . 2 x x o 3 . o . from this position, before start/resume minimax search, * check for winning moves for o: none found * next, check for winning moves for opponent: x b1 no c1 no a3 yes in ttt we can block at most one lose-threat, so make move once first lose-threat found make move: o plays a3 (we blocked lose-threat a3, but lose anyway because x has a second win-threat)