ancient, many variants: terni lapilli, three men's morris

Ovid: Ars Amatoria III, lines 365-369

Est genus in totidem tenui ratione redactum
scriptula, quot menses lubricus annus habet;
parua tabella capit ternos utrimque lapillos,
in qua uicisse est continuasse suos.

There is another game divided into as many parts as there are months in the year; a small board has three pieces on either side; the winner must get all their pieces in a straight line.

in a puzzle (single-agent game): 1 agent makes all actions

to solve puzzle, search state space for best possible outcome

in 2-player game: 2 agents alternately make actions

what does it mean to solve a game?

**reachable state**: any state reachable by a sequence of legal moves

**strategy**: for a player, a function that, for any reachable state, returns a legal move

in a 2-player game, to **solve a state** is to find a strategy with best worst-case performance, ie. one that guarantees the best possible score, over all possible opponent strategies

the initial move for any such strategy can be found by **minimax search**

**adversarial search**: searching a 2-player game tree

consider only 2-player games

for a state, the **minimax value** is the maximum score the player can achieve, over all possible opponent strategies

to find the minimax value, explore the game-state space in any order that finds the values of children before the value of their parent

minimax algorithm: search space to find root state minimax value

play that follows a minimax strategy is called **perfect play**

minimax algm, max/min format

root node is a max node
each child of a max node is a min node
each child of a min node is a max node

def eval(s): for player MAX, return score of s

def minimax(s):
    if terminal(s): return eval(s)
    if s.player() == MAX: return max(for all children c of s, minimax(c))
    if s.player() == MIN: return min(for all children c of s, minimax(c))

node: state
edge (root, level 1): p1 move
edge (level 1, level 2): p2 move
leaf label: p1 score
for each node, p1 minimax value ?  best play ?

            *
         /  |  \
       *    *    *
      /|\  / \  /|\
     2 3 -1 1 6 -4 2 9

            1
         /  |  \
     -1     1    -4
      /|\  / \  /|\
     2 3 -1 1 6 -4 2 9

p1 best play from root: middle
p2 best play at level 1:
    after p1 left: right (p1 scores -1)
    after p1 middle: left (p1 scores 1)
    after p1 right: left (p1 scores -4)
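The max/min pseudocode above can be sketched as runnable Python on this example tree; nested lists stand in for game states (an illustrative representation, not the class code), with leaves holding p1 scores:

```python
def minimax(node, maximizing):
    # leaves are integer p1 scores; inner nodes are lists of children
    if isinstance(node, int):
        return node
    values = [minimax(c, not maximizing) for c in node]
    return max(values) if maximizing else min(values)

# the example tree: root (max node), three min nodes, eight leaves
tree = [[2, 3, -1], [1, 6], [-4, 2, 9]]

print([minimax(c, False) for c in tree])  # level-1 values: [-1, 1, -4]
print(minimax(tree, True))                # root minimax value: 1
```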

when coding, awkward to have 2 cases (p1 move? p2 move?)

negamax: at each node, compute the minimax value for the **player to move**

now only 1 case to consider

if score for other player needed, use this:

score_for_p1(node) == negative( score_for_p2(node) )

so, in negamax format, at each node:

select move that maximizes neg( score(child) )

minimax algm, negamax format

def eval(s): for player who would move next, return score of s

def negamax(s):
    if terminal(s): return eval(s)
    else: return max(for all children c of s, -negamax(c))

at each node, score is for *** player-to-move ***

p1 to move          *
                 /  |  \
p2 to move     *    *    *
              /|\  / \  /|\
p1 score     2 3 -1 1 6 -4 2 9

p1 score            1
                 /  |  \
p2 score       1   -1    4
              /|\  / \  /|\
p1 score     2 3 -1 1 6 -4 2 9
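The negamax pseudocode, run on the same example tree (again with nested lists as an illustrative stand-in for states; leaves hold the score for the player to move at that leaf):

```python
def negamax(node):
    # leaves hold the score for the player to move; inner nodes
    # are lists of children; negate child values, take the max
    if isinstance(node, int):
        return node
    return max(-negamax(c) for c in node)

tree = [[2, 3, -1], [1, 6], [-4, 2, 9]]

print([negamax(c) for c in tree])  # level-1 scores for p2: [1, -1, 4]
print(negamax(tree))               # root score for p1: 1
```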

for x to move,

x . o
x . .
o . .

draw (part of) the game tree
then draw (part of) the minimax tree
then draw the solution tree (show best moves for winner)

part of game tree

x . o
x . .
o . .

xxo  x.o  x.o  x.o  x.o
x..  xx.  x.x  x..  x..
o..  o..  o..  ox.  o.x

xxo  xxo  xxo  xxo  xoo  x.o  x.o  x.o  xoo  x.o  x.o  x.o
xo.  x.o  x..  x..  xx.  xxo  xx.  xx.  x.x  xox  x.x  x.x
o..  o..  oo.  o.o  o..  o..  oo.  o.o  o..  o..  oo.  o.o

xxo  xxo  xxo  xxo  xxo  xxo  xxo  xxo  xxo  xoo  xoo  xoo
xxo  x.o  x.o  xx.  x.x  x..  xx.  x.x  x..  xxx  xx.  xx.
o..  ox.  o.x  oo.  oo.  oox  o.o  o.o  oxo  o..  ox.  o.x

(two more levels)

part of minimax tree

this state has x minimax value 1:

x . o
x . .
o . .

o:   1   -1    1    1    1
xxo  x.o  x.o  x.o  x.o
x..  xx.  x.x  x..  x..
o..  o..  o..  ox.  o.x

x:  -1    0    0    0    1    1    1    1    1   -1    1    1
xxo  xxo  xxo  xxo  xoo  x.o  x.o  x.o  xoo  x.o  x.o  x.o
xo.  x.o  x..  x..  xx.  xxo  xx.  xx.  x.x  xox  x.x  x.x
o..  o..  oo.  o.o  o..  o..  oo.  o.o  o..  o..  oo.  o.o

o:   0    1    1    1    1    1    1    1    1   -1    0   -1
xxo  xxo  xxo  xxo  xxo  xxo  xxo  xxo  xxo  xoo  xoo  xoo
xxo  x.o  x.o  xx.  x.x  x..  xx.  x.x  x..  xxx  xx.  xx.
o..  ox.  o.x  oo.  oo.  oox  o.o  o.o  oxo  o..  ox.  o.x

(two more levels)

solution tree

for winner, show winning move
for opponent, show all moves

x . o
x . .
o . .
  |
x . o
x x .
o . .

xoo  x.o  x.o  x.o
xx.  xxo  xx.  xx.
o..  o..  oo.  o.o
 |    |    |    |
xoo  x.o  x.o  x.o
xxx  xxo  xxx  xxx
o..  o.x  oo.  o.o

when solving a state, usually not necessary to examine the whole tree

once one winning move is found, remaining moves can be ignored

this motivates alpha-beta search

eg. tic-tac-toe: how many states need to be examined to find the minimax value of a corner opening move?

a) about 8! b) far fewer than 8!

answer b) why?

when computing values of children, if winning move found, we can immediately return (effectively pruning remaining siblings from our search)

(another factor: can prune symmetric transpositions)

number states at most number nodes in search tree

so number states ≤ 9! = 362 880

why ≤ ?

some terminal nodes not at max depth (because win is a leaf node)

transpositions

better: number states at most number different positions

each of 9 board positions is empty / x / o

so number states ≤ 3^9 = 19 683

represent each location with 0 / 1 / 2

so each board position is represented by a 9-digit base 3 number

better: many of these positions are not reachable

also better: many of these states isomorphic

from root state: only 3 non-iso moves, so expect 6 500 nodes

alphabeta search from root examines 3025 non-iso states

ttt board representation example

eg. this position:

x . o
x o .
. x .

is represented by the number 102 120 010 (base 3) = 8 427 (decimal)

an unreachable ttt position:

x . o
x . o
x . o
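The base-3 encoding above can be checked with a few lines of Python (the board-as-string representation and the `encode` name are just for illustration):

```python
def encode(board):
    # board: 9-character string, rows concatenated; '.'=0, 'x'=1, 'o'=2
    digit = {'.': 0, 'x': 1, 'o': 2}
    n = 0
    for cell in board:
        n = 3 * n + digit[cell]  # append one base-3 digit
    return n

print(encode("x.o" "xo." ".x."))  # 8427
print(int("102120010", 3))        # 8427, via Python's base-3 parsing
```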

minimax pruning example, min/max format

recall min/max format: p1 max, p2 min, show p1 values

        a
       / \
      b   c
     / \  /|\
    d   e f g h
    7   5 3 ? ?

assume dfs with children explored left-to-right

        *
       / \
      5  <=3
     / \  /|\
    7   5 3 ? ?

after examining node we know ...
    d: val(b) <= 7
    e: val(b) = min(7,5) = 5
    f: val(c) <= 3

now, no need to learn more about val(c) ...
    val(c) <= 3 < 5 = val(b)
    ... p1 prefers b to c
    ... so prune remaining children of c from our search

αβ-search = minimax search with αβ pruning

α: value, best p1 option so far, on path current-to-root

β: value, best p2 option so far, on path current-to-root

minimax, so can be used as a solver

leaf scores must be true scores

must be able to reach all leaf nodes in reasonable time

can **also** be used as a heuristic player: find a fast heuristic, use it on all leaves at a fixed depth

eg. simple chess player

alpha-beta search (negamax format)

recall negamax format: value(t) is for *** player to move from t ***

p1_score( t ) = - p2_score( t )

def abnega(v, alpha, beta):
    if terminal(v): return eval(v)
    so_far = neg_infinity  # best score so far
    for each child c of v:
        so_far = max( so_far, - abnega( c, - beta, - alpha ) )
        alpha = max( alpha, so_far )
        if alpha >= beta: break
    return so_far

abnega(root, neg_infinity, infinity)

alphabeta trace example

        A
       / \
      B   C
     / \  /|\
    D   E F G H
    7   5 3 1 9

trace (each line shows node, alpha, beta, so_far):

A  -999  999  -999
  B  -999  999  -999
    D  -999  999  -999
    D leaf value 7
    D now best child of B
    D improved alpha( B ) to -7
    E  -999  7  -999
    E leaf value 5
    E now best child of B
    E improved alpha( B ) to -5
  B  -5  999  -5
  B now best child of A
  B improved alpha( A ) to 5
  C  -999  -5  -999
    F  5  999  -999
    F leaf value 3
    F now best child of C
    F improved alpha( C ) to -3
    alpha >= beta, prune remaining children of C
  C  -3  -5  -3
A  5  999  5

another example

to see our algorithm run on this example, go to directory

`simple/alphabeta` and run `alphabeta.py` on input `t2.in`

αβ search with no enhancements can be weak

eg. class code, `ttt -n 4 -m 4 -k 3 alpha_beta` is a weak player

good news: many enhancements strengthen αβ

eg. move ordering: to maximize pruning, consider children in order from strongest to weakest

child strength not known a priori; use heuristic to guess

check win-threats or lose-threats (forced moves) first

eg. threat search in tic-tac-toe: if opponent can win on next move, then player must block at that cell: all other moves lose and can be pruned

tictactoe alphabeta revisions:

initialize (-) infinity = (-) 1 (why ?)

answer: then search stops once win/loss found

before starting search, order children so that win-threats precede lose-threats precede rest of moves (why ?)

answer: that gives this order of execution

if there is a winning move, make it

elif the opponent can win, block one such move,

else find best other move
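That ordering (win-threats, then blocks of opponent win-threats, then the rest) can be sketched for 3x3 tic-tac-toe; the board-as-string representation and helper names here are assumptions, not the class code:

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winning_moves(board, player):
    # empty cells where `player` completes a line immediately
    moves = []
    for i, c in enumerate(board):
        if c != '.':
            continue
        b = board[:i] + player + board[i + 1:]
        if any(all(b[j] == player for j in line) for line in LINES):
            moves.append(i)
    return moves

def ordered_moves(board, player):
    # win-threats first, then blocks of opponent win-threats, then the rest
    opp = 'o' if player == 'x' else 'x'
    wins = winning_moves(board, player)
    blocks = [m for m in winning_moves(board, opp) if m not in wins]
    rest = [i for i, c in enumerate(board)
            if c == '.' and i not in wins and i not in blocks]
    return wins + blocks + rest

# x can win at cell 2; o threatens to win at cell 4
print(ordered_moves("xx.o.o...", 'x'))  # [2, 4, 6, 7, 8]
```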

heuristic evaluation + limited-depth αβ search is the basis of many strong chess programs