tic-tac-toe game

ttt   adv   mmx   eg   ngx   eg   prune   num_states   αβ   enhance  

tic-tac-toe

Ovid: Ars Amatoria III, lines 365-369
Est genus in totidem tenui ratione redactum
    scriptula, quot menses lubricus annus habet;
Parua tabella capit ternos utrimque lapillos,
    in qua uicisse est continuasse suos.

There is another game divided into as many parts
  as there are months in the year;
A small board has three pieces on either side,
  the winner must get all the pieces in a straight line.

2-player games: adversarial search

  • in puzzle (single-agent game): 1 agents makes all actions

    • to solve puzzle, search state space for best possible outcome

  • in 2-player game: 2 agents alternately make actions

    • what does it mean to solve a game?

  • reachable state: any state reachable by sequence of legal moves

  • strategy: for a player, a function that, for any reachable state, returns a legal move

  • in 2-player game, to solve a state is to find a strategy with best worst-case performance, ie. that guarantees the best possible score, over all possible opponent strategies

  • the initial move for any such strategy can be found by minimax search

  • adversarial search: searching a 2-player game tree

  • adversarial search by Joshua Eckroth

minimax

  • consider only 2-player games

  • for a state, minimax value is maximum score player can achieve, over all possible opponent strategies

  • for find minimax value, explore game-state space in any order that finds values of children before value of node

  • minimax algorithm: search space to find root state minimax value

  • play that follows a minimax strategy called perfect play

minimax algm, max/min format
root node is a max node
each child of a max node is a min node
each child of a min node is a max node

def eval(s):
  for player MAX, return score of s

def minimax(s):
  if terminal(s):
    return eval(s)
  if s.player() == MAX:
    return max(for all children c of s, minimax(c))
  if s.player() == MIN:
    return min(for all children c of s, minimax(c))

minimax example, min/max format

node: state
edge (root,    level 1):  p1 move
edge (level 1, level 2):  p2 move
leaf label: p1 score

for each node,
  p1 minimax value ?
best play?
                    *
                /   |   \
              /     |     \
            *       *       *
          / | \    / \    / | \
         *  *  *  *   *  *  *  *
         2  3 -1  1   6 -4  2  9

                    1
                /   |   \
              /     |     \
           -1       1      -4
          / | \    / \    / | \
         *  *  *  *   *  *  *  *
         2  3 -1  1   6 -4  2  9

p1 best play from root: middle

p2 best play at level 1:
  after p1 left:   right (p1 scores -1)
  after p1 middle: left  (p1 scores  1)
  after p1 right:  left  (p1 scores -4)

negamax

  • when coding, awkward to have 2 cases (p1 move? p2 move?)

  • negamax: at each code, compute minimax value for player to move

  • now only 1 case to consider

  • if score for other player needed, use this:

    • score_for_p1(node) == negative( score_for_p2(node) )

  • so, in negamax format, at each node:

    • select move that maximizes neg( score(child) )

minimax algm, negamax format
def eval(s):
  for player who would move next, return score of s

def negamax(s):
  if terminal(s):
    return eval(s)
  else:
    return max(for all children c of s, -negamax(c))

minimax example, negamax format

at each node, score is for *** player-to-move ***

p1 to move            *
                  /   |   \
                /     |     \
p2 to move    *       *       *
            / | \    / \    / | \
p1 to move *  *  *  *   *  *  *  *
p1 score  2  3 -1  1   6 -4  2  9


p1 score             1
                 /   |   \
               /     |     \
p2 score     1      -1       4
           / | \    / \    / | \
p1 score  *  *  *  *   *  *  *  *
          2  3 -1  1   6 -4  2  9

ttt example trees

for x to move,             x . o
draw (part of) the         x . .
game tree                  o . .

then draw (part of)
the minimax tree

then draw
the solution tree
(show best move winner)
part of game tree
                            x . o
                            x . .
                            o . .

        xxo        x.o        x.o        x.o        x.o
        x..        xx.        x.x        x..        x..
        o..        o..        o..        ox.        o.x

  xxo  xxo  xxo  xxo   x.o  x.o  x.o  x.o    xoo  x.o  x.o  x.o
  xo.  x.o  x..  x..   xx.  xx.  xx.  xx.    x.x  xox  x.x  x.x  ...
  o..  o..  oo.  o.o   o..  o..  o..  o..    o..  o..  oo.  o.o

     xxo xxo xxo  xxo xxo xxo  xxo xxo xxo  xxo x.o x.o
     xxo x.o x.o  xx. x.x x..  xx. x.x x..  xx. xxx xx.  ...
     o.. ox. o.x  oo. oo. oox  o.o o.o oxo  o.. o.. o.x

  (two more levels)
part of minimax tree
                        x . o     this state has
                        x . .       x  minimax
                        o . .        value  1

o:       1         -1          1          1          1
        xxo        x.o        x.o        x.o        x.o
        x..        xx.        x.x        x..        x..
        o..        o..        o..        ox.        o.x

x: -1   0    0    0     1    1    1    1      1   -1    1    1
  xxo  xxo  xxo  xxo   xoo  x.o  x.o  x.o    xoo  x.o  x.o  x.o
  xo.  x.o  x..  x..   xx.  xxo  xx.  xx.    x.x  xox  x.x  x.x  ...
  o..  o..  oo.  o.o   o..  o..  oo.  o.o    o..  o..  oo.  o.o

o:    0   1   1    1   1   1    1   1   1   -1   0  -1
     xxo xxo xxo  xxo xxo xxo  xxo xxo xxo  xoo xoo xoo
     xxo x.o x.o  xx. x.x x..  xx. x.x x..  xxx xx. xx.
     o.. ox. o.x  oo. oo. oox  o.o o.o oxo  o.. ox. o.x

  (two more levels)
solution tree
for winner,        x . o
show winning move  x . .
                   o . .
                     |
for opponent,      x . o
show all moves     x x .
                   o . .
             /     /   \    \
           /      /     \     \
     x o o    x . o    x . o    x . o
     x x .    x x o    x x .    x x .
     o . .    o . .    o o .    o . o
       |        |        |        |
     x o o    x . o    x . o    x . o
     x x x    x x o    x x x    x x x
     o . .    o . x    o o .    o . o
xkcd

pruning game trees

when solving a state, usually not necessary to examine the whole tree

  • once one winning move is found, remaining moves can be ignored

  • this motivates alpha-beta search

  • eg tic-tac toe: how many states need to be examined to find minimax value of corner opening move?

    • a) about 8!     b) far fewer than 8!

    • answer b)           why?

    • when computing values of children, if winning move found, we can immediately return (effectively pruning remaining siblings from our search)

    • (another factor: can prune symmetric transpositions)

number ttt states

  • number states at most number nodes in search tree

  • so number states ≤ 9! = 362 880

  • why ≤ ?

    • some terminal nodes not at max depth (because win is a leaf node)

    • transpositions

  • better: number states at most number different positions

    • so number states ≤ 39 = 19 683

    • each of 9 board positions is empty/ x / o

    • represent each location with 0 / 1 / 2

    • so each board position is represented by a 9-digit base 3 number

  • better: many of these positions are not reachable

  • also better: many of these states isomorphic

    • from root state: only 3 non-iso moves, so expect 6 500 nodes

    • alphabeta search from root examines 3025 non-iso states

ttt board representation example
eg. this position       represented by this number:

   x . o
   x o .           102 120 010 (base 3) = 8 427 (decimal)
   . x .

an unreachable ttt position:

   x . o
   x . o
   x . o
minimax pruning example, min/max format
recall min/max format:
  p1 max, p2 min, show p1 values

         a
      /     \
     b       c
    / \     /|\
   d   e   f g h
   7   5   3 ? ?

assume dfs with children explored left-to-right
         *
      /     \
     5      <=3
    / \     /|\
   7   5   3 ? ?

after examining node     we know ...
  d                      val(b) <= 7
  e                      val(b) = min(7,5) = 5
  c                      val(c) <= 3

now, no need to learn more about val(c)
... val(c) <=3   <   val(b)
... p1  prefers   b   to   c
... so prune remaining children of c from our search

alpha-beta search

  • αβ-search = minimax search with αβ pruning

  • α: value, best p1 option so far, on path current-to-root

  • β: value, best p2 option so far, on path current-to-root

  • minimax, so can be used as a solver

    • leaf scores must be true scores

    • must be able to reach all leaf nodes in reasonable time

  • can also be used as heuristic player

    • find fast heuristic, use on all leaves at a fixed depth

    • eg. simple chess player

alpha-beta search (negamax format)
recall negamax format:
  value(t) is for *** player to move from t ***
  p1_score( t ) = - p2_score( t )

def abnega(v, alpha, beta):
  if terminal(v):
    return eval(v)

  so_far = neg_infinity  # best score so far
  for each child c of v:
    so_far = max(so_far, - abnega( c, - beta, - alpha) )
    alpha = max( alpha, so_far )
    if alpha >= beta:
      break
   return so_far

abnega(root, neg_infinity, infinity)
alphabeta trace example
         A
      /     \
     B       C
    / \     /|\
   D   E   F G H
   7   5   3 1 9

A -999 999 -999
  B -999 999 -999
    D -999 999 -999
    D leaf value 7
    D now best child of B
    D improved alpha( B ) to -7
    E -999 7 -999
    E leaf value 5
    E now best child of B
    E improved alpha( B ) to -5
  B -5 999 -5
  B now best child of A
  B improved alpha( A ) to 5
  C -999 -5 -999
    F 5 999 -999
    F leaf value 3
    F now best child of C
    F improved alpha( C ) to -3
    alpha >= beta, prune remaining children of C
  C -3 -5 -3
A 5 999 5
another example
  • video example

  • to see our algorithm run on this example, go to directory simple/alphabeta and run alphabeta.py on input t2.in

αβ enhancements

  • αβ search with no enhancements can be weak

    • eg. class code, ttt -n 4 -m 4 -k 3 alpha_beta is weak player

  • good news: many enhancements strengthen αβ

  • eg. move ordering: to maximize pruning, consider children in order from strongest to weakest

    • child strength not known apriori; use heuristic to guess

  • check win-threats or lose-threats (forced moves) first

    • eg. threat search in tic-tac-toe: if opponent can win on next move, then player must block at that cell: all other moves lose and can be pruned

  • tictactoe alphabeta revisions:

    • initialize (-) infinity = (-) 1     (why ?)

    • answer: then search stops once win/loss found

    • before starting search, order children so that win-threats precede lose-threats precede rest of moves (why ?)

    • answer: that gives this order of execution

      • if there is a winning move, make it

      • elif the opponent can win, block one such move,

      • else find best other move

  • heuristic + limited depth αβ search basis of many strong chess programs

  • slides: tt etc enhancements