tic-tac-toe game

ttt   eg   ngx   eg   tneg   trans   prune   num_states   αβneg   enhance  

tic-tac-toe

  • recall minimax

    • for a game-tree, find root minimax value from leaf values

  • why minimax?

    • it gives best worst-case score, i.e. against all possible opponent strategies

  • now let's pick a game and use minimax to solve (find minimax value) arbitrary states

  • need a game with relatively small state space, say tic-tac-toe

  • ancient, many variants: terni lappilli   3-men's morris

Ovid: Ars Amatoria III, lines 365-369
Est genus in totidem tenui ratione redactum
    scriptula, quot menses lubricus annus habet;
Parua tabella capit ternos utrimque lapillos,
    in qua uicisse est continuasse suos.

There is another game divided into as many parts
  as there are months in the year;
A small board has three pieces on either side,
  the winner must get all the pieces in a straight line.

negamax

  • usual minimax pseudocode

  • good coding practise: avoid code duplication

  • do we need 2 cases (p1 move? p2 move?)

  • negamax: at each code, compute minimax value for player to move

  • now only 1 case

  • if score for other player needed, use this:

    • score_for_p2(node) == negative( score_for_p1(node) )

  • so, in negamax format, at each node:

    • select move that maximizes neg( score(child) )

warning

  • when using negamax, make sure that leaf scores are for player-to-move (minimax assumes all node scores are for 1st-player, so maximizer)

  • to convert leaf scores in a minimax tree to equivalent leaf scores in a negamax tree:

    • negate leaf scores whose distance-to-root is odd

  • leaf scores whose distance-to-root is even do not need to be changed (because their player-to-move is max)

minimax algm, negamax format
def eval(s):
  for player who would move next, return score of s

def negamax(s):
  if terminal(s):
    return eval(s)
  else:
    return max(for all children c of s, -negamax(c))

minimax example, negamax format

at each node, score is for *** player-to-move ***

p1 to move            *
                  /   |   \
                /     |     \
p2 to move    *       *       *
            / | \    / \    / | \
p1 to move *  *  *  *   *  *  *  *
p1 score  2  3 -1  1   6 -4  2  9


p1 score             1
                 /   |   \
               /     |     \
p2 score     1      -1       4
           / | \    / \    / | \
p1 score  *  *  *  *   *  *  *  *
          2  3 -1  1   6 -4  2  9

ttt example trees

for x to move,             x . o
draw (part of) the         x . .
game tree                  o . .

then draw (part of)
the minimax tree

then draw
the proof tree, also called solution tree
(show best move winner)
part of game tree
                            x . o
                            x . .
                            o . .

        xxo        x.o        x.o        x.o        x.o
        x..        xx.        x.x        x..        x..
        o..        o..        o..        ox.        o.x

  xxo  xxo  xxo  xxo   x.o  x.o  x.o  x.o    xoo  x.o  x.o  x.o
  xo.  x.o  x..  x..   xx.  xx.  xx.  xx.    x.x  xox  x.x  x.x  ...
  o..  o..  oo.  o.o   o..  o..  o..  o..    o..  o..  oo.  o.o

     xxo xxo xxo  xxo xxo xxo  xxo xxo xxo  xxo x.o x.o
     xxo x.o x.o  xx. x.x x..  xx. x.x x..  xx. xxx xx.  ...
     o.. ox. o.x  oo. oo. oox  o.o o.o oxo  o.. o.. o.x

  (two more levels)
part of minimax tree
                        x . o     this state has
                        x . .       x  minimax
                        o . .        value  1

o:       1         -1          1          1          1
        xxo        x.o        x.o        x.o        x.o
        x..        xx.        x.x        x..        x..
        o..        o..        o..        ox.        o.x

x: -1   0    0    0     1    1    1    1      1   -1    1    1
  xxo  xxo  xxo  xxo   xoo  x.o  x.o  x.o    xoo  x.o  x.o  x.o
  xo.  x.o  x..  x..   xx.  xxo  xx.  xx.    x.x  xox  x.x  x.x  ...
  o..  o..  oo.  o.o   o..  o..  oo.  o.o    o..  o..  oo.  o.o

o:    0   1   1    1   1   1    1   1   1   -1   0  -1
     xxo xxo xxo  xxo xxo xxo  xxo xxo xxo  xoo xoo xoo
     xxo x.o x.o  xx. x.x x..  xx. x.x x..  xxx xx. xx.
     o.. ox. o.x  oo. oo. oox  o.o o.o oxo  o.. ox. o.x

  (two more levels)
proof tree
for winner,        x . o
show winning move  x . .
                   o . .
                     |
for opponent,      x . o
show all moves     x x .
                   o . .
             /     /   \    \
           /      /     \     \
     x o o    x . o    x . o    x . o
     x x .    x x o    x x .    x x .
     o . .    o . .    o o .    o . o
       |        |        |        |
     x o o    x . o    x . o    x . o
     x x x    x x o    x x x    x x x
     o . .    o . x    o o .    o . o
xkcd

simple ttt negamax

def negamax(psn, ptm): # position, player-to-move
  if psn.has_win(ptm):
    return 1  # previous move created win
  L = psn.legal_moves()
  if len(L) == 0:
    return 0  # board full, so draw
  so_far = -1  # best score so far
  for cell in L:
    psn.brd[cell] = ptm
    nmx = negamax(psn, opponent(ptm))
    so_far = max(so_far,-nmx)
    psn.brd[cell] = Cell.e   # reset brd by erasing cell
  return so_far

# solves 3x3 ttt with 740 170 calls
#                9! = 362 880
# nodes at most       986 410
# can we do better ?

transpositions

  • ttt search space is not a tree: it has undirected cycles

  • ttt search space is directed-acyclic (no directed cycles) graph, or dag

  • move sequences that yield the same position are called transpositions

transposition example
                 x o .
                 o . x
                 o x .
               /       \
         x o .           x o o
         o o x           o . x
         o x .           o x .
           |               |
         x o .           x o o
         o o x           o . x
         o x x           o x x
	       \       /
                 x o o
                 o o x
                 o x x

pruning ttt game trees

  • how many ttt states need to be examined to find minimax value of corner opening move?

    • a) about 8!     b) far fewer than 8!

    • answer b)           why?

  • nodes can be pruned

  • win detection: in child minimax for loop, abort if winning child found (prune remaining siblings)

    • solve ttt with 129 988 nodes

  • early draw detection: return draw once neither player can win

  • save minimax values in transposition table, by position

  • before solving position

    • is position in table (transposition) ?

    • is isomorphic position in table (symmetry) ?

number ttt states

  • how big is ttt search space?

  • root, 9 children, 9*8 grandchildren, 9*8*7 greatgrandchildren, …

  • total number nodes < 1 + 9 + 9*8 + 9*8*7 + ... + 9*8*7*6*5*4*3*2*1 = 986 410

  • why < ?

    • some terminal nodes not at max depth (because win is a leaf node)

    • transpositions

  • if we treat search space as a tree (ignore transpositions, allow a position in multiple nodes) then above is a reasonable estimate

  • how much of a space reduction do we get if allow each position to appear in at most one node?

    • number different possible positions = 39 = 19 683

    • each of 9 board positions is empty/ x / o

    • represent each location with 0 / 1 / 2

    • so each board position is represented by a 9-digit base 3 number

  • good news: many of these positions are not reachable

  • also good news: many of these states isomorphic

    • from root state: only 3 non-iso moves, so expect 6 500 nodes

    • alphabeta search from root examines 3025 non-iso states

ttt board representation example
eg. this position       represented by this number:

   x . o
   x o .           102 120 010 (base 3) = 8 427 (decimal)
   . x .

an unreachable ttt position:

   x . o
   x . o
   x . o
minimax pruning example, min/max format
recall min/max format:
  p1 max, p2 min, show p1 values

         a
      /     \
     b       c
    / \     /|\
   d   e   f g h
   7   5   3 ? ?

assume dfs with children explored left-to-right
         *
      /     \
     5      <=3
    / \     /|\
   7   5   3 ? ?

after examining node     we know ...
  d                      val(b) <= 7
  e                      val(b) = min(7,5) = 5
  c                      val(c) <= 3

now, no need to learn more about val(c)
... val(c) <=3   <   val(b)
... p1  prefers   b   to   c
... so prune remaining children of c from our search

alpha-beta search, negamax format

recall negamax format:
  value(t) is for *** player to move from t ***
  p1_score( t ) = - p2_score( t )

def abnega(v, alpha, beta):
  if terminal(v):
    return eval(v)

  so_far = neg_infinity  # best score so far
  for each child c of v:
    so_far = max(so_far, - abnega( c, - beta, - alpha) )
    alpha = max( alpha, so_far )
    if alpha >= beta:
      break
   return so_far

abnega(root, neg_infinity, infinity)
alphabeta trace example
         A
      /     \
     B       C
    / \     /|\
   D   E   F G H
   7   5   3 1 9

directory /games-puzzles-algorithms/simple/alphabeta
python3 alphabeta.py < t1.in

A -999 999 -999
  B -999 999 -999
    D -999 999 -999
    D leaf value 7
    D now best child of B
    D improved alpha( B ) to -7
    E -999 7 -999
    E leaf value 5
    E now best child of B
    E improved alpha( B ) to -5
  B -5 999 -5
  B now best child of A
  B improved alpha( A ) to 5
  C -999 -5 -999
    F 5 999 -999
    F leaf value 3
    F now best child of C
    F improved alpha( C ) to -3
    alpha >= beta, prune remaining children of C
  C -3 -5 -3
A 5 999 5

search enhancements

  • αβ search with no enhancements can be weak

    • eg. class code, ttt -n 4 -m 4 -k 3 alpha_beta is weak player

  • good news: many enhancements strengthen αβ

  • eg. move ordering: to maximize pruning, consider children in order from strongest to weakest

    • child strength not known apriori; use heuristic to guess

  • threat-search: check for wins or lose-threats (forced moves) first

    • win-threat: if player has winning move, make it

    • lose-threat (ttt): if opponent has next-move-win, player must block at that cell: all other moves lose and can be pruned

  • tictactoe alphabeta revisions:

    • initialize (-) infinity = (-) 1     (why ?)

    • answer: then search stops once win/loss found

    • before starting search, order children so that win-threats precede lose-threats precede rest of moves (why ?)

    • answer: that gives this order of execution

      • if there is a winning move, make it

      • elif the opponent can win, block one such move,

      • else find best other move

  • heuristic + limited depth αβ search basis of many strong chess programs

  • slides: tt etc enhancements