Up until now our environments have been single-agent; now we’ll consider competitive environments, where two or more agents have conflicting goals: “adversarial search”
Real world conflict is often complex and messy, so we’ll look at games!
Namely, games like chess, Go, and poker (physical sports are games too, but they’re far more complicated to model)
We can look at multi-agent environments in one of three ways:
With nontrivial games, we likely won’t be able to find an optimal choice (even with pruning); at some point we have to stop thinking and make a move!
At each point where we stop searching, we use an evaluation function (a heuristic) to estimate who is currently winning
Or… we could do several fast simulations from that state, then average the results.
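For instance, here’s a minimal Python sketch of that averaging idea. It assumes a hypothetical game object exposing the interface we’ll formalize shortly (is_terminal, actions, result, utility); nothing here is a specific library’s API:

```python
import random

def rollout_value(game, state, player, n=100):
    """Estimate a position's value for `player` by averaging
    the outcomes of n random playouts from that position."""
    total = 0.0
    for _ in range(n):
        s = state
        while not game.is_terminal(s):
            s = game.result(s, random.choice(list(game.actions(s))))
        total += game.utility(s, player)
    return total / n
```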
Though… several games include elements of chance or have imperfect information (like poker)
The most common games to study in AI are deterministic, two-player, turn-taking, perfect-information, zero-sum games.
“Perfect information” here just means “fully observable”
“zero-sum” means what’s good for me is bad for you and vice versa (no win-win)
We also call actions moves and states positions
Let’s call our two players \(MIN\) and \(MAX\)
\(MAX\) will move first, and they take turns until the game ends.
At the end, points are awarded to the winner, and penalties are given to the loser.
Formally:
\(S_0\): The initial state (game setup)
\(ToMove(s)\): The player who has the turn at state \(s\)
\(Actions(s)\): The set of legal moves at state \(s\)
\(Result(s,a)\): The transition model
\(IsTerminal(s)\): True when the game is over, false otherwise. Game-ending states are called terminal states
\(Utility(s,p)\): A utility function (also objective function or payoff function), which gives a final numeric value to player \(p\) when the game ends. Chess assigns \(1\) for a win, \(0\) for a loss, and \(\frac{1}{2}\) for a draw. Chess is still considered zero-sum… how, if the sum isn’t zero? (“Constant-sum” would be more accurate; subtracting \(\frac{1}{2}\) from every payoff makes the sum zero without changing the game.)
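As a concrete (hypothetical) rendering, the whole interface might look like this in Python; the method names are our own, chosen to mirror the formal definition:

```python
class Game:
    """Abstract game interface mirroring the formal definition above."""
    def initial_state(self):      # S_0: the starting position
        raise NotImplementedError
    def to_move(self, s):         # ToMove(s): whose turn it is in s
        raise NotImplementedError
    def actions(self, s):         # Actions(s): legal moves in s
        raise NotImplementedError
    def result(self, s, a):       # Result(s, a): the transition model
        raise NotImplementedError
    def is_terminal(self, s):     # IsTerminal(s): is the game over?
        raise NotImplementedError
    def utility(self, s, p):      # Utility(s, p): final payoff to player p
        raise NotImplementedError
```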
As before, we can use \(S_0\), \(Actions\), and \(Result\) to define the state-space graph, over which we can superimpose a search tree.
A complete game tree is one which follows every possible sequence of moves to every state until game end.
Is the game tree for the following games finite or infinite?
Tic-tac-toe
Chess
Checkers
Go
Sokoban
Partial game tree, tic-tac-toe
Tic-tac-toe has a relatively small number of terminal states. What’s an upper bound? (Fewer than \(9! = 362{,}880\).)
But chess is way bigger… over \(10^{40}\) distinct states!
\(MAX\) wants to find a sequence of moves leading to a win… but so does \(MIN\); they’re at cross-purposes, and both can influence the sequence of moves.
Which means that we need a conditional plan!
If our game ends in a simple (win/lose), then an \(AndOr\) search is sufficient… but if the outcomes are more complex, we need something more general, called minimax search.
Let’s consider a trivial game, in which each player takes a turn then the game ends.
Because some games count a “turn” only after each player has moved, we’ll use the term ply to mean one move by one player
Two-ply game tree
Given a game tree, the optimal strategy is given by \(Minimax(s)\), which gives the utility (for \(MAX\)) assuming both players play optimally from that point
\(MAX\) will prefer to move to a higher value and \(MIN\) will prefer a lower value.
\[ Minimax(s)= \begin{cases} Utility(s,MAX), &\text{if }IsTerminal(s)\\ \max_{a\in Actions(s)}Minimax(Result(s,a)),& \text{if }ToMove(s)=MAX\\ \min_{a\in Actions(s)}Minimax(Result(s,a)),& \text{if }ToMove(s)=MIN \end{cases} \]
Let’s go apply this to our tree!
Our definition of optimal play depends on the other player playing optimally; is this a good assumption?
Will we do just as well when facing a suboptimal player?
Consider a situation where making a technically suboptimal move sets up an even better outcome than available from the technically optimal move. Is a 9/10 chance of winning better than a certain draw?
Algorithm for calculating optimal move using minimax
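A minimal Python sketch of that algorithm, reusing the hypothetical Game interface from earlier (a sketch, not any particular library’s implementation):

```python
def minimax_search(game, state):
    """Pick the move with the highest minimax value for the player to move."""
    player = game.to_move(state)
    return max(game.actions(state),
               key=lambda a: min_value(game, game.result(state, a), player))

def max_value(game, state, player):
    """Value of a state where `player` (MAX) is to move."""
    if game.is_terminal(state):
        return game.utility(state, player)
    return max(min_value(game, game.result(state, a), player)
               for a in game.actions(state))

def min_value(game, state, player):
    """Value of a state where the opponent (MIN) is to move."""
    if game.is_terminal(state):
        return game.utility(state, player)
    return min(max_value(game, game.result(state, a), player)
               for a in game.actions(state))
```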
What kind of search is this?
If the maximum depth of the tree is \(m\) and the branching factor is \(b\), then the time complexity is \(O(b^m)\) and the space complexity is \(O(bm)\) (if we generate all options at once) or \(O(m)\) (if we generate them one at a time).
Chess has a branching factor of roughly \(35\) and has an average depth of \(80\) ply. Is pure minimax reasonable?
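A quick back-of-the-envelope count:
\[ b^m = 35^{80} = 10^{80\log_{10}35} \approx 10^{123} \]
That’s vastly more leaves than atoms in the observable universe (roughly \(10^{80}\)), so no: pure minimax is not reasonable.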
Many games allow more than two players. Technically, it’s easy to generalize minimax, but there are still interesting things to consider.
First, we need to replace the value of a node with a vector of values, so for players \(A, B, C\) we would produce a vector of \(\langle v_a,v_b,v_c\rangle\)
Why do we get three values for three players when we get only one value for two players? (In a two-player zero-sum game, one value determines the other: whatever \(MAX\) gets, \(MIN\) gets the opposite.)
We can just produce these by having the \(Utility\) function return a vector of utilities.
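A sketch of the generalization, assuming (hypothetically) that \(ToMove(s)\) returns an index into the utility vector and that the Game interface gains a utility_vector method:

```python
def multiplayer_value(game, state):
    """Utility vector of a state, assuming each player simply
    maximizes their own component of the vector."""
    if game.is_terminal(state):
        return game.utility_vector(state)   # e.g. (v_a, v_b, v_c)
    player = game.to_move(state)            # index of the player to move
    return max((multiplayer_value(game, game.result(state, a))
                for a in game.actions(state)),
               key=lambda vec: vec[player])
```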
Let’s take a look:
Three-ply, three-player game tree (best move marked at root)
Anyone ever play Diplomacy or Catan?
Notice how much more is going on than in two-player games?
Who has ever made a formal or informal alliance with another player in a multiplayer game?
Are alliances a natural consequence of optimal strategies? They can be!
For example if \(A\) and \(B\) are in weak positions relative to \(C\), then it’s better for both \(A\) and \(B\) to play against \(C\), rather than each other.
Sometimes, an alliance is an explanation of what would have happened anyway (by following a string of optimal moves)
Other times, there’s a social stigma (or worse) attached to breaking alliances, depending on how explicit the alliance is (breaking a current alliance may cost you future alliances).
Can an alliance be advantageous in a two-player game?
The number of states increases exponentially with the depth of the tree. In general, no algorithm can avoid this.
However, with pruning we can reduce the number of states that must be considered.
Alpha-Beta Pruning
You may have noticed that the order in which the nodes are evaluated can have a big effect on how many nodes need to be evaluated
When done perfectly, alpha-beta would only need to evaluate \(O(b^{m/2})\) nodes (instead of \(O(b^m)\))!
This turns the effective branching factor from \(b\) to \(\sqrt{b}\), or from \(35 \rightarrow \sim 6\) for chess!
This means that in the same time, we can search a tree roughly twice as deep!
With random move ordering this gives about \(O(b^{3m/4})\) for moderate \(b\)
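Here’s a sketch of alpha-beta in the same style as before: \(\alpha\) tracks the best value \(MAX\) can already guarantee along the current path, \(\beta\) the best \(MIN\) can guarantee, and we prune as soon as a subtree can’t change the outcome:

```python
def alphabeta_search(game, state):
    """Best move for the player to move, pruning hopeless subtrees."""
    player = game.to_move(state)
    best_value, best_move = -float('inf'), None
    for a in game.actions(state):
        v = ab_min(game, game.result(state, a), player,
                   best_value, float('inf'))
        if v > best_value:
            best_value, best_move = v, a
    return best_move

def ab_max(game, state, player, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state, player)
    v = -float('inf')
    for a in game.actions(state):
        v = max(v, ab_min(game, game.result(state, a), player, alpha, beta))
        if v >= beta:          # MIN above us would never allow this line
            return v           # prune the remaining siblings
        alpha = max(alpha, v)
    return v

def ab_min(game, state, player, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state, player)
    v = float('inf')
    for a in game.actions(state):
        v = min(v, ab_max(game, game.result(state, a), player, alpha, beta))
        if v <= alpha:         # MAX above us already has something better
            return v           # prune the remaining siblings
        beta = min(beta, v)
    return v
```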
Of course, we can never play a perfect game with a mere ordering function (otherwise it could play the game for us), but we can get close to the best case
We could try moves that were effective in the past (opening theory)
Or could try iterative deepening
The “best move” is called the killer move, and trying them first is the killer move heuristic
In the past, we talked about culling repeated paths; in games, repeated states can occur via transpositions (different move orders reaching the same position)
We can cache the results of evaluating a state in a transposition table, so each transposition is evaluated only once
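A memoized minimax sketch; hash_key is a hypothetical method returning a canonical position key (a Zobrist hash, for instance), and we assume \(ToMove\) returns the string 'MAX' or 'MIN':

```python
def minimax_tt(game, state, table):
    """Minimax value with a transposition table: positions reached by
    different move orders are evaluated only once."""
    key = game.hash_key(state)     # hypothetical canonical position key
    if key not in table:
        if game.is_terminal(state):
            v = game.utility(state, 'MAX')
        else:
            children = [minimax_tt(game, game.result(state, a), table)
                        for a in game.actions(state)]
            v = max(children) if game.to_move(state) == 'MAX' else min(children)
        table[key] = v
    return table[key]
```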
Even with alpha-beta pruning and good move ordering, games like Chess and Go are still out of reach
In the very first paper on computer game-playing, Programming a Computer for Playing Chess (Shannon, 1950), Claude Shannon points this out.
He suggests two strategy types: Type A and Type B
Type A: Wide but shallow… search the tree to a certain depth, then evaluate according to a heuristic function (most chess programs do this)
Type B: Ignore moves that look bad, then follow promising moves as far as possible (most Go programs do this, since Go has a higher \(b\))
However, Type B programs now claim world-champion performance in many games, including chess.