The primary challenge with searching game trees is that they can grow very large.
Even with standard Alpha-Beta pruning, we still have to search many subtrees all the way down to their terminal nodes.
Instead we can apply a heuristic evaluation function to states, allowing us to cut our search off early.
So we can replace \(Utility\) with \(Eval\), which estimates the utility of a state.
Our \(IsTerminal\) test can also be replaced by a cutoff test (which is always true for a terminal state)
We get this new Minimax for some state \(s\) at some depth \(d\):
\[ H\text{-}Minimax(s,d)=\\ \begin{cases} Eval(s,MAX), &\text{if }IsCutoff(s,d)\\ \max_{a\in Actions(s)}H\text{-}Minimax(Result(s,a),d+1),& \text{if }ToMove(s)=MAX\\ \min_{a\in Actions(s)}H\text{-}Minimax(Result(s,a),d+1),& \text{if }ToMove(s)=MIN \end{cases} \]
A “heuristic evaluation function” \(Eval(s,p)\) returns an estimate of the utility of state \(s\) for player \(p\), just like our map heuristics returned estimated distances.
For terminal states, \(Eval(s,p) = Utility(s,p)\); otherwise, it’s somewhere in the range \(Utility(loss,p) \le Eval(s,p) \le Utility(win,p)\)
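As a minimal sketch of what H-Minimax looks like in code (the `game` object and its methods `is_cutoff`, `eval`, `actions`, `result`, and `to_move` are hypothetical stand-ins for whatever game interface you actually have):

```python
# Minimal sketch of H-Minimax; the `game` interface is hypothetical.

def h_minimax(game, state, depth):
    """Estimated utility of `state` from MAX's point of view, searching to a cutoff."""
    if game.is_cutoff(state, depth):
        return game.eval(state, "MAX")   # heuristic estimate, not the true utility
    values = [h_minimax(game, game.result(state, a), depth + 1)
              for a in game.actions(state)]
    return max(values) if game.to_move(state) == "MAX" else min(values)
```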
So what do you think would make for a good evaluation function, what properties should it have?
Shouldn’t take too long (we’re trying to save time!)
We want the estimations to be correlated with winning! (what does a chance of winning even mean?)
Chess is a fully observable, deterministic game, but by ending a search early, we introduce uncertainty about the final outcome of our decisions.
We can think of a game state as having certain features (like the pieces on the board), which let us define “categories” or “classes” of game states (e.g. two-pawn vs one-pawn endgames).
Generally, categories will still have paths that lead to wins, losses, or ties.
Evaluation functions usually use these features and categories
\(Eval\) may not know one state from another, but it can estimate the proportion of states with a certain outcome.
For example, what does your intuition say about the outcome of two-pawn vs one-pawn endgames?
Let’s say that \(82\%\) of previously encountered two-pawn-vs-one-pawn states have led to a win \((1)\), \(16\%\) to a tie \((1/2)\), and \(2\%\) to a loss \((0)\).
We could evaluate the estimated utility of a state in this category as:
\[ (0.82\times1)+(0.02\times0)+(0.16\times1/2)=0.90 \]
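That estimate is just the expected utility over the category’s observed outcomes. A tiny check of the arithmetic (using the hypothetical percentages above):

```python
# Expected utility of the "two pawns vs one pawn" category,
# using the illustrative outcome frequencies above.
win, tie, loss = 0.82, 0.16, 0.02
expected_utility = win * 1 + tie * 0.5 + loss * 0
print(round(expected_utility, 2))  # 0.9
```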
Not a bad prediction!
Could you guess a problem with using this categorical approach?
Well, it requires a lot of categories…
The usual solution is assigning numerical contributions to each feature, then combining those.
This is how humans analyzed chess long before computers existed.
Pieces have value, certain arrangements have value, etc.
We can treat this as a weighted linear function
\[ Eval(s)=w_1f_1(s)+w_2f_2(s)+...+w_nf_n(s)=\sum^n_{i=1}w_if_i(s) \]
Here each \(f_i\) is a feature and \(w_i\) is a weight (how valuable that feature is); the weights should be normalized so that \(Eval(s)\) always stays between \(0\) (a loss) and \(+1\) (a win)
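A toy sketch of such a weighted linear evaluation (the features, weights, and state representation are invented for illustration, not real chess values):

```python
# Sketch of a weighted linear evaluation function.
# Features, weights, and the state dict are illustrative placeholders.

def eval_linear(state, features, weights):
    """Eval(s) = sum_i w_i * f_i(s) for feature functions f_i and weights w_i."""
    return sum(w * f(state) for f, w in zip(features, weights))

features = [
    lambda s: s["my_pawns"] - s["their_pawns"],        # material difference in pawns
    lambda s: s["my_mobility"] - s["their_mobility"],  # difference in legal moves
]
weights = [0.05, 0.01]  # made-up weights; real ones are tuned or learned

state = {"my_pawns": 8, "their_pawns": 7, "my_mobility": 20, "their_mobility": 18}
print(eval_linear(state, features, weights))  # ≈ 0.07
```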
There’s a pretty big assumption we’re making here (with some nasty consequences!) Consider this game:
(Figure: two chess positions, (a) and (b), differing by only one piece)
A linear sum assumes each feature contributes independently of the values of the other features.
State-of-the-art programs don’t use linear combinations, but rather nonlinear ones. E.g., having two bishops may be worth more than twice the value of one bishop, and a bishop’s value may change based on the stage of the game.
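For instance, a nonlinear evaluation might add an interaction term such as a bishop-pair bonus that grows in the endgame (all numbers here are invented for illustration):

```python
# Sketch: a nonlinear term where two bishops are worth more than twice one
# bishop, and the pair bonus grows as the game progresses. Values are invented.

def eval_bishops(num_bishops, game_stage):
    """game_stage in [0, 1]: 0 = opening, 1 = endgame."""
    base = 3.0 * num_bishops                      # linear material term
    pair_bonus = 0.5 if num_bishops >= 2 else 0.0 # interaction term: the bishop pair
    return base + pair_bonus * (1 + game_stage)   # bonus is larger late in the game

print(eval_bishops(1, 0.0))  # 3.0
print(eval_bishops(2, 0.0))  # 6.5 -> more than twice one bishop
print(eval_bishops(2, 1.0))  # 7.0 -> value depends on the game stage
```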
None of this is in chess rules… so where does it come from? Spend some time and talk about it!
We need human experience
When human experience is unavailable, machine learning will do (Ch. 22)
We need to cut our search off and apply our heuristic \(Eval\) to save on performance.
How do we do this?
Cutoff at some depth \(d\)
Iterative deepening
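A sketch of the iterative-deepening approach: repeatedly run a depth-limited search with an increasing depth limit until a time budget runs out. The `game` interface, the time budget, and `h_minimax_limited` are hypothetical stand-ins.

```python
import time

# Sketch: iterative deepening over a depth-limited H-Minimax.
# The `game` interface and time budget are hypothetical.

def iterative_deepening_decision(game, state, time_budget_s=1.0):
    deadline = time.monotonic() + time_budget_s
    best_move = None
    depth_limit = 1
    while time.monotonic() < deadline:
        # Re-search from the root with a deeper cutoff each pass; keep the
        # best move from the deepest completed pass. (A real implementation
        # would also check the clock inside the search itself.)
        best_move = max(
            game.actions(state),
            key=lambda a: h_minimax_limited(game, game.result(state, a), 1, depth_limit),
        )
        depth_limit += 1
    return best_move

def h_minimax_limited(game, state, depth, limit):
    """H-Minimax that cuts off at `limit` plies (or at a terminal state)."""
    if game.is_terminal(state) or depth >= limit:
        return game.eval(state, "MAX")
    values = [h_minimax_limited(game, game.result(state, a), depth + 1, limit)
              for a in game.actions(state)]
    return max(values) if game.to_move(state) == "MAX" else min(values)
```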
We have to be careful about being too simplistic (recall the last chess example): counting only material, black can be expected to win in both positions. But a quick look ahead tells us that white has the advantage in case (b).
So… our \(Eval\) only really works on states that are quiescent (no significant pending moves).
Nonquiescent states should be explored further; this extra exploration is called a quiescence search.
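A rough sketch of a quiescence search: at the cutoff, instead of trusting \(Eval\) immediately, keep searching only the “noisy” moves (e.g. captures) until the position is quiet. The `is_quiet` and `noisy_actions` helpers are hypothetical, and real implementations usually fold this into alpha-beta.

```python
# Sketch of a quiescence search, called where a plain cutoff would apply Eval.
# `game.is_quiet` and `game.noisy_actions` (e.g. captures only) are hypothetical.

def quiescence(game, state, player):
    if game.is_terminal(state) or game.is_quiet(state):
        return game.eval(state, player)  # safe to trust the heuristic here
    # Assumes a non-quiet state always has at least one "noisy" move to resolve.
    values = [quiescence(game, game.result(state, a), player)
              for a in game.noisy_actions(state)]
    return max(values) if game.to_move(state) == "MAX" else min(values)
```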
The horizon effect is a bit trickier to deal with. Consider a situation where a really bad outcome is unavoidable… but can be delayed:
(Figure: there is no escape, 6 ply)
In the example, with black to move, the black bishop is surely doomed… but if you are only exploring several ply deep, it may look like you can trade away some pawns to let the bishop escape… you can’t. It just costs you more.
This can be mitigated by allowing singular extensions, which note moves that are “clearly better” than the alternatives and allow them to be investigated further, beyond the normal cutoff.
Alpha-Beta pruning removes subtrees which have no effect on the final evaluation (right?)
Forward pruning prunes seemingly bad moves that might actually turn out to be good, at the risk of making a mistake.
Shannon would have called this a Type B strategy
Humans do this
The \(PROBCUT\) algorithm (Buro, 1995) is a forward-pruning Alpha-Beta search.
It takes advantage of gathered statistics regarding actions to not only remove moves provably outside the \([\alpha,\beta]\) window, but also moves probably outside the \([\alpha,\beta]\) window.
“How likely is it that a future move resultant from this move is any good?”
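A heavily simplified sketch of the idea (not Buro’s exact formulation): use a cheap, shallow search to predict whether a full-depth search at this node would probably fall outside the \([\alpha,\beta]\) window, and prune when it probably would. The depth reduction, margin, and search interface below are all invented for illustration.

```python
# Very simplified sketch of the ProbCut idea: a shallow probe search plus a
# statistical margin predicts whether the full-depth value is probably
# outside [alpha, beta]. All constants and interfaces are invented.

SHALLOW_REDUCTION = 4   # invented: how much shallower the probe search is
MARGIN = 0.2            # invented: derived in practice from gathered statistics

def probcut_prune(game, state, depth, alpha, beta):
    """Return a provisional value if this node can probably be pruned, else None."""
    probe = shallow_value(game, state, depth - SHALLOW_REDUCTION)
    if probe >= beta + MARGIN:
        return beta     # probably fails high: prune as if value >= beta
    if probe <= alpha - MARGIN:
        return alpha    # probably fails low: prune as if value <= alpha
    return None         # not confident enough; do the normal full-depth search

def shallow_value(game, state, depth):
    # Placeholder for a cheap, reduced-depth search; here it just calls Eval.
    return game.eval(state, "MAX")
```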
\(LOGISTELLO\), Buro’s Othello program, could beat the non-PROBCUT version \(64\%\) of the time, even when the normal version was given double the time!
Late move reduction is another viable strategy, but only if move ordering has been done well.
That is, moves appearing later in the search are less likely to be good, so we can reduce our depth of search (with the option to rerun a deeper search if something comes up promising)
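A sketch of late move reduction inside a node’s move loop: moves ordered late get a reduced-depth search first, and are only re-searched at full depth if the reduced search looks promising. The thresholds and the `search`/`ordered_moves` interface are illustrative.

```python
# Sketch of late move reduction (LMR) in a (MAX-node) move loop.
# `game.ordered_moves`, `search`, and the thresholds are illustrative stand-ins.

LATE_MOVE_THRESHOLD = 4   # moves after this index are considered "late"
REDUCTION = 2             # how many plies to shave off late moves

def search_with_lmr(game, state, depth, search):
    best = float("-inf")
    for i, move in enumerate(game.ordered_moves(state)):
        child = game.result(state, move)
        if i >= LATE_MOVE_THRESHOLD and depth > REDUCTION:
            value = search(game, child, depth - 1 - REDUCTION)  # reduced-depth search
            if value > best:
                value = search(game, child, depth - 1)          # promising: re-search fully
        else:
            value = search(game, child, depth - 1)
        best = max(best, value)
    return best
```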
Taken together, this isn’t a bad chess player.
Presume we have our \(Eval\), with a decent cutoff, and quiescence search.
Also presume we have a good performing implementation, which can evaluate \(1M\) nodes per second on a modern PC.
Given that the branching factor of chess is about \(35\), and \(35^5\) is roughly \(50M\), we can use \(Minimax\) to look ahead \(5\) ply in under a minute.
Though… we couldn’t do more given competition rules, and human players can look 6-8 ply ahead.
Using Alpha-Beta search, and a large transposition table, we can achieve 14 ply, which is expert-level play.
Using a “real computer” (a workstation with 8 GPUs evaluating \(1B\) nodes per second) helps… but we still need a very well-tuned \(Eval\) and a database of endgame moves to reach Grandmaster status. \(STOCKFISH\) does all of this, often reaching \(d\ge30\), and far exceeds any human player.
Early in the semester we discussed using lookup tables to make decisions. Generally, these tables are large and unwieldy (sometimes infinite), making them poor substitutes for an intelligent search strategy.
However, the beginning and end of many games are much more constrained than the middle
Modern chess makes heavy use of opening theory, and we can use this accumulated human intuition to inform our computers: we can go 10-15 ply into a game before needing to switch to search.
The ends of games also have far fewer moving pieces (only so many valid moves left)
For example: the king-and-rook-vs-king (KRK) ending, or the king-bishop-knight-vs-king (KBNK) ending.
Every ending with 7 or fewer pieces has been solved in this way; the tables hold ~400 trillion entries, and 8-piece endings would require ~40 quadrillion.
Alpha Beta searches have limitations. Any guesses on the branching factor of Go?
\(Eval\) is hard to define for Go, material value is not a useful indicator and most positions could benefit either side until the endgame
Instead another strategy is used: Monte Carlo Tree Search (MCTS)
Instead of using a heuristic function, the utility of a state is estimated by running many simulations of complete games starting from that state and averaging the results.
We “roll the dice” over and over, letting the rules decide who wins, rather than estimating
How should moves be decided during these (playouts/rollouts)?
And what information do we get from these simulations?
To get useful information, we need a playout policy
So… how do we decide the initial position to start the playouts, and how many playouts per position?
Pure Monte Carlo search performs \(N\) simulations and keeps track of win percentage
As \(N\) increases, this will converge to optimal play (for some stochastic games).
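A minimal sketch of pure Monte Carlo search: from the current state, run \(N\) random playouts for each legal move and keep the move with the best average result. The `game` interface and the uniformly random playout policy are hypothetical.

```python
import random

# Sketch of pure Monte Carlo search with a uniformly random playout policy.
# The `game` interface (actions, result, is_terminal, utility) is hypothetical.

def random_playout(game, state, player):
    """Play random moves until the game ends; return the utility for `player`."""
    while not game.is_terminal(state):
        state = game.result(state, random.choice(game.actions(state)))
    return game.utility(state, player)

def pure_monte_carlo(game, state, player, n_playouts=100):
    """Pick the move with the highest average playout utility."""
    def average_value(move):
        child = game.result(state, move)
        return sum(random_playout(game, child, player)
                   for _ in range(n_playouts)) / n_playouts
    return max(game.actions(state), key=average_value)
```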
But… usually we need more. We need a selection policy, that focuses effort on the most important parts of the game tree
This requires a balance of exploration and exploitation
MCTS does this by maintaining a search tree, and growing it on each iteration of four steps:
Selection
Expansion
Simulation
Back-propagation
(Figure: one iteration of MCTS)
A very good selection policy ranks moves based on an upper confidence bound formula:
\[ UCB1(n)=\frac{U(n)}{N(n)}+C\times\sqrt{\frac{\log N(Parent(n))}{N(n)}} \]
Here \(U(n)\) is the total utility of all playouts that went through node \(n\), \(N(n)\) is the number of playouts through \(n\), and \(Parent(n)\) is the parent node of \(n\).
So: \(\frac{U(n)}{N(n)}\) is the average utility of \(n\)
The term with the square root is the exploration term, which will be high for nodes that have not been well explored.
\(C\) is a constant that balances exploitation and exploration; usually game programmers have to try several values and pick the best one.
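A small sketch of the selection step, applying UCB1 to a hypothetical node structure (with `utility`, `visits`, `parent`, and `children` fields). \(C=\sqrt{2}\) is used here as a common theoretical default; in practice it is tuned.

```python
import math

# Sketch of the UCB1 selection rule on a hypothetical MCTS node structure.
# Each node tracks: total utility of playouts through it (utility), number of
# playouts through it (visits), plus parent and children references.

C = math.sqrt(2)  # common theoretical default; game programmers tune this

def ucb1(node):
    if node.visits == 0:
        return float("inf")                  # always try unvisited children first
    exploit = node.utility / node.visits     # average utility of this node
    explore = C * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore

def select_child(node):
    """Selection step of MCTS: descend to the child with the highest UCB1 value."""
    return max(node.children, key=ucb1)
```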
Monte Carlo strategies are good for games:
with high branching factors
or when \(Eval\) is hard to define
or for new games, where conventional wisdom doesn’t exist yet
However, in games where a single move could turn the tide of the game entirely, it doesn’t work as well (it might fail to consider that move entirely!)
It also doesn’t pick out “obvious” moves very well, as it doesn’t use any information except for the rules of the game
Historically Alpha-Beta search did better for games like chess, but recent Monte Carlo approaches are doing well in chess and other games like it.