More Game Tree Searching

We get this new Minimax for some state \(s\) at some depth \(d\):

\[ H\text{-}Minimax(s,d)=\\ \begin{cases} Eval(s,MAX), &\text{if }IsCutoff(s,d)\\ \max_{a\in Actions(s)}H\text{-}Minimax(Result(s,a),d+1),& \text{if }ToMove(s)=MAX\\ \min_{a\in Actions(s)}H\text{-}Minimax(Result(s,a),d+1),& \text{if }ToMove(s)=MIN \end{cases} \]
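Here is a minimal sketch of that recurrence in Python; the game interface (`is_cutoff`, `eval_fn`, `actions`, `result`, `to_move`) is an assumption for illustration, not any particular library:

```python
MAX, MIN = "MAX", "MIN"

def h_minimax(game, state, depth):
    """Depth-limited minimax that falls back to a heuristic at the cutoff."""
    if game.is_cutoff(state, depth):
        return game.eval_fn(state, MAX)      # heuristic estimate, not true utility
    values = [h_minimax(game, game.result(state, a), depth + 1)
              for a in game.actions(state)]
    # MAX takes the largest estimated value, MIN the smallest
    return max(values) if game.to_move(state) == MAX else min(values)
```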

Evaluation Functions

A “heuristic evaluation function” \(Eval(s,p)\) returns an estimate of the expected utility of state \(s\) for player \(p\), just as our map heuristics returned an estimated distance

For terminal states, \(Eval(s,p) = Utility(s,p)\); otherwise, it falls somewhere in the range \(Utility(loss,p) \le Eval(s,p) \le Utility(win,p)\)

So what do you think would make for a good evaluation function, what properties should it have?

  • Shouldn’t take too long (we’re trying to save time!)

  • We want the estimates to be correlated with winning! (what does a “chance of winning” even mean?)

Chess is a fully observable, deterministic game, but by cutting off the search early we introduce uncertainty about the final outcome of our decisions.

We can think of a game state as having certain features (like the pieces on the board), from which we can define “categories” or “classes” of game states (e.g. two-pawn vs. one-pawn endgames).

Generally, categories will still have paths that lead to wins, losses, or ties.

Evaluation functions usually use these features and categories

\(Eval\) may not know one state from another, but it can estimate the proportion of states with a certain outcome.

For example, what does your intuition say about the outcome of two-pawn vs one-pawn endgames?

Let’s say that \(82\%\) of previously encountered two-pawn vs. one-pawn states have led to a win, \(16\%\) to a tie (utility \(1/2\)), and \(2\%\) to a loss (utility \(0\)).

We could evaluate the estimated utility of a state in this category as:

\[ (0.82\times1)+(0.02\times0)+(0.16\times1/2)=0.90 \]
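To make that concrete, here is the same calculation as a tiny Python sketch (the frequencies are the illustrative numbers above, not real chess statistics):

```python
# Estimated utility of a category: probability-weighted average of outcomes.
outcome_freqs = {1.0: 0.82, 0.5: 0.16, 0.0: 0.02}   # utility -> observed frequency
estimate = sum(utility * freq for utility, freq in outcome_freqs.items())
print(estimate)   # ~0.90
```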

Not a bad prediction!

Could you guess a problem with using this categorical approach?

Well, it requires a lot of categories…

The usual solution is assigning numerical contributions to each feature, then combining those.

This is how humans analyzed chess long before computers existed

Pieces have value, certain arrangements have value, etc.

We can treat this as a weighted linear function

\[ Eval(s)=w_1f_1(s)+w_2f_2(s)+...+w_nf_n(s)=\sum^n_{i=1}w_if_i(s) \]

where each \(f_i\) is a feature and \(w_i\) is a weight (how valuable that feature is). The weights should be normalized such that \(Eval(s)\) always falls between \(0\) (a loss) and \(+1\) (a win)
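A sketch of a weighted linear \(Eval\) in Python; the feature functions and weights below are placeholders (rough piece values), not a tuned chess evaluation:

```python
# Eval(s) = w1*f1(s) + ... + wn*fn(s)
def weighted_linear_eval(state, features, weights):
    return sum(w * f(state) for w, f in zip(weights, features))

# Hypothetical material-difference features, from the maximizing player's view.
features = [
    lambda s: s.pawn_diff,     # pawns ahead
    lambda s: s.knight_diff,   # knights ahead
    lambda s: s.queen_diff,    # queens ahead
]
weights = [1, 3, 9]            # classic rough piece values; these would still need
                               # rescaling so Eval(s) lands between 0 and 1
```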

There’s a pretty big assumption we’re making here (with some nasty consequences!). Consider these two positions:

Chess game, differing by only one piece

A linear sum ignores interactions between features: each feature’s contribution is treated as independent of every other feature.

The state of the art doesn’t use linear combinations, but rather nonlinear combinations. E.g. having two bishops may be worth more than twice the value of one bishop, and a bishop’s value may change based on the stage of the game.
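For instance, here is a sketch of one nonlinear tweak on top of the linear sum, with a made-up bishop-pair bonus that shrinks as the game goes on:

```python
# Nonlinear interaction: the bishop pair is worth more than two lone bishops,
# and the (made-up) bonus fades as the game progresses toward the endgame.
def eval_with_bishop_pair(state, linear_value):
    bonus = 0.0
    if state.bishops >= 2:                            # feature interaction
        bonus = 0.05 * (1.0 - state.game_progress)    # game_progress in [0, 1]
    return linear_value + bonus
```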

None of this is in chess rules… so where does it come from? Spend some time and talk about it!

  • We need human experience

  • When human experience is unavailable, machine learning will do (Ch. 22)

The horizon effect is a bit trickier to deal with. Consider a situation where a really bad outcome is unavoidable… but can be delayed:

There is no escape (6 ply)

In the example, with Black to move, the black bishop is surely doomed… but if you are only exploring several plies deep, it may look like sacrificing some pawns lets the bishop escape… it doesn’t. It just pushes the loss past the horizon and costs you the pawns as well.

This can be mitigated by allowing singular extensions: when one move is noted as “clearly better” than all the others, it can be investigated beyond the usual depth limit.

Forward Pruning

Alpha-Beta pruning removes subtrees that provably have no effect on the final evaluation (right?)

Forward pruning discards seemingly bad moves without fully exploring them, saving time but at the risk of pruning a move that would have turned out to be good.

Shannon would have called this a Type B strategy

Humans do this

The \(PROBCUT\) algorithm (Buro, 1995) is a forward-pruning version of Alpha-Beta search.

It takes advantage of statistics gathered from prior searches to remove not only moves provably outside the \([\alpha,\beta]\) window, but also moves probably outside the \([\alpha,\beta]\) window.

“How likely is it that a future move resultant from this move is any good?”
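Here is a sketch of the PROBCUT idea, assuming a linear model \(v_{deep}\approx a\,v_{shallow}+b\) (with error \(\sigma\)) has already been fit from gathered statistics; the parameter values are placeholders, not Buro's actual numbers:

```python
# If a shallow search strongly predicts that the deep result will fall outside
# [alpha, beta], cut without doing the full-depth search.
def probcut(state, alpha, beta, shallow_search, a=1.0, b=0.0, sigma=1.0, t=1.5):
    v_shallow = shallow_search(state)
    predicted = a * v_shallow + b
    if predicted - t * sigma >= beta:     # probably a fail-high
        return beta
    if predicted + t * sigma <= alpha:    # probably a fail-low
        return alpha
    return None                           # no probable cut; do the full search
```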

\(LOGISTELLO\), Buro’s Othello program, could beat the non-pruning version of itself \(64\%\) of the time, even when the normal version was given double the time!

Late move reduction is another viable strategy, but only if move ordering has been done well

That is, moves appearing later in the search are less likely to be good, so we can reduce our depth of search for them (with the option to re-run a full-depth search if the reduced search turns up something promising)
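A sketch of late move reduction inside a negamax-style Alpha-Beta search; the move-count and depth thresholds, and the amount of reduction, are illustrative choices rather than tuned values:

```python
# search() is the ordinary full-depth routine; moves are assumed to be ordered
# best-first, so later moves get a shallower look unless they surprise us.
def search_with_lmr(game, state, depth, alpha, beta, search):
    for i, move in enumerate(game.ordered_moves(state)):
        child = game.result(state, move)
        if i < 3 or depth < 3:
            value = -search(game, child, depth - 1, -beta, -alpha)
        else:
            # Late move: try a reduced-depth search first...
            value = -search(game, child, depth - 2, -beta, -alpha)
            if value > alpha:
                # ...and only re-run at full depth if it looks promising.
                value = -search(game, child, depth - 1, -beta, -alpha)
        alpha = max(alpha, value)
        if alpha >= beta:
            break          # beta cutoff
    return alpha
```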

Taken together, this isn’t a bad chess player.

Presume we have our \(Eval\), with a decent cutoff, and quiescence search.

Also presume we have a well-performing implementation that can evaluate \(1M\) nodes per second on a modern PC.

Given that the branching factor of chess is about \(35\), and \(35^5\) is roughly \(50M\), we can use \(Minimax\) to look ahead \(5\) ply in under a minute.

Though… competition time limits wouldn’t let us go any deeper, and human players can look 6–8 ply ahead, so 5 ply isn’t good enough.

Using Alpha-Beta search, and a large transposition table, we can achieve 14 ply, which is expert-level play.

Using a “real computer” workstation with 8 GPUs and \(1B\) nodes per second helps… but we still need a very well-tuned \(Eval\) and a database of endgame moves to reach Grandmaster status. \(STOCKFISH\) does all of this, often reaching \(d\ge30\), and far exceeds any human player.

Search vs Lookup

Early in the semester we discussed using lookup tables to make decisions. Generally, these tables are large and unwieldy (sometimes infinite), making them poor substitutes for an intelligent search strategy.

However, the beginning and end of many games are much more constrained than the middle

Modern chess makes heavy use of opening theory; by using accumulated human intuition to inform our computers, we can go 10–15 ply into a game before needing to switch to search.

The ends of games also have far fewer moving pieces (only so many valid moves left)

For example: (KRK) ending vs (KBNK) ending

Every ending with 7 or fewer pieces has been solved in this way; the resulting tables hold ~400T entries. 8-piece endings would require ~40Q entries.

Monte Carlo Tree Search

Instead of using a heuristic evaluation function, the utility of a state is estimated by running several simulations of complete games starting from that state, then averaging the results

We “roll the dice” over and over, letting the rules decide who wins, rather than estimating

How should moves be decided during these playouts (also called rollouts)?

And what information do we get from these simulations?

To get useful information, we need a playout policy

So… how do we decide the initial position to start the playouts, and how many playouts per position?

Pure Monte Carlo search performs \(N\) simulations and keeps track of each move’s win percentage

As \(N\) increases, this will converge to optimal play (for some stochastic games)
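Here is a sketch of pure Monte Carlo search, again assuming a generic game interface (`actions`, `result`, `is_terminal`, `utility`, `to_move`):

```python
import random

# For each legal move, run N uniformly random playouts and keep the move
# with the best average result from the root player's point of view.
def pure_monte_carlo(game, state, N=1000):
    player = game.to_move(state)

    def playout(s):
        while not game.is_terminal(s):
            s = game.result(s, random.choice(game.actions(s)))
        return game.utility(s, player)

    def average_result(move):
        child = game.result(state, move)
        return sum(playout(child) for _ in range(N)) / N

    return max(game.actions(state), key=average_result)
```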

But… usually we need more. We need a selection policy, that focuses effort on the most important parts of the game tree

This requires a balance of exploration and exploitation

MCTS does this by maintaining a search tree and growing it on each iteration of four steps (see the sketch after the figure below):

  • Selection

  • Expansion

  • Simulation

  • Back-propagation

One iteration of MCTS
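A skeleton of one iteration in Python; the selection policy (e.g., UCB1, next) and the playout policy are passed in, and the game interface is assumed as before. For brevity, back-propagation credits every node with the same result, whereas a real two-player implementation would credit each node from its own player’s perspective:

```python
import random

class Node:
    """A node in the MCTS tree: total playout utility U and visit count N."""
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.U, self.N = [], 0.0, 0

def mcts_iteration(root, game, select_child, playout):
    node = root
    while node.children:                       # 1. Selection: descend by policy
        node = select_child(node)
    if not game.is_terminal(node.state):       # 2. Expansion: add children
        node.children = [Node(game.result(node.state, a), parent=node)
                         for a in game.actions(node.state)]
        node = random.choice(node.children)
    result = playout(node.state)               # 3. Simulation: play out to the end
    while node is not None:                    # 4. Back-propagation: update stats
        node.N += 1
        node.U += result
        node = node.parent
```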

Upper Confidence Bounds Applied to Trees

A very good selection policy ranks moves based on an upper confidence bound formula:

\[ UCB1(n)=\frac{U(n)}{N(n)}+C\times\sqrt{\frac{\log N(Parent(n))}{N(n)}} \]

Where \(U(n)\) is the total utility of all playouts that went through node \(n\), \(N(n)\) is the number of playouts through \(n\), and \(Parent(n)\) is the parent node of \(n\).

So: \(\frac{U(n)}{N(n)}\) is the average utility of \(n\)

The term with the square root is the exploration term, which will be high for nodes that have not been explored much

\(C\) is a constant that balances the exploration term against the exploitation term; usually game programmers try several values and pick whichever performs best (something near \(\sqrt2\) is a common starting point)
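Putting the formula into code, using nodes like those in the MCTS skeleton above (fields `U`, `N`, and `parent`); the default \(C\) here is just a common starting point near \(\sqrt2\):

```python
import math

def ucb1(node, C=1.4):
    if node.N == 0:
        return float("inf")       # always try unvisited children first
    exploit = node.U / node.N                                  # average utility
    explore = C * math.sqrt(math.log(node.parent.N) / node.N)  # exploration bonus
    return exploit + explore

def select_child(node):
    # Descend to the child with the highest upper confidence bound.
    return max(node.children, key=ucb1)
```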

Monte Carlo strategies are good for games:

  • with high branching factors

  • or when \(Eval\) is hard to define

  • or for new games, where conventional wisdom doesn’t exist yet

However, in games where a single move could turn the tide of the game entirely, it doesn’t work as well (it might fail to consider that move entirely!)

It also doesn’t pick out “obvious” moves very well, as it doesn’t use any information except for the rules of the game

Historically Alpha-Beta search did better for games like chess, but recent Monte Carlo approaches are doing well in chess and other games like it.