When we did tree search in Chapter 3, we had to use domain-specific heuristics.
Now, we can make our search more efficient using domain-independent heuristics!
Which variable should be assigned next \((\mathrm{SelectUnassignedVariable})\), and in what order should its values be tried \((\mathrm{OrderDomainValues})\)?
What inferences should be performed at each step in the search \((\mathrm{Inference})\)?
Can we \(Backtrack\) more than one step (when appropriate)?
Can we save and reuse partial results from the search?
The backtracking algorithm contains the line
\[ var\leftarrow \mathrm{SelectUnassignedVariable}(csp,assignment) \]
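To make the choice points concrete, here is a minimal Python sketch of backtracking search for the Australia coloring CSP (all names — `neighbors`, `select_unassigned_variable`, `order_domain_values`, `consistent` — are illustrative, not from any particular library). The two helper hooks are exactly what the heuristics below tune.

```python
# Minimal backtracking search for the Australia map-coloring CSP (illustrative sketch).
neighbors = {
    "WA": {"NT", "SA"}, "NT": {"WA", "SA", "Q"}, "SA": {"WA", "NT", "Q", "NSW", "V"},
    "Q": {"NT", "SA", "NSW"}, "NSW": {"Q", "SA", "V"}, "V": {"SA", "NSW"}, "T": set(),
}
domains = {v: ["red", "green", "blue"] for v in neighbors}

def select_unassigned_variable(assignment):
    # Simplest policy: take the variables in a fixed order (see MRV/degree below).
    return next(v for v in neighbors if v not in assignment)

def order_domain_values(var, assignment):
    # Simplest policy: try values in their given order (see least-constraining-value below).
    return domains[var]

def consistent(var, value, assignment):
    # A coloring step is consistent if no already-assigned neighbor has the same color.
    return all(assignment.get(n) != value for n in neighbors[var])

def backtrack(assignment):
    if len(assignment) == len(neighbors):
        return assignment
    var = select_unassigned_variable(assignment)
    for value in order_domain_values(var, assignment):
        if consistent(var, value, assignment):
            assignment[var] = value
            result = backtrack(assignment)
            if result is not None:
                return result
            del assignment[var]   # undo the assignment and try the next value
    return None                   # no value worked: (chronological) backtracking happens here

print(backtrack({}))              # e.g. {'WA': 'red', 'NT': 'green', ...}
```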
What’s the simplest way to consider our variables?
Randomly
In order
Neither is optimal (generally).
Consider that after assigning \(WA=red\) and \(NT=green\), there is only one legal value for \(SA\).
Actually, once \(SA\) is assigned, the values for \(Q,\ NSW,\ V\) are forced!
The strategy of choosing the variable with the fewest remaining legal values is called minimum-remaining-values (MRV)… (also called “most constrained variable” or “fail-first” heuristic).
Why would we call this the fail-first heuristic?
MRV doesn’t help when choosing the very first variable (every domain is still full); in that case we can use the degree heuristic, which picks the variable involved in the largest number of constraints on other unassigned variables.
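A possible drop-in for \(\mathrm{SelectUnassignedVariable}\) combining MRV with the degree heuristic as a tie-breaker — just a sketch, assuming binary “not-equal” constraints as in map coloring and dictionary-based `domains`/`neighbors`/`assignment` structures like those in the earlier snippet:

```python
def mrv_with_degree(assignment, domains, neighbors):
    """Pick the unassigned variable with the fewest remaining legal values (MRV),
    breaking ties with the degree heuristic (most constraints on unassigned variables)."""
    def legal_values(var):
        # Values not already taken by an assigned neighbor (not-equal constraints assumed).
        return [v for v in domains[var]
                if all(assignment.get(n) != v for n in neighbors[var])]
    def degree(var):
        return sum(1 for n in neighbors[var] if n not in assignment)
    unassigned = [v for v in domains if v not in assignment]
    # Sort key: fewest legal values first; among ties, highest degree first.
    return min(unassigned, key=lambda var: (len(legal_values(var)), -degree(var)))
```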
Once a variable has been selected, we have to consider the values it could take…
An effective strategy is the least-constraining-value heuristic: prefer the value that rules out the fewest choices for the neighboring variables… why would this help?
Why would variable selection be fail-first but value selection be fail last?
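One way \(\mathrm{OrderDomainValues}\) could implement the least-constraining-value heuristic, under the same assumptions (binary not-equal constraints, illustrative data structures):

```python
def least_constraining_values(var, assignment, domains, neighbors):
    """Order var's values so that the one ruling out the fewest choices for
    neighboring unassigned variables is tried first (fail-last on values)."""
    def ruled_out(value):
        # How many neighboring options would choosing this value eliminate?
        return sum(1 for n in neighbors[var]
                   if n not in assignment and value in domains[n])
    return sorted(domains[var], key=ruled_out)
```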
We have seen how inference can reduce the domains of the variables before a search even starts, but it can be even more powerful during a search!
Every time we assign a variable, we can take the opportunity to infer new domain reductions on the neighboring variables.
One of the simplest ways to do this is forward checking: when a variable \(X\) is assigned, establish arc consistency for each unassigned variable \(Y\) connected to \(X\) by a constraint, deleting from \(Y\)’s domain any value that is inconsistent with the value chosen for \(X\) (for map coloring, that is just the value assigned to \(X\)).
Maintaining Arc Consistency (MAC) is a more powerful algorithm: after each assignment it runs full arc-consistency propagation (AC-3) starting from the arcs into the newly assigned variable, doing more work per step with the potential to save more work overall.
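A sketch of forward checking under the same not-equal-constraint assumption (MAC would instead re-run full arc-consistency propagation starting from the newly assigned variable’s neighbors, rather than doing this single pruning pass):

```python
def forward_check(var, value, assignment, domains, neighbors):
    """After assigning var=value, prune that value from each unassigned neighbor's domain.
    Returns the list of pruned (variable, value) pairs so they can be restored when we
    backtrack, or None if some neighbor's domain is wiped out (this branch is doomed)."""
    pruned = []
    for n in neighbors[var]:
        if n not in assignment and value in domains[n]:
            domains[n].remove(value)
            pruned.append((n, value))
            if not domains[n]:
                # Domain wipe-out: undo our prunings and report failure.
                for w, v in pruned:
                    domains[w].append(v)
                return None
    return pruned
```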
The simplest backtracking strategy is to undo the most recent decision; this is called chronological backtracking.
Let’s consider a contrived example:
We’re trying to color the map with a fixed variable ordering:
\[ Q,\ NSW,\ V,\ T,\ SA,\ WA,\ NT \]
Suppose we have made the first few partial assignments:
\[ \{Q=red,NSW=green,V=blue,T=red\} \]
When trying to assign the next variable, \(SA\), there are no legal values! So we backtrack one step and try a new color for Tasmania… which doesn’t help at all, since Tasmania has nothing to do with the failure. The variables that actually caused it, \(\{Q, NSW, V\}\), form \(SA\)’s conflict set.
What’s a smarter way to do this? Backjumping: backtrack directly to the most recent assignment in the conflict set.
But what about forward checking? It turns out that every branch pruned by this simple form of backjumping is also pruned by forward checking, so simple backjumping is redundant in a forward-checking (or MAC) search.
Despite that, the idea behind backjumping is still a good one: backtrack based on the reasons for failure.
Backjumping detects failure when a variable’s domain becomes empty, but sometimes a branch is doomed long before that happens.
Consider this partial assignment: \(\{WA=red,\ NSW=red\}\) (which is inconsistent: in every solution, \(WA\) and \(NSW\) must end up with different colors).
Suppose we then assign \(T=red\), and next try \(NT, Q, V, SA\).
\(NT\) does have consistent values at this point, but no assignment can work for all four of \(NT, Q, V, SA\), so eventually we’ll run out of values to try for \(NT\).
So where to backtrack? Why?
Because of the preceding assignments to \(WA\) and \(NSW\), no assignment to the remaining variables can succeed. So we should jump back over \(T\) to \(NSW\) (and, if need be, \(WA\)); \(T\) had nothing to do with the failure.
This is a deeper notion of the conflict set for a variable like \(NT\): the set of preceding variables that caused \(NT\), together with any subsequent variables, to have no consistent solution. Here, \(NT\)’s conflict set is \(\{WA, NSW\}\).
If we backjump using this deeper conflict set, that’s called conflict-directed backjumping.
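Here is a compact recursive sketch of conflict-directed backjumping built on the conflict-set idea above (the names and the `conflicts` predicate are illustrative, and this is just one way to organize the bookkeeping, not canonical pseudocode): each call either finds a solution or returns the set of earlier variables responsible for the failure, which lets callers jump straight over variables that played no part in it.

```python
def conflict_directed_backjumping(variables, domains, conflicts):
    """variables: list in a fixed order; domains: dict var -> list of values;
    conflicts(x, vx, y, vy): True if x=vx and y=vy violate a constraint."""
    assignment = {}

    def solve(i):
        # Returns None if a solution was completed below, otherwise the conflict set
        # (indices of earlier variables) responsible for the failure at depth i.
        if i == len(variables):
            return None
        var, conflict_set = variables[i], set()
        for value in domains[var]:
            # Which earlier assignments rule out var=value?
            culprits = {j for j in range(i)
                        if conflicts(variables[j], assignment[variables[j]], var, value)}
            if culprits:
                conflict_set |= culprits
                continue
            assignment[var] = value
            deeper = solve(i + 1)
            if deeper is None:
                return None                  # solution found; keep the assignment
            del assignment[var]
            if i not in deeper:
                return deeper                # not our fault: jump straight over this variable
            conflict_set |= deeper - {i}     # absorb the deeper conflict set
        return conflict_set                  # no value worked: report who is responsible

    return assignment if solve(0) is None else None

# Tiny demo on the lecture's ordering (a solution exists, so a coloring is returned).
neighbors = {
    "WA": {"NT", "SA"}, "NT": {"WA", "SA", "Q"}, "SA": {"WA", "NT", "Q", "NSW", "V"},
    "Q": {"NT", "SA", "NSW"}, "NSW": {"Q", "SA", "V"}, "V": {"SA", "NSW"}, "T": set(),
}
order = ["Q", "NSW", "V", "T", "SA", "WA", "NT"]
clash = lambda x, vx, y, vy: y in neighbors[x] and vx == vy
print(conflict_directed_backjumping(order, {v: ["red", "green", "blue"] for v in order}, clash))
```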
When a contradiction is reached, backjumping can bring us back to consider a variable which might fix the problem. Good… but we’d prefer to not run into the problem again.
When we find a contradiction, we know some subset of the conflict set is responsible.
Constraint learning means finding a minimal set of variables (along with their respective values) from the conflict set that causes the failure. This set of assignments is called a no-good (lol).
We can either add a constraint to the system that encodes the no-good, or have a separate cache we can check to forbid the combination.
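A no-good cache can be as simple as a set of forbidden partial assignments that the solver checks before extending an assignment — a minimal sketch (the representation is illustrative):

```python
# A no-good is a combination of variable/value pairs known to admit no solution.
nogoods = set()

def record_nogood(partial_assignment):
    nogoods.add(frozenset(partial_assignment.items()))

def hits_nogood(assignment):
    """True if the current assignment contains some recorded no-good as a subset."""
    items = set(assignment.items())
    return any(ng <= items for ng in nogoods)

record_nogood({"WA": "red", "NT": "green", "Q": "blue"})
print(hits_nogood({"WA": "red", "NT": "green", "Q": "blue", "T": "red"}))   # True
```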
Consider our previous figure showing partial assignments.
In the bottom row we have the state:
\[ \{WA=red,NT=green,Q=blue\} \]
Forward checking can identify this state as a no-good, but since this branch is getting pruned anyway, recording it won’t help within the current subtree!
However, if this search tree is part of a larger search tree that was started by assigning values to \(V\) and \(T\), then it’s good to record the no-good since this absolutely will come up again for each possible assignment to \(V\) and \(T\).
No-goods can be used effectively by forward checking or by backjumping, and constraint learning is vital to the performance of modern CSP solvers.
Local search can be a great way of solving CSPs (as seen in Chapter 4)
This means using a “complete-state formulation”. Which means? Every state assigns a value to every variable, and the search changes one variable’s value at a time.
Let’s reconsider the 8-queens problem (each variable assumed to be in its own column)
Starting with a random, complete assignment, we’ll end up with a state with many constraint violations (probably).
Then we can randomly pick a conflicted variable, and choose a new assignment for it with the fewest number of conflicts (using the min-conflicts heuristic).
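A minimal sketch of min-conflicts on \(n\)-queens (one queen per column; `max_steps` is just an arbitrary cutoff, not a prescribed parameter):

```python
import random

def min_conflicts_nqueens(n, max_steps=100_000):
    """queens[c] is the row of the queen in column c; two queens conflict
    if they share a row or a diagonal."""
    def conflicts(col, row):
        return sum(1 for c in range(n) if c != col and
                   (queens[c] == row or abs(queens[c] - row) == abs(c - col)))
    queens = [random.randrange(n) for _ in range(n)]          # random complete assignment
    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(c, queens[c]) > 0]
        if not conflicted:
            return queens                                     # no violated constraints left
        col = random.choice(conflicted)                       # pick a conflicted variable...
        # ...and give it the value with the fewest conflicts (min-conflicts heuristic).
        queens[col] = min(range(n), key=lambda row: conflicts(col, row))
    return None                                               # gave up (plateau / step limit)

print(min_conflicts_nqueens(8))
```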
This approach can be very effective for certain types of problems:
(Figure: local search on the \(n\)-queens CSP.)
This strategy solves even the million-queens problem in an average of 50 steps (after the initial assignment).
In fact, not counting the initial placement of queens, the runtime is roughly independent of problem size.
In the 1990s this caused quite a stir! Mostly around figuring out what makes a problem easy or hard for local search.
What say you?
\(n\)-queens may be considered easy for local search because solutions are densely distributed throughout its state space.
Min-conflicts also works for hard problems: it has been used to schedule observations for the Hubble Space Telescope, cutting the time to schedule a week of observations from three weeks down to around 10 minutes!
All the local search techniques from Section 4.1 work here (beware of plateaus!)
Those previously seen techniques still apply; we can also use tabu search, which involves keeping a record of recently visited states and forbidding the search from revisiting them.
Constraint weighting is another strategy: each constraint gets a weight (initially all 1). At each step, we pick the variable/value change that yields the lowest total weight of violated constraints, then increment the weight of every constraint the new assignment still violates. This results in the most troublesome constraints accumulating high weights, focusing the search on them.
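A hedged sketch of a single constraint-weighting step, assuming constraints are supplied as named predicates over a complete assignment (this representation is illustrative, not a standard API):

```python
def constraint_weighting_step(assignment, domains, constraints, weights):
    """constraints: list of (name, predicate) pairs, predicate(assignment) -> satisfied?
    weights: dict name -> weight, all initially 1."""
    def violated(a):
        return [name for name, satisfied in constraints if not satisfied(a)]
    def weighted_cost(a):
        return sum(weights[name] for name in violated(a))
    # Pick the single variable/value change with the lowest total weight of violations.
    var, val = min(((v, x) for v in assignment for x in domains[v]),
                   key=lambda pair: weighted_cost({**assignment, pair[0]: pair[1]}))
    assignment[var] = val
    # Bump the weight of every constraint the new assignment still violates.
    for name in violated(assignment):
        weights[name] += 1
    return assignment, weights
```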
As we’ve seen with CSPs in general, sometimes just the act of phrasing the problem in the right way can make it easy to solve!
Dealing with the complexity of the real world is, in general, impossible without first decomposing it into subproblems.
When coloring Australia, we find that coloring Tasmania is an independent subproblem with respect to coloring the mainland… any solution for Tasmania can be combined with any solution for the mainland.
Independence can be derived from the connected components of the constraint graph.
Each component corresponds to a subproblem \(CSP_i\).
If some assignment \(S_i\) is a solution of \(CSP_i\), then \(\bigcup_i S_i\) is a solution of \(\bigcup_i CSP_i\)!
Why do we care?
Let’s say each \(CSP_i\) has \(c\) variables, from the total \(n\) variables, where \(c\) is a constant.
Now, there are \(n/c\) subproblems, each taking at most \(d^c\) work to solve, where \(d\) is the size of the domain.
So… the total work is \(O(d^c\, n/c)\), which is linear in \(n\)!
As an example: dividing a Boolean CSP with 100 variables into four subproblems reduces the worst-case solution time from the lifetime of the universe down to less than a second!
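To find independent subproblems mechanically, we can take the connected components of the constraint graph — a small sketch using a hypothetical adjacency-map representation:

```python
def connected_components(neighbors):
    """Split a constraint graph (variable -> set of neighbors) into independent
    subproblems via its connected components (simple depth-first traversal)."""
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            var = stack.pop()
            if var not in component:
                component.add(var)
                stack.extend(neighbors[var] - component)
        seen |= component
        components.append(component)
    return components

australia = {
    "WA": {"NT", "SA"}, "NT": {"WA", "SA", "Q"}, "SA": {"WA", "NT", "Q", "NSW", "V"},
    "Q": {"NT", "SA", "NSW"}, "NSW": {"Q", "SA", "V"}, "V": {"SA", "NSW"}, "T": set(),
}
print(connected_components(australia))   # the mainland in one component, Tasmania alone
```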
Completely independent problems are unfortunately… rare.
However, other kinds of structure do turn up. E.g., a constraint graph is a tree when any two variables are connected by only one path.
It can be shown that any tree-structured CSP can be solved in time linear in the number of variables!
The key is to introduce a new kind of consistency, called directional arc consistency (DAC).
A CSP is DAC under an ordering of variables \(X_1,X_2,...,X_n\) iff every \(X_i\) is arc-consistent with each \(X_j\) for \(j>i\).
To solve a tree-structured CSP, pick any variable to be the root, then choose an ordering of variables such that each variable appears after its parent in the tree (topological sort).
Any tree with \(n\) nodes has \(n-1\) edges, so it can be made directionally arc-consistent in \(O(n)\) steps, each of which compares up to \(d\) possible domain values for two variables, giving a total time of \(O(nd^2)\). Once the graph is directionally arc-consistent, we can just march down the ordering and, for each variable, choose any value consistent with its parent’s assignment, with no backtracking needed.
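Putting the pieces together, here is a sketch of the tree-CSP procedure just described (assuming the constraint graph really is a tree, with `consistent(x, vx, y, vy)` encoding the binary constraint on each edge; names are illustrative):

```python
def solve_tree_csp(neighbors, domains, consistent):
    # Pick any variable as root and order variables so each appears after its parent.
    root = next(iter(neighbors))
    order, parent = [root], {root: None}
    for var in order:                               # BFS gives a topological order of the tree
        for child in neighbors[var]:
            if child not in parent:
                parent[child] = var
                order.append(child)
    # Directional arc consistency: from the leaves back to the root, prune every
    # parent value that has no supporting value in the child's domain.
    for child in reversed(order[1:]):
        p = parent[child]
        domains[p] = [vp for vp in domains[p]
                      if any(consistent(p, vp, child, vc) for vc in domains[child])]
        if not domains[p]:
            return None                             # a domain was wiped out: no solution
    # With DAC established, one forward pass never needs to backtrack.
    assignment = {}
    for var in order:
        p = parent[var]
        assignment[var] = next(v for v in domains[var]
                               if p is None or consistent(p, assignment[p], var, v))
    return assignment

# Usage on a small tree with a not-equal constraint on every edge.
tree = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"}}
print(solve_tree_csp(tree, {v: ["red", "green"] for v in tree},
                     lambda x, vx, y, vy: vx != vy))
```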
(Figure: a tree-structured constraint graph and a possible variable ordering.)