Searching

Search Algorithms

A “search algorithm” accepts a search problem and returns a search solution

For now, we’re going to use trees to express our state-space graph

Abstract Romania

Tree Romania

The root node is the initial state. We can “expand” a node, generating its child nodes via a result function.

That’s the core idea of search: get our options, follow-up on one and set aside other options for later.

The set of unexpanded nodes is called the frontier

Graph Romania

Rectangular Grid Graph Search

The “interior” and “exterior” nodes are separated by the “frontier” nodes.

Best First

Search Data Structures

Our algorithm requires some data structure to hold our search tree.

Our nodes need four basic components

  • node.STATE: what state does our node represent?

  • node.PARENT: node which generated this node

  • node.ACTION: the action that was applied to generate this node

  • node.PATH-COST: total cost of the path from the root node (initial state) to this node ( \(g(node)\) )

If we follow the parent pointers from the goal node back to the root, we get our path!
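The four node components above can be sketched as a small Python class (a minimal sketch; `reconstruct_path` is an illustrative helper name, not from the source):

```python
class Node:
    """A search-tree node wrapping a state with bookkeeping fields."""
    def __init__(self, state, parent=None, action=None, path_cost=0.0):
        self.state = state          # the state this node represents
        self.parent = parent        # node that generated this node
        self.action = action        # action applied to generate this node
        self.path_cost = path_cost  # g(node): cost from the root to here

def reconstruct_path(node):
    """Follow parent pointers from a goal node back to the root."""
    path = []
    while node is not None:
        path.append(node.state)
        node = node.parent
    return list(reversed(path))  # root-to-goal order

# Tiny example chain: Arad -> Sibiu -> Fagaras
root = Node("Arad")
sibiu = Node("Sibiu", parent=root, action="to Sibiu", path_cost=140)
fagaras = Node("Fagaras", parent=sibiu, action="to Fagaras", path_cost=239)
print(reconstruct_path(fagaras))  # -> ['Arad', 'Sibiu', 'Fagaras']
```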

We also need a data structure for the frontier (a queue is a good choice); it must support the following operations:

  • IS-EMPTY(frontier): return true if frontier is empty

  • POP(frontier): remove top node and return it

  • TOP(frontier): return (but don’t remove) top node of frontier

  • ADD(node, frontier): insert node into queue

There are different types of queues (three are used for searches):

  • Priority Queue: first pops node with minimum cost (best-first search)

  • FIFO queue: first pops the node added first (breadth-first search)

  • LIFO queue (also called a stack): first pops the node most recently added (depth-first search)
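The three queue types map directly onto Python's standard library: `heapq` over a list for the priority queue, `collections.deque` for FIFO, and a plain list as a LIFO stack. A quick sketch:

```python
import heapq
from collections import deque

# Priority queue: pops the minimum-cost entry first (best-first search)
pq = []
heapq.heappush(pq, (5, "B"))
heapq.heappush(pq, (1, "A"))
heapq.heappush(pq, (3, "C"))
print(heapq.heappop(pq))  # -> (1, 'A'): lowest cost wins

# FIFO queue: pops the node added first (breadth-first search)
fifo = deque()
fifo.append("A")
fifo.append("B")
print(fifo.popleft())  # -> 'A': oldest entry

# LIFO queue (stack): pops the most recently added node (depth-first search)
lifo = []
lifo.append("A")
lifo.append("B")
print(lifo.pop())  # -> 'B': newest entry
```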

Reached states can be stored in some lookup table (dictionary or hash table), where key is a state and value is the node for that state
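Putting the frontier and the reached table together, here is a minimal best-first search sketch. The `problem` interface (`initial`, `is_goal`, `actions`, `result`, `action_cost`) and the tiny `GraphProblem` class are assumed shapes for illustration, not from the source:

```python
import heapq
import itertools

def best_first_search(problem):
    """Expand the cheapest frontier node first; `reached` maps state -> best node."""
    counter = itertools.count()  # tie-breaker so the heap never compares nodes
    root = {"state": problem.initial, "parent": None, "action": None, "path_cost": 0}
    frontier = [(0, next(counter), root)]
    reached = {problem.initial: root}  # lookup table: state -> node for that state
    while frontier:
        _, _, node = heapq.heappop(frontier)
        if problem.is_goal(node["state"]):
            return node
        for action in problem.actions(node["state"]):
            s = problem.result(node["state"], action)
            cost = node["path_cost"] + problem.action_cost(node["state"], action, s)
            if s not in reached or cost < reached[s]["path_cost"]:
                child = {"state": s, "parent": node, "action": action, "path_cost": cost}
                reached[s] = child
                heapq.heappush(frontier, (cost, next(counter), child))
    return None  # failure: frontier exhausted

class GraphProblem:
    """Tiny weighted-graph problem (illustrative)."""
    def __init__(self, graph, initial, goal):
        self.graph, self.initial, self.goal = graph, initial, goal
    def is_goal(self, s): return s == self.goal
    def actions(self, s): return list(self.graph.get(s, {}))
    def result(self, s, a): return a  # an action is "move to this neighbor"
    def action_cost(self, s, a, s2): return self.graph[s][s2]

# A small corner of the Romania map (distances from the standard example)
romania = {"Arad": {"Sibiu": 140, "Zerind": 75},
           "Zerind": {"Oradea": 71, "Arad": 75},
           "Oradea": {"Sibiu": 151, "Zerind": 71},
           "Sibiu": {"Fagaras": 99, "Arad": 140, "Oradea": 151}}
goal = best_first_search(GraphProblem(romania, "Arad", "Fagaras"))
print(goal["path_cost"])  # -> 239 (Arad -> Sibiu -> Fagaras)
```

Note that the `reached` dictionary does double duty: it prunes redundant paths and lets a cheaper path to a known state replace the old node.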

Redundant Paths

Let’s look at the tree again:

Graph Romania

Notice there’s a path from Arad to Sibiu and back to Arad

Arad is a repeated state in the search tree resulting from a cycle (loopy path)

This means that even with only 20 states, we already have an infinite complete search tree.

A cycle is a special case of a redundant path: more generally, we can reach the same node by several paths, some more costly than others

Let’s talk about a \(10 \times 10\) grid world:

We can reach any of the 100 squares in 9 or fewer moves, but there are ~\({8^9}\) paths of length 9 (over 100 million), roughly 1 million per reachable state on average. So… eliminating redundant paths can speed up the search by a factor of about 1 million.

So how can we address this?

A. Remember all reached states (as best-first search does). This works well when there are many redundant paths, especially when the table of reached states fits in memory.

B. Don’t check for repeated states at all. Appropriate when the problem domain has no repeated states (or they are rare), or when memory is constrained.

C. Detect cycles but not other redundant paths. This costs more computation, but requires no extra memory: we can either follow the entire parent chain, or check only a few links.
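Option C can be sketched as a walk up the parent chain (a minimal sketch; the `k` parameter is the “only check a few links” variant, which is cheaper but can miss longer cycles):

```python
def is_cycle(node, k=None):
    """Return True if node's state repeats among its ancestors.

    k=None follows the entire parent chain; a small integer k checks
    only the nearest k ancestors (cheaper, but may miss long cycles).
    """
    state = node["state"]
    ancestor = node["parent"]
    steps = 0
    while ancestor is not None and (k is None or steps < k):
        if ancestor["state"] == state:
            return True
        ancestor = ancestor["parent"]
        steps += 1
    return False

# Arad -> Sibiu -> Arad is a loopy path:
a = {"state": "Arad", "parent": None}
s = {"state": "Sibiu", "parent": a}
a2 = {"state": "Arad", "parent": s}
print(is_cycle(a2))       # -> True: Arad repeats
print(is_cycle(s))        # -> False: no repeat yet
print(is_cycle(a2, k=1))  # -> False: checking one link misses the cycle
```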

Measuring performance

How might we measure performance? Four ways:

  • Completeness: Is the algorithm guaranteed to find a solution when one exists, and to correctly report failure when none does?

  • Cost optimality: Does it find the lowest-cost path?

  • Time complexity: How long does it take? This could be measured in wall-clock time, or simply in steps (states/nodes processed)

  • Space complexity: How much memory does it need?

Completeness for finite problems is easier: as long as we account for loops and keep track of paths, we can explore every reachable state.

Infinite state spaces are much harder: an algorithm can keep reaching new states forever while leaving other parts of the space entirely unexplored

This requires a systematic approach… consider the infinite \(2D\) grid again: a systematic search eventually visits every reachable point. However, if there is no solution, even a sound algorithm will search forever

Uninformed Search Strategies

(reminder: we don’t know how close to the goal we are)

Systematic strategy, works in infinite spaces

A FIFO queue keeps us in order and allows for early goal testing (when a node is generated), as opposed to the late goal testing (when a node is popped) of best-first search.

Let’s draw it!

This always finds a solution (should one exist) with a minimal number of actions. However, watch how the number of states grows…

\[ 1+b+b^2+b^3+...+b^d=O(b^d) \]

YIKES

For example, consider a problem with branching factor \(b=10\), processing 1M nodes/s, and 1 KB/node. Searching to depth \(d=10\) would take about 3 hours, but require about 10 TB of memory!

Memory requirements dominate BFS

But at \(d=14\), even assuming infinite memory, the search would take 3.5 years.
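The back-of-the-envelope numbers above follow directly from the geometric series \(1+b+b^2+\dots+b^d\), and can be checked with a few lines of Python (assumptions as stated: \(b=10\), \(10^6\) nodes/s, 1 KB/node):

```python
def bfs_cost(b, d, nodes_per_sec=1_000_000, kb_per_node=1):
    """Total nodes 1 + b + ... + b^d, with time and memory estimates."""
    nodes = sum(b**i for i in range(d + 1))
    seconds = nodes / nodes_per_sec
    kilobytes = nodes * kb_per_node
    return nodes, seconds, kilobytes

n, secs, kb = bfs_cost(10, 10)
print(f"d=10: {secs / 3600:.1f} hours, {kb / 1e9:.0f} TB")  # ~3 hours, ~11 TB

n, secs, kb = bfs_cost(10, 14)
print(f"d=14: {secs / (3600 * 24 * 365):.1f} years")        # ~3.5 years
```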

In general, exponential complexity problems can’t be solved by an uninformed search, unless the space is very small.

The complexity of Uniform Cost is in terms of \(C^*\) (cost of optimal solution) and \(\epsilon\) (lower bound on cost per action) with \(\epsilon > 0\)

The worst case time and space complexity is:

\[ O(b^{1+\lfloor C^* / \epsilon \rfloor}) \]

which can be greater than \(b^d\). Uniform-cost search can explore large trees of low-cost actions first, rather than taking a leap to a high-cost but possibly optimal action.

When all action costs are equal, this is just \(b^{d+1}\)
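The exponent \(1+\lfloor C^* / \epsilon \rfloor\) is easy to compute for concrete numbers (the values below are illustrative, not from the source):

```python
import math

def ucs_depth_bound(c_star, eps):
    """Worst-case effective depth for uniform-cost search: 1 + floor(C*/eps)."""
    return 1 + math.floor(c_star / eps)

# Optimal cost 100 with minimum action cost 1: a bound of 101 levels
print(ucs_depth_bound(100, 1))  # -> 101

# When all action costs equal eps, C* = eps * d, so the bound reduces to d + 1
d = 7
print(ucs_depth_bound(7 * 2.5, 2.5) == d + 1)  # -> True
```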