Searching

Search Algorithms

A “search algorithm” accepts a search problem and returns a search solution

For now, we’re going to use trees to express our state-space graph

Abstract Romania

Tree Romania

The root node is the initial state. We can “expand” a node, generating its child nodes via a result function.

That’s the core idea of search: get our options, follow-up on one and set aside other options for later.

The set of unexpanded nodes is called the frontier

Graph Romania

Rectangular Grid Graph Search

The “interior” and “exterior” nodes are separated by the “frontier” nodes.

Best First

Search Data Structures

Our algorithm requires some data structure to hold our search tree.

Our nodes need four basic components

  • node.STATE: what state does our node represent?

  • node.PARENT: node which generated this node

  • node.ACTION: the action that was applied to generate this node

  • node.PATH-COST: total cost of the path from the root node (initial state) to this node ( \(g(node)\) )

If we follow the parent pointers from the goal node back to the root, we get our path!
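The four node components above can be sketched as a small Python class (a minimal sketch; `reconstruct_path` is an illustrative helper name, not from the source):

```python
class Node:
    """A search-tree node wrapping a state with bookkeeping fields."""
    def __init__(self, state, parent=None, action=None, path_cost=0.0):
        self.state = state          # the state this node represents
        self.parent = parent        # node that generated this node
        self.action = action        # action applied to generate this node
        self.path_cost = path_cost  # g(node): cost from the root to here

def reconstruct_path(node):
    """Follow parent pointers from a goal node back to the root."""
    path = []
    while node is not None:
        path.append(node.state)
        node = node.parent
    return list(reversed(path))  # root-to-goal order

# Tiny example chain: Arad -> Sibiu -> Fagaras
root = Node("Arad")
sibiu = Node("Sibiu", parent=root, action="to Sibiu", path_cost=140)
fagaras = Node("Fagaras", parent=sibiu, action="to Fagaras", path_cost=239)
print(reconstruct_path(fagaras))  # -> ['Arad', 'Sibiu', 'Fagaras']
```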

We also need a data structure for the frontier (a queue is a good choice); it must support the following operations:

  • IS-EMPTY(frontier): return true if frontier is empty

  • POP(frontier): remove top node and return it

  • TOP(frontier): return (but don’t remove) top node of frontier

  • ADD(node, frontier): insert node into queue

There are different types of queues (three are used for searches):

  • Priority Queue: first pops node with minimum cost (best-first search)

  • FIFO queue: first pops the node added first (breadth-first search)

  • LIFO queue (also called a stack): first pops the node most recently added (depth-first search)
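The three queue types map directly onto Python's standard library: `heapq` over a list for the priority queue, `collections.deque` for FIFO, and a plain list as a LIFO stack. A quick sketch:

```python
import heapq
from collections import deque

# Priority queue: pops the minimum-cost entry first (best-first search)
pq = []
heapq.heappush(pq, (5, "B"))
heapq.heappush(pq, (1, "A"))
heapq.heappush(pq, (3, "C"))
print(heapq.heappop(pq))  # -> (1, 'A'): lowest cost wins

# FIFO queue: pops the node added first (breadth-first search)
fifo = deque()
fifo.append("A")
fifo.append("B")
print(fifo.popleft())  # -> 'A': oldest entry

# LIFO queue (stack): pops the most recently added node (depth-first search)
lifo = []
lifo.append("A")
lifo.append("B")
print(lifo.pop())  # -> 'B': newest entry
```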

Reached states can be stored in some lookup table (dictionary or hash table), where key is a state and value is the node for that state
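Putting the frontier and the reached table together, here is a minimal best-first search sketch. The `problem` interface (`initial`, `is_goal`, `actions`, `result`, `action_cost`) and the tiny `GraphProblem` class are assumed shapes for illustration, not from the source:

```python
import heapq
import itertools

def best_first_search(problem):
    """Expand the cheapest frontier node first; `reached` maps state -> best node."""
    counter = itertools.count()  # tie-breaker so the heap never compares nodes
    root = {"state": problem.initial, "parent": None, "action": None, "path_cost": 0}
    frontier = [(0, next(counter), root)]
    reached = {problem.initial: root}  # lookup table: state -> node for that state
    while frontier:
        _, _, node = heapq.heappop(frontier)
        if problem.is_goal(node["state"]):
            return node
        for action in problem.actions(node["state"]):
            s = problem.result(node["state"], action)
            cost = node["path_cost"] + problem.action_cost(node["state"], action, s)
            if s not in reached or cost < reached[s]["path_cost"]:
                child = {"state": s, "parent": node, "action": action, "path_cost": cost}
                reached[s] = child
                heapq.heappush(frontier, (cost, next(counter), child))
    return None  # failure: frontier exhausted

class GraphProblem:
    """Tiny weighted-graph problem (illustrative)."""
    def __init__(self, graph, initial, goal):
        self.graph, self.initial, self.goal = graph, initial, goal
    def is_goal(self, s): return s == self.goal
    def actions(self, s): return list(self.graph.get(s, {}))
    def result(self, s, a): return a  # an action is "move to this neighbor"
    def action_cost(self, s, a, s2): return self.graph[s][s2]

# A small corner of the Romania map (distances from the standard example)
romania = {"Arad": {"Sibiu": 140, "Zerind": 75},
           "Zerind": {"Oradea": 71, "Arad": 75},
           "Oradea": {"Sibiu": 151, "Zerind": 71},
           "Sibiu": {"Fagaras": 99, "Arad": 140, "Oradea": 151}}
goal = best_first_search(GraphProblem(romania, "Arad", "Fagaras"))
print(goal["path_cost"])  # -> 239 (Arad -> Sibiu -> Fagaras)
```

Note that the `reached` dictionary does double duty: it prunes redundant paths and lets a cheaper path to a known state replace the old node.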

Redundant Paths

Let’s look at the tree again:

Graph Romania

Notice there’s a path from Arad to Sibiu and back to Arad

Arad is a repeated state in the search tree resulting from a cycle (loopy path)

This means that even with only 20 states, we already have an infinite complete search tree.

A cycle is a special case of a redundant path: more generally, we can reach the same node by several paths, some more costly than others

Let’s talk about a \(10 \times 10\) grid world:

We can reach any of the 100 squares in 9 or fewer moves, but there are ~\({8^9}\) paths of length 9 (over 100 million), roughly 1 million per reachable state on average. So… eliminating redundant paths can speed up the search by a factor of about 1 million.

So how can we address this?

A. Remember all reached states (as best-first search does). This works well when there are many redundant paths, especially when the table of reached states fits in memory.

B. Don’t check for repeated states at all. Appropriate when the problem domain has no repeated states (or they are rare), or when memory is constrained.

C. Detect cycles but not other redundant paths. This costs more computation, but requires no extra memory: we can either follow the entire parent chain, or check only a few links.
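Option C can be sketched as a walk up the parent chain (a minimal sketch; the `k` parameter is the “only check a few links” variant, which is cheaper but can miss longer cycles):

```python
def is_cycle(node, k=None):
    """Return True if node's state repeats among its ancestors.

    k=None follows the entire parent chain; a small integer k checks
    only the nearest k ancestors (cheaper, but may miss long cycles).
    """
    state = node["state"]
    ancestor = node["parent"]
    steps = 0
    while ancestor is not None and (k is None or steps < k):
        if ancestor["state"] == state:
            return True
        ancestor = ancestor["parent"]
        steps += 1
    return False

# Arad -> Sibiu -> Arad is a loopy path:
a = {"state": "Arad", "parent": None}
s = {"state": "Sibiu", "parent": a}
a2 = {"state": "Arad", "parent": s}
print(is_cycle(a2))       # -> True: Arad repeats
print(is_cycle(s))        # -> False: no repeat yet
print(is_cycle(a2, k=1))  # -> False: checking one link misses the cycle
```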

Measuring performance

How might we measure performance? Four ways:

  • Completeness: Is the algorithm guaranteed to find a solution when one exists, and to correctly report failure when none does?

  • Cost optimality: Does it find the lowest-cost path?

  • Time complexity: How long does it take? This could be measured in wall-clock time, or simply in steps (states/nodes processed)

  • Space complexity: How much memory does it need?

Completeness for finite problems is easier: as long as we account for loops and keep track of paths, we can explore every reachable state.

Infinite state spaces are much harder: an algorithm can keep reaching new states forever while leaving other parts of the space entirely unexplored

This requires a systematic approach… consider the infinite \(2D\) grid again: a systematic search eventually visits every reachable point. However, if there is no solution, even a sound algorithm will search forever

Uninformed Search Strategies

(reminder: we don’t know how close to the goal we are)

Systematic strategy, works in infinite spaces

A FIFO queue keeps us in order and allows for early goal testing (when a node is generated), as opposed to the late goal testing (when a node is popped) of best-first search.

Let’s draw it!

This always finds a solution (should one exist) with a minimal number of actions. However, watch how the number of states grows…

\[ 1+b+b^2+b^3+...+b^d=O(b^d) \]

YIKES

For example, consider a problem with branching factor \(b=10\), processing 1M nodes/s, and 1 KB/node. Searching to depth \(d=10\) would take about 3 hours, but require about 10 TB of memory!

Memory requirements dominate BFS

But at \(d=14\), even assuming infinite memory, the search would take 3.5 years.
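The back-of-the-envelope numbers above follow directly from the geometric series \(1+b+b^2+\dots+b^d\), and can be checked with a few lines of Python (assumptions as stated: \(b=10\), \(10^6\) nodes/s, 1 KB/node):

```python
def bfs_cost(b, d, nodes_per_sec=1_000_000, kb_per_node=1):
    """Total nodes 1 + b + ... + b^d, with time and memory estimates."""
    nodes = sum(b**i for i in range(d + 1))
    seconds = nodes / nodes_per_sec
    kilobytes = nodes * kb_per_node
    return nodes, seconds, kilobytes

n, secs, kb = bfs_cost(10, 10)
print(f"d=10: {secs / 3600:.1f} hours, {kb / 1e9:.0f} TB")  # ~3 hours, ~11 TB

n, secs, kb = bfs_cost(10, 14)
print(f"d=14: {secs / (3600 * 24 * 365):.1f} years")        # ~3.5 years
```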

In general, exponential complexity problems can’t be solved by an uninformed search, unless the space is very small.

The complexity of Uniform Cost is in terms of \(C^*\) (cost of optimal solution) and \(\epsilon\) (lower bound on cost per action) with \(\epsilon > 0\)

The worst case time and space complexity is:

\[ O(b^{1+\lfloor C^* / \epsilon \rfloor}) \]

which can be greater than \(b^d\). Uniform-cost search can explore large trees of low-cost actions first, rather than taking a leap to a high-cost but possibly optimal action.

When all action costs are equal, this is just \(b^{d+1}\)
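The exponent \(1+\lfloor C^* / \epsilon \rfloor\) is easy to compute for concrete numbers (the values below are illustrative, not from the source):

```python
import math

def ucs_depth_bound(c_star, eps):
    """Worst-case effective depth for uniform-cost search: 1 + floor(C*/eps)."""
    return 1 + math.floor(c_star / eps)

# Optimal cost 100 with minimum action cost 1: a bound of 101 levels
print(ucs_depth_bound(100, 1))  # -> 101

# When all action costs equal eps, C* = eps * d, so the bound reduces to d + 1
d = 7
print(ucs_depth_bound(7 * 2.5, 2.5) == d + 1)  # -> True
```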