Search with Nondeterministic Actions

Remember back in Chapter 3, when we were talking about fully observable, deterministic, known environments?

We could find our path and then just… follow it, never needing to use our subsequent percepts.

If our environment is partially observable, then we can’t be sure what state we’re in

If our environment is nondeterministic, then we don’t know for certain which state an action will lead to

The set of states we believe we might be in is called our belief state

When our solution can no longer be a simple action sequence, we must use a conditional plan

Erratic Vacuum World

Describe to me the Vacuum World

  • Actions

  • Goal

  • Environment

Describe the action sequence to go from State 1 to a Goal State

Vacuum World

We could imagine some erratic vacuum world where the suck action:

  • When applied to a dirty square, cleans that square and sometimes the adjacent square as well

  • When applied to a clean square, sometimes deposits dirt on the square (lol)

Now we have to do some more work (sigh): instead of just having our transition model return a single state, we need a set of possible states:

\[ RESULTS(1,Suck)=\{5,7\} \]

How might this change our conditional plans, since a simple sequence will no longer suffice?

\[ [Suck, \mathbf{if}\ State = 5\ \mathbf{then}\ [Right, Suck]\ \mathbf{else}\ []\ ]. \]
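To make the nondeterministic transition model concrete, here is a minimal Python sketch (an illustration, not the book’s code) of the erratic \(Suck\) action. The `STATES` table and the \((location, dirt)\) encoding are assumptions that follow the usual eight-state numbering (state 1 = both squares dirty, agent on the left, and so on):

```python
# Hypothetical encoding of the 8 vacuum-world states as (location, left_dirty, right_dirty).
STATES = {
    1: ("L", True,  True),  2: ("R", True,  True),
    3: ("L", True,  False), 4: ("R", True,  False),
    5: ("L", False, True),  6: ("R", False, True),
    7: ("L", False, False), 8: ("R", False, False),
}
NUM = {v: k for k, v in STATES.items()}

def results_suck(state):
    """Return the SET of states the erratic Suck action might produce."""
    loc, dl, dr = STATES[state]
    here_dirty = dl if loc == "L" else dr
    outcomes = set()
    if here_dirty:
        # Cleans the current square...
        outcomes.add(NUM[(loc, False, dr)] if loc == "L" else NUM[(loc, dl, False)])
        # ...and sometimes the adjacent square as well.
        outcomes.add(NUM[(loc, False, False)])
    else:
        # On a clean square, Suck may do nothing or may deposit dirt.
        outcomes.add(state)
        outcomes.add(NUM[(loc, True, dr)] if loc == "L" else NUM[(loc, dl, True)])
    return outcomes

print(results_suck(1))  # {5, 7}, matching RESULTS(1, Suck) above
```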

Many problems you might encounter in “real life” must be dealt with in a similar way. This is also why you may choose to keep your eyes open when walking to class.

AND-OR Search Trees

When dealing with deterministic problems, we used search trees, with branching determined by the agent

With an OR node, the agent can decide on one action or another

However, sometimes we have to take the environment into account and formulate a plan based on contingencies

AND-OR Tree

So… when our solution requires contingencies, instead of returning an action sequence, we must instead return what?

AND-OR Search Pseudocode

  • Where do we return failure? Why?

  • What kind of search is this?

  • How would we have to adjust the heuristic function?
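The pseudocode itself isn’t reproduced in these notes, so here is a rough Python sketch of the recursive, depth-first AND-OR search. The `problem` interface (`initial`, `actions`, `results`, `is_goal`) and the nested-list plan representation are assumptions for illustration:

```python
FAILURE = None

def and_or_search(problem):
    """Depth-first AND-OR search; returns a conditional plan or FAILURE."""
    return or_search(problem, problem.initial, [])

def or_search(problem, state, path):
    # OR node: the agent gets to choose one of its actions.
    if problem.is_goal(state):
        return []                          # empty plan: already at a goal
    if state in path:
        return FAILURE                     # cycle: give up on this branch
    for action in problem.actions(state):
        plan = and_search(problem, problem.results(state, action), [state] + path)
        if plan is not FAILURE:
            return [action] + plan
    return FAILURE                         # no action leads to a solution

def and_search(problem, states, path):
    # AND node: the environment picks the outcome, so EVERY possible
    # outcome state must have a working subplan.
    plans = {}
    for s in states:
        plans[s] = or_search(problem, s, path)
        if plans[s] is FAILURE:
            return FAILURE
    return [plans]                         # branch: "if in state s, follow plans[s]"
```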

Slippery Vacuum World

Consider the normal vacuum world, except movements sometimes fail. For example:

\[ RESULTS(1,Right)=\{1,2\} \]

One problem… there’s no acyclic solution to this problem. Our AND-OR-SEARCH would return only failure. So… we have to find a cyclic solution:

\[ [Suck, \mathbf{while}\ State=5\ \mathbf{do}\ Right,Suck]. \]
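As a rough illustration of executing this cyclic plan, here is a toy Python run (the 50% slip chance and the transition tables are arbitrary assumptions, using the same eight-state numbering as before):

```python
import random

# Deterministic Suck and the *intended* effect of Right, as lookup tables.
SUCK  = {1: 5, 2: 4, 3: 7, 4: 4, 5: 5, 6: 8, 7: 7, 8: 8}
RIGHT = {1: 2, 2: 2, 3: 4, 4: 4, 5: 6, 6: 6, 7: 8, 8: 8}

def slippery_right(state):
    # Movement sometimes fails, leaving the state unchanged.
    return RIGHT[state] if random.random() < 0.5 else state

state = SUCK[1]            # Suck in state 1 leaves us in state 5
while state == 5:          # keep retrying Right until it finally works
    state = slippery_right(state)
state = SUCK[state]        # now in state 6, so Suck reaches a goal state
print(state)               # 8: both squares clean
```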

Because we’re allowing cyclic solutions, we should also consider when retrying makes sense. Under what condition will retrying eventually work, and when will it not?

Slippery Vacuum Tree

We could understand the difference by reconsidering our environment:

(Fully observable, nondeterministic) vs. (partially observable, deterministic). Elaborated in Ch. 12

Search in Partially Observable Environments

What about when our percepts aren’t enough?

Our actions will have to reduce our uncertainty!

When our percepts provide no information at all then we call it a sensorless problem (or conformant problem)

Sometimes pursuing a solution with a sensorless plan is better than relying on sensors! Example? Medicine?

Consider Vacuum world again… What if we know our geography but not our location nor the dirt locations?

\[ BELIEF: \{1,2,3,4,5,6,7,8\} \]

But if we move \(RIGHT\) once, then we’re left with \(\{2,4,6,8\}\)!

We gained information without sensing anything!

After \([Right, Suck]\), we can only be in \(\{4,8\}\).

After \([Right, Suck, Left, Suck]\), we can only be in state… what?

This is called Coercion.

If we search in belief-state space, the problem is fully observable: we always know our own belief state!

And our solution is a sequence of actions (not a conditional plan), because the percept after each action is completely predictable (always empty).

This is always true even if the environment is nondeterministic.

Let’s try to transform our actual (physical) problem into a belief-state problem (a small code sketch follows the list below):

  • States:

    • Our belief-state space contains every possible subset of physical states. If our problem \(P\) has \(N\) states, we have \(2^N\) belief states (though some may be unreachable)
  • Initial State:

    • Our initial belief state is usually the set of all states in \(P\), though sometimes we may be able to narrow it down
  • Actions:

    • Let’s assume that \(b=\{s_1,s_2\}\) but \(Actions_P(s_1)\neq Actions_P(s_2)\).

    • If we assume that illegal actions have no effect then it’s safe to take a union: \(Actions(b)=\bigcup_{s\in b}Actions_P(s)\)

    • But if an illegal action could be unsafe, then we must take the intersection: \(Actions(b)=\bigcap_{s\in b}Actions_P(s)\)

  • Transition Model:

    • For deterministic actions, we get one resultant state per possible state: \(b'=Result(b,a)=\{s':s'=Result_P(s,a)\ and\ s \in b \}\)

    • However, if we’re nondeterministic, then we must also consider possible results: \(b'=Result(b,a)=\{s':s' \in Results_P(s,a)\ and\ s \in b \} = \bigcup_{s\in b} Results_P(s,a)\)

    • Is \(b'\) bigger or smaller than \(b\) with deterministic actions? What about nondeterministic?

  • Goal Test:

    • The agent possibly achieves the goal if any state in \(b\) satisfies the goal test. It necessarily achieves the goal if every state in \(b\) satisfies the goal test. Which of these is preferable?
  • Action cost:

    • Tricky… as the same action could have different costs depending on the state. For now we assume the same cost. (Maybe homework, we’ll see).
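Here is the small sketch promised above: a minimal Python version of the sensorless (belief-state) transformation, assuming a deterministic physical vacuum world, the same ad-hoc state numbering as earlier, and hypothetical `LEFT`/`RIGHT`/`SUCK` transition tables:

```python
# Deterministic physical transition tables (illustrative assumptions).
LEFT  = {1: 1, 2: 1, 3: 3, 4: 3, 5: 5, 6: 5, 7: 7, 8: 7}
RIGHT = {1: 2, 2: 2, 3: 4, 4: 4, 5: 6, 6: 6, 7: 8, 8: 8}
SUCK  = {1: 5, 2: 4, 3: 7, 4: 4, 5: 5, 6: 8, 7: 7, 8: 8}
MOVES = {"Left": LEFT, "Right": RIGHT, "Suck": SUCK}
GOAL  = {7, 8}                       # both squares clean

def belief_result(belief, action):
    # Deterministic actions: b' = { Result(s, a) : s in b }.
    return frozenset(MOVES[action][s] for s in belief)

def belief_is_goal(belief):
    # The plan *necessarily* achieves the goal only if every state in b does.
    return all(s in GOAL for s in belief)

b = frozenset(range(1, 9))           # initial belief: we could be anywhere
for a in ["Right", "Suck", "Left", "Suck"]:
    b = belief_result(b, a)
print(sorted(b), belief_is_goal(b))  # [7] True: coerced into a single known state
```

For nondeterministic physical actions, `belief_result` would instead take the union of \(Results_P(s,a)\) over all \(s\) in the belief, exactly as in the formula above.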

We can see our reachable belief states here:

Sensorless, Deterministic, Vacuum World

  • Though there are \(2^8=256\) possible belief states, only \(12\) are reachable.

  • Usually, in graph search, we check newly generated states to see if we have seen them before… here too!

    • \([Suck,Left,Suck]\) has the same result as \([Right, Left, Suck]\)

    • We can prune supersets (see \([Left]\)): any solution for a superset is also a solution for each of its subsets, so if a subset has already been generated, the harder superset adds nothing (smaller sets are easier)

    • Also, if a superset has been found to be solvable, any subset is also solvable (a solution that works when we are very confused also works when we are less confused)

  • But alas, these belief-state spaces are very large. How large? What is the maximum size of a single belief state?

Another approach is an incremental belief-state search: find a solution that works for the first physical state in the belief, then check whether it also works for the others, backtracking to a different solution when it doesn’t.

This approach can often detect failure quickly: when a belief state is unsolvable, usually a small subset of it (the first few states examined) is already unsolvable.

Searching in Partially Observable Environments

Some problems are unsolvable without sensing (8-puzzle).

That said, how much sensing do I need to make the 8-puzzle solvable?

Local Sensing Vacuum Worlds

Consider a vacuum world where we can get accurate information about the square the agent occupies:

We get some information about the state, though there are several states that can produce \([L,Dirty]\)

We can reason about the transition model in three steps:

  • Prediction: computing the belief state resulting from the action

    • \(\hat{b}=Predict(b,a)\)
  • Possible percepts: computing the percepts that might be observed (\(o\) for observation)

    • \(PossiblePercepts(\hat{b})=\{o:o=Percept(s)\ and\ s \in \hat{b}\}\)
  • Update: computing the belief state that results from each possible percept:

    • \(b_o=Update(\hat{b},o)=\{s:o=Percept(s)\ and\ s\in \hat{b}\}\)

Altogether, we get the possible belief states resulting from an action and subsequent updates:

\[ Results(b,a)=\{b_o:b_o = Update(Predict(b,a),o)\ and\ o\in PossiblePercepts(Predict(b,a))\} \]
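As a rough Python sketch of these three stages (reusing the ad-hoc state numbering from earlier; the `PERCEPT` table and transition tables are illustrative assumptions, with deterministic physical actions):

```python
# Local sensing: the percept is (agent location, Dirty/Clean of that square).
PERCEPT = {1: ("L", "Dirty"), 2: ("R", "Dirty"), 3: ("L", "Dirty"),
           4: ("R", "Clean"), 5: ("L", "Clean"), 6: ("R", "Dirty"),
           7: ("L", "Clean"), 8: ("R", "Clean")}
RIGHT = {1: 2, 2: 2, 3: 4, 4: 4, 5: 6, 6: 6, 7: 8, 8: 8}
SUCK  = {1: 5, 2: 4, 3: 7, 4: 4, 5: 5, 6: 8, 7: 7, 8: 8}
MOVES = {"Right": RIGHT, "Suck": SUCK}

def predict(belief, action):
    return frozenset(MOVES[action][s] for s in belief)

def possible_percepts(b_hat):
    return {PERCEPT[s] for s in b_hat}

def update(b_hat, o):
    return frozenset(s for s in b_hat if PERCEPT[s] == o)

def results(belief, action):
    b_hat = predict(belief, action)
    return {o: update(b_hat, o) for o in possible_percepts(b_hat)}

b = frozenset({1, 3})                    # the percept (L, Dirty) left us here
b = results(b, "Suck")[("L", "Clean")]   # only one possible percept: belief {5, 7}
print(results(b, "Right"))               # Right splits the belief into {6} and {8}
```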

Solving Partially Observable Problems

Once we get our search tree… how do we solve it?

First Level of Search Tree

Look familiar?

Our solution is a conditional plan!

\[ [Suck, Right, \mathbf{if}\ Bstate=\{6\}\ \mathbf{then}\ Suck\ \mathbf{else}\ []] \]

Notice how this tests our belief state not the actual state. Why?

We could also use strategies from sensorless problems, like checking for previously generated belief states

We can develop incremental solutions as well, in a similar way.

An agent for partially observable environments

What are the minimum tasks of an agent?

  • Formulate the problem

  • Use a search algorithm to solve it

  • Execute the solution

What are the differences between this and an agent for a partially observable environment?

  • Our solution will be a conditional plan, rather than a sequence

  • The agent will have to remember its belief state, receive a percept, then update the belief state (actually easier than during search, since the actual percept is given rather than predicted); see the sketch after this list

    • \(b' = Update(Predict(b,a),o)\)
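A rough skeleton of that execution loop, with `choose_action`, `execute`, and `sense` as hypothetical stand-ins for the plan and the environment, and `predict`/`update` as in the sketch above:

```python
def belief_maintaining_agent(belief, choose_action, execute, sense):
    """Maintain b' = Update(Predict(b, a), o) while executing a plan."""
    while True:
        a = choose_action(belief)        # e.g. follow a conditional plan
        if a is None:                    # the plan says we're done
            return belief
        execute(a)                       # act in the real world first...
        o = sense()                      # ...then receive the actual percept
        belief = update(predict(belief, a), o)
```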

We might consider a kindergarten vacuum world, where the agent can only perceive its current square, but any square could become dirty if not being currently cleaned.

Two prediction-update cycles in the KVW with local sensing

Maintaining a belief state is one of the most important functions of an agent in this type of environment. Notice that the previous update equation uses only a single percept at a time. Speed is of the essence, as the environment may change while we “calculate”.

Example (Discrete, deterministic sensors, nondeterministic actions)

We’re going to look at a state-estimation (belief-state maintenance) task called localization: that is, finding out where we are, given a map and a sequence of actions and percepts.

Let’s put a robot in a maze-like environment, equipped with four sonar sensors that locate obstacles or walls in any of the immediate cardinal directions, presented as a bit-vector (NESW).

Possible positions of robot

Unfortunately, the robot’s navigation system is broken, and whenever it executes a \(Right\) action, it moves randomly to one of the adjacent squares.

So:

  • If we switch on the robot, and don’t know where it is, what is the belief state?

  • Okay, now our sensors report the bit string 1011, and the robot updates its belief: \(b_o=Update(b,1011)\). What is the belief now?

  • What if the agent attempts to execute \(Right\)? Where could it be? (\(b_a=Predict(b_o,Right)\))

  • Now, if our sensors detect 1010, where could it be? After \(Update(b_a,1010)\)?

The location in the lower part of the image is the only possible result of:

\[ Update(Predict(Update(b,1011),Right),1010) \]

Given nondeterministic actions, \(Predict\) grows the belief state and \(Update\) shrinks it.
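As a toy localization sketch (the maze layout, the NESW bit-vector encoding, and the resulting belief sets below are illustrative assumptions, not the figure from these notes):

```python
MAZE = [
    "########",
    "#.#....#",
    "#.####.#",
    "#......#",
    "########",
]
FREE = {(r, c) for r, row in enumerate(MAZE)
               for c, ch in enumerate(row) if ch == "."}
DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]          # N, E, S, W

def percept(cell):
    # Sonar reading: '1' if there is a wall in that direction, order NESW.
    r, c = cell
    return "".join("0" if (r + dr, c + dc) in FREE else "1" for dr, dc in DIRS)

def update(b, o):
    # Keep only the cells consistent with the observed sonar reading.
    return {cell for cell in b if percept(cell) == o}

def predict(b, action):
    # Broken navigation: any move may land us in ANY adjacent free cell.
    return {(r + dr, c + dc) for (r, c) in b for dr, dc in DIRS
            if (r + dr, c + dc) in FREE}

b = update(FREE, "1010")                 # first sonar reading: 6 candidate cells
b = update(predict(b, "Right"), "1011")  # Predict grows the set, Update shrinks it
print(b)                                 # {(1, 3)}: localized to a single cell
```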

Sometimes… percepts don’t help localization much? Like when? Hallways?

What about when sensors are faulty?

Well… how often are they faulty?

Online Search Agents and Unknown Environments

So far, we’ve used offline search algorithms that compute a complete solution before ever acting.

Online searching involves planning and acting together

This is useful in a dynamic (or semi-dynamic) environment, where sitting and thinking would be bad, or in a nondeterministic environment, where computation might be wasted on contingencies that could arise but probably won’t.

However, there’s a cost to acting without a complete plan. Sokoban?

Mapping an unknown environment is a classic example. D&D? The Minotaur’s labyrinth?

Online Search Problems

Let’s assume the environment is deterministic and fully observable, but the agent only knows the following:

  • \(Actions(s)\), the legal actions in state \(s\)

  • \(c(s,a,s')\), the cost of applying \(a\) in state \(s\) to arrive at \(s'\) (known to the agent only once it learns that \(s'\) is the outcome)

  • \(IsGoal(s)\), the goal test

Also, \(Result(s,a)\) cannot be known unless the agent is in \(s\) and does \(a\). (sometimes the uncertainty can be reduced, depending on application)

Sometimes, the agent might have a heuristic function, \(h(s)\), that estimates distance to a goal.

Simple Maze, unknown environment

Usually the agent is trying to reach a goal with the lowest cost (other goals are possible).

We measure performance using a competitive ratio: comparing the agent’s actual path cost to the cost it would have incurred had it known the environment in advance.

As mentioned before, online explorers are susceptible to dead ends, from which there is no path to a goal state.

In general, there is no algorithm that can avoid dead ends in all state spaces…

Adversary Argument Examples

We can construct an environment that gives these agents great difficulty (dead ends or arbitrarily inefficient paths)

Dead ends can be a real hazard (staircases, ramps, cliffs, one-way streets, etc), these paths could be irreversible

We’ll talk about an algorithm that only really works in safely-explorable spaces (examples?).

Online Search Agents

If we’re interleaving searching and acting, then we need search strategies that are appropriate.

Offline algorithms explore a model of the search space, not the actual space. What search strategy that we’ve seen works well for physically searching?

Online Search (what kind?)

A random walk can take the place of random restarts (an online agent can’t teleport back to the start), and it will eventually reach the goal, provided the space is finite and safely explorable.

Exponential Random Walk

Though this example is contrived, there are many real-world spaces in which a random walk would take exponentially long to find the goal.

Turns out, adding memory, rather than randomness, is more helpful!

1D Learning Real-Time A* Search

This (LRTA*) is certain to find the goal in a finite, safely explorable environment; a rough sketch follows.
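Here is a rough Python sketch of an LRTA*-style agent, following the standard textbook formulation but assuming unit step costs; the class interface, names, and the use of a callable agent are assumptions for illustration:

```python
class LRTAStarAgent:
    """Learning Real-Time A*: act immediately, remember outcomes, and keep
    improving cost-to-go estimates H(s) (initialized from the heuristic h)."""

    def __init__(self, actions, h):
        self.actions = actions   # actions(s) -> list of legal actions in s
        self.h = h               # heuristic estimate of cost to reach a goal
        self.H = {}              # learned cost-to-go estimates
        self.result = {}         # learned transition model: (s, a) -> s'
        self.s = None            # previous state
        self.a = None            # previous action

    def cost(self, s, a, s_next):
        # Optimism under uncertainty: an untried action is assumed to cost h(s).
        if s_next is None:
            return self.h(s)
        return 1 + self.H.get(s_next, self.h(s_next))   # unit step cost assumed

    def __call__(self, s_prime, is_goal):
        """Given the current state (the percept), return the next action."""
        if is_goal(s_prime):
            return None                                  # stop
        self.H.setdefault(s_prime, self.h(s_prime))
        if self.s is not None:
            # Record what the last action actually did, then update H for it.
            self.result[(self.s, self.a)] = s_prime
            self.H[self.s] = min(
                self.cost(self.s, b, self.result.get((self.s, b)))
                for b in self.actions(self.s))
        # Choose the apparently best action from the current state.
        self.a = min(self.actions(s_prime),
                     key=lambda b: self.cost(s_prime, b,
                                             self.result.get((s_prime, b))))
        self.s = s_prime
        return self.a
```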