a controller is a that maintains its own state. constituents X: a set of nodes (hidden, internal states) \Psi(a|x): probability of taking an action \eta(x’|x,a,o) : transition probability between hidden states requirements Controllers are nice because we: don’t have to maintain a belief over time: we need an initial belief, and then we can create beliefs as we’d like without much worry controllers can be made shorter than conditional plans additional information finite state controller A finite state controller has a finite amount of hidden internal state. Consider the crying baby problem. We will declare two internal state:
Given our observations and our internal states, we can declare transitions and an action probability \Psi: We essentially declare a policy vis a vi your observations. It can be a sequence, for instance, if we want to declare a policy whereby if you cry twice then you feed, you can declare: finite state controller evaluation \begin{equation} U(x, s) = \sum_{a}^{} \Psi(a|x) \left[R(s,a) + \gamma \left(\sum_{s’}^{} T(s’|s,a) \sum_{o}^{} O(o|a, s’) \sum_{x’}^{} \eta(x’|x,a,o) U(x’, s’)\right) \right] \end{equation} which is a conditional plan evaluation but we know even litle and, to construct alpha vectors:
we just make one alpha vector per node. So the entire plan is represented as usual by \Gamma a set of alpha vectors. And yes you can alpha vector pruning.
node we want to start at:
solving for \Psi and \eta policy iteration: incrementally add nodes and evaluate it nonlinear programming: this can be a nonlinear optimization problem controller gradient ascent