These are essentially MDP methods with a few modifications. Wherever we need a state, we use the belief instead: we sample the next state given an action from the belief (a random next step), call our generative model to get a new observation, and call update(b,a,o) with our filter to propagate the belief forward. If we need an action-value, we use the one-step POMDP lookahead:

\begin{equation} Q(b,a) = R(b,a)+\gamma \left(\sum_{o}^{}P(o|b,a) U^{\Gamma}(update(b,a,o))\right) \end{equation}

where,

\begin{equation} R(b,a) = \sum_{s}^{} R(s,a)b(s) \end{equation}
\begin{align} P(o|b,a) &= \sum_{s}^{} P(o|s,a) b(s) \\ &= \sum_{s}^{} \sum_{s'}^{} T(s'|s,a) O(o|s',a) b(s) \end{align}

and where, if needed (most algorithms estimate this rather than compute it exactly):

\begin{equation} U^{\Gamma}(b) = \max_{\alpha \in \Gamma} \alpha^{\top} b \end{equation}
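The lookahead above can be sketched numerically. Everything here is an assumed toy example, not from the text: the two-state transition array T[a,s,s'], observation array O[a,s',o], reward array R[s,a], and the alpha-vector set Gamma are all made up for illustration.

```python
import numpy as np

# Hypothetical toy POMDP: 2 states, 2 actions, 2 observations (assumed arrays).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.4, 0.6]]])   # T[a, s, s']
O = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.6, 0.4], [0.1, 0.9]]])   # O[a, s', o]
R = np.array([[1.0, 0.0], [0.0, 1.0]])     # R[s, a]
gamma = 0.9
Gamma = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # assumed alpha vectors

def update(b, a, o):
    """Bayes filter: b'(s') ∝ O(o|s',a) Σ_s T(s'|s,a) b(s)."""
    bp = O[a, :, o] * (T[a].T @ b)
    return bp / bp.sum()

def U_gamma(b):
    """U^Γ(b) = max_{α∈Γ} α·b."""
    return max(alpha @ b for alpha in Gamma)

def Q(b, a):
    """One-step lookahead: R(b,a) + γ Σ_o P(o|b,a) U^Γ(update(b,a,o))."""
    q = b @ R[:, a]                          # R(b,a) = Σ_s R(s,a) b(s)
    for o in range(O.shape[2]):
        Po = b @ T[a] @ O[a, :, o]           # P(o|b,a) = Σ_s Σ_s' T·O·b
        if Po > 0:
            q += gamma * Po * U_gamma(update(b, a, o))
    return q

b = np.array([0.5, 0.5])
best_a = max(range(2), key=lambda a: Q(b, a))
```

The action with the largest lookahead value is then executed, the belief is updated with the real observation, and the process repeats.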

We also revise our generative model: each step takes a belief and an action; we sample a next state from the belief, propagate the belief forward, and use a traditional generative model to get the reward and next state (which we don't use).

Receding Horizon: plan to a depth d, select an action, replan.

Rollout with Lookahead: simple to implement, but no guarantees of optimality or even boundedness.

Forward Search: quite expensive; exponential in the size of the horizon.

Monte Carlo Tree Search: as in the MDP case, but the counts are stored not in terms of states (which we don't know) but in terms of histories, i.e. sequences of actions and observations: h = a_1 o_1 a_2 o_2 \ldots. The counter then takes N(h,a) as input. It heads toward optimality, and it requires a generative model to sample trajectories.

Branch and Bound: Forward Search with pruning; use the POMDP approximation methods to estimate the upper and lower bounds of your utility.

Sparse Sampling: Forward Search, but each action-value is estimated from a finite sample of next observations and rewards, averaging their future utility. That is, the action-value at depth d is obtained by:

\begin{equation} Q_{d}(b,a) = \frac{1}{m} \sum_{i=1}^{m} \left(r_{a}^{(i)}+\gamma U_{d-1}(update(b,a,o_{a}^{(i)}))\right) \end{equation}
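The sparse-sampling recursion can be sketched as follows. The toy POMDP arrays, the sample count m, the depth, and the zero-valued leaf estimate are all assumptions for illustration, not from the text (a rollout policy would give a better leaf estimate):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy POMDP (assumed arrays, same shapes as a 2-state example).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.4, 0.6]]])   # T[a, s, s']
O = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.6, 0.4], [0.1, 0.9]]])   # O[a, s', o]
R = np.array([[1.0, 0.0], [0.0, 1.0]])     # R[s, a]
gamma, m, actions = 0.9, 5, [0, 1]

def update(b, a, o):
    """Bayes filter: b'(s') ∝ O(o|s',a) Σ_s T(s'|s,a) b(s)."""
    bp = O[a, :, o] * (T[a].T @ b)
    return bp / bp.sum()

def sample_step(b, a):
    """Belief-based generative model: s ~ b, s' ~ T(·|s,a), o ~ O(·|s',a)."""
    s = rng.choice(len(b), p=b)
    sp = rng.choice(T.shape[2], p=T[a, s])
    o = rng.choice(O.shape[2], p=O[a, sp])
    return R[s, a], o

def U_d(b, d):
    if d == 0:
        return 0.0  # crude leaf estimate (assumption)
    return max(Q_d(b, a, d) for a in actions)

def Q_d(b, a, d):
    """Average m sampled values of r + γ U_{d-1}(update(b,a,o))."""
    total = 0.0
    for _ in range(m):
        r, o = sample_step(b, a)
        total += r + gamma * U_d(update(b, a, o), d - 1)
    return total / m
```

Note the cost: each level branches over |A| actions and m samples, so the tree has on the order of (|A| m)^d nodes, which is still exponential in depth but independent of the size of the observation space.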
