Recall, from conditional plan evaluation, we had that:

U^{\pi}(b) = \sum_{s} b(s)\, U^{\pi}(s)

let's write it as:

U^{\pi}(b) = \alpha_{\pi}^{\top} b

where \alpha_{\pi} = [U^{\pi}(s_{1}), \dots, U^{\pi}(s_{n})] stacks U^{\pi}(s), the conditional plan evaluation starting at each of the initial states.
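As a quick sketch of this dot-product form (a minimal example, assuming beliefs and alpha vectors are plain NumPy arrays; the name `belief_utility` is made up for illustration):

```python
import numpy as np

def belief_utility(alpha_pi: np.ndarray, b: np.ndarray) -> float:
    """U^pi(b) = alpha_pi^T b: linear in the belief b."""
    return float(alpha_pi @ b)

# Two-state example: alpha_pi stacks U^pi(s1) and U^pi(s2) for one conditional plan.
alpha_pi = np.array([1.0, -0.5])
b = np.array([0.3, 0.7])            # belief: P(s1) = 0.3, P(s2) = 0.7
print(belief_utility(alpha_pi, b))  # 0.3*1.0 + 0.7*(-0.5) = -0.05
```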
You will notice, then, that the utility is linear in b for each policy's \alpha_{\pi}: at every belief b, there is a policy which has the highest U(b) at that b, given the alpha vector formulation.

top action

You can represent a policy out of alpha vectors by taking the top (root) action of the conditional plan whose alpha vector is on top at the current belief.

optimal value function for POMDP with alpha vector

Recall:
NOTE! This function (look at the chart above mapping b to u) is piecewise linear convex: taking the "best" (highest) line at every belief is a pointwise max of linear functions, which is convex. So, for a policy instantiated by a set of alpha vectors \Gamma, we have:

U^{\Gamma}(b) = \max_{\alpha \in \Gamma} \alpha^{\top} b
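A minimal sketch of evaluating U^{\Gamma} and reading off the top action, assuming each alpha vector is stored alongside the root action of its conditional plan (the action labels here are purely illustrative):

```python
import numpy as np

def value_and_action(gamma: np.ndarray, root_actions: list, b: np.ndarray):
    """U^Gamma(b) = max_{alpha in Gamma} alpha^T b; the policy's action at b is
    the root action of the conditional plan whose alpha vector is on top."""
    utilities = gamma @ b            # one line's height per alpha vector
    best = int(np.argmax(utilities))
    return float(utilities[best]), root_actions[best]

gamma = np.array([[1.0, -0.5],      # alpha_1
                  [0.0,  0.6]])     # alpha_2
b = np.array([0.3, 0.7])
print(value_and_action(gamma, ["a1", "a2"], b))  # (0.42, 'a2')
```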
To actually extract a policy out of this set of vectors \Gamma, we turn to one-step lookahead.

one-step lookahead in POMDP

Say you want to extract a policy out of a bunch of alpha vectors \alpha \in \Gamma. Then:

\pi(b) = \arg\max_{a} \left[ R(b,a) + \gamma \sum_{o} P(o \mid b,a)\, U^{\Gamma}(\mathrm{Update}(b,a,o)) \right]
where:

R(b,a) = \sum_{s} R(s,a)\, b(s)
and

P(o \mid b,a) = \sum_{s} b(s) \sum_{s'} T(s' \mid s,a)\, O(o \mid a, s')

with \mathrm{Update}(b,a,o) the Bayesian belief update and U^{\Gamma} the max over \Gamma as above.
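Putting the lookahead together, a sketch under an assumed tensor layout (none of this layout is fixed by the notes): T[a][s, s'] = T(s'|s,a), O[a][s', o] = O(o|a,s'), and R[s, a] = R(s,a), all NumPy arrays:

```python
import numpy as np

def update(b, a, o, T, O):
    """Bayesian belief update: b'(s') is proportional to O(o|a,s') * sum_s T(s'|s,a) b(s)."""
    bp = O[a][:, o] * (T[a].T @ b)
    return bp / bp.sum()

def lookahead_policy(b, gamma, R, T, O, disc=0.95):
    """pi(b) = argmax_a [ R(b,a) + disc * sum_o P(o|b,a) U^Gamma(Update(b,a,o)) ]."""
    n_actions, n_obs = R.shape[1], O.shape[2]
    best_a, best_u = None, -np.inf
    for a in range(n_actions):
        u = b @ R[:, a]                       # R(b,a) = sum_s R(s,a) b(s)
        for o in range(n_obs):
            p_o = b @ (T[a] @ O[a][:, o])     # P(o|b,a)
            if p_o > 0:
                bp = update(b, a, o, T, O)
                u += disc * p_o * np.max(gamma @ bp)  # U^Gamma at the updated belief
        if u > best_u:
            best_a, best_u = a, u
    return best_a
```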
alpha vector pruning

Say we had a set of alpha vectors \Gamma in which \alpha_{3} isn't all that useful: it is never the highest line at any belief. So we ask: "Is \alpha dominated by some \alpha_{i} everywhere?" We formulate this question in terms of a linear program:
\max_{b,\, \delta} \quad \delta

where \delta is the gap between \alpha and the utility of the best competing alpha vector at belief b, subject to:

\alpha^{\top} b \geq \alpha_{i}^{\top} b + \delta \quad \text{for all } \alpha_{i} \in \Gamma

b \geq 0, \qquad \mathbf{1}^{\top} b = 1
If \delta < 0, then we can prune \alpha because it has been dominated: at every belief, some other alpha vector in the set sits on top.
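A sketch of this LP with scipy.optimize.linprog (the `dominated` helper is made up for illustration; the variables are x = [b, \delta], and since linprog minimizes, we minimize -\delta):

```python
import numpy as np
from scipy.optimize import linprog

def dominated(alpha: np.ndarray, others: np.ndarray) -> bool:
    """Maximize delta s.t. alpha^T b >= alpha_i^T b + delta for every alpha_i,
    with b a valid belief. Returns True when max delta < 0, i.e. prune alpha."""
    n = alpha.size
    c = np.append(np.zeros(n), -1.0)                   # minimize -delta
    # Each constraint rewritten as (alpha_i - alpha)^T b + delta <= 0:
    A_ub = np.hstack([others - alpha, np.ones((len(others), 1))])
    b_ub = np.zeros(len(others))
    A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)   # belief sums to 1
    bounds = [(0, None)] * n + [(None, None)]          # b >= 0, delta free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return -res.fun < 0                                # max delta = -res.fun

# A vector lying below the others at every belief gets pruned:
others = np.array([[1.0, -0.5], [0.0, 0.6]])
alpha3 = np.array([-0.2, -0.2])
print(dominated(alpha3, others))  # True
```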