LRTDP

Real-Time Dynamic Programming RTDP is a asynchronous value iteration scheme. Each RTDP trial is a result of:

\begin{equation} V(s) = \min_{ia \in A(s)} c(a,s) + \sum_{s’ \in S}^{} P_{a}(s’|s)V(s) \end{equation}

the algorithm halts when the residuals are sufficiently small.
Labeled RTDP We want to label converged states so we don’t need to keep investigate it:
a state is solved if:
state has less then \epsilon all reachable states given s’ from this state has residual lower than \epsilon Labelled RTDP We stochastically simulate one step forward, and until a state we haven’t marked as “solved” is met, then we simulate forward and value iterate