How do we solve a Markov Decision Process with a continuous state space? Let there be a value function U_{\theta}(s), parameterized by \theta.
The value-function policy of this utility is the greedy policy \pi(s) = \arg\max_{a} \left[ R(s,a) + \gamma \sum_{s'} T(s'|s,a)\, U_{\theta}(s') \right].
We now create a finite sampling of our state space, which may be infinitely large (for instance, continuous):
where S is a set of discrete states \{s_1, \dots, s_{m}\}. Now, what next? Generally, fitted value iteration (see the sketch after this section). Initialize \theta, then loop until convergence:

- For all s_{i} \in S, let u_{i} = \max_{a} \left[ R(s_{i},a) + \gamma \sum_{s'} T(s'|s_{i},a)\, U_{\theta}(s') \right], the utility at those discrete state samples s_{i}.
- Then, fit \theta so that U_{\theta}(s_{i}) is close to u_{i}.

To get T: take a finite sampling of next states, or fit a function to it. BUT: convergence is not guaranteed.

There are two main approaches to the approximation:

Global approximation:

- linreg: a best-fit line of state value vs. utility value
- polynomial fit: a best-fit polynomial, whereby U_{\theta}(s) = \theta^{T}\beta(s), where each \beta_{j}(s) = s^{j-1} (see the regression sketch below)
- a frigin neural network: train a model with parameters \theta which produces the utility calculations for you, M_{\theta}(s) = U_{\theta}(s)

Local approximation:

- make a sampling in your continuous state space to discretize it
- do any utility-function thing you'd like (policy evaluation or value iteration) to get some set of \theta_{i}, which is the utility for being in each sampled discrete state s_{i}
- whenever you need to calculate U(s) of a particular state, estimate it from nearby samples (see the interpolation sketch below):
  - linearly interpolate
  - k-nearest-neighbor
  - kernel smoothing
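Here is a minimal sketch of the fitted value iteration loop above. Everything passed in is an assumption for illustration: `reward`, `sample_next_states`, `fit`, and `predict` are hypothetical callables standing in for whatever reward model, transition sampler, and regressor you actually have.

```python
import numpy as np

def fitted_value_iteration(states, actions, reward, sample_next_states,
                           fit, predict, gamma=0.95, n_iters=50):
    """Sketch only; all argument names are illustrative, not a fixed API.

    states:             sampled states s_1..s_m, array of shape (m, d)
    actions:            list of actions
    reward:             reward(s, a) -> float, i.e. R(s, a)
    sample_next_states: sample_next_states(s, a) -> array (k, d) of successors
                        drawn from T(s' | s, a)  (the finite sampling of T)
    fit:                fit(states, targets) -> theta   (the regressor)
    predict:            predict(theta, states) -> utilities, shape (m,)
    """
    theta = fit(states, np.zeros(len(states)))  # initialize theta
    for _ in range(n_iters):
        targets = np.empty(len(states))
        for i, s in enumerate(states):
            # Bellman backup: u_i = max_a [ R(s_i,a) + gamma * E[U_theta(s')] ],
            # with the expectation over s' approximated by a finite sample
            backups = []
            for a in actions:
                nxt = sample_next_states(s, a)
                backups.append(reward(s, a) + gamma * predict(theta, nxt).mean())
            targets[i] = max(backups)
        theta = fit(states, targets)  # fit theta so U_theta(s_i) ~= u_i
    return theta
```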
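For the global polynomial fit, here is one way to build the features \beta_{j}(s) = s^{j-1} and solve for \theta by least squares, assuming a 1-D state for simplicity. The function names are made up for this sketch; a `fit`/`predict` pair like this is exactly what plugs into the loop above.

```python
import numpy as np

def poly_features(states_1d, degree):
    # Columns are beta_j(s) = s^(j-1) for j = 1..degree+1 (1-D state assumed)
    return np.vander(states_1d, degree + 1, increasing=True)

def fit_poly(states_1d, utilities, degree=3):
    # Least squares: choose theta so U_theta(s_i) = theta^T beta(s_i) ~= u_i
    B = poly_features(states_1d, degree)
    theta, *_ = np.linalg.lstsq(B, utilities, rcond=None)
    return theta

def predict_poly(theta, states_1d):
    return poly_features(states_1d, len(theta) - 1) @ theta
```

Setting degree=1 recovers the linreg case (a best-fit line); higher degrees give the polynomial fit.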
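And for local approximation, a sketch of the last step: estimating U(s) at an arbitrary state from the stored utilities \theta_{i} at the sampled discrete states. Both estimators below are assumptions about how you might implement k-nearest-neighbor and kernel smoothing, not a fixed recipe.

```python
import numpy as np

def knn_utility(s, sampled_states, thetas, k=3):
    # U(s) ~= mean of the utilities theta_i at the k nearest sampled states s_i
    dists = np.linalg.norm(sampled_states - s, axis=1)
    nearest = np.argsort(dists)[:k]
    return thetas[nearest].mean()

def kernel_utility(s, sampled_states, thetas, bandwidth=1.0):
    # Kernel smoothing: every theta_i contributes, weighted by a Gaussian
    # kernel on its distance to s
    dists = np.linalg.norm(sampled_states - s, axis=1)
    w = np.exp(-(dists / bandwidth) ** 2)
    return (w @ thetas) / w.sum()
```

Linear interpolation works the same way in spirit, but weights only the vertices of the grid cell (or simplex) containing s.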