Approximate Value Function

How do we solve a Markov Decision Process with a continuous state space?
Let there be a value function parameterized on \theta:

\begin{equation} U_{\theta}(s) \end{equation}

Let us find the value-function policy of this utility:

\begin{equation} \pi(s) = \arg\max_{a} \left(R(s,a) + \gamma \sum_{s'} T(s'|s,a) U_{\theta}(s')\right) \end{equation}
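As a rough illustration of the policy above, here is a minimal Python sketch. The names `lookahead` and `greedy_action`, the toy 1-D problem, and the convention that `T(s, a)` returns a list of `(next_state, probability)` pairs are all assumptions made for this example, not part of the original notes.

```python
def lookahead(s, a, R, T, U, gamma):
    """One-step lookahead: R(s,a) + gamma * sum_{s'} T(s'|s,a) U(s')."""
    return R(s, a) + gamma * sum(p * U(s_next) for s_next, p in T(s, a))


def greedy_action(s, actions, R, T, U, gamma=0.9):
    """Return the action that maximizes the one-step lookahead at state s."""
    return max(actions, key=lambda a: lookahead(s, a, R, T, U, gamma))


# Hypothetical toy problem: the state is a number in [0, 1], actions step left/right.
theta = [0.0, 1.0]          # assumed parameters of a linear utility


def U(s):
    # U_theta(s) = theta_0 + theta_1 * s
    return theta[0] + theta[1] * s


def R(s, a):
    # No immediate reward in this toy example
    return 0.0


def T(s, a):
    # Deterministic transition: move by 0.1 in the direction of a, clipped to [0, 1]
    s_next = min(1.0, max(0.0, s + 0.1 * a))
    return [(s_next, 1.0)]


ACTIONS = [-1, 1]
print(greedy_action(0.5, ACTIONS, R, T, U))  # utility grows with s, so this picks 1
```

Because the policy only ever calls `U`, any parameterization of the utility can be dropped in without changing `greedy_action`.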

We now create a finite sampling of our state space, which may be infinitely large (for instance, continuous):

\begin{equation} S \subseteq \mathcal{S} \end{equation}

where S = \{s_1, \dots, s_{m}\} is a finite set of sampled states.
Now, what next? In general:

- Initialize \theta
- Loop until convergence:
  - For all s_{i} \in S, let u_{i} = \max_{a} \left(R(s_{i},a) + \gamma \sum_{s'} T(s'|s_{i},a) U_{\theta}(s')\right), the utility at each discrete state sample s_{i}
  - Then, fit a \theta so that U_{\theta}(s_{i}) is close to u_{i}

To get T: take a finite sampling of next states, or fit a function to it.
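The loop above can be sketched in Python as follows. This is only a sketch under stated assumptions: the utility is taken to be a straight line, U_{\theta}(s) = \theta_0 + \theta_1 s, fitted by ordinary least squares; the toy problem (1-D state in [0, 1], reward equal to the state, deterministic steps of 0.1) and the names `fit_line` and `fitted_value_iteration` are hypothetical; and `T(s, a)` is assumed to return a finite list of `(next_state, probability)` pairs.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = theta_0 + theta_1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return (y_mean - slope * x_mean, slope)


def fitted_value_iteration(samples, actions, R, T, gamma=0.9,
                           tol=1e-8, max_iters=1000):
    """Alternate Bellman backups at the samples with a refit of theta."""
    theta = (0.0, 0.0)  # initialize theta
    for _ in range(max_iters):
        def U(s, th=theta):
            return th[0] + th[1] * s

        # u_i = max_a ( R(s_i, a) + gamma * sum_{s'} T(s'|s_i, a) U_theta(s') )
        targets = [
            max(R(s, a) + gamma * sum(p * U(sn) for sn, p in T(s, a))
                for a in actions)
            for s in samples
        ]
        # Fit theta so that U_theta(s_i) is close to u_i
        new_theta = fit_line(samples, targets)
        change = max(abs(n - o) for n, o in zip(new_theta, theta))
        theta = new_theta
        if change < tol:
            return theta, True
    # Convergence is not guaranteed in general, so report failure explicitly
    return theta, False


# Hypothetical toy problem
def R(s, a):
    return s  # reward for being at a larger s


def T(s, a):
    # Deterministic step of 0.1 in the direction of a, clipped to [0, 1]
    return [(min(1.0, max(0.0, s + 0.1 * a)), 1.0)]


samples = [i / 10 for i in range(11)]   # finite sampling S of the state space
theta, converged = fitted_value_iteration(samples, [-1, 1], R, T)
print(theta, converged)
```

The function returns a flag rather than assuming success, since the backup-then-fit loop may oscillate or diverge for other problems and function classes.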
BUT: Convergence is not guaranteed.
There are two main specific approaches to achieve this:
- global approximation
  - linreg: a best-fit line of state value vs. utility value
  - polynomial fit: a best-fit polynomial, whereby U_{\theta}(s) = \theta^{T}\beta(s), where each \beta_{j}(s)=s^{j-1}
  - a friggin neural network: train a model with parameters \theta which produces the utility calculations for you, M_{\theta}(s) = U_{\theta}(s)
- local approximation
  - make a sampling in your continuous state space to discretize it
  - do any utility function thing you'd like (policy evaluation or value iteration) to get some set of \theta_{i}, which is the utility for being in each sampled discrete state s_{i}
  - whenever you need to calculate U(s) of a particular state…
    - linearly interpolate
    - k nearest neighbor
    - kernel smoothing
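To make the three local approximation lookups concrete, here is a minimal Python sketch for a 1-D state space. The function names, the Gaussian kernel, the bandwidth, and the sample values are all assumptions chosen for illustration; `thetas[i]` stands for the utility already computed at the sampled state `samples[i]`.

```python
import math


def linear_interp_value(s, samples, thetas):
    """Linearly interpolate between the two samples bracketing s.

    Assumes samples are sorted ascending; values outside the range are clamped.
    """
    if s <= samples[0]:
        return thetas[0]
    if s >= samples[-1]:
        return thetas[-1]
    for i in range(len(samples) - 1):
        s0, s1 = samples[i], samples[i + 1]
        if s0 <= s <= s1:
            w = (s - s0) / (s1 - s0)
            return (1 - w) * thetas[i] + w * thetas[i + 1]


def knn_value(s, samples, thetas, k=2):
    """Average the utilities of the k samples nearest to s."""
    nearest = sorted(range(len(samples)), key=lambda i: abs(samples[i] - s))[:k]
    return sum(thetas[i] for i in nearest) / k


def kernel_value(s, samples, thetas, bandwidth=0.1):
    """Weighted average of all sample utilities using a Gaussian kernel.

    If s is very far from every sample, all weights can underflow to zero;
    this sketch does not guard against that case.
    """
    weights = [math.exp(-((s - si) ** 2) / (2 * bandwidth ** 2)) for si in samples]
    total = sum(weights)
    return sum(w * t for w, t in zip(weights, thetas)) / total


# Hypothetical sampled states and their utilities
samples = [0.0, 0.5, 1.0]
thetas = [0.0, 1.0, 4.0]

print(linear_interp_value(0.25, samples, thetas))  # halfway between 0.0 and 1.0 -> 0.5
print(knn_value(0.2, samples, thetas, k=2))        # mean of utilities at 0.0 and 0.5 -> 0.5
print(kernel_value(0.2, samples, thetas))
```

All three return the stored utility (or something close to it) near a sample and differ mainly in how smoothly they blend between samples: nearest neighbor is piecewise constant, interpolation is piecewise linear, and kernel smoothing is smooth everywhere.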
