How do we solve a Markov Decision Process with a continuous state space? Let there be a value function U_{\theta}(s), parameterized by \theta.
The value-function policy of this utility is the greedy policy \pi(s) = \arg\max_{a} \left[ R(s,a) + \gamma \sum_{s'} T(s'|s,a)\, U_{\theta}(s') \right].
We now create a finite sampling of our state space, which may be infinitely large (for instance, continuous):
where S is a set of discrete states \{s_1, \dots, s_{m}\}. Now, what next? Generally, fitted value iteration (see the sketch after this section). Initialize \theta, then loop until convergence:

- For all s_{i} \in S, let u_{i} = \max_{a} \left[ R(s_{i},a) + \gamma \sum_{s'} T(s'|s_{i},a)\, U_{\theta}(s') \right], the utility at those discrete state samples s_{i}.
- Then, fit \theta so that U_{\theta}(s_{i}) is close to u_{i}.

To get T: take a finite sampling of next states, or fit a function to it. BUT: convergence is not guaranteed.

There are two main approaches to the approximation:

Global approximation:

- linreg: a best-fit line of state value vs. utility value
- polynomial fit: a best-fit polynomial, whereby U_{\theta}(s) = \theta^{T}\beta(s), where each \beta_{j}(s) = s^{j-1} (see the regression sketch below)
- a frigin neural network: train a model with parameters \theta which produces the utility calculations for you, M_{\theta}(s) = U_{\theta}(s)

Local approximation:

- make a sampling in your continuous state space to discretize it
- do any utility-function thing you'd like (policy evaluation or value iteration) to get some set of \theta_{i}, which is the utility for being in each sampled discrete state s_{i}
- whenever you need to calculate U(s) of a particular state, estimate it from nearby samples (see the interpolation sketch below):
  - linearly interpolate
  - k-nearest-neighbor
  - kernel smoothing
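Here is a minimal sketch of the fitted value iteration loop above. Everything passed in is an assumption for illustration: `reward`, `sample_next_states`, `fit`, and `predict` are hypothetical callables standing in for whatever reward model, transition sampler, and regressor you actually have.

```python
import numpy as np

def fitted_value_iteration(states, actions, reward, sample_next_states,
                           fit, predict, gamma=0.95, n_iters=50):
    """Sketch only; all argument names are illustrative, not a fixed API.

    states:             sampled states s_1..s_m, array of shape (m, d)
    actions:            list of actions
    reward:             reward(s, a) -> float, i.e. R(s, a)
    sample_next_states: sample_next_states(s, a) -> array (k, d) of successors
                        drawn from T(s' | s, a)  (the finite sampling of T)
    fit:                fit(states, targets) -> theta   (the regressor)
    predict:            predict(theta, states) -> utilities, shape (m,)
    """
    theta = fit(states, np.zeros(len(states)))  # initialize theta
    for _ in range(n_iters):
        targets = np.empty(len(states))
        for i, s in enumerate(states):
            # Bellman backup: u_i = max_a [ R(s_i,a) + gamma * E[U_theta(s')] ],
            # with the expectation over s' approximated by a finite sample
            backups = []
            for a in actions:
                nxt = sample_next_states(s, a)
                backups.append(reward(s, a) + gamma * predict(theta, nxt).mean())
            targets[i] = max(backups)
        theta = fit(states, targets)  # fit theta so U_theta(s_i) ~= u_i
    return theta
```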
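For the global polynomial fit, here is one way to build the features \beta_{j}(s) = s^{j-1} and solve for \theta by least squares, assuming a 1-D state for simplicity. The function names are made up for this sketch; a `fit`/`predict` pair like this is exactly what plugs into the loop above.

```python
import numpy as np

def poly_features(states_1d, degree):
    # Columns are beta_j(s) = s^(j-1) for j = 1..degree+1 (1-D state assumed)
    return np.vander(states_1d, degree + 1, increasing=True)

def fit_poly(states_1d, utilities, degree=3):
    # Least squares: choose theta so U_theta(s_i) = theta^T beta(s_i) ~= u_i
    B = poly_features(states_1d, degree)
    theta, *_ = np.linalg.lstsq(B, utilities, rcond=None)
    return theta

def predict_poly(theta, states_1d):
    return poly_features(states_1d, len(theta) - 1) @ theta
```

Setting degree=1 recovers the linreg case (a best-fit line); higher degrees give the polynomial fit.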
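And for local approximation, a sketch of the last step: estimating U(s) at an arbitrary state from the stored utilities \theta_{i} at the sampled discrete states. Both estimators below are assumptions about how you might implement k-nearest-neighbor and kernel smoothing, not a fixed recipe.

```python
import numpy as np

def knn_utility(s, sampled_states, thetas, k=3):
    # U(s) ~= mean of the utilities theta_i at the k nearest sampled states s_i
    dists = np.linalg.norm(sampled_states - s, axis=1)
    nearest = np.argsort(dists)[:k]
    return thetas[nearest].mean()

def kernel_utility(s, sampled_states, thetas, bandwidth=1.0):
    # Kernel smoothing: every theta_i contributes, weighted by a Gaussian
    # kernel on its distance to s
    dists = np.linalg.norm(sampled_states - s, axis=1)
    w = np.exp(-(dists / bandwidth) ** 2)
    return (w @ thetas) / w.sum()
```

Linear interpolation works the same way in spirit, but weights only the vertices of the grid cell (or simplex) containing s.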