Public View
Suggest
Download this page (.md) Download entire wiki (.zip)
Clone entire wiki

continuous state MDP

Bellman Equation, etc., are really designed for state spaces that are discrete. However, we’d really like to be able to support continuous state spaces! Suppose we have: S \in \mathbb{R}^{n}, what can we do?
Discretization We can just pretend that our system is a discrete-state MDP by chopping the state space up into small blocks. If you do it, you can cast your V back to a step function. Recall that this could start exploding: for S \in \mathbb{R}^{n} and we want to divide each axes into k values, we will get k^{n} values!
Also, instead of using the same size grid for every state variable, we may use more steps on the states for which output sensitivity is higher.
Using a Function Use a traditional function approximation (linear regression, neural network, etc.) as a proxy.
To do this, we need a model f of the MDP such that we have s’ = f\left(s,a\right) such that s’ \sim T\left(.|s,a\right). You may also have stochastic models, namly we can add something like s’ = f\left(s,a\right) + \varepsilon where \epsilon \sim \mathcal{N} to make a more robust model. You can obtain f via data or via physics / expert design.
After we have this, given a state s, call \phi\left(s\right) the features of state s. Then, we write:

\begin{equation} V\left(s\right) = \hat{\t}^{T} \phi\left(s\right) \end{equation}

Recall also, if we determinize our MDP, we have the Bellman equation as:

\begin{equation} V\left(s\right) = R\left(s\right) + \gamma \max_{a} V\left(s’\right), \text{ where } s’ = T\left(s,a\right) \end{equation}

Now we have all the pieces to perform a particular type of value iteration:
Fitted Value Iteration Sample s_1, …, s_{n} \in S. Initialize parameters \theta to seed a model V_{\theta}\left(s\right) = \theta^{T} s.
Repeat, for each i = 1 … n
compute: y^{(i)} = R\left(s^{(i)}\right) + \gamma \max_{a} V\left(T\left(s,a\right)\right) update your model V_{\theta} as usual If your value has stochasticity, we can just run it 10 times, etc. and get a next state approximation in a monte-carlo way.

[[curator]]
I'm the Curator. I can help you navigate, organize, and curate this wiki. What would you like to do?