Key insight: suppose you want the probability of a fairly rare event. Sampling from the nominal distribution almost never produces the event, so instead we draw samples from a proposal distribution and reweight them. Suppose we want \(p_{\text{fail}}\), with \(q\) the proposal distribution and \(p\) the nominal distribution, where a failure is a trajectory \(\tau\) that falls outside the set \(\psi\): \(\tau \sim q\left(\cdot\right)\), \(p_{\text{fail}} = \int \mathbb{1}\left\{\tau \not\in \psi\right\} p\left(\tau\right) \dd{\tau}\). Now multiply the integrand by a conveniently chosen 1:

\begin{equation} 1 = \frac{q\left(\tau\right)}{q\left(\tau\right)} \end{equation}

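This is valid wherever \(q\left(\tau\right) > 0\), which must hold for every failure trajectory with \(p\left(\tau\right) > 0\). Substituting it into the integral: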

\begin{align} \int \mathbb{1}\left\{\tau \not\in \psi\right\} p\left(\tau\right) \dd{\tau} &= \int \frac{q\left(\tau\right)}{q\left(\tau\right)} \mathbb{1}\left\{\tau \not\in \psi\right\} p\left(\tau\right) \dd{\tau} \\ &= \int q\left(\tau\right) \left(\frac{p\left(\tau\right)}{q\left(\tau\right)} \mathbb{1}\left\{\tau \not\in \psi\right\}\right) \dd{\tau} \\ &= \mathbb{E}_{\tau \sim q\left(\cdot\right)} \left[\frac{p\left(\tau\right)}{q\left(\tau\right)} \mathbb{1}\left\{\tau \not\in \psi\right\}\right] \end{align}

We can now estimate this expectation with a sample average over \(m\) trajectories \(\tau_{i}\) drawn from \(q\):

\begin{equation} \hat{p}_{\text{fail}} = \frac{1}{m}\sum_{i=1}^{m} \frac{p\left(\tau_{i}\right)}{q\left(\tau_{i}\right)} \mathbb{1} \left\{\tau_{i}\not \in \psi\right\} \end{equation}
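The estimator above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the notes: the trajectory is a single scalar, the nominal distribution is a standard normal, "failure" means exceeding a hypothetical threshold of 4, and the proposal is a unit-variance normal centered on that threshold.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def estimate_p_fail(m, threshold=4.0, seed=0):
    """Importance-sampling estimate of P(tau > threshold) under p = N(0, 1).

    Assumptions (not from the notes): tau is a scalar, failure means
    tau > threshold, and the proposal q = N(threshold, 1) is centered on
    the failure boundary so that failures are sampled often.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        tau = rng.gauss(threshold, 1.0)  # tau_i ~ q
        if tau > threshold:              # indicator 1{tau_i not in psi}
            # importance weight p(tau_i) / q(tau_i)
            total += normal_pdf(tau, 0.0, 1.0) / normal_pdf(tau, threshold, 1.0)
    return total / m


def exact_p_fail(threshold=4.0):
    """Closed-form upper tail of N(0, 1), used only for comparison."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


p_hat = estimate_p_fail(m=100_000)
p_exact = exact_p_fail()
print(p_hat, p_exact)
```

In this toy setup the true failure probability is roughly 3.2e-5, so 100,000 samples drawn directly from the nominal distribution would contain only about three failures in expectation, while about half of the proposal's samples land in the failure region.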

More generally, suppose you have a function \(f(s)\) whose expectation is hard to integrate directly, yet you want:

\begin{equation} \mu = \mathbb{E}(f(s)) = \int_{0}^{1} f(s)p(s) \dd{s} \end{equation}

How would you sample values of \(f(s)\) so that the estimate \(\hat{\mu}\) ends up close to \(\mu\)? Suppose you have an importance distribution \(q(s)\), a probability distribution over the states \(S\) that tells you how “important” a particular state is to the expected value. Then we can correct for having sampled from \(q\) rather than \(p\) with a reweighting factor called the “importance weight”:

\begin{equation} w(s) = \frac{p(s)}{q(s)} \end{equation}

With samples \(s_{n}\) drawn from \(q\), normalizing by the total weight gives the estimator:

\begin{equation} \hat{\mu} = \frac{\sum_{n} f(s_{n}) w(s_{n})}{\sum_{n} w(s_{n})} \end{equation}
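A minimal Python sketch of this weighted average follows. The toy problem is an assumption made for illustration: the target on the unit interval is \(p(s) = 2s\), supplied only up to a constant as `p_tilde(s) = s`, the proposal is uniform, and \(f(s) = s^{2}\), whose exact mean under \(p\) is \(1/2\). The function and argument names are hypothetical.

```python
import random


def self_normalized_is(f, p_tilde, q_pdf, q_sample, n, seed=0):
    """Self-normalized importance-sampling estimate of E_p[f(s)].

    p_tilde may be an unnormalized version of p: any constant factor
    cancels between the numerator and the denominator.
    """
    rng = random.Random(seed)
    weighted_sum = 0.0
    weight_sum = 0.0
    for _ in range(n):
        s = q_sample(rng)          # s_n ~ q
        w = p_tilde(s) / q_pdf(s)  # importance weight w(s_n)
        weighted_sum += f(s) * w
        weight_sum += w
    return weighted_sum / weight_sum


# Hypothetical toy problem on [0, 1]:
#   target p(s) = 2s, given only up to a constant as p_tilde(s) = s
#   proposal q = Uniform(0, 1), so q(s) = 1
#   f(s) = s^2, whose exact mean under p is 1/2
mu_hat = self_normalized_is(
    f=lambda s: s * s,
    p_tilde=lambda s: s,
    q_pdf=lambda s: 1.0,
    q_sample=lambda rng: rng.random(),
    n=100_000,
)
print(mu_hat)
```

Dividing by the sum of the weights, rather than by the number of samples, is what lets the sketch work with a target density known only up to a normalizing constant.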

Theoretical guarantees: there is a particular choice of importance distribution:

\begin{equation} q(s) = \frac{b(s)}{w_{\pi}(s)} \end{equation}

where

\begin{equation} w_{\pi}(s) = \frac{\mathbb{E}_{b} \left( \sqrt{[\mathbb{E}(v \mid s, \pi )]^{2} + \operatorname{Var}(v \mid s, \pi )}\right)}{\sqrt{[\mathbb{E}(v \mid s, \pi )]^{2} + \operatorname{Var}(v \mid s, \pi )}} \end{equation}

which measures how important a state is, where \(v\) is the total discounted reward obtained by following the policy \(\pi\).
