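Generalization Error

\begin{equation} \epsilon_{gen} = \mathbb{E}_{x \sim \mathcal{X}} \left[\left(f(x) - \hat{f}(x)\right)^{2}\right] \end{equation}

In practice we usually cannot evaluate this expectation exactly, so we estimate it instead by averaging the squared error over the specific points we measured.

Probabilistic Surrogate Models

Gaussian Process

A Gaussian Process is a Gaussian distribution over functions! Consider a mean function m(x) and a covariance (kernel) function k(x, x'). For a set of design points x_{j} with objective values y_{j} \in \mathbb{R}, which we are trying to infer using m and k, the values are jointly Gaussian:

\begin{equation} \begin{bmatrix} y_{1} \\ \vdots \\ y_{n} \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} m(x_{1}) \\ \vdots \\ m(x_{n}) \end{bmatrix}, \begin{bmatrix} k(x_{1}, x_{1}) & \cdots & k(x_{1}, x_{n}) \\ \vdots & \ddots & \vdots \\ k(x_{n}, x_{1}) & \cdots & k(x_{n}, x_{n}) \end{bmatrix} \right) \end{equation}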
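Returning to the generalization error defined at the top of these notes: the "averaging over points" estimate can be sketched as below. This is only an illustration under assumptions, not a prescribed procedure. The names estimate_generalization_error, true_f, surrogate and sample_x are hypothetical, and the toy functions are chosen so the exact answer (4/45, about 0.089) is known.

```python
import random


def estimate_generalization_error(f, f_hat, sample_x, n_samples=10_000, seed=0):
    """Average the squared error (f(x) - f_hat(x))^2 over sampled inputs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = sample_x(rng)
        total += (f(x) - f_hat(x)) ** 2
    return total / n_samples


# Hypothetical toy setup: the "true" system is x^2, the surrogate is the
# constant 1/3, and inputs are drawn uniformly from [0, 1].
def true_f(x):
    return x * x


def surrogate(x):
    return 1.0 / 3.0


def sample_x(rng):
    return rng.random()


err = estimate_generalization_error(true_f, surrogate, sample_x)
print(f"estimated generalization error: {err:.4f}")
```

With more samples the estimate tightens around the exact expectation; with only a handful of measured points it can be far off, which is one reason to prefer a surrogate that also reports its own uncertainty.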
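The choice of kernel makes or breaks your ability to model your system. It is the way in which your input values are “smoothed” together to create a probabilistic estimate.

Choice of Kernels

squared exponential kernel

\begin{equation} k(x, x') = \exp\left(-\frac{(x - x')^{2}}{2 l^{2}}\right) \end{equation}

where l is the parameter controlling the “length scale” (i.e. the distance required for the function to change significantly). As l gets larger, there is more smoothing.

Matérn Kernel

This is a very common kernel, with an extra parameter that controls how smooth the modeled function is. Look it up.

Prediction

Given the known means and covariances of the sampled points X from the original system, together with their measured values y, we can compute the distribution of the values \hat{y} at new points X^{*}:

\begin{equation} \hat{y} \mid y \sim \mathcal{N}\left(\hat{\mu}, \hat{\Sigma}\right) \end{equation}

by conditioning Gaussian distributions. Specifically, with the joint distribution:

\begin{equation} \begin{bmatrix} \hat{y} \\ y \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} m(X^{*}) \\ m(X) \end{bmatrix}, \begin{bmatrix} K(X^{*}, X^{*}) & K(X^{*}, X) \\ K(X, X^{*}) & K(X, X) \end{bmatrix} \right) \end{equation}

where K(A, B) is the matrix with entries k(a_{i}, b_{j}), we can compute a new mean and a new covariance by conditioning Gaussian distributions:

\begin{equation} \hat{\mu} = m(X^{*}) + K(X^{*}, X) K(X, X)^{-1} \left(y - m(X)\right) \end{equation}

\begin{equation} \hat{\Sigma} = K(X^{*}, X^{*}) - K(X^{*}, X) K(X, X)^{-1} K(X, X^{*}) \end{equation}

Noisy Measurements

We can account for zero-mean Gaussian measurement noise with variance \nu by adding it to the diagonal of the covariance of the sampled points:

\begin{equation} K(X, X) \rightarrow K(X, X) + \nu I \end{equation}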
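To make the prediction step concrete, here is a minimal NumPy sketch that conditions a Gaussian process on a few sampled points using the standard Gaussian conditioning formulas, with an optional noise variance added to the diagonal of the sampled-point covariance. It assumes 1-D inputs, a constant mean function, and a squared exponential kernel; the function names and the toy data (samples of sin) are illustrative choices, not taken from the notes.

```python
import numpy as np


def squared_exponential(xa, xb, length_scale=1.0):
    """Kernel matrix with entries exp(-(x - x')^2 / (2 l^2)) for 1-D inputs."""
    diff = xa[:, None] - xb[None, :]
    return np.exp(-diff**2 / (2.0 * length_scale**2))


def gp_predict(x_train, y_train, x_new, length_scale=1.0, noise_var=0.0, mean=0.0):
    """Posterior mean and covariance at x_new for a constant-mean GP."""
    k_tt = squared_exponential(x_train, x_train, length_scale)
    # Noisy measurements: add the noise variance to the diagonal.
    k_tt = k_tt + noise_var * np.eye(len(x_train))
    k_nt = squared_exponential(x_new, x_train, length_scale)
    k_nn = squared_exponential(x_new, x_new, length_scale)
    # Solve linear systems rather than forming the inverse explicitly.
    mu = mean + k_nt @ np.linalg.solve(k_tt, y_train - mean)
    cov = k_nn - k_nt @ np.linalg.solve(k_tt, k_nt.T)
    return mu, cov


# Hypothetical data: three samples of sin(x); predict at a sampled point
# (x = 1.0) and at an unsampled point (x = 1.5).
x_train = np.array([0.0, 1.0, 2.0])
y_train = np.sin(x_train)
x_new = np.array([1.0, 1.5])

mu, cov = gp_predict(x_train, y_train, x_new)
mu_noisy, cov_noisy = gp_predict(x_train, y_train, x_new, noise_var=0.1)

print("noiseless mean:", mu, "variance:", np.diag(cov))
print("noisy mean:    ", mu_noisy, "variance:", np.diag(cov_noisy))
```

Without noise, the prediction at an already-sampled point reproduces the measured value with essentially zero variance. With noise, the variance there stays positive, because the model no longer trusts any single measurement completely.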
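Surrogate Optimization

Prediction Based Exploration

Given your existing points D, evaluate the predicted mean \mu_{x|D} and choose as the next design point the x with the smallest \mu_{x|D}. This is all exploitation, no exploration.

Error Based Exploration

Use the 95% confidence interval from the Gaussian Process: find the region where the interval is widest and sample there to shrink it. This is all exploration, no exploitation.

Lower Confidence Bound Exploration

A tradeoff between exploration and exploitation. Try to minimize:

\begin{equation} LB(x) = \hat{\mu}(x) - \alpha \hat{\sigma}(x) \end{equation}

where \hat{\mu}(x) = \mu_{x|D} is the predicted mean, \hat{\sigma}(x) is the predicted standard deviation, and \alpha \geq 0 sets how much weight exploration gets. Minimizing the lower bound favors points with a low predicted mean as well as points with high uncertainty. This is a probabilistic generalization of the Shubert-Piyavskii method, and no Lipschitz constant is needed! Reminder, though: these are probabilistic bounds, unlike those of the Shubert-Piyavskii method.

Probability of Improvement Exploration

We define “improvement” as:

\begin{equation} I(y) = \begin{cases} y_{\min} - y & \text{if } y < y_{\min} \\ 0 & \text{otherwise} \end{cases} \end{equation}

where y_{\min} is the best (smallest) objective value sampled so far. Then, we have the “probability of improvement” metric:

\begin{equation} P(y < y_{\min}) = \Phi\left(\frac{y_{\min} - \hat{\mu}(x)}{\hat{\sigma}(x)}\right) \end{equation}

where \Phi is the standard normal cumulative distribution function (i.e. we want to find points that are very likely to improve). This could be zero when the predicted variance \hat{\sigma}^{2} = 0, which happens at a point we have already measured without noise. You can also select points by the expected value of the improvement, \mathbb{E}[I(y)], instead.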
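The exploration strategies above can be compared side by side with a small sketch. It assumes the surrogate has already produced a predicted mean and standard deviation at each candidate point; the candidate values, the default alpha = 2.0, and all function names are hypothetical. The expected improvement uses the standard closed form for a Gaussian prediction, (y_min - mu) * Phi(z) + sigma * phi(z) with z = (y_min - mu) / sigma.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def lower_confidence_bound(mu, sigma, alpha=2.0):
    return mu - alpha * sigma


def probability_of_improvement(mu, sigma, y_min):
    if sigma == 0.0:  # no uncertainty: improvement is certain or impossible
        return 1.0 if mu < y_min else 0.0
    return normal_cdf((y_min - mu) / sigma)


def expected_improvement(mu, sigma, y_min):
    if sigma == 0.0:
        return max(y_min - mu, 0.0)
    z = (y_min - mu) / sigma
    return (y_min - mu) * normal_cdf(z) + sigma * normal_pdf(z)


# Hypothetical predictions (mean, standard deviation) at three candidates.
# "a" is an already-sampled, noiseless point holding the best value so far.
candidates = {"a": (0.5, 0.0), "b": (0.6, 0.4), "c": (1.5, 1.0)}
y_min = 0.5

next_by_mean = min(candidates, key=lambda n: candidates[n][0])
next_by_error = max(candidates, key=lambda n: candidates[n][1])
next_by_lcb = min(candidates, key=lambda n: lower_confidence_bound(*candidates[n]))
next_by_pi = max(
    candidates, key=lambda n: probability_of_improvement(*candidates[n], y_min)
)
next_by_ei = max(
    candidates, key=lambda n: expected_improvement(*candidates[n], y_min)
)

print(next_by_mean, next_by_error, next_by_lcb, next_by_pi, next_by_ei)
```

On this toy data the rules disagree. Prediction based exploration stays at the known best point "a", error based exploration and the lower confidence bound (with this alpha) go to the most uncertain point "c", and both improvement-based rules pick "b", which balances a competitive mean with moderate uncertainty.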