This method introduces a search distribution instead of discrete points:
We want to know how the parameters \theta are distributed, given some distribution parameters \psi (for instance, we assume the parameters are Gaussian distributed, so \psi is the mean and variance). The procedure is:

1. Sample m candidate points \theta_1, ..., \theta_m from the current distribution.
2. Evaluate each candidate's policy via the roll-out utility.
3. Keep the top k performers, called the "elite samples" m_{elite}.
4. Using the set of m_{elite} points, fit new distribution parameters \psi that describe those samples, and repeat.

This bounds how many roll-out utility evaluations we perform. As a rule of thumb, use 10 elite samples per dimension (1-D should have 10, 2-D should have 20, etc.).
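The loop above can be sketched in a few lines of numpy. This is a minimal illustration, not a definitive implementation: the function name `cross_entropy_search`, the `rollout_utility` callable, and the default budget values are all assumptions for the example, and the roll-out utility is stood in for by whatever scalar objective the caller supplies.

```python
import numpy as np

def cross_entropy_search(rollout_utility, mean, std, m=100, k=None,
                         iterations=20, rng=None):
    """Search for good parameters theta by refitting a Gaussian psi = (mean, std).

    rollout_utility: assumed callable mapping a parameter vector theta
        to a scalar utility (higher is better).
    m: number of candidate points sampled per iteration.
    k: number of elite samples kept; defaults to 10 per dimension,
       per the rule of thumb in the notes.
    """
    rng = np.random.default_rng(rng)
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    if k is None:
        k = 10 * mean.size  # 10 elite samples per dimension
    for _ in range(iterations):
        # 1. Sample m candidate parameter vectors theta from the distribution.
        thetas = rng.normal(mean, std, size=(m, mean.size))
        # 2. Evaluate each candidate via the roll-out utility.
        utilities = np.array([rollout_utility(t) for t in thetas])
        # 3. Keep the top-k performers (the "elite samples").
        elite = thetas[np.argsort(utilities)[-k:]]
        # 4. Refit the distribution parameters psi to the elite set.
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6  # floor so the search doesn't collapse
    return mean
```

Note that m must exceed k, otherwise every sample is "elite" and the refit step does no selection; the defaults here (m=100, k=10 per dimension) respect that for low-dimensional problems.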