We want to compare features of the model to features of the data.

Visual diagnostics
- PDF plot
- CDF of data vs. CDF of model
- Quantile-quantile (Q-Q) plot
- Calibration plot

Summative metrics
- KL divergence
- Expected calibration error (ECE)
- Maximum calibration error (MCE)

Marginalization ignores covariances
Notice that the figure on the right captures the distribution much better, yet the marginal distributions do not show this. This is because marginalizing discards the covariances between dimensions. Hence, remember to keep the dimensions together, and make sure any projection you evaluate captures the covariances.

Conditional distributions
Bin the conditions into groups and perform evaluations on each group.

Turing test
If expert knowledge is available, you can show an expert rollouts from the data and from the model, and see whether they can tell them apart.
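The marginalization caveat above can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration, not something taken from these notes: the "data" is a correlated 2-D Gaussian, the "model" is an independent Gaussian with the same marginals, and `ks_distance` is a hypothetical helper that measures the largest gap between two empirical CDFs. Each marginal looks well matched, while a projection along the diagonal exposes the missing covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Assumed "data": 2-D Gaussian with unit variances and correlation 0.9.
cov_data = np.array([[1.0, 0.9], [0.9, 1.0]])
data = rng.multivariate_normal(np.zeros(2), cov_data, size=n)

# Assumed "model": same unit-variance marginals, but independent dimensions.
model = rng.standard_normal(size=(n, 2))


def ks_distance(a, b):
    """Largest gap between the empirical CDFs of two 1-D samples (hypothetical helper)."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / a.size
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


# Per-dimension (marginal) comparison: both gaps come out small.
marginal_gaps = [ks_distance(data[:, d], model[:, d]) for d in range(2)]

# Correlation between the two dimensions: this is what the marginals hide.
corr_data = float(np.corrcoef(data, rowvar=False)[0, 1])
corr_model = float(np.corrcoef(model, rowvar=False)[0, 1])

# Projection onto the diagonal direction, which is sensitive to the covariance.
direction = np.array([1.0, 1.0]) / np.sqrt(2.0)
projected_gap = ks_distance(data @ direction, model @ direction)

print("marginal CDF gaps:", marginal_gaps)
print("correlation (data, model):", corr_data, corr_model)
print("CDF gap along diagonal projection:", projected_gap)
```

The diagonal direction is chosen here only because the assumed covariance makes it the most revealing one. In practice a suitable projection depends on the problem, and comparing several projections (or the joint distribution directly) is likely safer.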