Let’s say we want to find the maximum-likelihood (MLE) parameters $\theta$ for a conditional Gaussian with constant variance. That is:
\begin{equation} p\left(y_{i} | x_{i}\right) = \mathcal{N} \left(y_{i}|f_{\theta } \left(x_{i}\right), \sigma^{2}\right) \end{equation}
and we have a corresponding dataset $\left(x_1, y_1\right), \ldots, \left(x_{m}, y_{m}\right)$. The MLE is then:
\begin{align} \hat{\theta} &= \arg\max_{\theta} \sum_{i=1}^{m} \log p\left(y_{i}|x_{i}\right) \\ &= \arg\max_{\theta} \sum_{i=1}^{m} \log \mathcal{N} \left(y_{i}| f_{\theta} \left(x_{i}\right), \sigma^{2}\right) \\ &= \arg\max_{\theta } \sum_{i=1}^{m} \log \frac{1}{\sqrt{{2 \pi \sigma^{2}}}} \exp \left(- \frac{\left(y_{i}- f_{\theta }\left(x_{i}\right)\right)^{2}}{2\sigma^{2}}\right) \end{align}
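To make the next step explicit, the log of each Gaussian term splits into a constant and a scaled squared error:

\begin{equation} \log \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp \left(- \frac{\left(y_{i} - f_{\theta}\left(x_{i}\right)\right)^{2}}{2\sigma^{2}}\right) = -\frac{1}{2} \log \left(2 \pi \sigma^{2}\right) - \frac{\left(y_{i} - f_{\theta}\left(x_{i}\right)\right)^{2}}{2\sigma^{2}} \end{equation}

The first term does not depend on $\theta$, and the factor $\frac{1}{2\sigma^{2}}$ is a positive constant, so neither changes the argmax.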
Taking the log of the exponential and dropping the constant terms (they don’t affect the argmax), this gives us:
\begin{equation} \arg\max_{\theta} \sum_{i=1}^{m} - \left(y_{i} - f_{\theta} (x_{i})\right)^{2} \end{equation}
which, after flipping the sign to turn the maximization into a minimization, is equivalent to:
\begin{equation} \arg\min_{\theta} \sum_{i=1}^{m} \left(y_{i} - f_{\theta} (x_{i})\right)^{2} \end{equation}
whoa, least-squares error!
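A quick numeric sanity check of this equivalence: a minimal sketch (the linear model $f_{\theta}(x) = \theta x$, the variable names, and the synthetic data are my own assumptions, not from the derivation above) showing that a grid search maximizing the Gaussian log-likelihood picks the same $\theta$ as one minimizing the sum of squared errors.

```python
import numpy as np

# Synthetic data from a linear model f_theta(x) = theta * x
# with true theta = 3.0 and fixed noise sigma = 0.5.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + rng.normal(0, 0.5, size=200)
sigma = 0.5

def log_likelihood(theta):
    # Sum over i of log N(y_i | theta * x_i, sigma^2).
    resid = y - theta * x
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - resid**2 / (2 * sigma**2))

def squared_error(theta):
    # Sum over i of (y_i - theta * x_i)^2.
    return np.sum((y - theta * x) ** 2)

# The same grid of candidate thetas for both criteria.
thetas = np.linspace(0, 6, 601)
theta_mle = thetas[np.argmax([log_likelihood(t) for t in thetas])]
theta_lsq = thetas[np.argmin([squared_error(t) for t in thetas])]

# Because log-likelihood = constant - squared_error / (2 sigma^2),
# argmax of one is argmin of the other, so the two answers agree.
print(theta_mle, theta_lsq)
```

Note that the value of $\sigma$ only scales and shifts the log-likelihood; it never moves the argmax, which is exactly why it could be dropped in the derivation.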