Notation New Concepts supervised learning training set cost function least-squares error machine learning evaluation root-mean-square error (this is least-squares error with normalization, so “units work well” for test benchmarks) linear regression gradient descent stochastic gradient descent normal equation Important Results / Claims gradient descent for least-squares error when does gradient descent provably work? Questions Why can’t we use root-mean-square error for the training objective? It seems like its just more normalization…