Descent methods generally take the form:

\begin{align} x^{(k+1)} = x^{(k)} + t^{(k)} \Delta x^{(k)} \end{align}

The choice of step size $t^{(k)}$ and search direction $\Delta x^{(k)}$ is what distinguishes one descent method from another. The Hessian describes the local geometry of the sublevel sets through the ellipsoid:

\begin{align} \left\{x + v \mid v^{T} \nabla^{2} f\left(x\right) v \leq 1\right\} \end{align}

**line search**

**backtracking line search**

A first-order Taylor expansion of $f$ in $t$, centered about $x$, gives the line:

\begin{align} f\left(x\right) + t \nabla f\left(x\right)^{T} \Delta x \end{align}

Scaling the slope by $\alpha \in (0, \frac{1}{2})$ tilts this line slightly upward (since $\nabla f(x)^{T} \Delta x < 0$ for a descent direction):

\begin{align} f\left(x\right) + \alpha t \nabla f\left(x\right)^{T} \Delta x \end{align}

Starting from $t = 1$ and shrinking $t \leftarrow \beta t$ for some $\beta \in (0, 1)$, backtracking accepts the first $t$ for which

\begin{align} f\left(x + t\Delta x\right) \leq f\left(x\right) + \alpha t \nabla f\left(x\right)^{T} \Delta x \end{align}
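The condition above can be sketched directly (a minimal illustration; the function names and default parameters $\alpha = 0.3$, $\beta = 0.8$ are illustrative choices, not fixed by anything here):

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, dx, alpha=0.3, beta=0.8):
    """Shrink t until the sufficient-decrease (Armijo) condition holds:
    f(x + t*dx) <= f(x) + alpha * t * grad_f(x)^T dx."""
    t = 1.0
    while f(x + t * dx) > f(x) + alpha * t * grad_f(x) @ dx:
        t *= beta  # step too large: tilted line not yet above f along dx
    return t
```

Because $\Delta x$ is a descent direction, the right-hand side eventually dominates for small enough $t$, so the loop terminates.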

**gradient descent**

Set:

\begin{align} \Delta x = -\nabla f\left(x\right) \end{align}
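A minimal fixed-step sketch of the resulting iteration (the fixed `step` is a simplification; in practice $t^{(k)}$ would come from a line search):

```python
import numpy as np

def gradient_descent(f, grad_f, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Iterate x_{k+1} = x_k - t * grad f(x_k) with a fixed step size t,
    stopping when the gradient norm falls below tol."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - step * g
    return x
```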

convergence: for strongly convex f:

\begin{align} f\left(x^{(k)}\right) - p^{*} \leq c^{k} \left(f\left(x^{(0)}\right) - p^{*}\right) \end{align}

for some constant $c \in (0, 1)$. Gradient methods can be slow, and in particular exhibit “zigzagging” whenever the sublevel sets are far from spherical (anisotropic); gradient descent performs best when the sublevel sets are nearly isotropic. This is measured by the condition number of the Hessian.

**steepest descent**

Gradient methods can be very slow. Instead, we take the direction that decreases $f$ fastest among all directions of “reasonable size”:

\begin{align} \Delta x = \operatorname{argmin}_{v} \left\{\nabla f\left(x\right)^{T} v \mid \norm{v} \leq 1\right\} \end{align}

Some choices of norm:

- Euclidean norm: $\Delta x = -\nabla f\left(x\right)$ (up to scaling), which is just gradient descent.
- quadratic norm: for $\norm{x}_{P} = \left(x^{T} P x\right)^{\frac{1}{2}}$ with $P \succ 0$, we get $\Delta x = -P^{-1}\nabla f\left(x\right)$; you can view this as gradient descent after the change of variables $\bar{x} = P^{\frac{1}{2}}x$.

**Newton’s Method**

See Newton’s Method.
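The quadratic-norm direction above can be computed without forming $P^{-1}$ explicitly (a small sketch; the function name is illustrative):

```python
import numpy as np

def steepest_descent_direction(grad, P):
    """Steepest descent direction under the quadratic norm ||v||_P:
    solve P dx = -grad, i.e. dx = -P^{-1} grad."""
    return np.linalg.solve(P, -grad)
```

With $P = I$ this reduces to the gradient-descent direction $-\nabla f(x)$; a well-chosen $P$ (one matching the Hessian's ellipsoid) reshapes anisotropic sublevel sets toward spheres and suppresses zigzagging.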
