Newton's method is an approach for unconstrained optimization.
In this article, we will motivate the formulation of this approach and provide interactive demos over multiple univariate and multivariate functions to show it in action.
To understand this article, we recommend familiarity with the concepts in
Follow the above links to first get acquainted with the corresponding concepts.
In this article, we will consider an objective function \( f : \real^\ndim \to \real \) that we wish to minimize. We express the unconstrained optimization problem as,
\begin{aligned} \min_{\vx} \text{ }& f(\vx) \\ \end{aligned}
Newton's method is an iterative approach. Starting from an arbitrary point, we progressively discover points with lower function values until we converge to a point with the minimum function value.
The next few sections will outline the strategy used by Newton's method to discover the next point \( \va + \vp \), from the current point \( \va \) on its journey to the optimal solution.
Owing to Taylor's theorem, for a twice differentiable function \( f: \real^\ndim \to \real \), the value of the function at a point \(\va+\vp \) can be approximated in terms of the function value at a nearby point \( \va \) as follows.
\begin{equation} f(\va+\vp) \approx f(\va) + \nabla_{\va}^T \vp + \frac{1}{2} \vp^T \mH_{\va} \vp \label{eqn:second-order-taylor-approx} \end{equation}
Here, \( \nabla_{\va} \) and \( \mH_{\va} \) denote the gradient and Hessian of the function, evaluated at the point \( \va \).
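As a quick numerical sanity check of this expansion (not part of the interactive demos), the sketch below compares the true value of a hypothetical test function \( f(x, y) = x^2 + \sin y \) with its second-order model at a nearby point; the function, point, and step are arbitrary illustrative choices.

```python
import numpy as np

# Hypothetical test function, chosen only to illustrate the quadratic model.
f    = lambda v: v[0]**2 + np.sin(v[1])
grad = lambda v: np.array([2 * v[0], np.cos(v[1])])
hess = lambda v: np.array([[2.0, 0.0], [0.0, -np.sin(v[1])]])

a = np.array([1.0, 0.5])   # current point
p = np.array([0.1, -0.2])  # a small step away from it

model = f(a) + grad(a) @ p + 0.5 * p @ hess(a) @ p
print(f(a + p), model)     # the two values agree up to O(||p||^3)
```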
To choose the next point \( \va + \vp \), we need to discover a suitable value of \( \vp \). Since \( \va \) is already known (the current point), the right-hand side above is a function of \( \vp \) alone. At its local minimizer, \( \star{\vp} \), the gradient with respect to \( \vp \) will be zero.
\begin{aligned} \doh{f(\va + \star{\vp})}{\vp} = 0 \label{eqn:first-order-condition-on-p} \end{aligned}
Thus, we now have a system of equations to solve for the value of \( \vp \) to choose our next point \( \va + \vp \).
Newton's method arrives at the next iterate \( \va + \vp \) from the current iterate \( \va \) by solving the above equation, substituting the second-order Taylor approximation for \( f \).
$$\begin{aligned} & \frac{\partial f(\va + \star{\vp})}{\partial \vp} = 0 \\ \implies & \frac{\partial}{\partial \vp} \left( f(\va) + \nabla_{\va}^T \vp + \frac{1}{2} \vp^T \mH_{\va} \vp \right) = 0 \\ \implies & \nabla_{\va} + \mH_{\va} \vp = 0 \\ \implies & \vp = -\inv{\mH_{\va}} \nabla_{\va} \\ \end{aligned} $$
Thus, Newton's method proceeds as follows: starting from an arbitrary point, compute the gradient \( \nabla_{\va} \) and the Hessian \( \mH_{\va} \) at the current point \( \va \), solve for the step \( \vp = -\inv{\mH_{\va}} \nabla_{\va} \), move to the next point \( \va + \vp \), and repeat until convergence to a stationary point.
Quite straightforward!
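To make this recipe concrete, here is a minimal sketch of the iterations in Python with NumPy; the function name and signature are illustrative choices of ours, not part of the demo, and the step is obtained by solving the linear system \( \mH_{\va} \vp = -\nabla_{\va} \) rather than by forming the inverse explicitly.

```python
import numpy as np

def newtons_method(grad, hess, x0, tol=1e-8, max_iter=50):
    """Iterate x <- x + p, where p solves H(x) p = -grad(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:       # stop at a (near-)stationary point
            break
        p = np.linalg.solve(hess(x), -g)  # Newton step, no explicit inverse
        x = x + p
    return x
```

The multivariate examples later in the article can be reproduced with this same pattern, given analytic gradients and Hessians.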
Let's see Newton's method in action with a simple univariate function \( f(x) = x^2 \), where \( x \in \real \).
Note that the function has a global minimum at \( x = 0 \). The goal of Newton's method is to discover this point of least function value, starting from any arbitrary point.
In this accompanying demo, change the starting point by dragging the orange circle and see the trajectory of Newton's method towards the global minimum.
No matter where you start for this convex quadratic objective function, Newton's method finds the minimum in exactly 1 step.
In fact, this is a property worth noting: Newton's method always finds the minimum of a convex quadratic objective function in exactly one step. And why not? The second-order Taylor series expansion perfectly models the function and Newton's method finds the exact minimum of that second-order model in each step.
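To see this concretely for \( f(x) = x^2 \): with \( f'(a) = 2a \) and \( f''(a) = 2 \) at any starting point \( a \), the Newton step is

$$ p = -\frac{f'(a)}{f''(a)} = -\frac{2a}{2} = -a, \qquad a + p = 0, $$

which is the global minimum, no matter where we started.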
So, let's go beyond quadratic and beyond convex.
Let's see Newton's method on a nonconvex function with multiple minima and a single maximum.
$$ f(x) = x^4 - 8x^2 $$
This is clearly a nonconvex objective function with two minima and one local maximum, the bump in the middle.
First, note that Newton's method, without any restrictions on its movement, does not discriminate between a minimum and a maximum. It just gets to a nearby stationary point. So, if you start very close to the bump in the center, you will land at the local maximum.
Second, note that the iterates jump around before they reach a stationary point, unlike our previous example, a consequence of the objective function being nonconvex.
Also worth noting is that starting near a particular local minimum does not guarantee that the algorithm will converge to that local minimum. It may actually jump over the bump and land in another local minimum altogether.
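These behaviours are easy to reproduce outside the demo with a short univariate sketch; the starting points below are illustrative choices of ours.

```python
# Univariate Newton iterations on f(x) = x**4 - 8 * x**2.
df  = lambda x: 4 * x**3 - 16 * x   # f'(x)
d2f = lambda x: 12 * x**2 - 16      # f''(x)

for x0 in (0.1, 1.0, 3.0):          # illustrative starting points
    x = x0
    for _ in range(20):
        x -= df(x) / d2f(x)         # Newton update: x <- x - f'(x) / f''(x)
    print(f"start {x0}: converged to {x:.4f}")
```

With these particular starts, the iterates settle at the local maximum \( x = 0 \), jump over the bump to the minimum at \( x = -2 \), and converge to the minimum at \( x = 2 \), respectively.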
Of course, Newton's method can be applied to multivariate functions.
We have introduced Rosenbrock's function in the module on Calculus.
$$ \min_{x,y} (1-x)^2 + 100 (y-x^2)^2 $$
The global minimum of the function lies at \( x = 1, y = 1 \).
Note that unless you start close enough to the minimum, Newton's method may not find it within 10 iterations. This is a well-known shortcoming of Newton's method: iterations should begin at a point that is reasonably close to the minimum.
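For reference, here is a self-contained sketch of the same iterations on Rosenbrock's function, with the gradient and Hessian written out analytically; the starting point is an illustrative choice of ours, not the one used in the demo.

```python
import numpy as np

# Rosenbrock's function with its analytic gradient and Hessian.
grad = lambda v: np.array([-2 * (1 - v[0]) - 400 * v[0] * (v[1] - v[0]**2),
                           200 * (v[1] - v[0]**2)])
hess = lambda v: np.array([[2 - 400 * v[1] + 1200 * v[0]**2, -400 * v[0]],
                           [-400 * v[0],                      200.0]])

x = np.array([-1.2, 1.0])   # illustrative starting point
for _ in range(10):
    x = x + np.linalg.solve(hess(x), -grad(x))
print(x)                    # should land near (1, 1); farther starts may need more iterations
```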
So far, our multivariate function had only a single minimum. Such functions, albeit easy to understand, are rare in machine learning. Let us study a function with multiple minima. We have introduced Himmelblau's function in the module on Calculus.
$$ f(x,y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2 $$
Himmelblau's function is nonconvex with 4 minima (marked in green) and 1 local maximum (marked in blue).
As expected, Newton's method seeks out one of these stationary points. It does not necessarily reach the closest stationary point.
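As with Rosenbrock's function, a small self-contained sketch makes this easy to verify outside the demo; the gradient and Hessian are derived analytically, and the starting points are illustrative choices of ours.

```python
import numpy as np

# Himmelblau's function with its analytic gradient and Hessian.
grad = lambda v: np.array([4 * v[0] * (v[0]**2 + v[1] - 11) + 2 * (v[0] + v[1]**2 - 7),
                           2 * (v[0]**2 + v[1] - 11) + 4 * v[1] * (v[0] + v[1]**2 - 7)])
hess = lambda v: np.array([[12 * v[0]**2 + 4 * v[1] - 42, 4 * (v[0] + v[1])],
                           [4 * (v[0] + v[1]),            12 * v[1]**2 + 4 * v[0] - 26]])

for start in ([4.0, 4.0], [-3.0, 3.0], [1.0, 1.0]):   # illustrative starting points
    x = np.array(start)
    for _ in range(20):
        x = x + np.linalg.solve(hess(x), -grad(x))
    print(start, "->", np.round(x, 4))                # each run ends at some stationary point
```

From \( (1, 1) \), for instance, the iterates in this sketch wander before settling near the minimum at roughly \( (-3.78, -3.28) \), even though other stationary points are closer to the start.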
Newton's method seems quite straightforward.
But! What if the function is not twice differentiable? Or what if computing the Hessian is computationally infeasible?
In these cases, we cannot calculate \( \mH_{\vx} \), let alone its inverse, \( \inv{\mH_{\vx}} \).
In such scenarios, we use alternative approaches such as
Expand your knowledge of optimization approaches with our detailed interactive articles on this subject.
To brush up on calculus, explore our articles on topics in calculus and multivariate calculus. Or start with strong mathematical foundations.
Already an optimization expert? Check out comprehensive courses on machine learning or deep learning.