Taylor's Theorem and optimization
Owing to the Taylor's Theorem, for a twice differentiable function \( f(\vx) \), the value of the function at a point \(\va+\vp \) can be written in terms of the function value at a point \( \va \) as follows.
\begin{equation}
f(\va+\vp) = f(\va) + \nabla_{\va}^T \vp + \frac{1}{2} \vp^T \mH_{\va} \vp
\label{eqn:second-order-taylor-approx}
\end{equation}
Here, \( \nabla_{\va} \) and \( \mH_{\va} \) denote the gradient and Hessian of the function, evaluated at the point \( \va \).
To choose the next point \( \va + \vp \), we need to discover a suitable value of \( \vp \).
Since \( \va \) is already known (the current point), this is a function in \( \vp \).
At its local minimizer, \( \star{\vp} \), the gradient with respect to \( \vp \) will be zero.
\begin{aligned}
\doh{f(\va + \star{\vp})}{\vp} = 0 \\\\
\label{eqn:first-order-condition-on-p}
\end{aligned}
Thus, we now have a system of equations to solve for the value of \( \vp \) to choose our next point \( \va + \vp \).
\begin{aligned}
\frac{\partial}{\partial \vp} \left( f(\va) + \nabla_{\va}^T \vp + \frac{1}{2} \vp^T \mH_{\va} \vp \right) = 0
\end{aligned}