# Higher order derivatives

If the rate of change of a function is the derivative, then how do we characterize the rate of change of the derivative itself? In this article, we explore this concept — the higher-order derivatives.

## Prerequisites

To understand higher-order derivatives, we recommend familiarity with derivatives and with the idea of smoothness, both covered in earlier articles.

## Derivative of a derivative

The derivative $f'(x)$ is itself a function, so we may be able to differentiate it again. A derivative of a derivative is called a second-order derivative. By the same logic, the ordinary derivative of a function is its first-order derivative. The second-order derivative is denoted as

$$f''(x) = \frac{d^2 f(x)}{dx^2} = \frac{d^2 f}{dx^2} = \frac{d}{dx} \left(\frac{df}{dx}\right)$$

If the derivative of a function does not exist at a point, then its second-order derivative cannot exist there either. If the derivative does exist, the same constraints apply one level up: the second-order derivative exists at a point if and only if the first-order derivative is itself differentiable at that point.

We do not have to stop at the second-order derivative. We can keep taking derivatives of derivatives as long as they exist. The $n$-th order derivative is denoted as $\frac{d^n f}{dx^n}$.
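As a minimal sketch of repeated differentiation, assume a polynomial is stored as a list of coefficients `[c0, c1, c2, ...]` for $c_0 + c_1 x + c_2 x^2 + \dots$. The helper names `differentiate` and `nth_derivative` are hypothetical and chosen only for illustration.

```python
def differentiate(coeffs):
    """Differentiate c0 + c1*x + c2*x^2 + ... given as [c0, c1, c2, ...]."""
    # d/dx (c_k * x^k) = k * c_k * x^(k-1); the constant term drops out.
    return [k * c for k, c in enumerate(coeffs)][1:]


def nth_derivative(coeffs, n):
    """Apply differentiate() n times; an empty list means the zero polynomial."""
    for _ in range(n):
        coeffs = differentiate(coeffs)
    return coeffs


# Illustrative polynomial: x^4 - 8x^2
poly = [0, 0, -8, 0, 1]
for order in range(6):
    print(order, nth_derivative(poly, order))
```

For a polynomial of degree 4, every derivative beyond the fourth is identically zero, which the sketch reports as an empty coefficient list.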

## The meaning of the second-order derivative

But what does the second-order derivative mean?

We defined earlier that the derivative of a function at a point gives the rate of change of that function at that point. So a second-order derivative at a point tells us the rate of change of the first-order derivative at that point.

• Positive second-order derivative $\implies$ the derivative of the function is increasing.
• Negative second-order derivative $\implies$ the derivative of the function is decreasing.
• Zero second-order derivative $\implies$ the derivative of the function is momentarily not changing (and is constant wherever the second-order derivative stays zero over an interval).

Note that the above statements are about the slope of the function, not the function itself!
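The sign of the second-order derivative can also be estimated numerically. The sketch below uses a central finite difference, which is an approximation and not something this article relies on elsewhere. The step size `h`, the tolerance `tol`, and the test function `cube` are assumptions picked for illustration.

```python
def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of f''(x); h is an assumed step size."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def slope_trend(f, x, tol=1e-3):
    """Describe how the slope of f is changing at x, up to tolerance tol."""
    d2 = second_derivative(f, x)
    if d2 > tol:
        return "slope increasing"
    if d2 < -tol:
        return "slope decreasing"
    return "slope momentarily unchanged"


def cube(x):
    # Illustrative function: f(x) = x^3, so f''(x) = 6x.
    return x ** 3


for x in (-1.0, 0.0, 1.0):
    print(x, slope_trend(cube, x))
```

Note that $x^3$ itself is increasing at $x = -1$ even though its slope is decreasing there, which is exactly the distinction between the function and its slope.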

## Example: $x^2$

Let us try to understand second-order derivatives in the context of some functions, starting with $f(x) = x^2$.

## Second-order derivative of $x^2$

For the quadratic function $f(x) = x^2$, the derivatives are $f'(x) = 2x$ and $f''(x) = 2$. The second-order derivative is a constant. It means the slope of the slope of $x^2$ does not vary as a function of $x$.

More importantly, it is positive. That sign is the key piece of information from an optimization perspective.

But first, let's imagine our function were $g(x) = -x^2$. Since $g(x)$ is the negative of $f(x)$, it is never positive and attains its maximum of zero at $x = 0$.

The derivatives would be $g'(x) = -2x$ and $g''(x) = -2$.

Both $f'(x)$ and $g'(x)$ vanish at $x = 0$, where $f(x)$ attains its minimum and $g(x)$ its maximum. It is the sign of the second-order derivative that tells us which one it is.

If $f'(a) = 0$ and $f''(a) > 0$, then $f(x)$ attains a local minimum at $x = a$. This is easy to understand. $f'(a) = 0$ means that the function has flattened out there. $f''(a) > 0$ means that the slope is increasing through zero: negative just to the left of $x = a$ and positive just to the right. So the function rises on both sides of $x = a$. Hence a minimum.

If $g'(a) = 0$ and $g''(a) < 0$, then $g(x)$ attains a local maximum at $x = a$. Analogous to before, $g'(a) = 0$ means that the function has flattened out there. $g''(a) < 0$ means that the slope is decreasing through zero: positive just to the left of $x = a$ and negative just to the right. So the function falls off on both sides of $x = a$. Hence a maximum.
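The two rules above can be sketched as a small helper. The function name `classify_critical_point` and the tolerance `tol` are hypothetical. The caller is assumed to supply the first and second derivatives as ordinary Python functions.

```python
def classify_critical_point(d1, d2, a, tol=1e-9):
    """Second-derivative test at x = a, given f' as d1 and f'' as d2."""
    if abs(d1(a)) > tol:
        return "not a critical point"
    if d2(a) > tol:
        return "local minimum"
    if d2(a) < -tol:
        return "local maximum"
    # f''(a) = 0: the test cannot decide either way.
    return "inconclusive"


# f(x) = x^2: f'(x) = 2x, f''(x) = 2
print(classify_critical_point(lambda x: 2 * x, lambda x: 2, 0))
# g(x) = -x^2: g'(x) = -2x, g''(x) = -2
print(classify_critical_point(lambda x: -2 * x, lambda x: -2, 0))
# x^4: derivative 4x^3, second derivative 12x^2 (zero at x = 0)
print(classify_critical_point(lambda x: 4 * x**3, lambda x: 12 * x**2, 0))
```

The last call returns "inconclusive" even though $x^4$ does have a minimum at $x = 0$. When the second-order derivative is zero at a critical point, the test says nothing and other reasoning is needed.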

We will elucidate these concepts with the next set of examples.

## Example: $x^4 - 8x^2$

Let us try to understand second-order derivatives in the context of another function, $f(x) = x^4 - 8x^2$.

## Second-order derivative of $x^4 - 8x^2$

We have concocted this function to highlight our observation about the nature of the second order derivative.

For the function $g(x) = x^4 - 8x^2$, the derivatives are $g'(x) = 4x^3 - 16x$ and $g''(x) = 12x^2 - 16$.

Notice that the derivative $g'(x)$ is zero at three locations, namely $x \in \{-2, 0, 2\}$. The function attains either a local minimum or a local maximum at each of these points. But note that the maximum at $x = 0$ is only local, not global: $g(0) = 0$, yet $g(x)$ grows without bound as $|x|$ increases. So, remember that a zero derivative does not by itself identify a global maximum or minimum. At best, it flags a local one.

Now note the second-order derivative. At the locations $x \in \{-2, 2\}$, the second-order derivative is positive. At the point $x = 0$, it is negative: $g''(0) < 0$.

This means that the local extremum achieved by the function at $x \in \{-2,2\}$ is a local minimum. The critical point at $x = 0$ is a local maximum.
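As a worked check of these signs, the second-order derivative can be evaluated directly at each critical point, using only the formulas for $g'(x)$ and $g''(x)$ given above:

$$\begin{aligned} g'(x) &= 4x^3 - 16x = 4x(x-2)(x+2) \\ g''(\pm 2) &= 12 \cdot 4 - 16 = 32 > 0 \\ g''(0) &= 12 \cdot 0 - 16 = -16 < 0 \end{aligned}$$

The corresponding function values are $g(\pm 2) = 16 - 32 = -16$ and $g(0) = 0$.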

One more interesting thing to note here. Follow the tangent line as $x$ moves from the local minimum at $x = -2$ to the local maximum at $x = 0$. Notice that the tangent goes from lying under the function to lying over the function. Such a point of change can be detected by a change in the sign of the second derivative.

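Such points, where the second-order derivative changes sign from positive to negative, or vice versa, are known as inflection points. So

• $f'(x) = 0 \implies$ critical point
• $f''(x)$ changes sign at $x \implies$ inflection point

A zero of $f''(x)$ is only a candidate for an inflection point: the sign must actually change there.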
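A hedged sketch of how an inflection point might be checked in practice: rather than testing only whether the second-order derivative is zero, compare its sign slightly to the left and right of the candidate. The helper `changes_sign` and the offset `eps` are hypothetical. The contrast case $x^4$ is added for illustration and is not one of the article's examples.

```python
import math


def changes_sign(d2, x, eps=1e-3):
    """True if the second derivative d2 has opposite signs on either side of x."""
    return d2(x - eps) * d2(x + eps) < 0


def g2(x):
    # Second derivative of g(x) = x^4 - 8x^2
    return 12 * x**2 - 16


def p2(x):
    # Second derivative of the contrast case p(x) = x^4
    return 12 * x**2


# g2 vanishes where 12x^2 = 16, i.e. x = +/- 2/sqrt(3), roughly +/- 1.155
x0 = 2 / math.sqrt(3)
print(changes_sign(g2, -x0), changes_sign(g2, x0))
# p2 vanishes at x = 0 but is positive on both sides
print(changes_sign(p2, 0.0))
```

Both candidates of $x^4 - 8x^2$ pass the sign-change check, so they are inflection points. For $x^4$ the second-order derivative is zero at $x = 0$ without changing sign, so that point is not an inflection point.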

## Example: $\sin x$

Let us try to understand second-order derivatives in the context of yet another function, a much more wavy one: $f(x) = \sin x$.

## Second-order derivative of $\sin x$

Many machine learners naively assume that if they have a function to optimize, then it must have a single maximum or minimum. That may not be the case, as this next example shows.

$h(x) = \sin x$ is a cyclic function. Its derivatives are $h'(x) = \cos x$ and $h''(x) = -\sin x$, both cyclic too. They all vary cyclically in the range $[-1,1]$, which gives $h(x)$ infinitely many local minima and local maxima.

So, even a second-order derivative does not tell you whether you have reached a global extremum. Only local minima or maxima can be detected, and they should be treated as such.
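To see how many such points there are, the sketch below lists a few critical points of $\sin x$. It assumes the standard fact that $\cos x = 0$ exactly at $x = \pi/2 + k\pi$ for integer $k$. The helper name `sine_extrema` is hypothetical.

```python
import math


def sine_extrema(k_values):
    """Classify the critical points x = pi/2 + k*pi of sin(x) using h''(x) = -sin(x)."""
    results = []
    for k in k_values:
        x = math.pi / 2 + k * math.pi
        second = -math.sin(x)
        kind = "local maximum" if second < 0 else "local minimum"
        results.append((k, kind, round(math.sin(x), 6)))
    return results


for row in sine_extrema(range(-2, 3)):
    print(row)
```

Maxima and minima alternate forever, and the second-order derivative gives the same verdict at each one. Nothing in that local information singles out one of them.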

## Example: $x \sin x$

Now let's try a function that is not only wavy, but whose waves also change in amplitude as a function of the input: $f(x) = x \sin x$.

## Second-order derivative of $x \sin x$

The function $k(x) = x \sin x$ oscillates like the sine function we saw before, but it is not truly cyclic: the oscillations keep getting amplified as we move farther from $x = 0$.

The derivatives $k'(x) = \sin x + x \cos x$ and $k''(x) = 2 \cos x - x \sin x$ oscillate in the same way as $k(x)$.

There are plenty of local minima and maxima where $k'(x) = 0$. You can also distinguish between them by checking whether $k''(x) > 0$ or $k''(x) < 0$.

But none of this settles the global picture from an optimization perspective.

Arriving at a zero first-order derivative with the appropriate sign on the second-order derivative only certifies a local extremum. Here, $k(x)$ is unbounded above and below, so it has no global maximum or minimum at all.

You may be doing great in identifying local extrema, but that does not imply anything about the global stage. Be humble.
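The following sketch locates the first few critical points of $x \sin x$ for $x > 0$ and classifies them. It assumes bisection as the root finder, which is one choice among many. The bracketing intervals rely on the observation that $k'(x) = \pm 1$ at $x = \pi/2 + n\pi$ with alternating sign. The names `bisect` and `extrema` are hypothetical.

```python
import math


def k(x):
    return x * math.sin(x)


def k1(x):
    # First derivative of x*sin(x)
    return math.sin(x) + x * math.cos(x)


def k2(x):
    # Second derivative of x*sin(x)
    return 2 * math.cos(x) - x * math.sin(x)


def bisect(fn, lo, hi, iters=60):
    """Find a root of fn in [lo, hi], assuming fn(lo) and fn(hi) differ in sign."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fn(lo) * fn(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


extrema = []
for n in range(4):
    lo = math.pi / 2 + n * math.pi  # k1 alternates sign at these points
    root = bisect(k1, lo, lo + math.pi)
    kind = "max" if k2(root) < 0 else "min"
    extrema.append((root, kind, k(root)))

for root, kind, value in extrema:
    print(f"x = {root:.4f}  {kind}  k(x) = {value:.4f}")
```

Each successive extremum is larger in magnitude than the one before it, so every local maximum found this way is beaten by a later one.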

## Smoothness and higher-order derivatives

In a previous article, we explored the idea of smoothness.

The existence of an $n$-th order derivative implies that the $(n-1)$-th order derivative is continuous. So, the smoothness of a function is measured by the number of continuous derivatives it has.

• The class $C^0$ includes all continuous functions
• The class $C^1$ includes all continuously differentiable (differentiable and the derivative is also continuous) functions

In general, the class $C^n$ includes all functions whose derivative is in the class $C^{n-1}$.

And a function that has derivatives of all orders everywhere in its domain belongs to the class $C^\infty$. Such a function is known as a smooth function.
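To make these classes concrete, here is a small worked example. The function $f(x) = x|x|$ does not appear elsewhere in this article and is chosen purely for illustration.

$$f(x) = x|x|, \qquad f'(x) = 2|x|, \qquad f''(x) = \begin{cases} 2 & x > 0 \\ -2 & x < 0 \end{cases}$$

The derivative $f'(x) = 2|x|$ is continuous everywhere, so $f$ belongs to $C^1$. But $f'$ has a corner at $x = 0$, so $f''(0)$ does not exist and $f$ does not belong to $C^2$.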

## Where to next?

Now that you are an expert in derivatives, explore the counterpart to derivatives — integrals.
