Interactive tutorial on derivatives



        Derivatives
        Calculus
      

With a sound understanding of limits and continuity, we are ready to tackle derivatives — the rate of change of a function.

Prerequisites

To understand derivatives, we recommend familiarity with the concepts in

Follow the above links to first get acquainted with the corresponding concepts.

Slope of a line

A quick refresher on this basic concept in geometry before we delve into derivatives.

Every point $ (x,y) $ on a line satisfies the equation $ y = mx + c $, where $ m $ is the slope and $ c $ is the intercept. In other words, $ y = f(x) $ for the linear function $ f(x) = mx + c $.

Consider two points on the line $ y = f(x) $: $ (x,f(x)) $ and $ (x+h, f(x+h)) $, with $ h \neq 0 $.

Note that

$$\begin{aligned} & f(x+h) - f(x) = m(x+h) + c - \left( mx + c \right) \\ \implies & f(x+h) - f(x) = mh \\ \implies & m = \frac{f(x+h) - f(x)}{h} \end{aligned}$$

This formulation is a building block for derivatives.
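The formula can be checked numerically. The following is a minimal Python sketch. It uses a hypothetical helper `secant_slope` and an example line $ f(x) = 3x + 2 $, both chosen purely for illustration. For a straight line, every choice of $ x $ and $ h $ recovers the same slope.

```python
def secant_slope(f, x, h):
    """Slope of the segment joining (x, f(x)) and (x + h, f(x + h)); h must be nonzero."""
    if h == 0:
        raise ValueError("h must be nonzero")
    return (f(x + h) - f(x)) / h


def line(x):
    # Example line with slope m = 3 and intercept c = 2 (illustrative values).
    return 3 * x + 2


# Different starting points and step sizes all give the slope 3.0 for this line.
for x, h in [(0.0, 1.0), (5.0, 0.5), (-2.0, 10.0)]:
    print(x, h, secant_slope(line, x, h))
```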

Functions are made of line-segments

Zoom in close enough and most continuous functions will look piecewise linear, as if they were made by joining line segments together. Zoom into a discontinuity, however, and it remains a discontinuity: a gap that looks wider and wider in the magnified view. The key is to zoom in far enough that the function behaves like a line segment in the magnified viewport.

In this interactive, zoom into the function $ f(x) = \sin x $ and notice that even a seemingly wavy function looks like a line segment once you zoom in enough. Drag the circle in the function plot to re-center the zoom plot, and use the zoom slider to set the desired level of magnification.

Caveat: This reasoning does not apply to fractals, which remain just as jagged no matter how far you zoom in.
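The zooming idea for $ \sin x $ can also be quantified. The following is a minimal sketch. It measures how far $ \sin x $ strays from the straight segment (chord) joining the two ends of a window of half-width $ w $ around $ x_0 = 1 $. The helper name `max_gap_from_chord`, the window center, and the sample count are hypothetical choices made only for illustration.

```python
import math


def max_gap_from_chord(f, x0, w, samples=1001):
    """Largest |f(x) - chord(x)| over [x0 - w, x0 + w], where chord is the
    straight segment joining the window's two endpoints."""
    a, b = x0 - w, x0 + w
    fa, fb = f(a), f(b)
    slope = (fb - fa) / (b - a)
    gap = 0.0
    for i in range(samples):
        x = a + (b - a) * i / (samples - 1)
        chord = fa + slope * (x - a)
        gap = max(gap, abs(f(x) - chord))
    return gap


# Zooming in (shrinking w) makes the gap shrink much faster than the window,
# roughly in proportion to w squared for a smooth function such as sin.
for w in [1.0, 0.1, 0.01]:
    gap = max_gap_from_chord(math.sin, 1.0, w)
    print(f"half-width {w}: gap {gap:.2e}, relative to window width {gap / (2 * w):.2e}")
```

The relative gap falls by about a factor of ten each time the window shrinks tenfold. This is why the magnified curve becomes indistinguishable from a line segment.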

Derivatives: Rate of change of a function

Now that we know that a function looks like a collection of line segments when we zoom in enough, suppose one such line segment extends from $ a $ to $ b $ on the $ X $-axis. Consider two points on this segment such that $ x \in [a,b] $ and $ (x+h) \in [a,b] $. If $ m $ is the slope of the line segment, the basic line geometry we reviewed earlier tells us that

$$\begin{aligned} & f(x + h) = f(x) + m(x + h - x) \\ \implies & m = \frac{f(x+h) - f(x)}{h} \end{aligned}$$

The key is to zoom in far enough that the function behaves like a line segment in the magnified viewport. In other words, zooming in means that $ h $ should be minuscule.

Using the concept of limits introduced earlier, we let $ h $ become arbitrarily small, that is, $ h \to 0 $. This gives the slope $ m_x $ of a function at a point $ x $:

$$ m_x = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} $$
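To see the limit in action, the following minimal Python sketch evaluates the difference quotient of the example function $ f(x) = x^2 $ at $ x = 3 $ for shrinking values of $ h $. The function, the point, and the helper name `difference_quotient` are illustrative choices. For this $ f $, the quotient simplifies algebraically to $ 6 + h $, so it approaches $ 6 $ as $ h \to 0 $.

```python
def difference_quotient(f, x, h):
    """(f(x + h) - f(x)) / h: the slope of the segment from x to x + h."""
    return (f(x + h) - f(x)) / h


def square(x):
    return x ** 2


x = 3.0
for h in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    # Algebraically this equals 6 + h, so it approaches the slope 6 as h shrinks.
    print(h, difference_quotient(square, x, h))
```

In floating-point arithmetic, making $ h $ extremely small (say $ 10^{-12} $) eventually hurts rather than helps. Subtracting two nearly equal numbers loses precision, so the limit is best read as a mathematical idealization.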

The slope of a line is constant throughout. If the line segment changes, so does the slope.

Since the line segment can be different at every point along the domain of the function, the slope must be measured at a specific point, as you will notice in this next interactive.

This slope of the function at a point is known as its derivative. It is one of the most important mathematical concepts required to fit machine learning models to data.

Derivatives: notation

Here is a list of notations for the derivative that you might find in mathematical literature, named after the mathematicians associated with them. They are ordered by their popularity in machine learning literature.

Leibniz's notation: $ \frac{df}{dx} $
Lagrange's notation: $ f'(x) $
Newton's notation: $ \dot{f} $
Euler's notation: $ D_x f(x) $

We will restrict ourselves to the Leibniz and Lagrange notations. We will use Lagrange's notation as a succinct representation when $ x $ is clear from the context, and Leibniz's notation otherwise.

What does a derivative tell you?

The magnitude of $ f'(x) $ tells us how fast the function is changing. The sign of $ f'(x) $ tells us the direction of change.

Positive $ f'(x) $ means the function $ f(x) $ is increasing at $ x $: if we slightly increase the value of $ x $, $ f(x) $ increases.
Negative $ f'(x) $ implies an inverse relationship: the function $ f(x) $ is decreasing at $ x $, so slightly increasing $ x $ decreases $ f(x) $.
If $ f'(x) = 0 $, then, to a first approximation, a small change in $ x $ produces no change in $ f(x) $. The function is effectively flat at $ x $. Such points of zero derivative are known as critical points, stable points, or stationary points of the function.
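These three cases can be checked numerically. The following minimal sketch approximates $ f'(x) $ with a central difference and treats values within a small tolerance of zero as zero. The helper names, the step size `h`, and the tolerance `tol` are hypothetical choices for illustration, not a standard API. The sketch uses $ f(x) = x^2 $ as the example.

```python
def derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def direction_of_change(f, x, tol=1e-6):
    """Classify the local behavior of f at x from the sign of f'(x)."""
    slope = derivative(f, x)
    if slope > tol:
        return "increasing"
    if slope < -tol:
        return "decreasing"
    return "stationary"


def square(x):
    return x ** 2


# Prints: -1.0 decreasing / 0.0 stationary / 1.0 increasing
for x in [-1.0, 0.0, 1.0]:
    print(x, direction_of_change(square, x))
```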

The geometry of derivatives

Derivatives give us an easy way to understand the behavior of a function intuitively, even without plotting it.

The upcoming sections demonstrate the relationship between functions and their derivatives. Here's how to read the demos.

Each demo has two plots: the function $ f(x) $ and its derivative $ f'(x) $.
The same point is highlighted on both plots. You can move this point by dragging the orange circle on the $ X $-axis to inspect the derivative more closely.
At the point of interest $ x $, we have drawn a line segment that passes through $ (x, f(x)) $ and has slope equal to $ f'(x) $. Notice that this line is always tangent to the function, if $ f'(x) $ exists.
The tangent line has an arrowhead pointing in the direction of increasing $ x $. The arrow therefore shows the direction in which $ f(x) $ changes as $ x $ increases. An arrow pointing downward means the function decreases as $ x $ increases, and an upward arrow means the function increases.
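The tangent line drawn in the demos can be written down directly. It passes through $ (x_0, f(x_0)) $ with slope $ f'(x_0) $, so it is $ y = f(x_0) + f'(x_0)(t - x_0) $. The following minimal sketch assumes the derivative is supplied as a separate function. As an example, it uses $ f(x) = x^2 $ with its derivative $ 2x $. The helper name `tangent_line` is hypothetical.

```python
def tangent_line(f, df, x0):
    """Return the tangent line to f at x0 as a function of t."""
    y0, m = f(x0), df(x0)
    return lambda t: y0 + m * (t - x0)


def square(x):
    return x ** 2


def d_square(x):
    return 2 * x


tangent = tangent_line(square, d_square, 1.0)  # y = 1 + 2(t - 1) = 2t - 1
for t in [0.0, 1.0, 2.0]:
    # The tangent touches x^2 at t = 1 and lies below it elsewhere,
    # since x^2 - (2t - 1) = (t - 1)^2 >= 0.
    print(t, square(t), tangent(t))
```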

Understanding derivative of $ x^2 $

Let's start with the example of the simple quadratic function $ f(x) = x^2 $. The derivative is $ f'(x) = 2x $. Look at the accompanying charts to understand this relationship.
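This result follows from the limit definition of the slope. We can cancel $ h $ because $ h \neq 0 $ while the limit is being taken:

$$\begin{aligned} f'(x) &= \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} \\ &= \lim_{h \to 0} \frac{2xh + h^2}{h} \\ &= \lim_{h \to 0} \left(2x + h\right) \\ &= 2x \end{aligned}$$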

By inspection, we already knew that $ f(x) $ is never negative. But now we know one more thing. The slope of the function $ f(x) $ is twice the value of $ x $, so its magnitude grows linearly as we move away from $ x = 0 $. This means the function keeps becoming steeper as we move away from $ x = 0 $.

Moreover, it doesn't matter whether we move toward the positive or the negative side of $ x = 0 $. The function increases at the same rate in both directions, because the slopes mirror each other: $ f'(-x) = -2x = -f'(x) $. This means the function must be symmetric around $ x = 0 $.

Notice also that the minimum of the function is attained where the derivative is $ 0 $. This makes sense: the function flattens out near its minimum, so its derivative must be zero at that point. But a function also flattens out near a maximum. We will see in a later example how to deal with that.

In machine learning, derivatives are mostly used in fitting models by optimizing a loss function. We will focus on this aspect of derivatives in the rest of the discussion.
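As one illustration of how a derivative guides optimization, here is a minimal gradient-descent sketch. Gradient descent is only one of many optimizers. The sketch repeatedly steps against the sign of the derivative to minimize the example loss $ L(w) = (w - 3)^2 $, whose derivative is $ 2(w - 3) $. The loss, the starting point, the learning rate, and the number of steps are all assumptions chosen for illustration.

```python
def loss(w):
    # Example loss with its minimum at w = 3 (illustrative choice).
    return (w - 3) ** 2


def d_loss(w):
    # Derivative of the loss: positive to the right of 3, negative to the left.
    return 2 * (w - 3)


def gradient_descent(df, w0, learning_rate=0.1, steps=100):
    """Move w a small step opposite to the derivative, `steps` times."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * df(w)
    return w


w_best = gradient_descent(d_loss, w0=0.0)
print(w_best, loss(w_best))
```

With these settings, each update multiplies the distance to $ 3 $ by $ 1 - 2 \times 0.1 = 0.8 $. The steps become smaller as the derivative approaches zero near the minimum.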

Here's a thought exercise: Can you identify whether $ x^2 - 1 $ and $ (x-1)^2 $ are symmetric functions? Are they both symmetric around $ x = 0 $ or around $ x = 1 $? Can you visualize their shapes without actually plotting them? Can you identify their minimum values and where they attain them?

Understanding derivative of the Sigmoid

The Sigmoid function, $ \sigma(x) = \frac{1}{1 + e^{-x}} $, has a derivative that is quite interesting and often computed in optimization routines.

$$ \sigma'(x) = \sigma(x) (1 - \sigma(x)) $$
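The closed form can be checked against a numerical estimate. The following minimal sketch assumes the standard definition $ \sigma(x) = \frac{1}{1 + e^{-x}} $ and uses a central-difference step of $ h = 10^{-5} $ (an illustrative choice). The helper names are hypothetical.

```python
import math


def sigmoid(x):
    # Naive form; math.exp(-x) overflows for very negative x (below about -709).
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    # Closed form from the text: sigma'(x) = sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)


def central_difference(f, x, h=1e-5):
    """Numerical estimate of f'(x), used only to cross-check the formula."""
    return (f(x + h) - f(x - h)) / (2 * h)


for x in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    print(x, sigmoid_derivative(x), central_difference(sigmoid, x))
```

The two columns agree closely, and the largest value, $ 0.25 $, occurs at $ x = 0 $. A practical benefit of the closed form is that it reuses $ \sigma(x) $, which an optimization routine has usually computed already.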

We already knew that $ 0 < \sigma(x) < 1 $ for all $ x \in \mathbb{R} $. From our study of limits, we also know that $ \lim_{x \to \infty} \sigma(x) = 1 $ and $ \lim_{x \to -\infty} \sigma(x) = 0 $.

From the derivative, we now know that the function grows most steeply at $ x = 0 $, where $ \sigma(0) = 0.5 $ and $ \sigma'(0) = 0.5 \times 0.5 = 0.25 $. As we move away from $ x = 0 $ on either side, its growth slows down. If $ \sigma(x) $ is close to $ 1 $, then $ 1 - \sigma(x) $ is close to $ 0 $. If $ \sigma(x) $ is close to $ 0 $, the first factor is small. Either way, the product, and hence the slope, is small. Thus, the function flattens out as $ x \to \infty $ and $ x \to -\infty $.

This is important to understand. The magnitude of the derivative tells us how fast or slow a function is growing or decreasing. And, as we saw in the previous example, the sign of the derivative indicates the direction of change of the function with respect to its input. A positive derivative means the function increases with increasing $ x $, and a negative derivative means it decreases.

We can also tell that the function is symmetric about $ x = 0 $, but the mirrored portion is flipped upside down, so the function is symmetric about the point $ (0, 0.5) $. (Think!)

In the case of the quadratic function $ f(x) = x^2 $, we observed that the zero derivative indicated the minimum of the function. In the case of the Sigmoid, zero is never actually attained, but you will note that $ \sigma'(x) \to 0 $ as $ x \to \infty $ or $ x \to -\infty $. So, again, the derivative tends to zero in flat regions of the function.

But just by checking whether the derivative is zero, we cannot tell whether the extremum reached is a minimum or a maximum. In both cases, the derivative is zero or close to zero. We will address this challenge soon.

Thought experiment: The hyperbolic tangent $ \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} $ is a related function, sometimes used as an alternative activation function to the Sigmoid. What can you say about its nature without even plotting it?

Understanding derivative of $ \log x $

The logarithm function $ f(x) = \log x $ appears very commonly in machine learning. Here, as is usual in machine learning, $ \log $ denotes the natural logarithm. It is a continuous function defined only for $ x > 0 $. Its derivative is also defined for $ x > 0 $ and is given by

$$ \frac{d}{dx} \log x = \frac{1}{x} $$

We have encountered the reciprocal function earlier in our discussion on limits. Since we are only dealing with the positive values of $ x $, we know that

$$ \lim_{x \to 0^+} \frac{1}{x} = \infty $$

We also know that

$$ \lim_{x \to \infty} \frac{1}{x} = 0 $$

This means the slope of the function grows without bound as $ x $ approaches zero from the right, and the function flattens out as $ x \to \infty $. Since $ \frac{1}{x} > 0 $ for all $ x > 0 $, the function is always increasing, just ever more slowly.
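Both limits can be checked numerically. The following minimal sketch uses Python's `math.log` (the natural logarithm) and a central-difference step of $ 10^{-6} $, chosen for illustration. It compares the estimated slope of $ \log x $ with $ \frac{1}{x} $ at a few positive points. The slope is large near zero and small for large $ x $.

```python
import math


def central_difference(f, x, h=1e-6):
    """Numerical estimate of f'(x) from points on either side of x."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Estimated slope of the natural logarithm next to the exact value 1/x.
for x in [0.01, 0.1, 1.0, 10.0, 100.0]:
    print(x, central_difference(math.log, x), 1 / x)
```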

Derivative of $ x^3 $

For $ f(x) = x^3 $, the derivative is $ f'(x) = 3x^2 $.
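This follows from the limit definition of the derivative, using the expansion $ (x+h)^3 = x^3 + 3x^2h + 3xh^2 + h^3 $:

$$\begin{aligned} f'(x) &= \lim_{h \to 0} \frac{(x+h)^3 - x^3}{h} \\ &= \lim_{h \to 0} \frac{3x^2 h + 3x h^2 + h^3}{h} \\ &= \lim_{h \to 0} \left(3x^2 + 3xh + h^2\right) \\ &= 3x^2 \end{aligned}$$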

Note that at $ x = 0 $, $f'(x) = 0 $. But wait!

As you can see from the chart, the function has no minimum or maximum, local or global, at $ x = 0 $. A point like this, where $ f'(x) = 0 $ but which is neither a minimum nor a maximum, is known as an inflection point (more precisely, a stationary inflection point).

Note that we cannot tell whether a critical point is a minimum, a maximum, or an inflection point just from the fact that $ f'(x) = 0 $ there.
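To see why $ f'(x) = 0 $ alone is not enough, compare three functions that all have zero derivative at $ x = 0 $: $ x^2 $, $ -x^2 $, and $ x^3 $. The following minimal sketch evaluates each function slightly to the left of, at, and slightly to the right of $ x = 0 $. The probe offset of $ 0.1 $ and the helper name `behavior_at_zero` are hypothetical choices made for illustration.

```python
def behavior_at_zero(f, eps=0.1):
    """Compare f just left of 0, at 0, and just right of 0."""
    left, mid, right = f(-eps), f(0.0), f(eps)
    if left > mid and right > mid:
        return "minimum"
    if left < mid and right < mid:
        return "maximum"
    return "neither"


examples = {
    "x^2": lambda x: x ** 2,    # f'(0) = 0, minimum
    "-x^2": lambda x: -x ** 2,  # f'(0) = 0, maximum (Python parses this as -(x ** 2))
    "x^3": lambda x: x ** 3,    # f'(0) = 0, neither
}
for name, f in examples.items():
    print(name, behavior_at_zero(f))
```

Sampling two nearby points is only an illustration, not a rigorous test. However, it shows that three functions with the same zero derivative at $ x = 0 $ behave completely differently there.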

Where to next?

Are all functions differentiable? Check out our comprehensive interactive article on differentiability and smoothness.

Or understand the counterpart to derivatives — integrals.

Already a calculus expert? Check out our comprehensive courses on multivariate calculus, machine learning, or deep learning.
