With a sound understanding of limits and continuity, we are ready to tackle derivatives — the rate of change of a function.
A quick refresher on the slope of a line, a basic concept in geometry, before we delve into derivatives.
Every point \( (x,y) \) on a line satisfies the equation \( y = mx + c \). Here, \( m \) is known as the slope and \( c \) is the intercept. In other words, \( y = f(x) \) for the function \( f(x) = mx + c \).
Imagine two points along the line \( y = f(x) \) — \( (x,f(x)) \) and \( (x+h, f(x+h)) \).
Note that
$$\begin{aligned} & f(x+h) - f(x) = m(x+h) + c - \left( mx + c \right) \\ \implies& f(x+h) - f(x) = mh \\ \implies& m = \frac{f(x+h) - f(x)}{h} \end{aligned}$$
This formulation is a building block for derivatives.
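As a quick check of the formula above, here is a minimal Python sketch. The values \( m = 3 \) and \( c = -2 \) and the helper name `slope_between` are illustrative assumptions, not part of the original discussion.

```python
def line(x, m=3.0, c=-2.0):
    """A straight line y = m*x + c; m and c are assumed, illustrative values."""
    return m * x + c

def slope_between(f, x, h):
    """Slope of the segment joining (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

# For a line, the ratio recovers m regardless of which x or h we pick.
slopes = [slope_between(line, x, h)
          for x in (-5.0, 0.0, 2.5)
          for h in (0.5, 1.0, 4.0)]
print(slopes)
```

For a straight line, the choice of \( x \) and \( h \) should not matter, which is what makes the slope a single number.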
Zoom in close enough and most continuous functions will look piecewise linear, as if made by joining line segments end to end. Zoom into a discontinuity, and it remains a discontinuity: a gap that only looks wider under magnification. The key is to zoom in enough that the function behaves like a line segment in the magnified viewport.
For example, zoom into the function \( f(x) = \sin x \): even a seemingly wavy function looks like it is composed of line segments if you zoom in enough.
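The zooming idea can also be explored numerically. The sketch below is only an illustration: the helper `max_gap_from_chord` and the center \( x = 1 \) are our own assumptions. It measures how far \( \sin x \) strays from the straight chord joining the two ends of a window, relative to the window size.

```python
import math

def max_gap_from_chord(f, center, half_width, samples=201):
    """Largest vertical gap between f and the straight chord joining the
    endpoints of the window [center - half_width, center + half_width]."""
    a, b = center - half_width, center + half_width
    chord_slope = (f(b) - f(a)) / (b - a)
    gap = 0.0
    for i in range(samples):
        x = a + (b - a) * i / (samples - 1)
        chord_y = f(a) + chord_slope * (x - a)
        gap = max(gap, abs(f(x) - chord_y))
    return gap

# Shrink the window around x = 1 and report the gap relative to the
# window size, which is what a magnified viewport would show.
for half_width in (1.0, 0.1, 0.01):
    relative_gap = max_gap_from_chord(math.sin, 1.0, half_width) / half_width
    print(half_width, relative_gap)
```

The relative gap should shrink as the window narrows, which is the numerical counterpart of the curve looking straighter under magnification.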
Now that we know that a function, viewed closely enough, is composed of line segments, suppose our line segment extends from \( a \) to \( b \) on the \( X\)-axis. Take two points on this segment such that \( x \in [a,b] \) and \( (x+h) \in [a,b] \). If \( m \) is the slope of the line segment, the line geometry we reviewed earlier tells us that
$$\begin{aligned} & f(x + h) = f(x) + m(x+h - x) \\ \implies& m = \frac{f(x+h) - f(x)}{h} \end{aligned}$$
Zooming in far enough for the function to behave like a line segment means that \( h \) should be minuscule.
According to the limits concept introduced earlier, we are interested in \( h \to 0 \). This means, we can compute the slope \( m_x \) of a function at a point \(x\) as follows.
$$ m_x = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} $$
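To see the limit at work, the following sketch evaluates the ratio for \( f(x) = \sin x \) at \( x = 1 \) with shrinking \( h \). The function name `difference_quotient` and the sample point are illustrative assumptions. The comparison value relies on the standard result that the derivative of \( \sin x \) is \( \cos x \), which is not derived in this article.

```python
import math

def difference_quotient(f, x, h):
    """The ratio (f(x + h) - f(x)) / h for a finite, nonzero h."""
    return (f(x + h) - f(x)) / h

x = 1.0
for h in (1.0, 0.1, 0.01, 0.001):
    print(h, difference_quotient(math.sin, x, h))

# Reference value from the known derivative of sin, for comparison.
print("cos(1) =", math.cos(x))
```

The printed ratios should settle toward the reference value as \( h \) shrinks. In floating-point arithmetic, making \( h \) extremely small eventually hurts accuracy because of rounding, so this is an approximation of the limit rather than the limit itself.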
The slope of a line is constant throughout. If the line segment changes, so does the slope.
Since the line segment at every point along the domain of the function is different, the slope must be measured at each point and generally varies from point to point.
This slope of the function is known as the derivative. It is one of the most important mathematical concepts required to optimize machine learning models on data.
Here's a list of notations that you might find in mathematical literature, named after their originators. They have been ordered according to their popularity in machine learning literature.
We will restrict usage to the Leibniz notation, \( \frac{df}{dx} \), and the Lagrange notation, \( f'(x) \). We will use the Lagrange notation as a succinct representation when \( x \) is clear from the context, and the Leibniz notation otherwise.
The magnitude of \( f'(x) \) tells us how fast the function is changing. The sign of \( f'(x) \) tells us the direction of change.
Derivatives provide us an easy way to intuitively understand function behavior, even without plotting them.
The upcoming sections will demonstrate the relationship of functions and their derivatives. Here's how to understand them.
Let's start with the example of the simple quadratic function \( f(x) = x^2 \). The derivative is \( f'(x) = 2x \). Look at the accompanying charts to understand this relationship.
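As a worked check, the derivative of \( f(x) = x^2 \) follows directly from the limit definition of the slope given earlier:

$$\begin{aligned} f'(x) &= \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} \\ &= \lim_{h \to 0} \frac{2xh + h^2}{h} \\ &= \lim_{h \to 0} \left( 2x + h \right) = 2x \end{aligned}$$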
By inspection, we already knew that \( f(x) \) is never negative. But now we know one more thing. The slope of \( f(x) \) is twice \( x \), so its magnitude grows in proportion to the distance from \( x = 0 \). This means the function keeps becoming steeper as we move away from \( x = 0 \).
Moreover, it doesn't matter whether we move toward the positive or the negative side from \( x = 0 \): the function rises equally fast in both directions. This means the function must be symmetric around \( x = 0 \).
One other thing to notice is that the minimum of the function is achieved where the derivative is \( 0 \). This makes sense: the function flattens out near its minimum, so its derivative must be zero at that point. But the function also flattens near a maximum. We will see in one of the later examples how we deal with that.
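These observations about \( f(x) = x^2 \) can be checked with a small numerical sketch. The helper `central_difference` is an assumed, approximate estimator of the derivative, not an exact computation of the limit.

```python
def f(x):
    return x ** 2

def central_difference(f, x, h=1e-5):
    """Symmetric numerical estimate of f'(x) using a small, finite h."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Tabulate x, f(x), and the estimated slope on both sides of x = 0.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, f(x), central_difference(f, x))
```

The estimated slopes should be equal in magnitude and opposite in sign at \( x \) and \( -x \), and should vanish at \( x = 0 \), where the function attains its minimum.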
In machine learning, derivatives are mostly used in fitting models by optimizing a loss function. We will focus on this aspect of derivatives in the rest of the discussion.
Here's a thought exercise: Can you identify whether \( x^2 - 1 \) and \( (x-1)^2 \) are symmetric functions? Are they both symmetric around \( x = 0 \) or around \( x = 1 \)? Can you visualize their shapes without actually plotting them? Can you identify their minimum values and where they attain them?
The derivative of the Sigmoid function \( \sigma(x) \) is quite interesting and is often computed in optimization routines.
$$ \sigma'(x) = \sigma(x) (1 - \sigma(x)) $$
We already knew that \( \sigma(x) \ge 0, \forall x \in \real \). From limits, we also knew that \( \lim_{x \to \infty} \sigma(x) = 1 \) and \( \lim_{x \to -\infty} \sigma(x) = 0 \).
From the derivative, we now know that the function grows fastest at \( x = 0 \), where \( \sigma(0) = 0.5 \). It changes very slowly as we move away from \( x = 0 \) on either side: if \( \sigma(x) \) is high, then \( 1 - \sigma(x) \) is low, and vice versa, so the product, and hence the slope, is small. Thus, the function flattens out as \( x \to \infty \) and \( x \to -\infty \).
This is important to understand. The magnitude of the derivative tells how fast or slow a function is growing or decreasing. And as we saw in the previous example, the sign of the derivative indicates the direction of change of the function with respect to its input. Positive derivative implies function increasing with increasing \( x \) and negative derivative indicates a decreasing function.
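The behavior of the Sigmoid slope can be tabulated with a short sketch. It assumes the usual logistic definition \( \sigma(x) = \frac{1}{1 + e^{-x}} \), which is not restated in this section, and the function names are illustrative.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, assumed here to be 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """Slope of the sigmoid via sigma'(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The slope peaks at x = 0 and fades toward zero in both tails.
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(x, sigmoid(x), sigmoid_derivative(x))
```

The slope is positive everywhere, largest at \( x = 0 \), and close to zero far from the origin in either direction.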
We can also identify that the function is symmetric about \( x = 0 \), but the symmetric portion is flipped. (Think!)
In the case of the quadratic function \( f(x) = x^2 \), we observed that a zero derivative indicated the minimum of the function. In the case of Sigmoid, a zero derivative is never actually attained, but you will note that \( \sigma'(x) \to 0 \) as \( x \to \infty \) or \( x \to -\infty \). So, again we note that the derivative tends to zero in flat regions.
But just by looking at the derivative, we cannot tell whether the extremum reached is a minimum or a maximum. In both cases the derivative is zero or close to zero. We will address this challenge soon.
Thought experiment: The hyperbolic tangent, \( \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \), is a related function, sometimes used as an alternative activation function to the Sigmoid function. What can you say about its nature, without even plotting the function?
The logarithm function \( f(x) = \log x \) appears very commonly in machine learning. It is a continuous function, only defined for \( x > 0 \). The derivative is also defined for \( x > 0 \). It is given by
$$ \frac{d}{dx} \log x = \frac{1}{x} $$
We have encountered the reciprocal function earlier in our discussion on limits. Since we are only dealing with the positive values of \( x \), we know that
$$ \lim_{x \to 0^+} \frac{1}{x} = \infty $$
We also know that
$$ \lim_{x \to \infty} \frac{1}{x} = 0 $$
This means the slope of the function rises rapidly as \( x \) moves closer to zero. It also means that the function flattens out as \( x \to \infty \), although it never stops increasing, because \( \frac{1}{x} \) stays positive.
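A numerical sketch can confirm the slope of the logarithm. It assumes \( \log \) denotes the natural logarithm, which is what Python's `math.log` computes, and uses an illustrative `central_difference` helper.

```python
import math

def central_difference(f, x, h=1e-6):
    """Symmetric numerical estimate of f'(x) using a small, finite h."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Compare the estimated slope of log with 1/x across several scales.
for x in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(x, central_difference(math.log, x), 1.0 / x)
```

The two columns should agree closely: large slopes near zero and slopes approaching zero for large \( x \).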
For \( f(x) = x^3 \), the derivative is \( f'(x) = 3x^2 \).
Note that at \( x = 0 \), \(f'(x) = 0 \). But wait!
As you can see from the chart, at \( x = 0 \), the function does not have a minimum or a maximum, local or global. Such points, where \( f'(x) = 0 \) but which are neither minima nor maxima, are known as inflection points.
Note that we cannot identify whether a critical point is a minimum, maximum, or an inflection just from \( f'(x) \).
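The following sketch illustrates the point for \( f(x) = x^3 \). The step `eps` is an arbitrary illustrative choice.

```python
def f(x):
    return x ** 3

def f_prime(x):
    """Derivative of x^3, as stated in the text: 3 * x^2."""
    return 3 * x ** 2

eps = 0.1  # small illustrative step on either side of x = 0

# The derivative vanishes at x = 0 ...
print(f_prime(0.0))
# ... yet the function is lower just to the left and higher just to the right.
print(f(-eps), f(0.0), f(eps))
```

Because the function is lower on one side of \( x = 0 \) and higher on the other, that point can be neither a minimum nor a maximum, even though the derivative there is zero.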