The concepts of limits, derivatives, continuity, and smoothness that we studied earlier all carry over to multivariate functions. They mean the same things; only the definitions have to be adapted.
To understand multivariate limits and derivatives, we recommend familiarity with the corresponding univariate concepts. Follow the links above to get acquainted with them first.
The limit of a multivariate function \( f(x_1,\ldots,x_n) \) at a multivariate point \( (a_1,\ldots,a_n) \) is defined as
$$ \lim_{\substack{x_1 \to a_1 \\ \vdots \\ x_n \to a_n}} f(x_1,\ldots,x_n) $$
The limit of a multivariate function may fail to exist, or the function may be discontinuous, at a point \( \va = [a_1, \ldots, a_n] \), for the same reasons that we discussed in the univariate case.
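For instance, a two-variable limit can fail to exist because the value depends on the direction of approach. A standard illustrative example (ours, not from the text above): for

$$ f(x_1, x_2) = \frac{x_1 x_2}{x_1^2 + x_2^2} $$

the limit at the origin does not exist, because approaching along the line \( x_2 = m x_1 \) gives \( f(x_1, m x_1) = \frac{m}{1 + m^2} \), a value that changes with the chosen path.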
If a function is continuous at a point, then the derivative of the multivariate function \( f'(\vx) \) means the same thing: the rate of change, or slope, of the function.
But there is an interesting twist to the story here.
For a multivariate function, the rate of change needs to be qualified with the dimension along which we wish to measure it. For a function \( f(x_1,\ldots,x_n) \), are we measuring the change along \( x_1 \), along \( x_2 \), or along \( x_n \)? This is crucial because moving along different dimensions may have different effects on the function. For example, \( f(x_1, x_2) = x_1 + 10 x_2 \) changes ten times faster along \( x_2 \) than along \( x_1 \).
To measure the rate of change with respect to the \( k \)-th variable \( x_k \), we hold all the other variables fixed and nudge only \( x_k \). So, in effect, we are computing only a part of the derivative of the function: the part that is affected by \( x_k \) alone.
Naturally, such a derivative is known as the partial derivative of \( f(x_1,\ldots,x_n) \) with respect to \( x_k \). It is denoted \( \frac{\partial f}{\partial x_k} \) and is computed exactly the same way we would compute the derivative of a univariate function.
$$ \frac{\partial f(\vx)}{\partial x_k} = \lim_{h \to 0} \frac{f(x_1,\ldots,x_k + h,\ldots, x_n) - f(x_1,\ldots,x_k, \ldots, x_n)}{h} $$
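As a quick numerical illustration, here is a minimal Python sketch of the difference quotient above; the example function `f`, the step size `h`, and the helper name `partial_derivative` are our own illustrative choices, not part of the text.

```python
def partial_derivative(f, x, k, h=1e-6):
    """Approximate the partial derivative of f with respect to x_k at the
    point x, using the difference quotient (f(x + h*e_k) - f(x)) / h."""
    x_shifted = list(x)
    x_shifted[k] += h
    return (f(x_shifted) - f(x)) / h

# Illustrative function: f(x1, x2) = x1^2 * x2
f = lambda x: x[0] ** 2 * x[1]

print(partial_derivative(f, [1.0, 2.0], k=0))  # ~ 2 * x1 * x2 = 4
print(partial_derivative(f, [1.0, 2.0], k=1))  # ~ x1^2 = 1
```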
If there are \( n \) input variables to a function \( f(x_1,\ldots,x_n) \), then one can try to compute \( n \) partial derivatives of the function \(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \); one with respect to each input variable. This collection of partial derivatives, a generalization of the derivative to the multivariate case, is known as the gradient of the function.
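As a small worked example of our own, take \( f(x_1, x_2) = x_1^2 x_2 + \sin(x_2) \). Treating the other variable as a constant in each case,

$$ \frac{\partial f}{\partial x_1} = 2 x_1 x_2, \qquad \frac{\partial f}{\partial x_2} = x_1^2 + \cos(x_2), $$

so the gradient collects these two partial derivatives, \( \left[ 2 x_1 x_2, \; x_1^2 + \cos(x_2) \right] \).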
And if one can calculate \( n \) partial derivatives of a multivariate function, then one can also calculate partial derivatives of those partial derivatives. So, the partial derivatives of a first-order partial derivative \( \frac{\partial f}{\partial x_k} \) lead to \( n \) second-order partial derivatives
$$ \frac{\partial^2 f}{\partial x_j \partial x_k} = \frac{\partial}{\partial x_j} \left( \frac{\partial f}{\partial x_k} \right), \quad \forall j \in \{1,\ldots,n\} $$
Note that there are \( n \) first-order partial derivatives. Thus, there will be \( n^2 \) second-order partial derivatives.
Are second-order partial derivatives interchangeable in order? That is, if we take the derivative first with respect to \( x_j \) and then with respect to \( x_k \), do we get the same result as in the opposite order?
The Schwarz theorem (also known as Clairaut's theorem or Young's theorem) states that if a function has continuous second-order partial derivatives at a point \( (a_1,\ldots,a_n) \), then the partial derivatives are interchangeable there. So, only under the conditions of the Schwarz theorem
$$ \frac{\partial^2 f}{\partial x_k \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_k} $$
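Continuing the small worked example \( f(x_1, x_2) = x_1^2 x_2 + \sin(x_2) \) from above, whose second-order partial derivatives are continuous everywhere,

$$ \frac{\partial^2 f}{\partial x_2 \partial x_1} = \frac{\partial}{\partial x_2} \left( 2 x_1 x_2 \right) = 2 x_1, \qquad \frac{\partial^2 f}{\partial x_1 \partial x_2} = \frac{\partial}{\partial x_1} \left( x_1^2 + \cos(x_2) \right) = 2 x_1, $$

and the two mixed partial derivatives indeed agree.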
Explicitly writing each variable can become quite cumbersome.
An astute reader will notice that we can use the vector notation introduced earlier to arrive at a concise notation for multivariate functions, limits, and partial derivatives.
A multivariate function of \( n \) input variables \( f(x_1,\ldots,x_n) \) can be written in vector form by encapsulating the input variables into a vector \( \vx = [x_1,\ldots,x_n] \). So, \( f(x_1,\ldots,x_n) = f(\vx) \).
The limit of a multivariate function \( f(\vx) \) can then be defined as
$$ \lim_{\vx \to \va} f(\vx) $$
Note that \( \vx \in \real^n \) is a vector, so \( \va \in \real^n \) is also a vector of the same dimensionality, \( \va = [a_1,\ldots,a_n] \).
The set of partial derivatives, the gradient, can also be denoted as a vector of partial derivatives, one with respect to each input variable.
$$ \nabla f(\vx) = \begin{bmatrix} \frac{\partial f(\vx)}{\partial x_1}, \ldots, \frac{\partial f(\vx)}{\partial x_n} \end{bmatrix} $$
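Stacking the same difference quotients into a list gives a numerical sketch of the gradient vector; as before, the function `f`, the step size `h`, and the helper name `gradient` are illustrative assumptions of ours.

```python
def gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x: a vector containing one
    forward-difference partial derivative per input variable."""
    grad = []
    for k in range(len(x)):
        x_shifted = list(x)
        x_shifted[k] += h
        grad.append((f(x_shifted) - f(x)) / h)
    return grad

# Illustrative function: f(x1, x2) = x1^2 * x2, with gradient [2*x1*x2, x1^2]
f = lambda x: x[0] ** 2 * x[1]
print(gradient(f, [1.0, 2.0]))  # ~ [4.0, 1.0]
```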
The second-order partial derivatives, all \( n^2 \) of them, can be denoted by a matrix \( \mathbf{H} \)
$$ \mathbf{H} = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \ldots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \ldots & \frac{\partial^2 f}{\partial x_n^2} \\ \end{bmatrix} $$
This matrix of second-order partial derivatives of a function is called the Hessian of the function. The \(i,j\)-th element of the Hessian is defined as
$$ \mathbf{H}_{i,j} = \frac{\partial^2 f}{\partial x_i \partial x_j} $$
Under the conditions of the Schwarz theorem described earlier, \( \mathbf{H}_{i,j} = \mathbf{H}_{j,i} \). This means that if a function has continuous second-order partial derivatives, then its Hessian is symmetric. But functions whose second-order partial derivatives are not continuous can have a non-symmetric Hessian. So beware!
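To see the Hessian and its symmetry numerically, here is a minimal sketch; the central-difference formula, the step size, and the example function are our own illustrative choices and not part of the text.

```python
import math

def hessian(f, x, h=1e-4):
    """Approximate the Hessian of f at x by applying a central-difference
    formula to every pair of input variables (i, j)."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Four copies of x, shifted by +/- h along dimensions i and j.
            xpp = list(x); xpp[i] += h; xpp[j] += h
            xpm = list(x); xpm[i] += h; xpm[j] -= h
            xmp = list(x); xmp[i] -= h; xmp[j] += h
            xmm = list(x); xmm[i] -= h; xmm[j] -= h
            H[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * h * h)
    return H

# Illustrative function: f(x1, x2) = x1^2 * x2 + sin(x2)
f = lambda x: x[0] ** 2 * x[1] + math.sin(x[1])

# Its Hessian at (1, 2) is [[2*x2, 2*x1], [2*x1, -sin(x2)]] ~ [[4, 2], [2, -0.909]],
# and the off-diagonal entries agree, as the Schwarz theorem predicts here.
for row in hessian(f, [1.0, 2.0]):
    print(row)
```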
Now that you know gradients and Hessians, it is time to put them to use for identifying minima, maxima, and saddle points.