Multivariate functions

To understand multivariate calculus, first we need to understand multivariate functions.

In this article, we will work through several multivariate functions and their visualizations to build intuition about their nature. These functions will be used in our later articles on gradients and Hessians. So, familiarity with them is essential to understanding more advanced material.

Prerequisites

To understand multivariate functions, we recommend familiarity with the concepts covered in our prerequisite articles.

Follow those links to first get acquainted with the corresponding concepts.

Multivariate functions

Functions that take a scalar, a single value, as an input are called univariate functions.
For example, a function mapping a real number to another real number, \( f: \real \rightarrow \real \) is a univariate function.

A function that generates an output based on multiple input values is known as a multivariate function. For example, \( f: \real^n \rightarrow \real \) takes \( n \) real inputs and generates a real output.
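To make the distinction concrete, here is a minimal Python sketch (the names `f_uni` and `f_multi` are ours, not standard):

```python
# A univariate function: one real input, one real output.
def f_uni(x):
    return x ** 2

# A multivariate function f: R^n -> R: any number of real inputs,
# a single real output.
def f_multi(*xs):
    return sum(x ** 2 for x in xs)

print(f_uni(3.0))               # 9.0
print(f_multi(1.0, 2.0))        # 5.0 (n = 2)
print(f_multi(1.0, 2.0, 2.0))   # 9.0 (n = 3)
```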

The multivariate bowl

We already saw the univariate bowl \( f(x) = x^2 \). Time to understand its multivariate analogue in two dimensions, the multivariate bowl \( f(x,y) = x^2 + y^2 \).

The multivariate bowl function resembles many loss functions in machine learning that involve quadratic terms. Understanding its topology is crucial for building intuition about any optimization performed on such functions.

For this function, there is a single minimum, the global minimum, at \( x = 0 \) and \( y = 0 \), as you shall note in the next demo.
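As a quick sanity check, the bowl and its minimum can be evaluated in a few lines of Python (the function name `bowl` is ours):

```python
def bowl(x, y):
    # The multivariate bowl: f(x, y) = x^2 + y^2.
    return x ** 2 + y ** 2

# The global minimum sits at x = 0, y = 0:
print(bowl(0.0, 0.0))  # 0.0

# The function increases in every direction away from the origin:
print(bowl(1.0, 0.0), bowl(0.0, 1.0), bowl(1.0, 1.0))  # 1.0 1.0 2.0

# Any slice with one variable held constant is still a univariate
# parabola, e.g. f(x, 1) = x^2 + 1:
print(bowl(2.0, 1.0))  # 5.0
```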

Multivariate bowl: demo

In this interactive demo, we have plotted the multivariate bowl in three panels.

The larger panel shows the contour plot of the function: the color band indicates the value of \( f(x,y) \). The variable \( x \) is plotted on the horizontal X-axis, and the variable \( y \) on the vertical Y-axis.

The remaining two panels show a slice of the function obtained by fixing the value along either axis. This amounts to looking at the function from the side. For example, \( f(x,y) \) with \( y \) held constant means the spectator is standing below the chart, analyzing the function along a particular slice of the Y-axis.

The slice locations along both axes can be changed by moving the interactive orange circle on the contour plot. The slices can be imagined as vertical cuts made into the function along the two dotted lines; one along each axis. The location of the orange circle is also highlighted on the slice plots as a red dot.

For this particular chart, note that the function achieves a minimum at \( x = 0, y = 0 \). The function increases in value along all other directions. By looking at the bands on the contour plot, note that the increase in the function is slower near the center, and gets quite steep as you move away from \( x = 0, y = 0 \). This can also be verified from the slice plots.

Finally, note that no matter where you move the slices, the nature of the function remains the same: a univariate bowl.

Rosenbrock's function

Of course, multivariate functions are seldom symmetric.

Let's look at a popular asymmetric function, the Rosenbrock function.

$$ f(x,y) = (a-x)^2 + b(y - x^2)^2 $$

In our implementation, we have assigned \( a = 1 \) and \( b = 100 \), a very common setting for this function.
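As a quick check of these settings, here is a minimal Python sketch (the name `rosenbrock` is ours):

```python
def rosenbrock(x, y, a=1.0, b=100.0):
    # f(x, y) = (a - x)^2 + b * (y - x^2)^2
    return (a - x) ** 2 + b * (y - x ** 2) ** 2

# The global minimum is at (a, a^2) = (1, 1) for a = 1, where f = 0:
print(rosenbrock(1.0, 1.0))  # 0.0

# On the parabolic valley floor (y = x^2), only the (a - x)^2 term remains:
print(rosenbrock(0.0, 0.0))  # 1.0

# Just off the valley floor, the b-weighted term dominates:
print(rosenbrock(0.0, 0.5))  # 26.0
```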

The Rosenbrock function is a popular test case for understanding the performance of optimization methods.

Rosenbrock's function: demo

Check out this interactive demo to build intuition about the Rosenbrock function.

Note the parabolic valley, the white band of low values in the center. The global minimum, which occurs at \( x = a, y = a^2 \), lies inside this valley.

The contour plot shows that the rate of change of the function is very different along the two dimensions. Moving towards a lower area in one dimension does not imply actual progress towards the global minimum, since the function behaves so differently along the other dimension.

Himmelblau's function

So far, our functions have had only a single minimum. Such functions, albeit easy to understand, are rare in machine learning.

Let's understand Himmelblau's function.

$$ f(x,y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2 $$

Himmelblau's function: demo

Try out the next interactive demo to understand the topology of Himmelblau's function.

From the contour plot, note that the function has four local minima and one local maximum. The local minima, where \( f(x,y) = 0 \), occur at the following points (truncated in precision):

  • \( x = 3.0, y = 2.0 \)
  • \( x = -2.80, y = 3.13 \)
  • \( x = -3.78, y = -3.28 \)
  • \( x = 3.58, y = -1.85 \)

The local maximum, \( f(x,y) = 181.617 \), occurs at \( x = -0.271 \) and \( y = -0.923 \). This is somewhere in the center of the four local minima. Note that the function keeps increasing along the edges of the graph, so this is only a local maximum.

Notice the small bump, the local maximum, on both the slice plots. Also notice the small puddles on both slice plots. Notice how the two puddles flatten out for \( x = 3 \) on the slice \( f(3,y) \), as that slice ignores the local maximum.
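These critical points can be checked numerically with a short Python sketch (the name `himmelblau` is ours; since the listed coordinates are truncated, the values at the last three minima are merely close to zero, not exactly zero):

```python
def himmelblau(x, y):
    # f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2
    return (x ** 2 + y - 11) ** 2 + (x + y ** 2 - 7) ** 2

# The first minimum is exact even with truncated coordinates:
print(himmelblau(3.0, 2.0))  # 0.0

# The other three are near zero, limited only by the truncation:
for x, y in [(-2.80, 3.13), (-3.78, -3.28), (3.58, -1.85)]:
    assert himmelblau(x, y) < 0.01

# The local maximum between the minima:
print(himmelblau(-0.271, -0.923))  # close to 181.617
```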

A twisted function: The saddle

In univariate functions, there are only three kinds of critical points: minima, maxima, and inflection points.

In multivariate functions, there is an interplay between different dimensions, resulting in another kind of critical point: the saddle point. Consider the next function, which we call the Saddle function:

$$ f(x,y) = 3x^2y + y^3 - 3x^2 - 3y^2 + 2 $$

Check out what happens at the following points.

  • \( x = 0, y = 0 \)
  • \( x = 0, y = 2 \)
  • \( x = -1, y = 1 \)
  • \( x = 1, y = 1 \)

The first two are a local maximum and a local minimum, respectively. The last two are saddle points.

In the next demo, move the orange circle to the saddle points to observe the function behavior around those points.

Saddle function: demo

The slice \( f(x,1) \) is flat; a constant. In fact, \(f(x,1) = 0, \forall x \in \real \).

The slice \( f(-1,y) \) is more interesting, however. It is similar to the cube function \( f(x) = x^3 \) that we introduced earlier for inflection points. Indeed, there is an inflection point on the slice \( f(-1,y) \) at \( y = 1 \).

Similarly, there is another inflection point on the slice \( f(1,y) \), at \( y = 1 \).
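Both observations about the slices are easy to verify numerically; here is a minimal Python sketch (the name `saddle` is ours):

```python
def saddle(x, y):
    # f(x, y) = 3 x^2 y + y^3 - 3 x^2 - 3 y^2 + 2
    return 3 * x ** 2 * y + y ** 3 - 3 * x ** 2 - 3 * y ** 2 + 2

# The slice f(x, 1) is identically zero:
# 3 x^2 + 1 - 3 x^2 - 3 + 2 = 0 for every x.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    assert saddle(x, 1.0) == 0.0

# The slice f(-1, y) reduces to (y - 1)^3, a cubic with an
# inflection point at y = 1:
for y in (-1.0, 0.0, 1.0, 2.0, 3.0):
    assert saddle(-1.0, y) == (y - 1.0) ** 3

# Values at the four critical points:
# local max, local min, and the two saddle points.
print(saddle(0, 0), saddle(0, 2), saddle(-1, 1), saddle(1, 1))  # 2 -2 0 0
```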

Just like in the case of univariate functions, distinguishing between the various critical points (maxima, minima, inflection, and saddle points) relies on the use of second-order derivatives.

We will present this shortly, after an introduction to multivariate calculus.

Where to next?

This was a quick article on multivariate functions, covering just the essential concepts needed to understand calculus. Explore functions in more depth in our comprehensive article on functions, or choose another topic from mathematical foundations.

Feeling satisfied with functions? Move on to multivariate limits and derivatives.

Already a calculus expert? Check out comprehensive courses on machine learning or deep learning.
