Interactive tutorial to understand transformations by matrices

This learning module has many interactive demos. It is easier to work with them on a larger screen. Bookmark and revisit if you are currently on a small screen device.

\(\DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator*{\asterisk}{\ast} \newcommand{\sup}{\text{sup}} \newcommand{\inf}{\text{inf}} \newcommand{\min}{\text{min}\;} \newcommand{\max}{\text{max}\;} \newcommand{\maxunder}[1]{\underset{#1}{\max}} \newcommand{\minunder}[1]{\underset{#1}{\min}} \newcommand{\real}{\mathbb{R}} \newcommand{\natural}{\mathbb{N}} \newcommand{\integer}{\mathbb{Z}} \newcommand{\rational}{\mathbb{Q}} \newcommand{\irrational}{\mathbb{I}} \newcommand{\complex}{\mathbb{C}} \newcommand{\cardinality}[1]{|#1|} \newcommand{\vec}[1]{\mathbf{#1}} \newcommand{\mat}[1]{\mathbf{#1}} \newcommand{\star}[1]{#1^*} \newcommand{\inv}[1]{#1^{-1}} \newcommand{\indicator}[1]{\mathcal{I}(#1)} \renewcommand{\BigO}[1]{\mathcal{O}(#1)} \renewcommand{\BigOsymbol}{\mathcal{O}} \renewcommand{\smallo}[1]{\mathcal{o}(#1)} \renewcommand{\smallosymbol}[1]{\mathcal{o}} \newcommand{\set}[1]{\mathbb{#1}} \newcommand{\complement}[1]{#1^c} \newcommand{\powerset}[1]{\mathcal{P}(#1)} \newcommand{\setdiff}{\setminus} \newcommand{\setsymmdiff}{\oplus} \newcommand{\dash}[1]{#1^{'}} \newcommand{\permutation}[2]{{}_{#1} \mathrm{ P }_{#2}} \newcommand{\combination}[2]{{}_{#1} \mathrm{ C }_{#2}} \newcommand{\prob}[1]{P(#1)} \newcommand{\pmf}[1]{P(#1)} \newcommand{\pdf}[1]{p(#1)} \newcommand{\cdf}[1]{F(#1)} \newcommand{\expect}[2]{E_{#1}\left[#2\right]} \newcommand{\entropy}[1]{\mathcal{H}\left[#1\right]} \newcommand{\expe}[1]{\mathrm{e}^{#1}} \newcommand{\textexp}[1]{\text{exp}\left(#1\right)} \def\independent{\perp\!\!\!\perp} \def\notindependent{\not\!\independent} \newcommand{\yhat}{\hat{y}} \newcommand{\vs}{\vec{s}} \newcommand{\vt}{\vec{t}} \newcommand{\vu}{\vec{u}} \newcommand{\vv}{\vec{v}} \newcommand{\vw}{\vec{w}} \newcommand{\vx}{\vec{x}} \newcommand{\vy}{\vec{y}} \newcommand{\vz}{\vec{z}} \newcommand{\va}{\vec{a}} \newcommand{\vb}{\vec{b}} \newcommand{\vc}{\vec{c}} \newcommand{\vd}{\vec{d}} \newcommand{\ve}{\vec{e}} 
\newcommand{\vg}{\vec{g}} \newcommand{\vh}{\vec{h}} \newcommand{\vi}{\vec{i}} \newcommand{\vk}{\vec{k}} \newcommand{\vo}{\vec{o}} \newcommand{\vp}{\vec{p}} \newcommand{\vq}{\vec{q}} \newcommand{\vr}{\vec{r}} \newcommand{\vs}{\vec{s}} \newcommand{\vmu}{\vec{\mu}} \newcommand{\vsigma}{\vec{\sigma}} \newcommand{\vphi}{\vec{\phi}} \newcommand{\vtau}{\vec{\tau}} \newcommand{\vtheta}{\vec{\theta}} \newcommand{\mA}{\mat{A}} \newcommand{\mB}{\mat{B}} \newcommand{\mC}{\mat{C}} \newcommand{\mD}{\mat{D}} \newcommand{\mE}{\mat{E}} \newcommand{\mH}{\mat{H}} \newcommand{\mK}{\mat{K}} \newcommand{\mP}{\mat{P}} \newcommand{\mQ}{\mat{Q}} \newcommand{\mR}{\mat{R}} \newcommand{\mS}{\mat{S}} \newcommand{\mU}{\mat{U}} \newcommand{\mV}{\mat{V}} \newcommand{\mW}{\mat{W}} \newcommand{\mX}{\mat{X}} \newcommand{\mY}{\mat{Y}} \newcommand{\mZ}{\mat{Z}} \newcommand{\mI}{\mat{I}} \newcommand{\mLambda}{\mat{\Lambda}} \newcommand{\mSigma}{\mat{\Sigma}} \newcommand{\mTheta}{\mat{\theta}} \newcommand{\setsymb}[1]{#1} \newcommand{\sA}{\setsymb{A}} \newcommand{\sB}{\setsymb{B}} \newcommand{\sC}{\setsymb{C}} \newcommand{\sO}{\setsymb{O}} \newcommand{\sP}{\setsymb{P}} \newcommand{\sQ}{\setsymb{Q}} \newcommand{\sH}{\setsymb{H}} \newcommand{\sX}{\setsymb{X}} \newcommand{\sY}{\setsymb{Y}} \newcommand{\norm}[2]{||{#1}||_{#2}} \newcommand{\infnorm}[1]{\norm{#1}{\infty}} \newcommand{\fillinblank}{\text{ }\underline{\text{ ? 
}}\text{ }} \newcommand{\lbrace}{\left\{} \newcommand{\rbrace}{\right\}} \newcommand{\set}[1]{\lbrace #1 \rbrace} \newcommand{\seq}[1]{\left( #1 \right)} \newcommand{\ndim}{N} \newcommand{\ndimsmall}{n} \newcommand{\dataset}{\mathbb{D}} \newcommand{\ndata}{D} \newcommand{\ndatasmall}{d} \newcommand{\labeledset}{\mathbb{L}} \newcommand{\nlabeled}{L} \newcommand{\nlabeledsmall}{l} \newcommand{\unlabeledset}{\mathbb{U}} \newcommand{\nunlabeled}{U} \newcommand{\nunlabeledsmall}{u} \newcommand{\nclass}{M} \newcommand{\nclasssmall}{m} \newcommand{\loss}{\mathcal{L}} \newcommand{\sign}{\text{sign}} \newcommand{\Gauss}{\mathcal{N}} \newcommand{\hadamard}{\circ} \newcommand{\doh}[2]{\frac{\partial #1}{\partial #2}} \newcommand{\dox}[1]{\doh{#1}{x}} \newcommand{\doy}[1]{\doh{#1}{y}} \newcommand{\doxx}[1]{\doh{#1}{x^2}} \newcommand{\doyy}[1]{\doh{#1}{y^2}} \newcommand{\doxy}[1]{\frac{\partial #1}{\partial x \partial y}} \newcommand{\doyx}[1]{\frac{\partial #1}{\partial y \partial x}} \newcommand{\qed}{\tag*{$\blacksquare$}}\)


One way to look at our system of linear equations $ \mA \vx = \vy $ is to think of the matrix $ \mA $ as a transformation that acts on the vector $ \vx $ and produces another vector $ \vy $.

This is akin to vector scaling and rotation that we presented earlier.

Let's build some intuition about a matrix as a transformation on a vector. Then, let us look at these results from a geometric perspective to understand the rotating, scaling, and flipping effects of a matrix.

It will be crucial for understanding some complex matrix transformations and operations that we present later.

Prerequisites

To understand the transformational natures of matrices, we recommend familiarity with the concepts in

Follow the above links to first get acquainted with the corresponding concepts.

The identity matrix transform

Let us start with the identity matrix. We are interested in the product $ \mI \vx = \vy $.

Writing it out in full for, say, two dimensions helps to see what really happens.

$$ \begin{aligned} & \mI \vx = \vy \\ & \implies \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \\ & \implies \begin{bmatrix} 1x_1 + 0x_2 \\ 0x_1 + 1x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \\ & \implies \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \\ & \implies x_1 = y_1 \text{ and } x_2 = y_2 \\ & \implies \vx = \vy
\end{aligned} $$

This essentially means that multiplying by an identity matrix preserves the original vector.

Scaled identity matrix

What about multiplication by a scaled identity matrix, say $ \alpha \mI $?

It is easy to extend the analysis above to note that $ \vy = \alpha \vx $.
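As a quick numerical check of the two results above, here is a minimal NumPy sketch. The vector `x` and the factor `alpha` are arbitrary values chosen only for illustration.

```python
import numpy as np

# Arbitrary example vector and scaling factor (illustrative values only).
x = np.array([3.0, -2.0])
alpha = 2.5

I = np.eye(2)                  # 2 x 2 identity matrix

y_identity = I @ x             # the identity matrix leaves x unchanged
y_scaled = (alpha * I) @ x     # the scaled identity multiplies every element by alpha

print(np.allclose(y_identity, x))          # identity preserves the vector
print(np.allclose(y_scaled, alpha * x))    # scaled identity gives alpha * x
```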

Diagonal matrix transform on a vector

What about multiplying a general vector $ \vx \in \real^n $ by a general diagonal matrix, say $ \text{diag}(\alpha_1, \ldots, \alpha_n) $?

Again, it is easy to work out that each element of the resulting vector is scaled by the corresponding diagonal entry, so that $ \vy = [y_1, \ldots, y_n] $ with $ y_i = \alpha_i x_i, \forall i \in \{1, \ldots, n \} $.
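The element-wise rule $ y_i = \alpha_i x_i $ can be verified with a short NumPy sketch. The values in `alphas` and `x` are made up for the example.

```python
import numpy as np

# Illustrative diagonal entries alpha_1..alpha_n and input vector x.
alphas = np.array([2.0, -1.0, 0.5])
x = np.array([1.0, 4.0, -6.0])

D = np.diag(alphas)            # diagonal matrix diag(alpha_1, ..., alpha_n)

y = D @ x                      # matrix-vector product
y_elementwise = alphas * x     # y_i = alpha_i * x_i, computed directly

print(np.allclose(y, y_elementwise))   # both routes give the same vector
```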

Check out the next interactive demo to understand the effect of a diagonal matrix on an input vector. In this demo, we have constrained the matrix to be a diagonal matrix. Check out what happens when all diagonal elements of the matrix are the same and when they are distinct. Check whether you can scale and flip the input vector, and whether you can move the output anywhere in the space, just by multiplying with a diagonal matrix.

Drag the circle to change the matrix row vectors

Some key observations to check out.

When both diagonal elements are exactly 1, then $ \vy = \vx $.
When the diagonal elements are the same, then $ \vy = \alpha \vx $, where $ \alpha $ is the value of the diagonal elements.
When one diagonal element is larger than the other, then $ \vy $ is pulled towards the axis of the larger element (in the positive quadrant).

So, the moral of the story is that multiplication with a diagonal matrix transforms the input vector. This may seem like a trivial transform, but it is a handy tool in implementing computationally efficient code. You would be surprised by the number of for-loops that can be avoided by simple diagonal matrix multiplication.
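To illustrate how a diagonal matrix can replace explicit loops, the sketch below rescales each column of a small data matrix in three ways. The matrix `X` (5 hypothetical samples with 3 features) and the factors in `scales` are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))              # hypothetical data: 5 samples, 3 features
scales = np.array([10.0, 0.1, -1.0])     # one illustrative scale factor per feature

# Explicit double loop: scale column j of every row by scales[j].
X_loop = np.empty_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        X_loop[i, j] = X[i, j] * scales[j]

# The same operation as a single product with a diagonal matrix.
X_diag = X @ np.diag(scales)

# NumPy broadcasting gives the same result without building the diagonal matrix.
X_broadcast = X * scales

print(np.allclose(X_loop, X_diag), np.allclose(X_loop, X_broadcast))
```

For a large number of features, the broadcasting form is usually preferable in practice because it avoids storing an $ n \times n $ matrix that is mostly zeros.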

Matrix as a transformation on space

Instead of checking the impact of a matrix on a single vector, wouldn't it be cool to check out the overall impact of a matrix on an entire collection of vectors, rather, the entire space?

This is what we do in the upcoming demos. We have represented the space with dots lined along concentric circles. Each dot represents the end point of a vector in that space.

The size of the dot is proportional to its distance from the center. Thus, the size of the dot is indicative of the magnitude of the vector it represents. (You know why!)

The color of the dot changes based on the angle the vector makes with the $X$-axis. So, all dots with the same color represent vectors that are at the same angle with the $X$-axis.

We start with demonstrating the effect of a diagonal matrix. You can modify a particular diagonal element of the matrix by dragging the corresponding axis vector. You will note the following:

Changing the diagonal elements stretches or shrinks the space along the corresponding axis. There is no effect on the spacing of the dots along the other axis.
As long as the diagonal elements stay positive, the relative ordering of points remains the same. Straight lines of dots remain straight, as expected of a linear transform.
There is absolutely no rotation.
Negating a diagonal element flips the space along the corresponding axis.
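The observations above can be reproduced without the demo. The following sketch builds dots on concentric circles, roughly as the demo is described, and applies two diagonal matrices to all of them at once. The helper `circle_points`, the radii, and the matrices are assumptions made for illustration and are not taken from the demo's own code.

```python
import numpy as np

def circle_points(radii, n_angles):
    """Return points on concentric circles as a (2, N) array of column vectors."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    pts = [[r * np.cos(t), r * np.sin(t)] for r in radii for t in angles]
    return np.array(pts).T

points = circle_points(radii=[1.0, 2.0], n_angles=8)   # shape (2, 16)

# Stretch the space along the X-axis only.
D = np.diag([3.0, 1.0])
stretched = D @ points

# Negate the second diagonal element: the space flips along the Y-axis.
F = np.diag([1.0, -1.0])
flipped = F @ points

print(np.allclose(stretched[0], 3.0 * points[0]))   # x-coordinates are tripled
print(np.allclose(stretched[1], points[1]))         # y-coordinates are untouched
print(np.allclose(flipped[1], -points[1]))          # y-coordinates change sign
```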

Diagonal matrix transform on space demo

Drag the circle to change the matrix row vectors

Orthogonal matrix transformation on space

Now that we know that diagonal matrices stretch and shrink the space, let's try to understand the impact of orthogonal matrices on the space.

Many of the important matrix factorizations and decompositions deal with orthogonal matrices. This demo will help you understand the implications of orthogonal transformations.

In the next demo, we will constrain the matrix to be an orthogonal matrix. That means that the row vectors will be constrained to be orthonormal. Each of them is constrained to be a unit vector and the angle between them is constrained to be 90 degrees.

To enable this demo, you can drag only one vector in the demo. The other one adjusts itself to remain orthonormal to the first one.

Check out how the orthogonal matrix in this demo only rotates the input space, and by the same angle through which the matrix rows are rotated. (In general, an orthogonal matrix can also reflect the space; this happens when its determinant is $ -1 $.) Also note that there is absolutely no stretching or shrinking due to an orthogonal matrix.
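A small numerical sketch of these properties, assuming a standard 2D rotation matrix as the orthogonal matrix. The angle of 30 degrees and the vector `x` are arbitrary choices.

```python
import numpy as np

theta = np.deg2rad(30.0)       # illustrative rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([2.0, 1.0])       # arbitrary input vector
y = Q @ x

# Rows are orthonormal exactly when Q Q^T is the identity.
rows_orthonormal = np.allclose(Q @ Q.T, np.eye(2))

# No stretching or shrinking: the length of the vector is unchanged.
length_preserved = np.isclose(np.linalg.norm(y), np.linalg.norm(x))

# The vector has turned by the rotation angle (in degrees).
angle = np.degrees(np.arctan2(y[1], y[0]) - np.arctan2(x[1], x[0]))

print(rows_orthonormal, length_preserved, angle)
```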

Orthogonal matrix transform on space demo

Drag the circle to change the matrix row vectors

Upper triangular matrix transform on space

Upper triangular matrices are zero below the main diagonal.

In the next demo, spin the row representing the $X$-axis (row 1 in the demo) a full 360 degrees. Observe how the points along the $Y$-axis tilt as that row moves, while the points on the $X$-axis never leave the $X$-axis. This is because the second row of an upper triangular matrix always lies along the $Y$-axis, so the $y$-coordinate of the output depends only on the $y$-coordinate of the input. The only change along the $Y$-axis is stretching, shrinking, or flipping, if you modify the second row.
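The behavior of the two axes under an upper triangular matrix can be checked with a minimal sketch. The matrix `U` and the two test points are illustrative values only.

```python
import numpy as np

# Illustrative upper triangular matrix: zero below the main diagonal.
U = np.array([[1.0, 0.5],
              [0.0, 2.0]])

on_x_axis = np.array([3.0, 0.0])   # a point on the X-axis
on_y_axis = np.array([0.0, 3.0])   # a point on the Y-axis

image_x = U @ on_x_axis   # second coordinate stays 0: still on the X-axis
image_y = U @ on_y_axis   # first coordinate becomes nonzero: tilted off the Y-axis

print(image_x, image_y)
```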

Upper triangular matrix transform on space demo

Drag the circle to change the matrix row vectors

Lower triangular matrix transform on space

Lower triangular matrices are zero above the main diagonal.

I am sure you know what to expect in this case.

In the next demo, spin the row representing the $Y$-axis (row 2 in the demo) a full 360 degrees. Observe how the points along the $X$-axis tilt as that row moves, due to multiplication by a lower triangular matrix.

As expected, the points on the $Y$-axis never leave the $Y$-axis. The reason is the same as before: the first row of a lower triangular matrix always lies along the $X$-axis, so the $x$-coordinate of the output depends only on the $x$-coordinate of the input. The only change along the $X$-axis is stretching, shrinking, or flipping, if you modify the first row.
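The mirrored behavior of a lower triangular matrix can be checked the same way. Again, the matrix `L` and the test points are illustrative values.

```python
import numpy as np

# Illustrative lower triangular matrix: zero above the main diagonal.
L = np.array([[1.0, 0.0],
              [0.5, 2.0]])

on_x_axis = np.array([3.0, 0.0])   # a point on the X-axis
on_y_axis = np.array([0.0, 3.0])   # a point on the Y-axis

image_x = L @ on_x_axis   # second coordinate becomes nonzero: tilted off the X-axis
image_y = L @ on_y_axis   # first coordinate stays 0: still on the Y-axis

print(image_x, image_y)
```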

Lower triangular matrix transform on space demo

Drag the circle to change the matrix row vectors

Symmetric matrix transform on space

Symmetric matrices are a very important family of matrices that appear very often in machine learning literature.

We know what influence diagonal and orthogonal matrices have on an input space. A symmetric matrix could sometimes be diagonal or orthogonal, so we already know what to expect in those situations.

In the next demo, we have constrained the first element of the second row vector to move in sync with the second element of first row vector. This ensures a symmetric matrix at all times.

Keep a watch on the dark blue dots lining the $X$-axis as you move the first row of the matrix. These points rotate along with that vector.

Also notice the light green dots. These always align with the second row vector.

Also note that when the two row vectors are aligned along the same line, the space is squashed into a single dimension: all points fall onto one line, and that line points in the same direction as the first row.

Every time the two row vectors align, the space flips over as the vectors cross each other.

Now hold the first row steady and see what happens when you move the second row along its constrained line. You will notice that there is no more rotation. Instead, there is stretching, shrinking, and inverting across the line set up by the first row.

It is almost like the first row vector is setting the direction (rotation) of the space and then the second dependent vector is acting as a diagonal matrix along that orientation.
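One standard way to make this "set an orientation, then scale along it" picture precise is the eigendecomposition of a symmetric matrix, $ \mA = \mQ \mLambda \mQ^T $, with $ \mQ $ orthogonal and $ \mLambda $ diagonal. This is a related textbook result, not necessarily what the demo computes. The sketch below checks it with NumPy's `np.linalg.eigh` and also shows the squashing effect when the two rows are aligned. The matrices `A` and `S` are made-up examples.

```python
import numpy as np

# Illustrative symmetric matrix.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# eigh is intended for symmetric matrices; the columns of Q are orthonormal.
eigenvalues, Q = np.linalg.eigh(A)
Lam = np.diag(eigenvalues)

# Orthogonal factor, diagonal scaling, then the orthogonal factor transposed.
reconstructed = Q @ Lam @ Q.T
print(np.allclose(reconstructed, A))

# A symmetric matrix whose rows are aligned (row 2 = 2 * row 1).
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
rank_S = np.linalg.matrix_rank(S)    # rank 1: the space collapses onto a line

x = np.array([1.0, 1.0])             # arbitrary input vector
image = S @ x                        # lands on the line through the first row of S
print(rank_S, image)
```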

Try out the demo to see if you can visualize these ideas.

Symmetric matrix transform on space demo

Drag the circle to change the matrix row vectors

Positive definite matrix transform on space

A positive definite matrix is any symmetric matrix $ \mA \in \real^{n \times n} $ such that $ \vx^T \mA \vx > 0 $ for every nonzero vector $ \vx \in \real^n $. Keep the following properties in mind.

The diagonal elements of a positive definite matrix are always positive.
The non-diagonal elements of a positive definite matrix may be negative.
If each diagonal element of a symmetric matrix is positive and greater than the sum of the absolute values of the non-diagonal elements of that row (or column, due to symmetry), the matrix is guaranteed to be positive definite. This condition is sufficient but not necessary.

How might such a matrix impact the underlying space?

Try out the next demo to find out.

Being a symmetric matrix, much of what you saw in the symmetric matrix transform will hold here.

With one big difference. The PD matrix never turns the space more than 90 degrees. Pay attention to the dark green points (or pick your favorite color) as you move. See how constrained they are in the transformed space. The angle between each vector $ \vx $ in the original space and the output of the transform, $ \mA \vx $, is always less than 90 degrees, because their dot product $ \vx^T \mA \vx $ is positive.

Thus, each transformed point is constrained to the half of the space that lies on the same side as the original point. This is almost equivalent to multiplying by a positive number. That is one intuition behind positive definite.
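The "less than 90 degrees" property can be tested numerically. The sketch below uses an illustrative positive definite matrix `A` with negative off-diagonal elements and a batch of random vectors. The sample size and the random seed are arbitrary.

```python
import numpy as np

# Illustrative symmetric matrix: positive diagonal, each diagonal element
# larger than the absolute off-diagonal element in its row.
A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

rng = np.random.default_rng(0)
xs = rng.normal(size=(1000, 2))      # 1000 random input vectors, one per row

images = xs @ A.T                    # row i holds A @ xs[i]
quad_forms = np.sum(xs * images, axis=1)    # x^T A x for every row

# Angle between each x and its image A x, in degrees.
cosines = quad_forms / (np.linalg.norm(xs, axis=1) * np.linalg.norm(images, axis=1))
angles = np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))

# For a symmetric matrix, positive definiteness also means positive eigenvalues.
eigenvalues = np.linalg.eigvalsh(A)

print(np.all(quad_forms > 0), angles.max() < 90.0, np.all(eigenvalues > 0))
```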

Can you think about what will happen if the matrix were negative definite?

Positive definite matrix transform on space demo

Drag the circle to change the matrix row vectors

Where to next?

With a sound understanding of matrix geometry, you are ready to explore other advanced topics in linear algebra.

Already feeling like an expert in linear algebra? Move on to other advanced topics in mathematics or machine learning.
