Matrix inverse

Linear Algebra

Now that we understand the geometry of matrix multiplication, it is time to understand its counterpart: the matrix inverse.

Prerequisites

To understand the matrix inverse, we recommend familiarity with matrix multiplication, systems of linear equations, and Gaussian elimination.

Follow the links to those topics to first get acquainted with the corresponding concepts.

Matrix inverse

The inverse of a matrix \( \mA \) is a matrix \( \inv{\mA} \), such that

$$ \mA \inv{\mA} = \inv{\mA}\mA = \mI $$
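For a quick numerical check (a minimal sketch using Python and NumPy, which are not part of this article's notation), we can verify both products against the identity for a small invertible matrix:

```python
import numpy as np

# A small invertible (square, full-rank) matrix, chosen arbitrarily for illustration
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

A_inv = np.linalg.inv(A)  # numerical inverse of A

# Both products equal the identity matrix, up to floating-point error
print(np.allclose(A @ A_inv, np.eye(2)))  # True
print(np.allclose(A_inv @ A, np.eye(2)))  # True
```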

Systems of linear equations and the inverse

We studied earlier that systems of linear equations can be represented in a concise form as

$$ \mA \vx = \vy $$

where \( \mA \) is the matrix of coefficients, \( \vx \) is the vector of variables (which we need to solve for), and \( \vy \) is the vector of outputs.

As defined above, \( \inv{\mA} \mA = \mI \). Therefore

\begin{aligned} &\mA \vx = \vy \\\\ \implies &\inv{\mA} \mA \vx = \inv{\mA} \vy \\\\ \implies &\vx = \inv{\mA} \vy \end{aligned}

Thus, \( \vx = \inv{\mA}\vy \) is the solution to our equations in the variables \( \vx \). In other words, multiplication with a matrix inverse reverses the effect of the original matrix multiplication and recovers the original input vector, the desired solution.
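As a concrete illustration (a sketch in NumPy; the coefficient matrix and output vector below are made up for the example), multiplying \( \vy \) by \( \inv{\mA} \) recovers the solution \( \vx \):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])    # matrix of coefficients
y = np.array([9.0, 8.0])      # vector of outputs

x = np.linalg.inv(A) @ y      # x = A^{-1} y
print(x)                      # [2. 3.]
print(np.allclose(A @ x, y))  # True: the recovered x satisfies A x = y
```

In practice, library routines such as `np.linalg.solve(A, y)` compute the same solution without explicitly forming the inverse, which is both faster and numerically safer.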

Moreover, given the requirements for matrix multiplication, the inverse \( \inv{\mA} \) is also a square matrix of the same size as the original matrix \( \mA \).

Invertibility

Do all matrices have an inverse? Let's find out.

Now we know from our earlier discussion on systems of linear equations that \( \mA \vx = \vy \) has a unique solution for every \( \vy \) if and only if \( \mA \) is square and full rank.

So, for \( \mA \) to be invertible or for the inverse to exist, it needs to be square and full-rank.

No inverse, no unique solution.

No unique solution, no inverse.

Easy as that.

A matrix inverse is only defined for a square full-rank matrix.
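To see the full-rank requirement in action, here is a small sketch with a hand-picked rank-deficient matrix; NumPy reports it as singular when asked for an inverse:

```python
import numpy as np

B = np.array([[1.0, 2.0],
              [2.0, 4.0]])       # second row = 2 x first row, so the rank is 1, not 2

print(np.linalg.matrix_rank(B))  # 1 -> not full rank

try:
    np.linalg.inv(B)
except np.linalg.LinAlgError as err:
    print("No inverse:", err)    # reports a singular matrix
```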

Calculating an inverse

We saw earlier how to solve equations in \( n \) unknowns by Gaussian elimination. Well, finding the inverse of an \( n \times n \) matrix requires solving for \( n^2 \) unknowns. So, how do we do it?

Well, here's the key information we know: \( \mA \inv{\mA} = \mI \). This is a system of \( n^2 \) equations in \( n^2 \) unknowns. In fact, the \( i \)-th column of \( \inv{\mA} \) is just the solution of \( \mA \vx = \vy \) with \( \vy \) set to the \( i \)-th column of \( \mI \). So, as long as \( \mA \) is full rank, we should be able to solve it. And we can use plain old Gaussian elimination to do it: the same old row scaling, subtraction, and variable elimination.
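Here is a minimal sketch of that idea: augment \( \mA \) with \( \mI \) and run Gauss-Jordan elimination (plain row scaling and subtraction, with no pivoting or other numerical refinements) until the left half becomes \( \mI \); the right half is then \( \inv{\mA} \). The function name and the example matrix are made up for illustration.

```python
import numpy as np

def inverse_by_gauss_jordan(A):
    """Invert a square full-rank matrix by Gauss-Jordan elimination on [A | I]."""
    n = A.shape[0]
    aug = np.hstack([A.astype(float), np.eye(n)])  # augmented matrix [A | I]

    for i in range(n):
        aug[i] = aug[i] / aug[i, i]                   # scale the pivot row so the pivot is 1
        for j in range(n):
            if j != i:
                aug[j] = aug[j] - aug[j, i] * aug[i]  # eliminate the pivot variable from row j

    return aug[:, n:]                                 # the right half now holds A^{-1}

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])
print(inverse_by_gauss_jordan(A))  # [[ 3. -1.]  [-5.  2.]]
print(np.linalg.inv(A))            # matches
```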

Matrix inverse demo

Let's build some intuition from the geometric perspective with this demo. A few observations to note:

  • When the row vectors of \( \mA \) are stretched, the row vectors of \( \inv{\mA} \) shrink. This is much like a reciprocal in arithmetic: for scalars \( p \) and \( q \), we have \( p \frac{1}{p} q = q \), so the inverse matrix plays the role of the reciprocal \( \frac{1}{p} \).
  • When the row vectors of \( \mA \) overlap, the matrix is no longer full-rank, and the inverse does not exist.
Drag the circle to change the matrix row vectors

Matrix inverse recovery demo

To build further intuition, check out this demo involving matrix inverse. The right-most column of charts shows the space transformations \( \vx, \mA\vx, \inv{\mA}\mA \vx \), in that order. The bottom-most row of charts shows the space transformations \( \vx, \inv{\mA} \vx, \mA \inv{\mA}\vx \).

Notice how the inverse of the matrix rotates the space in the direction opposite to that of the input matrix.

Also notice that when the matrix stretches the space along a direction, its inverse shrinks it along the same direction. The effect is reversed when the matrix shrinks it.

Thus, the net effect of \( \inv{\mA} \mA \vx \) is the recovery of the original space \( \vx \), as evidenced in the last panel of the bottom row of charts.

Cool, isn't it?

Drag the circle to change the matrix row vectors

Inverse of an orthogonal matrix

The inverse of an orthogonal matrix is particularly interesting.

Let \( \mQ \) be an orthogonal matrix.

Remember that the rows and columns of an orthogonal matrix are orthonormal vectors. So the dot product of a row vector with itself is 1, and the dot product between any two distinct rows is 0.

We know that the inverse, if it exists, must satisfy \( \mQ \inv{\mQ} = \mI \).

In \( \mQ \inv{\mQ} \), we take the dot product of the rows of \( \mQ \) with the columns of \( \inv{\mQ} \). For the result to be \( \mI \), the \( i \)-th row of \( \mQ \) must have a dot product of 1 with the \( i \)-th column of \( \inv{\mQ} \), and a dot product of 0 with every other column \( j \ne i \).

Because the rows of \( \mQ \) are orthonormal, this is exactly what happens when the \( i \)-th column of \( \inv{\mQ} \) is the \( i \)-th row of \( \mQ \), that is, when \( \inv{\mQ} = \mQ^T \).

Thus, the transpose of an orthogonal matrix is its inverse. And because a matrix always has a transpose, an orthogonal matrix is always invertible and never singular.
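A quick numerical check with a rotation matrix (rotation matrices are orthogonal) illustrates this, as a sketch; the angle below is arbitrary:

```python
import numpy as np

theta = 0.7  # any angle works; 2-D rotation matrices are orthogonal
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))     # True: Q^T Q = I
print(np.allclose(Q.T, np.linalg.inv(Q)))  # True: the transpose is the inverse
```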

This is a key result. It makes it desirable to factorize or decompose a matrix such that some components are orthogonal matrices. We will study two such decompositions: Eigendecomposition and Singular value decomposition.

What about rectangular matrices?

The Moore-Penrose pseudoinverse, denoted as \( \mA^+ \), is a generalization of the inverse for rectangular matrices.

We do not deal with that often in machine learning literature, so this will not be covered here.

Where to next?

You may choose to explore other advanced topics in linear algebra.

Already feeling like an expert in linear algebra? Move on to other advanced topics in mathematics or machine learning.
