Interactive tutorial on systems of linear equations


\(\DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator*{\asterisk}{\ast} \newcommand{\sup}{\text{sup}} \newcommand{\inf}{\text{inf}} \newcommand{\min}{\text{min}\;} \newcommand{\max}{\text{max}\;} \newcommand{\maxunder}[1]{\underset{#1}{\max}} \newcommand{\minunder}[1]{\underset{#1}{\min}} \newcommand{\real}{\mathbb{R}} \newcommand{\natural}{\mathbb{N}} \newcommand{\integer}{\mathbb{Z}} \newcommand{\rational}{\mathbb{Q}} \newcommand{\irrational}{\mathbb{I}} \newcommand{\complex}{\mathbb{C}} \newcommand{\cardinality}[1]{|#1|} \newcommand{\vec}[1]{\mathbf{#1}} \newcommand{\mat}[1]{\mathbf{#1}} \newcommand{\star}[1]{#1^*} \newcommand{\inv}[1]{#1^{-1}} \newcommand{\indicator}[1]{\mathcal{I}(#1)} \renewcommand{\BigO}[1]{\mathcal{O}(#1)} \renewcommand{\BigOsymbol}{\mathcal{O}} \renewcommand{\smallo}[1]{\mathcal{o}(#1)} \renewcommand{\smallosymbol}[1]{\mathcal{o}} \newcommand{\set}[1]{\mathbb{#1}} \newcommand{\complement}[1]{#1^c} \newcommand{\powerset}[1]{\mathcal{P}(#1)} \newcommand{\setdiff}{\setminus} \newcommand{\setsymmdiff}{\oplus} \newcommand{\dash}[1]{#1^{'}} \newcommand{\permutation}[2]{{}_{#1} \mathrm{ P }_{#2}} \newcommand{\combination}[2]{{}_{#1} \mathrm{ C }_{#2}} \newcommand{\prob}[1]{P(#1)} \newcommand{\pmf}[1]{P(#1)} \newcommand{\pdf}[1]{p(#1)} \newcommand{\cdf}[1]{F(#1)} \newcommand{\expect}[2]{E_{#1}\left[#2\right]} \newcommand{\entropy}[1]{\mathcal{H}\left[#1\right]} \newcommand{\expe}[1]{\mathrm{e}^{#1}} \newcommand{\textexp}[1]{\text{exp}\left(#1\right)} \def\independent{\perp\!\!\!\perp} \def\notindependent{\not\!\independent} \newcommand{\yhat}{\hat{y}} \newcommand{\vs}{\vec{s}} \newcommand{\vt}{\vec{t}} \newcommand{\vu}{\vec{u}} \newcommand{\vv}{\vec{v}} \newcommand{\vw}{\vec{w}} \newcommand{\vx}{\vec{x}} \newcommand{\vy}{\vec{y}} \newcommand{\vz}{\vec{z}} \newcommand{\va}{\vec{a}} \newcommand{\vb}{\vec{b}} \newcommand{\vc}{\vec{c}} \newcommand{\vd}{\vec{d}} \newcommand{\ve}{\vec{e}} 
\newcommand{\vg}{\vec{g}} \newcommand{\vh}{\vec{h}} \newcommand{\vi}{\vec{i}} \newcommand{\vk}{\vec{k}} \newcommand{\vo}{\vec{o}} \newcommand{\vp}{\vec{p}} \newcommand{\vq}{\vec{q}} \newcommand{\vr}{\vec{r}} \newcommand{\vs}{\vec{s}} \newcommand{\vmu}{\vec{\mu}} \newcommand{\vsigma}{\vec{\sigma}} \newcommand{\vphi}{\vec{\phi}} \newcommand{\vtau}{\vec{\tau}} \newcommand{\vtheta}{\vec{\theta}} \newcommand{\mA}{\mat{A}} \newcommand{\mB}{\mat{B}} \newcommand{\mC}{\mat{C}} \newcommand{\mD}{\mat{D}} \newcommand{\mE}{\mat{E}} \newcommand{\mH}{\mat{H}} \newcommand{\mK}{\mat{K}} \newcommand{\mP}{\mat{P}} \newcommand{\mQ}{\mat{Q}} \newcommand{\mR}{\mat{R}} \newcommand{\mS}{\mat{S}} \newcommand{\mU}{\mat{U}} \newcommand{\mV}{\mat{V}} \newcommand{\mW}{\mat{W}} \newcommand{\mX}{\mat{X}} \newcommand{\mY}{\mat{Y}} \newcommand{\mZ}{\mat{Z}} \newcommand{\mI}{\mat{I}} \newcommand{\mLambda}{\mat{\Lambda}} \newcommand{\mSigma}{\mat{\Sigma}} \newcommand{\mTheta}{\mat{\theta}} \newcommand{\setsymb}[1]{#1} \newcommand{\sA}{\setsymb{A}} \newcommand{\sB}{\setsymb{B}} \newcommand{\sC}{\setsymb{C}} \newcommand{\sO}{\setsymb{O}} \newcommand{\sP}{\setsymb{P}} \newcommand{\sQ}{\setsymb{Q}} \newcommand{\sH}{\setsymb{H}} \newcommand{\sX}{\setsymb{X}} \newcommand{\sY}{\setsymb{Y}} \newcommand{\norm}[2]{||{#1}||_{#2}} \newcommand{\infnorm}[1]{\norm{#1}{\infty}} \newcommand{\fillinblank}{\text{ }\underline{\text{ ? 
}}\text{ }} \newcommand{\lbrace}{\left\{} \newcommand{\rbrace}{\right\}} \newcommand{\set}[1]{\lbrace #1 \rbrace} \newcommand{\seq}[1]{\left( #1 \right)} \newcommand{\ndim}{N} \newcommand{\ndimsmall}{n} \newcommand{\dataset}{\mathbb{D}} \newcommand{\ndata}{D} \newcommand{\ndatasmall}{d} \newcommand{\labeledset}{\mathbb{L}} \newcommand{\nlabeled}{L} \newcommand{\nlabeledsmall}{l} \newcommand{\unlabeledset}{\mathbb{U}} \newcommand{\nunlabeled}{U} \newcommand{\nunlabeledsmall}{u} \newcommand{\nclass}{M} \newcommand{\nclasssmall}{m} \newcommand{\loss}{\mathcal{L}} \newcommand{\sign}{\text{sign}} \newcommand{\Gauss}{\mathcal{N}} \newcommand{\hadamard}{\circ} \newcommand{\doh}[2]{\frac{\partial #1}{\partial #2}} \newcommand{\dox}[1]{\doh{#1}{x}} \newcommand{\doy}[1]{\doh{#1}{y}} \newcommand{\doxx}[1]{\doh{#1}{x^2}} \newcommand{\doyy}[1]{\doh{#1}{y^2}} \newcommand{\doxy}[1]{\frac{\partial #1}{\partial x \partial y}} \newcommand{\doyx}[1]{\frac{\partial #1}{\partial y \partial x}} \newcommand{\qed}{\tag*{$\blacksquare$}}\)


Linear algebra was developed to solve systems of linear equations, so it is crucial that we understand the basics of solving them.

Prerequisites

To understand systems of linear equations, we recommend familiarity with the concepts in

Follow the above links to first get acquainted with the corresponding concepts.

Linear equations

Suppose you wish to solve an equation in 2 unknown variables.

$$ 2x + 4y = 22 $$

Can you solve this to find the values of $ x $ and $ y $? No, not unless you have more information.

Suppose the additional information is provided in the form of another equation

$$ x + y = 6 $$

Great! Now you can solve it. How did you do it?

Mechanically, you went through these steps.

Multiply the second equation by 2 to get $ 2x + 2y = 12 $.
Subtract this updated equation from the first equation to get $ 2y = 10 $.
Solve this simple equation in one variable to get $ y = 5 $.
Substitute $ y = 5 $ back into the second equation to get $ x = 1 $.

Note how you first eliminated $ x $ to arrive at an equation in one variable, $ y $. This mechanical process is known as Gaussian elimination.

Gaussian elimination is a general-purpose strategy for solving systems of linear equations in any number of unknown variables.
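The four steps above can be traced in a few lines of Python. This is only a sketch for this particular pair of equations: storing each equation as a list `[a, b, c]` (meaning $ ax + by = c $) and the names `eq1`, `eq2`, `scaled`, and `reduced` are choices made for this illustration, not part of the tutorial's demo.

```python
from fractions import Fraction

# Each equation a*x + b*y = c is stored as the list [a, b, c].
eq1 = [Fraction(2), Fraction(4), Fraction(22)]   # 2x + 4y = 22
eq2 = [Fraction(1), Fraction(1), Fraction(6)]    #  x +  y = 6

# Step 1: multiply the second equation by 2.
scaled = [2 * v for v in eq2]                    # 2x + 2y = 12

# Step 2: subtract it from the first equation to eliminate x.
reduced = [p - q for p, q in zip(eq1, scaled)]   # 0x + 2y = 10

# Step 3: solve the remaining one-variable equation for y.
y = reduced[2] / reduced[1]

# Step 4: substitute y back into the second equation to get x.
x = (eq2[2] - eq2[1] * y) / eq2[0]

print(x, y)   # 1 5
```

Exact fractions are used instead of floating-point numbers so that the intermediate results match the hand calculation exactly.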

Gaussian elimination: demo

Check out the next interactive demo to see Gaussian elimination in action. Vary the number of equations and the number of unknowns to see the steps needed to arrive at a solution. Try to understand when the system is unable to solve the equations. Building intuition here is key to understanding much of linear algebra.

(Note: We are deliberately not doing any exception handling or dealing with infinity and NaNs. Check out when you get them to strengthen your intuition.)

Can every set of equations be solved?

If you have interacted with the demo above, you already know the answer, and some of the pitfalls too.

No, not every set of equations can be solved. Here are the obvious problem cases:

If the number of unknowns is more than the number of equations
If the number of equations is more than the number of unknowns

In the first case, we have inadequate information to solve the problem. Many solutions satisfy the given equations, and we cannot narrow them down to a single one.

In the second case, some equations might be inconsistent with the others. That is, the solution of one subset of the equations may differ from the solution of another subset. Hence, again, we cannot arrive at a single solution to the problem.
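As a small illustration of both cases, built from the equations used earlier (the third equation below is made up for this example): the single equation $ x + y = 6 $ is satisfied by $ (x, y) = (1, 5) $, $ (2, 4) $, $ (0, 6) $, and infinitely many other pairs. Conversely, consider three equations in two unknowns.

$$ \begin{aligned} & 2x + 4y = 22 \\ & x + y = 6 \\ & x - y = 0 \end{aligned} $$

The first two equations give $ x = 1, y = 5 $, but the third requires $ x = y $, so no single pair satisfies all three.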

But what if the number of equations is the same as the number of unknowns? Let's take an example. Here are two equations in 2 variables.

$$ \begin{aligned} & 2x + 3y = 5 \\ & 4x + 6y = 10 \end{aligned} $$

If we were to do Gaussian elimination here, then to eliminate $ x $ from the second equation, we would subtract 2 times the first equation from the second to get $ 0x + 0y = 0 $!

Not good! In our attempt to eliminate $ x $, we got rid of $ y $ too. We cannot solve for $ y $ anymore.

This happened because the second equation was merely a scaled-up version of the first equation. In other words, the second equation, although seemingly different, conveyed the same information as the first one. It was not an independent piece of information.
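The failed elimination can be reproduced with a short Python sketch. The `[a, b, c]` list layout for an equation $ ax + by = c $ and the names `factor`, `reduced`, and `is_redundant` are assumptions made for this illustration.

```python
# Each equation a*x + b*y = c is stored as the list [a, b, c].
eq1 = [2, 3, 5]     # 2x + 3y = 5
eq2 = [4, 6, 10]    # 4x + 6y = 10

# Multiplier that cancels x in the second equation.
factor = eq2[0] / eq1[0]

# Subtract factor * eq1 from eq2.
reduced = [b - factor * a for a, b in zip(eq1, eq2)]
print(reduced)      # [0.0, 0.0, 0.0]

# Every entry is zero: the second equation carried no new information.
is_redundant = all(v == 0 for v in reduced)
print(is_redundant)  # True

# Several different (x, y) pairs satisfy both equations at once,
# so the system cannot pin down a single solution.
candidates = [(1, 1), (2.5, 0), (-2, 3)]
checks = [2 * x + 3 * y == 5 and 4 * x + 6 * y == 10 for x, y in candidates]
print(checks)       # [True, True, True]
```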

What if we had more than 2, say 3, equations in 3 unknown variables? This is where things get interesting. Check out the next Gaussian elimination demo where we have purposely constrained the equations to have a certain nature. The last equation for all examples in this demo is a sum over scalar multiples of previous equations.

Here's what you will find. To be able to solve equations in $ n $ unknown variables, you need exactly $ n $ independent and consistent equations.

Gaussian elimination solvability: demo

Solver for systems of linear equations

Enter your own linear equations. Compare your solution to ours.

Enter each equation on a separate line in the input box. Each equation should use a common format such as $ 2x + 3y = 4 $. You may use any letter of the English alphabet, $ a $ through $ z $, as a variable. The constants may be integers or real-valued numbers.

An example set of equations has been provided.


Representing equations as vector dot product

Now that we understand how to solve linear equations, how about we come up with a compact representation for equations?

An astute reader will note that a linear equation such as $ 2x_1 + 4x_2 = 22 $ can be written in the form of vector dot products that we presented earlier.

$$ \va \cdot \vx = y $$

Here, $ \vx \in \real^2 $ is a vector containing the variables, $ \vx = [x_1,x_2] $. Also, $ \va \in \real^2 $ is another vector containing the constants, $ \va = [2,4] $. And $ y $ is the result, which is $ 22 $ in this example.

This is a succinct representation and will be followed henceforth to represent equations.
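To make the dot-product form concrete, here is a minimal Python sketch. The helper name `dot` is an assumption for this illustration, and the values in `x` are the solution $ x = 1, y = 5 $ found at the start of the tutorial, relabeled as $ x_1 $ and $ x_2 $.

```python
def dot(a, x):
    """Dot product of two equal-length sequences of numbers."""
    if len(a) != len(x):
        raise ValueError("vectors must have the same length")
    return sum(a_k * x_k for a_k, x_k in zip(a, x))

a = [2, 4]   # constants of the equation 2*x_1 + 4*x_2 = 22
x = [1, 5]   # x_1 = 1, x_2 = 5
y = dot(a, x)
print(y)     # 22
```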

Linear independence

What about independence of equations in terms of vectors? Continuing our discussion of independence, if one equation is a scaled version of another, then they are not independent. In vector notation, if $ \va_j = \alpha \va_i $ for some scalar $ \alpha $ and $ i \ne j $, then the vectors $ \va_i $ and $ \va_j $ are not independent. Conversely, if neither vector is a scalar multiple of the other, then the vectors $ \va_i $ and $ \va_j $ are said to be linearly independent.

What if there are more than 2 equations and more than 2 variables? Let's suppose we have $ m $ equations and $ n $ variables. Easy, we create $ m $ vectors, each with $ n $ values. Each of the $ m $ equations results in one output, for $ m $ outputs in total.

Collectively, this can all be written as $$ \va_i \cdot \vx = y_i, \forall i \in \{1,\ldots,m\} $$

What about independence for more than 2 equations? Just like we need 2 linearly independent vectors to solve for 2 unknowns, so do we need $ n $ linearly independent vectors to solve for $ n $ unknowns. Let's extend our formula of independence to more than 2 vectors.

So, if any one of the $ m $ vectors can be written as a linear combination of the remaining vectors, $ \va_j = \sum_{i \ne j} \alpha_i \va_i $, then the set of vectors is not linearly independent.

In plain terms, a group of $ m $ vectors is said to be linearly independent if none of them can be written as a linear combination of the rest.

And, to solve for $ n $ unknowns, you need $ n $ linearly independent equations or vectors.
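One way to test a set of vectors for linear independence is to run the same row elimination on them and count how many non-zero rows survive. The sketch below assumes this approach. The function names `rank` and `is_linearly_independent` and the example vectors are made up for this illustration, and the row swap is a practical detail that the text does not cover.

```python
from fractions import Fraction

def rank(vectors):
    """Count the linearly independent vectors using row elimination."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(n_cols):
        # Find a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate this column from every row below the pivot row.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [v - factor * p for v, p in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    """True when no vector is a linear combination of the others."""
    return rank(vectors) == len(vectors)

a1, a2 = [1, 0, 2], [0, 1, 1]
a3 = [2 * p + 3 * q for p, q in zip(a1, a2)]   # a3 = 2*a1 + 3*a2

print(is_linearly_independent([a1, a2]))             # True
print(is_linearly_independent([a1, a2, a3]))         # False
print(is_linearly_independent([a1, a2, [0, 0, 1]]))  # True
```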

A more concise representation for equations

Now that we have a concise representation for a single equation as a vector, how about we come up with a composite representation over multiple vectors? Let's stack each vector in the set $ \{\va_1, \ldots, \va_m \}, \va_i \in \real^n, \forall i $ as follows and refer to the entire box by the symbol $ \mA $.

$$ \mA = \begin{bmatrix} \va_{1,1} & \ldots & \va_{1,n} \\ \vdots & \ddots & \vdots \\ \va_{m,1} & \ldots & \va_{m,n} \\ \end{bmatrix} $$

Here, $ \va_{i,j} $ represents the $ j $-th element of the $ i $-th vector.

This composite representation of multiple vectors is known as a matrix.

Each horizontal list of elements is known as a row of the matrix. For example, the elements $ [\va_{1,1}, \va_{1,2}, \ldots, \va_{1,n}] $ form the first row of the matrix $ \mA $.

Similarly, each vertical list of elements is known as a column of the matrix. For example, the elements $ [\va_{1,1}, \va_{2,1}, \ldots, \va_{m,1}] $ form the first column of the matrix $ \mA $.

The list of elements running from the top left towards the bottom right is known as the diagonal of the matrix. For example, when $ m = n $, the elements $ [\va_{1,1}, \va_{2,2}, \ldots, \va_{n,n}] $ form the diagonal of the matrix $ \mA $.

So, now that the vectors of multipliers have been stacked, how about the vector of unknowns and outputs?

Easy, we define a new operation on the matrix, called matrix multiplication, which lets us write the whole system as succinctly as

$$ \mA \vx = \vy $$

How cool is that? Matrix multiplication works by taking a row from the matrix $ \mA $, say $ \va_i $, and performing a dot product of $ \va_i $ with the column vector $ \vx $ of unknowns to arrive at the answer $ y_i $. Doing so over all the rows of the matrix leads to $ m $ outputs, thereby creating our output vector $ \vy $.

It should be noted that the number of results in $ \vy $ is the same as the number of rows of $ \mA $.
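The row-by-row description above translates almost directly into Python. This is a sketch: the helper name `matvec` and the list-of-rows layout are assumptions for this illustration, and the third row of `A3` is made up to show a matrix with more rows than columns.

```python
def matvec(A, x):
    """Multiply the matrix A (a list of rows) by the column vector x."""
    for row in A:
        if len(row) != len(x):
            raise ValueError("each row of A must have as many entries as x")
    # One dot product per row of A gives one entry of the output vector.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

A = [[2, 4],
     [1, 1]]
x = [1, 5]
y = matvec(A, x)
print(y)         # [22, 6]

# The output always has as many entries as A has rows.
A3 = [[2, 4],
      [1, 1],
      [1, -1]]
y3 = matvec(A3, x)
print(y3)        # [22, 6, -4]
```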

Gaussian elimination can easily be re-implemented for matrices as simple row operations. Multiply a row by an appropriate scalar and subtract it from another row to eliminate a variable, then continue to the solution as we demonstrated earlier.
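A minimal sketch of Gaussian elimination as row operations might look like the following, assuming a square system with as many equations as unknowns. The function name `gaussian_elimination` is hypothetical. The row swap and the back-substitution loop are common practical additions that the text above does not spell out, and exact fractions are used to sidestep floating-point rounding.

```python
from fractions import Fraction

def gaussian_elimination(A, y):
    """Solve A x = y for a square matrix A using row operations.

    Raises ValueError when a column has no usable pivot, which
    happens when the rows of A are not linearly independent.
    """
    n = len(A)
    # Augmented matrix [A | y] in exact rational arithmetic.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, y)]

    for col in range(n):
        # Row swap (an addition to the steps in the text): bring up a row
        # with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("rows are not linearly independent")
        M[col], M[pivot] = M[pivot], M[col]
        # Subtract a multiple of the pivot row from each row below it.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [v - factor * p for v, p in zip(M[r], M[col])]

    # Back substitution, from the last unknown up to the first.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x

solution = gaussian_elimination([[2, 4], [1, 1]], [22, 6])
print([int(v) for v in solution])   # [1, 5]
```

Calling the same function on the dependent pair $ 2x + 3y = 5 $, $ 4x + 6y = 10 $ raises the `ValueError`, mirroring the $ 0x + 0y = 0 $ dead end seen earlier.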

What about the independence requirements we figured out earlier? Well, they still apply, and can be stated more succinctly. To solve equations in $ n $ variables, you need $ n $ linearly independent rows and $ n $ linearly independent columns in the matrix $ \mA $.

Where to next?

Now that we have introduced the matrix, there is no looking back. Explore matrices (plural of matrix), their properties, their types, and the operations defined on them in our comprehensive articles on matrices.

If you already understand matrices, explore other advanced topics in linear algebra.

Already feeling like an expert in linear algebra? Move on to other advanced topics in mathematics or machine learning.
