Understanding \( \mU, \mD, \text{ and } \mV \) in SVD
Now that we know how eigendecomposition differs from SVD, it is time to understand the individual components of the SVD.
We saw in an earlier interactive demo that orthogonal matrices rotate and reflect, but never stretch.
So that's the role of \( \mU \) and \( \mV \), both orthogonal matrices.
But \( \mU \in \real^{m \times m} \), whereas \( \mV \in \real^{n \times n} \).
So they perform their rotations in different spaces.
Since \( \mU \) and \( \mV \) are strictly orthogonal matrices and only perform rotation or reflection, any stretching or shrinkage has to come from the diagonal matrix \( \mD \).
Hence, all of the scaling information lives in the diagonal elements of \( \mD \), the singular values, which by convention are non-negative.
Any dimensions with zero singular values are essentially squashed. Dimensions with higher singular values are more dominant (stretched) and conversely, those with lower singular values are shrunk. And therein lies the importance of SVD.
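As a quick numerical check, here is a minimal sketch (assuming NumPy and a small, arbitrary random matrix) that verifies these properties: \( \mU \) and \( \mV \) are orthogonal, the singular values are non-negative, and the three factors multiply back to the original matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))     # an arbitrary 5x3 matrix

U, s, Vt = np.linalg.svd(A)         # full SVD: U is 5x5, Vt is 3x3

# U and V are orthogonal: they rotate/reflect but never stretch.
assert np.allclose(U @ U.T, np.eye(5))
assert np.allclose(Vt @ Vt.T, np.eye(3))

# The singular values (the diagonal of D) are non-negative
# and returned in decreasing order.
assert np.all(s >= 0)

# D is a 5x3 matrix with the singular values on its diagonal;
# multiplying the factors back together recovers A.
D = np.zeros((5, 3))
np.fill_diagonal(D, s)
assert np.allclose(U @ D @ Vt, A)
```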
By focusing on the directions with larger singular values, we can ensure that the data, and any resulting models and analyses, capture the dominant patterns in the data.
This is achieved by sorting the singular values in decreasing order of magnitude and truncating the diagonal matrix to the dominant ones.
That entails corresponding adjustments to the \( \mU \) and \( \mV \) matrices: we discard the columns of \( \mU \) and \( \mV \) (equivalently, the rows of \( \mV^T \)) that correspond to the smaller singular values.
So, if we focus on the top \( r \) singular values, we can construct an approximate or compressed version \( \mA_r \) of the original matrix \( \mA \) as follows:
$$ \mA_r = \mU_r \mD_r \mV_r^T $$
This is a great way of compressing a dataset while still retaining the dominant patterns within.
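Here is a minimal sketch of that construction (assuming NumPy; the matrix size and the choice of \( r \) are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))

# Thin SVD; NumPy returns the singular values in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

r = 10                                        # keep only the top r singular values
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]   # A_r = U_r D_r V_r^T

# How close is the rank-r approximation to the original?
rel_err = np.linalg.norm(A - A_r) / np.linalg.norm(A)
print(f"relative error of the rank-{r} approximation: {rel_err:.3f}")
```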
In fact, the number of non-zero (and hence strictly positive) singular values of a matrix is equal to its rank.
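A small sketch illustrating this (assuming NumPy; the example matrix is made up, with one row deliberately chosen as a multiple of another):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],    # twice the first row, so A has rank 2
              [1.0, 0.0, 1.0]])

# In floating point, "non-zero" means above a small tolerance.
s = np.linalg.svd(A, compute_uv=False)
rank_from_svd = int(np.sum(s > 1e-10))

print(rank_from_svd)               # 2
print(np.linalg.matrix_rank(A))    # 2, also computed from the singular values
```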
Machine learning is all about working with the generalizable and dominant patterns in data.
Some details might be lost in the truncation.
In fact, in some cases it is desirable to ignore such irrelevant details to avoid the phenomenon of overfitting. And this is where SVD helps.
As a consequence, the SVD appears in numerous algorithms in machine learning. In the upcoming learning modules, we will highlight the importance of SVD for processing and analyzing datasets and models.