Math for Machine Learning
Build the mathematical foundation in linear algebra, calculus, probability, and statistics for ML.
Matrix Decomposition — The Art of Taking Matrices Apart (and Actually Understanding Them)
"If a matrix is a story, then matrix decomposition is the editor who reveals the plot twists."
You already know vectors, dot products, and the boring-but-essential rules of matrix arithmetic from the previous sections. Now we get to the fun part: breaking matrices into meaningful pieces. This is not destruction for the thrill of it — it’s methodical archaeology that reveals the structure hidden in data. In AI, matrix decompositions are the backstage crew: quiet, invisible, and responsible for making the show look effortless (and sometimes saving your model from catastrophic numerical meltdown).
What is a matrix decomposition, and why should you care?
Definition (short): A matrix decomposition expresses a matrix as a product of two or more simpler matrices with nice properties (orthogonality, triangularity, diagonal, etc.).
Why it matters in ML:
- Dimensionality reduction (PCA via SVD) — less noise, faster models.
- Efficient solvers for linear systems (QR, LU, Cholesky) — critical for training linear models and computing updates.
- Low-rank approximations for compression and recommender systems.
- Numerical stability and better conditioning — fewer surprises in optimization.
Think: you’re not just solving Ax = b; you’re asking, "What are the simple, meaningful building blocks that make A tick?"
The heavy hitters — decompositions you’ll use in ML
1) Singular Value Decomposition (SVD)
Formula: A (m×n) = U (m×m) · Σ (m×n) · V^T (n×n)
- U: orthonormal left singular vectors
- Σ: diagonal matrix of singular values (non-negative, sorted in descending order)
- V: orthonormal right singular vectors
Intuition: SVD writes your matrix as a set of orthogonal axes (V), scales along those axes (Σ), and rotates into the output space (U). For data matrices, columns of V are principal directions.
ML uses:
- PCA (principal component analysis): either via SVD on centered data X or eigendecompose X^T X
- Low-rank approximation: keep top-k singular values to approximate A with best rank-k matrix (optimal in Frobenius norm)
- Latent factor models (collaborative filtering), image compression, noise reduction
Python (numpy) cheat snippet:
import numpy as np
# X: data matrix (n_samples x n_features), already centered
U, S, VT = np.linalg.svd(X, full_matrices=False)
# project onto the top-k principal directions
k = 2  # choose how many components to keep
Xk = U[:, :k] * S[:k]  # equivalent to X @ VT[:k].T
# reconstruct the best rank-k approximation (optimal in Frobenius norm)
Ak = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]
Question: Why SVD instead of eigendecomposition? Because SVD works for any rectangular matrix and avoids forming X^T X explicitly — squaring a matrix squares its condition number — so it is numerically more stable.
2) Eigenvalue Decomposition (EVD)
Formula: A = Q Λ Q^-1 (for diagonalizable matrices)
For symmetric matrices: A = Q Λ Q^T with Q orthonormal.
- Λ: diagonal of eigenvalues
- Q: eigenvectors
Intuition: Finding directions v where Av = λv — directions that the linear transformation just scales.
ML uses: covariance matrices in PCA, spectral clustering, understanding linear dynamical systems.
Caveat: EVD requires square matrices; symmetry gives orthogonality and nicer properties.
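A minimal numpy sketch of the symmetric case (the 2×2 matrix is our own illustrative example, not from the lesson): `np.linalg.eigh` is designed for symmetric matrices and returns real eigenvalues in ascending order plus an orthonormal Q, so A = Q Λ Q^T can be checked directly.

```python
import numpy as np

# A small symmetric matrix: eigh guarantees real eigenvalues and orthonormal Q
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

eigvals, Q = np.linalg.eigh(A)  # eigenvalues sorted ascending

# Verify the factorization A = Q diag(lambda) Q^T
print(np.allclose(A, Q @ np.diag(eigvals) @ Q.T))  # True

# Each column q of Q satisfies A q = lambda q: the transformation only scales it
for lam, q in zip(eigvals, Q.T):
    assert np.allclose(A @ q, lam * q)
```

Note the contrast with `np.linalg.svd`, which sorts descending — an easy trap when comparing the two.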
3) QR Decomposition
Formula: A = Q R (Q orthonormal, R upper triangular)
Used to solve least squares efficiently and to orthonormalize bases (Gram-Schmidt’s newer, stable cousin). QR is the workhorse for numerically stable linear solves.
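Here is one way to sketch the least-squares use in numpy (the random regression problem below is an illustration, not a prescribed dataset): factor A once, then solve the small triangular system R x = Q^T b instead of forming the normal equations.

```python
import numpy as np

# Synthetic least-squares problem: tall A, known coefficients, small noise
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.01 * rng.normal(size=100)

Q, R = np.linalg.qr(A)               # reduced QR: Q is 100x3, R is 3x3
x_hat = np.linalg.solve(R, Q.T @ b)  # solve the small triangular system

# QR gives exactly the least-squares solution
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x_hat, x_ls))  # True
```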
4) LU Decomposition
Formula: A = L U (lower × upper triangular, often with a permutation matrix)
Used to factor square matrices for fast solving of Ax = b multiple times with different b’s.
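A short sketch of that reuse pattern, assuming SciPy is available: `scipy.linalg.lu_factor` computes the (permuted) factorization once, and `lu_solve` applies it to each new right-hand side.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

lu, piv = lu_factor(A)  # factor once (PA = LU, with pivoting for stability)

# Reuse the factorization for many right-hand sides: each solve is cheap
for b in (np.array([9.0, 8.0]), np.array([1.0, 0.0])):
    x = lu_solve((lu, piv), b)
    assert np.allclose(A @ x, b)
```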
5) Cholesky Decomposition
Formula: A = L L^T (for symmetric, positive-definite A)
Intuition: a numerically stable and efficient factorization for covariance-like matrices — used in Gaussian processes and solving normal equations.
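As a quick numpy illustration (the 2×2 matrix is an arbitrary symmetric positive-definite example): factor once with `np.linalg.cholesky`, then solve A x = b with two cheap triangular solves.

```python
import numpy as np

# A symmetric positive-definite matrix (shaped like a covariance matrix)
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

L = np.linalg.cholesky(A)        # lower triangular, A = L L^T
print(np.allclose(A, L @ L.T))   # True

# Solve A x = b via forward then back substitution
b = np.array([2.0, 1.0])
y = np.linalg.solve(L, b)        # L y = b
x = np.linalg.solve(L.T, y)      # L^T x = y
print(np.allclose(A @ x, b))     # True
```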
Quick comparison (peekable table)
| Decomposition | Square/Rectangular | Requirements | Output Properties | Common ML Uses |
|---|---|---|---|---|
| SVD | Rectangular OK | None | U, Σ, V^T (orthonormal factors) | PCA, low-rank approximation, recommender systems |
| Eigen (EVD) | Square | Diagonalizable; symmetric ⇒ orthonormal Q | Eigenvectors, eigenvalues | PCA (via covariance), spectral methods |
| QR | Rectangular OK | None | Orthonormal Q, triangular R | Least squares, stable solves |
| LU | Square | Typically nonsingular | L, U (triangular) | Fast solves when reusing A |
| Cholesky | Square | Symmetric positive-definite | L (lower triangular) with L L^T | Gaussian processes, normal eqns |
How this builds on what you already learned
You learned about vectors and matrix multiplication — orthogonality, projections, and dot products. Decompositions exploit those exact concepts: SVD finds orthogonal directions (like basis vectors you studied) that explain most of the variance. QR uses orthonormal Q (remember orthogonality = numerical sanity). So this is the next logical step: from operating with matrices to interpreting them.
And from AI Foundations: when framing problems, decompositions tell you whether your data is low-rank (i.e., compressible), whether the optimization problem is well-conditioned, and whether a linear model is even appropriate or doomed.
Common misunderstandings (and why people get confused)
- "SVD = PCA" — Close, but PCA is specifically the eigendecomposition of the covariance matrix (or SVD of mean-centered data). They’re two sides of the same coin, not identical rituals.
- "More components = always better" — Nope. More components means more variance explained, but it can also mean more noise and overfitting.
- "Eigenvectors have fixed sign" — They don’t. Eigenvectors are determined up to sign; your result’s sign can flip and still be valid.
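The first point above can be checked numerically (the data below is synthetic, purely for the check): for centered data, the squared singular values divided by n − 1 equal the eigenvalues of the covariance matrix, which is exactly why the two routes to PCA agree.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
Xc = X - X.mean(axis=0)                  # center the data first

# Route 1: SVD of the centered data matrix
U, S, VT = np.linalg.svd(Xc, full_matrices=False)

# Route 2: eigendecomposition of the covariance matrix
cov = Xc.T @ Xc / (Xc.shape[0] - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # flip to descending order

# Same variances: sigma_i^2 / (n - 1) == lambda_i
print(np.allclose(S**2 / (Xc.shape[0] - 1), eigvals))  # True
```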
Practical tips & numerical realities
- Use SVD for rectangular or badly-conditioned matrices.
- Use Cholesky when you know the matrix is symmetric and PD — it's faster and more stable.
- For huge sparse matrices, use truncated/randomized SVD (scikit-learn, scipy.sparse.linalg.svds).
- Always center data for PCA unless you have a very good reason not to.
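A sketch of the truncated route with `scipy.sparse.linalg.svds` (demonstrated on a small dense matrix so we can compare against the full SVD; in practice you'd pass a large sparse matrix). One gotcha worth knowing: `svds` returns singular values in ascending order, the opposite of `np.linalg.svd`.

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
A = rng.normal(size=(500, 100))

k = 5
U, S, VT = svds(A, k=k)          # only the top-k triplets; S is ASCENDING
order = np.argsort(S)[::-1]      # re-sort descending, the usual convention
S, U, VT = S[order], U[:, order], VT[order, :]

# Sanity check against the top-k values of the dense SVD
S_full = np.linalg.svd(A, compute_uv=False)
print(np.allclose(S, S_full[:k]))  # True
```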
Expert take: "Decompositions are less about memorizing formulas and more about intuition — orthogonality, scaling, and structure."
Closing — TL;DR (and a motivational push)
Key takeaways:
- Matrix decomposition = diagnostics + compression. It reveals the axes that matter.
- SVD is your versatile friend: works for rectangular matrices, underpins PCA and low-rank approximations.
- EVD is for symmetric intuition: eigenvalues tell you how the transformation stretches space.
- QR / LU / Cholesky = numerical toolbox for solving equations and speeding up repeated solves.
Powerful insight: In machine learning, much of the "magic" is actually linear algebra. Decompositions let us reduce complexity, remove noise, and convert messy datasets into manageable, interpretable structure. If AI is a mystery novel, matrix decompositions hand you the magnifying glass.
Want next steps? Try this mini-exercise: take the Iris dataset, center it, compute SVD, plot variance explained vs k, and reconstruct the dataset from the top 2 singular values. Watch how much of the story survives — and what gets lost.
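If you want a head start on that exercise, here is one possible sketch (assuming scikit-learn is available for the dataset; variable names are ours):

```python
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data                 # 150 samples x 4 features
Xc = X - X.mean(axis=0)              # center before SVD, as always for PCA

U, S, VT = np.linalg.svd(Xc, full_matrices=False)

# Variance explained by each component: plot this against k
var_ratio = S**2 / np.sum(S**2)
print(var_ratio.round(3))

# Rank-2 reconstruction: how much of the story survives?
X2 = U[:, :2] @ np.diag(S[:2]) @ VT[:2, :]
err = np.linalg.norm(Xc - X2) / np.linalg.norm(Xc)
print(f"relative reconstruction error: {err:.3f}")
```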
Final zinger: Break matrices apart not to destroy them, but to understand them so well you can build better things from the pieces.