Math for Machine Learning
Build a mathematical foundation in linear algebra, calculus, probability, and statistics for ML.
Matrices and Operations — The Matrix: Less Hollywood, More Homework (but just as dramatic)
"If vectors are arrows, then matrices are arrows that run a very organized corporate board meeting." — Your slightly unhinged TA
You already met vectors in the previous Linear Algebra module: lists of numbers that point, scale, and let us describe features. Now we level up. Matrices are the workhorses of ML: they store datasets, hold model weights, and represent linear transformations. And since you've also read about problem framing and documentation practices, think of matrices as the rigorous notes that keep your experiments reproducible and your collaborators less confused.
What is a matrix (without the textbook sleep-inducing language)?
- Definition (short): A matrix is a rectangular array of numbers organized in rows and columns.
- Notation: Usually denoted by capital letters like A, W, X. Shape is written as m × n (m rows, n columns).
Imagine a spreadsheet where each row is an example and each column is a feature. That spreadsheet? That's your dataset matrix X (often).
Why matrices matter in ML (practical sense)
- Datasets: X (n_samples × n_features)
- Mini-batches: smaller matrices for SGD
- Model weights: e.g., a fully connected layer has a weight matrix W
- Transformations: multiplying a vector by a matrix rotates/scales it — exactly what linear layers do
When your code breaks with "shapes not aligned", it’s not dramatic irony — it’s a matrix telling you to check your documentation.
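Here is a minimal sketch of that exact failure mode (the arrays are made up for illustration): multiplying a (2, 3) matrix by a length-2 vector raises the dreaded shape error, while a length-3 vector works because the inner dimensions match.

```python
import numpy as np

A = np.ones((2, 3))   # 2 rows, 3 columns
v = np.ones(2)        # length-2 vector

try:
    A @ v             # inner dimensions 3 vs 2 -> mismatch
except ValueError as e:
    print("shape error:", e)

w = np.ones(3)        # length matches A's column count
out = A @ w           # works: result has shape (2,)
print(out.shape)
```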
Core matrix operations — the toolbox (with friendly examples)
1) Shape and indexing
- A matrix A with shape (m, n) has m rows and n columns. Python (NumPy) example:
```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3)
A.shape   # (2, 3)
A[0, 2]   # 3 (row 0, column 2)
```
2) Element-wise addition and scalar multiplication
- Addition: A + B requires identical shapes. Think: adding two spreadsheets cell-by-cell.
- Scalar multiply: 2 * A multiplies every element by 2.
```python
B = np.ones_like(A)  # same shape as A, filled with ones
A + B                # element-wise sum
2 * A                # every element doubled
```
3) Transpose (A^T)
- Flips rows and columns: shape (m,n) -> (n,m).
- Useful when you need to switch between examples-as-rows and examples-as-columns.
```python
A.T  # transpose: shape (2, 3) -> (3, 2)
```
4) Dot product and matrix multiplication (the big one)
- Vector dot: u · v gives a scalar (if same length).
- Matrix multiplication: A (m × k) @ B (k × n) -> result (m × n).
This is how linear layers compute outputs: y = X @ W + b.
```python
X = np.array([[1, 2],
              [3, 4]])  # shape (2, 2)
W = np.array([[1],
              [0]])     # shape (2, 1)
Y = X @ W               # shape (2, 1)
```
Common error: shapes not aligning. Always check inner dimensions match.
5) Identity matrix and inverses
- Identity I_n acts like 1 for matrices: I @ A = A.
- Inverse A^{-1} exists only for square, full-rank matrices. A @ A^{-1} = I.
```python
I = np.eye(3)                          # 3x3 identity
np.linalg.inv(np.array([[1.0, 2.0],
                        [3.0, 4.0]]))  # invertible: determinant is -2, not 0
```
6) Determinant and rank (diagnostics)
- Determinant: scalar giving volume scaling; zero determinant => matrix not invertible.
- Rank: number of independent rows/columns.
```python
M = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # determinant requires a square matrix
np.linalg.det(M)             # -2.0; nonzero => invertible
np.linalg.matrix_rank(A)     # 2: both rows of the (2, 3) matrix A are independent
```
Note: `np.linalg.det` only accepts square matrices, so we use a square `M` here; `matrix_rank` works on any shape.
Quick reference table — Operations and ML intuition
| Operation | Notation | When you see it in ML | Intuition |
|---|---|---|---|
| Matrix multiply | A @ B | Forward pass, linear transforms | Apply linear transformation to data |
| Transpose | A^T | Covariance, gradients | Switch rows↔columns, change perspective |
| Inverse | A^{-1} | Solving linear systems (rare in large ML) | Undo transformation |
| Determinant | det(A) | Sometimes in probabilistic models | Volume scaling of transformation |
| Rank | rank(A) | Dataset redundancy | Number of independent features |
Real-world examples & analogies (because metaphors stick)
- Dataset matrix X (n × d): each row is a student, each column is a quiz. Multiply by weight vector w (d × 1) to get predicted scores.
- Matrix as a function: A maps input vectors to outputs. Think of A as a machine that takes ingredients (input) and produces cookies (output) — different machines produce different cookies.
- Rank: If the rank of your dataset matrix is low, some features are redundant — like bringing two identical DJs to a party.
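The students-and-quizzes analogy above translates directly into code. This is a toy sketch with invented numbers: three students, two quizzes, and a weight vector saying quiz 2 counts for 60% of the grade.

```python
import numpy as np

# Hypothetical dataset: 3 students (rows) x 2 quizzes (columns)
X = np.array([[80.0, 90.0],
              [60.0, 70.0],
              [95.0, 85.0]])
w = np.array([[0.4],
              [0.6]])        # weight per quiz, shape (2, 1)

scores = X @ w               # predicted scores, shape (3, 1)
print(scores.ravel())        # one score per student

# Redundant features lower the rank: duplicate a column
X_dup = np.hstack([X, X[:, :1]])       # third column copies the first
print(np.linalg.matrix_rank(X_dup))    # still 2 -- the duplicate DJ adds nothing
```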
Numerical stability and pragmatic notes (from experiments & docs)
- Avoid computing explicit inverses when working with large models. Use linear solvers (e.g., np.linalg.solve) or iterative methods. It's more stable and faster.
- Always log matrix shapes, especially in experiments. Clear shape annotation in your notebooks and docs saves future you and your collaborators a week of debugging.
- Use regularization when matrices get close to singular (near-zero determinant).
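To make the "solver over inverse" advice concrete, here is a small sketch (the system is made up) showing that `np.linalg.solve` gives the same answer as multiplying by the inverse, without ever forming the inverse:

```python
import numpy as np

# Solve A x = b without forming A^{-1} explicitly
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x_solve = np.linalg.solve(A, b)   # preferred: more stable and faster
x_inv = np.linalg.inv(A) @ b      # works here, but avoid for large systems
print(np.allclose(x_solve, x_inv))
```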
Documentation practice: write a one-liner in your experiment README: "X shape = (N, D); W shape = (D, C); output shape = (N, C)." Your future collaborators will worship you.
Quick computational checklist (when you build a model)
- Confirm X shape: (n_samples, n_features)
- Confirm weight shape for classic linear layer: (n_features, n_outputs)
- Use X @ W + b; ensure b broadcasts correctly (shape (n_outputs,)).
- If something fails, print shapes. If still failing, read documentation then email a colleague with shapes attached.
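The checklist above can be sketched end-to-end. All dimensions here are invented for illustration; the point is the shape bookkeeping and the bias broadcast:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_outputs = 4, 3, 2

X = rng.normal(size=(n_samples, n_features))   # (n_samples, n_features)
W = rng.normal(size=(n_features, n_outputs))   # (n_features, n_outputs)
b = np.zeros(n_outputs)                        # (n_outputs,) broadcasts over rows

Y = X @ W + b
print(X.shape, "@", W.shape, "->", Y.shape)    # the shape log that saves debugging time
```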
Common misunderstandings (and how to avoid them)
- "Why not just invert W?" — For large matrices it's expensive and numerically unstable. Prefer solvers or gradient-based methods.
- "Transpose vs inverse" — They are not the same. Transpose flips axes; inverse undoes a transformation.
- "Element-wise * vs matrix multiply @" — * is element-wise in NumPy; @ is linear algebra multiply. Mix these up and you get subtle bugs.
Closing — TL;DR + next steps
- Matrices = organized tables that encode linear transformations.
- Operations: addition, scalar multiplication, transpose, dot/matrix multiply, inverse, determinant, rank.
- In ML, matrices represent datasets (X), weights (W), and transformations (A). Always mind shapes.
Key practice: In your next experiment notebook, add a tiny header block showing the shapes of major arrays and a one-line comment why each is that shape. It's boring, but it prevents chaos.
Final thought: if vectors are the compass arrows showing direction, matrices are the map. Learn to read the map well — it's how you go from "this might work" to "this actually trains."
Next up: eigenvalues and eigenvectors — the secret sauce behind PCA and why some directions in data matter more than others. Spoiler: it's about finding the loudest DJs in the feature party.