
Math for Machine Learning

Build the mathematical foundation in linear algebra, calculus, probability, and statistics for ML.

Vectors but Make It Vivid

Linear Algebra Vectors — Your AI Feature Swiss Army Knife

"If features are the ingredients, vectors are the recipe." — Probably someone who loved takeout and linear algebra

You're coming from "AI Foundations and Problem Framing," so you already know how to pick a problem, sketch an experiment, and keep humans in the loop. Now we move from what to solve to how machines represent the stuff they solve it with. That "stuff" is almost always a vector.


What is a vector (without the scary math voice)?

  • Intuitively: a vector is an ordered list of numbers. Think of it as a row of labeled drawers: each drawer holds a feature.
  • Formally: a vector is an element of R^n, an n-dimensional space. It has both a direction and a magnitude in that space.

Why we care in ML: Data points, model parameters, embeddings, gradients — all are vectors. When you flatten an image, embed a word, or store a user's profile, you're writing vectors.


Quick glossary (so we stop tripping over terms later)

  • Scalar: a single number (e.g., 3.14).
  • Vector: an array of numbers (e.g., [4, 0.5, -2]).
  • Matrix: a grid of numbers (collection of vectors).

Concept   Shape example   Role in ML
Scalar    1               Learning rate, loss value
Vector    (n,)            Single data point, weights, embedding
Matrix    (m, n)          Dataset (m samples × n features), weight layers
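
A minimal NumPy sketch of those three shapes (the values below are placeholders, not anything from a real dataset):

import numpy as np

learning_rate = 0.01                       # scalar: a single number
x = np.array([4.0, 0.5, -2.0])             # vector: shape (3,)
X = np.array([[4.0, 0.5, -2.0],
              [1.0, 3.0,  0.0]])           # matrix: shape (2, 3), i.e. 2 samples × 3 features

print(x.shape)   # (3,)
print(X.shape)   # (2, 3)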

Basic operations — the toolkit you actually use

1) Addition

Add feature-wise. Useful when aggregating: residual connections, offsets.

v = [1, 2, 3]
w = [0, -1, 4]
v + w = [1, 1, 7]

2) Scalar multiplication

Scale a whole vector by a number. Shrink or stretch magnitudes.

2 * [1, -1] = [2, -2]
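
In NumPy both operations are element-wise. A quick sketch (note that plain Python lists would concatenate under +, so we use arrays):

import numpy as np

v = np.array([1, 2, 3])
w = np.array([0, -1, 4])
print(v + w)                  # [1 1 7]   feature-wise addition
print(2 * np.array([1, -1]))  # [ 2 -2]   scalar multiplication stretches the vector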

3) Dot product (aka inner product)

This is where geometry meets algebra.

  • Formula: a · b = sum(a_i * b_i)
  • Intuition: measures alignment between vectors. A large positive dot product means they point in similar directions; zero means they're orthogonal; negative means they point in opposing directions.

Why it matters in ML:

  • Predicting with linear models: y = w · x + b
  • Similarity search in embeddings (cosine similarity is dot-product-based; see the sketch after this list)
  • Many algorithms (kernels, attention) rely on dot products.
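
To make that concrete, here's a small sketch with made-up numbers: a linear-model prediction and a cosine similarity, both built from dot products.

import numpy as np

x = np.array([2.0, -1.0, 3.0])   # a data point (made up)
w = np.array([0.5, 0.5, 0.0])    # weights (made up)
b = 0.1                          # bias

# linear model prediction: y = w · x + b
print('prediction:', w.dot(x) + b)   # 0.5*2 + 0.5*(-1) + 0*3 + 0.1 = 0.6

# cosine similarity: dot product of the two unit vectors
a = np.array([1.0, 0.0])
c = np.array([1.0, 1.0])
print('cosine similarity:', a.dot(c) / (np.linalg.norm(a) * np.linalg.norm(c)))  # ~0.707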

4) Norm (length)

||v|| = sqrt(v · v), the Euclidean (L2) norm. It tells you how "big" the vector is.

5) Projection

Projecting vector x onto vector u finds the component of x that points along u. Useful for decomposition, orthogonalization, and understanding explained variance.

Projection formula onto a unit vector u: proj_u(x) = (x · u) u
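
A quick sanity check of that formula: projecting x onto a unit vector u splits x into a piece along u and a leftover piece orthogonal to u (numbers chosen for easy arithmetic).

import numpy as np

x = np.array([3.0, 4.0])
u = np.array([1.0, 0.0])     # already unit length

proj = (x.dot(u)) * u        # component of x along u
residual = x - proj          # component of x orthogonal to u
print(proj)                  # [3. 0.]
print(residual)              # [0. 4.]
print(residual.dot(u))       # 0.0, the residual is perpendicular to u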


Geometric intuition (yes, please)

Imagine 2D: vectors are arrows on the floor. The dot product tells you whether arrows point similarly (acute angle) or oppositely (obtuse). Norm is arrow length. Projection is dropping a shadow of one arrow onto another.

In higher dimensions (say 300 for word embeddings), your brain wants pizza; your code wants vectors. Geometry still holds but in a space your imagination can't visit.
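
If you want to see the angle story in code, this sketch (with made-up 2D arrows) recovers the angle from the dot product:

import numpy as np

def angle_deg(u, v):
    # cosine of the angle, via the dot product and the two norms
    cos_theta = u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

a = np.array([2.0, 1.0])
b = np.array([1.0, 3.0])     # roughly the same direction as a
c = np.array([-2.0, -1.0])   # exactly opposite to a

print(angle_deg(a, b))   # ~45  (acute: the arrows broadly agree)
print(angle_deg(a, c))   # ~180 (they point in opposite directions)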


Vector examples in ML — make it concrete

  1. Image: a 28×28 grayscale image, flattened into a vector of length 784 (see the sketch after this list). Each pixel is a feature.
  2. User profile: [age, #purchases, avg_rating, last_login_days] — small vector.
  3. Word embedding: word2vec/GloVe give 50–300 dimensional vectors capturing semantics.
  4. Model weights: layer weights are matrices; a single neuron's weights are a vector.
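
Here are the first two examples as a quick NumPy sketch (the pixel values and profile numbers are placeholders):

import numpy as np

image = np.random.rand(28, 28)    # a fake 28×28 grayscale "image"
flat = image.reshape(-1)          # flatten it into one long vector
print(flat.shape)                 # (784,)

user = np.array([34.0, 12.0, 4.5, 3.0])   # [age, #purchases, avg_rating, last_login_days]
print(user.shape)                          # (4,)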

Why this matters for problem framing:

  • When you framed your ML problem, you decided what to measure. Those measurements are the components of your vectors.
  • When designing experiments, documenting preprocessing (normalization, imputation) matters because it changes vector values, and thus model behavior (a quick sketch follows this list).
  • When keeping humans in the loop, remember: humans alter labels and features; that changes vectors and the downstream geometry.
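
For instance, a standard z-score step rewrites every component, so the "same" data point becomes a different vector (all numbers below are hypothetical):

import numpy as np

x = np.array([170.0, 65.0, 30.0])      # e.g. [height_cm, weight_kg, age]
mean = np.array([165.0, 70.0, 35.0])   # training-set means (hypothetical)
std = np.array([10.0, 15.0, 12.0])     # training-set standard deviations (hypothetical)

x_scaled = (x - mean) / std            # z-score normalization, applied feature-wise
print(x_scaled)                        # ~[ 0.5 -0.333 -0.417], a different vector than x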

Little NumPy demo (because code is therapy)

import numpy as np
x = np.array([2.0, -1.0, 3.0])
w = np.array([0.5, 0.5, 0.0])

# dot product
print('w·x =', w.dot(x))

# norm
print('||x|| =', np.linalg.norm(x))

# projection of x onto w (make w unit first)
w_unit = w / np.linalg.norm(w)
proj = (x.dot(w_unit)) * w_unit
print('proj =', proj)

Run that, feel smarter.


Vector spaces, basis, independence — why they sneak into ML

  • Basis: a minimal set of vectors that can express any vector in the space. Features act like basis components — if they're redundant, you're wasting capacity.
  • Linear independence: no feature should be a perfect linear combination of the others (multicollinearity). Otherwise your models get confused and coefficients blow up (see the rank check after this list).
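
A quick way to spot that kind of redundancy is a rank check. In this made-up feature matrix the third column is exactly 2*f1 + f2:

import numpy as np

X = np.array([[1.0, 2.0, 4.0],
              [2.0, 0.0, 4.0],
              [3.0, 1.0, 7.0],
              [0.0, 5.0, 5.0]])   # column 3 = 2*column 1 + column 2

print(np.linalg.matrix_rank(X))   # 2, not 3: the columns are linearly dependent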

Practical ML takeaways:

  • Feature engineering is literally choosing a basis.
  • PCA finds a new orthogonal basis ordered by explained variance, which is good for compression and denoising (sketched below).
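
Here's a minimal PCA sketch using the SVD on synthetic data (the data, and the nearly-duplicated third feature, are made up just to show the idea):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)   # third feature nearly duplicates the first

Xc = X - X.mean(axis=0)                          # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
print(S**2 / np.sum(S**2))                       # fraction of variance per direction

Z = Xc @ Vt[:2].T                                # project onto the top-2 principal directions
print(Z.shape)                                   # (100, 2): compressed, most variance kept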

Common misunderstandings (you aren't alone)

  • "Vectors are only for math nerds." No — they're the universal data format for ML.
  • "Higher-dimensional vectors are 'more powerful'." Not automatically. Curse of dimensionality exists. More dims can mean more noise.
  • "Dot product = multiplication" — yes and no. It's coordinate-wise multiply-and-sum, which encodes similarity, not just number multiplication.

How this ties back to your earlier course modules

  • When you framed your problem, you implicitly chose which vector elements (features) to keep. Good framing = good vectors.
  • Documentation practices become critical: changing normalization or tokenization changes vector values. Log them.
  • Human-in-the-loop systems often alter vectors (e.g., annotators add categorical labels that get one-hot encoded; a minimal sketch follows this list); track those transformations for reproducibility.
  • When reading research, watch for how authors construct vectors (embedding size, normalization, PCA) — small choices change results.
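
As a tiny illustration of the one-hot step mentioned above, here's a hand-rolled encoding (the labels are made up; real pipelines typically use a library encoder):

import numpy as np

labels = ['cat', 'dog', 'cat', 'bird']   # hypothetical annotator labels
vocab = sorted(set(labels))              # ['bird', 'cat', 'dog']
one_hot = np.array([[1.0 if lab == v else 0.0 for v in vocab] for lab in labels])
print(one_hot)
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]]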

Final bite-sized takeaways (slam them on a sticky note)

  • Vectors = the lingua franca of ML. Every data point, parameter, and signal is a vector.
  • Dot products measure alignment; norms measure size; projections decompose. These are the mental tools for reasoning about models.
  • Good feature design = good vector design. Choose informative, non-redundant components.
  • Document preprocessing. Small changes to vector construction change downstream behavior.

Next up: matrices and linear transformations — we'll turn these vectors into functions, layers, and the machines that actually learn.
