Fundamentals of Machine Learning
Understand the core principles of machine learning, a subset of AI, and how it enables computers to learn from data.
Key Algorithms in Machine Learning — The Cheat Sheet That Feels Like a Friend
"Algorithms are just recipes, but with more math and fewer cookies." — Probably me, in sweatpants
You're not looking for philosophy; you're looking for tools. We already covered what machine learning is in "Introduction to Artificial Intelligence" and peeked into unsupervised learning (Position 3) and reinforcement learning (Position 4). Now it's time to open the toolbox and figure out which shiny thing to use when your data throws a tantrum.
Why this matters (short & honest)
When someone asks "Which algorithm should I use?" they're really asking: What do I want to predict, how much data do I have, and how interpretable does the result need to be? Choosing an algorithm poorly is like using a sledgehammer to scrape peanut butter off toast. You'll get the job done but at what cost?
This guide gives you the mental map: what each key algorithm does, when it's good, when it sucks, and a one-line metaphor so you can pick with confidence.
The main cast (supervised & core unsupervised references)
We'll focus on algorithms you'll meet again and again. Quick shout-out: we touched on k-means and PCA in the unsupervised section — I'll reference them briefly to show how they contrast.
1) Linear Regression — "The Straight-Talkin' Predictor"
- Use when: Target is continuous (price, temperature), relationship roughly linear.
- Idea: Fit a line y = mx + b (or hyperplane for many features).
- Pros: Simple, interpretable, fast. Great baseline.
- Cons: Breaks on nonlinearities, sensitive to outliers.
- When to pick: You want explainability or a baseline.
Analogy: You're fitting a straight rail through a messy crowd and hoping people mostly line up behind it.
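To make the "fit a line" idea concrete, here's a tiny sketch using NumPy's least-squares solver on invented points that lie roughly on y = 2x + 1 (the data and numbers are made up for illustration):

```python
import numpy as np

# made-up points scattered around the line y = 2x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# add an intercept column, then solve for [b, m] by least squares
A = np.hstack([np.ones_like(X), X])
(b, m), *_ = np.linalg.lstsq(A, y, rcond=None)
print(round(m, 1), round(b, 1))  # slope and intercept land near 2 and 1
```

That one `lstsq` call is the whole algorithm; everything else in practice is feature preparation.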
2) Logistic Regression — "Linear Model with a Yes/No Attitude"
- Use when: Binary classification (spam/not spam, churn/keep).
- Idea: Linear combination passed through a sigmoid to predict probability.
- Pros: Interpretable probabilities, fast, baseline for classification.
- Cons: Not great with complex decision boundaries.
Quick note: Despite the name, it's for classification, not regression. Naming conventions: the villain of many students' lives.
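A quick sketch of the sigmoid-probability idea using scikit-learn's `LogisticRegression` on invented 1-D data (the values are illustrative, not from any real dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# toy 1-D feature: small values belong to class 0, large values to class 1
X = np.array([[0.5], [1.0], [1.5], [3.5], [4.0], [4.5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
# predict_proba gives a probability, not just a hard yes/no label
p = clf.predict_proba([[4.2]])[0, 1]  # P(class 1) for a new point at 4.2
```

The probability output is exactly why this "regression" makes a fine classifier: threshold it at 0.5 and you get a label, keep it raw and you get a confidence.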
3) k-Nearest Neighbors (k-NN) — "The Neighborly Classifier"
- Use when: Small datasets, intuitive boundaries, or you want a lazy algorithm that memorizes.
- Idea: For a new point, look at k closest training points; majority vote (classification) or average (regression).
- Pros: Simple, no training time (just store data). Non-parametric: models arbitrary boundaries.
- Cons: Slow at inference for large data, sensitive to feature scales.
Python-flavored version of that idea (training data passed in explicitly, since k-NN "training" is just storing it):

from collections import Counter

def predict(x_new, k, X_train, y_train):
    # squared Euclidean distance from x_new to every stored training point
    dists = [sum((a - b) ** 2 for a, b in zip(x, x_new)) for x in X_train]
    # indices of the k closest points; majority label wins
    nearest = sorted(range(len(X_train)), key=lambda i: dists[i])[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]
Analogy: Asking your neighbors for advice — great in small towns, catastrophic in megacities.
4) Decision Trees — "If-This-Then-That, but Make It Recursive"
- Use when: You want interpretability and non-linear splits.
- Idea: Split features into branches that best separate targets (e.g., "age > 30?").
- Pros: Human-readable rules, handles mixed data types.
- Cons: Prone to overfitting, unstable (small data changes the tree a lot).
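To see the "human-readable rules" claim in action, here's a minimal sketch with scikit-learn's `DecisionTreeClassifier` on made-up ages, echoing the "age > 30?" example above:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# toy data: the class flips once the single "age" feature passes ~30
X = np.array([[22], [25], [28], [35], [40], [45]])
y = np.array([0, 0, 0, 1, 1, 1])

tree = DecisionTreeClassifier(max_depth=1).fit(X, y)
# export_text prints the learned splits as plain if/else rules
rules = export_text(tree, feature_names=["age"])
print(rules)
```

The printed rules read like a flowchart, which is the whole interpretability appeal; the overfitting risk shows up when you let `max_depth` grow unchecked.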
Decision trees are the base ingredient for the next algorithm...
5) Random Forests — "A Committee of Trees (Smarter Together)"
- Use when: You want strong performance with less tuning and still some interpretability (feature importance).
- Idea: Train many decision trees on bootstrapped samples and average/vote.
- Pros: Robust, less overfitting than single trees, handles many problems well.
- Cons: Less interpretable than a single tree, bigger memory.
Analogy: Instead of trusting one dramatic friend, you poll the whole friend group.
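A minimal sketch of the committee idea with scikit-learn's `RandomForestClassifier` on synthetic data; `feature_importances_` is the "some interpretability" mentioned above (all dataset parameters here are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# synthetic tabular data with a fixed seed for repeatability
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# 100 trees, each trained on a bootstrap sample; predictions are averaged
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_  # one score per feature, sums to 1
```

Polling the importances tells you which columns the forest actually leaned on, which is often the most useful output after the predictions themselves.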
6) Support Vector Machines (SVM) — "Margin Maximizers"
- Use when: Medium-sized datasets, clear margins between classes, or you want a powerful boundary with kernels.
- Idea: Find the boundary that maximizes the margin between classes; use kernels to be nonlinear.
- Pros: Effective in high dimensions, robust to overfitting when the kernel and regularization strength C are chosen well.
- Cons: Slow on large datasets, and kernel selection can be confusing.
Think of SVM as drawing the widest possible moat between kingdoms.
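One way to watch the kernel trick pay off: concentric circles are hopeless for a linear boundary but trivial for an RBF kernel. A sketch with scikit-learn (synthetic data, illustrative settings):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# two concentric rings: no straight line can separate them
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)  # stuck near coin-flip accuracy
rbf = SVC(kernel="rbf").fit(X, y)        # the kernel bends the moat into a ring
```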
7) Gradient Boosting Machines (XGBoost, LightGBM, CatBoost) — "The Performance Obsession"
- Use when: You want top-tier performance on structured/tabular data.
- Idea: Sequentially train weak learners (usually trees), each one fixing previous errors.
- Pros: State-of-the-art for many tabular tasks, can handle missing values, flexible.
- Cons: More hyperparameters, risk of overfitting if misused.
If Random Forest is a committee, boosting is the bootcamp: each new recruit corrects the last recruit’s mistakes.
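A sketch of that sequential error-correction using scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost/LightGBM/CatBoost (same core idea, fewer dependencies; all settings below are illustrative, not tuned):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# each new shallow tree is fit to the errors the ensemble still makes
boost = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                   max_depth=3, random_state=0).fit(X, y)
acc = boost.score(X, y)
```

Note `n_estimators`, `learning_rate`, and `max_depth` are exactly the "more hyperparameters" the cons list warns about: they interact, and tuning them is where boosting earns (or loses) its reputation.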
8) Neural Networks (Deep Learning) — "The Universal Approximation Party"
- Use when: Lots of data, complex patterns (images, text, audio).
- Idea: Layers of neurons (linear + nonlinear activations) learn hierarchical features.
- Pros: Extremely flexible and powerful for unstructured data.
- Cons: Data-hungry, hard to interpret, needs compute.
We built the conceptual foundation for neural nets back in the "Introduction to AI" module; treat them like powerful, noisy Swiss Army knives.
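A minimal sketch of a small feed-forward network using scikit-learn's `MLPClassifier` on the classic two-moons toy problem (layer sizes and iteration count are arbitrary choices for illustration, not recommendations):

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

# two interleaving half-moons: a curved pattern no linear model can separate
X, y = make_moons(n_samples=300, noise=0.1, random_state=0)

# two hidden layers of ReLU units learn the curved boundary
net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000,
                    random_state=0).fit(X, y)
```

Even this tiny net needs hundreds of gradient steps to converge, which hints at the compute appetite of its much larger deep-learning cousins.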
9) k-Means & PCA (unsupervised tie-in reminders)
- k-Means: Partition data into k clusters — great for quick segmentation (we covered this in Position 3).
- PCA: Dimensionality reduction — compress features while retaining variance.
Use PCA before k-NN or SVM if the dimensionality makes neighbors unreliable or training slow.
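That "PCA before k-NN" advice can be written as a scikit-learn pipeline; the synthetic dataset below has 50 features of which only 5 carry signal (all numbers are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# 50 features, mostly noise: raw distances between points become unreliable
X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           random_state=0)

# compress to 5 components first, then let k-NN vote in the smaller space
model = make_pipeline(PCA(n_components=5), KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)
```

Bundling the two steps in a pipeline also means the PCA projection learned on training data is applied consistently to anything you later predict on.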
Quick comparison table (TL;DR)
| Algorithm | Good for | Pros | Cons |
|---|---|---|---|
| Linear Regression | Regression | Simple, interpretable | Can't model nonlinearities |
| Logistic Regression | Binary classification | Probabilities, baseline | Linear boundaries |
| k-NN | Intuitive classification | No training, flexible | Slow inference, needs scaled features |
| Decision Trees | Interpretable rules | Handles mixed data | Overfits easily |
| Random Forest | General-purpose | Robust, less overfit | Larger, less interpretable |
| SVM | Margin-based classification | Effective in high-dim | Slow on large n |
| Gradient Boosting | Tabular data | State-of-the-art | Many hyperparams |
| Neural Networks | Images, text, audio | Extremely flexible | Data & compute hungry |
How to choose, step-by-step
- Define your task: regression, binary/multiclass classification, clustering.
- Start simple: linear/logistic or decision tree baseline.
- Consider data size: small → k-NN, SVM; large → boosting, neural nets.
- Consider interpretability: if needed, prefer linear models or trees.
- If unsure, try Random Forest or XGBoost as a strong baseline for tabular data.
Ask yourself: "Do I care about explanation or just accuracy?" That single question saves hours of tuning.
Parting insight (the truth bomb)
No algorithm is magic. The real power is in understanding your data: features, noise, and the problem framing. Algorithms are tools — beautiful, weird tools — but they obey the garbage-in-garbage-out rule.
Next step: Try a simple pipeline — clean data, baseline model (logistic/linear), evaluate, then iterate with a stronger algorithm (random forest or XGBoost). Revisit unsupervised tools (k-means/PCA) if feature engineering could help.
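That loop might look like this in scikit-learn, with synthetic data standing in for your cleaned dataset (which model wins depends entirely on the data, so no winner is claimed here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# stand-in for a cleaned dataset; swap in your own X and y
X, y = make_classification(n_samples=400, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# step 1: a simple, interpretable baseline
baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# step 2: a stronger, less interpretable challenger
stronger = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

base_acc = baseline.score(X_te, y_te)
rf_acc = stronger.score(X_te, y_te)
print(f"baseline: {base_acc:.2f}, random forest: {rf_acc:.2f}")
```

Comparing the two held-out scores (and asking why they differ) is the iteration step; if the gap is small, the baseline's interpretability may be worth keeping.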
Key takeaways
- Start simple. Baselines teach you the landscape.
- Match algorithm to data and goal. Accuracy vs. interpretability is a trade-off.
- Tree ensembles and boosting are your go-to for tabular data. Neural nets for unstructured data.
- Use unsupervised techniques (from Position 3) for feature insight and compression. Use reinforcement insights (Position 4) when outcomes depend on sequential decisions.
Want a tiny homework challenge? Pick a dataset (e.g., housing prices or Titanic), train a linear model, then a random forest. Compare errors and explain why one beats the other.
One last note: We're building a mental map, not memorizing a menu. Once you can explain why an algorithm fails on your data, you actually understand it.