Foundations of Supervised Learning
Core concepts, goals, trade-offs, and terminology that underpin regression and classification.
Underfitting and Overfitting — The Goldilocks Problem of Supervised Learning
"Models are like roommates: too simple, they do nothing; too complicated, they throw wild parties in your dataset. You want the one who cleans the dishes and respects your privacy." — Your slightly dramatic ML TA
Hook: Why this matters (and why your model might secretly be trash)
You trained a model; it got 99% accuracy on the training data and then tanked on new examples. Or maybe it can't even fit the training set well and performs poorly everywhere. Both are signs of poor generalization. In the previous sections we already met two important neighbors of this problem:
- Inputs, Targets, and Hypothesis Space — which taught us that the capacity of the hypothesis space determines what functions the model can represent.
- Bias–Variance Trade-off — which showed that expected error decomposes into bias, variance, and irreducible noise.
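For reference, that decomposition (with \sigma^2 denoting the irreducible noise) reads:

\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\mathrm{Bias}[\hat{f}(x)]^2}_{\text{underfitting}} + \underbrace{\mathrm{Var}[\hat{f}(x)]}_{\text{overfitting}} + \sigma^2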
Underfitting and overfitting are the everyday faces of those theoretical ideas. Let's make them uncomfortably practical.
Definitions (crisp, like a scalpel)
Underfitting: The model is too simple for the underlying pattern in the data. It can't achieve low training error. This is high bias. Think: a linear model trying to fit a spiral.
Overfitting: The model is too flexible and learns noise or idiosyncrasies in the training set. Training error is low, but validation/test error is high. This is high variance. Think: a 50th-degree polynomial on 20 points.
Under- vs. over-fitting is basically a struggle between "I won't learn enough" and "I'll learn everything, including the weird stuff." Balance is the goal.
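To see both failure modes side by side, here's a minimal sketch using polynomial degree as the capacity knob; the cubic ground truth, noise level, and sample sizes are illustrative assumptions:

```python
# Minimal sketch: polynomial degree as the capacity knob (illustrative setup).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))                              # small training set
y = X[:, 0] ** 3 - 2 * X[:, 0] + rng.normal(scale=2.0, size=30)   # noisy cubic truth
X_val = rng.uniform(-3, 3, size=(200, 1))
y_val = X_val[:, 0] ** 3 - 2 * X_val[:, 0] + rng.normal(scale=2.0, size=200)

for degree in (1, 3, 15):   # too simple, about right, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
    print(f"degree {degree:2d}: "
          f"train MSE={mean_squared_error(y, model.predict(X)):6.2f}, "
          f"val MSE={mean_squared_error(y_val, model.predict(X_val)):6.2f}")
```

Typically, degree 1 is bad on both sets (underfit), degree 3 lands close on both, and degree 15 looks great on training while doing worse on validation (overfit).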
Diagnostics: How to tell what’s going wrong
Look at training vs validation/test error. Patterns tell stories (a tiny triage helper in code follows this list):
- Training error high, validation error ≈ training error → Underfitting.
- Training error low, validation error high → Overfitting.
- Both low → Success.
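That triage is simple enough to codify; in this tiny helper the thresholds are illustrative assumptions, not universal constants:

```python
def diagnose(train_err, val_err, high=0.10, gap=0.05):
    """Rough triage from train/validation error (thresholds are illustrative)."""
    if train_err > high:            # can't even fit the training set
        return "underfitting (high bias)"
    if val_err - train_err > gap:   # fits training, doesn't transfer
        return "overfitting (high variance)"
    return "looks healthy: both errors low and close"

print(diagnose(train_err=0.02, val_err=0.30))   # -> overfitting (high variance)
```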
Learning curves (visual rule-of-thumb)
Plot error vs number of training examples for both training and validation sets.
Typical shapes:
- Underfitting: both errors high and converge.
- Overfitting: training error low, validation error high; validation error often decreases as more data is added, because more data reduces variance.
ASCII sketch:

```
Error
  |
  |  Underfit:  validation ----\______
  |             training   ----/         <- both high, converge quickly
  |
  |  Overfit:   validation \
  |                         \_________   <- high, falls as data grows
  |             training   ___________   <- low from the start
  +------------------------------------> Training set size
```
Causes (the rogues' gallery)
- Hypothesis space capacity too small (e.g., linear model for nonlinear reality) → Underfit.
- Hypothesis space capacity too large without constraints (e.g., deep trees, high-degree polynomials) → Overfit.
- Too few training examples → makes complex models overfit easily.
- Noisy labels or features → amplifies overfitting risk.
- Poor feature engineering: irrelevant features add variance; missing important features adds bias.
Remember: capacity comes from model architecture, feature transformations, and hyperparameters (e.g., tree depth, number of neurons, polynomial degree).
Fixes: From blunt instruments to surgical strikes
For underfitting (increase flexibility / reduce bias)
- Use a richer hypothesis space: increase polynomial degree, add layers/neurons, or use a more expressive model.
- Add relevant features / do feature engineering.
- Reduce regularization (lower λ); see the sketch after this list.
- Train longer if optimization isn't converged.
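As a quick illustration of the regularization knob from the list (scikit-learn's alpha plays the role of λ; the data and values here are illustrative):

```python
# Minimal sketch: heavy regularization can cause underfitting (illustrative values).
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=5, noise=5.0, random_state=0)
for alpha in (100.0, 1.0, 0.01):   # heavy -> light regularization
    score = cross_val_score(Ridge(alpha=alpha), X, y, cv=5).mean()
    print(f"alpha={alpha:>6}: mean CV R^2 = {score:.3f}")
```

On data like this, the heavily regularized model typically scores worst: it has been constrained into underfitting.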
For overfitting (decrease variance / add constraints)
- Regularization: L2 (Ridge), L1 (Lasso); both penalize large weights (sketched in code below).
- Ridge objective (MSE + L2 penalty):
  J(w) = \frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \|w\|_2^2
- Lasso uses the \lambda \|w\|_1 penalty instead and can produce sparse solutions (feature selection).
- Get more data (if possible); often the cleanest solution.
- Reduce model capacity: prune trees, reduce polynomial degree, decrease network size.
- Early stopping on validation error during training (common in neural nets).
- Use dropout, batch normalization, or other model-specific techniques.
- Ensemble methods (bagging reduces variance; boosting trades bias/variance differently).
Pro tip: Regularization is basically telling the model: "Less drama, please." It trades a bit of training fit for more real-world sanity.
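Here's the Ridge/Lasso sketch promised in the list above; the synthetic data is an illustrative assumption, and scikit-learn's alpha stands in for λ:

```python
# Minimal sketch: Ridge shrinks weights, Lasso zeroes some out (illustrative data).
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
w_true = np.array([3.0, -2.0] + [0.0] * 8)         # only 2 informative features
y = X @ w_true + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)                 # shrinks all weights toward zero
lasso = Lasso(alpha=0.1).fit(X, y)                 # drives some weights to exactly zero
print("Ridge coefs:", np.round(ridge.coef_, 2))
print("Lasso coefs:", np.round(lasso.coef_, 2))    # expect sparsity in the last 8
```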
Concrete examples (so it stops being abstract)
Linear regression vs polynomial regression: If the true relationship is quadratic and you use linear, you'll underfit. Use a high-degree polynomial and you'll overfit to noise.
Decision Trees: Small max_depth → underfit. Huge max_depth → overfit (memorizes training points in its leaves). Random Forest (bagging) reduces variance. See the sketch after these examples.
Neural Networks: Tiny network → underfit. Massive network with no regularization and little data → overfit.
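And the tree sketch referenced above; the dataset, depths, and forest size are illustrative choices:

```python
# Minimal sketch: tree depth as a capacity knob, plus bagging (illustrative setup).
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [
    ("stump (underfit)", DecisionTreeClassifier(max_depth=1)),
    ("deep tree (overfit)", DecisionTreeClassifier(max_depth=None, random_state=0)),
    ("random forest", RandomForestClassifier(n_estimators=200, random_state=0)),
]:
    model.fit(X_tr, y_tr)
    print(f"{name:22s} train={model.score(X_tr, y_tr):.2f}  test={model.score(X_te, y_te):.2f}")
```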
Practical checklist for model debugging
- Plot train and validation errors. Which pattern matches?
- If underfitting: increase complexity or features; check optimization.
- If overfitting: add regularization, collect more data, or reduce complexity.
- Use cross-validation to pick hyperparameters (e.g., λ, tree depth); a sketch follows this checklist.
- Examine residuals: structured residuals point to bias (a missed pattern); unstructured residuals suggest the remaining error is mostly noise, whether label noise or irreducible error.
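Here's the cross-validation sketch promised in the checklist; the data and the alpha grid are illustrative assumptions:

```python
# Minimal sketch: pick a regularization strength by cross-validation (illustrative).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)
param_grid = {"alpha": np.logspace(-3, 3, 13)}   # candidate lambda values
search = GridSearchCV(Ridge(), param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print("best alpha:", search.best_params_["alpha"])
```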
Quick scikit-learn snippet for diagnosing via learning curves (assumes model, X, y from your own pipeline)

```python
from sklearn.model_selection import learning_curve

train_sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=5, scoring="neg_mean_squared_error")
train_err, val_err = -train_scores.mean(axis=1), -val_scores.mean(axis=1)
# plot train_err and val_err vs train_sizes; compare with the sketch above
```
A short table: Underfit vs Overfit quick reference
| Diagnosis | Training Error | Validation Error | Typical Fixes |
|---|---|---|---|
| Underfitting | High | High (similar) | Increase model capacity, add features, reduce regularization |
| Overfitting | Low | High | More data, stronger regularization, reduce capacity, ensembling |
Closing: The philosophical touchdown
Balancing underfitting and overfitting is the practical side of the Bias–Variance Trade-off and a direct consequence of the hypothesis space you picked earlier. Your model should be just expressive enough to capture the signal, but not so flexible that it learns the dataset's mood swings.
Final thought: don't trust a single number. Look at learning curves, validation behavior, and remember Occam's Razor — simpler models often win in the real world. Now go forth and make something that generalizes, not something that performs a circus for your training set.
"Generalization: it's less about being right about past data and more about behaving well with strangers." — That one wise dataset
Summary of key takeaways
- Underfitting = high bias; overfitting = high variance.
- Use learning curves and train/validation error patterns to diagnose.
- Fix underfitting by increasing capacity or features; fix overfitting by regularization, more data, or simpler models.
- Always validate hyperparameters with cross-validation; never let the test set babysit your model selection.