
Supervised Machine Learning: Regression and Classification

Foundations of Supervised Learning


Core concepts, goals, trade-offs, and terminology that underpin regression and classification.


Underfitting and Overfitting — The Goldilocks Problem of Supervised Learning

"Models are like roommates: too simple, they do nothing; too complicated, they throw wild parties in your dataset. You want the one who cleans the dishes and respects your privacy." — Your slightly dramatic ML TA


Hook: Why this matters (and why your model might secretly be trash)

You trained a model, it got 99% accuracy on the training data, and then tanked on new examples. Or maybe it can't even fit the training set well and performs poorly everywhere. Both are signs of poor generalization. In the previous sections we already met two important neighbors of this problem:

  • Inputs, Targets, and Hypothesis Space — which taught us that the capacity of the hypothesis space determines what functions the model can represent.
  • Bias–Variance Trade-off — which showed that expected error decomposes into bias, variance, and irreducible noise.

Underfitting and overfitting are the everyday faces of those theoretical ideas. Let's make them uncomfortably practical.


Definitions (crisp, like a scalpel)

  • Underfitting: The model is too simple for the underlying pattern in the data. It can't achieve low training error. This is high bias. Think: a linear model trying to fit a spiral.

  • Overfitting: The model is too flexible and learns noise or idiosyncrasies in the training set. Training error is low, but validation/test error is high. This is high variance. Think: a 50th-degree polynomial on 20 points.

Under- vs. over-fitting is basically a struggle between "I won't learn enough" and "I'll learn everything, including the weird stuff." Balance is the goal.
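To see both failure modes side by side, here is a minimal sketch (the data and model choices are invented for illustration, assuming scikit-learn and NumPy are available): polynomials of degree 1, 2, and 15 fit to noisy quadratic data.

```python
# Hypothetical sketch: under- and overfitting on synthetic quadratic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=1.0, size=200)  # quadratic signal + noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

errors = {}
for degree in (1, 2, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    errors[degree] = (mean_squared_error(y_tr, model.predict(X_tr)),
                      mean_squared_error(y_te, model.predict(X_te)))
    print(degree, errors[degree])
# Expect: degree 1 → both errors high (underfit); degree 2 → both near the
# noise level; degree 15 → lowest training error but a train/test gap.
```

The linear model can't represent the parabola at all, while the degree-15 model has enough spare capacity to chase individual noisy points.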


Diagnostics: How to tell what’s going wrong

Look at training vs validation/test error. Patterns tell stories:

  1. Training error high, validation error ≈ training error → Underfitting.
  2. Training error low, validation error high → Overfitting.
  3. Both low → Success.
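The three patterns above can be wrapped in a toy triage function — note the thresholds below are invented for illustration, not standard values:

```python
# Toy diagnostic: classify the regime from train/validation error.
# The cutoffs (high=0.2, gap=0.05) are illustrative, not canonical.
def diagnose(train_err, val_err, high=0.2, gap=0.05):
    if train_err > high:
        return "underfitting"      # can't even fit the training set
    if val_err - train_err > gap:
        return "overfitting"       # fits train, fails to generalize
    return "good fit"

print(diagnose(0.30, 0.31))  # → underfitting
print(diagnose(0.02, 0.15))  # → overfitting
print(diagnose(0.03, 0.05))  # → good fit
```

In practice you would pick thresholds relative to your task's noise floor rather than hard-coding them.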

Learning curves (visual rule-of-thumb)

  • Plot error vs number of training examples for both training and validation sets.

  • Typical shapes:

    • Underfitting: both errors are high and converge to a similar value.
    • Overfitting: training error is low, validation error high; the gap often narrows as more data is added, because more data reduces variance.

ASCII sketch:

Underfitting — both errors plateau high, small gap:

Error
|
|  \_______ validation
|   _______ training
|  /
+-----------------------------> Training set size

Overfitting — low training error, large gap that shrinks with more data:

Error
|
| \
|  \_______ validation
|
|   _______ training
|  /
+-----------------------------> Training set size

Causes (the rogues' gallery)

  • Hypothesis space capacity too small (e.g., linear model for nonlinear reality) → Underfit.
  • Hypothesis space capacity too large without constraints (e.g., deep trees, high-degree polynomials) → Overfit.
  • Too few training examples → makes complex models overfit easily.
  • Noisy labels or features → amplifies overfitting risk.
  • Poor feature engineering (irrelevant features add variance, missing important features increase bias).

Remember: capacity comes from model architecture, feature transformations, and hyperparameters (e.g., tree depth, number of neurons, polynomial degree).


Fixes: From blunt instruments to surgical strikes

For underfitting (increase flexibility / reduce bias)

  • Use a richer hypothesis space: increase polynomial degree, add layers/neurons, or use a more expressive model.
  • Add relevant features / do feature engineering.
  • Reduce regularization (lower λ).
  • Train longer if optimization isn't converged.
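"Reduce regularization" in action — a hedged sketch on made-up synthetic linear data, assuming scikit-learn: an over-regularized Ridge model underfits, and lowering alpha (scikit-learn's name for λ) restores the fit.

```python
# Sketch: too much regularization causes underfitting even on linear data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -3.0, 1.0]) + rng.normal(scale=0.2, size=200)

mses = {}
for alpha in (1e4, 1.0):
    model = Ridge(alpha=alpha).fit(X, y)
    mses[alpha] = mean_squared_error(y, model.predict(X))
    print(alpha, mses[alpha])
# alpha=1e4 crushes the weights toward zero → high training error (underfit);
# alpha=1.0 leaves training error near the noise floor.
```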

For overfitting (decrease variance / add constraints)

  • Regularization: L2 (Ridge), L1 (Lasso) — penalize large weights.

    • Ridge objective (MSE + L2):
    J(w) = \frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \|w\|_2^2
    
    • Lasso uses an L1 penalty, \lambda \|w\|_1, instead, and can produce sparse solutions (feature selection).
  • Get more data (if possible) — often the cleanest solution.

  • Reduce model capacity: prune trees, reduce polynomial degree, decrease network size.

  • Early stopping on validation error during training (common in neural nets).

  • Use dropout, batch normalization, or other model-specific techniques.

  • Ensemble methods (bagging reduces variance; boosting trades bias/variance differently).
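To make the Ridge/Lasso bullet concrete, here is a small sketch on invented synthetic data (assuming scikit-learn): only 2 of 10 features carry signal, Ridge shrinks all the weights, and Lasso zeroes some out entirely.

```python
# Sketch: L2 shrinks weights; L1 can zero some out (sparsity).
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)  # 2 real signals

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print(np.abs(ols.coef_).sum())        # unpenalized total weight
print(np.abs(ridge.coef_).sum())      # smaller: L2 shrinks everything a bit
print(int((lasso.coef_ == 0).sum()))  # L1 zeroes out some irrelevant features
```

The genuinely useful coefficients survive Lasso's penalty; the noise-only ones get clipped to exactly zero.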

Pro tip: Regularization is basically telling the model: "Less drama, please." It trades a bit of training fit for more real-world sanity.


Concrete examples (so it stops being abstract)

  • Linear regression vs polynomial regression: If the true relationship is quadratic and you use linear, you'll underfit. Use a high-degree polynomial and you'll overfit to noise.

  • Decision Trees: Small max_depth → underfit. Huge max_depth → overfit (memorizes leaves). Random Forest (bagging) reduces variance.

  • Neural Networks: Tiny network → underfit. Massive network with no regularization and little data → overfit.
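The decision-tree example above can be run directly — a sketch on scikit-learn's two-moons toy dataset, with depth acting as the capacity knob:

```python
# Sketch: tree depth controls capacity on a noisy two-moons dataset.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for depth in (1, 4, None):  # None = grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    scores[depth] = (tree.score(X_tr, y_tr), tree.score(X_te, y_te))
    print(depth, scores[depth])
# max_depth=1 underfits (both accuracies modest); max_depth=None memorizes
# the training set (accuracy 1.0) but typically generalizes worse than a
# moderate depth.
```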


Practical checklist for model debugging

  1. Plot train and validation errors. Which pattern matches?
  2. If underfitting: increase complexity or features; check optimization.
  3. If overfitting: add regularization, collect more data, or reduce complexity.
  4. Use cross-validation to pick hyperparameters (e.g., λ, tree depth).
  5. Examine residuals: structure means bias; random noise means variance or label noise.
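For checklist step 4, a minimal cross-validated hyperparameter sweep (data invented for the example) might look like this with scikit-learn's GridSearchCV:

```python
# Sketch: choose Ridge's regularization strength λ (alpha) by 5-fold CV.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 5))
y = X @ np.array([1.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.3, size=120)

grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}
search = GridSearchCV(Ridge(), grid, cv=5).fit(X, y)
print(search.best_params_)  # the alpha with the best mean CV score
```

The test set never enters this loop — it stays locked away for the final estimate.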

Quick scikit-learn snippet for diagnosing via learning curves

from sklearn.model_selection import learning_curve

# model, X, y are assumed to be defined already; cv=5 → 5-fold CV
train_sizes, train_scores, val_scores = learning_curve(model, X, y, cv=5)
# Each score array has shape (n_sizes, n_folds): average over folds, then
# plot the means against train_sizes. Note these are scores (higher is
# better); pass scoring="neg_mean_squared_error" for error-style curves.
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)

A short table: Underfit vs Overfit quick reference

Regime        | Training Error | Validation Error | Typical Fixes
------------- | -------------- | ---------------- | ---------------------------------------------------------------
Underfitting  | High           | High (similar)   | Increase model capacity, add features, reduce regularization
Overfitting   | Low            | High             | More data, stronger regularization, reduce capacity, ensembling

Closing: The philosophical touchdown

Balancing underfitting and overfitting is the practical side of the Bias–Variance Trade-off and a direct consequence of the hypothesis space you picked earlier. Your model should be just expressive enough to capture the signal, but not so flexible that it learns the dataset's mood swings.

Final thought: don't trust a single number. Look at learning curves, validation behavior, and remember Occam's Razor — simpler models often win in the real world. Now go forth and make something that generalizes, not something that performs a circus for your training set.

"Generalization: it's less about being right about past data and more about behaving well with strangers." — That one wise dataset


Summary of key takeaways

  • Underfitting = high bias; overfitting = high variance.
  • Use learning curves and train/validation error patterns to diagnose.
  • Fix underfitting by increasing capacity or features; fix overfitting by regularization, more data, or simpler models.
  • Always validate hyperparameters with cross-validation; never let the test set babysit your model selection.