
Supervised Machine Learning: Regression and Classification

Foundations of Supervised Learning

Core concepts, goals, trade-offs, and terminology that underpin regression and classification.


Foundations of Supervised Learning — Inputs, Targets, and Hypothesis Space

You already know the difference between supervised, unsupervised, and reinforcement learning. Good. Now let’s get our hands dirty with the nuts and bolts that actually make supervised learning work: the inputs (features), the targets (labels), and the hypothesis space (the universe of functions our algorithm is allowed to consider).


Why this matters (quick reminder)

You learned earlier that supervised learning is about learning a mapping from observations to outcomes using labeled examples. That statement hides three huge questions we now unpack:

  1. What are we observing? (Inputs)
  2. What are we predicting? (Targets)
  3. What kind of mappings are we allowed to consider? (Hypothesis space)

Get these three wrong (or sloppy) and you’ll get models that are confused, overconfident, or quietly useless.


1) Inputs (aka features, covariates, X)

Definition: The input space is the set of all possible observations we feed into our model. Usually denoted X (uppercase) for the space, and x (lowercase) for a single example.

  • Typical forms: numeric vectors, images, text, categorical variables, time series.
  • Practical issues: missing values, scaling, encoding, correlated features, and feature engineering.

Analogy: Inputs are the ingredients. If you give the chef rotten avocados, you can’t expect a Michelin-level guacamole no matter how skilled the chef is.

Questions to ask about inputs:

  • Is each feature meaningful for the task? (garbage in → garbage out)
  • Are features on wildly different scales? (standardize/normalize)
  • Do I need to create new features? (polynomials, interactions)
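The scaling and feature-creation questions above can be sketched in a few lines (assuming NumPy; the house-price features are made up for illustration):

```python
import numpy as np

# Toy house-price features on wildly different scales: square footage and bedrooms.
X = np.array([[1400.0, 3.0],
              [2100.0, 4.0],
              [ 800.0, 2.0],
              [3000.0, 5.0]])

# Standardize each column to zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Engineer new features: a polynomial term and an interaction term.
sqft, beds = X[:, 0], X[:, 1]
X_eng = np.column_stack([sqft, beds, sqft ** 2, sqft * beds])
```

After standardizing, both columns contribute on the same scale; the engineered columns let a linear model express curvature and interactions.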

2) Targets (aka labels, y)

Definition: The target is the quantity we want to predict. It lives in the output space Y. Usually denoted y.

Types of targets:

  • Regression: Continuous y (house price, temperature)
  • Classification: Discrete y (spam/not spam, dog breed)
  • Structured outputs: Sequences, images, graphs (harder, but still supervised)

Important subtleties:

  • Label noise: humans make mistakes. Your model might just learn human inconsistency.
  • Imbalanced classes: if 99% of examples are class A, accuracy becomes a liar. Use precision/recall, AUC, or resampling.

Analogy: Targets are the recipe you aim to cook. If the recipe says cake but you actually want cookies, your chef will comply but you’ll be miserable.
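The "accuracy becomes a liar" point is easy to see in a few lines (a toy sketch, no libraries needed):

```python
# 99% of examples are class 0; a model that always predicts the majority class
# scores 99% accuracy while finding zero of the rare positives.
y_true = [0] * 99 + [1]
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn)  # fraction of real positives the model caught

print(accuracy)  # 0.99
print(recall)    # 0.0
```

Accuracy looks excellent; recall exposes that the model never finds the rare class.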


3) Hypothesis Space (aka hypothesis class)

Definition: The hypothesis space H is the set of functions f : X → Y that our learning algorithm can pick from. When we say we’re “training a model,” we’re searching H for the best f according to some loss on the data.

Notation example:

H = { f_theta(x) : theta in Theta }

This means: our hypotheses are parameterized by some theta values in parameter space Theta.
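As a toy illustration of a parameterized hypothesis space (the one-parameter family f_theta(x) = theta * x and the finite Theta below are purely illustrative):

```python
# Each theta picks out one hypothesis f_theta(x) = theta * x.
def f(theta, x):
    return theta * x

# A (finite, toy) parameter space Theta defines the hypothesis space H.
Theta = [0.5, 1.0, 2.0]
H = [lambda x, th=theta: f(th, x) for theta in Theta]

# "Training" = searching H for the hypothesis with the smallest loss
# on the data -- here a single labeled point (x=3, y=6).
best = min(H, key=lambda h: (h(3) - 6) ** 2)
print(best(3))  # 6.0
```

Real learners search a continuous Theta with gradient methods rather than enumerating, but the picture is the same: Theta defines H, and training selects one f from it.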

Why it’s the real star (and villain)

  • If H is too small (low capacity), no function in H fits the true relationship → underfitting.
  • If H is huge (high capacity), you're flexible enough to fit noise → overfitting.

This trade-off is the backbone of model selection.

Common hypothesis spaces

Hypothesis class    Typical representation        Capacity                     When it's useful
Linear models       f(x) = w^T x + b              Low                          Roughly linear relationships; interpretable
Decision trees      Tree of splits                Medium                       Nonlinear interactions; tabular data
k-NN                Instance memory + distance    Variable (grows with data)   Simple, nonparametric; sensitive to noise
Neural networks     Layered nonlinearities        High                         Complex patterns (images, audio); big data

A rule of thumb: pick the simplest H that can express the patterns you need.
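A quick way to feel the capacity trade-off is to fit polynomial classes of increasing degree to noisy data (a sketch using NumPy's polyfit; the sine target, noise level, and seed are all made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=10)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)  # the true, noise-free target

errs = {}
for degree in (1, 3, 9):
    # Choosing the degree = choosing the hypothesis class H.
    coefs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    errs[degree] = (train_err, test_err)
    print(degree, train_err, test_err)
```

Degree 1 lacks the capacity to express a sine wave (underfitting); degree 9 has enough capacity to drive training error to essentially zero by fitting the noise, which typically hurts it on the clean test curve (overfitting).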


Hypothesis space, loss, and learning — the holy trinity

Learning = searching H to minimize expected loss L(y, f(x)). In practice we minimize empirical loss plus regularization:

f_hat = argmin_{f in H} (1/n) sum_i L(y_i, f(x_i)) + lambda * R(f)

  • The loss (e.g., MSE, cross-entropy) ties hypotheses to what we care about.
  • Regularization (R) restricts effective hypothesis complexity (penalize big weights, tree depth, etc.).

Think of regularization as a leash: your hypothesis class might be a caffeinated greyhound, and regularization is the sensible owner holding the leash so the dog doesn’t sprint after every squirrel (noise) it sees.
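To see the leash in action, here is a minimal ridge-regression sketch (the closed form for squared loss with an L2 penalty; the data, true weights, and lambda values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
w_true = np.array([2.0, -1.0, 0.0, 0.0, 0.0])
y = X @ w_true + rng.normal(0, 0.5, size=20)

def ridge(X, y, lam):
    # argmin_w ||y - X w||^2 + lam * ||w||^2 has this closed form.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_ols = ridge(X, y, 0.0)    # lam = 0: plain least squares, no leash
w_reg = ridge(X, y, 10.0)   # lam > 0: weights are pulled toward zero

print(np.linalg.norm(w_ols), np.linalg.norm(w_reg))
```

The norm of the regularized weight vector is smaller: the penalty shrinks the effective complexity of the hypotheses the optimizer will actually pick.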


Inductive bias and why we need it

Every learning algorithm carries assumptions — inductive bias. Without bias, learning is impossible (the no-free-lunch theorem says so). Examples:

  • Linear models assume linearity.
  • k-NN assumes similar inputs → similar outputs (locality).
  • Neural nets assume compositional hierarchical features.

Bias vs. variance: the joke that’s also a theorem. High-bias models underfit; high-variance models overfit. Good learning finds the sweet spot.


Practical checklist — before you start training

  • Define input space X clearly (raw features, transformations).
  • Define target space Y and proper evaluation metric (accuracy, RMSE, F1...).
  • Choose an initial H that matches problem complexity.
  • Think about regularization and validation (cross-validation).
  • Ask: is the data representative of the world you’ll use the model in? If not, no hypothesis in H will fix that.

Tiny code sketch: what learning looks like

# Given: data (X, Y), a finite hypothesis class H, loss L, regularizer R.
# In practice we optimize parameters with gradient methods rather than enumerate.
def learn(X, Y, H, L, R, lam):
    best_f, best_obj = None, float("inf")
    for f in H:
        empirical_loss = sum(L(y, f(x)) for x, y in zip(X, Y)) / len(Y)
        objective = empirical_loss + lam * R(f)
        if objective < best_obj:
            best_f, best_obj = f, objective
    return best_f

Closing — big picture and takeaways

  • Inputs are your ingredients. Clean them. Engineer them. Respect them.
  • Targets are what you’re trying to bake. Make sure the recipe is correct and measurable.
  • Hypothesis space is the kitchen rules: what tools and recipes your algorithm can use. Too small and you starve; too big and you eat the entire pantry and regret everything.

Final mic-drop insight:

The model you get is not just a product of the data — it's the interaction of data, the hypothesis space you pick, the loss you care about, and the inductive biases you accept. Tweak any one of these and the learned function changes. Treat them all like they’re alive.

Next up: we’ll see how specific choices of hypothesis class (linear vs tree vs neural net) behave on real data — and why sometimes a humble linear model beats a flashy deep net. Spoiler: it’s not about glamour; it’s about fit.
