Supervised Machine Learning: Regression and Classification
Chapters

1. Foundations of Supervised Learning
2. Data Wrangling and Feature Engineering
3. Exploratory Data Analysis for Predictive Modeling
4. Train/Validation/Test and Cross-Validation Strategies
5. Regression I: Linear Models
6. Regression II: Regularization and Advanced Techniques
7. Classification I: Logistic Regression and Probabilistic View
8. Classification II: Thresholding, Calibration, and Metrics
9. Distance- and Kernel-Based Methods
10. Tree-Based Models and Ensembles
    • Decision Trees for Regression
    • Decision Trees for Classification
    • Impurity and Splitting Criteria
    • Pruning and Regularization of Trees
    • Handling Missing Values in Trees
    • Random Forests Essentials
    • Extremely Randomized Trees
    • Gradient Boosting Fundamentals
    • Learning Rate, Depth, and Estimators
    • XGBoost, LightGBM, and CatBoost
    • Feature Importance and Permutation
    • Partial Dependence and ICE with Trees
    • Handling Imbalanced Data with Ensembles
    • Calibration of Ensemble Predictions
    • Stacking and Blending Strategies
11. Handling Real-World Data Issues
12. Dimensionality Reduction and Feature Selection
13. Model Tuning, Pipelines, and Experiment Tracking
14. Model Interpretability and Responsible AI
15. Deployment, Monitoring, and Capstone Project


Tree-Based Models and Ensembles


Learn interpretable trees and powerful ensembles like random forests and gradient boosting.


Decision Trees for Regression — The Tree That Predicts Your Rent (and Judges Your Life Choices)

"If kNN is the friendly neighbor who averages everyone’s opinion, and SVM is the strict bouncer carving a crisp boundary, decision trees are the extroverted realtor who divides the city into neighborhoods until prices look sensible."

You're coming from a world of distance- and kernel-based methods (kNN, SVM). Those approaches leaned on neighborhoods and smooth kernels to handle nonlinearity. Now we pivot to a different kind of local thinking: partition the feature space into chunks where the response behaves similarly, and then predict with a summary statistic (usually the mean). Welcome to regression trees.


Quick orientation (no rerun of old material)

You already know how locality (kNN) and margin/feature mappings (SVM) give nonlinear power. Trees use space partitioning instead: they split features into axis-aligned regions and fit a constant (or simple) model in each region. This makes them extremely interpretable, fast, and flexible — but also dramatic and sometimes a tad overconfident.


What is a regression tree, in plain and mildly theatrical English?

  • Definition (short): A regression tree recursively splits the feature space into disjoint regions and predicts the average target value in each final region (leaf).
  • Intuition: Imagine repeatedly slicing a pizza (feature space) along one topping at a time (features) until every slice tastes roughly the same (response variance low). The pizza chef? The CART algorithm.

The algorithm (CART for regression) — step-by-step

  1. Start with all training data in one node.
  2. For every candidate split (choose a feature and a cut value), compute how much the split reduces variance of the target.
  3. Pick the split that gives the largest variance reduction.
  4. Recurse on each child node until stopping criteria (max depth, min samples, or no improvement).
  5. The prediction at a leaf = mean(y) of training examples in that leaf.

The math behind the glamour: variance reduction

If node t has n_t observations and variance Var(t), and a split produces left child L and right child R, the impurity decrease (also called reduction in MSE) is:

Δ = Var(t) - (n_L/n_t) * Var(L) - (n_R/n_t) * Var(R)

We pick the split with the largest Δ. Simpler than some kernels, but surprisingly effective.
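The Δ formula above can be computed directly. Here is a minimal pure-Python sketch (the function name and the toy target values are invented for illustration):

```python
from statistics import pvariance

def variance_reduction(y_parent, y_left, y_right):
    """Weighted decrease in variance from splitting a node:
    Delta = Var(t) - (n_L/n_t) * Var(L) - (n_R/n_t) * Var(R)
    """
    n = len(y_parent)
    return (pvariance(y_parent)
            - len(y_left) / n * pvariance(y_left)
            - len(y_right) / n * pvariance(y_right))

# Targets in a node: a good split separates the low values from the high ones.
y = [10, 12, 11, 30, 32, 31]
delta = variance_reduction(y, y[:3], y[3:])  # large Delta: the split helps
```

A split that separates two tight clusters captures almost all of the parent's variance, so Δ is close to Var(t) itself.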


Pseudocode (because we like order)

function build_tree(data, depth=0):
    if stopping_condition(data, depth):
        return leaf(mean(targets(data)))
    best_split = argmax over candidate splits of variance_reduction(data, split)
    left, right = partition(data, best_split)
    node = internal_node(best_split)
    node.left = build_tree(left, depth + 1)
    node.right = build_tree(right, depth + 1)
    return node

Stopping conditions: max depth, min samples per leaf, or no split improves variance significantly.
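The pseudocode above can be fleshed out into a tiny runnable version. This is an illustrative sketch for a single numeric feature, not a production implementation (real libraries search all features and use far more efficient split scans):

```python
# Minimal CART-style regression tree for one numeric feature.
from statistics import mean, pvariance

def build_tree(xs, ys, depth=0, max_depth=3, min_samples_leaf=1):
    # Stopping conditions: depth limit, too few samples, or pure node.
    if depth >= max_depth or len(ys) < 2 * min_samples_leaf or pvariance(ys) == 0:
        return {"leaf": mean(ys)}
    best = None
    for cut in sorted(set(xs))[:-1]:  # candidate cut values
        left = [y for x, y in zip(xs, ys) if x <= cut]
        right = [y for x, y in zip(xs, ys) if x > cut]
        if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
            continue
        # Weighted child variance: minimizing it maximizes variance reduction.
        score = len(left) * pvariance(left) + len(right) * pvariance(right)
        if best is None or score < best[0]:
            best = (score, cut, left, right)
    if best is None:
        return {"leaf": mean(ys)}
    _, cut, left_y, right_y = best
    left_x = [x for x in xs if x <= cut]
    right_x = [x for x in xs if x > cut]
    return {"cut": cut,
            "left": build_tree(left_x, left_y, depth + 1, max_depth, min_samples_leaf),
            "right": build_tree(right_x, right_y, depth + 1, max_depth, min_samples_leaf)}

def predict(tree, x):
    # Walk down until a leaf; the prediction is the leaf's training mean.
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["cut"] else tree["right"]
    return tree["leaf"]

# A step function in x: the tree recovers the two levels with one split.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
tree = build_tree(xs, ys)
```

Note how the piecewise-constant prediction falls out naturally: every input is routed to exactly one leaf and gets that leaf's mean.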


Why use regression trees? (Pros & snarky metaphors)

  • Interpretability: You can follow the path: "If bedrooms >= 3 AND distance_to_subway < 1km THEN price ≈ $X". Like a decision checklist from your overbearing aunt.
  • Handles mixed data types: Numeric and categorical features live together happily — no need to scale.
  • Robust to outliers (to some degree): Leaves average targets, so single weird points can get isolated instead of poisoning a global model.
  • Fast inference and little preprocessing.

And the drawbacks (the tree’s kryptonite)

  • High variance: Small data changes can yield very different trees — unstable like a soap opera character.
  • Axis-aligned splits only: Trees partition along single features at a time; they don’t create diagonal decision boundaries unless you combine many splits.
  • Not smooth: Predictions jump from leaf to leaf — unlike kernel methods that produce smooth functions.

Comparing to kNN and SVR — quick table

| Property | kNN | SVR / kernel methods | Regression trees |
| --- | --- | --- | --- |
| Locality | Neighborhood averaging | Global via kernel transform | Local via space partitioning |
| Smoothness | Smooth (depends on k) | Smooth (depends on kernel) | Piecewise-constant (not smooth) |
| Interpretability | Low | Low-medium | High |
| Handles mixed features | Yes (but needs a distance choice) | Usually needs numeric, scaled features | Yes, naturally |
| Robustness to noise | Sensitive (small k) | Controlled by C, epsilon | Can overfit unless pruned |

Ask yourself: do you want a smooth predictor or a readable rulebook? That question settles most of the choice.


Practical knobs: controlling complexity and avoiding overfitting

  • Pre-pruning (early stopping): max_depth, min_samples_split, min_samples_leaf, max_leaf_nodes.

  • Post-pruning (cost-complexity pruning): Grow a big tree, then prune using a complexity cost:

    Cost(T) = RSS(T) + α * |leaves(T)|

    where α is the complexity parameter (higher α = more pruning). sklearn exposes this as ccp_alpha.

  • Cross-validation: Choose pruning parameter (e.g., ccp_alpha) with CV to trade bias vs variance.
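Cost-complexity pruning is exposed directly in scikit-learn. The sketch below (assuming scikit-learn is installed; the synthetic data is invented for illustration) grows a full tree, reads off the pruning path, and refits at increasing α values:

```python
# Cost-complexity pruning: larger ccp_alpha means more aggressive pruning.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=200)

# Grow an unpruned tree and compute the effective alphas along its path.
full = DecisionTreeRegressor(random_state=0)
path = full.cost_complexity_pruning_path(X, y)

# Refit with increasing alpha: more pruning -> fewer leaves.
trees = [DecisionTreeRegressor(random_state=0, ccp_alpha=a).fit(X, y)
         for a in path.ccp_alphas[::10]]
leaf_counts = [t.get_n_leaves() for t in trees]
```

In practice you would pick ccp_alpha by cross-validating over path.ccp_alphas rather than choosing it by eye.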


Feature importance & interpretability tools

  • Gini/variance-based importance: Sum of impurity decreases for splits that use a feature. Convenient but biased toward features with many possible splits.
  • Permutation importance: Shuffle a feature and measure how much performance drops — a model-agnostic check.
  • Partial dependence plots (PDP): Show average model prediction while varying a feature — helps understand marginal effects.
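Permutation importance is simple enough to sketch from scratch: shuffle one feature column, remeasure the error, and record the increase. This minimal version (function and toy model names are invented for illustration) works with any object exposing a predict method:

```python
# Permutation importance from scratch: shuffle one feature and measure
# how much MSE worsens. Model-agnostic by construction.
import numpy as np

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    base = np.mean((model.predict(X) - y) ** 2)
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])          # break the feature's link to y
            drops[j] += np.mean((model.predict(Xp) - y) ** 2) - base
    return drops / n_repeats               # mean increase in MSE per feature

# Toy model: predictions depend only on feature 0,
# so shuffling feature 1 should change nothing.
class SlopeModel:
    def predict(self, X):
        return 3.0 * X[:, 0]

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0]
imps = permutation_importance(SlopeModel(), X, y)
```

Unlike impurity-based importance, this check is not biased toward high-cardinality features, which is why it is the standard sanity check for tree importances.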

Handling real-life annoyances

  • Missing values: CART can use surrogate splits (alternate features that mimic the primary split) or treat missing as a category.
  • Categorical variables: Trees handle them naturally; many implementations do binary splits for categories.
  • Heteroscedasticity & non-constant variance: Trees are flexible enough to isolate regions of different variance, but they don't model variance explicitly unless you augment them.

Worked example snapshot — predicting house prices

Imagine data: {num_bedrooms, sqft, distance_to_center}. The tree might first split on sqft > 1,200. In the left region (small houses), it might split on distance_to_center; in the right region (large houses), it might split on bedrooms. The leaf predictions are means of prices for examples reaching those leaves. No smoothing — just clear neighborhood-level rules.

Why might this beat kNN here? Because the tree gives simple rules that segment price drivers, rather than averaging across potentially irrelevant neighbors.
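A snapshot like the one above can be reproduced in a few lines (assuming scikit-learn is installed; the house data below is invented for illustration) and the fitted rules printed as a readable checklist:

```python
# Fit a shallow tree on toy house data and print its if/then rules.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Columns: sqft, num_bedrooms, distance_to_center (km)
X = np.array([[800, 1, 2], [950, 2, 8], [1100, 2, 1], [1150, 3, 9],
              [1400, 3, 2], [1600, 4, 7], [2000, 4, 1], [2400, 5, 5]])
y = np.array([210, 150, 340, 160, 420, 330, 640, 560])  # price, $1000s

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=["sqft", "bedrooms", "dist_center"])
print(rules)
```

Each leaf line shows a value: the mean price of the training houses routed to that leaf, which is exactly the tree's prediction there.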


When to choose a regression tree (short checklist)

  • You want model interpretability/rules.
  • You have mixed feature types and minimal preprocessing time.
  • You need a fast, low-friction baseline.
  • You’re okay with piecewise predictions or you’ll wrap the tree in an ensemble (coming up next).

Next step (teaser)

Single trees are great, but they’re unstable. Ensembles (Random Forests, Gradient Boosting) combine many trees to reduce variance and improve accuracy — the next thrilling act in our course.


Key takeaways (memorize these like tiny dramatic revelations)

  • Regression trees partition feature space and predict leaf means. They're simple, interpretable, and can capture nonlinearity with axis-aligned splits.
  • Splits are chosen by variance reduction (ΔMSE). Prediction = mean(y) in leaf.
  • Main weaknesses: high variance and non-smooth predictions — but these are exactly why ensembles exist.

Parting thought: Trees give you readable, human-friendly rules. If you want clinical, smooth curves, reach for kernels. If you want a rulebook that fits your messy dataset like a glove (sometimes a patchwork glove), start with a tree and then ensemble it if it gets dramatic.
