
Supervised Machine Learning: Regression and Classification
Decision Boundaries and Geometry — Logistic Regression, But Make It Spatial

"Think of logistic regression as a polite bouncer who gives a probability, not a binary shove. The decision boundary is where they pause mid-judgment and say, ‘hmm… 50–50.’"


Hook: Why should we care about geometry here?

You already know from Maximum Likelihood Estimation (we did the math in the last chapter) that logistic regression fits weights to maximize the probability of the labels. And from Regularized Logistic Regression we learned how penalties like L2/L1 tame those weights. Great. But what does that look like on the plane? How does a vector of weights turn into a line, a curve, or a weirdly shaped region that separates cats from dogs (or spam from not-spam)? This is the geometry chapter — the part where algebra stops being dry and gets spatially dramatic.


Quick reminder (no heavy repeat): the probabilistic formula

For binary logistic regression we model

p(y = 1 \mid x) = \sigma(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}

The decision boundary for threshold 0.5 is given by

\theta^T x = 0

That equation is everything. It's the cliff-edge where our model is indifferent.
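As a sanity check, here is a minimal NumPy sketch (the weights in `theta` are made up for illustration): a point that satisfies θ^T x = 0 gets predicted probability exactly 0.5, and points on either side fall above or below it.

```python
import numpy as np

def sigmoid(z):
    # logistic function: maps any real-valued logit z to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical weights (bias first), purely for illustration
theta = np.array([0.5, 1.0, -2.0])

def predict_proba(x1, x2):
    x = np.array([1.0, x1, x2])   # augment with constant 1 for the bias term
    return sigmoid(theta @ x)

# solve 0.5 + 1.0*x1 - 2.0*x2 = 0: the point (1.5, 1.0) lies on the boundary
print(predict_proba(1.5, 1.0))   # 0.5 exactly, since sigmoid(0) = 0.5
print(predict_proba(3.0, 1.0))   # > 0.5: positive side of the hyperplane
```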


The basic geometry: linear boundaries in feature space

  • In d-dimensional input space, the set {x : θ^T x = 0} is a (d−1)-dimensional hyperplane. In 2D it's a line, in 3D it's a plane, etc.
  • The weight vector θ is normal (perpendicular) to the decision hyperplane. That's the single most useful geometric fact.

Imagine θ as an arrow. The decision plane sits perpendicular to that arrow, slicing the space. Points the arrow points toward have positive θ^T x (predicted p > 0.5), points in the opposite direction give negative values (p < 0.5).
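That perpendicularity is worth verifying numerically. In this sketch (a hypothetical 2D `theta`, no bias), rotating θ by 90° gives a direction that lies along the boundary, and its dot product with θ is zero; stepping off the boundary in the direction of θ gives a positive score.

```python
import numpy as np

# hypothetical 2D weight vector, no bias, for illustration
theta = np.array([2.0, -1.0])

# in 2D, a direction along the boundary {x : theta^T x = 0} is theta rotated 90 degrees
along_boundary = np.array([theta[1], -theta[0]])

# theta is orthogonal to every direction that lies in the boundary
print(theta @ along_boundary)            # 0.0

# moving off the boundary in the direction theta points gives a positive score (p > 0.5)
print(theta @ (along_boundary + theta))  # ||theta||^2 > 0
```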

Intuition: the knife and the pancake

Picture your dataset as a pancake on the table. The weight vector is a knife stuck straight into the pancake; the decision boundary is the plane of the knife blade. Rotate the knife (change θ direction) and you rotate the line that divides blueberries from chocolate chips.


Intercept and translation

Including an intercept θ0 (bias) corresponds to augmenting x with a constant 1: θ^T x + θ0 = 0. Geometrically, θ0 moves the hyperplane away from the origin — it translates the slice. Changing θ0 slides the line parallel to itself; changing θ (direction) rotates it.


Thresholds other than 0.5: parallel decision boundaries

If you use threshold τ (p≥τ ⇒ class 1), then the boundary is

\theta^T x = \log\frac{\tau}{1-\tau}

That's still a hyperplane — just parallel to the 0.5 boundary. Raising τ slides the boundary in the direction θ points (fewer points cross into class 1); lowering τ slides it the opposite way. Either way, the boundary translates along θ without rotating.
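A small sketch (hypothetical weights) confirming the correspondence: a point whose score equals log(τ/(1−τ)) gets predicted probability exactly τ, so thresholding probabilities at τ is the same as thresholding scores at that constant.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(tau):
    # threshold tau on probabilities <=> threshold log(tau/(1-tau)) on the score
    return np.log(tau / (1.0 - tau))

# hypothetical 2D weights, no bias, for illustration
theta = np.array([1.0, -2.0])

tau = 0.9
# construct a point whose score theta^T x equals logit(tau)
x_on_boundary = np.array([logit(tau), 0.0])   # score = 1.0 * logit(tau) + (-2.0) * 0

p = sigmoid(theta @ x_on_boundary)
print(round(p, 6))   # 0.9: probability equals tau exactly on the shifted boundary
```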


Regularization — geometry edition (builds on your Regression II and Regularized Logistic Regression knowledge)

Remember: L2 (ridge) penalizes large weights, L1 (lasso) promotes sparsity. What does that do geometrically?

Regularizer by regularizer, the geometric effect on the decision boundary:

  • None: can produce very steep, tilted hyperplanes (high-magnitude θ). Fits the data closely, but risks overfitting to weird shapes if you transform features.
  • L2 (ridge): shrinks θ magnitudes, so the boundary is less steeply sensitive and its orientation is smoother. Reduces variance; keeps all features but with smaller influence.
  • L1 (lasso): drives some θ components to 0, so the boundary aligns with the subspace of selected features. Performs feature selection; the boundary may become axis-aligned in the transformed space.
  • L2 is like attaching a rubber band to the arrow (θ): it resists extreme directions, preferring shorter arrows that still slice the data. Shorter arrow → gentler discrimination.
  • L1 is like forcing some components of the arrow to zero with duct tape; the knife ends up only pointing along certain coordinates.

Question: Why does shrinking θ make the boundary "less complex"? Because large θ components amplify small differences in feature space into big probability swings (large logit magnitudes). Shrink θ → smaller logits → smoother probability surface.
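The effect is easy to see numerically. In this illustrative sketch, scaling θ by 10 leaves the boundary θ^T x = 0 exactly where it was, but the predicted probability at a point just off the boundary sharpens toward 1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([1.0, 1.0])       # hypothetical weights, for illustration
x_near = np.array([0.1, 0.0])      # a point just off the boundary theta^T x = 0

p_small = sigmoid(theta @ x_near)          # modest weights: probability near 0.5
p_large = sigmoid((10 * theta) @ x_near)   # 10x weights: same boundary, sharper probability

print(p_small, p_large)   # p_large sits much further from 0.5 than p_small
```

Regularization pushes in the opposite direction: it keeps the logits small, so probabilities change gently as you cross the boundary.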


Nonlinear boundaries via feature engineering

Logistic regression's boundary is always a hyperplane in whatever feature space you hand it, but if you engineer features (polynomials, interactions, kernels), that hyperplane maps back to a nonlinear boundary in the original x.

Example: If you augment (x1,x2) with x1^2 + x2^2, a linear separator in this transformed space can produce a circular boundary in the original space. Geometry + feature transforms = creativity.
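A minimal sketch of that circular case, with hand-picked illustrative weights on the augmented features [1, x1, x2, x1² + x2²]: the boundary is linear in the augmented space but traces the unit circle in the original plane.

```python
import numpy as np

# hypothetical weights on the augmented features [1, x1, x2, x1^2 + x2^2];
# chosen so the boundary is x1^2 + x2^2 = 1, i.e. the unit circle
theta = np.array([-1.0, 0.0, 0.0, 1.0])

def score(x1, x2):
    phi = np.array([1.0, x1, x2, x1**2 + x2**2])   # the feature map
    return theta @ phi

print(score(1.0, 0.0))   # 0.0: on the circle -> on the decision boundary
print(score(0.0, 0.0))   # -1.0: inside the circle -> class 0
print(score(2.0, 0.0))   # 3.0: outside the circle -> class 1
```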


Multiclass geometry (softmax and one-vs-rest)

  • One-vs-rest (OvR): Each class gets a θ_k. The predicted class is argmax_k θ_k^T x. Decision boundaries between classes k and j satisfy θ_k^T x = θ_j^T x ⇒ (θ_k − θ_j)^T x = 0. So pairwise boundaries are hyperplanes.
  • Softmax (multinomial logistic): same idea — pairwise linear boundaries. The regions form convex polyhedra (think Voronoi tessellation with linear facets).

So multiclass logistic with linear scores partitions space into convex regions separated by straight hyperplanes.
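The argmax rule and the pairwise hyperplanes can be checked in a few lines (the per-class weight vectors here are invented for illustration):

```python
import numpy as np

# hypothetical per-class weight vectors (one row per class), 2D inputs, no bias
Theta = np.array([[ 1.0,  0.0],    # class 0
                  [-1.0,  0.0],    # class 1
                  [ 0.0,  1.0]])   # class 2

def predict(x):
    # predicted class = argmax over the linear scores theta_k^T x
    return int(np.argmax(Theta @ x))

# pairwise boundary between classes 0 and 1: (theta_0 - theta_1)^T x = 0, i.e. x1 = 0
x = np.array([0.0, -1.0])
scores = Theta @ x
print(scores[0] == scores[1])            # True: x lies on the 0-vs-1 boundary
print(predict(np.array([2.0, 0.5])))     # class 0 wins for large positive x1
```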


Practical geometry exercises (do these in your head or notebook)

  1. Take θ = [1, −2, 0.5] with x = [1, x1, x2] (bias included). Plot the line θ^T x = 0 in the x1-x2 plane. Which direction is positive? (Answer: follow θ's projection onto x1-x2).
  2. Increase the magnitude of θ by 10x. What happens to predicted probabilities near the boundary? (Answer: they become sharper — closer to 0 or 1 except in a thinner strip around the boundary.)
  3. Add a feature x3 = x1^2 + x2^2 and fit a new θ. What shape might the decision boundary be now? (Answer: circular or elliptical if coefficients weight x3 and bias appropriately.)
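If you'd rather check your answers to exercises 1 and 2 in code, here is one way (pure NumPy, using the θ from exercise 1):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([1.0, -2.0, 0.5])   # exercise 1: bias first

def proba(x1, x2):
    return sigmoid(theta @ np.array([1.0, x1, x2]))

# Exercise 1: the line is 1 - 2*x1 + 0.5*x2 = 0; the origin scores 1 > 0,
# so the origin's side of the line is the positive side
print(proba(0.0, 0.0) > 0.5)   # True

# Exercise 2: scale theta by 10 and probabilities near the boundary sharpen
x_near = (0.55, 0.0)           # score = 1 - 1.1 = -0.1, just off the boundary
p = proba(*x_near)
p10 = sigmoid(10 * theta @ np.array([1.0, *x_near]))
print(p, p10)                  # p10 is pushed much closer to 0 than p
```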

Why people misunderstand this

  • They mix up data space and parameter space. Change θ (parameter space) and you move/rotate the decision boundary (data space). Don’t confuse the sign of θ components with class labels without checking the intercept.
  • They think regularization magically “changes model class” — it doesn’t make logistic nonlinear. It just changes how the hyperplane sits.

Closing: key takeaways and the single line to tattoo on your brain

  • Decision boundary = θ^T x + θ0 = 0. The weight vector θ is perpendicular to that boundary. Rotate θ → rotate the boundary; change θ0 → slide it.
  • L2/L1 don’t change type of boundary (unless you change features); they change the orientation, position, and sensitivity of the boundary by altering θ.
  • Feature transformations convert linear separators in feature space to nonlinear separators in original space — geometry is loyal to your feature map.

Final thought: We learned to control complexity in regression with ridge/lasso. Now in classification, those same brakes on θ are our geometric steering wheel: they rotate, shrink, or simplify the slices our model makes in the world. Geometry isn’t decoration — it’s where the rubber meets the data.

Go try: pick a 2D dataset, fit logistic with and without L2 and with a quadratic feature. Plot the boundaries. Then sit back and watch math become visual theater.

