Exploratory Data Analysis for Predictive Modeling
EDA methods tailored to supervised tasks to reveal signal, distribution shifts, and modeling risks.
Detecting Nonlinearity and Heteroscedasticity
Detecting Nonlinearity and Heteroscedasticity — the plot twist your model didn't see coming
"If your residual plot looks like a hairball, your model is lying to you." — probably me, loudly, in a data lab
You already explored the target distribution visually and checked for class imbalance and target weirdness (see: Visualization for Regression Targets; Visualization for Class Imbalance). You also did the sensible stuff in Data Wrangling and Feature Engineering: cleaned, encoded, scaled, and guarded against leakage. Nice. Now it's time for the emotional heart-to-heart with your model: ask whether the relationship you're modeling is linear enough to justify a plain old linear model, and whether the model's errors behave themselves.
Why this matters
- Nonlinearity means your predictor and target have a relationship that isn't a straight line. If your model pretends it is linear, you'll get biased predictions. That feeling when your friend says they 'only drink water' and then chugs an espresso shot — same betrayal.
- Heteroscedasticity means the variance of errors changes across levels of a predictor. If you ignore it, your uncertainty estimates and hypothesis tests will be wrong; confidence intervals will be lying little confidence liars.
Quick checklist (what we will do)
- Visual checks: residual vs fitted, grouped variance plots, scale-location plot.
- Formal tests: Breusch-Pagan, White, Goldfeld-Quandt.
- Remedial actions: transforms, polynomials/splines, GAMs, weighted methods, heteroscedasticity-robust inference.
- Special note: classification models have their own nonlinearity issues (link function, calibration).
Visual detective work (start here)
Why start visually? Because numbers lie and plots tell the truth. Visual checks are quick and often decisive.
1) Residuals vs Fitted
- Plot: residuals on y-axis, fitted values on x-axis.
- What to look for: a funnel shape (widening or narrowing) indicates heteroscedasticity. A systematic curve pattern indicates nonlinearity.
Code sketch (assumes you already have a fitted regression `model` plus arrays `X` and `y`):

```python
import matplotlib.pyplot as plt

fitted = model.predict(X)          # predictions from your fitted model
resid = y - fitted                 # raw residuals
plt.scatter(fitted, resid, alpha=0.6)
plt.axhline(0, color='k', linestyle='--')  # reference line at zero
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
plt.show()
```
Ask: Is the cloud centered around zero and uniform? If not, you found a problem.
2) Scale-Location (Spread) Plot
- Plot sqrt(|residuals|) against fitted values.
- This makes heteroscedasticity patterns more visible.
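A minimal sketch of the scale-location idea, on synthetic data where the noise deliberately grows with the predictor (the data-generating process here is invented for illustration):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; drop this in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 300)
y = 2 * x + rng.normal(scale=0.5 * x, size=300)  # noise grows with x
fitted = 2 * x                                   # stand-in for model.predict(X)
resid = y - fitted
spread = np.sqrt(np.abs(resid))                  # sqrt(|residuals|)

plt.scatter(fitted, spread, alpha=0.6)
plt.xlabel("Fitted values")
plt.ylabel("sqrt(|residuals|)")
# call plt.show() in an interactive session
```

An upward drift in this cloud is the scale-location signature of heteroscedasticity.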
3) Residuals vs Predictor
Plot residuals against each important predictor. If you see curvature, your linear terms are missing the beat.
4) Binned variance plot
Group data by quantiles of a predictor, compute variance of residuals per bin, and plot. This clarifies trends when scatter is noisy.
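A quick sketch of the binned-variance idea with pandas, again on made-up heteroscedastic residuals (the variance function here is an assumption for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 1000)
resid = rng.normal(scale=0.3 * x, size=1000)  # residual spread grows with x

df = pd.DataFrame({"x": x, "resid": resid})
df["bin"] = pd.qcut(df["x"], q=10)            # deciles of the predictor
var_by_bin = df.groupby("bin", observed=True)["resid"].var()
print(var_by_bin)
```

A roughly flat sequence of per-bin variances is what homoscedasticity looks like; a steady climb (as here) is the smoking gun.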
Formal statistical tests (they won't replace plots)
- Breusch-Pagan test: tests whether residual variance can be explained by predictors. Good general-purpose test.
- White test: allows for nonlinearity in variance, tests more general specifications.
- Goldfeld-Quandt test: compares variance across two subsamples; useful if you suspect variance increases with a predictor.
Remember: tests can be sensitive to non-normality of errors and outliers. Use them as complements to plots, not scripture.
Detecting nonlinearity more formally
- Partial dependence plots (PDPs) and individual conditional expectation (ICE) plots: great for black-box models, but useful even with linear models to check whether the relationship actually looks straight.
- Component plus residual (partial residual) plots: reveal whether adding polynomial terms might help.
- Correlation + scatter + lowess smoother: fit a lowess curve; if it bulges, you need nonlinear features.
Quick code idea for lowess (note that `lowess` returns (x, y) pairs sorted by x, so plot both columns of its output rather than your original `x`):

```python
from statsmodels.nonparametric.smoothers_lowess import lowess
import matplotlib.pyplot as plt

smoothed = lowess(y, x, frac=0.3)          # columns: sorted x, smoothed y
plt.plot(smoothed[:, 0], smoothed[:, 1])   # the lowess curve
```
Remedies and when to use them
Table: Problem -> Quick Fix -> When it's best
| Problem | Quick Fix | When to use |
|---|---|---|
| Nonlinearity (mild) | Add polynomial terms (x^2, x^3) | When shape is simple curve, few features |
| Nonlinearity (complex) | Splines, regression trees, GAMs | When curve is wiggly or you want interpretable smoothness |
| Heteroscedasticity | Transform target (log, Box-Cox) or Weighted Least Squares | When variance grows with level; transform can stabilize |
| Heteroscedasticity (inference) | Robust SEs (HC0-HC3), bootstrap | When you only need correct CIs/p-values |
Notes:
- Transforming the target can fix both nonlinearity and heteroscedasticity at once (log often tames multiplicative error patterns). But remember interpretability changes.
- Weighted Least Squares gives more weight to observations with lower variance; requires estimating a weight function (often via modeling residual variance).
- Generalized Additive Models (GAMs) are elegant: they model nonlinearity with smooth functions and can also model variance if extended (e.g., mgcv in R can fit location-scale models).
Classification models: the twist
You still care about nonlinearity: if the logit link doesn't fit, predicted probabilities can be systematically off (miscalibrated). Diagnostics:
- Calibration plot: bin predicted probabilities and compare observed frequency.
- Residual-like checks: deviance residuals vs predictors.
Remedies: add nonlinear terms, use tree-based models, or recalibrate probabilities (isotonic regression, Platt scaling).
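A calibration plot is a few lines with scikit-learn. A sketch on a synthetic classification task (dataset and model choice are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.calibration import calibration_curve

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# Bin predicted probabilities and compare with observed positive frequency.
frac_pos, mean_pred = calibration_curve(y_te, proba, n_bins=10)
for p, f in zip(mean_pred, frac_pos):
    print(f"predicted {p:.2f} -> observed {f:.2f}")
```

Points hugging the diagonal mean the model's probabilities can be taken at face value; systematic bowing is your cue for isotonic regression or Platt scaling.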
Practical workflow (do this in order)
- Fit your baseline model (after proper splitting/cross-validation!).
- Plot residuals vs fitted and residuals vs key predictors. Ask: curve? funnel? both?
- Fit a lowess smoother or partial residual plot to confirm nonlinearity.
- Run Breusch-Pagan to test heteroscedasticity if visual signs exist.
- Try a simple transform (log or Box-Cox). Re-evaluate.
- If transform insufficient, try polynomial/spline or a flexible model like GAM or tree ensembles.
- For inference, switch to robust SEs or WLS as needed.
Closing mic drop
Nonlinearity and heteroscedasticity are not bugs in the data, they're features of reality refusing to be simplified. Your job is to listen: plot, test, and adapt. Start with visual empathy, then apply formal tools, and only then choose a remedy that balances accuracy and interpretability.
Key takeaways:
- Always look at residuals; they will whisper the truth long before your metrics scream it.
- Use transforms, splines, GAMs, or robust methods depending on severity and your goals.
- For classification, pay special attention to calibration and link function adequacy.
Final thought: models are like friends — they work best when you accept their quirks and tailor your expectations. Fit the relationship, not the ego.