Machine Learning with scikit-learn
Build, tune, and evaluate models using scikit-learn pipelines with reproducible ML workflows.
Content
Regression Metrics
Regression Metrics in scikit-learn — How to judge continuous predictions
You already learned how to evaluate classifiers and split your data cleanly. Now you know how not to lie with accuracy — welcome to the regression edition, where everything is a number and everyone cries into spreadsheets.
In the previous sections we covered classification metrics and cross-validation strategies. We also built statistical intuition from the "Statistics and Probability for Data Science" chapter — so you already understand variance, bias, sampling noise, and why confidence matters. Now we apply that intuition to regression metrics — the measures that tell you whether your model’s continuous predictions are useful.
What are regression metrics and why they matter
- Regression metrics quantify the difference between predicted and actual continuous values.
- They answer questions like: How wrong is my model on average? How much of the variance did it explain? Is the error sensitive to outliers?
Why this matters (real-world examples):
- Predicting house prices: a high RMSE can mean being off by millions in aggregate. Not good.
- Forecasting demand: a biased model could lead to understocking or waste.
- Scientific measurements: understanding uncertainty ties back to inference and hypothesis testing from earlier chapters.
Key metrics you’ll use (and when to use them)
1) Mean Squared Error (MSE)
Definition: average of squared residuals.
Why it’s used:
- Penalizes larger errors strongly (because of squaring).
- Mathematically convenient (differentiable) — often the loss used to train regressors.
Sklearn: `mean_squared_error(y_true, y_pred)`
Pros: smooth and optimizable. Cons: in units squared of target (awkward to interpret).
2) Root Mean Squared Error (RMSE)
Definition: sqrt(MSE). Same units as the target — easier to interpret.
Use when: you care about large errors and want interpretable scale.
Sklearn: `mean_squared_error(y_true, y_pred, squared=False)`; in scikit-learn ≥ 1.4 use `root_mean_squared_error(y_true, y_pred)` instead (the `squared` parameter was deprecated in 1.4 and removed in 1.6).
3) Mean Absolute Error (MAE)
Definition: average absolute difference between predictions and truth.
Why it’s useful:
- Robust to outliers compared to MSE/RMSE.
- Interpretable: average absolute error in same units as target.
Sklearn: `mean_absolute_error(y_true, y_pred)`
4) R-squared (R²)
Definition (intuitively): proportion of variance in y explained by the model.
- R² = 1 → perfect fit. R² = 0 → model does no better than predicting the mean. R² can be negative if worse than mean predictor.
Sklearn: `r2_score(y_true, y_pred)`
Caveat: R² doesn't tell you about bias, heteroscedasticity, or goodness for forecasting — it’s about variance explained on the evaluation data.
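Those boundary cases are easy to verify with toy numbers (purely illustrative):

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0, 4.0]
mean_pred = [2.5] * 4            # always predict the mean of y_true
bad_pred = [4.0, 3.0, 2.0, 1.0]  # anti-correlated predictions

r2_mean = r2_score(y_true, mean_pred)  # 0.0: no better than the mean
r2_bad = r2_score(y_true, bad_pred)    # negative: worse than the mean
print(r2_mean, r2_bad)
```

Here the anti-correlated predictor scores well below zero — a useful reminder that R² has no lower bound.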
5) Explained Variance
Definition: how much of the variance of y is captured by the predictions. It is similar to R², but it ignores systematic bias: a constant offset in the predictions lowers R² yet leaves explained variance unchanged.
Sklearn: `explained_variance_score(y_true, y_pred)`
6) Mean Absolute Percentage Error (MAPE)
Definition: mean of |(y_true - y_pred) / y_true|.
Be careful: division by zero issues, and it punishes small true values heavily. Use only when target is strictly positive and percent error interpretation is desired.
Sklearn: `mean_absolute_percentage_error(y_true, y_pred)`
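A quick sketch of that small-target pitfall: the absolute error is identical (0.5) for both points, yet the tiny target dominates the MAPE (numbers are made up for illustration):

```python
from sklearn.metrics import mean_absolute_percentage_error

y_true = [0.1, 100.0]  # one tiny target, one large target
y_pred = [0.6, 100.5]  # both predictions miss by exactly 0.5

# |0.5 / 0.1| = 5.0 (500%!) while |0.5 / 100| = 0.005 (0.5%)
mape = mean_absolute_percentage_error(y_true, y_pred)
print(mape)  # ≈ 2.5, i.e. ≈ 250% average error, driven almost entirely by the tiny target
```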
Quick comparison table
| Metric | Sensitive to outliers? | Units | Good when... |
|---|---|---|---|
| MSE | Yes (high) | squared units | you want to penalize big errors / training loss |
| RMSE | Yes (high) | original units | interpretability + penalize large errors |
| MAE | Moderate | original units | robustness to outliers |
| R² | N/A | unitless | measuring variance explained |
| MAPE | Yes, and scale-dependent | percent | percent-error interpretation (positive targets only) |
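To see the "sensitive to outliers" column in action, here is a small sketch with one corrupted prediction (toy numbers):

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
clean = [1.1, 2.1, 3.1, 4.1, 5.1]    # small uniform error everywhere
outlier = [1.1, 2.1, 3.1, 4.1, 15.0]  # same, except one big miss

mae_clean = mean_absolute_error(y_true, clean)
mae_out = mean_absolute_error(y_true, outlier)
mse_clean = mean_squared_error(y_true, clean)
mse_out = mean_squared_error(y_true, outlier)

# MAE grows modestly; MSE blows up, because the single 10-unit miss is squared
print(mae_out / mae_clean, mse_out / mse_clean)
```

One bad point multiplies MAE by roughly 20× here but MSE by roughly 2000× — exactly the behavior the table summarizes.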
Code cheat-sheet (scikit-learn)

```python
from sklearn.metrics import (
    mean_squared_error, mean_absolute_error,
    r2_score, explained_variance_score,
    mean_absolute_percentage_error
)

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(y_true, y_pred)
rmse = mean_squared_error(y_true, y_pred, squared=False)  # sklearn >= 1.4: root_mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print('MSE', mse, 'RMSE', rmse, 'MAE', mae, 'R2', r2)
```

Micro explanation: `squared=False` returns RMSE directly; in scikit-learn ≥ 1.4 prefer `root_mean_squared_error(y_true, y_pred)`, since the `squared` parameter was deprecated there and removed in 1.6.
Practical pitfalls and statistical connections
- Scale dependence: MSE/RMSE/MAE are in the units of the target. If you standardize/scale your target, metric values change — compare across models on the same target scale only.
- Outliers: MSE/RMSE exaggerate outliers (squared term). If your residuals have heavy tails (remember distributional intuition from Statistics chapter), MAE or median absolute error may be better.
- R² misinterpretation: High R² doesn't guarantee low errors if your target has low variance. Conversely, negative R² signals the model is worse than predicting the mean.
- Cross-validation / scoring API: scikit-learn's cross_val_score and GridSearchCV expect a score where higher is better, while many regression metrics are loss-like (lower is better). Sklearn therefore exposes negated versions, e.g. scoring='neg_mean_squared_error'; after CV, negate the reported score to recover the positive loss.

Example:

```python
from sklearn.model_selection import cross_val_score

scores = cross_val_score(model, X, y, scoring='neg_root_mean_squared_error', cv=5)
rmse_cv = -scores.mean()  # flip the sign back to a positive RMSE
```
- Heteroscedasticity: If residual variance changes across X, pointwise error metrics miss that structure. Visualize residuals! The statistical tools you learned (e.g., plots, tests for variance) help diagnose this.
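A minimal sketch of that diagnosis on synthetic data whose noise grows with X (in practice you would also scatter-plot residuals against predictions, e.g. with matplotlib):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.linspace(1, 10, 200).reshape(-1, 1)
y = 2 * X.ravel() + rng.normal(0, X.ravel())  # noise scale grows with X: heteroscedastic

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

# Residual spread is much larger for large X than for small X —
# a pattern RMSE/MAE alone would never reveal
low_spread = residuals[:100].std()
high_spread = residuals[100:].std()
print(low_spread, high_spread)
```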
Practical workflow — what to report and why
- Always report at least two metrics: one scale-aware (RMSE or MAE) and one relative (R²).
- Use RMSE when big errors are especially bad; use MAE for robustness.
- If your business cares about percentages, use MAPE only when target > 0 and you understand its bias.
- Always show residual plots and distribution (histogram or QQ-plot). Metrics alone lie.
- When tuning with cross-validation, use sklearn’s negative-loss scoring and convert back to positive losses for interpretation.
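That last step might look like this (Ridge and the alpha grid are arbitrary choices for illustration; data is synthetic):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, size=100)

# Tune with a negated loss (higher is better for the searcher)...
grid = GridSearchCV(Ridge(), {'alpha': [0.01, 0.1, 1.0, 10.0]},
                    scoring='neg_root_mean_squared_error', cv=5)
grid.fit(X, y)

# ...then flip the sign back to report an interpretable positive RMSE
best_rmse = -grid.best_score_
print(grid.best_params_, best_rmse)
```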
Quick checklist before you ship a model
- Did you compute RMSE and MAE (or whichever suits your objective)?
- Did you compute R² or explained variance for relative performance?
- Did you plot residuals and check for heteroscedasticity?
- Did you compare against a simple baseline (mean predictor) and confirm positive gain?
- Did you think about outliers and whether your metric is robust to them?
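The baseline comparison from the checklist can be as simple as fitting a DummyRegressor (scikit-learn's built-in mean predictor) alongside your model — data here is synthetic:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X.ravel() + rng.normal(0, 1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

baseline = DummyRegressor(strategy='mean').fit(X_tr, y_tr)  # always predicts mean(y_tr)
model = LinearRegression().fit(X_tr, y_tr)

mae_baseline = mean_absolute_error(y_te, baseline.predict(X_te))
mae_model = mean_absolute_error(y_te, model.predict(X_te))
print(mae_baseline, mae_model)  # the model should beat the mean predictor by a wide margin
```

If your model does not clearly beat this baseline, no metric value can save it.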
Final takeaways — the memorable insight
Metrics are not just numbers — they're narratives. RMSE screams about big mistakes, MAE whispers about the median human experience, and R² tells you how much of the story your model remembers. Use more than one metric, visualize residuals, and always compare to a naive baseline.
"A single metric gives you a number. A set of metrics with residual plots gives you truth."
If you want, next we can: code a small function that returns a neat metric report for any regression model (with plots), or walk through interpreting metrics from a real dataset — your call.