Machine Learning with scikit-learn
Build, tune, and evaluate models using scikit-learn pipelines with reproducible ML workflows.
Content
Regression Metrics
Regression Metrics in scikit-learn — How to judge continuous predictions
You already learned how to evaluate classifiers and split your data cleanly. Now you know how not to lie with accuracy — welcome to the regression edition, where everything is a number and everyone cries into spreadsheets.
In the previous sections we covered classification metrics and cross-validation strategies. We also built statistical intuition from the "Statistics and Probability for Data Science" chapter — so you already understand variance, bias, sampling noise, and why confidence matters. Now we apply that intuition to regression metrics — the measures that tell you whether your model’s continuous predictions are useful.
What are regression metrics and why they matter
- Regression metrics quantify the difference between predicted and actual continuous values.
- They answer questions like: How wrong is my model on average? How much of the variance did it explain? Is the error sensitive to outliers?
Why this matters (real-world examples):
- Predicting house prices: a high RMSE can mean being off by millions in aggregate. Not good.
- Forecasting demand: a biased model could lead to understocking or waste.
- Scientific measurements: understanding uncertainty ties back to inference and hypothesis testing from earlier chapters.
Key metrics you’ll use (and when to use them)
1) Mean Squared Error (MSE)
Definition: average of squared residuals.
Why it’s used:
- Penalizes larger errors strongly (because of squaring).
- Mathematically convenient (differentiable) — often the loss used to train regressors.
Sklearn: `mean_squared_error(y_true, y_pred)`
Pros: smooth and optimizable. Cons: in units squared of target (awkward to interpret).
2) Root Mean Squared Error (RMSE)
Definition: sqrt(MSE). Same units as the target — easier to interpret.
Use when: you care about large errors and want interpretable scale.
Sklearn: `mean_squared_error(y_true, y_pred, squared=False)`; in scikit-learn ≥ 1.4 use `root_mean_squared_error(y_true, y_pred)` instead (the `squared` parameter was deprecated in 1.4 and removed in 1.6).
3) Mean Absolute Error (MAE)
Definition: average absolute difference between predictions and truth.
Why it’s useful:
- Robust to outliers compared to MSE/RMSE.
- Interpretable: average absolute error in same units as target.
Sklearn: `mean_absolute_error(y_true, y_pred)`
4) R-squared (R²)
Definition (intuitively): proportion of variance in y explained by the model.
- R² = 1 → perfect fit. R² = 0 → model does no better than predicting the mean. R² can be negative if worse than mean predictor.
Sklearn: `r2_score(y_true, y_pred)`
Caveat: R² doesn't tell you about bias, heteroscedasticity, or goodness for forecasting — it’s about variance explained on the evaluation data.
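Those boundary cases are easy to verify with toy numbers (purely illustrative):

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0, 4.0]
mean_pred = [2.5] * 4            # always predict the mean of y_true
bad_pred = [4.0, 3.0, 2.0, 1.0]  # anti-correlated predictions

r2_mean = r2_score(y_true, mean_pred)  # 0.0: no better than the mean
r2_bad = r2_score(y_true, bad_pred)    # negative: worse than the mean
print(r2_mean, r2_bad)
```

Here the anti-correlated predictor scores well below zero — a useful reminder that R² has no lower bound.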
5) Explained Variance
Definition: how much of the variance of y is captured by the predictions. It is similar to R², but it ignores systematic bias: a constant offset in the predictions lowers R² yet leaves explained variance unchanged.
Sklearn: `explained_variance_score(y_true, y_pred)`
6) Mean Absolute Percentage Error (MAPE)
Definition: mean of |(y_true - y_pred) / y_true|.
Be careful: division by zero issues, and it punishes small true values heavily. Use only when target is strictly positive and percent error interpretation is desired.
Sklearn: `mean_absolute_percentage_error(y_true, y_pred)`
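A quick sketch of that small-target pitfall: the absolute error is identical (0.5) for both points, yet the tiny target dominates the MAPE (numbers are made up for illustration):

```python
from sklearn.metrics import mean_absolute_percentage_error

y_true = [0.1, 100.0]  # one tiny target, one large target
y_pred = [0.6, 100.5]  # both predictions miss by exactly 0.5

# |0.5 / 0.1| = 5.0 (500%!) while |0.5 / 100| = 0.005 (0.5%)
mape = mean_absolute_percentage_error(y_true, y_pred)
print(mape)  # ≈ 2.5, i.e. ≈ 250% average error, driven almost entirely by the tiny target
```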
Quick comparison table
| Metric | Sensitive to outliers? | Units | Good when... |
|---|---|---|---|
| MSE | Yes (high) | squared units | you want to penalize big errors / training loss |
| RMSE | Yes (high) | original units | interpretability + penalize large errors |
| MAE | Moderate | original units | robustness to outliers |
| R² | N/A | unitless | measuring variance explained |
| MAPE | Yes, and scale-dependent | percent | percent-error interpretation (positive targets only) |
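To see the "sensitive to outliers" column in action, here is a small sketch with one corrupted prediction (toy numbers):

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
clean = [1.1, 2.1, 3.1, 4.1, 5.1]    # small uniform error everywhere
outlier = [1.1, 2.1, 3.1, 4.1, 15.0]  # same, except one big miss

mae_clean = mean_absolute_error(y_true, clean)
mae_out = mean_absolute_error(y_true, outlier)
mse_clean = mean_squared_error(y_true, clean)
mse_out = mean_squared_error(y_true, outlier)

# MAE grows modestly; MSE blows up, because the single 10-unit miss is squared
print(mae_out / mae_clean, mse_out / mse_clean)
```

One bad point multiplies MAE by roughly 20× here but MSE by roughly 2000× — exactly the behavior the table summarizes.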
Code cheat-sheet (scikit-learn)

```python
from sklearn.metrics import (
    mean_squared_error, mean_absolute_error,
    r2_score, explained_variance_score,
    mean_absolute_percentage_error
)

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(y_true, y_pred)
rmse = mean_squared_error(y_true, y_pred, squared=False)  # sklearn >= 1.4: root_mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print('MSE', mse, 'RMSE', rmse, 'MAE', mae, 'R2', r2)
```

Micro explanation: `squared=False` returns RMSE directly; in scikit-learn ≥ 1.4 prefer `root_mean_squared_error(y_true, y_pred)`, since the `squared` parameter was deprecated there and removed in 1.6.
Practical pitfalls and statistical connections
- Scale dependence: MSE/RMSE/MAE are in the units of the target. If you standardize/scale your target, metric values change — compare across models on the same target scale only.
- Outliers: MSE/RMSE exaggerate outliers (squared term). If your residuals have heavy tails (remember distributional intuition from Statistics chapter), MAE or median absolute error may be better.
- R² misinterpretation: High R² doesn't guarantee low errors if your target has low variance. Conversely, negative R² signals the model is worse than predicting the mean.
- Cross-validation / scoring API: scikit-learn's cross_val_score and GridSearchCV expect a score where higher is better, while many regression metrics are loss-like (lower is better). Sklearn therefore exposes negated versions, e.g. scoring='neg_mean_squared_error'; after CV, negate the reported score to recover the positive loss.

Example:

```python
from sklearn.model_selection import cross_val_score

scores = cross_val_score(model, X, y, scoring='neg_root_mean_squared_error', cv=5)
rmse_cv = -scores.mean()  # flip the sign back to a positive RMSE
```
- Heteroscedasticity: If residual variance changes across X, pointwise error metrics miss that structure. Visualize residuals! The statistical tools you learned (e.g., plots, tests for variance) help diagnose this.
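A minimal sketch of that diagnosis on synthetic data whose noise grows with X (in practice you would also scatter-plot residuals against predictions, e.g. with matplotlib):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.linspace(1, 10, 200).reshape(-1, 1)
y = 2 * X.ravel() + rng.normal(0, X.ravel())  # noise scale grows with X: heteroscedastic

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

# Residual spread is much larger for large X than for small X —
# a pattern RMSE/MAE alone would never reveal
low_spread = residuals[:100].std()
high_spread = residuals[100:].std()
print(low_spread, high_spread)
```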
Practical workflow — what to report and why
- Always report at least two metrics: one scale-aware (RMSE or MAE) and one relative (R²).
- Use RMSE when big errors are especially bad; use MAE for robustness.
- If your business cares about percentages, use MAPE only when target > 0 and you understand its bias.
- Always show residual plots and distribution (histogram or QQ-plot). Metrics alone lie.
- When tuning with cross-validation, use sklearn’s negative-loss scoring and convert back to positive losses for interpretation.
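That last step might look like this (Ridge and the alpha grid are arbitrary choices for illustration; data is synthetic):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, size=100)

# Tune with a negated loss (higher is better for the searcher)...
grid = GridSearchCV(Ridge(), {'alpha': [0.01, 0.1, 1.0, 10.0]},
                    scoring='neg_root_mean_squared_error', cv=5)
grid.fit(X, y)

# ...then flip the sign back to report an interpretable positive RMSE
best_rmse = -grid.best_score_
print(grid.best_params_, best_rmse)
```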
Quick checklist before you ship a model
- Did you compute RMSE and MAE (or whichever suits your objective)?
- Did you compute R² or explained variance for relative performance?
- Did you plot residuals and check for heteroscedasticity?
- Did you compare against a simple baseline (mean predictor) and confirm positive gain?
- Did you think about outliers and whether your metric is robust to them?
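The baseline comparison from the checklist can be as simple as fitting a DummyRegressor (scikit-learn's built-in mean predictor) alongside your model — data here is synthetic:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X.ravel() + rng.normal(0, 1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

baseline = DummyRegressor(strategy='mean').fit(X_tr, y_tr)  # always predicts mean(y_tr)
model = LinearRegression().fit(X_tr, y_tr)

mae_baseline = mean_absolute_error(y_te, baseline.predict(X_te))
mae_model = mean_absolute_error(y_te, model.predict(X_te))
print(mae_baseline, mae_model)  # the model should beat the mean predictor by a wide margin
```

If your model does not clearly beat this baseline, no metric value can save it.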
Final takeaways — the memorable insight
Metrics are not just numbers — they're narratives. RMSE screams about big mistakes, MAE whispers about the median human experience, and R² tells you how much of the story your model remembers. Use more than one metric, visualize residuals, and always compare to a naive baseline.
"A single metric gives you a number. A set of metrics with residual plots gives you truth."
If you want, next we can: code a small function that returns a neat metric report for any regression model (with plots), or walk through interpreting metrics from a real dataset — your call.