Classification II: Thresholding, Calibration, and Metrics
Make cost-aware decisions by selecting thresholds, calibrating probabilities, and using the right metrics.
Cost Curves and Expected Utility — The Glorious Economics of Decisions
"Metrics are cute, but dollars (or lives, or server time) pay the bills." — Your friendly decision-theory TA
You're already fresh off learning how to pick thresholds and read precision–recall curves, and you know how logistic regression gives you probabilities instead of just binary verdicts. Now we ask: how do we turn those probabilities into decisions that maximize what actually matters — utility (or, equivalently, minimize cost)? Welcome to cost curves and expected utility: the place where math meets money and moral dilemmas (false positives vs false negatives).
What's the point (quick)?
If you can estimate P(y=1 | x) (hello, logistic regression), the optimal decision depends not just on that probability but on the relative costs of mistakes and the class prevalence. Cost curves are a way to visualize how a classifier performs across all possible trade-offs between those costs and prevalence — and expected utility tells you which threshold to pick once you've specified costs.
The setup: costs, errors, and expected cost
Imagine a binary classifier. There are two mistakes:
- False Positive (FP): predict 1 when true label = 0. Cost: C_FP
- False Negative (FN): predict 0 when true label = 1. Cost: C_FN
(Yes, you can call them "annoying consequences" instead — costs can be monetary, reputational, or life-or-death.)
Given a threshold t on the model's score s(x) (or on P(y=1|x)), define:
- FPR_t = P(pred=1 | y=0) at threshold t
- FNR_t = P(pred=0 | y=1) at threshold t
Then the expected cost (EC) for prior p = P(y=1) is:
EC(t; p) = C_FN * p * FNR_t + C_FP * (1 - p) * FPR_t
That's it. Two error rates weighted by class prevalence and the cost of each type of error.
Interpretation: Think of p * C_FN as the total "risk mass" assigned to positive-class errors, and (1-p) * C_FP to negative-class errors. The classifier splits those masses according to its FNR and FPR.
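To make the formula concrete, here's a minimal sketch in Python (the function name and the numbers are illustrative, not from any particular library):

```python
def expected_cost(fnr, fpr, p, cost_fn, cost_fp):
    """EC(t; p) = C_FN * p * FNR_t + C_FP * (1 - p) * FPR_t."""
    return cost_fn * p * fnr + cost_fp * (1 - p) * fpr

# Rare positives (p = 0.05) with misses 100x more expensive than false alarms.
ec = expected_cost(fnr=0.10, fpr=0.20, p=0.05, cost_fn=100.0, cost_fp=1.0)
print(round(ec, 4))  # 0.69: the 10% miss rate contributes 0.5, false alarms only 0.19
```

Even with only 5% positives, the expensive misses dominate the total — exactly the "risk mass" intuition above.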
Bayes decision rule (aka pick the threshold like a grown-up)
For a probabilistic classifier that gives p_hat = P(y=1 | x), compare the expected costs of predicting 1 vs predicting 0 for this single example:
- If you predict 1: expected cost = C_FP * (1 - p_hat)
- If you predict 0: expected cost = C_FN * p_hat
Predict 1 when:
C_FP * (1 - p_hat) <= C_FN * p_hat
Rearrange:
p_hat >= C_FP / (C_FP + C_FN)
So the optimal threshold (for this cost pair) is t* = C_FP / (C_FP + C_FN).
Nice consequences:
- It depends on the ratio of costs, not their absolute scale.
- If C_FP = C_FN, threshold = 0.5 (as you'd expect).
- If false negatives are very expensive (C_FN >> C_FP), threshold gets small — be generous calling positives.
Key point: this neat thresholding requires well-calibrated probabilities. Garbage probabilities → garbage decisions.
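The rule itself is one line of code (a sketch; the function name is mine, and it assumes calibrated probabilities as just noted):

```python
def bayes_threshold(cost_fp, cost_fn):
    """Bayes-optimal threshold on calibrated P(y=1|x): t* = C_FP / (C_FP + C_FN)."""
    return cost_fp / (cost_fp + cost_fn)

print(bayes_threshold(1.0, 1.0))  # 0.5: symmetric costs
print(bayes_threshold(1.0, 9.0))  # 0.1: expensive misses lower the bar for calling positives
```

Note that scaling both costs by the same factor leaves the threshold unchanged, matching the "ratio, not absolute scale" point above.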
Cost Curves (Drummond & Holte style) — visualize all operating points
A big pain: real-world costs and class prevalence vary. You might deploy the same model in two countries (different p) or suddenly the cost of an FP spikes (regulation). Instead of committing to one (p, costs) pair, we can look at performance across the whole spectrum.
Construct two transformations:
- Probability–Cost Function (PCF):
PCF = (p * C_FN) / (p * C_FN + (1 - p) * C_FP)
This compresses class prior and costs into a single axis variable between 0 and 1. Intuitively, PCF is the relative weight placed on positive-class errors.
- Normalized Expected Cost (NEC):
NEC(t; PCF) = FNR_t * PCF + FPR_t * (1 - PCF)
Now plot NEC on the y-axis vs PCF on the x-axis for your classifier (often you do this for a family of thresholds, forming a piecewise-linear curve). Each point tells you the normalized expected cost for that operating point (a blend of prevalence and cost ratio).
Why normalized? NEC avoids absolute cost scales so curves from different datasets or cost-schemes are comparable.
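Both transformations are one-liners in code (a sketch; the function names are mine):

```python
def pcf(p, cost_fn, cost_fp):
    """Collapse class prior and costs into one axis variable in [0, 1]."""
    return (p * cost_fn) / (p * cost_fn + (1 - p) * cost_fp)

def nec(fnr, fpr, pcf_value):
    """Normalized expected cost at one operating point."""
    return fnr * pcf_value + fpr * (1 - pcf_value)

x = pcf(p=0.5, cost_fn=4.0, cost_fp=1.0)  # positives carry 80% of the risk mass
print(round(nec(fnr=0.10, fpr=0.30, pcf_value=x), 4))  # 0.14
```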
How to read a cost curve (the meme version)
- If classifier A's curve lies below B's for a range of PCF, A dominates there — lower normalized expected cost for those cost/prior mixes.
- The lower envelope of these lines (the cost-space twin of the ROC convex hull) tells you the best achievable cost if you can change thresholds post-hoc.
- Crossing curves = pick-your-poison: one classifier better when false negatives costly, the other when false positives costly.
Question to ask yourself: "What PCF region is my deployment in?" If you care about very rare positives and huge cost of missing them (medical screening), you're in a corner of the x-axis and you can pick accordingly.
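The "lower curve wins" reading can be checked numerically. This sketch takes a few hypothetical (FNR, FPR) operating points from one classifier and computes the best achievable NEC at every PCF, i.e. the envelope you would trace out by retuning the threshold:

```python
import numpy as np

# Hypothetical (FNR, FPR) pairs for three thresholds of the same classifier.
operating_points = [(0.40, 0.05), (0.20, 0.15), (0.05, 0.45)]
pcf_grid = np.linspace(0.0, 1.0, 101)

# Each operating point is a straight line: NEC(PCF) = FNR*PCF + FPR*(1 - PCF).
lines = np.array([fnr * pcf_grid + fpr * (1 - pcf_grid)
                  for fnr, fpr in operating_points])
envelope = lines.min(axis=0)  # best NEC per PCF if thresholds are adjustable

# At PCF=0 only FPR matters; at PCF=1 only FNR matters.
print(float(envelope[0]), float(envelope[-1]))
```

Notice how different thresholds win in different PCF regions — that is the crossing-curves story in miniature.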
From theory to practice: how to compute expected cost (pseudocode)
```python
import numpy as np

# Given: y_true, p_hat (arrays), cost_fp, cost_fn, thresholds T, priors p_grid
EC = {}
n_neg = np.sum(y_true == 0)
n_pos = np.sum(y_true == 1)
for t in T:
    pred = p_hat >= t
    FPR = np.sum(pred & (y_true == 0)) / n_neg
    FNR = np.sum(~pred & (y_true == 1)) / n_pos
    for p in p_grid:
        EC[t, p] = cost_fn * p * FNR + cost_fp * (1 - p) * FPR
# Or transform p and costs to PCF and compute normalized expected cost
```
(Use cross-validation or a separate validation set to estimate FPR/FNR — do not cheat with test labels when picking thresholds.)
Practical tips and trade-offs
- Calibration matters. If your probabilities are miscalibrated, thresholds from Bayes rule will be wrong. Use Platt scaling / isotonic regression.
- AUC is not enough. AUC summarizes ranking, but cost curves capture where ranking errors actually cost you. Two models with similar AUC can have very different expected costs in realistic PCF ranges.
- If you know costs, optimize them directly. If you can assign monetary utility, pick the threshold that maximizes expected utility on validation data (or train cost-sensitive models).
- When costs are uncertain, use cost curves. They show robustness across assumptions.
- Don't forget that class priors shift. Even if costs are fixed, deployment prevalence p can move; cost curves let you see the sensitivity.
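Putting the last few tips together, here's a sketch of picking the cheapest threshold on held-out validation data. The data is synthetic and the helper name is mine; the point is the pattern, not the numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic validation set: scores roughly separate the two classes.
y_val = rng.integers(0, 2, size=1000)
p_val = np.clip(0.5 * y_val + rng.normal(0.25, 0.2, size=1000), 0.0, 1.0)

def empirical_cost(t, y, p_hat, cost_fp, cost_fn):
    """Total realized cost on labeled data at threshold t."""
    pred = p_hat >= t
    fp = np.sum(pred & (y == 0))
    fn = np.sum(~pred & (y == 1))
    return cost_fp * fp + cost_fn * fn

# Sweep thresholds; misses cost 5x false alarms.
thresholds = np.linspace(0.01, 0.99, 99)
costs = [empirical_cost(t, y_val, p_val, cost_fp=1.0, cost_fn=5.0) for t in thresholds]
best_t = float(thresholds[int(np.argmin(costs))])
print(best_t)
```

With misses five times as expensive, the chosen threshold should land below 0.5, in the spirit of the Bayes rule earlier — and since it was tuned on validation labels, the test set stays untouched.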
Quick comparison table
| Concept | What it shows | When to use |
|---|---|---|
| AUC-ROC / AUC-PR | Ranking performance across thresholds | General model selection; ranking-heavy tasks |
| Precision–Recall curves | Behavior on positive class (sensitive to class imbalance) | Rare positive detection |
| Cost curves / NEC | Expected (normalized) cost over all cost/prior mixes | When costs/priors matter or vary |
Final flourish — key takeaways
- Expected cost = weighted sum of FPR and FNR; weights come from class prior and misclassification costs.
- With calibrated probabilities, the Bayes optimal threshold is t* = C_FP / (C_FP + C_FN).
- Cost curves compress prior+cost into a PCF axis and let you visualize performance across operating conditions — use them when costs or prevalence are uncertain.
- Calibration + cost-sensitive thinking = decisions that actually improve utility, not just metrics.
Parting thought: metrics tell you how your model behaves; cost curves tell you how much its misbehavior will hurt. Optimize the latter if you care about consequences — which you should.