Supervised Machine Learning: Regression and Classification
Classification II: Thresholding, Calibration, and Metrics


Make cost-aware decisions by selecting thresholds, calibrating probabilities, and using the right metrics.


Cost Curves and Expected Utility

Costly Choices — Practical Decision Theory with Sass
Cost Curves and Expected Utility — The Glorious Economics of Decisions

"Metrics are cute, but dollars (or lives, or server time) pay the bills." — Your friendly decision-theory TA

You're fresh off learning how to pick thresholds and read precision–recall curves, and you know that logistic regression gives you probabilities instead of bare 0/1 verdicts. Now we ask: how do we turn those probabilities into decisions that optimize what actually matters — utility (or, equivalently, minimize cost)? Welcome to cost curves and expected utility: the place where math meets money and moral dilemmas (false positives vs false negatives).


What's the point (quick)?

If you can estimate P(y=1 | x) (hello, logistic regression), the optimal decision depends not just on that probability but on the relative costs of mistakes and the class prevalence. Cost curves are a way to visualize how a classifier performs across all possible trade-offs between those costs and prevalence — and expected utility tells you which threshold to pick once you've specified costs.


The setup: costs, errors, and expected cost

Imagine a binary classifier. There are two mistakes:

  • False Positive (FP): predict 1 when true label = 0. Cost: C_FP
  • False Negative (FN): predict 0 when true label = 1. Cost: C_FN

(Yes, you can call them "annoying consequences" instead — costs can be monetary, reputational, or life-or-death.)

Given a threshold t on the model's score s(x) (or on P(y=1|x)), define:

  • FPR_t = P(pred=1 | y=0) at threshold t
  • FNR_t = P(pred=0 | y=1) at threshold t

Then the expected cost (EC) for prior p = P(y=1) is:

EC(t; p) = C_FN * p * FNR_t + C_FP * (1 - p) * FPR_t

That's it. Two error rates weighted by class prevalence and the cost of each type of error.

Interpretation: Think of p * C_FN as the total "risk mass" assigned to positive-class errors, and (1-p) * C_FP to negative-class errors. The classifier splits those masses according to its FNR and FPR.
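To make the formula concrete, here's a tiny numeric sketch. The costs, prevalence, and error rates below are invented for illustration:

```python
# Hypothetical fraud-detection numbers: missing fraud (FN) is 10x
# more costly than flagging a legitimate transaction (FP).
cost_fn, cost_fp = 10.0, 1.0
p = 0.05               # prevalence of fraud, P(y=1)
fnr, fpr = 0.20, 0.08  # error rates of some classifier at a fixed threshold

# EC(t; p) = C_FN * p * FNR + C_FP * (1 - p) * FPR
ec = cost_fn * p * fnr + cost_fp * (1 - p) * fpr
print(round(ec, 4))  # -> 0.176
```

Note how the rare-but-expensive false negatives (0.1 of the total) still outweigh the far more common false positives (0.076).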


Bayes decision rule (aka pick the threshold like a grown-up)

For a probabilistic classifier that gives p_hat = P(y=1 | x), compare the expected costs of predicting 1 vs predicting 0 for this single example:

  • If you predict 1: expected cost = C_FP * (1 - p_hat)
  • If you predict 0: expected cost = C_FN * p_hat

Predict 1 when:

C_FP * (1 - p_hat) <= C_FN * p_hat

Rearrange:

p_hat >= C_FP / (C_FP + C_FN)

So the optimal threshold (for this cost pair) is t* = C_FP / (C_FP + C_FN).

Nice consequences:

  • It depends on the ratio of costs, not their absolute scale.
  • If C_FP = C_FN, threshold = 0.5 (as you'd expect).
  • If false negatives are very expensive (C_FN >> C_FP), threshold gets small — be generous calling positives.

Key point: this neat thresholding requires well-calibrated probabilities. Garbage probabilities → garbage decisions.
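The threshold rule above is one line of code. A minimal sketch (the function name is mine):

```python
def bayes_threshold(cost_fp, cost_fn):
    """Cost-optimal probability threshold for predicting the positive class.

    Derived from: predict 1 when C_FP * (1 - p_hat) <= C_FN * p_hat,
    i.e. p_hat >= C_FP / (C_FP + C_FN).
    """
    return cost_fp / (cost_fp + cost_fn)

print(bayes_threshold(1, 1))  # 0.5: symmetric costs
print(bayes_threshold(1, 9))  # 0.1: expensive FNs -> call positives generously
```

Doubling both costs leaves the threshold unchanged, confirming that only the cost ratio matters.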


Cost Curves (Drummond & Holte style) — visualize all operating points

A big pain: real-world costs and class prevalence vary. You might deploy the same model in two countries (different p) or suddenly the cost of an FP spikes (regulation). Instead of committing to one (p, costs) pair, we can look at performance across the whole spectrum.

Construct two transformations:

  1. Probability–Cost Function (PCF):

PCF = (p * C_FN) / (p * C_FN + (1 - p) * C_FP)

This compresses class prior and costs into a single axis variable between 0 and 1. Intuitively, PCF is the relative weight placed on positive-class errors.

  2. Normalized Expected Cost (NEC):

NEC(t; PCF) = FNR_t * PCF + FPR_t * (1 - PCF)

Now plot NEC on the y-axis vs PCF on the x-axis for your classifier (often you do this for a family of thresholds, forming a piecewise-linear curve). Each point tells you the normalized expected cost for that operating point (a blend of prevalence and cost ratio).

Why normalized? NEC avoids absolute cost scales so curves from different datasets or cost-schemes are comparable.
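A minimal sketch of building such a curve, assuming we already have a few (FPR, FNR) operating points from a validation set (the numbers below are invented):

```python
import numpy as np

# Hypothetical operating points (FPR, FNR) for three thresholds of one model.
operating_points = [(0.30, 0.05), (0.10, 0.20), (0.02, 0.50)]

pcf = np.linspace(0, 1, 101)  # probability-cost function axis

# Each threshold traces a straight line NEC = FNR * PCF + FPR * (1 - PCF);
# the classifier's cost curve is the lower envelope over all thresholds.
lines = np.array([fnr * pcf + fpr * (1 - pcf) for fpr, fnr in operating_points])
cost_curve = lines.min(axis=0)

print(cost_curve[0], cost_curve[-1])  # at PCF=0: best FPR; at PCF=1: best FNR
```

Plot `cost_curve` against `pcf` and you have the piecewise-linear cost curve described above.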


How to read a cost curve (the meme version)

  • If classifier A's curve lies below B's for a range of PCF, A dominates there — lower normalized expected cost for those cost/prior mixes.
  • The convex hull of these curves tells you the best choice if you can change thresholds post-hoc.
  • Crossing curves = pick-your-poison: one classifier better when false negatives costly, the other when false positives costly.

Question to ask yourself: "What PCF region is my deployment in?" If you care about very rare positives and huge cost of missing them (medical screening), you're in a corner of the x-axis and you can pick accordingly.


From theory to practice: how to compute expected cost (pseudocode)

# Given: y_true (0/1 labels), p_hat (predicted probabilities),
# cost_fp, cost_fn, a grid of thresholds T, and a grid of priors p_grid.
import numpy as np

N_positive = np.sum(y_true == 1)
N_negative = np.sum(y_true == 0)
EC = {}
for t in T:
    pred = p_hat >= t
    FP = np.sum(pred & (y_true == 0))   # false positives at this threshold
    FN = np.sum(~pred & (y_true == 1))  # false negatives at this threshold
    FPR = FP / N_negative
    FNR = FN / N_positive
    for p in p_grid:
        EC[(t, p)] = cost_fn * p * FNR + cost_fp * (1 - p) * FPR
# Or transform p and costs to PCF and compute normalized expected cost.

(Use cross-validation or a separate validation set to estimate FPR/FNR — do not cheat with test labels when picking thresholds.)


Practical tips and trade-offs

  • Calibration matters. If your probabilities are miscalibrated, thresholds from Bayes rule will be wrong. Use Platt scaling / isotonic regression.
  • AUC is not enough. AUC summarizes ranking, but cost curves capture where ranking errors actually cost you. Two models with similar AUC can have very different expected costs in realistic PCF ranges.
  • If you know costs, optimize them directly. If you can assign monetary utility, pick the threshold that maximizes expected utility on validation data (or train cost-sensitive models).
  • When costs are uncertain, use cost curves. They show robustness across assumptions.
  • Don't forget that class priors shift. Even if costs are fixed, deployment prevalence p can move; cost curves let you see the sensitivity.
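As a toy illustration of the "optimize costs directly" advice: sweep thresholds on a validation set and keep the one with the lowest empirical cost. All data and cost numbers here are fabricated for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic validation set: noisy scores that separate the classes imperfectly.
y_val = rng.integers(0, 2, size=2000)
p_hat = np.clip(0.5 * y_val + rng.normal(0.25, 0.2, size=2000), 0, 1)

cost_fp, cost_fn = 1.0, 5.0  # a false negative costs 5x a false positive

def empirical_cost(t):
    """Total misclassification cost on the validation sample at threshold t."""
    pred = p_hat >= t
    fp = np.sum(pred & (y_val == 0))
    fn = np.sum(~pred & (y_val == 1))
    return cost_fp * fp + cost_fn * fn

thresholds = np.linspace(0.01, 0.99, 99)
best_t = min(thresholds, key=empirical_cost)
print(best_t)  # with FNs 5x costlier, the chosen threshold sits below 0.5
```

With calibrated probabilities, `best_t` should land near the analytic value C_FP / (C_FP + C_FN) = 1/6; the empirical sweep is the belt-and-suspenders version that works even when calibration is imperfect.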

Quick comparison table

Concept                 | What it shows                                              | When to use
AUC-ROC / AUC-PR        | Ranking performance across thresholds                      | General model selection; ranking-heavy tasks
Precision–Recall curves | Behavior on positive class (sensitive to class imbalance)  | Rare positive detection
Cost curves / NEC       | Expected (normalized) cost over all cost/prior mixes       | When costs/priors matter or vary

Final flourish — key takeaways

  • Expected cost = weighted sum of FPR and FNR; weights come from class prior and misclassification costs.
  • With calibrated probabilities, the Bayes optimal threshold is t* = C_FP / (C_FP + C_FN).
  • Cost curves compress prior+cost into a PCF axis and let you visualize performance across operating conditions — use them when costs or prevalence are uncertain.
  • Calibration + cost-sensitive thinking = decisions that actually improve utility, not just metrics.

Parting thought: metrics tell you how your model behaves; cost curves tell you how much its misbehavior will hurt. Optimize the latter if you care about consequences — which you should.

