
Supervised Machine Learning: Regression and Classification

Classification II: Thresholding, Calibration, and Metrics


Make cost-aware decisions by selecting thresholds, calibrating probabilities, and using the right metrics.


Threshold Selection Strategies


Threshold Selection Strategies — Where Probabilities Go to Wear Costumes

"A probability isn't a decision. A threshold is the wardrobe change."

You already know how to get a probability out of a model (hello, logistic regression and friends). You also know how to visualize classifier behavior across thresholds using ROC and PR curves. Great — those were the rehearsals. Now we pick the outfit for opening night: the threshold. This note walks through principled, practical, and delightfully pragmatic ways to choose a threshold for binary classification.


Why thresholding deserves a moment of existential thought

  • Your model spits out p = P(y=1 | x). That’s a probability, not a verdict. A threshold turns p into a yes/no call.
  • Different thresholds change precision, recall, specificity, F1, and business outcomes. ROC/PR curves showed you the landscape — threshold selection chooses the vantage point.
  • Bad thresholds = wasted effort, false alarms, missed opportunities, possibly regulatory trouble. Choose carefully.

The core decision-theory rule (aka the adult way to set a threshold)

If false positives cost c_fp and false negatives cost c_fn, minimize expected cost by predicting positive when:

p > c_fp / (c_fp + c_fn)

Equivalently, using odds:

p/(1-p) > c_fp / c_fn

Why this works: choosing positive risks a false positive with probability (1 − p), for expected cost (1 − p)·c_fp; choosing negative risks a false negative with probability p, for expected cost p·c_fn. Predict positive whenever p·c_fn > (1 − p)·c_fp, which rearranges to the rule above.

This is Bayes-style decision making. It depends on your costs, not on some arbitrary 0.5.

Practical tip: always convert business penalties into relative costs (c_fp vs c_fn). If a missed hospital readmission is catastrophic and a false alarm is cheap, set the threshold low.
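The decision rule above reduces to one line of code. A minimal sketch (the helper name and the cost values below are illustrative, not from the text):

```python
def cost_threshold(c_fp: float, c_fn: float) -> float:
    """Bayes-optimal threshold: predict positive when p > c_fp / (c_fp + c_fn)."""
    return c_fp / (c_fp + c_fn)

# Example: a missed readmission is 9x as costly as a false alarm,
# so we flag patients even at modest predicted risk.
t = cost_threshold(c_fp=1.0, c_fn=9.0)
print(t)  # 0.1
```

Notice the threshold depends only on the *ratio* of costs, which is usually easier to elicit from stakeholders than absolute dollar figures.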


Simple strategies you’ll actually use in the wild

  1. Fixed default (0.5)
    • Pros: simple. Cons: assumes calibrated probabilities and balanced costs/classes. Often wrong.
  2. Maximize a metric on validation set (F1, accuracy, MCC)
    • Compute metric for many thresholds; pick argmax. Works if metric reflects business goal.
  3. Youden's J (ROC-based)
    • Choose threshold maximizing Sensitivity + Specificity - 1 (TPR - FPR). Good when you treat errors symmetrically.
  4. Minimize distance to top-left on ROC
    • Choose threshold minimizing sqrt((1-TPR)^2 + FPR^2). Geometric heuristic.
  5. PR-based selection
    • If classes are imbalanced and precision matters, use PR curve to find threshold giving required precision or recall.
  6. Cost-based threshold (see decision-theory above)
    • Use when you can quantify costs.
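Strategies 2–4 all amount to sweeping candidate thresholds and scoring each one. As one concrete sketch, Youden's J can be read straight off scikit-learn's ROC output; the toy labels and scores below are invented for illustration:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy validation data: positives tend to score higher (synthetic, for illustration).
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
p = np.array([0.1, 0.2, 0.35, 0.6, 0.4, 0.65, 0.8, 0.9])

fpr, tpr, thresholds = roc_curve(y, p)
j = tpr - fpr                        # Youden's J = sensitivity + specificity - 1
best_t = thresholds[np.argmax(j)]    # threshold that maximizes J
```

The same sweep-and-argmax pattern works for any metric; only the scoring line changes.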

When to use ROC vs PR for picking thresholds

  • ROC-based rules (Youden, min-distance) assume roughly equal class importance and are insensitive to class imbalance.
  • PR-based selection is better when the positive class is rare and you care about precision/recall trade-offs. A high ROC AUC can hide bad precision at the recall levels you actually care about.

Think: ROC tells you the ability to rank positives above negatives; PR tells you how many of the things you call positive are actually positive. Use PR when false positives are painful or positives are rare.
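A minimal sketch of PR-based selection: find the lowest threshold that still meets a required precision, which keeps recall as high as possible. The labels, scores, and the 0.9 target below are synthetic and arbitrary:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Synthetic validation data for illustration.
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])
p = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9])

precision, recall, thresholds = precision_recall_curve(y, p)
target = 0.9
# precision/recall have one more entry than thresholds (the final point is recall = 0)
ok = np.where(precision[:-1] >= target)[0]
best_t = thresholds[ok[0]]  # lowest qualifying threshold -> highest recall at target precision
```

Note that precision is not guaranteed to be monotone in the threshold, which is why we scan for the first index meeting the target rather than bisecting.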


Calibration matters — don’t threshold on lies

If your model is miscalibrated, a predicted probability of 0.6 might not mean that 60% of such cases are actually positive. Thresholds that rely on the absolute value of p (like cost-based thresholds) require calibration.

Common calibration fixes:

  • Platt scaling (sigmoid / parametric) — fits a logistic to model scores on validation data
  • Isotonic regression — non-parametric, more flexible but needs more data

Always calibrate on a held-out set, then pick thresholds on another held-out set (or use nested CV). Otherwise you’ll overfit the threshold to noise.
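A hedged sketch of both fixes on synthetic, deliberately overconfident scores: Platt scaling fit on the model's log-odds, isotonic regression fit on the probabilities. In practice, fit these on a held-out set as noted above; here everything runs on one sample purely to show the API:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
z = rng.normal(size=2000)                          # latent log-odds
y = (rng.random(2000) < 1 / (1 + np.exp(-z))).astype(int)
raw = 1 / (1 + np.exp(-2 * z))                     # overconfident model probabilities

# Platt scaling: 1-D logistic regression on the model's decision score (log-odds)
logit = np.log(raw / (1 - raw))
platt = LogisticRegression().fit(logit.reshape(-1, 1), y)
cal_platt = platt.predict_proba(logit.reshape(-1, 1))[:, 1]

# Isotonic regression: non-parametric monotone remapping of the probabilities
iso = IsotonicRegression(out_of_bounds="clip").fit(raw, y)
cal_iso = iso.predict(raw)
```

Both calibrated versions should have a lower Brier score (mean squared error against the labels) than the raw overconfident scores.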


Algorithmic recipe — pick a threshold by maximizing F1 (runnable Python)

import numpy as np
from sklearn.metrics import f1_score

# inputs: val_probs (shape (N,)), val_labels (0/1, shape (N,))
thresholds = np.linspace(0, 1, 1000)
best_t, best_f1 = 0.0, -np.inf
for t in thresholds:
    preds = (val_probs >= t).astype(int)
    f1 = f1_score(val_labels, preds, zero_division=0)
    if f1 > best_f1:
        best_f1, best_t = f1, t
# use best_t on test/production

Notes: repeat with cross-validation to estimate variability and avoid overfitting to a single validation split.


Advanced/robust approaches

  • Cross-validated thresholding: pick a threshold in each fold, then average them (or take the most frequent)
  • Cost curves & decision curves: plot net benefit over thresholds to pick based on utility rather than metrics
  • Per-group thresholds: different thresholds for subpopulations when base rates differ (be careful with fairness implications)
  • Reject option (abstain): allow the model to say "I don't know" when p is near the threshold; route to human review
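The reject option in particular is easy to prototype. A sketch with a hypothetical helper (the function name and the 0.1 margin are illustrative choices, not from the text):

```python
import numpy as np

def decide_with_reject(p, threshold=0.5, margin=0.1):
    """Return 1 (positive), 0 (negative), or -1 (abstain / route to human review)
    when the probability lies within `margin` of the threshold."""
    p = np.asarray(p, dtype=float)
    out = np.where(p >= threshold, 1, 0)
    return np.where(np.abs(p - threshold) < margin, -1, out)

print(decide_with_reject([0.05, 0.45, 0.55, 0.9]))  # [ 0 -1 -1  1]
```

The width of the abstention band is itself a cost trade-off: wider bands mean fewer automated errors but more human workload.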

Table — quick compare of selection strategies

| Strategy | When to use | Pros | Cons |
|---|---|---|---|
| Default 0.5 | Quick prototypes | Simple | Often wrong with imbalance/costs |
| Max F1 / MCC | Metric-driven goals | Directly optimizes your metric | Overfits without a holdout; metric-dependent |
| Youden's J | Symmetric errors | Simple, ROC-based | Ignores prevalence |
| Min-distance (ROC) | General trade-off | Intuitive geometry | Not cost-aware |
| PR-based | Rare positives | Focuses on precision/recall | Can be noisy with few positives |
| Cost-based (Bayes) | Known costs | Decision-theoretically optimal | Needs quantifiable costs and calibration |

A practical checklist before you deploy

  • Is the probability well-calibrated? If not, calibrate.
  • Do you know the relative costs of FP and FN? If yes, use cost-based thresholding.
  • If you must optimize a metric (F1, MCC), pick threshold on a held-out set or via CV.
  • If positives are rare, prefer PR-guided thresholds over naive ROC heuristics.
  • Compute confidence intervals for performance at the chosen threshold.
  • Consider a reject option if misclassifications are costly.
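For the confidence-interval item, a percentile bootstrap over the validation set is a common approach. A sketch (function name, defaults, and the synthetic data are illustrative):

```python
import numpy as np
from sklearn.metrics import f1_score

def bootstrap_ci(y, p, t, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for F1 at a fixed threshold t."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    preds = (np.asarray(p) >= t).astype(int)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))  # resample rows with replacement
        stats.append(f1_score(y[idx], preds[idx], zero_division=0))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Synthetic example: noisy scores correlated with the labels.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 200)
p = np.clip(y * 0.3 + rng.random(200) * 0.7, 0, 1)
lo, hi = bootstrap_ci(y, p, t=0.5)
```

A wide interval is a warning that the chosen threshold's apparent performance may not survive contact with production data.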

Parting shot (a tiny rant and a tiny wisdom)

Choosing a threshold is the most human part of modeling: it requires values, priorities, and trade-offs. Your model gives you probabilities; your organization gives you consequences. Bring both to the table.

"Calibration gets you honesty. Thresholding gets you judgment. You need both."

Key takeaways:

  • Use decision-theory (costs) when possible — it’s principled.
  • Use PR-based thresholds for rare-event problems.
  • Calibrate first, then threshold.
  • Validate thresholds with held-out or cross-validated data to avoid overfitting.

Now go pick a threshold like you mean it — and if anyone says “just use 0.5,” ask them about their utility matrix and whether they enjoy false alarms.
