Model Tuning, Pipelines, and Experiment Tracking
Automate workflows, search hyperparameters, and track experiments reproducibly.
Hyperparameter Spaces and Priors — The Chaotic Map You Learn to Love
"Pick your priors like you pick your coffee strength: too weak and nothing wakes up; too strong and you'll regret it halfway through the semester." — Your future hyperparameter-tuned self
You already know about Early Stopping, Warm Starts, Successive Halving, and Hyperband — those are our budget-savvy friends that help us stop training terrible models early and reuse work when sensible. You also just trimmed down your feature mess with dimensionality reduction and feature selection, reducing redundancy and improving signal. Great. Now we need to decide where the knobs live, what values they can take, and what beliefs (priors) we bring to the table when searching them. This is the art and science of hyperparameter spaces and priors.
Why hyperparameter spaces and priors matter
- If you search a space that is too wide, you waste compute exploring absurd regions.
- If you search a space that is too narrow, you miss the glory zone of performance.
- If you pick the wrong distribution (prior), your optimizer (random, Bayesian, or bandit-style) may chase the wrong ghost.
Priors are not metaphysical; they are the probability distributions or sampling strategies we give the optimizer. Whether you're doing plain random search, Bayesian optimization, or launching a Hyperband run, the sampling policy is your prior belief about where good hyperparameters live.
Types of hyperparameters & canonical priors
Continuous (real) — e.g., learning rate, L2 penalty
- Use log-uniform for scale parameters (learning rates, regularization strengths) because multiplicative changes matter more than additive ones.
- Use normal or uniform if the parameter behaves linearly.
Integer — e.g., number of trees, depth, n_components (PCA)
- Sample integer values directly; when the range spans orders of magnitude (e.g., n_estimators from 10 to 5000), sample on a log scale and round to the nearest integer.
Categorical — e.g., optimizer = {sgd, adam}, activation = {relu, tanh}
- Use discrete uniform or informed weights if you have prior preference.
Conditional — e.g., if model = XGBoost then tune max_depth, otherwise tune hidden_layers
- Express conditions explicitly in your search space; many optimizers support nested/conditional spaces.
Quick reference table — common hyperparams and recommended priors
| Hyperparameter | Typical domain | Recommended prior/transformation | Why (intuition) |
|---|---|---|---|
| Learning rate (lr) | (1e-6, 1) | Log-uniform | Multiplicative effects; 1e-3 vs 1e-4 matters more than 0.01 vs 0.02 |
| L2 (alpha) | (1e-8, 10) | Log-uniform | Regularization strength spans orders of magnitude |
| Number of trees (n_estimators) | [10, 5000] | Integer linear or log-scale | Often more trees help but cost scales; log-sampling finds small/large quickly |
| Max depth | [2, 50] | Integer uniform | Depth is discrete and often small values matter most |
| Dropout rate | [0, 0.9] | Beta(2,5) or clipped normal | Probability in [0,1]; Beta allows skew towards small values |
| PCA components (n_components) | [1, min(n_features, 300)] | Integer linear | Often linear search or informed by variance explained |
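As a quick sanity check on the dropout row, here is a sketch of drawing from the Beta(2, 5) prior with NumPy (an illustrative choice; any sampler with a Beta distribution works):

```python
import numpy as np

rng = np.random.default_rng(0)

# Beta(2, 5) puts most of its mass on small values, matching the intuition
# that modest dropout rates are usually the better starting point.
samples = rng.beta(2.0, 5.0, size=10_000)

# Clip to the table's [0, 0.9] domain (Beta already lives in [0, 1],
# so this only trims the rare draw above 0.9).
samples = np.clip(samples, 0.0, 0.9)

print(samples.mean())           # ≈ 0.286, the Beta(2, 5) mean 2 / (2 + 5)
print((samples < 0.3).mean())   # well over half the draws fall below 0.3
```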
Priors in practice: examples & pitfalls
1) The log-uniform salvation
If you sample a learning rate from Uniform(0.0001, 0.1), roughly 90% of draws land above 0.01, even though the tiny values are often where the good models live. Use LogUniform(1e-6, 1e-1) instead: sample u ~ Uniform(log(a), log(b)) and set x = exp(u).
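In code, the trick is two lines — sample in log space, then exponentiate back:

```python
import math
import random

def log_uniform(a: float, b: float, rng: random.Random) -> float:
    """Sample x in [a, b] so that log(x) is uniform: equal probability
    per order of magnitude rather than per additive interval."""
    u = rng.uniform(math.log(a), math.log(b))
    return math.exp(u)

rng = random.Random(42)
draws = [log_uniform(1e-6, 1e-1, rng) for _ in range(100_000)]

# Each of the five decades [1e-6, 1e-5), ..., [1e-2, 1e-1) gets ~20% of the
# draws; under Uniform(1e-6, 1e-1), almost everything would land above 1e-5.
small = sum(d < 1e-5 for d in draws) / len(draws)
print(small)  # ≈ 0.2 (one decade out of five)
```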
2) Beware of naive integer encoding
Don't encode categorical choices as integers (0, 1, 2) and feed them to samplers or surrogate models that assume an ordering. Doing so implies an ordinal relationship that doesn't exist — 'adam' is not one step larger than 'sgd'.
3) Conditional spaces and wasted compute
If a hyperparameter only matters for one model or one stage of a pipeline, make it conditional. Running expensive evaluations for irrelevant parameters is a crime against the compute budget.
4) Your prior should reflect scale knowledge
If feature scaling or dimensionality reduction changed the signal (e.g., you reduced to 10 PCA components), that affects sensible ranges for parameters like max_features, hidden_layer sizes, or regularization — shrink ranges accordingly.
How priors interact with tuning methods you already know
Random Search: No model of the objective, so the prior is literally the sampling distribution. If you use log-uniform, random search will actually explore multiplicative scales properly.
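To make this concrete, here is a minimal pure-Python random-search sketch, with a hypothetical one-dimensional "validation loss" that is best near lr = 1e-3; the sampler function literally is the prior:

```python
import math
import random

rng = random.Random(0)

def objective(lr: float) -> float:
    # Hypothetical validation loss, minimized at lr = 1e-3.
    return (math.log10(lr) + 3.0) ** 2

def sample_uniform() -> float:
    # Additive prior: most mass lands in [1e-2, 1e-1].
    return rng.uniform(1e-6, 1e-1)

def sample_log_uniform() -> float:
    # Multiplicative prior: each decade gets equal mass.
    return math.exp(rng.uniform(math.log(1e-6), math.log(1e-1)))

def random_search(sampler, n_trials: int = 50) -> float:
    # Random search has no model of the objective; the sampler IS the prior.
    return min(objective(sampler()) for _ in range(n_trials))

print("uniform prior, best loss:    ", random_search(sample_uniform))
print("log-uniform prior, best loss:", random_search(sample_log_uniform))
```

With the log-uniform prior, trials spread evenly across the five decades, so some land near 1e-3; the additive prior wastes roughly 90% of its trials above 1e-2.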
Bayesian Optimization (BO): BO builds a surrogate model. The prior (initial distribution for random trials + any prior mean for the surrogate) influences where BO starts exploring. Use a few random draws that reflect your beliefs before BO goes greedy.
Successive Halving / Hyperband: These need brackets and resource allocation. If your prior thinks small models are good (e.g., low n_estimators / small hidden sizes), Hyperband will often keep those early and scale up promising ones. If your prior always samples huge models, you blow budget fast.
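The budget arithmetic behind those brackets is easy to sketch. This follows the bracket formulas from the Hyperband paper, with max resource R and reduction factor eta (R = 81 epochs is an illustrative choice):

```python
import math

def hyperband_brackets(R: int, eta: int = 3):
    """Enumerate Hyperband's brackets: each starts n configurations with
    budget r each, then repeatedly keeps the best 1/eta of configs while
    multiplying the per-config budget by eta."""
    s_max = int(math.log(R, eta) + 1e-9)   # guard against float rounding
    B = (s_max + 1) * R                    # budget assigned to each bracket
    brackets = []
    for s in range(s_max, -1, -1):
        n = math.ceil((B / R) * (eta ** s) / (s + 1))
        r = R // (eta ** s)                # assumes R is a power of eta
        brackets.append((s, n, r))
    return brackets

# R = 81 epochs, eta = 3: the aggressive bracket starts 81 configs at 1 epoch
# each; the conservative bracket runs just 5 configs at the full 81 epochs.
for s, n, r in hyperband_brackets(81):
    print(f"bracket s={s}: start {n} configs at {r} epochs each")
```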
Warm Starts: If your model supports warm-starting (e.g., incremental estimators or continuing boosting rounds), structure the search so that parameter changes that do not require retraining from scratch can be explored more cheaply. For example, increase n_estimators incrementally and warm-start to try different learning rates — but be careful: some hyperparams (like max_depth) cannot be trivially warm-started.
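Here is a sketch of that incremental pattern using scikit-learn's warm_start flag (shown with GradientBoostingRegressor on synthetic data; the same flag exists on several other estimators):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# With warm_start=True, raising n_estimators and refitting only adds the new
# trees instead of retraining the whole ensemble from scratch.
model = GradientBoostingRegressor(n_estimators=50, warm_start=True, random_state=0)
model.fit(X, y)
scores = {50: model.score(X, y)}

for n in (100, 200):
    model.set_params(n_estimators=n)
    model.fit(X, y)               # fits only the trees beyond the current count
    scores[n] = model.score(X, y)

# Caution: changing a parameter like max_depth this way would NOT reuse the
# earlier trees sensibly; warm-starting only suits params that extend the model.
print(scores)
```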
Practical recipe: design a sensible hyperparameter space (step-by-step)
- Start with domain knowledge: what ranges made sense during manual dev? Use that as center.
- Transform scale parameters into log-space (lr, alpha). Use log-uniform sampling.
- Use informative priors if you have evidence (e.g., prior runs, literature). Otherwise use weak but sensible priors (e.g., Beta for probabilities favoring small dropout).
- Express conditionals explicitly (model-specific params). Keep the global space compact.
- Run a short random-search pilot (20–50 trials). Inspect results with experiment tracking; update priors.
- Move to BO or bandit methods with the updated priors and leverage warm-starts if valid.
Example: a compact Optuna search space (Python)

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    model = trial.suggest_categorical('model', ['rf', 'xgb', 'mlp'])
    if model == 'rf':
        n_estimators = trial.suggest_int('rf__n_estimators', 50, 2000, log=True)
        max_depth = trial.suggest_int('rf__max_depth', 3, 40)
        max_features = trial.suggest_float('rf__max_features', 0.1, 1.0)
    elif model == 'xgb':
        lr = trial.suggest_float('xgb__lr', 1e-5, 1e-1, log=True)
        max_depth = trial.suggest_int('xgb__max_depth', 3, 12)
    else:  # 'mlp'
        lr = trial.suggest_float('mlp__lr', 1e-6, 1e-2, log=True)
        hidden = trial.suggest_int('mlp__hidden_units', 16, 1024, log=True)
        dropout = trial.suggest_float('mlp__dropout', 0.0, 0.9)
    ...  # build the chosen model, train, and return a validation score
```

Note: in Optuna, suggest_int and suggest_float both accept log=True (the older suggest_loguniform and suggest_uniform are deprecated). There is no built-in suggest_beta; to impose a Beta prior on dropout, draw the value yourself (e.g., with NumPy) or approximate the skew with suggest_float over a narrowed range.
Experiment tracking & reproducibility: record your priors
If your experiment tracking stores only hyperparameter values but not the prior/space definition, you won't be able to reproduce the search behavior later. Log:
- The entire search space definition (bounds, transforms)
- Seed(s) for pseudo-random samplers
- Sampling strategy (random, BO, Hyperband) and settings (brackets, eta)
This ties back into the earlier module on experiment tracking: priors are part of your experiment design and must be versioned.
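A minimal sketch of such a record as a plain JSON document (the field names here are illustrative, not a standard schema):

```python
import json

# Illustrative record: log the space and sampler settings, not just the winner.
experiment_record = {
    "search_space": {
        "lr": {"type": "float", "low": 1e-5, "high": 1e-1, "transform": "log"},
        "max_depth": {"type": "int", "low": 3, "high": 12},
        "optimizer": {"type": "categorical", "choices": ["sgd", "adam"]},
    },
    "sampler": {"strategy": "hyperband", "eta": 3, "max_resource": 81, "seed": 42},
    "best_params": {"lr": 3.2e-4, "max_depth": 6, "optimizer": "adam"},  # hypothetical result
}

# In practice this would be written next to the run's metrics; round-tripping
# through JSON shows the full search design is recoverable, not just best_params.
record_json = json.dumps(experiment_record, indent=2)
restored = json.loads(record_json)
print(restored["sampler"])
```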
Final takeaways — TL;DR for the lazy (and brilliant)
- Use log-scale for multiplicative parameters (learning rates, regularization). Treat probabilities with Beta, counts with integer ranges.
- Make your hyperparameter space conditional and compact. No free-floating meaningless knobs.
- Run a small pilot to learn a prior; then refine and escalate to BO/Hyperband. Record everything.
- Remember: pruning strategies (Successive Halving, Hyperband) and warm starts can massively change which priors are efficient. Think about budget when designing priors.
Parting wisdom: designing hyperparameter spaces is half science, half game design. If your space is a chaotic monster, even the best optimizer will learn to be afraid of you. Be kind, be informed, and log everything — your future self will thank you.