AI Terminology and Mental Models
Build a shared vocabulary and simple mental models for AI discussions.
Parameters and Hyperparameters — The ML Kitchen (No Apron Required)
You already know what a model is and how data feeds it (shout-out to Models and algorithms and Understanding Data). Now it’s time to pick up the cooking metaphor and stop burning the soufflé.
Quick orientation (no déjà vu)
You’ve seen that a model turns data into predictions. Parameters and hyperparameters are the two families of values that determine how that transformation happens. One family is learned from the data during training; the other is set by you (or by clever search algorithms) before/during training.
Why this matters: choosing and tuning hyperparameters affects whether your model generalizes or just memorizes. Parameters are where the model stores what it learned from your data.
The difference, in plain (and slightly theatrical) language
Parameters: The learned bits inside the model. These are numeric values updated during training to minimize loss — e.g., weights and biases in a neural network, coefficients in linear regression, or thresholds in a decision tree.
Hyperparameters: The knobs you set before training starts (or that a meta-algorithm tweaks). They control the training process and model complexity — e.g., learning rate, number of layers, regularization strength, batch size, epochs, tree depth.
Think of parameters as the filling of a pie that you adjust while tasting; hyperparameters are the oven temperature and baking time you choose before putting the pie in.
Mental models that actually help (use these instead of memorizing definitions)
Knobs vs settings
- Parameters = the fine-grained dial position the model finds by tasting the data.
- Hyperparameters = the set of rules you pick for the tasting process.
Student vs curriculum
- Parameters = the knowledge the student accumulates (weights in their brain).
- Hyperparameters = the curriculum, exam schedule, and study pace.
Search vs target
- Hyperparameters define the search strategy; parameters are the discovered target.
These help you ask the right questions: am I changing the model itself or the way it learns?
Concrete examples (so it stops being abstract)
| Term | Machine-learning example | Where it lives | How it's set |
|---|---|---|---|
| Parameter | Neural-network weight | Inside the model after training | Learned from data during optimization |
| Parameter | Linear model coefficient | Model attribute (e.g., `coef_`) | Learned by fitting |
| Hyperparameter | Learning rate | Training algorithm config | Chosen by you or a search algorithm |
| Hyperparameter | Number of trees in a random forest | Model design choice | Chosen by you or a search algorithm |
Code peek (sketch)
```python
# scikit-learn logistic regression
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(C=0.5, max_iter=200)  # C is a hyperparameter
clf.fit(X_train, y_train)                      # X_train, y_train: your data

print(clf.coef_)       # parameters learned from data
print(clf.intercept_)
```
There — hyperparameter C controls regularization strength; coef_ and intercept_ are parameters learned by fit().
Why hyperparameters matter: the practical consequences
- Too much model complexity (wrong hyperparameters) → overfitting (high variance).
- Too little capacity or bad training settings → underfitting (high bias) or failure to converge.
- Learning rate too high → loss explodes. Too low → training is glacial.
- Batch size affects noise in optimization and speed.
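The learning-rate point above is easy to see in a toy gradient descent. The quadratic objective and the three rates below are made up purely for illustration:

```python
# Minimal 1-D gradient descent on f(w) = w^2 to show learning-rate effects.
def gradient_descent(lr, steps=20, w0=5.0):
    w = w0
    for _ in range(steps):
        grad = 2 * w   # derivative of w^2
        w = w - lr * grad
    return w

print(gradient_descent(0.1))    # converges toward the minimum at 0
print(gradient_descent(0.001))  # barely moves: glacial training
print(gradient_descent(1.1))    # magnitude grows each step: loss explodes
```

Same model, same data, same optimizer; only the hyperparameter changed, and the outcomes range from convergence to divergence.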
And important: all of this interacts with data quality (covered in Understanding Data). No hyperparameter wizardry will fix garbage data.
How to tune hyperparameters (a practical checklist)
- Pick a metric: accuracy, F1, AUC, MSE — whatever maps to your goal.
- Decide search strategy:
- Grid search — exhaustively try combos (good for small spaces).
- Random search — surprisingly efficient for many dims.
- Bayesian optimization / Hyperband / Optuna — smarter, budget-aware searches.
- Use cross-validation to estimate generalization.
- Limit compute with budgets (time, GPUs).
- Track results (logs, experiment tracking tools).
- Watch learning curves to detect under/overfitting.
Quick rule: start simple (fewer hyperparameters), then increase complexity.
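The checklist above can be sketched as a minimal random search with a fixed trial budget. The `validation_error` function here is a made-up stand-in for a real train-then-validate cycle:

```python
import random

random.seed(0)

# Toy stand-in for "train a model, measure validation error" as a function
# of two hyperparameters. The quadratic bowl is purely illustrative.
def validation_error(lr, reg):
    return (lr - 0.1) ** 2 + (reg - 0.01) ** 2

best = None
for _ in range(50):                    # fixed budget: 50 trials
    lr = 10 ** random.uniform(-4, 0)   # sample on a log scale
    reg = 10 ** random.uniform(-4, 0)
    err = validation_error(lr, reg)
    if best is None or err < best[0]:
        best = (err, lr, reg)

print(f"best error={best[0]:.4f} lr={best[1]:.4f} reg={best[2]:.4f}")
```

Note the log-scale sampling: hyperparameters like learning rate and regularization strength usually vary over orders of magnitude, so uniform sampling in log space covers the range far better.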
Tuning techniques — pros and cons (short)
- Grid search: transparent but expensive and inefficient in high-dim spaces.
- Random search: often better than grid when only a few hyperparameters matter.
- Bayesian optimization: sample-efficient but more complex to set up.
- Early stopping: treats number of epochs as tunable and saves compute.
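Early stopping is simple enough to sketch directly. The validation-loss sequence below is invented just to show the mechanism:

```python
# Early stopping treats "number of epochs" as decided during training:
# stop once validation loss hasn't improved for `patience` epochs.
def early_stop(val_losses, patience=3):
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best

losses = [1.0, 0.7, 0.5, 0.45, 0.46, 0.47, 0.48, 0.49]
print(early_stop(losses))  # -> (3, 0.45): best epoch, best validation loss
```

Patience itself is, of course, yet another hyperparameter.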
Special topics you should care about
- Regularization (lambda, dropout rate): hyperparameters that combat overfitting by controlling complexity.
- Transfer learning / fine-tuning: pretrained parameters can be frozen or partially updated; hyperparameters for fine-tuning (learning rate schedule, which layers to unfreeze) matter a lot.
- AutoML: automates hyperparameter selection — but understand the knobs to interpret results.
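To see a regularization hyperparameter directly controlling learned parameters, here is a closed-form 1-D ridge regression sketch, w = Σxy / (Σx² + λ). The data and λ values are made up:

```python
# As lambda grows, the learned weight shrinks toward zero:
# the hyperparameter (lambda) is reshaping the parameter (w).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]  # roughly y = 2x

def ridge_weight(lam):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

for lam in [0.0, 1.0, 10.0, 100.0]:
    print(f"lambda={lam:6.1f} -> w={ridge_weight(lam):.3f}")
```

Heavier regularization trades a worse fit on the training data for a simpler model, which is exactly the overfitting lever described above.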
Common confusions (and clarifications)
"If I tune hyperparameters, aren’t I cheating?"
- No — tuning is part of model selection. Just use proper cross-validation and holdout data to avoid overfitting the validation set.
"Are parameters ever hyperparameters?"
- They’re conceptually distinct, but the line blurs: in meta-learning, an outer optimization loop can learn values that were traditionally hyperparameters.
"Do hyperparameters learn from data?"
- Only if you use an outer loop (Bayesian optimization, grid search) that treats validation performance as feedback. But they’re not updated by gradient descent inside the model.
Actionable next steps (do these after reading)
- Inspect model parameters in a simple model (e.g., linear regression) and print coef_. See how they change with different data slices.
- Run a small random search on 2–3 hyperparameters (learning rate, batch size, regularization) and plot validation vs. training curves.
- Try transfer learning on a small dataset: freeze most layers, tune the learning rate for the last layers.
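The first of these steps can be sketched without scikit-learn: fit the same one-parameter model on two data slices and compare the learned slope. The slices below are synthetic and purely illustrative:

```python
# Ordinary least squares through the origin: slope = Σxy / Σx².
# The learned parameter changes with the data, unlike a hyperparameter.
def ols_slope(xs, ys):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx

slice_a = ([1, 2, 3], [2.0, 4.1, 5.9])   # roughly y = 2x
slice_b = ([1, 2, 3], [3.1, 5.9, 9.2])   # roughly y = 3x

print(ols_slope(*slice_a))
print(ols_slope(*slice_b))
```

Different slices, different learned parameter: that sensitivity to data is the defining trait of a parameter.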
Closing mic drop
Parameters are the story your model learned from the data; hyperparameters are the reading plan and study conditions you set so the learning actually happens. Get the settings wrong and your model either memorizes everything (terrible at new stuff) or learns nothing at all. Get the settings right and your model sings.
Tweak the knobs thoughtfully, validate like a scientist, and remember: good data still beats clever hyperparameter tuning.
Key takeaways
- Parameters = learned from data. Hyperparameters = set by you or by a higher-level search.
- Tune hyperparameters with a method and a budget; always validate.
- Use mental models (knobs vs settings, student vs curriculum) to reason quickly when things go wrong.
Run an experiment, not a ritual. Your keyboard is a lab instrument; use it.