LLM Behavior and Capabilities
Understand alignment, sensitivity to phrasing, non-determinism, and other behavioral properties that your prompts must account for.
Pretraining and Fine-Tuning — How LLMs Learn to Be Useful (and Occasionally Dramatic)
"Pretraining gives the model its personality. Fine-tuning teaches it manners."
You already know how modern LLMs spit out tokens by juggling probabilities (see Foundations > Useful Mental Models of LLMs, Position 15). You also (hopefully) carry an evaluation mindset from Day One (Position 14), and you’re aware that models sit behind multiple safety layers and moderation mechanisms (Position 13). Good. Now let’s connect those dots: how does a raw model become the obedient, creative, or sometimes baffling conversationalist you prompt today? Enter pretraining and fine-tuning — the apprenticeship and finishing school of LLMs.
TL;DR (for the scanners)
- Pretraining builds the model's broad prior — a probability map over language — by predicting tokens across massive, diverse corpora.
- Fine-tuning sculpts that prior toward a narrower set of behaviors (helpfulness, factuality, safety) using curated data and/or human feedback.
- The same underlying math (log-likelihood, cross-entropy) governs both, but the data, objective tweaks, and training regimes create very different outcomes.
1) Pretraining: The Bakery of Language Habits
Think of pretraining as feeding the model a monstrous buffet of text — books, webpages, code, tweets, forum threads. The job? Learn the statistical patterns of language so it can predict the next token.
- Objective (simplified): minimize cross-entropy / maximize likelihood of the training tokens.
Loss = - sum_t log P(token_t | context_{<t}; theta)
- What it creates:
- Grammatical fluency — the model learns how words glue together.
- World knowledge — facts that occur often in the training data.
- Commonsense priors — default assumptions the model carries into every prompt.
But crucially: pretraining shapes behavior only broadly and shallowly. The model doesn't learn to "follow this instruction" unless instruction-following text appears in its training data.
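The objective above can be sketched numerically; the per-step probabilities here are toy values, not from a real model:

```python
import math

def next_token_loss(probs_per_step):
    """Cross-entropy over a sequence: -sum_t log P(token_t | context_<t).

    `probs_per_step` holds the probability the model assigned to the
    actual next token at each position (toy numbers, not a real model).
    """
    return -sum(math.log(p) for p in probs_per_step)

# A confident model (high probabilities on the true tokens) pays a lower
# loss than an uncertain one; that gap is what pretraining minimizes.
confident = next_token_loss([0.9, 0.8, 0.95])
uncertain = next_token_loss([0.2, 0.1, 0.3])
print(confident < uncertain)  # True
```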
Emergent Abilities & In-Context Learning
Some impressive behaviors (reasoning, code synthesis, few-shot learning) emerge when models are large enough and trained on rich data. These are not explicit features but statistical generalizations. Remember: probability is the humble god of these models — more data, more parameters, more surprising emergent behavior.
2) Fine-Tuning: From Wild Linguist to Specialist
Fine-tuning takes the pretrained model and nudges its parameters using smaller, targeted datasets or feedback signals. There are flavors:
- Supervised Fine-Tuning (SFT): training on input-output pairs (e.g., question → good answer).
- Instruction Tuning: SFT on many instruction-response pairs so the model understands "do this when told." (This is why instruction-tuned models follow prompts better.)
- RLHF (Reinforcement Learning from Human Feedback): humans rank outputs, a reward model is trained, then policy optimization (e.g., PPO) nudges the model toward higher human-preference scores.
- Parameter-Efficient Methods: adapters, LoRA, prompt tuning — tweak fewer weights to adapt models without full re-training.
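The parameter-efficient flavor can be made concrete. A minimal LoRA-style sketch, using plain-Python matrices for illustration (a real implementation would use a tensor library and learn A and B by gradient descent):

```python
def matmul(A, B):
    # naive matrix multiply, fine for small illustrative matrices
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_effective_weight(W, A, B, alpha, r):
    """LoRA keeps the pretrained weight W frozen and learns a low-rank
    update: W_eff = W + (alpha / r) * B @ A. Only A (r x d_in) and
    B (d_out x r) are trained, so far fewer parameters change."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# 2x2 frozen weight plus a rank-1 update: 4 trainable numbers instead of 4
# frozen ones being overwritten (the savings grow with matrix size).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]          # r x d_in = 1 x 2
B = [[0.5], [0.25]]       # d_out x r = 2 x 1
W_eff = lora_effective_weight(W, A, B, alpha=1.0, r=1)
print(W_eff)  # [[1.5, 1.0], [0.25, 1.5]]
```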
Why fine-tune?
- Align the model with desired behaviors (helpful, honest, harmless).
- Specialize for a domain (legal, medical, customer support).
- Reduce harmful or hallucinated responses — but not perfectly.
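The RLHF flavor described above starts by training a reward model on human rankings. A common choice is the Bradley-Terry preference loss, sketched here with toy reward scores:

```python
import math

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry preference loss used to train reward models in many
    RLHF pipelines: -log sigmoid(r_chosen - r_rejected). It is low when
    the reward model scores the human-preferred answer higher."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

good_ranking = preference_loss(2.0, 0.5)  # preferred answer scored higher
bad_ranking = preference_loss(0.5, 2.0)   # preferred answer scored lower
print(good_ranking < bad_ranking)  # True
```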
3) Table: Pretraining vs Fine-Tuning (Quick Comparison)
| Aspect | Pretraining | Fine-tuning |
|---|---|---|
| Data size | Massive (web-scale) | Smaller, curated |
| Objective | General next-token likelihood | Task- or preference-specific loss |
| Outcome | Broad priors; emergent ability | Targeted behavior; alignment |
| Risk | Memorization / data contamination | Overfitting / catastrophic forgetting |
| Role in prompt engineering | Sets base probabilities | Changes model's responses to prompts |
4) Practical Consequences for Prompt Engineers (Yes, This Is Your Cheat Sheet)
- Know the prior. Pretraining creates the default voice and assumptions. If the model defaults to a style, it's coming from those priors.
- Fine-tuning changes the landscape. An instruction-tuned model will obey prompts more reliably than a vanilla pretrained model — so fewer hacks required.
- RLHF = preference-shaped behavior. Models trained with RLHF optimize for human preference signals, which can make them conservative, verbose, or avoidant when unsure (sometimes to the point of being evasive).
- Prompting vs Fine-tuning tradeoff: If you need consistent behavior across many inputs, fine-tuning (or adapters) may be better than endlessly crafting complex prompts.
- Beware of distribution shifts. A model fine-tuned on sanitized customer support interactions may struggle with edgy or unfamiliar queries.
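If the tradeoff above lands on fine-tuning, the training data is usually just instruction-response pairs. A minimal sketch of the common JSONL on-disk format; the field names are illustrative, not any specific provider's required schema:

```python
import json

# Hypothetical SFT examples. Field names ("instruction", "response")
# are illustrative; check your training framework's expected schema.
examples = [
    {"instruction": "Summarize: The cat sat on the mat.",
     "response": "A cat sat on a mat."},
    {"instruction": "Translate to French: Hello.",
     "response": "Bonjour."},
]

# One JSON object per line (JSONL) is a common storage convention.
jsonl = "\n".join(json.dumps(ex) for ex in examples)
print(jsonl.count("\n") + 1)  # one line per example: 2
```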
5) Safety, Evaluation, and the Fine-Tuning Tightrope
You already learned to evaluate from Day One — keep that lens. Fine-tuning can improve safety, but it can also introduce new risks:
- Overfitting to annotator norms: If raters have biases, the model inherits them.
- Catastrophic forgetting: Aggressive fine-tuning can erase useful pretraining knowledge.
- Reward hacking in RLHF: Models can optimize for the reward model, exploiting its blind spots.
Rule of thumb: Always evaluate with the same rigorous mindset you used earlier — adversarial tests, distribution-shift checks, and safety benchmarks.
6) Quick How-To: When to Fine-Tune vs Prompt-Engineer
- If you need a one-off behavior or rare tweak: try prompt engineering first.
- If you need consistent behavior across millions of queries: consider fine-tuning or adapters.
- If data is private or small: use parameter-efficient tuning (LoRA, adapters) to avoid leaking or catastrophic forgetting.
- If safety and alignment are critical: combine instruction tuning + RLHF + robust evaluation pipelines.
Closing — The Big Picture (and a Tiny Pep Talk)
Pretraining gives an LLM its habits; fine-tuning constrains those habits toward specific goals. As a prompt engineer, you live at the intersection: you shape the model's inputs to get desirable outputs, while remembering that those outputs are ultimately shaped by the model's ingrained priors and whatever tuning was applied on top of them.
Key takeaways:
- Pretraining = broad priors. Fine-tuning = targeted behavior. Both use the same math but differ in data and intent.
- Always evaluate (you already know why from Day One) — fine-tuning can produce subtle failures.
- Combine tools smartly: use prompting, instruction tuning, RLHF, and adapters as complementary levers.
Want a mini-exercise? Take a base model and an instruction-tuned sibling. Prompt both with a tricky, ambiguous request. Compare answers, then try a short supervised fine-tune on 100 examples. Watch the personality shift. Document what changed and why. That's practicing the art.