Examples: Zero-, One-, and Few-Shot
Use demonstrations to steer behavior, balancing exemplar quality, order effects, and when to skip examples entirely.
Order and Primacy Effects — Why the first example is basically the class president
"Humans (and apparently LLMs) have favorites, and it's almost always the one they met first." — Your mildly obsessed prompt-engineering TA
You're already armed with Selecting Quality Exemplars (Position 4) and Counterexamples for Boundaries (Position 5), and you learned how to Feed the model the right facts at the right time from Supplying Context and Grounding. Great. Now we're talking about the social dynamics of examples: who sits at the front of the classroom matters. In prompt engineering terms, this is the study of order, primacy, and recency effects in zero-, one-, and few-shot prompting.
Quick TL;DR before we gossip
- Primacy effect: examples placed earlier in the list tend to influence the model more strongly. They're the anchors.
- Recency effect: sometimes the last examples carry weight, especially in streaming or short-context settings.
- Order matters more as you add more exemplars: zero-shot has none, one-shot has one (so it's the anchor), few-shot can be a playground of bias depending on order.
- Use ordering as an active tool: anchor with a canonical exemplar, deploy contrastive pairs, or sandwich counterexamples depending on intent.
Why does order matter? (Short answer: attention & anchoring)
LLMs don't “read” like humans, but they weigh context. When given multiple labeled examples, the model tends to treat the earliest patterns as the default rule and later ones as refinements or exceptions. It's like meeting a charismatic person first: you judge the entire group by them.
Practical consequences:
- If your first example strongly favors a certain phrasing, the model will mimic that style across outputs.
- Ambiguous tasks get resolved in the direction of the earliest exemplar.
Question for you: imagine you want consistent output style across varied inputs — would you place your ideal stylistic exemplar first or last? (Hint: first.)
Order effects across zero-, one-, and few-shot
Zero-shot
- No examples, so ordering is irrelevant.
- Your control is the system message and the grounding block — make those precise (remember: Supplying Context and Grounding).
One-shot
- The single example is an anchor. Choose the exemplar wisely (see Selecting Quality Exemplars).
- One-shot is powerful because one clean, well-labeled exemplar can flip the model into the desired format or voice.
Few-shot
- Order becomes a strategic lever.
- First = anchor (primacy). Last = sometimes final nudge (recency), especially in longer conversations or interfaces that truncate earlier context.
- Mixing good & bad exemplars without intention leads to confused behavior (remember Counterexamples for Boundaries — position matters!).
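The three regimes differ only in how many demonstrations precede the query, which is why ordering only becomes a lever once you have more than one. A minimal sketch (the `make_prompt` helper and its layout are illustrative, not any library's API):

```python
def make_prompt(task: str, query: str, shots: list[tuple[str, str]]) -> str:
    """Zero-shot when shots is empty, one-shot with a single pair, few-shot otherwise.

    In the few-shot case, shots[0] is the primacy anchor."""
    lines = [task]
    for inp, out in shots:
        lines += [f"Input: {inp}", f"Output: {out}"]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

# Zero-shot: the instruction alone must carry the format.
zero_shot = make_prompt("Rewrite in active voice.", "The ball was thrown by Sam.", [])

# One-shot: the lone exemplar is automatically the anchor.
one_shot = make_prompt(
    "Rewrite in active voice.",
    "The ball was thrown by Sam.",
    [("The cake was eaten by Ada.", "Ada ate the cake.")],
)
```

With two or more pairs in `shots`, reordering the list is the entire experiment: same content, different anchor.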
Practical ordering strategies (aka the prompt engineer's toolbox)
Anchor-first (Primacy anchoring)
- Put the clearest, most canonical exemplar first.
- Use when you want consistent style/format across all outputs.
Contrast pairs
- Put a positive exemplar first, then a negative (or vice versa) to sharpen the distinction.
- Use when you want the model to discriminate between acceptable and unacceptable outputs.
Sandwich counterexamples
- Good examples first, then a counterexample, then another good example.
- Helps the model learn boundaries while still being anchored to the correct behavior.
Ordering by difficulty (curriculum-style)
- Start simple, end complex — helpful for stepwise reasoning tasks.
Randomize during testing
- When you don't know the best order, run experiments with shuffled orders to measure sensitivity.
Pin the exemplar using delimiters
- Use explicit labels and delimiting blocks to reduce ambiguity: triple backticks, headers, or JSON blocks. This boosts the model's ability to parse the order as discrete examples.
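To make delimiter pinning concrete, here's a minimal sketch; the `Exemplar` type and `build_few_shot_prompt` name are illustrative inventions, not from any library:

```python
from dataclasses import dataclass

@dataclass
class Exemplar:
    """One labeled input/output demonstration."""
    input_text: str
    output_text: str

def build_few_shot_prompt(system: str, exemplars: list[Exemplar], query: str) -> str:
    """Render exemplars as explicitly labeled, delimited blocks.

    List order is preserved verbatim: exemplars[0] becomes the primacy anchor."""
    parts = [f"SYSTEM: {system}", "# Examples:"]
    for i, ex in enumerate(exemplars, start=1):
        parts.append(f"EX{i}:")
        parts.append(f'Input: "{ex.input_text}"')
        parts.append(f'Output: "{ex.output_text}"')
    parts.append("---")  # delimiter separating examples from the live query
    parts.append(f'Input: "{query}"')
    parts.append("Output:")
    return "\n".join(parts)
```

Because ordering is just list order here, an A/B test is one `exemplars[::-1]` or a swap away.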
Example prompt templates and what they teach us
Code block (a toy demonstration; try running A/B tests):

```
SYSTEM: You are a helpful summarizer.
# Examples:
EX1:
Input: "An ancient oak fell in the storm."
Output: "A storm felled an ancient oak."
EX2:
Input: "A dog chased a cat, then it ran away."
Output: "A dog chased a cat, which ran away."
EX3:
Input: "The committee, tired of delays, voted unanimously."
Output: "Tired of delays, the committee voted unanimously."
---
Input: "A musician tuned her guitar before the show."
Output:
```
- If EX1 is canonical and clear, the model will likely prefer short, active rephrasing.
- If you swap the order so EX3 is first (more complex syntax), outputs trend toward complex rearrangements.
Experiment idea: swap EX2 and EX1. Does the model favor adding relative clauses? Track that.
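The swap experiment is easy to script. The sketch below only builds the two prompt variants; wiring in an actual model call is left to you, since the point is that the variants differ in order alone:

```python
# A/B sketch for the EX1/EX2 swap experiment. Prompts are built locally;
# plug in your own model call to compare outputs across the two orders.
EXEMPLARS = [
    ("An ancient oak fell in the storm.", "A storm felled an ancient oak."),
    ("A dog chased a cat, then it ran away.", "A dog chased a cat, which ran away."),
    ("The committee, tired of delays, voted unanimously.",
     "Tired of delays, the committee voted unanimously."),
]

def render(exemplars, query):
    lines = ["SYSTEM: You are a helpful summarizer.", "# Examples:"]
    for i, (inp, out) in enumerate(exemplars, 1):
        lines += [f"EX{i}:", f'Input: "{inp}"', f'Output: "{out}"']
    lines += ["---", f'Input: "{query}"', "Output:"]
    return "\n".join(lines)

query = "A musician tuned her guitar before the show."
variant_a = render(EXEMPLARS, query)                       # EX1 (simple rephrase) anchors
swapped = [EXEMPLARS[1], EXEMPLARS[0]] + EXEMPLARS[2:]
variant_b = render(swapped, query)                         # EX2 (relative clause) anchors
# Send variant_a and variant_b to your model, diff the outputs, and track
# whether variant_b's completions favor relative clauses ("which ...").
```

Everything except the exemplar order is held constant, so any drift in output style is attributable to the anchor.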
A small table: strategies vs. when to use
| Strategy | Use when | Risk |
|---|---|---|
| Anchor-first | Need uniform style | Model locks to style even when a different one fits input |
| Contrast pairs | Teaching boundary cases | Model may overfit to exceptions if contrast is too strong |
| Sandwiching counterexamples | Want robust rule with exceptions | If counterexamples are too late, they may be ignored (primacy wins) |
| Randomize in testing | You're tuning & don't trust priors | More compute/time needed |
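For the "randomize in testing" row, one way to estimate order sensitivity is to enumerate every permutation for small exemplar sets, or sample seeded shuffles for larger ones. A sketch, assuming you score each order's model outputs yourself:

```python
import itertools
import random

exemplars = ["EX_A", "EX_B", "EX_C"]  # stand-ins for real (input, output) pairs

# Small sets: enumerate every ordering exhaustively (3! = 6 here).
all_orders = list(itertools.permutations(exemplars))

# Larger sets: sample a seeded subset of shuffles instead of all n! orders.
rng = random.Random(42)  # fixed seed so the experiment is reproducible
sampled = []
for _ in range(10):
    order = exemplars[:]
    rng.shuffle(order)
    sampled.append(tuple(order))

# Evaluate each order with your model and compare scores; a wide spread
# across orders means the prompt is order-sensitive and needs a stronger anchor.
```

A narrow score spread across orders is evidence you can stop worrying about ordering for that task; a wide one tells you which anchor to lock in.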
Expert take: "Ordering is not a bug — it's a feature. Use it intentionally. The model is offering you a lever; if you don't pull it, your results will still be pulled — by the first thing it sees."
Quick troubleshooting checklist
- Want consistent output? Put the best exemplar first and label it clearly.
- Getting weird exceptions? Add contrastive counterexamples earlier or sandwich them between canonical examples.
- Seeing inconsistent behavior across runs? Randomize order in evaluation to estimate sensitivity.
- Context getting chopped? Put the single most important exemplar near the system message or re-pin via grounding blocks.
Closing: The last word (but not the last example)
Order and primacy effects are subtle levers that turn a handful of exemplars into a reliably shaped behavior. Think of examples not just as examples, but as a tiny curriculum: who shows up first sets the tone. Combine that with the learnings from Selecting Quality Exemplars and Counterexamples for Boundaries, and pin things tightly using the grounding techniques from Supplying Context. Then experiment. Shuffle, A/B, and discover which order actually makes the model sing rather than mutter.
Final challenge: pick a real prompt you're using now, and try three orders: canonical-first, counterexample-first, and randomized. Which gives the clearest, most reliable output? If you tell me the prompt, I will roast the bad orders and help craft the winning lineup.