Examples: Zero-, One-, and Few-Shot
Use demonstrations to steer behavior, balancing exemplar quality, order effects, and when to skip examples entirely.
Selecting Quality Exemplars — The Art and Science of Few-Shot Wizardry
"An exemplar is like a really good burrito: balanced, well-wrapped, and not leaking salsa all over the prompt." — Probably Me, 3 coffees in
You're already comfortable with one-shot demonstrations (we practiced making the model learn from a single, crystal-clear example) and with few-shot prompt patterns (Position 3) (you know where to drop examples in the prompt). You also learned how to supply context and ground facts into the model using structured context blocks and delimiters. Now we level up: how do you pick the actual examples that make few-shot prompting sing instead of fizzle? This is the picky, chef-y part of prompt engineering.
Why exemplar selection matters (quick recap)
- Few-shot performance depends less on volume and more on quality of exemplars.
- Good exemplars teach the model the pattern, not just the answer.
- Bad exemplars teach the model your mistakes, your biases, and your sloppy habits.
Imagine teaching someone to bake by showing them one burnt cookie and one raw batter blob. That's not going to go well.
What makes an exemplar high quality? (The checklist you’ll tattoo on your brain)
- Representativeness — The exemplar should reflect the distribution of real inputs. If 80% of your inputs are short tweets, don’t exemplify with long legal contracts.
- Clarity & Unambiguity — Each exemplar’s input and target output must be unambiguous. No riddles. No missing steps.
- Format consistency — Same structure, same delimiting, same labeling. If answers use bullet lists, keep using bullet lists.
- Diversity (but purposeful) — Cover edge cases and typical cases without adding noise. Pick exemplars that span the main modes of the task.
- Difficulty gradient — Mix easy and slightly harder examples so the model generalizes across difficulty.
- No label leakage — Don’t hide the solution in the input. Metadata that leaks the answer will make your prompt brittle.
- Canonicalization — Prefer normalized, canonical outputs when possible (dates formatted consistently, standardized vocabulary).
- Avoid overfitting style — If you want stylistic variation in outputs, don’t lock exemplars into one tiny voice unless that voice is your goal.
- Ethical sanity check — Remove exemplars that propagate harmful stereotypes or privacy leaks.
Quick examples: Good vs Bad exemplar
| Aspect | Bad exemplar | Good exemplar |
|---|---|---|
| Representativeness | A 10k-word legal memo for a social media caption task | Short tweets/examples (10–280 chars) |
| Clarity | "Make it better" → "Sure" | "Input: ...\nOutput: [Formalized sentence]" |
| Label leakage | Input: "Spam message (spam)" | Input: "Get rich quick..." → Output: "spam" |
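The label-leakage row above is easy to check mechanically. Here is a minimal sketch that flags exemplars whose input text already contains the target label (a naive case-insensitive substring check — real leakage can be subtler):

```python
# Minimal label-leakage check: flag exemplars whose input text
# already contains the target label (case-insensitive substring match).
def find_label_leaks(exemplars):
    """exemplars: list of (input_text, label) pairs."""
    leaks = []
    for text, label in exemplars:
        if label.lower() in text.lower():
            leaks.append((text, label))
    return leaks

exemplars = [
    ("Get rich quick with this one trick!", "spam"),
    ("Spam message: free pills inside (spam)", "spam"),  # leaks the label
    ("Meeting moved to 3pm, see you there.", "not spam"),
]
print(find_label_leaks(exemplars))  # only the leaking pair is flagged
```

This won't catch indirect leaks (metadata fields, suspicious IDs), but it catches the embarrassing ones before they reach your prompt.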
Practical exemplar patterns (workable templates)
Use consistent delimiters and context blocks (you learned this in Supplying Context & Grounding). Example prompt skeleton for Position 3 few-shot:
```
<SYSTEM CONTEXT BLOCK: Grounding facts, constraints, style guide>
---
EXAMPLES:
# Example 1
Input: <text A>
Output: <target A>
---
# Example 2
Input: <text B>
Output: <target B>
---
# Example 3
Input: <text C>
Output: <target C>
---
Now do this for the new input:
Input: <NEW TEXT>
Output:
```
Why delimiters? Because they keep exemplars separate from instructions and prevent accidental token bleeding.
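The skeleton above is mechanical enough to generate programmatically. A minimal sketch (the helper name `build_few_shot_prompt` is illustrative, not a library function):

```python
# Assemble the Position 3 few-shot skeleton from (input, output) pairs,
# using "---" delimiters to keep context, examples, and the task separate.
def build_few_shot_prompt(context_block, exemplars, new_input):
    parts = [context_block, "---", "EXAMPLES:"]
    for i, (inp, out) in enumerate(exemplars, start=1):
        parts.append(f"# Example {i}\nInput: {inp}\nOutput: {out}\n---")
    parts.append(f"Now do this for the new input:\nInput: {new_input}\nOutput:")
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "You rewrite informal messages in a professional tone.",
    [("hey, r u free 2 chat?", "Are you available for a brief discussion?"),
     ("thx for ur help!", "Thank you for your assistance.")],
    "need this report ASAP pls",
)
print(prompt)
```

Generating the prompt from data rather than hand-editing it is what makes the "swap exemplar sets and compare" experiments later in this lesson cheap to run.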
Selecting exemplars from a dataset — strategies (not magic, but useful)
- Nearest-neighbor (semantic)
- Embed your new input and pick k nearest exemplars. Great for retrieval-augmented few-shot.
- Stratified sampling
- If dataset has classes or typical cases, choose exemplars from each stratum.
- Clustering + centroid exemplars
- Cluster the dataset in embedding space; pick exemplars nearest each cluster centroid.
- Diverse maximin (greedy)
- Pick exemplars that maximize minimum distance from each other (ensures variety).
- Adversarial/Hard-negative mining
- Include examples the model previously got wrong to force learning.
Pseudocode (cheap, effective — `embed`, `top_k_nearest`, and `greedy_maximin_select` are placeholder helpers):
```python
# Given: embeddings for the dataset and the new input
k = 3
new_emb = embed(new_input)

# Option A: nearest neighbors (pure relevance)
exemplars = top_k_nearest(dataset_embeddings, new_emb, k)

# Option B: relevance first, then diversity
candidates = top_k_nearest(dataset_embeddings, new_emb, 10)
exemplars = greedy_maximin_select(candidates, k)
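To make the pseudocode concrete, here is one possible NumPy implementation of both helpers, assuming cosine similarity for relevance and Euclidean distance for the maximin diversity step (random embeddings stand in for a real embedding model):

```python
import numpy as np

def top_k_nearest(embs, query, k):
    """Indices of the k embeddings most similar to query (cosine similarity)."""
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    sims = embs @ query
    return np.argsort(-sims)[:k]

def greedy_maximin_select(embs, candidate_idx, k):
    """Greedily pick k candidates maximizing the minimum pairwise distance."""
    chosen = [candidate_idx[0]]  # seed with the most relevant candidate
    while len(chosen) < k:
        best, best_score = None, -1.0
        for idx in candidate_idx:
            if idx in chosen:
                continue
            # Distance to the nearest already-chosen exemplar
            d = min(np.linalg.norm(embs[idx] - embs[c]) for c in chosen)
            if d > best_score:
                best, best_score = idx, d
        chosen.append(best)
    return chosen

# Stand-in data: in practice these come from your embedding model
rng = np.random.default_rng(0)
dataset_embeddings = rng.normal(size=(100, 8))
new_emb = rng.normal(size=8)

# Option A: pure relevance
exemplars_a = top_k_nearest(dataset_embeddings, new_emb, 3)
# Option B: relevance-filtered diversity
candidates = top_k_nearest(dataset_embeddings, new_emb, 10)
exemplars_b = greedy_maximin_select(dataset_embeddings, list(candidates), 3)
```

The greedy maximin loop is O(k · |candidates|²) in the worst case, which is fine when you pre-filter to a small relevance pool (here, 10) before diversifying.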
Dos and Don’ts (because humans like rules)
Do:
- Use human-reviewed exemplars for critical tasks.
- Standardize output tokens (e.g., JSON, specific keywords) to make parsing reliable.
- Test with multiple exemplar sets; different exemplars produce different behaviors.
Don’t:
- Dump dozens of random examples and hope for the best. Quality over quantity.
- Include conflicting exemplars without clarifying how to resolve conflicts.
- Put crucial grounding facts inside exemplars; they belong in the context block.
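The "standardize output tokens" rule pays off at parse time. A sketch of the receiving end, assuming you asked for JSON with specific keys (the field names `label` and `confidence` are illustrative, not a fixed schema):

```python
import json

# Validate that a model output matches the standardized JSON format
# your exemplars demonstrated. Reject anything else rather than guessing.
REQUIRED_KEYS = {"label", "confidence"}

def validate_output(raw):
    """Return the parsed dict if raw is valid JSON with the expected keys, else None."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or not REQUIRED_KEYS <= obj.keys():
        return None
    return obj

print(validate_output('{"label": "spam", "confidence": 0.92}'))  # parses cleanly
print(validate_output("it's probably spam"))                     # rejected: None
```

If your exemplars all show the same JSON shape, a validator this simple catches most format drift; conflicting exemplar formats are exactly what makes it fail.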
Measuring exemplar quality (empirical checks)
- Run A/B tests: compare accuracy, BLEU, ROUGE, or task-specific metrics across exemplar sets.
- Check output consistency: ask the model to reformat the same content multiple times.
- Evaluate calibration: does the model know when it’s unsure? (Confidence tokens or auxiliary scoring can help.)
Quick experiment: hold new inputs constant, swap exemplar sets. If outputs change drastically, your exemplar choice is a dominant variable — iterate.
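The swap experiment above is just a loop. A toy skeleton, where `run_model` is a stand-in for your real model call (here it trivially echoes the majority label in the exemplar set, purely so the harness runs end to end):

```python
# Hold eval inputs constant, swap exemplar sets, compare a task metric.
def run_model(exemplar_set, text):
    # Toy stand-in for a real LLM call: echo the exemplar set's majority label.
    labels = [label for _, label in exemplar_set]
    return max(set(labels), key=labels.count)

def accuracy(exemplar_set, eval_pairs):
    hits = sum(run_model(exemplar_set, x) == y for x, y in eval_pairs)
    return hits / len(eval_pairs)

eval_pairs = [("free pills!!!", "spam"), ("lunch at noon?", "ham")]
set_a = [("win money now", "spam"), ("click here fast", "spam")]
set_b = [("see you tomorrow", "ham"), ("notes attached", "ham")]

for name, exemplar_set in [("A", set_a), ("B", set_b)]:
    print(name, accuracy(exemplar_set, eval_pairs))
```

Swap in a real model call and a real metric; the structure — fixed eval set, varying exemplar sets, one score per set — is the whole experiment.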
A small worked example (rewrite informal to professional)
Bad exemplar:
Input: "hey, r u free 2 chat?"
Output: "Are you available to chat?"
Why is this weak? The pair itself is acceptable, but the output stays fairly colloquial, and repeating near-identical phrasing across exemplars risks locking the model into a narrow style.
Better exemplar set (3-shot):
- Input: "hey, r u free 2 chat?" → Output: "Are you available for a brief discussion?"
- Input: "need this report ASAP pls" → Output: "Could you please provide the report at your earliest convenience?"
- Input: "thx for ur help!" → Output: "Thank you for your assistance."
Pattern: consistent label format, variety in phrasing, slightly formalized vocabulary. Place these in Position 3 with the system context containing tone and length constraints and delimiters around examples.
Closing: Your exemplar selection cheat-sheet
- Start with clarity: one clear input → one clear target.
- Ensure representativeness and purposeful diversity.
- Use embedding retrieval + diversity heuristics for scalable selection.
- Keep exemplars and grounding separate with delimiters and context blocks.
- Test, measure, iterate — and avoid leaning on too few exemplars for critical systems.
Final thought: Exemplars aren’t magic spells. They’re carefully chosen teaching moments. If you craft them like a picky teacher — clear, varied, and representative — the model will learn the lesson. If you rush them, you’ll get a class full of burnt cookies.
Version check: you’re building on one-shot clarity and Position 3 patterning — now your exemplar selection strategy brings the meal together. Go try three different exemplar sets and see which one actually makes the model stop lying to you.
Happy prompting. Bring snacks. Debug with snacks.