Foundations of Generative AI
Establish how modern LLMs generate text, the role of tokens and probabilities, and the constraints that shape prompt behavior.
Temperature and Top-p Sampling
Temperature and Top-p Sampling — The Flavor Knobs of Creativity
"The model already knows what comes next. Your job is to choose how adventurous its guess should be."
You already know from the previous sections how transformers predict the next token by assigning probabilities over the vocabulary and how tokens are the atomic pieces of text. Now we get to the fun part: how we sample from that probability cocktail to produce text. Two of the most important controls are temperature and top-p (nucleus) sampling. These are the dials that decide whether the model plays it safe or goes rogue with poetic chaos.
Why this matters (quick reminder)
When we talk about next-token prediction, the model calculates a score (logit) for every possible token, converts those scores into probabilities via softmax, and then we pick a token. How we pick it changes the entire personality of the output. Deterministic picks give boring but stable outputs; stochastic picks add variety and creativity.
Think of it this way: the model hands you a ranked list of plausible next tokens with probabilities. Temperature and top-p are different policies that decide how likely we are to choose risky options from that list.
Temperature: the randomness knob
- What it is: A scalar T > 0 applied to logits before softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T), where z_i are the logits.
- Intuition: Lower T concentrates probability on the highest-scoring tokens (less random). Higher T flattens the distribution, making rare tokens more likely (more random).
Analogy: Temperature is like a thermostat for your writing style. At T = 0.1 the model behaves like a stoic librarian; at T = 1.5 it behaves like a caffeinated poet who just discovered metaphors.
Examples:
- T -> 0: approaches greedy (argmax) decoding; deterministic and prone to repetition
- T = 0.7: balanced creativity (common default)
- T >= 1.2: high creativity, risk of nonsense or contradictions
Pro tip: temperature is applied to logits, not to probabilities. Practically you divide logits by T, then run softmax, then sample.
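To make the knob concrete, here is a minimal sketch in plain Python (the logits are hypothetical, just four made-up token scores) showing how dividing logits by T reshapes the softmax distribution:

```python
import math

def softmax_with_temperature(logits, T):
    """Divide logits by T, then softmax (max-subtracted for numerical stability)."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for four tokens

cold = softmax_with_temperature(logits, T=0.2)  # sharp: mass piles onto token 0
warm = softmax_with_temperature(logits, T=1.5)  # flat: tail tokens gain mass

# Lower T concentrates probability on the top token; higher T spreads it out.
print(round(cold[0], 3), round(warm[0], 3))  # → 0.993 0.496
```

Same model, same logits: only the sampling policy changed, and the top token went from a near-certainty to a coin flip.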
Top-p (nucleus) sampling: the selective crowd
- What it is: Sort tokens by probability. Take the smallest set of tokens whose cumulative probability >= p. Then normalize those token probabilities and sample from them.
- Intuition: Instead of picking a fixed number of top tokens, top-p picks a dynamic, probability-mass-based window. It keeps enough tokens to cover p of the distribution and ignores the long tail.
Analogy: Imagine picking songs for a party. Top-k is like saying "only play the top 10 hits." Top-p is like saying "play whatever songs account for 90% of my guests' favorite-list votes." Sometimes that set is 3 songs, sometimes 20 — flexible and context-aware.
Why this is handy: it avoids both the tyranny of tiny top-k lists and the chaos of sampling from the full tail. It adapts to how peaky or flat the model's distribution is.
Temperature vs Top-p vs Top-k: quick table
| Feature | Temperature | Top-p | Top-k |
|---|---|---|---|
| Controls randomness | Yes, smooth control | Indirectly, by limiting tail | Hard cutoff on rank |
| Adaptive to distribution | No | Yes | No |
| Risk of nonsensical tokens | Increases with T | Controlled by p | Controlled by k |
| Typical use case | Smoothly tuning creativity | Context-adaptive randomness | Simple, fast control |
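The "adaptive" row is the key difference, and a tiny sketch (with two hypothetical probability vectors) makes it visible: top-p keeps a different number of tokens depending on how peaky the distribution is, while top-k always keeps exactly k:

```python
def nucleus_size(probs, p):
    """How many tokens top-p keeps: the smallest high-probability prefix covering mass p."""
    cumulative, kept = 0.0, 0
    for q in sorted(probs, reverse=True):
        cumulative += q
        kept += 1
        if cumulative >= p:
            break
    return kept

peaky = [0.85, 0.08, 0.04, 0.02, 0.01]  # model is confident
flat  = [0.22, 0.21, 0.20, 0.19, 0.18]  # model is unsure

print(nucleus_size(peaky, p=0.9))  # → 2: only two tokens cover 90% of the mass
print(nucleus_size(flat, p=0.9))   # → 5: all five tokens are needed
# Top-k with k=3 would keep exactly 3 tokens in both cases, regardless of confidence.
```

This is the party-playlist analogy in code: the "90% of the votes" set shrinks when one song dominates and grows when tastes are spread out.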
How they work together (and in what order)
Typical pipeline:
- Compute logits from the model for each token.
- Optionally divide logits by temperature T.
- Convert to probabilities via softmax.
- Apply top-p: determine nucleus set and re-normalize.
- Sample one token from the resulting distribution.
You can combine temperature and top-p. A common pattern: use T around 0.6-1.0 and p around 0.8-0.95.
The same pipeline as runnable Python (standard library only):

```python
import math
import random

def sample_top_p(logits, T=0.7, p=0.9):
    """Temperature-scaled nucleus sampling over a vector of logits."""
    # 1. Scale logits by temperature, then softmax (max-subtracted for stability).
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # 2. Select the nucleus: smallest set of tokens with cumulative prob >= p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    # 3. Renormalize over the nucleus and sample one token index.
    nucleus_probs = [probs[i] / cumulative for i in nucleus]
    return random.choices(nucleus, weights=nucleus_probs, k=1)[0]
```
Practical tips and defaults
- Start with T = 0.7 and p = 0.9 for creative writing.
- For factual or instruction-following outputs, lower T (0.0 to 0.5) and/or use greedy decoding.
- If you see repetition or looped phrases, reduce T or use nucleus sampling with a tighter p.
- If the model produces awkward subword artifacts, high T may be interacting with tokenization: rare subword tokens become more likely at high T, and chaining several of them together makes gibberish more probable.
Questions to ask yourself:
- Do I value correctness or diversity more in this prompt?
- Is the task short and precise, or open-ended and creative?
Common failure modes
- High temperature + high p -> plausible sounding but factually wrong hallucinations.
- Low temperature + strict top-p -> overly conservative or repetitive outputs.
- Tokenization artifacts: token-level sampling can break words apart if you land on rare subword tokens. If that matters, consider tightening p, lowering T, or using beam search for strict coherence.
Quick decision guide
- Need precise, reliable answer? Use low T or greedy decoding.
- Want variety but coherent text? Use moderate T (0.6-0.9) with p = 0.8-0.95.
- Want maximum creativity and flavor? Increase T, maybe relax p.
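You can sanity-check these regimes empirically. The sketch below (hypothetical logits, seeded RNG for reproducibility) repeatedly samples from the same distribution at two temperatures and counts how often the top token wins:

```python
import math
import random

def sample(logits, T, rng):
    """Sample one token index after temperature-scaled softmax."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    weights = [math.exp(z - m) for z in scaled]  # unnormalized is fine for choices()
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [3.0, 2.0, 1.0, 0.0]  # token 0 is the model's favorite
rng = random.Random(0)          # seeded so the experiment is repeatable

low  = sum(sample(logits, T=0.3, rng=rng) == 0 for _ in range(1000))
high = sum(sample(logits, T=2.0, rng=rng) == 0 for _ in range(1000))

# At T=0.3, token 0 wins almost every draw; at T=2.0, other tokens win often.
print(low, high)
```

One pair of numbers, two completely different personalities: that is the whole decision guide in miniature.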
Closing: the creative control panel
Temperature and top-p are not mystical hacks. They are interpretable levers that trade off confidence vs creativity. They sit on top of the probabilities you learned about in the next-token section and operate at the token level shaped by tokenization.
Smart takeaway: treat temperature as the "adventurousness" knob and top-p as the "only let the sensible crowd speak" filter. Use them together to craft model behavior that fits your goal: conservative when you need correctness, playful when you need ideas.
Key takeaways:
- Temperature scales logits and directly adjusts randomness.
- Top-p dynamically selects a probability mass nucleus to sample from.
- Combining both gives flexible, controllable outputs.
Go test it: change T a little, nudge p a fraction, and watch the same model transform from a reliable assistant to a freewheeling novelist. That tiny twist is where prompt engineering turns into an art form.
Tags: beginner, humorous, science