Supplying Context and Grounding
Feed the model the right facts at the right time using structured context blocks, delimiters, and source pinning.
Planning Context Budgets — The Art of Feeding the Beast Without Starving the Brain
"If context were calories, you'd be trying to feed a marathoner with a cupcake and a sticky note." — Your inner prompt engineer, probably drunk on tokens
You're already familiar with retrieval summaries and citing/linking evidence, and we've seen how roles, personas, and system prompts can steer the model's behavior. Now we need to get pragmatic: what actually goes into the context window, why, and how to pick the bits that matter when tokens are limited, latency matters, or costs start looking like a bad dinner tab. This is Planning Context Budgets: choosing, compressing, and allocating the precious real estate of your prompt so the model produces useful, grounded output.
Why a context budget is a thing (and why you care)
- Token limits are real: LLMs have finite context windows and tokens cost money. You cannot dump the entire Internet into every prompt.
- Relevance beats volume: More text isn't always better; irrelevant context often creates noise and hallucination risk.
- Latency and UX: Large contexts slow things down and increase user wait time. Your users want answers, not a loading spinner named Regret.
Think of it like packing a carry-on for a week: prioritize essentials, compress bulky stuff, and pick outfits that mix-and-match.
Quick recap: where this sits in the pipeline
- Retrieval summaries: you already use them to condense retrieved docs into a succinct digest. Those summaries should be part of your context budget.
- Citing/linking evidence: when you include sources, you have to decide which sources to include verbatim, which to summarize, and which to only cite by reference.
- Roles/personas/system prompts: decide which persona-level constraints and priorities live in the system layer (cheap, persistent tokens) vs the upfront prompt. Use system instructions to offload constant expectations.
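To make that last split concrete, here's a minimal sketch assuming a chat-style API that takes role-tagged messages (the exact field names vary by provider; these mirror the common shape):

```python
# A minimal sketch of the permanence split: stable instructions live in the
# system layer; dynamic facts ride along with each request.

SYSTEM_PROMPT = (
    "You are a billing support assistant. Cite a source for every claim. "
    "Prefer account records over policy documents when they conflict."
)  # stable: paid once per conversation, not re-justified per retrieval

def build_messages(user_query: str, retrieved_facts: list[str]) -> list[dict]:
    """Pair the persistent system layer with per-request dynamic context."""
    context_block = "\n".join(f"- {fact}" for fact in retrieved_facts)
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"CONTEXT:\n{context_block}\n\nQUERY: {user_query}"},
    ]

messages = build_messages(
    "Why did my bill go up?",
    ["2024-03-01: plan upgraded from Basic to Pro", "Pro tier bills at $30/mo"],
)
```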
Core principles for planning context budgets (aka the commandments)
- Prioritize by use-case impact. If a piece of context changes the answer, include it. If it only mildly colors phrasing, summarize or omit.
- Compress aggressively. Summaries, bullet points, structured metadata — all reduce token cost while preserving signal.
- Segment context by permanence. Put stable instructions (tone, role, safety rules) in system prompts. Put dynamic facts (user state, recent retrievals) in the request context.
- Score and select. Rank candidate documents by relevance, recency, and trustworthiness; include top-k until budget is reached.
- Fallback to summaries. When docs exceed budget, include short summaries and explicit citations rather than the full text.
- Be explicit about process. Tell the model which sections are authoritative, which are optional, and where to look first.
A practical workflow: Plan, Score, Allocate, Execute
- Plan: determine the total token budget for context (B). Subtract the estimated token needs for system persona, user query, and required output format. Remaining is the allocable budget.
- Score: for each retrieved item, compute a relevance score combining topicality (semantic similarity), recency, trust/source quality, and a length penalty (see the sketch after this list).
- Allocate: include full text for top items until you hit a threshold (e.g., 60% of allocable budget). Convert less-critical items into summaries or metadata until you fill the budget.
- Execute: build the final prompt with clear section markers and role/system instructions to guide prioritization.
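Here's one way the Plan and Score steps might look in Python. The weights, half-life, and length penalty below are illustrative guesses, not gospel; tune them against your own retrieval data:

```python
import math
import time

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def relevance_score(doc: dict, query_emb: list[float],
                    w_topic: float = 0.6, w_recency: float = 0.2,
                    w_trust: float = 0.2, half_life_days: float = 30.0) -> float:
    """Blend topicality, recency, and trust, then penalize very long documents."""
    topicality = cosine(query_emb, doc["embedding"])
    age_days = (time.time() - doc["timestamp"]) / 86400
    recency = 0.5 ** (age_days / half_life_days)             # exponential decay
    trust = doc["trust"]                                     # 0..1, e.g. from a curated source list
    length_penalty = min(1.0, 1000 / max(doc["tokens"], 1))  # favor concise sources
    return (w_topic * topicality + w_recency * recency + w_trust * trust) * length_penalty

# Plan step: the allocable budget is whatever remains after fixed overheads.
B = 8000                            # tokens reserved for the whole prompt
allocable = B - (600 + 150 + 1200)  # system persona + user query + expected answer
```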
Pseudocode for selection
```python
# Plan: allocable budget after fixed overheads
budget = B - (system_tokens + user_query_tokens + expected_answer_tokens)

def select_context(docs, query_emb, budget, full_text_fraction=0.6):
    """Greedy fill: full text for top-ranked docs, summaries once that share is spent."""
    ranked = sorted(docs, key=lambda d: relevance_score(d, query_emb), reverse=True)
    selected, used = [], 0
    for doc in ranked:
        if used + doc["tokens"] <= budget * full_text_fraction:
            selected.append({"type": "full", "doc": doc})
            used += doc["tokens"]
        else:
            summary = summarize(doc)  # compress to key claims + citations
            if used + summary["tokens"] > budget:
                break  # budget exhausted; remaining docs become citations only
            selected.append({"type": "summary", "doc": summary})
            used += summary["tokens"]
    return build_prompt(selected)
```
Replace summarize(doc) with your retrieval-summary pipeline that compresses content into key claims and citations.
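Design note: the greedy pass spends the full-text share on the highest-ranked documents first, so one decisive source can't be crowded out by a pile of mediocre ones. Anything that won't fit even as a summary drops to citation-only, which is exactly what Section C in the layout below is for.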
Concrete prompt structure (example)
Use explicit labels so the model knows what's priority:
- System prompt: persona, constraints, output format, and a one-line instruction such as: 'Prioritize information from Section A over Section B.'
- Section A: High-priority documents (full text or long summaries)
- Section B: Supporting data (shorter summaries, metadata)
- Section C: Links and citations only (for traceability)
- User query: the actual task
- Tools or memory: optional extras, included only when the task needs them
Example layout (markdown in the prompt):
```
SYSTEM: You are an expert summarizer. Follow the priority order below. Output must include citations.

SECTION A - HIGH PRIORITY (full text up to X tokens)
- Document 1: ...

SECTION B - SUMMARIES
- Doc 4 summary (source: url)

USER QUERY: ...
```
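A small helper can render that layout mechanically. This sketch (named render_sections to keep it distinct from build_prompt in the selection code above) assumes documents, summaries, and citations arrive as pre-formatted strings:

```python
def render_sections(section_a: list[str], section_b: list[str],
                    section_c: list[str], user_query: str) -> str:
    """Assemble the labeled layout above from pre-formatted strings."""
    lines = ["SECTION A - HIGH PRIORITY"]
    lines += [f"- {doc}" for doc in section_a]
    lines += ["", "SECTION B - SUMMARIES"]
    lines += [f"- {summary}" for summary in section_b]
    lines += ["", "SECTION C - CITATIONS ONLY"]
    lines += [f"- {citation}" for citation in section_c]
    lines += ["", f"USER QUERY: {user_query}"]
    return "\n".join(lines)
```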
How to choose between full docs, summaries, and citations
| Situation | Strategy | Why it works |
|---|---|---|
| High-confidence, decisive evidence | Full text | Gives the model raw facts and avoids summarization error |
| Supporting or long docs | Summarize key claims + citations | Saves tokens while preserving signal |
| Many low-relevance hits | List citations or metadata only | Lets the model ask to fetch more when needed |
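If you'd rather encode that table than memorize it, a thin dispatcher over the relevance score does the job; the thresholds are placeholders to calibrate:

```python
def strategy_for(doc: dict, decisive: float = 0.8, supporting: float = 0.5) -> str:
    """Map a scored document to a row of the table above (thresholds are placeholders)."""
    if doc["score"] >= decisive:
        return "full"      # decisive evidence: include verbatim
    if doc["score"] >= supporting:
        return "summary"   # supporting: key claims + citation
    return "citation"      # low relevance: reference only; fetch on demand
```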
Tricks and advanced tactics
- Dynamic budget shifting: if the model requests a deeper dive, allow an on-demand retrieval step and append relevant doc excerpts to a follow-up prompt.
- Chunking + sliding window: for long sources, chunk by section headings and include only the chunks with the highest similarity to the query (see the sketch after this list).
- Role-based weighting: system prompt can instruct the model to prioritize certain sources or types of evidence (e.g., peer-reviewed over blogs).
- Progressive summarization: first-level summaries condensed into second-level ultra-summaries if tokens are scarce.
- Cache golden summaries: for frequently retrieved documents, store compact summaries so you don't pay the summarization cost every time.
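As a sketch of the chunking tactic: split a markdown source at its headings, embed each chunk, and keep only the top-k by similarity. Here cosine is the helper from the scoring sketch, and embed stands in for whatever embedding function you use:

```python
import re

def chunk_by_headings(markdown_text: str) -> list[str]:
    """Split a long source at markdown headings so each chunk stays self-contained."""
    parts = re.split(r"(?m)^(?=#{1,3} )", markdown_text)
    return [p.strip() for p in parts if p.strip()]

def top_chunks(chunks: list[str], query_emb: list[float], embed, k: int = 3) -> list[str]:
    """Keep only the k chunks most similar to the query."""
    return sorted(chunks, key=lambda c: cosine(query_emb, embed(c)), reverse=True)[:k]
```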
Example micro-case: customer support agent
Scenario: A user asks why their bill suddenly increased. You have 20 documents: account history, service change logs, system outage notices, pricing policy, and a support chat.
- Put account history and billing adjustments in Section A (full or near-full).
- Summarize pricing policy and outage notices in Section B.
- Include a Section C list of raw transcripts and logs with timestamps for traceability.
This avoids burying the agent in entire support transcripts while preserving the facts that change the user's bill.
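Expressed as a budget allocation, the scenario might reduce to something like this (all document names are hypothetical):

```python
# Hypothetical allocation for the billing scenario; names are illustrative.
allocation = {
    "A": ["account_history", "billing_adjustments"],           # full or near-full text
    "B": ["pricing_policy_summary", "outage_notice_summary"],  # compressed to key claims
    "C": ["support_chat_transcript", "system_logs"],           # timestamps + citations only
}
```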
Closing — TL;DR and parting wisdom
- Plan the budget first, then fill it. Don't blindly shove everything into the window.
- Prioritize impact, compress ruthlessly, and use system prompts to offload permanence.
- Make the model's life easier by labeling priorities. Explicit structure leads to better, more grounded answers.
Final thought: a good context budget is like a good playlist — curated, purposeful, and leaves room for the encore. If your model keeps hallucinating, it's probably starving for the right tracks.
Versioning: store your budget strategies alongside prompt templates so you can iterate. The model isn't a magician — it's a very fancy parrot whose attention you must manage.