Retrieval-Augmented Generation (RAG)
Combine prompts with retrieval to ground answers in external knowledge, improving accuracy and traceability.
Indexing and Chunking Tactics — The Art of Turning a Novel into Useful Googleable Snacks
"You can give a model the entire internet, but if it can’t find the right sentence, it’s still guessing." — Probably me, at 2 a.m.
You already know the basics of RAG (we covered concepts and benefits) and how embeddings vectorize meaning (that lovely chapter we just did). Now we’re entering the pragmatic, slightly messy, very important realm: how to slice your documents and how to store those slices so retrieval is fast, accurate, and cheap. This is the stage where retrieval meets engineering discipline — and where many projects quietly explode or quietly succeed.
Why indexing + chunking matter (without the fluff)
- Embeddings live at the chunk level. If chunks are garbage, embeddings are garbage.
- Chunking controls precision vs recall: big chunks = more context but fuzzier retrieval; small chunks = precise hits but more noise and more storage.
- Index structure affects latency, memory, and the scalability of your RAG system.
(Tip: if you read the previous section on Tools, Functions, and Agentic Workflows, think of chunking/indexing as the planner’s strategy choice — the executor actually runs the splits, embeds, upserts, and monitors outcomes.)
Core tactics: chunking strategies
1) Semantic chunking (preferred when possible)
- What: Split by logical units — paragraphs, sections, headings, code blocks.
- Why: Keeps semantically coherent pieces, so an embedding represents a single idea.
- When not to use: documents with no clear structure to split on (e.g., raw logs).
2) Fixed-size chunking (token-based)
- What: Split into N-token chunks (e.g., 200–500 tokens).
- Why: Predictable embedding sizes and costs; aligns with model token limits.
- Drawback: May split ideas mid-sentence.
3) Sliding windows / overlap
- Add 10–30% overlap between chunks to preserve context across boundaries.
- Helps when the answer spans a boundary; costs more space but increases retrieval recall.
4) Hybrid: headings + truncation
- Use headings to create semantic chunks, but if a heading block is huge, break it by tokens.
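The fixed-size and sliding-window tactics above can be sketched in a few lines. This is a minimal illustration that uses whitespace tokens as a stand-in for a real tokenizer (in practice you'd count tokens with your embedding model's tokenizer, e.g., tiktoken); the function name and defaults are just for this example.

```python
def chunk_tokens(text, size=300, overlap=60):
    """Split text into fixed-size token chunks with a sliding-window overlap.

    Uses whitespace-separated words as a crude stand-in for real tokens;
    swap in your model's tokenizer for production use.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    tokens = text.split()
    step = size - overlap  # how far the window slides each iteration
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # last window already covers the tail
    return chunks
```

With `size=300, overlap=60` you get the ~20% overlap recommended above; neighboring chunks share a 60-token boundary region, so an answer that straddles a split still lands fully inside at least one chunk.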
Heuristics: how big should a chunk be?
- Short content (FAQs): 50–150 tokens
- Documentation & articles: 150–400 tokens
- Books or long reports: 300–800 tokens with overlap
Table: Chunk size trade-offs
| Chunk size | Pros | Cons |
|---|---|---|
| Small (50–150 tokens) | Precise retrieval, cheap to re-rank | More vectors, potential missing context |
| Medium (150–400 tokens) | Good balance of context and precision | Moderate storage & compute |
| Large (400–800 tokens) | Lots of context in one hit | Lower precision, higher cost |
Indexing architectures — which engine for which vibe
- Flat vector (brute force): simple and exact (no approximation), great for small corpora.
- HNSW (Hierarchical Navigable Small World): excellent latency & recall for medium/large datasets.
- IVF + PQ (Inverted File + Product Quantization): efficient for huge datasets where memory matters.
- Managed vector DBs (Pinecone, Weaviate, Milvus, Qdrant): add metadata filters, multi-tenancy, and durability.
Practical rule: start simple (FAISS or managed DB) and optimize when you hit latency or cost problems.
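To make the "start simple" rule concrete, here's what a flat (brute-force) search amounts to: normalize, dot-product against every vector, take the top k. This is a NumPy-only sketch for intuition, not a substitute for FAISS or a managed DB, which do the same thing with optimized kernels and (for HNSW/IVF+PQ) approximate shortcuts.

```python
import numpy as np

def top_k(query, vectors, k=3):
    """Exact cosine-similarity search over a small in-memory corpus.

    query:   (d,) embedding of the user's question
    vectors: (n, d) matrix of chunk embeddings
    Returns the indices and scores of the k most similar chunks.
    """
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q                      # cosine similarity per chunk
    idx = np.argsort(-scores)[:k]      # highest scores first
    return idx, scores[idx]
```

When this O(n) scan becomes your latency bottleneck, that's the signal to move to HNSW or IVF+PQ, not before.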
Metadata, filtering, and hybrid retrieval
- Always store and index metadata with each chunk: doc_id, section, timestamp, source_url, confidentiality flags.
- Use metadata filters for precision (e.g., only fetch from docs last updated before 2024, or only from internal manuals).
- Hybrid retrieval: combine lexical search (BM25) + vector search for cases where exact phrase matching matters (e.g., legal citations, code).
Example flow:
- Run BM25 to get candidates (good for exact matches).
- Run vector search to get semantic candidates.
- Union/rerank candidates using a cross-encoder or scoring heuristic.
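For the union/rerank step, one common scoring heuristic (lighter than a cross-encoder) is reciprocal rank fusion: each document earns 1/(k + rank) from every candidate list it appears in, so items ranked well by both BM25 and vector search float to the top. A minimal sketch, with hypothetical doc ids:

```python
def rrf_merge(ranked_lists, k=60):
    """Merge several ranked candidate lists with reciprocal rank fusion.

    ranked_lists: e.g., [bm25_ids, vector_ids], each best-first.
    k is a smoothing constant (60 is a conventional default).
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs no score calibration between the two retrievers (BM25 scores and cosine similarities live on different scales), which is why it's a popular first choice before investing in a cross-encoder.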
Practical pipeline (planner → executor) with functions & error handling
Planner: decides split strategy, chooses index, selects embedding model, sets metadata.
Executor: runs splitting, embeds chunks, upserts to vector DB, logs metrics, retries errors.
Pseudocode (Python-style):
# Planner: pick strategy, split, prepare metadata
strategy = choose_chunking_strategy(doc)
chunks = split_doc(doc, strategy)
metadata = build_metadata(doc)

# Executor: embed, upsert, handle failures robustly
try:
    embeddings = embed_batch(chunks)
    upsert_to_index(chunks, embeddings, metadata)
except TransientError:
    # Rate limits and network blips are worth retrying
    retry(upsert_to_index, chunks, embeddings, metadata, attempts=3)
except Exception as e:
    log_error(e, context=doc.id)
    alert_oncall(e)
Observability best practices: track embedding time, upsert latency, index size, retrieval latency, recall@k, and MRR. Content hashing and index versioning make debugging reproducible.
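The two retrieval-quality metrics named above are easy to compute once you have labeled (query, relevant-docs) pairs; here is a small sketch so the definitions are unambiguous:

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant docs that appear in the top-k retrieved list."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def mrr(queries):
    """Mean reciprocal rank over (retrieved_list, relevant_set) pairs.

    Each query contributes 1/rank of the first relevant hit, or 0 if
    nothing relevant was retrieved.
    """
    total = 0.0
    for retrieved, relevant in queries:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(queries)
```

Track both: recall@k tells you whether the answer is in the candidate set at all; MRR tells you how high it ranks, which matters when you only stuff the top few chunks into the prompt.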
Incremental indexing, re-embedding, and lifecycle
- Upserts vs full rebuilds: upserts are faster, but schema or embedding model changes often require a rebuild.
- Version your embeddings and index schema: store embedding_model_id with each vector.
- Re-embed only changed documents where possible.
When to re-embed:
- You change the embedding model
- You adjust chunking strategy significantly
- You change tokenization behavior
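The "re-embed only changed documents" rule usually comes down to a content-hash comparison. A minimal sketch, assuming a hypothetical `index_state` mapping that stores the hash and embedding-model id recorded at last upsert (the names here are illustrative, not a real library API):

```python
import hashlib

def content_hash(text):
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def docs_to_reembed(docs, index_state, model_id):
    """Return ids of docs whose text changed or whose vectors were
    built with a different embedding model.

    docs:        {doc_id: text} for the current corpus
    index_state: {doc_id: (stored_hash, stored_model_id)} from the index
    """
    changed = []
    for doc_id, text in docs.items():
        stored = index_state.get(doc_id)
        if stored is None or stored != (content_hash(text), model_id):
            changed.append(doc_id)
    return changed
```

Storing `embedding_model_id` alongside the hash is what lets a model upgrade trigger a clean, targeted rebuild instead of a guessing game; the same hash doubles as the integrity check mentioned in the privacy section below.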
Privacy, PII, and legal hygiene
- Remove or redact PII before indexing, or mark chunks as restricted and apply filters.
- Track data provenance in metadata for audits.
- Embed hashes of original text for integrity checks (don’t store raw PII when you can avoid it).
Quick checklist (because you’ll forget one of these)
- Decide semantic vs token chunking
- Use 10–30% overlap for boundary sensitivity
- Store rich metadata and model ids
- Prefer HNSW for moderate scale; IVF+PQ for massive scale
- Implement hybrid lexical + semantic retrieval where needed
- Version your index and embeddings
- Monitor recall@k, latency, and index growth
- Redact or flag PII
Final Mic Drop / TL;DR
Chunking is the design decision that governs everything: recall, precision, cost, and how often your system says something confidently wrong. Treat chunking and indexing as product features, not afterthoughts. Use the planner–executor pattern from our Tools & Functions section: the planner picks the strategy, the executor runs robust, observable jobs. Start with semantic chunks + medium size + 10–20% overlap, store metadata, and iterate.
Go forth and slice responsibly. Your users will thank you. (Possibly with bug reports — but fewer ones if you follow this guide.)