Retrieval-Augmented Generation (RAG)
Combine prompts with retrieval to ground answers in external knowledge, improving accuracy and traceability.
Re-Ranking and Fusion in RAG — The Chaotic Good Guide
"Think of retrieval like speed-dating for knowledge. Re-ranking is where you decide which of the suitors actually deserve a second date. Fusion is the awkward montage where you try to stitch together all the good parts without sounding insane."
You're already comfortable with query-construction tricks (Position 4) and chunking/indexing tactics (Position 3). Great — we won't waste time on the basics. Instead, we'll take those well-formed queries and well-chunked indexes and show how to make retrieval actually useful: re-ranking the hits and fusing them into a coherent, faithful, and useful answer. We'll also fold in the previous topic (tools, functions, planner–executor patterns) so your pipeline is not only smart, but debuggable and manageable.
Why re-rank & fuse? (aka the problem statement)
- Raw retrieval (BM25/k-NN embeddings) gives you candidates — often noisy, partially relevant, or overlapping.
- A generator left to its own devices will either ignore relevant documents or hallucinate from tangential ones.
Re-ranking sifts the pile for the highest-quality evidence. Fusion assembles that evidence into an answer that maximizes useful information while minimizing contradictions and hallucination.
Re-Ranking: The Gatekeeper
What it is: Re-ranking takes an initial candidate set R (from BM25 or vector search) and re-orders or re-scores them using a finer-grained model (usually a cross-encoder or a stronger bi-encoder).
Common techniques:
- Lexical re-rankers: BM25 or TF-IDF refinement (fast; baselines).
- Dense re-rankers: dot-product similarity with stronger embeddings (approximate-nearest-neighbor refinement via FAISS/Annoy/k-NN).
- Cross-encoder re-rankers: feed (query, doc) pairs to a transformer that outputs a relevance score — slow but high fidelity.
- Learning-to-rank ensembles: combine features (BM25 score, embedding similarity, doc recency, citations) into a learned model.
When to use what
| Method | Speed | Accuracy | Cost | Use when... |
|---|---|---|---|---|
| BM25 | Very fast | Low-medium | Cheap | small index; keyword-heavy queries |
| Dense bi-encoder | Fast | Medium | Moderate | semantic matches, many queries |
| Cross-encoder | Slow | High | Expensive | few candidates (<100), need precision |
| Hybrid (BM25 + cross) | Medium | High | Moderate-High | practical production tradeoff |
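A common hybrid recipe is reciprocal rank fusion (RRF), which combines multiple ranked lists using only rank positions, so you never have to normalize BM25 scores against cosine similarities. A minimal sketch (the doc IDs are illustrative; `k=60` is the constant from the original RRF paper):

```python
def rrf_merge(rankings, k=60):
    """Reciprocal Rank Fusion: combine several ranked lists of doc IDs.

    A document's fused score is the sum of 1 / (k + rank) over every
    list it appears in; documents ranked high by multiple retrievers
    rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]   # lexical ranking
dense_hits = ["d1", "d5", "d3"]  # embedding ranking
fused = rrf_merge([bm25_hits, dense_hits])
# "d1" and "d3" appear high in both lists, so they lead the fused ranking
```

RRF is a cheap first hybrid to try before investing in a learned ranker: no training data, no score calibration, and it degrades gracefully when one retriever has a bad day.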
Example re-ranker prompt / objective
A minimal cross-encoder re-ranker in Python (`cross_encoder` stands in for whatever scoring model you use — it just needs to return a scalar relevance score for a query–document pair):

```python
def cross_rank(query, candidates):
    scores = []
    for doc in candidates:
        # model returns a scalar relevance score for the (query, doc) pair
        score = cross_encoder.score(query + "\n--\n" + doc.text)
        scores.append((doc, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```
Question to ask yourself: Do I care more about precision (top-1 quality) or throughput? If precision, favor cross-encoders.
Fusion: The Art of Not Being Dumb with Good Docs
Fusion means combining multiple documents into what the generator will use. There are two broad families:
- Early Fusion: merge text chunks into a single prompt/context before generation (concatenation, summarization).
- Late Fusion: generate answers from individual chunks (or subsets) and then aggregate (voting, scoring, final synthesis).
Notable patterns
- Concatenate (naive): just glue top-k into the prompt. Simple, but context length and contradictions bite you.
- Extract-and-Consolidate: extract facts from each doc (or ask a model to summarize each), then synthesize those summaries.
- Fusion-in-Decoder (FiD): encode each doc independently, pass encoded representations to the decoder so it can attend cross-doc — higher-quality but requires architecture support.
- RAG-Sequence vs RAG-Token:
- RAG-Sequence: generates sequences conditioned on individual retrieved docs and then merges candidate outputs.
- RAG-Token: fuses at the token level — the model considers all documents while generating each token (more coherent, more compute).
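The naive concatenate pattern above is worth seeing in code, if only to appreciate where it breaks. A sketch under one loud assumption — character count as a crude stand-in for real token counting (swap in a proper tokenizer in production):

```python
def concat_context(query, ranked_docs, max_chars=4000):
    """Naive early fusion: glue top-ranked chunks into one prompt,
    stopping before the context budget is exceeded. Each chunk is
    tagged with its source ID so the generator can cite it."""
    parts, used = [], 0
    for doc in ranked_docs:
        block = f"[source: {doc['id']}]\n{doc['text']}\n"
        if used + len(block) > max_chars:
            break  # budget exhausted; lower-ranked docs are dropped
        parts.append(block)
        used += len(block)
    return (
        "Answer using only the sources below.\n\n"
        + "".join(parts)
        + "\nQuestion: " + query
    )
```

Note what this does silently: anything past the budget is dropped in rank order, so a bad re-ranker directly starves the generator of evidence — which is exactly why re-ranking quality matters so much for early fusion.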
Late fusion strategies (practical)
- Voting: generate answers per doc, pick most common answer.
- Score-weighted merge: weight each doc's contribution by re-ranker score, then synthesize.
- Fact extraction + aggregator: extract structured facts (triples), then render them into prose.
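The first two strategies are simple enough to sketch directly. Assuming you already have one candidate answer per document (and, for the weighted variant, its re-ranker score):

```python
from collections import Counter

def vote_merge(per_doc_answers):
    """Late fusion by voting: each retrieved doc yields a candidate
    answer (e.g. one generation per chunk); return the most common one.
    Real systems normalize more aggressively before counting."""
    normalized = [a.strip().lower() for a in per_doc_answers]
    return Counter(normalized).most_common(1)[0][0]

def weighted_merge(answers_with_scores):
    """Score-weighted variant: sum re-ranker scores per distinct
    answer, so one highly relevant doc can outvote several weak ones."""
    totals = {}
    for answer, score in answers_with_scores:
        key = answer.strip().lower()
        totals[key] = totals.get(key, 0.0) + score
    return max(totals, key=totals.get)

vote_merge(["Paris", "paris", "Lyon"])  # returns "paris"
```

Voting works best for short factoid answers; for open-ended generation you'll want the fact-extraction route, where aggregation happens over structured facts rather than whole answer strings.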
Putting it together: A Planner–Executor Pipeline (with tools)
You used planner–executor before for tools. Do the same here.
- Planner (tool): builds a retrieval plan — which indices, query rewrites, k candidates per index.
- Retriever (executor tool): runs BM25 and dense search in parallel.
- Re-Ranker (tool/function): a cross-encoder or learned ranker reorders the combined set.
- Fusion module (function): chooses strategy (FiD/concat/extract+merge) and prepares inputs for the generator.
- Generator (LLM): produces the final answer. Optionally call a citation function to attach sources.
- Observability tool: logs scores, chosen docs, hallucination flags.
Pseudocode:
```python
plan = planner.create(query)               # which indices, query rewrites, k per index
candidates = retriever.search(plan)        # BM25 + dense, in parallel
ranked = reranker.rank(query, candidates)  # cross-encoder or learned ranker
fused_input = fusion.prepare(ranked.top_k) # FiD / concat / extract+merge
answer = generator.generate(fused_input)
logger.log({"query": query, "doc_ids": [d.id for d in ranked.top_k],
            "scores": ranked.scores, "answer": answer})
```
Tips: make each piece a callable function (tool) so you can instrument errors, timeouts, and retries. If the re-ranker times out, fall back to a faster ranker — graceful degradation.
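The graceful-degradation idea can be wrapped in a few lines. A sketch, assuming the tool runner surfaces timeouts as exceptions (the `cross_rank` and `lexical_rank` callables are placeholders for your actual ranker tools):

```python
def rank_with_fallback(query, candidates, cross_rank, lexical_rank):
    """Wrap the expensive re-ranker so the pipeline degrades gracefully:
    if it raises (model down, timeout raised by the tool runner), fall
    back to the fast lexical ranking instead of failing the request.
    Returns (ranked_candidates, method_used) so the logger can record
    which path was taken."""
    try:
        return cross_rank(query, candidates), "cross_encoder"
    except Exception:
        return lexical_rank(query, candidates), "lexical_fallback"
```

Logging which path was taken matters: a quietly rising fallback rate is often your first signal that the re-ranker service is degrading.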
Evaluation: How do you measure success?
- Retrieval metrics: Recall@k, MRR — are the true evidence docs in the candidates?
- Re-ranker metrics: NDCG, MAP — is ordering improved?
- Generation metrics: ROUGE/BLEU (weak for open answers), factuality checks, hallucination rate (automatic fact checks), citation precision.
- Human eval: faithfulness, helpfulness, concision.
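The two retrieval metrics above are a few lines each and worth wiring into CI. A minimal sketch (doc IDs are illustrative):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant doc IDs that appear in the top-k
    retrieved list -- did the candidates contain the true evidence?"""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant doc (0.0 if none were
    retrieved) -- averaged over queries, this is Mean Reciprocal Rank."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

retrieved = ["d4", "d1", "d9"]
relevant = {"d1", "d2"}
recall_at_k(retrieved, relevant, 3)  # 0.5 -- one of two relevant docs in top-3
mrr(retrieved, relevant)             # 0.5 -- first relevant doc sits at rank 2
```

Run these before and after adding the re-ranker: recall@k should hold steady (the re-ranker can't add candidates) while MRR and NDCG should climb.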
Quick practical checklist (copy-paste for your next sprint)
- Build initial retriever with BM25 + dense embeddings.
- Add a cross-encoder re-ranker for top-100 candidates.
- Choose fusion: FiD if you can, else extract-and-consolidate.
- Instrument every tool call for latency, failures, and selected docs.
- Add fallback rules (e.g., if re-ranker fails, use BM25 top-k).
- Track recall@k and hallucination metrics each deployment.
Final notes & spicy thoughts
- Re-ranking is the hill where you win or lose precision. If your top-3 are garbage, the generator will be glamorous garbage. Invest in a re-ranker.
- Fusion is the art of making multiple truths sing in harmony without producing a choir of lies. Structured extraction + careful weighting is often your best friend.
- Treat re-ranking and fusion as independent modules (tools) you can A/B and observe. The planner–executor pattern you learned earlier fits beautifully here.
"Good retrieval gets you the sources; clever re-ranking picks the right ones; thoughtful fusion makes the model tell you the truth in a way that doesn’t make you want to cry into your keyboard."
Want a cheat-sheet prompt for testing a re-ranker? Try this
System: You are a relevance scorer. Score how well the document answers the user query.
User: [QUERY]
Document: [DOC]
Assistant: Provide a numeric score 0-100 and a short justification (1-2 lines).
Use that output to debug misrankings: where is your re-ranker overconfident? Underconfident? Fix by adding features or augmenting training data.
Summary: Re-rank to get the right evidence. Fuse to make that evidence readable, accurate, and concise. Wrap them as tools in your planner–executor pipeline, instrument everything, and always have fallbacks. Now go make search-stories that don't lie to people.