Reasoning and Decomposition Techniques
Elicit better thinking with outline-first strategies, hypothesis testing, and verification-first prompting.
Hypothesis Generation — The Detective Work of Prompt Engineering
"You don't need a magic model. You need better guesses."
If Self-Ask and Subquestioning taught you how to interrogate a problem like a polite but relentless lawyer, and Rationale-Lite gave you the economical shorthand for why an answer made sense, then Hypothesis Generation is the moment you become the detective: you generate plausible explanations, rank them, and design small experiments (prompts) to see which one survives interrogation.
This lesson builds on Structuring Outputs and Formats: once you generate hypotheses, you'll want to express them in strict schemas so your model's answers can be parsed, tested, and scored automatically.
Why hypothesis generation matters (and why humans still beat magic)
- Models can spit plausible-sounding answers. Hypotheses force us to consider alternatives instead of accepting the first shiny thing.
- Hypotheses make reasoning testable. Instead of "the model said X", you get "Hypothesis A predicts outcome Y; run the test; measure Z."
- Hypothesis-driven prompts reduce confirmation bias: they make your prompt a little scientific method instead of a wish.
Think of it like debugging code: you don't randomly change lines hoping for the best. You form hypotheses about what might be broken, then run targeted tests. Prompt engineering is the same, but with words.
Types of hypotheses you'll use (quick table)
| Type | What it looks like | When to use |
|---|---|---|
| Causal | 'If prompt lacks context, model hallucinates' | Model gives wrong facts or invents sources |
| Correlational | 'Short prompts tend to return generic answers' | You want to decide prompt length tradeoffs |
| Heuristic | 'Asking for steps reduces missing substeps' | Designing task decomposition prompts |
| Edge-case | 'Dates near DST confuse the model' | Robustness and QA |
A practical workflow: From observation to tested hypothesis
- Observe: gather failing examples or behaviors (low precision, hallucination, missing steps).
- Generate 5 candidate hypotheses (fast, sloppy, creative). Use Rationale-Lite to attach a 1-2 line reason for each.
- Prioritize by plausibility and measurability.
- Design micro-tests (prompts + output schema) to distinguish hypotheses.
- Run tests on batches, parse outputs, score by metrics.
- Iterate: refine hypotheses or decompose them into subhypotheses using Self-Ask.
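The generate-and-prioritize steps above can be sketched in Python. The scoring formula (plausibility times measurability) and the example hypotheses are illustrative assumptions, not a fixed rule from this lesson:

```python
# Candidate hypotheses with rough scores attached (invented for the demo).
hypotheses = [
    {"id": "H1", "text": "Prompt lacks context, so the model hallucinates",
     "plausibility": 0.8, "measurability": 0.9},
    {"id": "H2", "text": "Short prompts tend to return generic answers",
     "plausibility": 0.6, "measurability": 0.7},
    {"id": "H3", "text": "The model is having a bad day",
     "plausibility": 0.2, "measurability": 0.1},
]

def priority(h):
    # Combine how likely the hypothesis is with how cheaply it can be tested.
    return h["plausibility"] * h["measurability"]

# Highest-priority hypotheses come first; test those first.
ranked = sorted(hypotheses, key=priority, reverse=True)
```

Untestable guesses like H3 sink to the bottom on measurability alone, which is exactly the triage you want.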
Example: model keeps inventing sources
- Observation: answers include fake citations.
- Hypotheses:
- H1: The prompt doesn't request source format (causal).
- H2: The model hallucinates when the knowledge cutoff isn't specified (heuristic).
- H3: Asking for 'no made-up sources' is ambiguous and ignored (correlational).
- H4: Short prompts are missing a constraint token (edge-case).
- Tests: design 4 prompts each targeting one hypothesis, keep response schema strict (see below).
Prompt templates for hypothesis generation
Quick generator: "List 5 hypotheses for why the model [observed behavior]. For each, give a 1-sentence rationale and a 1-line test you can run."
Example prompt you can drop into a model to brainstorm hypotheses:
You saw that model X frequently invents references. Generate 5 possible hypotheses explaining this. For each hypothesis include:
- Hypothesis: short sentence
- Rationale (rationale-lite): 1 sentence
- Test Prompt: one short prompt to run that would confirm or disconfirm this hypothesis
Return as a JSON array of objects.
Note: tie this to a JSON schema (below) for easy parsing and scoring.
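Parsing the model's reply might look like this sketch; the raw string and the simplified field names (`Hypothesis`, `Rationale`, `Test Prompt`) are stand-ins for a real API response:

```python
import json

# Hypothetical raw model reply to the brainstorm prompt above;
# in practice this string comes back from your model API call.
raw = '''[
  {"Hypothesis": "The prompt never requests a source format",
   "Rationale": "Without a format, the model fills the gap creatively",
   "Test Prompt": "Explain T. Cite sources as [Author, Year] or say none."}
]'''

hypotheses = json.loads(raw)
for h in hypotheses:
    # Fail loudly if the model dropped a required field.
    assert {"Hypothesis", "Rationale", "Test Prompt"} <= h.keys()
```

Each parsed object is now directly runnable as a micro-test: feed its `Test Prompt` back to the model and check the outcome.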
Output schema: make hypotheses machine-actionable
You already learned to enforce structure. Here’s a minimal schema you can use when asking the model to generate hypotheses:
[{
  "id": "H1",
  "hypothesis": "string",
  "rationale_lite": "string",
  "test_prompt": "string",
  "expected_outcome": "string",
  "priority": "low|medium|high"
}]
Using a schema means you can automatically run the test_prompt, parse the result, and compute whether expected_outcome occurred. This closes the loop from ideation to evaluation.
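A minimal validator for that schema might look like the sketch below; the field names come from the schema above, while the example hypothesis is invented for illustration:

```python
REQUIRED_FIELDS = {"id", "hypothesis", "rationale_lite",
                   "test_prompt", "expected_outcome", "priority"}
PRIORITIES = {"low", "medium", "high"}

def validate_hypothesis(obj: dict) -> list:
    """Return a list of schema violations (an empty list means valid)."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS - obj.keys()]
    if obj.get("priority") not in PRIORITIES:
        errors.append("priority must be low|medium|high")
    return errors

# An invented-but-valid hypothesis object for the demo.
good = {"id": "H1",
        "hypothesis": "Prompt lacks a source-format constraint",
        "rationale_lite": "No format means the model improvises",
        "test_prompt": "Explain T; cite as [Author, Year] or say 'no sources'",
        "expected_outcome": "Zero fabricated citations",
        "priority": "high"}
```

Run the validator before queuing any test: a malformed hypothesis is cheaper to reject at ideation time than after a batch of model calls.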
How to design tests that actually distinguish hypotheses
- Keep tests minimal: change only the variable implicated by the hypothesis.
- Use structured outputs so automated checks are possible. For example, instruct the model to return JSON with fields 'sources' (array) and 'confidence' (0-1).
- Use control prompts: run the same base prompt with and without the hypothesized change.
Example micro-test (pseudo):
Base prompt: Explain topic T and provide up to 3 sources.
Test 1 (H1): Add 'Provide only real sources; if none known, answer "no sources"'.
Compare counts of fabricated sources across runs.
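Scoring that micro-test could be sketched as follows, assuming each parsed run exposes a `sources` array and that fabricated sources can be flagged against a known-source allowlist (both assumptions for the demo):

```python
# Invented allowlist standing in for whatever ground truth you have.
KNOWN_SOURCES = {"Smith 2019", "Jones 2021"}

def fabricated_count(run: dict) -> int:
    # Count sources in one run that are not in the allowlist.
    return sum(1 for s in run["sources"] if s not in KNOWN_SOURCES)

# Fake parsed outputs: control = base prompt, treatment = base + H1's constraint.
control_runs = [{"sources": ["Smith 2019", "Totally Real Journal 2031"]},
                {"sources": ["Imaginary et al. 2099"]}]
treatment_runs = [{"sources": ["Smith 2019"]},
                  {"sources": []}]

control_score = sum(fabricated_count(r) for r in control_runs)
treatment_score = sum(fabricated_count(r) for r in treatment_runs)
```

If `treatment_score` is consistently lower than `control_score` across batches, H1 survives; if the two are indistinguishable, discard H1 and test the next hypothesis.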
Decomposition & Self-Ask: when a hypothesis is too big
If a hypothesis is broad ("the model hallucinated because of prompt ambiguity"), decompose it:
- Use Self-Ask to list subquestions that must be true for the hypothesis to hold.
- Convert subquestions into test prompts.
Example subquestions:
- Did the prompt include an explicit phrase forbidding invention?
- Did the model list any sources with URL patterns?
- Was the question time-bounded (post-cutoff)?
Answer each subquestion with short, structured outputs — Rationale-Lite works excellently here.
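Those subquestions can be answered mechanically with a sketch like this one; the prompt text, answer fields, and regexes are all illustrative assumptions:

```python
import re

def subquestion_answers(prompt: str, answer: dict) -> dict:
    """Turn each Self-Ask subquestion into a yes/no check."""
    return {
        # Did the prompt include an explicit phrase forbidding invention?
        "forbids_invention": "no made-up sources" in prompt.lower(),
        # Did the model list any sources with URL patterns?
        "has_url_sources": any(re.search(r"https?://", s)
                               for s in answer.get("sources", [])),
        # Was the question time-bounded (a post-2020 year appears)?
        "time_bounded": bool(re.search(r"\b20[2-9]\d\b", prompt)),
    }

checks = subquestion_answers(
    "Summarize events of 2026. Use no made-up sources.",
    {"sources": ["https://example.org/report"]},
)
```

Each boolean becomes evidence for or against the parent hypothesis, which is what lets you decompose "prompt ambiguity" into something you can actually score.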
Common pitfalls and how to avoid them
- Confirmation bias: don’t just craft tests that confirm your favorite hypothesis. Design discriminative tests.
- Overgeneration: many hypotheses are useless. Use priority scoring (impact x ease) to triage.
- Vagueness: 'because the model is dumb' is not a hypothesis. Make it testable.
- Schema drift: if the model keeps returning malformed JSON, include schema enforcement and a validator step.
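For the schema-drift pitfall, a validator step with retries might be sketched like this; `call_model` is a stand-in for your actual API call:

```python
import json

def parse_with_retry(call_model, prompt: str, max_tries: int = 3):
    """Re-prompt until the reply parses as JSON, then return the parsed value."""
    for _ in range(max_tries):
        reply = call_model(prompt)
        try:
            return json.loads(reply)
        except json.JSONDecodeError:
            # Tighten the instruction and try again.
            prompt += "\nReturn ONLY valid JSON, no prose."
    raise ValueError("model never produced valid JSON")

# Fake model for the demo: fails once, then complies.
replies = iter(["Sure! Here you go: {oops", '{"sources": []}'])
result = parse_with_retry(lambda p: next(replies), "Explain T as JSON.")
```

Combine this with the schema validator above and malformed outputs stop silently corrupting your pass-rate metrics.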
A tiny pseudocode experiment runner
for hypothesis in hypotheses:
    outputs = [run_model(hypothesis["test_prompt"]) for _ in range(N)]  # run the test prompt N times
    parsed = [parse_with_schema(o) for o in outputs]                    # enforce the output schema
    hits = sum(matches(p, hypothesis["expected_outcome"]) for p in parsed)
    hypothesis["pass_rate"] = hits / N                                  # record the pass rate
ranked = sorted(hypotheses, key=lambda h: h["pass_rate"], reverse=True) # best-supported first
This is your experimental loop. Repeat, refine, and don't be afraid to throw away hypotheses that don't survive.
Closing: Why this matters for prompt engineering
Hypothesis generation turns prompt work from artisanal guesswork into a repeatable method. When combined with Rationale-Lite (quick why notes), Self-Ask (decompose tests), and strict output schemas (structuring outputs and formats), you get a robust pipeline:
- Brainstorm plausible causes
- Attach lightweight rationales
- Design structured, testable prompts
- Run, parse, score, and iterate
Final thought: models will always be probabilistic storytellers. Your job is to be a skeptical editor — propose competing stories, choose the most falsifiable, and let data (and the model's behavior) decide. That’s where progress lives.
Key takeaways
- Generate multiple, testable hypotheses, not just one favored explanation.
- Use Rationale-Lite so each hypothesis carries a compact justification.
- Make tests minimal and outputs structured; automate parsing and scoring.
- Decompose big hypotheses with Self-Ask into concrete subtests.
Go forth like a charmingly cranky detective: make bold guesses, demand proof, and never trust a source without a JSON schema.