Generative AI: Prompt Engineering Basics
Chapters

  1. Foundations of Generative AI
  2. LLM Behavior and Capabilities
  3. Core Principles of Prompt Engineering
  4. Writing Clear, Actionable Instructions
  5. Roles, Personas, and System Prompts
  6. Supplying Context and Grounding
  7. Examples: Zero-, One-, and Few-Shot
  8. Structuring Outputs and Formats
  9. Reasoning and Decomposition Techniques
    • Outline-Then-Detail Pattern
    • Scratchpad and Notes Fields
    • Rationale-Lite Approaches
    • Self-Ask and Subquestioning
    • Hypothesis Generation
    • Back-Solving Strategies
    • Plan-Then-Execute Split
    • Compare-and-Contrast Prompts
    • Constraint Propagation
    • Uncertainty and Confidence Cues
    • Verification Steps First
    • Sanity Checks and Estimation
    • Socratic Questioning Prompts
    • Eliminating Irrelevant Paths
    • Chain-of-Thought Considerations
  10. Iteration, Testing, and Prompt Debugging
  11. Evaluation, Metrics, and Quality Control
  12. Safety, Ethics, and Risk Mitigation
  13. Tools, Functions, and Agentic Workflows
  14. Retrieval-Augmented Generation (RAG)
  15. Multimodal and Advanced Prompt Patterns

Reasoning and Decomposition Techniques

Elicit better thinking with outline-first strategies, hypothesis testing, and verification-first prompting.

Hypothesis Generation


Hypothesis Generation — The Detective Work of Prompt Engineering

"You don't need a magic model. You need better guesses."

If Self-Ask and Subquestioning taught you how to interrogate a problem like a polite but relentless lawyer, and Rationale-Lite gave you the economical shorthand for why an answer made sense, then Hypothesis Generation is the moment you become the detective: you generate plausible explanations, rank them, and design small experiments (prompts) to see which one survives interrogation.

This lesson builds on Structuring Outputs and Formats: once you generate hypotheses, you'll want to express them in strict schemas so your model's answers can be parsed, tested, and scored automatically.


Why hypothesis generation matters (and why humans still beat magic)

  • Models can spit out plausible-sounding answers. Hypotheses force us to consider alternatives instead of accepting the first shiny thing.
  • Hypotheses make reasoning testable. Instead of "the model said X", you get "Hypothesis A predicts outcome Y; run the test; measure Z."
  • Hypothesis-driven prompts reduce confirmation bias: they make your prompt a little scientific method instead of a wish.

Think of it like debugging code: you don't randomly change lines hoping for the best. You form hypotheses about what might be broken, then run targeted tests. Prompt engineering is the same, but with words.


Types of hypotheses you'll use (quick table)

Type          | What it looks like                              | When to use
Causal        | "If prompt lacks context, model hallucinates"   | Model gives wrong facts or invents sources
Correlational | "Short prompts tend to return generic answers"  | You want to decide prompt length tradeoffs
Heuristic     | "Asking for steps reduces missing substeps"     | Designing task decomposition prompts
Edge-case     | "Dates near DST confuse the model"              | Robustness and QA

A practical workflow: From observation to tested hypothesis

  1. Observe
    • Gather failing examples or behaviors (low precision, hallucination, missing steps).
  2. Generate 5 candidate hypotheses (fast, sloppy, creative). Use Rationale-Lite to attach a 1-2 line reason for each.
  3. Prioritize by plausibility and measurability.
  4. Design micro-tests (prompts + output schema) to distinguish hypotheses.
  5. Run tests on batches of examples, parse the outputs, and score them against your metrics.
  6. Iterate: refine hypotheses or decompose them into subhypotheses using Self-Ask.

Example: model keeps inventing sources

  1. Observation: answers include fake citations.
  2. Hypotheses:
    • H1: The prompt doesn't request source format (causal).
    • H2: The model hallucinates when the knowledge cutoff isn't specified (heuristic).
    • H3: Asking for 'no made-up sources' is ambiguous and ignored (correlational).
    • H4: Short prompts are missing a constraint token (edge-case).
  3. Tests: design four prompts, each targeting one hypothesis, and keep the response schema strict (see below and the sketch that follows).
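
To make this concrete, here is a minimal sketch of those hypotheses as structured test cases (Python). The field names mirror the schema introduced later in this lesson; the prompt wording, expected outcomes, and priorities are illustrative assumptions, not fixed recipes.

hypotheses = [
    {"id": "H1",
     "hypothesis": "The prompt doesn't request a source format",
     "rationale_lite": "With no format to fill, the model free-associates citations.",
     "test_prompt": "Explain topic T. Cite sources as 'Title (Year)'; if unsure, write 'no sources'.",
     "expected_outcome": "fewer fabricated citations",
     "priority": "high"},
    {"id": "H2",
     "hypothesis": "Hallucination rises when the knowledge cutoff isn't stated",
     "rationale_lite": "Unbounded recency pressure invites invented references.",
     "test_prompt": "Explain topic T. Only cite works you are confident predate your training cutoff.",
     "expected_outcome": "fewer fabricated citations",
     "priority": "medium"},
    # H3 and H4 follow the same shape: one targets the ambiguous 'no made-up sources'
    # wording, the other adds an explicit constraint to an otherwise short prompt.
]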

Prompt templates for hypothesis generation

  • Quick generator: "List 5 hypotheses for why the model [observed behavior]. For each, give a 1-sentence rationale and a 1-line test you can run."

  • Example prompt you can drop into a model to brainstorm hypotheses:

You saw that model X frequently invents references. Generate 5 possible hypotheses explaining this. For each hypothesis include:
  - Hypothesis: short sentence
  - Rationale (rationale-lite): 1 sentence
  - Test Prompt: one short prompt to run that would confirm or disconfirm this hypothesis
Return as a JSON array of objects.

Note: tie this to a JSON schema (below) for easy parsing and scoring.
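
If you want to run the brainstorm step programmatically, a minimal sketch looks like this (Python). Here call_model(prompt) -> str is a hypothetical adapter for whichever LLM client you use, not a specific library call.

import json

def brainstorm_hypotheses(observed_behavior, call_model):
    # call_model(prompt) -> str is a hypothetical adapter for your LLM client of choice.
    prompt = (
        f"You saw that the model frequently {observed_behavior}. "
        "Generate 5 possible hypotheses explaining this. For each include: "
        "'hypothesis' (short sentence), 'rationale_lite' (1 sentence), and "
        "'test_prompt' (one short prompt that would confirm or disconfirm it). "
        "Return a JSON array of objects with exactly those keys."
    )
    raw = call_model(prompt)
    try:
        return json.loads(raw)   # expect a JSON array of hypothesis objects
    except json.JSONDecodeError:
        return []                # malformed output: treat as a failed run and re-prompt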


Output schema: make hypotheses machine-actionable

You already learned to enforce structure. Here’s a minimal schema you can use when asking the model to generate hypotheses:

[
  {
    "id": "H1",
    "hypothesis": "string",
    "rationale_lite": "string",
    "test_prompt": "string",
    "expected_outcome": "string",
    "priority": "low|medium|high"
  }
]

Using a schema means you can automatically run the test_prompt, parse the result, and compute whether expected_outcome occurred. This closes the loop from ideation to evaluation.
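
A validator step can be as small as the sketch below (plain Python, no schema library): it checks for the required fields and a legal priority value before any test is run. The field names follow the schema above.

REQUIRED_FIELDS = {"id", "hypothesis", "rationale_lite", "test_prompt", "expected_outcome", "priority"}
ALLOWED_PRIORITIES = {"low", "medium", "high"}

def validate_hypothesis(obj):
    # Return a list of problems; an empty list means the object conforms to the schema above.
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS - obj.keys()]
    if obj.get("priority") not in ALLOWED_PRIORITIES:
        problems.append(f"bad priority: {obj.get('priority')!r}")
    return problems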


How to design tests that actually distinguish hypotheses

  • Keep tests minimal: change only the variable implicated by the hypothesis.
  • Use structured outputs so automated checks are possible. For example, instruct the model to return JSON with the fields 'sources' (an array) and 'confidence' (a number from 0 to 1).
  • Use control prompts: run the same base prompt with and without the hypothesized change.

Example micro-test (pseudo):

Base prompt: Explain topic T and provide up to 3 sources.
Test 1 (H1): Add 'Provide only real sources; if none known, answer "no sources"'.
Compare counts of fabricated sources across runs.
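
In code, that micro-test reduces to running the control and the variant side by side and counting fabricated sources in each. The sketch below assumes the structured 'sources'/'confidence' output described above; call_model and count_fabricated (for example, checking sources against a reference list you trust) are hypothetical helpers.

import json

def compare_fabrication(base_prompt, variant_prompt, call_model, count_fabricated, n=20):
    # Run the control and the variant n times each. count_fabricated(sources) is a
    # hypothetical checker, e.g. "how many of these sources are missing from a list I trust".
    def total_fabricated(prompt):
        total = 0
        for _ in range(n):
            reply = json.loads(call_model(prompt))   # expects {"sources": [...], "confidence": 0.x}
            total += count_fabricated(reply.get("sources", []))
        return total
    return {"control": total_fabricated(base_prompt), "variant": total_fabricated(variant_prompt)}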

Decomposition & Self-Ask: when a hypothesis is too big

If a hypothesis is broad ("the model hallucinated because of prompt ambiguity"), decompose it:

  • Use Self-Ask to list subquestions that must be true for the hypothesis to hold.
  • Convert subquestions into test prompts.

Example subquestions:

  • Did the prompt include an explicit phrase forbidding invention?
  • Did the model list any sources with URL patterns?
  • Was the question time-bounded (post-cutoff)?

Answer each subquestion with short, structured outputs — Rationale-Lite works well here.
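
Here is a minimal sketch of turning those subquestions into a single structured check (Python). The question wording mirrors the list above, and the yes/no/unclear answer format is an illustrative convention.

SUBQUESTIONS = [
    "Did the prompt include an explicit phrase forbidding invented sources?",
    "Did the model list any sources with URL patterns?",
    "Was the question time-bounded past the knowledge cutoff?",
]

def self_ask_prompt(transcript, questions=SUBQUESTIONS):
    # Ask for one short, structured verdict per subquestion (Rationale-Lite style).
    numbered = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(questions))
    return (
        "For the prompt/response transcript below, answer each question as "
        '{"answer": "yes|no|unclear", "rationale_lite": "one sentence"}. '
        "Return a JSON array with one object per question.\n\n"
        f"Questions:\n{numbered}\n\nTranscript:\n{transcript}"
    )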


Common pitfalls and how to avoid them

  • Confirmation bias: don’t just craft tests that confirm your favorite hypothesis. Design discriminative tests.
  • Overgeneration: many hypotheses are useless. Use priority scoring (impact x ease) to triage; see the sketch after this list.
  • Vagueness: 'because the model is dumb' is not a hypothesis. Make it testable.
  • Schema drift: if the model keeps returning malformed JSON, include schema enforcement and a validator step.
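
For the triage point above, the scoring can stay deliberately crude. A toy sketch, assuming impact and ease are each hand-scored on a 1-to-5 scale (an illustrative convention, not a rule):

def triage(hypotheses, scores, keep=3):
    # scores maps hypothesis id -> (impact, ease), each hand-scored 1-5 (illustrative scale).
    ranked = sorted(hypotheses, key=lambda h: scores[h["id"]][0] * scores[h["id"]][1], reverse=True)
    return ranked[:keep]   # keep the top few; park the rest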

A tiny experiment-runner sketch

def run_experiments(hypotheses, call_model, parse_output, matches_expected, n=10):
    # call_model, parse_output, and matches_expected are your own adapters (see earlier sketches).
    results = []
    for h in hypotheses:
        outputs = [parse_output(call_model(h["test_prompt"])) for _ in range(n)]
        passes = sum(matches_expected(out, h["expected_outcome"]) for out in outputs)
        results.append({"id": h["id"], "pass_rate": passes / n})
    # Rank hypotheses by how often their predicted outcome actually occurred.
    return sorted(results, key=lambda r: r["pass_rate"], reverse=True)

This is your experimental loop. Repeat, refine, and don't be afraid to throw away hypotheses that don't survive.


Closing: Why this matters for prompt engineering

Hypothesis generation turns prompt work from artisanal guesswork into a repeatable method. When combined with Rationale-Lite (quick why notes), Self-Ask (decompose tests), and strict output schemas (structuring outputs and formats), you get a robust pipeline:

  • Brainstorm plausible causes
  • Attach lightweight rationales
  • Design structured, testable prompts
  • Run, parse, score, and iterate

Final thought: models will always be probabilistic storytellers. Your job is to be a skeptical editor — propose competing stories, choose the most falsifiable, and let data (and the model's behavior) decide. That’s where progress lives.


Key takeaways

  • Generate multiple, testable hypotheses, not just one favored explanation.
  • Use Rationale-Lite so each hypothesis carries a compact justification.
  • Make tests minimal and outputs structured; automate parsing and scoring.
  • Decompose big hypotheses with Self-Ask into concrete subtests.

Go forth like a charmingly cranky detective: make bold guesses, demand proof, and never trust a source without a JSON schema.
