Structuring Outputs and Formats
Specify output schemas, enforce structure, and design responses for easy parsing, scoring, and downstream use.
Content
JSON and Schema Enforcement
Versions:
Watch & Learn
AI-discovered learning video
Sign in to watch the learning video for this topic.
JSON and Schema Enforcement — Make Models Behave Like Responsible Data Citizens
"If the model could speak in structured data only, life would be so much easier." — Your inner engineer, finally pleased
You already learned how to make outputs pretty with headings and sections and how to bend models to your will with bullets, lists, and outlines. Now we go from "pretty" to "precise": how to make a model output valid JSON that your app can actually parse without needing a therapist.
This lesson builds on the earlier discussion of example-based steering (zero-, one-, and few-shot). We'll use demonstrations to teach the model how to pack its output into a schema-shaped box, and we'll show strategies for validating and enforcing that schema.
Why care about JSON + schemas?
- Reliability: Your downstream code breaks less when inputs adhere to a predictable structure.
- Safety: You can limit allowed keys and types to reduce hallucinated junk.
- Automation: Clean data lets you wire outputs straight into databases, UIs, or APIs.
Imagine requiring a date but getting "sometime next year". Cute, but unhelpful. Schemas make the model choose the right words (or say "I don’t know").
Anatomy of a minimal schema (informal)
Think of a schema like a top-down shopping list for the model:
- Fields (names): what keys must appear, e.g.,
id,title,tags. - Types: string, integer, boolean, array, object.
- Required: which fields are mandatory.
- Enums/Formats: limit values (e.g., status: "open" | "closed") or formats (e.g.,
date-time).
Tiny example (human reading):
{
"id": "string (uuid)",
"title": "string",
"priority": "integer 1..5",
"tags": "array of strings",
"created_at": "ISO 8601 date-time"
}
You can formalize that with JSON Schema when you need machine-checkable rules.
Prompt patterns to enforce JSON
Here are practical prompting strategies. We assume you understand headings/lists and occasional example-driven steering from the previous topics.
- Explicit contract (zero-shot) — tell the model exactly what to output:
Output ONLY valid JSON that matches this schema. Do not add any extra text or explanation.
Schema:
- id (string, uuid)
- title (string)
- priority (integer, 1-5)
- tags (array of strings)
- created_at (string, ISO 8601)
If you cannot provide a field, set it to null and include "error" with a short message.
Useful when you want the model to try without examples.
- One-shot — show a single exemplar JSON and then ask for new data in the same shape:
Example:
{"id":"123e4567-e89b-12d3-a456-426614174000","title":"Fix bug","priority":3,"tags":["backend","urgent"],"created_at":"2025-03-09T12:00:00Z"}
Now produce JSON for: "Write a new task: Document API".
Works well when the model needs a nudge for formatting, quoting, or ordering.
- Few-shot — provide multiple, varied examples to reduce odd edge behaviors:
- Example 1: all fields present
- Example 2: some nulls
- Example 3: enum variation
Then ask for the new JSON. Few-shot helps with corner cases and teaches the model how to express errors, missing data, or optional fields.
Demonstrations: what goes wrong and how to fix it
Problem 1 — Extra commentary
Model output: { "id": "..." } Here's more info...
Fix: "Return only JSON. If you have anything else to say, put it in the error field or output a separate JSON object meta: { message: ... }."
Problem 2 — Wrong types
Model outputs numbers as strings or dates in natural language.
Fix: Give strict format examples and show one or two cases where a wrong type is corrected. Use an explicit rule: "If you cannot format a field, set it to null and populate error." Few-shot works especially well here.
Problem 3 — Hallucinated keys
Model invents priority_rating when you asked for priority.
Fix: Provide a whitelist: "Allowed keys: id, title, priority, tags, created_at. Any other key will be removed. If you think another field is needed, map it into meta." Also validate server-side.
Validation loop — catch and correct
You should always validate model output with a schema validator. Example pseudocode (Python-ish):
from jsonschema import validate, ValidationError
schema = {...} # your JSON Schema
try:
validate(instance=model_output, schema=schema)
except ValidationError as e:
# Option A: ask model to reformat (round-trip)
# Option B: return an error to the user
Option A (auto-repair): send the invalid output back with a prompt like:
The JSON you returned failed validation: <error summary>. Please return a corrected JSON that conforms exactly to the schema. Output only JSON.
This is powerful, but beware of loops — always add a max retry limit.
Table: Strategies at a glance
| Strategy | When to use | Pros | Cons |
|---|---|---|---|
| Zero-shot with strict instructions | Simple outputs, many calls | Fast, low prompt size | Can be inconsistent on edge cases |
| One-shot exemplar | One-off formats or quoting | Better formatting fidelity | Example might bias content |
| Few-shot varied examples | Complex formats, error handling | Best reliability | Larger prompt, more prep |
| Post-validate + repair loop | Mission-critical systems | Highest correctness | Extra latency, complexity |
Practical checklist before you trust model JSON
- Did you include a clear, machine-checkable schema?
- Did you provide at least one example when types/formatting are finicky?
- Did you tell the model to return only JSON (or use delimiters)?
- Do you validate server-side and have a repair or rejection flow?
- Is there an
errorfield pattern for graceful degradation?
Closing — TL;DR (with attitude)
- Make the contract explicit. Tell the model the exact keys, types, and formats.
- Show examples when the format is picky — few-shot is your friend for tricky fields.
- Validate always. Never trust the model as the single source of truth.
- Fall back gracefully. If the model can’t supply valid data, have it return a controlled error object.
Final thought: teaching a model to obey a schema is like training a toddler to put toys back in labeled bins. It takes repetition, examples, and occasional stern reminders — but once they're trained, you can throw them at your production pipeline with much less fear.
Go forth and make clean JSON. Your parsers will weep tears of joy.
Comments (0)
Please sign in to leave a comment.
No comments yet. Be the first to comment!