Structuring Outputs and Formats
Specify output schemas, enforce structure, and design responses for easy parsing, scoring, and downstream use.
Tags, Markers, and Delimiters — The Secret Sauce for Reliable LLM Outputs
"If the model is a jazz musician, tags and delimiters are the sheet music. Without them you still might get a tune — but also a lot of noise."
You already learned about using Markdown and code blocks for readable responses and about producing XML, CSV, and TSV for structured data. You also practiced zero/one/few-shot prompting to nudge model behavior. Now we zoom in on the nuts and bolts that make structured outputs machine-parseable and robust: tags, markers, and delimiters.
Why they matter (and why you should care)
- Models are probabilistic storytellers. Without firm boundaries, they improvise. Sometimes that improvisation is art; usually it is not what your downstream parser expects.
- Tags and delimiters give explicit instructions to both the model and any automated consumer of its output. They reduce ambiguity, prevent bleed from instructions, and make parsing deterministic-ish.
Think of this like building a sandwich. Markdown saved you from soggy bread. CSV/XML taught you how to slice ingredients. Tags and delimiters are the toothpicks that stop the sandwich from collapsing in transit.
The vocabulary
- Tag: A named label that encloses content, e.g., <SUMMARY>...</SUMMARY> or [SUMMARY]...[/SUMMARY]. Great for hierarchical or typed content.
- Marker: A special token or phrase used to mark boundaries, e.g., START / END or ===SECTION===. Usually single-use and distinctive.
- Delimiter: A character or sequence used to separate items, e.g., comma, pipe (|), or triple backticks for code blocks.
Patterns and templates that work
1) Explicit start/end markers (my personal favorite)
Use unique, unlikely tokens to wrap the desired output region. This shrugs off model verbosity.
Please output ONLY the section between <START> and <END>.
<START>
{ "summary": "Short summary here", "score": 0.87 }
<END>
Why this is good:
- If the model adds explanations, your parser can ignore everything outside <START>/<END>.
- Works well when models invent additional commentary.
Tip: choose tokens unlikely to appear in natural text, e.g., doubled brackets like <<BEGIN_JSON>> rather than a bare START.
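The start/end-marker pattern above is only as good as the extraction step behind it. Here is a minimal sketch of that step; the function name `extract_between` and the sample reply are illustrative, not part of any library.

```python
import re


def extract_between(text, start="<START>", end="<END>"):
    """Return the content between the first start/end marker pair, or None."""
    # re.escape protects marker characters like < and > from regex meaning;
    # re.DOTALL lets .*? span multiple lines; the ? keeps the match non-greedy.
    match = re.search(re.escape(start) + r"(.*?)" + re.escape(end), text, re.DOTALL)
    return match.group(1).strip() if match else None


reply = (
    "Sure! Here is the result:\n"
    "<START>\n"
    '{ "summary": "Short summary here", "score": 0.87 }\n'
    "<END>\n"
    "Hope that helps!"
)
print(extract_between(reply))  # → { "summary": "Short summary here", "score": 0.87 }
```

Everything the model adds before or after the markers is simply ignored, which is exactly the "shrug off verbosity" property described above.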
2) Named tags for typed data
When you need multiple fields or nested structures, use XML-like tags or JSON blocks with markers.
<RESULT>
<TITLE>...</TITLE>
<KEYWORDS>k1,k2,k3</KEYWORDS>
<BODY>
...
</BODY>
</RESULT>
Or combine marker + JSON to enforce machine-friendly schema:
<<BEGIN_JSON>>
{
"title": "...",
"tags": ["ai","prompting"]
}
<<END_JSON>>
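The marker + JSON combination pays off because the payload can be handed straight to a JSON parser. A minimal sketch of the consuming side (the helper `extract_json` is an illustrative name, not a library function):

```python
import json
import re


def extract_json(text):
    """Pull the payload out of <<BEGIN_JSON>> ... <<END_JSON>> and parse it."""
    match = re.search(r"<<BEGIN_JSON>>(.*?)<<END_JSON>>", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON block found in model output")
    # json.loads raises on malformed JSON, so format errors surface immediately
    return json.loads(match.group(1))


reply = (
    "Here you go:\n"
    "<<BEGIN_JSON>>\n"
    '{"title": "Delimiters 101", "tags": ["ai", "prompting"]}\n'
    "<<END_JSON>>"
)
data = extract_json(reply)
print(data["tags"])  # → ['ai', 'prompting']
```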
3) Delimiters inside delimited outputs
When your field content might include the delimiter itself, pick a higher-level delimiter. Example: you want to output CSV rows where fields may contain commas: wrap the CSV in triple backticks or markers.
<CSV_START>
"name","note"
"Alice","Loves, commas, and chaos"
"Bob","No commas"
<CSV_END>
This strategy played nicely with earlier lessons on CSV/TSV formatting.
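On the consuming side, the marker pair isolates the CSV region and a proper CSV parser handles the quoted commas. A sketch using Python's standard `csv` module (the function name `extract_csv_rows` is illustrative):

```python
import csv
import io
import re


def extract_csv_rows(text):
    """Parse quoted CSV found between <CSV_START> and <CSV_END> markers."""
    match = re.search(r"<CSV_START>(.*?)<CSV_END>", text, re.DOTALL)
    if match is None:
        return []
    # csv.reader honours the double quotes, so embedded commas survive intact
    return list(csv.reader(io.StringIO(match.group(1).strip())))


reply = (
    "<CSV_START>\n"
    '"name","note"\n'
    '"Alice","Loves, commas, and chaos"\n'
    '"Bob","No commas"\n'
    "<CSV_END>"
)
rows = extract_csv_rows(reply)
print(rows[1])  # → ['Alice', 'Loves, commas, and chaos']
```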
Dealing with ambiguity: escaping and nested delimiters
- If your field may contain the same token you use as a delimiter, either use escaping (e.g., backslash or doubling quotes) or switch to a safer wrapper (e.g., JSON or a unique tag).
- For deeply nested content (code inside text inside XML), prefer JSON or base64-encoding the inner blob so it cannot break the outer schema.
Example: wrapping code in JSON
<<BEGIN_JSON>>
{
"filename":"script.py",
"content":"def f():\n print(\"Hello\")"
}
<<END_JSON>>
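For the deeply nested case, base64 is the blunt but reliable option: the inner blob becomes an opaque string that cannot contain quotes, newlines, or markers. A round-trip sketch with Python's standard library:

```python
import base64
import json

code = 'def f():\n    print("Hello")'

# Encode the inner blob so its quotes and newlines cannot break the outer JSON
payload = json.dumps({
    "filename": "script.py",
    "content_b64": base64.b64encode(code.encode("utf-8")).decode("ascii"),
})

# Downstream, decode it back untouched
recovered = base64.b64decode(json.loads(payload)["content_b64"]).decode("utf-8")
assert recovered == code
```

The cost is that the payload is no longer human-readable, so reserve this for content that genuinely fights the outer schema.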
Practical templates and pseudocode for prompt designers
- Minimal, rigid extractor-friendly template
Output the JSON between <START> and <END> only.
<START>
{ "summary": "...", "score": 0.0 }
<END>
- Human + machine-friendly hybrid
Provide a short explanation, then the machine block.
Explanation:
- One-sentence human summary
<<MACHINE>>
{"summary":"...","items":[... ]}
<</MACHINE>>
- Few-shot patterning with markers
If you few-shot, show examples using the same markers and ordering you want. Order matters: the model will mimic the sequence and style.
Example 1:
<OUT>
{ "label":"spam" }
</OUT>
Example 2:
<OUT>
{ "label":"ham" }
</OUT>
Now your task:
<OUT>
This leverages earlier lessons on few-shot examples. Keep exemplars consistent and high-quality; trashy exemplars produce trashy outputs.
When to skip tags and go freeform
- If you want creative prose, do not force rigid delimiters. You will lose naturalness.
- If downstream is forgiving or uses semantic parsing (embedding similarity), you might avoid strict tags.
But for any automation, logging, or downstream validation, always go rigid.
Common pitfalls and how to fix them
- Model omits tags: Make tags extremely explicit. Add: "If you cannot produce valid JSON, output ERROR within <ERROR> tags." That gives you a detectable fallback.
- Model scribbles outside markers: Only parse content inside markers. If a marker is missing, flag the generation as failed.
- Delimiter collisions: If you see repeated collisions, choose rarer Unicode characters like ⧉ or use base64 for blobs.
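The three fixes above can be folded into one defensive parser: check for the fallback tag, then for the markers, then for valid JSON, and report a distinct status for each failure mode. A sketch (the function `parse_or_flag` and its status strings are illustrative choices):

```python
import json
import re


def parse_or_flag(text):
    """Return (status, payload): 'ok' with parsed JSON, 'error' when the model
    signalled failure via <ERROR> tags, 'failed' when markers or JSON are bad."""
    if re.search(r"<ERROR>.*?</ERROR>", text, re.DOTALL):
        return "error", None                    # model's own detectable fallback
    match = re.search(r"<START>(.*?)<END>", text, re.DOTALL)
    if match is None:
        return "failed", None                   # marker missing: flag the generation
    try:
        return "ok", json.loads(match.group(1))
    except json.JSONDecodeError:
        return "failed", None                   # markers present but JSON malformed


print(parse_or_flag('<START>{"score": 1}<END>'))         # → ('ok', {'score': 1})
print(parse_or_flag("<ERROR>cannot comply</ERROR>"))     # → ('error', None)
print(parse_or_flag("I refuse to answer."))              # → ('failed', None)
```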
Quick checklist before deploying a prompt
- Use a unique top-level marker for the entire machine-readable region.
- Use typed tags for multiple fields.
- Protect nested content with escaping or encoding.
- Provide 0-3 exemplar outputs using the same markers if you need behavioral nudging.
- Add a graceful fallback tag like <ERROR> for when format validation fails.
Tiny case study: form intake bot
Problem: user messages with messy addresses, and you must output CSV rows.
Bad prompt: "Extract address info as CSV."
- Model may add commentary, include headers, or forget quoting.
Better prompt:
Only output CSV between <CSV> and </CSV>.
<CSV>
"name","street","city"
</CSV>
Now parse: "John Doe, 42 Baker St, Springfield"
Even better with few-shot: show two prior conversions using the exact markers and quoting rules. That reduces order effects and clarifies quoting expectations.
Final pep talk and power move
Tags, markers, and delimiters are not boring plumbing. They are the difference between a reliable pipeline and a mysterious, time-consuming bug that you blame on the model. The more you standardize your boundaries, the less your model will improvise, and the happier your downstream systems (and your future self) will be.
Key takeaways:
- Be explicit: mark the start and end of machine output.
- Be consistent: exemplars should use the exact format you want.
- Be defensive: add fallbacks and encodings for messy inner text.
Final commandment: do not trust unstated formats. Teach the model the fenceposts, then expect it to stay in the yard.