LLM Behavior and Capabilities
Understand alignment, sensitivity to phrasing, non-determinism, and other behavioral properties that your prompts must account for.
Sensitivity to Wording and Order
Sensitivity to Wording and Order — Why a Few Words Can Make an LLM Do a Weird Dance
"You asked for a poem about a carrot, and it wrote a haiku about betrayal. Welcome to prompt engineering."
We already talked about how modern LLMs predict tokens left-to-right using probabilities, and how RLHF and preference optimization nudge models toward humanlike behavior and safer outputs. Now let’s lean into the part your professor demonstrated with a grave tone: the model is exquisitely, embarrassingly sensitive to wording and order. That means the difference between "Summarize this in one sentence" and "In one sentence, summarize this" can be literally the difference between elegance and nonsense. Let's unpack why — and how to exploit this sensitivity like a polite puppet master.
Quick refresher (building on what you already learned)
- From Foundations: models generate text token-by-token and their outputs are shaped by token probabilities and context windows.
- From Instruction Following & Alignment: models are trained to follow instructions; RLHF pushes outputs toward human preferences, but it doesn't make the model immune to prompt phrasing.
So: alignment helps steer the ship, but wording and ordering are the waves.
The mechanics: why wording and order matter (short, sharp, nerdy)
Conditional probability is king. The model picks the next token based on what came before. Change the words preceding a target point and you change the distribution of what comes next. Small tweak → different probability landscape → different output.
Tokenization quirks. Special characters, punctuation, and casing change where token boundaries fall. That shifts token probabilities, and suddenly "color" becomes "col or" in the model's internal view.
Recency and attention. Information presented later in the prompt often influences the model more (recency bias). In few-shot setups, the order of examples matters.
Role and system messages matter. In chat APIs, a System message that says "You are an expert" will have stronger, consistent influence than embedding that instruction after examples.
RLHF is a soft constraint. Preference optimization nudges outputs toward human-like answers but doesn’t erase the raw statistical tendencies learned during pretraining. So a slight wording change can still flip an answer.
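The "conditional probability is king" point can be seen in miniature with a toy trigram model (a drastically simplified stand-in for a real LLM, built here from a four-sentence fake corpus): change one word upstream and the distribution over what comes next changes too.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for training data (purely illustrative).
corpus = [
    "summarize the passage briefly",
    "summarize the passage concisely",
    "summarize this passage at length",
    "summarize this passage at length",
]

# Count next-word frequencies conditioned on the two preceding words.
ctx_counts = defaultdict(Counter)
for sentence in corpus:
    w = sentence.split()
    for a, b, c in zip(w, w[1:], w[2:]):
        ctx_counts[(a, b)][c] += 1

def next_distribution(context):
    """P(next word | two-word context), estimated from raw counts."""
    counts = ctx_counts[context]
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

# One changed word ("the" vs. "this") earlier in the context yields a
# completely different probability landscape after "passage":
print(next_distribution(("the", "passage")))   # {'briefly': 0.5, 'concisely': 0.5}
print(next_distribution(("this", "passage")))  # {'at': 1.0}
```

A real model conditions on thousands of tokens with attention instead of two words with counts, but the principle is identical: the words preceding a position define the distribution at that position.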
Real-world examples — tiny changes, big differences
Imagine you want a concise summary of a paragraph. Try these three prompts (they're nearly identical):
Prompt A: "Summarize the passage in one sentence."
Prompt B: "In one sentence, summarize the passage."
Prompt C: "Please provide a one-sentence summary of the passage."
You might get a tight, single-sentence summary from A and B, but C could return a bulleted mini-outline (because the polite "please" + phrasing nudges a different pattern from training data). Weird? Yes. Predictable? Absolutely if you test it.
Table: small prompt differences and likely effects
| Prompt variant | Typical nudged behavior |
|---|---|
| Instruction-first ("Do X.") | More direct compliance; concise answers |
| Embedded instruction ("Please do X if...") | More verbose, polite patterns appear |
| Examples last in few-shot | Strong recency influence; recent example style copied |
Few-shot ordering: why the sequence of examples matters
- Put exemplar A then B then C: the model treats the last examples as freshest signals.
- If your examples escalate in complexity, the model might mimic that escalation.
Pro tip: If you want the model to emulate a style, put the cleanest, most representative example last.
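The pro tip above can be baked into a small helper. This is a sketch with hypothetical names (`build_few_shot_prompt`, the example pairs) that assembles a few-shot prompt so the cleanest exemplar sits last, closest to the task, where recency bias works in your favor:

```python
def build_few_shot_prompt(examples, task, best_last=True):
    """Assemble a few-shot prompt from (input, output) pairs.

    `examples` should be ordered from least to most representative; with
    best_last=True the strongest exemplar lands right before the task,
    where its style exerts the most influence on the model.
    """
    ordered = examples if best_last else list(reversed(examples))
    blocks = [f"Input: {inp}\nOutput: {out}" for inp, out in ordered]
    blocks.append(f"Input: {task}\nOutput:")  # leave the final slot open
    return "\n\n".join(blocks)

examples = [
    ("a long rambling paragraph", "A rough one-liner."),
    ("a tidy paragraph", "A clean, single-sentence summary."),  # best exemplar
]
print(build_few_shot_prompt(examples, "the passage to summarize"))
```

Flipping `best_last` lets you A/B the ordering claim directly: same examples, same task, different sequence.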
Wording traps — common ways users accidentally sabotage prompts
- Ambiguity: "List things to avoid" vs. "List things to avoid when X" — missing context → hallucinations.
- Passive vs active phrasing: "Explain how X was done" vs "Explain how to do X" — different answer frames (history vs steps).
- Negation: Double negatives confuse both humans and models. "Don’t not include" → chaos.
- Politeness bias: Words like "please" or hedging phrases can tilt outputs to be more verbose or uncertain.
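Several of these traps are mechanical enough to catch before you ever hit the API. Here's a rough heuristic linter (the patterns are illustrative, not exhaustive, and `lint_prompt` is a made-up name, not a real library):

```python
import re

# Rough regex heuristics for the wording traps above -- tune for your domain.
TRAPS = {
    "double negative": re.compile(r"\b(don'?t|do not|never)\s+not\b", re.I),
    "politeness/hedging": re.compile(r"\b(please|maybe|if possible|kind of)\b", re.I),
    "vague reference": re.compile(r"\b(things|stuff|etc\.?)\b", re.I),
}

def lint_prompt(prompt):
    """Return the names of wording traps detected in `prompt`."""
    return [name for name, pat in TRAPS.items() if pat.search(prompt)]

print(lint_prompt("Please don't not include things to avoid."))
# ['double negative', 'politeness/hedging', 'vague reference']
print(lint_prompt("List the risks of deploying X without tests."))
# []
```

A check like this won't catch ambiguity (that needs a human or another model), but it cheaply flags the self-inflicted wounds.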
A few short experiments (copy-paste and play)
Try these in any chat model; notice the differences.
Version 1 (instruction-heavy):
System: "You are a concise assistant. Provide exactly one sentence."
User: "Summarize the following text in one sentence: [TEXT]"
Version 2 (example-heavy):
User: "Example: 'X' -> 'One-sentence summary'
Example: 'Y' -> 'One-sentence summary'
Now do this: [TEXT]"
Version 3 (question-first):
User: "In one sentence, what is [TEXT]?"
Which version gives you the cleanest one-sentence summary? Probably Version 1 or 3, depending on system messaging.
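To run these experiments against a chat API, each version becomes a list of role/content messages. The field names below follow the widely used OpenAI-style convention; adjust for your provider, and `[TEXT]` stays a placeholder for your own passage:

```python
TEXT = "[TEXT]"  # placeholder -- substitute the passage you want summarized

# Version 1: instruction-heavy, with a system message carrying the constraint.
version_1 = [
    {"role": "system", "content": "You are a concise assistant. Provide exactly one sentence."},
    {"role": "user", "content": f"Summarize the following text in one sentence: {TEXT}"},
]

# Version 2: example-heavy, everything packed into a single user turn.
version_2 = [
    {"role": "user", "content": (
        "Example: 'X' -> 'One-sentence summary'\n"
        "Example: 'Y' -> 'One-sentence summary'\n"
        f"Now do this: {TEXT}"
    )},
]

# Version 3: question-first, no examples, no system message.
version_3 = [
    {"role": "user", "content": f"In one sentence, what is {TEXT}?"},
]

for name, messages in [("v1", version_1), ("v2", version_2), ("v3", version_3)]:
    print(name, [m["role"] for m in messages])
```

Keeping the three versions as data like this makes it trivial to send all of them to the same model and diff the outputs.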
Practical prompt-engineering rules born from chaos
- Be explicit and consistent. If you need a 3-bullet list, say exactly that and show a template.
- Order for influence: Put system/role instructions first. Put critical constraints (format, length, safety) early.
- Example ordering matters: Put the exemplar whose style you want most last in few-shot sequences.
- Use separators: --- or ### to segment prompt parts. The model treats these like structural cues.
- Test systematically: change one word at a time and record outputs. Maintain a prompt-variant log.
- Control randomness: temperature and decoding settings interact with wording sensitivity. Lower temperature = less variance.
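The "change one word at a time and keep a log" rule is a tiny harness in practice. Below, `fake_model` is a deliberate stand-in (swap in your real API client); the log records what changed between variants alongside each output:

```python
import difflib

def fake_model(prompt):
    """Stand-in for a real LLM call -- replace with your API client."""
    return "One sentence." if prompt.startswith("Summarize") else "Here are some thoughts..."

base = "Summarize the passage in one sentence."
variants = [
    "Summarize the passage in one sentence.",
    "Please summarize the passage in one sentence.",
    "In one sentence, summarize the passage.",
]

log = []
for v in variants:
    # Record the word-level diff against the base prompt, plus the output.
    changed = [d for d in difflib.ndiff(base.split(), v.split()) if d[0] in "+-"]
    log.append({"prompt": v, "changed": changed, "output": fake_model(v)})

for entry in log:
    print(entry["changed"], "->", entry["output"])
```

Even with this stub, the shape of the result is the point: a one-word diff column next to an output column makes wording sensitivity visible instead of anecdotal.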
When not to over-optimize
Sometimes simple wording gets the job done. Over-engineering can produce brittle prompts that break when the model is updated. Balance polish and robustness: use templates, but keep fallback prompts.
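A fallback prompt can be wired in as a simple retry pattern. Everything here is a sketch under stated assumptions: `call_model` is a stub (its "failure" on the fancy template is contrived to trigger the fallback), and the one-sentence check is a placeholder for whatever validation your task needs:

```python
def call_model(prompt):
    """Stub LLM call; the fancy template 'fails' here to illustrate fallback."""
    if "### FORMAT" in prompt:
        return "Sorry, I got confused by the template."
    return "A single clean sentence."

def is_one_sentence(text):
    """Crude format check -- replace with real validation for your task."""
    return text.count(".") == 1 and not text.lower().startswith("sorry")

FANCY = "### FORMAT\nReturn exactly one sentence.\n### TEXT\n{text}"
FALLBACK = "Summarize in one sentence: {text}"

def robust_summarize(text):
    """Try the polished template first; fall back to plain wording if the
    output fails validation."""
    out = call_model(FANCY.format(text=text))
    if is_one_sentence(out):
        return out
    return call_model(FALLBACK.format(text=text))

print(robust_summarize("some passage"))  # falls back -> "A single clean sentence."
```

The design choice is validate-then-degrade: the elaborate template gets first shot, but a dumb, sturdy prompt is always one step behind it.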
Closing mic-drop — the takeaway
- LLMs are probabilistic pattern imitators, not literal robots. Wording and order change the patterns they match.
- Alignment (RLHF) and instruction-following help, but they’re not magic. They nudge behavior but don’t remove sensitivity to prompt phrasing or example order.
- Your job as a prompt engineer: design prompts that give the clearest, most stable statistical signal for the output you want — use order, role, examples, separators, and constraints intentionally.
Final thought: If the model misbehaves, don’t blame the AI; debug the prompt. Think of prompting as crafting a charismatic director’s note to a very obedient but easily distracted actor.
Key takeaways (cheat-sheet):
- Be explicit, put constraints early.
- Use system role for global behavior.
- Order your examples: last = strongest influence.
- Test small wording changes; expect big output differences.
- Keep prompts robust, not brittle.
Now go forth, experiment, and cause responsibly predictable behavior. Or at least hilarious outputs you can learn from.