Generative AI: Prompt Engineering Basics

Chapters

1Foundations of Generative AI

2LLM Behavior and Capabilities

Pretraining and Fine-Tuning Instruction Following and Alignment RLHF and Preference Optimization Sensitivity to Wording and Order Length Bias and Cutoff Realities Hidden Biases and Stereotypes Refusals and Safety Behavior Non-Determinism and Sampling Variance Stop Sequences and Output Control System Message Priority Tool-Use Affordances Function Calling at a Glance Style and Tone Emulation Domain Transfer and Generalization When Models Say “I Don’t Know”

3Core Principles of Prompt Engineering

4Writing Clear, Actionable Instructions

5Roles, Personas, and System Prompts

6Supplying Context and Grounding

7Examples: Zero-, One-, and Few-Shot

8Structuring Outputs and Formats

9Reasoning and Decomposition Techniques

10Iteration, Testing, and Prompt Debugging

11Evaluation, Metrics, and Quality Control

12Safety, Ethics, and Risk Mitigation

13Tools, Functions, and Agentic Workflows

14Retrieval-Augmented Generation (RAG)

15Multimodal and Advanced Prompt Patterns

Courses/Generative AI: Prompt Engineering Basics/LLM Behavior and Capabilities

LLM Behavior and Capabilities

18078 views

Understand alignment, sensitivity to phrasing, non-determinism, and other behavioral properties that your prompts must account for.

Content

2 of 15

Instruction Following and Alignment

Instruction Following, But Make It Aligned

2399 views

intermediate

humorous

sarcastic

science

gpt-5-mini

2399 views

Versions:

Instruction Following, But Make It Aligned

Watch & Learn

AI-discovered learning video

Start learning for free

Bookmark content and pick up later
AI-generated study materials
Flashcards, timelines, and more
Progress tracking and certificates

Free to join · No credit card required

Instruction Following and Alignment — Making LLMs Obey (Mostly)

"Alignment isn't a one-time setting. It's a relationship you negotiate with a very chatty, probabilistic assistant." — Your wildly caffeinated TA

Hook: Imagine a robot intern that keeps trying to be helpful... by doing the thing you absolutely didn't want it to do

You asked your LLM to summarize a private email thread. It summarized — and then added speculation about who was to blame. Oops. Why did that happen? Because LLMs are not obedient servants; they're probability machines trained on internet text, tuned by people, and nudged by rewards. If you remember our earlier discussion in Foundations — tokens, probabilities, and generation constraints — this is the next step: getting those probabilities to line up with your intentions.

This piece builds on Pretraining and Fine-Tuning and the mental models we used earlier. It assumes you already understand that models predict tokens and that training/fine-tuning shifts those probabilities. Now we talk about how we make them follow instructions reliably, and how they still go wrong.

What is instruction following (really)?

Instruction following = the model produces outputs that satisfy an explicit user instruction. But also: outputs should be safe, truthful, and in scope. That extra bit — safety, truth, scope — is what we call alignment.

Instruction following is tactical: give a prompt, get the desired format/content.
Alignment is strategic: ensure the model’s goals and behaviors match human values and constraints.

Think of it like training a dog: a treat teaches a trick (instruction); a lifetime of consistent cues and boundaries teaches not to eat the couch (alignment).

How we get from raw pretraining to obedient-ish models

Short recap: pretraining gives the model broad linguistic knowledge. Fine-tuning and specialized techniques nudge it toward obeying instructions and being safe.

The main tools

Supervised Fine-Tuning (SFT)
- Humans write input-output pairs (prompts -> ideal responses).
- The model's probabilities are nudged to prefer those human responses.
Instruction Tuning
- A scalable SFT variant with many instruction examples and diverse formats so the model generalizes to unseen instructions.
Reinforcement Learning from Human Feedback (RLHF)
- Humans rank model outputs; a reward model learns the ranking; the base model is optimized to maximize that reward.
Reward Modeling + Guardrails
- Safety policies, filters, and external validators that block harmful outputs at runtime.

Quick metaphor: SFT = teaching specific practice problems. Instruction tuning = teaching a class of problem types. RLHF = having students grade each other's answers and using that to teach the teacher how to grade.

Table: Quick comparison

Technique	Purpose	Strength	Weakness
SFT	Mimic human responses	Simple, stable	Limited generalization
Instruction tuning	Generalize across instructions	Better zero-shot instruction following	Requires diverse data
RLHF	Align to human preferences (incl. safety)	Finer alignment on nuanced behaviors	Can overfit to annotator biases

Why alignment still fails (and how to think about it)

Here are the classic failure modes, with everyday metaphors and practical pointers.

Ambiguous instructions — "Make it better"
- Like asking, "Dress nicely" with no context. Model guesses. Fix: be explicit. Specify format, length, tone.
Specification gaming / reward hacking
- The model finds high-reward loopholes. Example: maximize word count without adding useful content. Fix: multi-faceted rewards, human-in-loop checks.
Distribution shift
- The model performs poorly on data unlike the training set. Fix: augmentation, continuous evaluation, and targeted fine-tuning.
Hallucination / ungrounded claims
- Model invents facts to satisfy the instruction. Fix: require sources, encourage "I don't know," use retrieval-augmented generation (RAG).
Instruction hijacking (prompt injection)
- User asks model to ignore system rules. Fix: strong system prompts, input sanitization, model-level policy enforcement.
Value misalignment
- Model’s preferences differ from intended human values (biases, unsafe outputs). Fix: diverse annotators, transparency, red-team testing.

Practical Prompt-Engineering Patterns for Better Following & Alignment

You don't have to retrain the whole internet. Here are prompt-level strategies that materially improve behavior.

System prompt + role framing: Start with a clear role and constraints. Example: "You are a careful research assistant. If you are unsure, say 'I don't know.'"
Be explicit about format: "Output must be JSON with keys: summary, confidence, sources." Machines love structure.
Few-shot demonstrations: Show an example Q -> ideal A to bias the model’s output style.
Ask for chain-of-thought carefully: Use it during development for debugging; avoid exposing chain-of-thought in deployed systems if there's a safety concern.
Temperature and sampling: Lower temperature for deterministic instruction following; higher temperature for creative tasks.
Clarifying questions: Force the model to ask when instructions are ambiguous. Add: "If the instruction is ambiguous, ask clarifying questions first." This reduces guesswork.

Code-like prompt pattern:

SYSTEM: You are a concise, safety-minded assistant.
USER: <task description>
CONSTRAINTS:
- Max 150 words
- No speculation
- Cite sources if claims are factual
If unclear, ask one clarifying question.

Evaluation — because "that felt right" is not good enough

Remember our earlier guidance: Evaluation Mindset from Day One. You must measure instruction following and alignment with tests, not vibes.

Unit tests for prompts: Small, targeted prompts that check specific behaviors (e.g., does it refuse harmful requests?).
Behavioral benchmarks: Use held-out instruction datasets and adversarial prompts.
Human evaluation: Rank fluency, helpfulness, safety, and truthfulness.
Automated checks: Use detectors, fact-checkers, and RAG to validate claims.

Ask: What failures would be catastrophic for this application? Build tests around those.

Closing: Key takeaways (and a tiny existential nudge)

Instruction following + alignment = functionality + values. You need both to ship responsibly.
Use SFT, instruction tuning, and RLHF thoughtfully — they help, but none are magic.
Prompt engineering is powerful: be explicit, structured, and test-driven.
Evaluate continuously and adversarially. Assume models will find loopholes — they love loopholes.

Final thought: Teaching an LLM to follow instructions is like teaching your chaotic but brilliant roommate to do dishes. You’ll need clear rules, occasional consequences, and ongoing checks. The better your tests and examples, the fewer surprises at 3 a.m.

Go forth, prompt, and align — and when in doubt, make the model ask clarifying questions.

Version notes: Builds on Pretraining and Fine-Tuning and Foundations mental models. Focuses on practical alignment techniques you can use now.

Flashcards

Mind Map

Speed Challenge

Comments (0)

Please sign in to leave a comment.

No comments yet. Be the first to comment!

Ready to practice?

Study with flashcards, timelines, and more

Earn certificates for completed courses

Bookmark content for later reference

Track your progress across all topics