Relational Databases and SQL
Model data and write efficient SQL to query, transform, and maintain integrity.
Content
Keys, Constraints, and Indexes
Versions:
Watch & Learn
AI-discovered learning video
Sign in to watch the learning video for this topic.
Keys, Constraints, and Indexes — the Database's Identity, Rules, and Speed Boosters
"Tables are neat. Keys, constraints, and indexes make them not only neat but obedient, fast, and trustworthy."
You're coming in hot after learning the relational model and how tables/schemas/types work. You also remember designing bigger Python programs with classes and typed attributes. Good — because keys, constraints, and indexes are where the relational world meets the object world and the machine room that serves queries.
Why this matters (and why engineers obsess)
- Keys define identity. Like an
idattribute on a Python class, a primary key says "this row is that object." - Constraints encode rules the database enforces for correctness — think of them as class invariants but enforced by the DBMS, not by your code.
- Indexes are the speed-ups that save user patience (and your server bill). They make SELECTs fast, but they cost time on INSERT/UPDATE/DELETE and space on disk.
If you want stable, correct systems (and fast ones), you need to use these deliberately.
Keys: identity and relationships
Primary keys (PK)
- Definition: A column or set of columns whose values uniquely identify a row in a table.
- Properties: Not NULL, unique. Usually used as the default reference for relationships.
SQL example:
CREATE TABLE users (
id SERIAL PRIMARY KEY, -- surrogate key (auto-increment)
username TEXT NOT NULL UNIQUE,
email TEXT NOT NULL
);
- Surrogate key (like
id) vs natural key (likeusernameoremail): surrogate keys are stable and small; natural keys carry meaning but can change.
Composite keys
- When uniqueness requires multiple columns.
CREATE TABLE enrollments (
student_id INT,
course_id INT,
PRIMARY KEY (student_id, course_id)
);
Micro explanation: Composite PKs are like using (firstName, lastName, birthdate) to identify a person — possible but brittle if any part changes.
Constraints: the DB enforcing the rules
Common constraints:
- NOT NULL — column must have a value.
- UNIQUE — values must be distinct.
- PRIMARY KEY — shorthand for NOT NULL + UNIQUE (and designates identity).
- FOREIGN KEY — enforces relationships between tables (referential integrity).
- CHECK — a custom rule (e.g., age >= 0).
- DEFERRABLE — check constraint enforcement can be deferred until transaction commit.
Example with FK and cascade rules:
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
author_id INT NOT NULL,
content TEXT NOT NULL,
FOREIGN KEY (author_id) REFERENCES users(id)
ON DELETE CASCADE -- if a user is deleted, delete their posts
ON UPDATE CASCADE
);
Cascade options:
CASCADE: delete/update dependent rows automatically.SET NULL: set the FK to NULL (requires FK column nullable).RESTRICT/NO ACTION: block deletion/update if dependents exist.
Why this matters: constraints keep bad data out. They stop race conditions that slip past application checks — the DB is the single source of truth.
Indexes: how to make queries stop crawling
Indexes are data structures (most commonly B-trees) that map key values to row locations.
- CREATE INDEX idx_name ON table(column) creates a B-tree index by default.
- Unique indexes also enforce uniqueness.
- Partial indexes index only rows that meet a condition (good for sparse, high-selectivity cases).
- Composite indexes index multiple columns in a specific order; order matters!
Example:
CREATE INDEX idx_posts_author ON posts(author_id);
CREATE UNIQUE INDEX ux_users_email ON users(email);
CREATE INDEX idx_orders_date_status ON orders (status, created_at);
-- Partial index
CREATE INDEX idx_active_users ON users (last_login)
WHERE active = true;
B-tree vs Hash (quick)
- B-tree: good for range queries (
<,>,BETWEEN) and equality. Default choice. - Hash: excellent for equality, not for ranges; some DBs support hash indexes but they are niche.
Covering indexes
If an index contains all columns a query needs (SELECT column1, column2 WHERE ...), the DB can answer the query from the index alone — avoiding touching the table (faster).
Index trade-offs (be explicit)
- Pros: faster SELECTs, quicker JOIN lookups, ability to enforce UNIQUE.
- Cons: extra disk space, slower INSERT/UPDATE/DELETE (index updates), and complexity when deciding which columns to index.
Indexing heuristics:
- Index columns used in JOINs and WHERE filters with high selectivity.
- Avoid indexing low-selectivity columns (e.g., booleans) unless used in composite or partial indexes.
- Put the most selective column first in composite indexes.
- Keep indexes narrow (fewer bytes) to store more entries per page and improve cache efficiency.
Practical examples and rules of thumb
Identity in code and DB: If your Python class has an
idattribute, map it to a surrogate primary key in SQL. That keeps object identity stable even if business attributes change.Use foreign keys to prevent orphans:
ALTER TABLE comments
ADD FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE CASCADE;
When you design an API that does frequent reads by a field (e.g.,
email), add a unique index on it — but only after verifying it's needed.Use
EXPLAIN(orEXPLAIN ANALYZE) to see query plans. If a full table scan shows up and the column is selective, consider an index.Maintainability: drop unused indexes. Each index costs you on writes.
Advanced notes (tiny tidbits)
- Deferrable constraints: useful in complex transactions where two tables reference each other — defer check to commit.
- Index-only scans: possible when the index covers the query. They're magical for read-heavy workloads.
- Normalization vs index strategy: normalized schemas reduce duplication but often increase JOINs — index wisely for JOIN keys.
Quick checklist before shipping
- Every table has a sensible PK (surrogate if needed).
- Foreign keys are declared to enforce referential integrity.
- Unique constraints exist where business rules require them.
- Indexes exist for frequent WHERE and JOIN columns; use EXPLAIN to validate.
- Unused indexes were removed to save write costs.
Key takeaways
- Primary keys = stable identity (your DB's
idattribute). - Constraints = rules enforced by the DB so your app doesn't silently corrupt data.
- Indexes = the DB’s fast lane — use them for reads, but expect write penalties.
"Think of constraints as the law, keys as IDs, and indexes as express lanes. Laws keep everyone honest, IDs keep everyone unique, and express lanes keep the impatient users happy — but all three cost something to maintain."
If you enjoyed this, next we can map these concepts to actual query plans (EXPLAIN ANALYZE) and show how different indexes change the plan — with real CS50-style sample datasets.
Comments (0)
Please sign in to leave a comment.
No comments yet. Be the first to comment!