Relational Databases and SQL
Model data and write efficient SQL to query, transform, and maintain integrity.
Content
Relational Model
Versions:
Watch & Learn
AI-discovered learning video
Sign in to watch the learning video for this topic.
Relational Model (CS50) — Tables, Keys, and Why Your Data Shouldn’t Repeat Itself
"This is the moment where the concept finally clicks — your data stops acting like a messy roommate and starts behaving like a well-trained librarian."
You just finished structuring larger Python programs with classes, iterators, decorators, and typing. Nice. Now imagine your classes have to live in a different world: a world of tables, rows, and rules. The relational model is that world. It’s the conceptual backbone of SQL databases, and understanding it will make everything from designing schemas to debugging JOINs far less soul-crushing.
What the relational model is (short version, big payoff)
- Relation = table. Think spreadsheet: rows are tuples, columns are attributes.
- Tuple = row/record. One instance of data (e.g., one user).
- Attribute = column/property. One field on which tuples agree (e.g., email).
- Schema = the table's structure (column names and types) — like a class definition in Python.
- Instance = the current data in the table — like all existing objects of that class at runtime.
Micro explanation
- A relation schema is the recipe (column names/types).
- A relation instance is the dish you made with that recipe (the actual rows in the table).
Why it matters: the relational model gives precise rules (integrity constraints) so data stays consistent, queryable, and efficient. No more mystery duplicates or weird missing links.
Mapping from Python class land (you know this) to relational land
You’ve defined classes and used typing; think of a class as a table blueprint: each instance -> a row. But beware — objects can have nested references, methods, and behavior. Tables store only data. This mismatch is the famous object-relational impedance. ORMs (like SQLAlchemy, Django ORM) are translators.
Example: a simple class vs table
# Python class
class Person:
id: int
name: str
email: str
# relational schema
CREATE TABLE person (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
email TEXT UNIQUE
);
- id ~ primary key (unique identifier).
- email marked UNIQUE to prevent duplicates (a constraint).
Iterators and cursors: when you SELECT * FROM person, the database returns a sequence of rows you can iterate over — same idea as your Python iterators.
Decorators and validation: you might wrap functions that insert rows with decorators that validate fields before hitting the DB (application-level integrity). But don’t rely on that alone — put constraints in the database too.
Keys and constraints — the relational model's bouncers
- Primary key: a column (or set of columns) that uniquely identifies each tuple. No NULLs allowed.
- Candidate key: any column or set of columns that could be a primary key. The primary key is one chosen candidate.
- Foreign key: a column that references a primary key in another relation (enforces relationships).
- Unique constraint: no duplicate values allowed in that column.
- NOT NULL: ensures a column must have a value.
Example: two tables with a foreign key
CREATE TABLE users (
user_id INTEGER PRIMARY KEY,
username TEXT UNIQUE NOT NULL
);
CREATE TABLE posts (
post_id INTEGER PRIMARY KEY,
author_id INTEGER NOT NULL,
content TEXT,
FOREIGN KEY(author_id) REFERENCES users(user_id) ON DELETE CASCADE
);
Referential integrity: the DB enforces that every author_id in posts matches an existing user_id in users. If a user is deleted, ON DELETE CASCADE can remove their posts automatically.
Normalization — how to stop duplicating everything
Normalization is a set of rules to reduce redundancy and avoid anomalies (insertion, update, deletion problems). The common levels you’ll see in CS50:
- 1NF (First Normal Form): atomic values only. No repeating lists in a single column.
- 2NF (Second Normal Form): 1NF + no partial dependency on a subset of a composite key. (Mostly matters when your PK is composite.)
- 3NF (Third Normal Form): 2NF + no transitive dependencies (non-key attributes depend only on the key).
Real-world analogy: you wouldn’t write every student’s advisor phone number repeatedly in every course row. Instead, create an advisors table and reference it with a foreign key. That way, updating the phone number happens once.
Example of denormalization problem
Bad (redundant):
| student_id | student_name | advisor_name | advisor_phone |
|---|
If advisor_phone changes, you update multiple rows — update anomaly. Proper fix: separate students and advisors tables and reference advisor_id.
Relational algebra basics (the math behind SQL)
- Selection (σ): filters rows (WHERE).
- Projection (π): picks columns (SELECT columns).
- Join (⨝): combines relations based on matching attributes (INNER JOIN, LEFT JOIN).
- Union, Intersection, Difference: set operations on compatible relations.
Why care? SQL is an implementation of these ideas. Thinking algebraically helps optimize queries and reason about results.
Quick join mental model:
- INNER JOIN = intersection of related data (only matched rows).
- LEFT JOIN = all left rows, plus matched right rows (NULLs if none).
Practical tips from the trenches (aka how to not cry when scaling your DB)
- Always pick a primary key (numeric surrogate keys are fine).
- Use foreign keys to enforce relationships — do not trust application code alone.
- Normalize to 3NF unless performance needs denormalization; optimize later, not prematurely.
- Add indexes on columns you frequently filter or join on (but don't index everything).
- Put essential integrity rules in the database (NOT NULL, UNIQUE, FK). Application checks (decorators) are helpful but not a substitute.
Quick examples: SELECT and a JOIN
-- Find all posts with their author's username
SELECT p.post_id, p.content, u.username
FROM posts p
JOIN users u ON p.author_id = u.user_id
WHERE u.username LIKE 'alex%';
This mirrors: iterate posts, look up user by foreign key, yield combined result — which is exactly what you might do in Python but much faster inside the DB.
Closing: Key takeaways
- Relation = table; tuple = row; attribute = column. Think spreadsheet, but with strict rules.
- Keys enforce identity and relationships. Primary keys for uniqueness; foreign keys for links.
- Normalization prevents redundancy. Keep data atomic and dependencies clean.
- Relational algebra is the logic behind SQL. Selection, projection, join — learn them like your core library functions.
- Use both app-level checks (decorators, typing) and DB-level constraints. They play different roles.
Remember: after writing neat classes and iterators in Python, the relational model teaches you how to store their data cleanly and reliably. It’s the difference between tossing your belongings in a pile and actually folding them into labeled drawers.
Want a small challenge? Convert a small set of nested Python dataclasses (Students with Courses) into 3 normalized SQL tables and write the JOIN that lists students with all their course names. It’s like refactoring, but for your data model.
Comments (0)
Please sign in to leave a comment.
No comments yet. Be the first to comment!