Data, SQL, and ORM Patterns
Model, query, and evolve data with SQL, migrations, and ORM techniques for robust persistence.
Normalization and Denormalization — When to Love Structure and When to Break the Rules
"Normalization is like Marie Kondo for your database — tidy, efficient, and declutters anomalies. Denormalization is that friend who keeps a spare couch in the living room because you host 20 people every Saturday."
You already know how to model relations from Relational modeling basics and how to render queries into Jinja templates with Flask. Now let’s decide whether to keep your schema pure and normalized, or to purposely duplicate data to speed up the user experience. This is the chapter where pragmatism meets theory — with a little theatrical chaos.
Why this matters for web apps
- Web apps are read-heavy: pages often fetch lots of related data (posts + tags + author + comments). Too many joins can slow response time.
- ORMs (like SQLAlchemy) make relationships easy, but they can also hide expensive queries (hello N+1 problem in Flask views!).
- Schema design affects caching, migrations (Flask CLI + Flask-Migrate), and deploy complexity.
Normalization reduces redundancy and prevents anomalies. Denormalization reduces join cost and speeds reads. Neither is "wrong" — it's context.
Quick refresher (the one-line version)
- Normalization: Split data into tables to eliminate redundancy and ensure consistency (1NF → 2NF → 3NF commonly). Good for writes and correctness.
- Denormalization: Intentionally duplicate or aggregate data to optimize read performance. Good for read-heavy endpoints and reducing joins.
Normalization: the clean, reliable choice
What it prevents
- Update anomalies: change author name once, not in 10 rows.
- Insert anomalies: you can't record one fact without fabricating another (e.g., adding an order item shouldn't require inventing a placeholder order).
- Delete anomalies: deleting a post doesn't wipe an author record.
Classic example (normalized)
SQL (conceptual):
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id), title TEXT, body TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post_tags (post_id INTEGER REFERENCES posts(id), tag_id INTEGER REFERENCES tags(id), PRIMARY KEY (post_id, tag_id));
SQLAlchemy (simplified):
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Post(Base):
    __tablename__ = 'posts'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('users.id'))
    tags = relationship('Tag', secondary='post_tags', back_populates='posts')
Why normalized here: tags are shared across posts, authors are normalized, many-to-many relationship lives in join table.
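To make the join cost concrete, here is a minimal sketch of the schema above using Python's built-in sqlite3 as a stand-in for a real database (table contents are illustrative). Notice that rendering even one feed row touches three joins:

```python
import sqlite3

# Build the normalized schema from the SQL above, in-memory for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id),
                    title TEXT, body TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post_tags (post_id INTEGER REFERENCES posts(id),
                        tag_id INTEGER REFERENCES tags(id),
                        PRIMARY KEY (post_id, tag_id));
""")
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'ada@example.com')")
conn.execute("INSERT INTO posts VALUES (1, 1, 'Hello', '...')")
conn.executemany("INSERT INTO tags VALUES (?, ?)", [(1, 'sql'), (2, 'flask')])
conn.executemany("INSERT INTO post_tags VALUES (?, ?)", [(1, 1), (1, 2)])

# One feed row = posts -> users join plus posts -> post_tags -> tags joins.
row = conn.execute("""
    SELECT p.title, u.name, GROUP_CONCAT(t.name, ',') AS tag_names
    FROM posts p
    JOIN users u ON u.id = p.user_id
    JOIN post_tags pt ON pt.post_id = p.id
    JOIN tags t ON t.id = pt.tag_id
    GROUP BY p.id
""").fetchone()
print(row)  # e.g. ('Hello', 'Ada', 'sql,flask')
```

Each fact lives in exactly one place, which is what makes updates and deletes anomaly-free. The price is the query shape you see here.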
Denormalization: speed with responsibility
Denormalization examples:
- Storing author_name on the posts table (duplicate of users.name)
- Storing a tags_text column containing comma-separated tag names or a JSON array
- Maintaining an aggregated comments_count on the posts table
Denormalized example (practical)
ALTER TABLE posts ADD COLUMN tags_json JSONB; -- list of tag names
ALTER TABLE posts ADD COLUMN comments_count INTEGER DEFAULT 0;
In SQLAlchemy you might map tags_json = Column(JSON) and render it into templates without extra joins.
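A quick sketch of what that buys you, again using sqlite3 (a TEXT column holding JSON stands in for Postgres JSONB; the table and values are illustrative):

```python
import json
import sqlite3

# Denormalized feed columns: tags and a comment counter live on the post row.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    title TEXT,
    tags_json TEXT,               -- JSON array of tag names (JSONB in Postgres)
    comments_count INTEGER DEFAULT 0
)
""")
conn.execute(
    "INSERT INTO posts (id, title, tags_json) VALUES (?, ?, ?)",
    (1, "Hello", json.dumps(["sql", "flask"])),
)

# The feed query is now a single-table read: no joins, no N+1.
title, tags_json, count = conn.execute(
    "SELECT title, tags_json, comments_count FROM posts WHERE id = 1"
).fetchone()
tags = json.loads(tags_json)
print(title, tags, count)  # Hello ['sql', 'flask'] 0
```

The read path got trivially cheap; the write path now owns the job of keeping tags_json and comments_count truthful.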
Benefits
- Fewer joins → faster reads and less ORM query complexity (fewer N+1 issues)
- Simpler queries for endpoints that show summary data (like feeds)
- Good for paginated lists where you need a lot of rows but only a few columns
Costs and hazards
- Data duplication → potential inconsistency
- More complex writes: every update path must keep denormalized fields in sync
- Larger storage
- Harder migrations and reasoning about correctness
Practical patterns for web apps (ORM-aware)
- Start normalized. Ship fast. Measure.
- Profile your slow endpoints with EXPLAIN, APM, or logging from Flask views.
- If joins are the bottleneck, apply targeted denormalization — not wholesale copy-paste.
Common patterns:
- Derived columns (comments_count, likes_count) maintained in application logic or via DB triggers.
- Cached JSON blobs (tags_json or author snapshot) for rendering in feeds.
- Materialized views for expensive aggregated queries (refresh on schedule or with triggers).
- Use partial denormalization + indexes for frequent filters.
Example: avoid hitting users table for every post in a feed by storing author_snapshot = {id, name, avatar_url} on Post.
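Here is a minimal sketch of the first pattern, a derived comments_count maintained in application logic. The key detail is doing the insert and the counter bump in one transaction so a crash between them can't leave the counter drifted (sqlite3 again stands in for your real database; names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT,
                    comments_count INTEGER NOT NULL DEFAULT 0);
CREATE TABLE comments (id INTEGER PRIMARY KEY,
                       post_id INTEGER REFERENCES posts(id), body TEXT);
INSERT INTO posts (id, title) VALUES (1, 'Hello');
""")

def add_comment(conn, post_id, body):
    # One transaction: the comment row and the counter bump commit together.
    with conn:
        conn.execute("INSERT INTO comments (post_id, body) VALUES (?, ?)",
                     (post_id, body))
        conn.execute("UPDATE posts SET comments_count = comments_count + 1 "
                     "WHERE id = ?", (post_id,))

add_comment(conn, 1, "First!")
add_comment(conn, 1, "Nice post.")
count = conn.execute(
    "SELECT comments_count FROM posts WHERE id = 1").fetchone()[0]
print(count)  # 2
```

A DB trigger achieves the same guarantee without trusting every code path; the app-logic version is easier to test and reason about.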
Safe denormalization: a checklist
- Do you have metrics proving the join or aggregation is slow? (Yes → continue)
- Can the denormalized value be reconstructed reliably? (e.g., tag names)
- Who is responsible for updating the denormalized copy? (App code, DB trigger, background job?)
- Have you added tests to ensure the derived fields stay consistent?
- Have you handled migrations cleanly with Flask CLI + Flask-Migrate (Alembic)?
A typical strategy:
- Add new denormalized column via migration.
- Backfill it in a safe background job.
- Change reads to prefer the new column.
- Keep original normalized data for writes.
- Monitor for drift and remove redundant paths only when comfortable.
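Step 2 (the backfill) deserves its own sketch: process rows in small batches, committing per batch, so no single transaction locks the table for long. This is an illustrative sqlite3 version; in production you'd run it as a Flask CLI command or background job, and the table/column names are assumptions:

```python
import json
import sqlite3

# Toy data: five posts, all tagged, none backfilled yet.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, tags_json TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post_tags (post_id INTEGER, tag_id INTEGER);
""")
for i in range(1, 6):
    conn.execute("INSERT INTO posts (id, title) VALUES (?, ?)", (i, f"Post {i}"))
conn.executemany("INSERT INTO tags VALUES (?, ?)", [(1, "sql"), (2, "flask")])
conn.executemany("INSERT INTO post_tags VALUES (?, ?)",
                 [(i, 1) for i in range(1, 6)])

BATCH = 2
last_id = 0
while True:
    # Keyset pagination: resume from the last backfilled id, never re-scan.
    rows = conn.execute(
        "SELECT id FROM posts WHERE tags_json IS NULL AND id > ? "
        "ORDER BY id LIMIT ?", (last_id, BATCH)).fetchall()
    if not rows:
        break
    with conn:  # commit per batch, keeping each transaction short
        for (post_id,) in rows:
            names = [n for (n,) in conn.execute(
                "SELECT t.name FROM tags t "
                "JOIN post_tags pt ON pt.tag_id = t.id "
                "WHERE pt.post_id = ?", (post_id,))]
            conn.execute("UPDATE posts SET tags_json = ? WHERE id = ?",
                         (json.dumps(names), post_id))
            last_id = post_id

remaining = conn.execute(
    "SELECT COUNT(*) FROM posts WHERE tags_json IS NULL").fetchone()[0]
print(remaining)  # 0
```

Because the loop only selects rows where tags_json IS NULL, the job is safely resumable if it dies mid-run.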
ORM tips: avoid the common traps
- N+1 problem: Use eager loading (joinedload/selectinload in SQLAlchemy) before denormalizing blindly.
- Association objects: If your join table has extra columns, keep it normalized and denormalize only aggregated info.
- JSON fields: Great for flexible denorm, but indexes are limited — consider GIN indexes for JSONB (Postgres).
Code pattern (SQLAlchemy) to avoid N+1:
from sqlalchemy.orm import selectinload

posts = (session.query(Post)
         .options(selectinload(Post.tags), selectinload(Post.user))
         .limit(20)
         .all())
If this still performs poorly for your feed, add a post_preview denormalized column and use it.
Alternatives to denormalization
- Materialized views (Postgres): fast reads, refresh strategy required
- Redis/memcached caching of rendered results (fast, but need eviction logic)
- Read replicas: scale reads horizontally (adds operational complexity)
- Indexing: sometimes the right index fixes the problem without denormalization
Quick decision guide
- Read-heavy + expensive joins + measurable latency → consider denormalization
- Frequent writes to the duplicated data → lean toward normalization
- Need transactional consistency at all times → normalization + DB features (FKs, constraints)
Key takeaways
- Start with normalization for correctness and simplicity.
- Measure before you denormalize. Evidence > intuition.
- Denormalization is a tool: apply it narrowly and automate consistency (background jobs, triggers, or app code).
- Combine denormalization with smart ORM usage (eager loading), indexing, and caching.
"Normalize to avoid the bugs you don’t want; denormalize to avoid the latency your users complain about. Be pragmatic, and let metrics be your referee."
Actionable next steps for your CS50 web project
- Profile the slow endpoint in your Flask app (log queries, use EXPLAIN).
- Try eager loading via SQLAlchemy (selectinload/joinedload).
- If still slow, add a migration with a denormalized column and backfill using Flask CLI + Flask-Migrate.
- Add tests ensuring denormalized fields update correctly.
Go forth and design schemas that are both tidy and furious (when speed matters).