
CS50 - Web Programming with Python and JavaScript

Data, SQL, and ORM Patterns


Model, query, and evolve data with SQL, migrations, and ORM techniques for robust persistence.


Normalization and Denormalization — When to Love Structure and When to Break the Rules

"Normalization is like Marie Kondo for your database — tidy, efficient, and declutters anomalies. Denormalization is that friend who keeps a spare couch in the living room because you host 20 people every Saturday."

You already know how to model relations from Relational modeling basics and how to render queries into Jinja templates with Flask. Now let’s decide whether to keep your schema pure and normalized, or to purposely duplicate data to speed up the user experience. This is the chapter where pragmatism meets theory — with a little theatrical chaos.


Why this matters for web apps

  • Web apps are read-heavy: pages often fetch lots of related data (posts + tags + author + comments). Too many joins can slow response time.
  • ORMs (like SQLAlchemy) make relationships easy, but they can also hide expensive queries (hello N+1 problem in Flask views!).
  • Schema design affects caching, migrations (Flask CLI + Flask-Migrate), and deploy complexity.

Normalization reduces redundancy and prevents anomalies. Denormalization reduces join cost and speeds reads. Neither is "wrong" — it's context.


Quick refresher (the one-line version)

  • Normalization: Split data into tables to eliminate redundancy and ensure consistency (1NF → 2NF → 3NF commonly). Good for writes and correctness.
  • Denormalization: Intentionally duplicate or aggregate data to optimize read performance. Good for read-heavy endpoints and reducing joins.

Normalization: the clean, reliable choice

What it prevents

  • Update anomalies: change an author's name once, not in 10 rows.
  • Insert anomalies: you can't record one fact without inventing another (e.g., no way to store a new tag until some post uses it).
  • Delete anomalies: deleting a post doesn't drag the author's record down with it.
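These anomalies are easy to reproduce. A minimal sketch using Python's built-in sqlite3 (the table and data are illustrative, not the chapter's schema) shows the update anomaly in a denormalized table that repeats the author's name on every row:

```python
import sqlite3

# Denormalized on purpose: author_name is duplicated on every post row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author_name TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO posts (author_name, title) VALUES (?, ?)",
    [("Alice", "Post 1"), ("Alice", "Post 2")],
)

# A careless rename touches only one row -> update anomaly.
conn.execute("UPDATE posts SET author_name = 'Alicia' WHERE id = 1")

names = {row[0] for row in conn.execute("SELECT author_name FROM posts")}
print(names)  # the same author now appears under two different names
```

With a normalized users table, the rename is a single UPDATE and this drift cannot happen.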

Classic example (normalized)

SQL (conceptual):

CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id), title TEXT, body TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post_tags (post_id INTEGER REFERENCES posts(id), tag_id INTEGER REFERENCES tags(id), PRIMARY KEY (post_id, tag_id));

SQLAlchemy (simplified):

from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import relationship

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    posts = relationship('Post', back_populates='user')

class Post(Base):
    __tablename__ = 'posts'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('users.id'))
    user = relationship('User', back_populates='posts')
    tags = relationship('Tag', secondary='post_tags', back_populates='posts')

Why normalized here: tags are shared across posts, authors live in their own table, and the many-to-many relationship lives in the join table.


Denormalization: speed with responsibility

Denormalization examples:

  • Storing author_name on the posts table (duplicate of users.name)
  • Storing a tags_text column containing comma-separated tag names or a JSON array
  • Maintaining an aggregated comments_count on the posts table

Denormalized example (practical)

ALTER TABLE posts ADD COLUMN tags_json JSONB;  -- list of tag names
ALTER TABLE posts ADD COLUMN comments_count INTEGER DEFAULT 0;

In SQLAlchemy you might map tags_json = Column(JSON) and render it into templates without extra joins.
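As a stand-in for the Postgres JSONB column above, here is a small sketch using Python's sqlite3, where a plain TEXT column holds serialized JSON (the column names follow the ALTER TABLE example; the rest of the schema is illustrative):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no JSONB type; a TEXT column holding serialized JSON plays the same role.
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, "
    "tags_json TEXT, comments_count INTEGER DEFAULT 0)"
)
conn.execute(
    "INSERT INTO posts (title, tags_json) VALUES (?, ?)",
    ("Hello", json.dumps(["sql", "flask"])),
)

# Rendering a feed now needs no join against tags/post_tags.
title, tags_json = conn.execute("SELECT title, tags_json FROM posts").fetchone()
print(title, json.loads(tags_json))
```

The read path is a single-table SELECT; the cost moves to the write path, which must refresh tags_json whenever the post's tags change.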

Benefits

  • Fewer joins → faster reads and less ORM query complexity (fewer N+1 issues)
  • Simpler queries for endpoints that show summary data (like feeds)
  • Good for paginated lists where you need a lot of rows but only a few columns

Costs and hazards

  • Data duplication → potential inconsistency
  • More complex writes: every update path must keep denormalized fields in sync
  • Larger storage
  • Harder migrations and reasoning about correctness

Practical patterns for web apps (ORM-aware)

  1. Start normalized. Ship fast. Measure.
  2. Profile your slow endpoints with EXPLAIN, APM, or logging from Flask views.
  3. If joins are the bottleneck, apply targeted denormalization — not wholesale copy-paste.

Common patterns:

  • Derived columns (comments_count, likes_count) maintained in application logic or via DB triggers.
  • Cached JSON blobs (tags_json or author snapshot) for rendering in feeds.
  • Materialized views for expensive aggregated queries (refresh on schedule or with triggers).
  • Use partial denormalization + indexes for frequent filters.

Example: avoid hitting users table for every post in a feed by storing author_snapshot = {id, name, avatar_url} on Post.
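One way to keep a derived column like comments_count in sync is a pair of database triggers. A small sketch using SQLite trigger syntax via Python's sqlite3 (trigger and table names are illustrative; Postgres trigger syntax differs):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, comments_count INTEGER DEFAULT 0);
CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER REFERENCES posts(id), body TEXT);

-- Keep the derived column in sync on every insert and delete.
CREATE TRIGGER comments_count_ins AFTER INSERT ON comments BEGIN
    UPDATE posts SET comments_count = comments_count + 1 WHERE id = NEW.post_id;
END;
CREATE TRIGGER comments_count_del AFTER DELETE ON comments BEGIN
    UPDATE posts SET comments_count = comments_count - 1 WHERE id = OLD.post_id;
END;
""")

conn.execute("INSERT INTO posts (id, title) VALUES (1, 'Hello')")
conn.execute("INSERT INTO comments (post_id, body) VALUES (1, 'Nice!')")
conn.execute("INSERT INTO comments (post_id, body) VALUES (1, '+1')")
print(conn.execute("SELECT comments_count FROM posts WHERE id = 1").fetchone()[0])  # 2
```

Triggers keep the counter correct no matter which code path writes a comment; the trade-off is that the logic now lives in the database rather than in your Flask app.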


Safe denormalization: a checklist

  • Do you have metrics proving the join or aggregation is slow? (Yes → continue)
  • Can the denormalized value be reconstructed reliably? (e.g., tag names)
  • Who is responsible for updating the denormalized copy? (App code, DB trigger, background job?)
  • Have you added tests to ensure the derived fields stay consistent?
  • Have you handled migrations cleanly with Flask CLI + Flask-Migrate (Alembic)?

A typical strategy:

  1. Add new denormalized column via migration.
  2. Backfill it in a safe background job.
  3. Change reads to prefer the new column.
  4. Keep original normalized data for writes.
  5. Monitor for drift and remove redundant paths only when comfortable.
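Step 2 of that strategy can be sketched as a batched backfill. This illustration uses Python's sqlite3 and the tags_json column from earlier; the batch size, helper name, and sample data are assumptions for the demo, not part of the chapter's project:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, tags_json TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post_tags (post_id INTEGER, tag_id INTEGER);
INSERT INTO posts (id, title) VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO tags (id, name) VALUES (1, 'sql'), (2, 'flask');
INSERT INTO post_tags VALUES (1, 1), (1, 2), (2, 2);
""")

def backfill_tags_json(conn, batch=2):
    """Fill tags_json in small batches so transactions stay short on a live table."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id FROM posts WHERE id > ? AND tags_json IS NULL ORDER BY id LIMIT ?",
            (last_id, batch),
        ).fetchall()
        if not rows:
            break
        for (post_id,) in rows:
            names = [n for (n,) in conn.execute(
                "SELECT t.name FROM tags t JOIN post_tags pt ON pt.tag_id = t.id "
                "WHERE pt.post_id = ? ORDER BY t.name", (post_id,))]
            conn.execute("UPDATE posts SET tags_json = ? WHERE id = ?",
                         (json.dumps(names), post_id))
        conn.commit()  # one commit per batch, not per row
        last_id = rows[-1][0]

backfill_tags_json(conn)
print(conn.execute("SELECT tags_json FROM posts ORDER BY id").fetchall())
```

A real background job would add retry handling and a drift check, but the shape is the same: select a batch of rows missing the denormalized value, compute it from the normalized source of truth, write, commit, repeat.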

ORM tips: avoid the common traps

  • N+1 problem: Use eager loading (joinedload/selectinload in SQLAlchemy) before denormalizing blindly.
  • Association objects: If your join table has extra columns, keep it normalized and denormalize only aggregated info.
  • JSON fields: Great for flexible denorm, but indexes are limited — consider GIN indexes for JSONB (Postgres).

Code pattern (SQLAlchemy) to avoid N+1:

from sqlalchemy.orm import selectinload
posts = session.query(Post).options(selectinload(Post.tags), selectinload(Post.user)).limit(20).all()

If this still performs poorly for your feed, add a post_preview denormalized column and use it.


Alternatives to denormalization

  • Materialized views (Postgres): fast reads, refresh strategy required
  • Redis/memcached caching of rendered results (fast, but need eviction logic)
  • Read replicas: scale reads horizontally (adds operational complexity)
  • Indexing: sometimes the right index fixes the problem without denormalization
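The last point is worth proving to yourself before reaching for denormalization. A small sketch with Python's sqlite3 and EXPLAIN QUERY PLAN (index and column names are illustrative, and the exact plan wording varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)")

query = "SELECT title FROM posts WHERE user_id = ?"

# Without an index on user_id, SQLite falls back to a full table scan.
before = conn.execute("EXPLAIN QUERY PLAN " + query, (1,)).fetchall()[0][-1]
print(before)  # e.g. "SCAN posts"

# With the index, the same query becomes an index search; no schema change needed.
conn.execute("CREATE INDEX idx_posts_user_id ON posts (user_id)")
after = conn.execute("EXPLAIN QUERY PLAN " + query, (1,)).fetchall()[0][-1]
print(after)  # e.g. "SEARCH posts USING INDEX idx_posts_user_id (user_id=?)"
```

Postgres's EXPLAIN plays the same role for your production queries: if adding an index turns a sequential scan into an index scan, you may not need a denormalized column at all.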

Quick decision guide

  • Read-heavy + expensive joins + measurable latency → consider denormalization
  • Frequent writes to the duplicated data → lean toward normalization
  • Need transactional consistency at all times → normalization + DB features (FKs, constraints)

Key takeaways

  • Start with normalization for correctness and simplicity.
  • Measure before you denormalize. Evidence > intuition.
  • Denormalization is a tool: apply it narrowly and automate consistency (background jobs, triggers, or app code).
  • Combine denormalization with smart ORM usage (eager loading), indexing, and caching.

"Normalize to avoid the bugs you don’t want; denormalize to avoid the latency your users complain about. Be pragmatic, and let metrics be your referee."


Actionable next steps for your CS50 web project

  1. Profile the slow endpoint in your Flask app (log queries, use EXPLAIN).
  2. Try eager loading via SQLAlchemy (selectinload/joinedload).
  3. If still slow, add a migration with a denormalized column and backfill using Flask CLI + Flask-Migrate.
  4. Add tests ensuring denormalized fields update correctly.

Go forth and design schemas that are both tidy and furious (when speed matters).
