SQL and Data Warehousing
Querying data efficiently and designing analytical storage for scale.
Content
OLAP and OLTP
Versions:
Watch & Learn
AI-discovered learning video
Dashboards vs Checkouts: The No‑Chill Breakdown on OLAP vs OLTP
If you run your quarterly dashboard directly on the production database, somewhere a DBA loses a minute of sleep. And your checkout page loses a minute of uptime.
Welcome back to the data kitchen. You just learned how we extract/transform/load (ETL vs ELT) and how we model relationships with ER diagrams. That was about ingredients and recipes. Today we talk about the ovens: the systems that actually cook our data for two very different meals.
Those ovens are OLTP and OLAP. They sound like a sci‑fi buddy cop show, but they are very real and very opinionated about how you use them.
What Are We Even Solving?
Quick vibe check:
- You need to process thousands of tiny, real‑time updates without breaking a sweat. That is OLTP.
- You need to slice two years of sales by region, product, time, and promo code, preferably before your coffee gets cold. That is OLAP.
Think coffee shop:
- OLTP is the barista taking your order, charging your card, and updating the inventory for oat milk in milliseconds.
- OLAP is the operations team asking: which drinks sell out on rainy Tuesdays, and should we buy more syrups in Q4?
Two different workouts: one is sprints (OLTP), the other is marathons of analysis (OLAP).
Definitions Without the Fluff
OLTP (Online Transaction Processing)
- Purpose: record operational events in real time.
- Characteristics: many small writes, strict ACID transactions, normalized ER models (hello, you again), low latency, high concurrency.
- Examples: e‑commerce checkout, bank transfers, ride status updates, user sign‑ups, inventory decrement.
OLAP (Online Analytical Processing)
- Purpose: ask big, cross‑cutting questions for reporting, BI, and data science.
- Characteristics: read‑heavy, complex aggregations, columnar storage, star/snowflake schemas, batch/streamed loads, hours-to-seconds query speed.
- Examples: executive dashboards, cohort analysis, churn prediction features, MRR by segment, ML feature stores (read path).
Connect the dots:
- From ETL/ELT: OLTP systems are your sources; your warehouse/lakehouse is OLAP land where the data gets modeled for analytic joy.
- From ER modeling: OLTP loves 3rd Normal Form (reduce anomalies). OLAP loves dimensional models (facts and dimensions) to make GROUP BY therapy delightful.
Side‑by‑Side: What Changes Between OLTP and OLAP?
| Dimension | OLTP | OLAP |
|---|---|---|
| Primary goal | Execute transactions fast and safely | Explore, aggregate, and analyze at scale |
| Workload | Many small reads/writes | Few but heavy read queries |
| Data model | Normalized ER, lots of tables and keys | Star/snowflake, facts and dimensions |
| Storage | Row‑oriented, indexes for point lookups | Columnar, compression, partitions |
| Concurrency | Thousands of concurrent users | Dozens to hundreds of analysts/jobs |
| Latency | Milliseconds | Seconds to minutes |
| Transactions | ACID, strict | Often read‑only; snapshot isolation; batch writes |
| Examples | Orders, payments, profile updates | Revenue by month, funnel analysis, forecast models |
Shortcut: OLTP answers "what just happened?" OLAP answers "what does it mean and what should we do?"
Architecture in One Breath
Operational apps write to OLTP databases. Change data capture (CDC) or scheduled jobs move that data (ETL/ELT) into a warehouse or lakehouse. We remodel it into dimensional tables. Then BI tools, notebooks, and ML pipelines ask big questions without elbowing the checkout line.
App -> OLTP DB --(CDC/Batch)--> Staging -> Transform -> Warehouse (OLAP) -> Dashboards/ML
Modeling: From ER to Stars
OLTP (ER‑style)
- Keeps data consistent when many users update at once.
- Tables minimize redundancy; lots of FKs.
Example schema (tiny slice):
customers(id PK, email, created_at)
orders(id PK, customer_id FK, total_amount, status, created_at)
order_items(id PK, order_id FK, product_id FK, qty, unit_price)
A transaction might look like:
BEGIN;
UPDATE products SET stock = stock - 1 WHERE id = 42 AND stock > 0;
INSERT INTO orders(customer_id, total_amount, status)
VALUES (123, 19.99, 'paid');
COMMIT;
This is the "do not double‑charge me" zone. ACID or chaos.
OLAP (Dimensional)
- Reshapes into a fact table (measurable events) plus dimensions (who/what/when/where) for fast joins and aggregations.
Example star:
fact_sales(order_id, date_key, product_key, customer_key, qty, amount)
dim_date(date_key PK, date, month, quarter, year)
dim_product(product_key PK, sku, category, brand)
dim_customer(customer_key PK, segment, region)
Now your analytics query sings:
SELECT d.year, p.category, SUM(f.amount) AS revenue
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.year, p.category
ORDER BY d.year, p.category;
Want more flair? Rollups and windows:
-- Yearly totals plus subtotals by category
SELECT d.year, p.category, SUM(f.amount) AS revenue
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY ROLLUP (d.year, p.category);
-- Monthly revenue and 3-month moving average
WITH monthly AS (
SELECT d.year, d.month, SUM(f.amount) AS rev
FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
GROUP BY d.year, d.month
)
SELECT *, AVG(rev) OVER (ORDER BY year, month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS ma3
FROM monthly;
Performance Moves: Different Gym Routines
OLTP
- Index narrow, selective columns for point lookups (e.g., orders by id).
- Keep transactions short; avoid long‑running reads that hold locks.
- Normalize to reduce write amplification and anomalies.
OLAP
- Use columnar storage and compression (Snowflake, BigQuery, Redshift, DuckDB, Parquet files).
- Partition and cluster by date or high‑cardinality dimensions to prune I/O.
- Materialize common aggregates; use summary tables or materialized views.
- Embrace snapshot isolation to let readers ignore in‑flight loads.
OLTP indexes are scalpel sharp. OLAP partitions are bulldozers. Choose your tool accordingly.
Anti‑Patterns (A Short Horror Anthology)
- Running a 50‑way JOIN with window functions on the production OLTP DB at 9 AM. Users will feel that.
- Using the warehouse as the source of truth for app features that require immediate consistency.
- Skipping dimensional modeling and throwing analysts at a swamp of 3NF tables. Yes, they can JOIN; no, they will not be happy.
- Doing ETL transforms on the OLTP DB during peak hours. Your lock graphs do not need more drama.
The Grey Area: HTAP and Friends
Hybrid transactional/analytical processing (HTAP) claims you can have one engine to rule them all. Modern systems blur lines:
- Postgres with logical replication + read replicas for light analytics
- MySQL HeatWave, SingleStore, TiDB, AlloyDB, Snowflake Unistore
- Lakehouse patterns (Delta/Iceberg/Hudi) with BI on top
These are awesome, but the guiding principle still holds: isolate noisy analytic scans from latency‑sensitive transactions, whether via separate clusters, replicas, or workload isolation.
You can share a kitchen, but give the marathoner and the sprinter separate lanes.
Choosing Your Strategy: Five Questions
- Who is the user? App customers (OLTP) or analysts/ML/execs (OLAP)?
- What is the latency budget? Milliseconds vs seconds/minutes.
- What is the access pattern? Point lookups/updates vs scans/aggregations.
- How fresh must analytics be? Real‑time (consider CDC + streaming) vs hourly/daily.
- How complex are the joins? If you regularly do 10+ dimension joins, star schemas will save your soul.
From Data Wrangling to Warehouse Zen
- You wrangled data to make it clean and consistent.
- ETL/ELT moves that goodness into the right place at the right time.
- OLTP keeps the real world consistent; OLAP turns that reality into insight.
This is the logical progression: acquire → clean → model → move → optimize for the right workload.
Quick Cheatsheet
OLTP
- Optimize for write integrity and low latency.
- Normalize; index surgically; keep transactions tiny.
- Protect it from heavy analytical queries. Use replicas if needed.
OLAP
- Optimize for scans and aggregations.
- Model with facts/dimensions; use columnar storage, partitions, and materialized views.
- Schedule loads (or CDC) and design for idempotency.
If your query needs a transaction, it belongs on OLTP. If your query needs a coffee, it belongs on OLAP.
Final Takeaway
You would not measure marathon splits on a cashier’s barcode scanner. Do not force OLTP to be a warehouse, and do not make OLAP pretend to be a point‑of‑sale.
Build clear lanes: OLTP for reality, OLAP for meaning. The moment you separate them and let ETL/ELT be the bridge, your dashboards stop breaking your checkouts, your ML features stop starving for joins, and your team stops arguing with the DBA in all caps.
Action step: list your top three recurring queries this week. Label each OLTP or OLAP. Move each to the right engine. Watch your system — and your blood pressure — drop to normal.
Comments (0)
Please sign in to leave a comment.
No comments yet. Be the first to comment!