Data Modeling Fundamentals
Build robust and efficient semantic models: relationships, cardinality, star schemas, normalization considerations, and sizing strategies.
Content
Primary Keys and Lookup Tables
Versions:
Watch & Learn
Primary Keys and Lookup Tables — The One-True-Key Drama (but useful)
"A relationship without a primary key is like a blind date where nobody exchanged phone numbers." — Your future, less confused self
You've already been on the thrilling carousel of Power Query (cleaning, shaping, repeatable ETL), and you've seen why the star schema beats the snowflake for reporting speed and simplicity. You also just finished sparring with cardinality and cross-filter direction. Good — we’re past introductions and into the part where the model behaves like a grown-up: primary keys and lookup tables.
Why this matters now: if your facts can’t find their parents (lookup rows) via clean, unique keys, your relationships misfire, filters leak, and measures lie. Let’s make sure that doesn't happen.
TL;DR (Secret Sauce)
- Primary key: the column (or combination) in a lookup/dimension table that uniquely identifies each row. Must be unique and non-null. In Power BI, this is what fact tables point to in relationships.
- Lookup table (dimension): descriptive table (e.g., Customers, Products, Date) used to slice facts. One row per entity.
- Prefer single-column integer surrogate keys for performance and simplicity. Create them in Power Query during ETL.
What is a Primary Key, Really? (Keep it simple)
- Definition: A primary key is the unique identifier for each row in a lookup table. Think: CustomerID for customers, ProductID for products, DateKey for dates.
- Power BI implication: Relationships in the model are built between a column in the lookup (one side) and a column in the fact table (many side). The lookup column must represent each entity exactly once.
Key properties
- Uniqueness — no duplicates
- Not null — every row has a value
- Stable — values don’t change often
- Simple — single integer columns are ideal
Lookup Tables (The Responsible Parents)
Lookup tables (dimensions) hold attributes and are the surface for slicers and filters. They should be:
- At the correct grain: one row per distinct entity
- Wide, not deep: attributes, not transactional data
- Free of duplicate keys and null keys
Example: Sales (fact) references Customer(CustomerKey). Customer must have exactly one row per CustomerKey.
Natural Keys vs Surrogate Keys — The Family Feud
| Aspect | Natural Key | Surrogate Key (recommended) |
|---|---|---|
| Source | Real business value (e.g., SSN, SKU) | Artificial (auto-increment int, GUID) |
| Stability | Can change | Stable |
| Uniqueness enforcement | Source-dependent | You control it |
| Performance | Slower, larger | Faster, compact |
Conclusion: When possible, create an integer surrogate key in Power Query and use that as the primary key.
How to Create a Surrogate Primary Key in Power Query (M)
Use this in your ETL step — make the model clean before it reaches the data model.
// Add an index-based surrogate key
let
Source = YourSourceHere,
DistinctRows = Table.Distinct(Source, {"NaturalID"}),
AddKey = Table.AddIndexColumn(DistinctRows, "CustomerKey", 1, 1, Int64.Type)
in
AddKey
If you must compose a composite key (yikes), create a text combination and then hash/encode to an int if performance matters:
Table.AddColumn(tbl, "CompositeKey", each Text.Combine({[PartA], [PartB]}, "|"))
But prefer a single integer where possible.
Dealing with Duplicates, Nulls, and Granularity Mismatches
If Power BI refuses to create a one-to-many relationship and the lookup side shows duplicates:
- Check grain: Is the lookup table truly one row per entity? If not, aggregate or dedupe in Power Query (Table.Distinct or Group By).
- Check nulls: Replace nulls or filter them out. No nulls in keys.
- Check data types: Both columns must be the same data type (Text vs Whole Number).
- If domain necessitates multiple matches, consider a bridge table (many-to-many) or rethink schema.
Pro tip: If you find yourself enabling bidirectional cross-filtering to compensate for bad keys, stop and fix the keys instead.
Composite Keys & Many-to-Many — When Bad Meets Worse
- Composite keys (multiple columns) are allowed conceptually, but Power BI relationships are single-column. So you must create a composite column (concatenate) in both tables during ETL.
- True many-to-many relationships should be handled with a dedicated bridge table or by modeling the relationship as two one-to-many relationships (common lookup) so filters behave predictably.
Performance & Storage Notes (VertiPaq realities)
- Smaller, integer keys compress better in VertiPaq and improve model speed.
- Avoid GUIDs in huge tables — they don’t compress well and are slower.
- Keep lookup tables as small as possible and de-duped.
Practical Checklist (Designing Keys for Power BI)
- Create lookup tables at the correct grain (one row per entity).
- Generate a single-column integer surrogate key in Power Query.
- Ensure fact rows reference the same key (create/update in the ETL flow).
- Ensure same data types and no nulls.
- Avoid bidirectional filtering to patch bad keys.
- If duplicates exist, decide: dedupe, aggregate, or create a bridge.
Quick Troubleshooting Cheat Sheet
- Relationship grayed out / can’t create: check for duplicate values on lookup side.
- Aggregations not working as expected: check granularity/keys mismatch.
- Slicers showing unexpected values: verify lookup table is not missing rows or contains duplicates.
Tiny Real-World Example (Customer & Sales)
- Sales (fact): SaleID, SaleDate, CustomerKey, ProductKey, Amount
- Customer (lookup): CustomerKey, CustomerName, Region
Power Query should:
- Load distinct customers
- Add CustomerKey as index (surrogate)
- Replace missing natural IDs with placeholders or filter
- Ensure facts get the same CustomerKey value during ETL (merge step)
Final Pep Talk
Primary keys and lookup tables are the plumbing that makes your Power BI house not flood. Do the sane thing: build clean, unique, stable keys during Power Query ETL, prefer single integer surrogates, and keep your lookup tables one row per entity. Your measures will thank you. Your dashboards will be fast. And you will sleep better.
"If your model is a romance novel, primary keys are the phone numbers the characters exchange. Without them, nobody finds each other." — Also your future, slightly smug self
Key takeaways:
- Primary keys = uniqueness + stability
- Build keys in Power Query — don’t rely on ad-hoc fixes in the model
- Prefer integer surrogates; avoid GUIDs for big tables
Homework (5 minutes): Pick one lookup table in your PBIX. Does it have a single-row-per-entity grain and a stable key? If not, open Power Query and make one now. You're welcome.
Comments (0)
Please sign in to leave a comment.
No comments yet. Be the first to comment!