Power Query and Data Transformation
Master data ingestion and shaping using Power Query: cleaning, transforming, combining queries, and using M code for repeatable ETL.
Content
Power Query Editor Fundamentals
Versions:
Watch & Learn
Power Query Editor Fundamentals — the part of Power BI where your data gets a personality (and a haircut)
"You don't clean data because it's fun. You clean it because your dashboards deserve dignity." — probably a data hero
You're coming fresh from "Connecting to Data Sources" (nice work — you handled credentials, connectivity modes, and incremental loads without melting any neurons). Now the data is in Power BI, but it's raw: messy column names, inconsistent types, and dates that look like cryptic runes. Enter: Power Query Editor — a friendly, slightly dramatic workshop where we shape that unruly data into something trustworthy and fast.
What is Power Query Editor and why it matters (fast elevator pitch)
Power Query Editor is the visual environment inside Power BI Desktop where you transform, clean, and shape data before it becomes part of your reports. Think of it as the backstage crew for your dashboard's Broadway debut: trimming, fixing lighting, sometimes replacing the lead actor who shows up as "null."
This builds on your previous work with connectors: once you've authenticated and chosen the right connectivity mode, Power Query is where you apply the transformations that make the model reliable and performant.
The UI — the parts you’ll cozy up to
- Queries pane (left): Each source or transformation is a query.
- Data preview (center): Live snapshot of the current step.
- Applied Steps (right, in Query Settings): The step-by-step log of transformations.
- Formula bar: Shows the M code for the current step (enable via View > Formula Bar).
- Advanced Editor: See and edit the whole M script for a query.
Pro tip: Always keep the Formula Bar visible. It’s like seeing the receipts for what Power Query did.
The transformation workflow (the practical ritual)
- Connect (you did this — good)
- Inspect: Scan column headers, types, nulls
- Transform: Rename, change types, split columns, remove rows
- Shape: Pivot/unpivot, merge queries, group, aggregate
- Optimize: Enable query folding, disable load for intermediates, minimize steps
- Close & Apply
Ask at every step: Does this need to happen in Power Query? Can the source do it more efficiently (query folding)? If yes, let the source do the heavy lifting.
Key concepts, demystified (and meme-ready)
Applied Steps
Each action you take becomes a step. Steps are sequential and deterministic, which means: move one, and the rest keeps time like a polite clock.
- You can reorder some steps, but not always safely. Changing a type-conversion step before a split can break things.
Power Query M — the under-the-hood script
M is the functional language behind transformations. You don't have to be a poet, but knowing a few lines helps.
Example:
let
Source = Csv.Document(File.Contents("data/sales.csv"),[Delimiter=",", Columns=6, Encoding=1252, QuoteStyle=QuoteStyle.None]),
Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
ChangedTypes = Table.TransformColumnTypes(Promoted,{{"OrderDate", type date}, {"Sales", type number}})
in
ChangedTypes
Query Folding — the performance fairy
Query Folding means Power Query is smart enough to translate your transformations back to the source (e.g., SQL) so the heavy work happens in the database, not locally.
- Why care? Less data over the wire, faster refreshes, happier users.
- How to check? In Power Query, right-click a step and look for "View Native Query" (if enabled) — if it's available, folding is happening up to that step.
Common folding-breakers: custom functions, certain transforms (like adding index columns early), or operations that your source can't express.
Reference vs Duplicate (a table, because choices matter)
| Action | What it does | When to use it | Memory impact |
|---|---|---|---|
| Duplicate | Makes a full copy of the query with the same steps | When you want independent edits and both queries might diverge | Higher — duplicates repeat work unless folded |
| Reference | Creates a child query that points to the parent's final table | When you want a derived view (filter, aggregate) while keeping a single source of truth | Lower — more efficient, better for staging and folding |
Use Reference for most derived queries. Duplicate is for when you truly need a separate copy.
Practical transformations you'll use every other day
- Change Data Types: Always do this early. Bad types = bad aggregations.
- Split Columns: Delimiters, fixed width, or by positions.
- Merge Queries (Left/Right/Inner/Full): SQL-style joins for enriching tables.
- Group By: Summaries and aggregates.
- Pivot/Unpivot: Make tall data wide or wide data tall (unpivot is the unsung hero for analysis-ready tables).
- Replace Values / Remove Errors: Clean the cringe.
Ask: "Would a single SQL query on the source do this cheaper?" If yes, aim for folding.
Staging queries and disable load — the behind-the-scenes cleanup
- Create staging queries (Reference only) to centralize heavy transforms and preserve query folding.
- Right-click a query and choose Enable Load off for intermediate queries — prevents unnecessary tables from loading into the model.
Why? Fewer tables in the model = lower memory, faster refresh.
Advanced-ish tips for real-world reliability
- Use Parameters for server names, file paths, and date ranges — makes switching environments (dev → prod) easy.
- Be careful with locale and encoding for CSVs.
- Avoid heavy custom M if it prevents folding; prefer native SQL or source views if performance matters.
- Use Query Dependencies view to visualize how queries relate — it’s the family tree of your data transformations.
Quick diagnostic: If refresh is slow, check which step loses folding and whether you can push that transform to the source.
Closing: Your power moves as a Power Query user
- Always get types right early.
- Prefer Reference over Duplicate for derived tables.
- Keep an eye on Query Folding and push work to the source when possible (this complements your previous work on connectivity and incremental loads).
- Use Disable Load for staging queries — your model will thank you.
Final word: Power Query is where you translate messy reality into analytical truth. Be intentional: transform where necessary, fold where possible, and document your queries like you mean it.
Ready for the next step? In the next unit we'll build on this by exploring incremental refresh and diagnosistics within Power Query — how to make your refreshes both fast and reliable, even when your dataset behaves like a drama queen.
Version note: This guide assumed you've already connected to sources and considered connectivity modes; it's focused on transforming that data efficiently inside Power Query Editor.
Comments (0)
Please sign in to leave a comment.
No comments yet. Be the first to comment!