Courses/Power BI/Power Query and Data Transformation

Power Query and Data Transformation

723 views

Master data ingestion and shaping using Power Query: cleaning, transforming, combining queries, and using M code for repeatable ETL.

Content

1 of 10

Power Query Editor Fundamentals

Power Query Editor: Sassy Fundamentals

183 views

beginner

humorous

visual

data

gpt-5-mini

183 views

Versions:

Power Query Editor: Sassy Fundamentals

Watch & Learn

AI-discovered learning video

YouTube

Start learning for free

Bookmark content and pick up later
AI-generated study materials
Flashcards, timelines, and more
Progress tracking and certificates

Free to join · No credit card required

Power Query Editor Fundamentals — the part of Power BI where your data gets a personality (and a haircut)

"You don't clean data because it's fun. You clean it because your dashboards deserve dignity." — probably a data hero

You're coming fresh from "Connecting to Data Sources" (nice work — you handled credentials, connectivity modes, and incremental loads without melting any neurons). Now the data is in Power BI, but it's raw: messy column names, inconsistent types, and dates that look like cryptic runes. Enter: Power Query Editor — a friendly, slightly dramatic workshop where we shape that unruly data into something trustworthy and fast.

What is Power Query Editor and why it matters (fast elevator pitch)

Power Query Editor is the visual environment inside Power BI Desktop where you transform, clean, and shape data before it becomes part of your reports. Think of it as the backstage crew for your dashboard's Broadway debut: trimming, fixing lighting, sometimes replacing the lead actor who shows up as "null."

This builds on your previous work with connectors: once you've authenticated and chosen the right connectivity mode, Power Query is where you apply the transformations that make the model reliable and performant.

The UI — the parts you’ll cozy up to

Queries pane (left): Each source or transformation is a query.
Data preview (center): Live snapshot of the current step.
Applied Steps (right, in Query Settings): The step-by-step log of transformations.
Formula bar: Shows the M code for the current step (enable via View > Formula Bar).
Advanced Editor: See and edit the whole M script for a query.

Pro tip: Always keep the Formula Bar visible. It’s like seeing the receipts for what Power Query did.

The transformation workflow (the practical ritual)

Connect (you did this — good)
Inspect: Scan column headers, types, nulls
Transform: Rename, change types, split columns, remove rows
Shape: Pivot/unpivot, merge queries, group, aggregate
Optimize: Enable query folding, disable load for intermediates, minimize steps
Close & Apply

Ask at every step: Does this need to happen in Power Query? Can the source do it more efficiently (query folding)? If yes, let the source do the heavy lifting.

Key concepts, demystified (and meme-ready)

Applied Steps

Each action you take becomes a step. Steps are sequential and deterministic, which means: move one, and the rest keeps time like a polite clock.

You can reorder some steps, but not always safely. Changing a type-conversion step before a split can break things.

Power Query M — the under-the-hood script

M is the functional language behind transformations. You don't have to be a poet, but knowing a few lines helps.

Example:

let
  Source = Csv.Document(File.Contents("data/sales.csv"),[Delimiter=",", Columns=6, Encoding=1252, QuoteStyle=QuoteStyle.None]),
  Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
  ChangedTypes = Table.TransformColumnTypes(Promoted,{{"OrderDate", type date}, {"Sales", type number}})
in
  ChangedTypes

Query Folding — the performance fairy

Query Folding means Power Query is smart enough to translate your transformations back to the source (e.g., SQL) so the heavy work happens in the database, not locally.

Why care? Less data over the wire, faster refreshes, happier users.
How to check? In Power Query, right-click a step and look for "View Native Query" (if enabled) — if it's available, folding is happening up to that step.

Common folding-breakers: custom functions, certain transforms (like adding index columns early), or operations that your source can't express.

Reference vs Duplicate (a table, because choices matter)

Action	What it does	When to use it	Memory impact
Duplicate	Makes a full copy of the query with the same steps	When you want independent edits and both queries might diverge	Higher — duplicates repeat work unless folded
Reference	Creates a child query that points to the parent's final table	When you want a derived view (filter, aggregate) while keeping a single source of truth	Lower — more efficient, better for staging and folding

Use Reference for most derived queries. Duplicate is for when you truly need a separate copy.

Practical transformations you'll use every other day

Change Data Types: Always do this early. Bad types = bad aggregations.
Split Columns: Delimiters, fixed width, or by positions.
Merge Queries (Left/Right/Inner/Full): SQL-style joins for enriching tables.
Group By: Summaries and aggregates.
Pivot/Unpivot: Make tall data wide or wide data tall (unpivot is the unsung hero for analysis-ready tables).
Replace Values / Remove Errors: Clean the cringe.

Ask: "Would a single SQL query on the source do this cheaper?" If yes, aim for folding.

Staging queries and disable load — the behind-the-scenes cleanup

Create staging queries (Reference only) to centralize heavy transforms and preserve query folding.
Right-click a query and choose Enable Load off for intermediate queries — prevents unnecessary tables from loading into the model.

Why? Fewer tables in the model = lower memory, faster refresh.

Advanced-ish tips for real-world reliability

Use Parameters for server names, file paths, and date ranges — makes switching environments (dev → prod) easy.
Be careful with locale and encoding for CSVs.
Avoid heavy custom M if it prevents folding; prefer native SQL or source views if performance matters.
Use Query Dependencies view to visualize how queries relate — it’s the family tree of your data transformations.

Quick diagnostic: If refresh is slow, check which step loses folding and whether you can push that transform to the source.

Closing: Your power moves as a Power Query user

Always get types right early.
Prefer Reference over Duplicate for derived tables.
Keep an eye on Query Folding and push work to the source when possible (this complements your previous work on connectivity and incremental loads).
Use Disable Load for staging queries — your model will thank you.

Final word: Power Query is where you translate messy reality into analytical truth. Be intentional: transform where necessary, fold where possible, and document your queries like you mean it.

Ready for the next step? In the next unit we'll build on this by exploring incremental refresh and diagnosistics within Power Query — how to make your refreshes both fast and reliable, even when your dataset behaves like a drama queen.

Version note: This guide assumed you've already connected to sources and considered connectivity modes; it's focused on transforming that data efficiently inside Power Query Editor.

Flashcards

Mind Map

Speed Challenge

Comments (0)

Please sign in to leave a comment.

No comments yet. Be the first to comment!

Ready to practice?

Study with flashcards, timelines, and more

Earn certificates for completed courses

Bookmark content for later reference

Track your progress across all topics