Power Query and Data Transformation
Master data ingestion and shaping using Power Query: cleaning, transforming, combining queries, and using M code for repeatable ETL.
Content
Splitting, Merging, and Appending Queries
Versions:
Watch & Learn
Splitting, Merging, and Appending Queries — Power Query Mastery (aka Data Surgery)
"Data is messy. Power Query is your scalpel. Try not to nick a cardinality." — Probably a wise BI TA
You're already off the starter pack: you know how to connect to data sources and how to coerce types and fix text/number/date transformations. Now we get into the surgical toolkit: splitting columns into useful pieces, merging datasets like a careful matchmaker, and appending tables into one glorious stack. These operations are the practical glue that turns raw sources into analysis-ready models.
Why this matters: Splitting makes fields analysable (first/last names, years, ZIP codes), merging brings related data together (customer records + transactions), and appending aggregates repeating sources (monthly files → one table). Do them well, and your visuals stop lying to you.
1) Split Columns — Chop with intention
Imagine you have a "FullName" column: "Alexandra Lopez". You can't pivot by first name if it's trapped in a single cell. Splitting extracts structure.
When to split:
- Delimiter-based data (names separated by spaces, addresses with commas)
- Fixed-width fields (legacy systems that cram 10 chars per field)
- Patterns (uppercase boundaries, digit vs non-digit)
Common methods in Power Query UI:
- Split by delimiter (space, comma, comma+space)
- Split by number of characters / positions
- Split into rows (turns one row into many)
- Split by lowercase-to-uppercase or by digit/non-digit (advanced)
Quick example (M):
Table.SplitColumn(
Source,
"FullName",
Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv),
{"FirstName","LastName"}
)
Pro tips:
- After splitting, set data types (remember Data Type Detection & Conversion). FirstName should be text; BirthYear should be Int32.
- Trim whitespace (
Transform -> Format -> Trim) before merging on split fields. - If you need only the last token (e.g., file extensions), choose "Split by delimiter -> Right-most delimiter".
2) Merge Queries — The joinery that makes insights possible
Merging in Power Query = SQL joins. But with more drama and a GUI.
Classic analogy: You have a transactions table (lots of rows) and a customers table (unique customers). Merge them to attach customer attributes to each transaction.
Join kinds (short table):
| Join kind | What it returns | Use when... |
|---|---|---|
| Left Outer | All rows from left, matching from right | You have a primary fact table and want lookup fields |
| Right Outer | All rows from right, matching from left | Rare in practice; swap tables instead |
| Inner | Only matching rows | You want only intersections (e.g., valid keys) |
| Full Outer | All rows from both | Reconcile two datasets to see mismatches |
| Anti (Left/Right) | Non-matching rows | Find or remove orphan records |
Power Query UI steps:
- Home -> Merge Queries (or Merge Queries as New)
- Select tables and matching columns (hold Ctrl for multi-column keys)
- Pick Join Kind
- Expand the new column to select fields to bring in
M example (M code):
Table.ExpandTableColumn(
Table.NestedJoin(
SalesTable,
{"CustomerID"},
CustomersTable,
{"CustomerID"},
"CustomerInfo",
JoinKind.LeftOuter
),
"CustomerInfo",
{"Name","Segment"},
{"CustomerName","CustomerSegment"}
)
Important nuances:
- Data types matter. If the customer ID in one table is text and in the other is number, the merge yields zero matches. Convert types first.
- Trim and clean key columns. Leading/trailing spaces and invisible characters are silent killers.
- Cardinality matters. Many-to-many merges can explode rows unexpectedly. Know which side is unique.
- Fuzzy merge exists when keys aren't exact (typos), but use sparingly — it’s computationally heavy and approximate.
3) Append Queries — Stack like pancakes
Appending is straightforward: you're stacking tables with the same shape. Use this for combining multiple period files or similar-shape datasets.
When to append:
- Monthly CSVs with same columns
- Split extracts across regions that should be one report
How-to:
- Home -> Append Queries (or Append as New)
- Choose two or three+ tables
M snippet:
Table.Combine({SalesJan, SalesFeb, SalesMar})
Checklist before appending:
- Column names should match exactly. If column order differs but names match, Power Query aligns by name.
- Types should be consistent (or re-typed after append). Prefer setting types after the final append to avoid repeated conversions.
- Remove leftover metadata columns from source files (like "Source.Name") if not needed.
Power tip: Use the built-in "Combine Files" experience when connecting to a folder — it automates the pattern: sample file -> transform -> apply to all files.
4) Quick comparisons: Split vs Merge vs Append
| Operation | Conceptual | Common outcome |
|---|---|---|
| Split | Break one column into many | More granular columns (e.g., FirstName, LastName) |
| Merge | Bring columns from another table | Enrich table with lookup attributes |
| Append | Stack tables vertically | One big table with more rows |
5) Workflow & Performance Best Practices
- Clean and filter early: remove unneeded rows/columns before heavy merges/appends.
- Detect or set data types after major combining steps (set types once at the end to avoid wasted computation).
- Use
Referencequeries instead of Duplicate when branching transformations — it keeps evaluations leaner. - Avoid merging on computed columns if you can create a stable key in the source. Query folding matters: when queries fold to the source, heavy operations can be pushed to the server.
- If merging many-to-many, consider aggregating one side first to control row explosion.
Common gotchas (and how to avoid crying):
- "Why am I getting no matches?" — data type mismatch or leftover spaces. Convert and Trim.
- "My append has mismatched columns." — rename columns consistently before append.
- "Merge exploded rows." — check cardinality: did you unknowingly merge a one-to-many on both sides?
- "Performance is slow." — filter before merging; minimize columns; consider incremental refresh for large datasets.
Closing — Key takeaways
- Split to expose structure. Merge to enrich. Append to consolidate.
- Always clean keys and align data types before merging or appending — you learned type hygiene earlier, remember?
- Use staging/reference queries, set types at the right time, and keep performance in mind.
Final lab assignment (tiny challenge):
- Connect to a folder of monthly sales CSVs (your "Connecting to Data Sources" skill).
- Use Combine Files to produce one table. Clean columns, split "FullName" into two, and merge customer demographics from a separate customers query.
- Verify joins by counting unmatched keys (Left Anti Join).
Do this well and your reports will stop telling fibs. Do this poorly and the slicers will shame you.
Version note: you just graduated from "I can connect to data" to "I can shape data like an artist with a very precise scalpel." Keep slicing.
Comments (0)
Please sign in to leave a comment.
No comments yet. Be the first to comment!