Numerical Computing with NumPy
Leverage NumPy for fast array programming, broadcasting, vectorization, and linear algebra operations.
NumPy Stacking and Splitting — Glue and Scissors for Arrays
"Think of arrays as LEGO plates and bricks. Stacking is how you build towers; splitting is how you pry them apart without losing your mind."
You've already learned how to reshape, transpose, and reduce arrays — so you know how to change the shape and summarize arrays. Now we’re learning how to combine and separate them cleanly. This is the practical next step when you move from toy arrays to real data pipelines: joining sensor streams, merging feature columns, or slicing up predictions into batches.
Why stacking and splitting matter (in one dramatic sentence)
When you're prepping data or assembling model inputs, you need to combine arrays in precise ways (rows vs columns, new axes vs existing axes). When postprocessing or sharding work, you need to split arrays predictably and efficiently. Get these right and your code becomes readable, fast, and easy for future-you to debug.
The basic verbs: concatenate, stack, vstack/hstack/column_stack, split, array_split
1) concatenate — glue along an existing axis
- Use when you want to join arrays along one of their existing dimensions.
- Requirement: arrays must match in all dimensions except the one you concatenate over.
Example (1D):
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.concatenate((a, b)) # -> array([1,2,3,4,5,6]) shape (6,)
Example (2D):
A = np.array([[1,2],[3,4]])
B = np.array([[5,6],[7,8]])
np.concatenate((A, B), axis=0) # stacks rows -> shape (4,2)
np.concatenate((A, B), axis=1) # stacks columns -> shape (2,4)
Analogy: concatenate is gluing tiles edge-to-edge along a chosen seam.
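To make the shape requirement concrete, here is a minimal sketch. The arrays C and D and the flag mismatch_raised are illustrative names, not part of the lesson. It shows that a block with the wrong number of columns is rejected, while a block with matching columns is appended as a new row.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])   # shape (2, 2)
C = np.array([[5, 6, 7]])        # shape (1, 3): column count differs from A

# Joining along axis=0 needs the same number of columns, so this fails.
try:
    np.concatenate((A, C), axis=0)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# A (1, 2) block matches A in every dimension except axis 0, so it is accepted.
D = np.array([[5, 6]])
rows = np.concatenate((A, D), axis=0)

print(mismatch_raised, rows.shape)   # True (3, 2)
```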
2) stack — add a new axis and bundle arrays into a single higher-dim array
- Use when you want a new dimension that groups the arrays.
- stack creates a new axis; shapes must be exactly equal.
# 1D arrays stacked -> 2D
np.stack((a, b), axis=0) # shape (2, 3) (two rows)
np.stack((a, b), axis=1) # shape (3, 2) (two columns)
# 2D arrays stacked -> 3D
np.stack((A, B), axis=0) # shape (2, 2, 2) e.g., two 'layers'
Insight: stack is like adding a new book onto a bookshelf dimension — each book preserves its pages (the inner dimensions).
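One way to see what the new axis means is to build the same result by hand. The sketch below uses illustrative variable names (stacked, manual, last). It gives each 1D array a leading axis and concatenates, which should match np.stack with axis=0. It also shows axis=-1, which places the new axis last.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

stacked = np.stack((a, b), axis=0)   # shape (2, 3)

# Same result by hand: add a leading axis to each array, then concatenate.
manual = np.concatenate((a[np.newaxis, :], b[np.newaxis, :]), axis=0)

# axis=-1 puts the new axis last, so each input ends up as a column.
last = np.stack((a, b), axis=-1)     # shape (3, 2)

print(np.array_equal(stacked, manual), last.shape)   # True (3, 2)
```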
3) vstack / hstack / column_stack — convenience wrappers
- vstack((a,b)) is like np.concatenate((a,b), axis=0), but it first turns 1D arrays of shape (N,) into row vectors of shape (1, N).
- hstack((a,b)) concatenates along axis=1 for 2D (and higher) inputs; for 1D inputs it joins along axis=0, like a plain concatenate.
- column_stack stacks 1D arrays as columns (equivalent to np.stack(arrs, axis=1) for 1D inputs); 2D inputs are joined as they are.
np.vstack((a, b)) # shape (2,3) from 1D inputs -> two rows
np.column_stack((a, b)) # shape (3,2) -> columns
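Because the helpers treat 1D and 2D inputs differently, it can help to see the resulting shapes side by side. This is a small sketch; the shapes dictionary is just an illustrative way to collect the results.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

shapes = {
    "hstack 1D": np.hstack((a, b)).shape,              # (6,): joined end to end
    "hstack 2D": np.hstack((A, B)).shape,              # (2, 4): joined along axis=1
    "vstack 1D": np.vstack((a, b)).shape,              # (2, 3): each vector becomes a row
    "vstack 2D": np.vstack((A, B)).shape,              # (4, 2): joined along axis=0
    "column_stack 1D": np.column_stack((a, b)).shape,  # (3, 2): each vector becomes a column
}
for name, shape in shapes.items():
    print(name, shape)
```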
Splitting: np.split, np.hsplit, np.vsplit, np.array_split
- np.split(array, indices_or_sections, axis=0) divides an array into sub-arrays. An integer must divide the axis length evenly (otherwise NumPy raises a ValueError); a list of indices gives the cut points instead.
- np.array_split allows uneven splits (useful when length isn't divisible).
- np.hsplit, np.vsplit, np.dsplit are convenient variants for horizontal/vertical/depth splits.
Examples:
M = np.arange(12).reshape(3,4)
# M = [[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9,10,11]]
np.split(M, 3, axis=0) # -> 3 arrays, each shape (1,4) (views)
np.hsplit(M, 2) # -> 2 arrays, each shape (3,2)
np.array_split(M, 3, axis=1) # -> 3 arrays with shapes (3,2), (3,1), (3,1); the leading pieces take the extra columns
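The indices form of np.split and the even-division rule are easy to mix up, so here is a short sketch of both. The names left, middle, right, upper, lower, and uneven_raised are illustrative.

```python
import numpy as np

M = np.arange(12).reshape(3, 4)

# A list of indices marks the cut points: columns [:1], [1:3], and [3:].
left, middle, right = np.split(M, [1, 3], axis=1)
print(left.shape, middle.shape, right.shape)    # (3, 1) (3, 2) (3, 1)

# An integer section count must divide the axis length exactly.
try:
    np.split(M, 2, axis=0)   # 3 rows cannot form 2 equal parts
    uneven_raised = False
except ValueError:
    uneven_raised = True

# array_split accepts the same request and gives the earlier piece the extra row.
upper, lower = np.array_split(M, 2, axis=0)
print(uneven_raised, upper.shape, lower.shape)  # True (2, 4) (1, 4)
```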
Important: split returns views (slices) into the original array when possible — no copy. stack/concatenate always produce a new array (copy).
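The view-versus-copy distinction can be checked directly with np.shares_memory. In this sketch the names top, rest, and joined are illustrative. Writing through a split piece changes the original array, while writing to a concatenated result does not.

```python
import numpy as np

M = np.arange(12).reshape(3, 4)

# Split off the first row; both pieces are views into M's data.
top, rest = np.split(M, [1], axis=0)
print(np.shares_memory(top, M))      # True

top[0, 0] = 99                       # writes through to M
print(M[0, 0])                       # 99

# concatenate allocates fresh memory, so edits stay local to the result.
joined = np.concatenate((top, rest), axis=0)
print(np.shares_memory(joined, M))   # False

joined[0, 0] = -1
print(M[0, 0])                       # still 99
```

If a piece needs to be modified independently of the source, calling .copy() on it is the usual way to detach it.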
Rules of the road (common pitfalls)
- Shape compatibility: for concatenate, all dims except the concatenation axis must match. For stack, shapes must match exactly.
- Axis surprises: axis=0 is rows for 2D arrays. For 1D arrays, stacking often adds a dimension — be explicit with np.newaxis (a[:, None]).
- Copies vs views: splitting often returns views; stacking/concatenate creates new arrays — watch memory.
- Use array_split when partition sizes aren't even.
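The 1D pitfall shows up most often when attaching a vector to a matrix. A minimal sketch with illustrative names (X, v, with_col, with_row): concatenate refuses to mix a 2D and a 1D array, and adding the axis explicitly says whether the vector is meant as a column or as a row.

```python
import numpy as np

X = np.zeros((3, 2))
v = np.array([1.0, 2.0, 3.0])        # shape (3,), neither a row nor a column yet

# Mixing a 2D and a 1D array in concatenate is an error.
try:
    np.concatenate((X, v), axis=1)
    ndim_raised = False
except ValueError:
    ndim_raised = True

# v[:, np.newaxis] has shape (3, 1), so it can be appended as a column.
with_col = np.concatenate((X, v[:, np.newaxis]), axis=1)

# A length-2 vector given a leading axis has shape (1, 2): a new row.
r = np.array([9.0, 9.0])
with_row = np.concatenate((X, r[np.newaxis, :]), axis=0)

print(ndim_raised, with_col.shape, with_row.shape)   # True (3, 3) (4, 2)
```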
Why do people keep misunderstanding this?
Because 1D vs 2D inputs change how helpers behave (vstack, hstack, column_stack) and because axis numbering is zero-based. Always print shapes after stacking/splitting while you're learning.
Walkthroughs (practical recipes)
Recipe A — Build a batch matrix from multiple 1D feature vectors
vectors = [np.random.rand(10) for _ in range(100)]
# Option 1: stack into shape (100, 10)
batch = np.stack(vectors, axis=0)
# Option 2: using vstack
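batch2 = np.vstack(vectors) # vstack already promotes each 1D vector to a row, so no np.newaxis is needed
Both options give the same (100, 10) array; np.stack states the intent (a new batch axis) more directly.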
Recipe B — Concatenate time-series horizontally to form a feature matrix
x = np.random.rand(100, 5) # 100 samples, 5 features
y = np.random.rand(100, 2) # 100 samples, 2 extra features
features = np.concatenate((x, y), axis=1) # -> shape (100, 7)
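Stacking and splitting are inverses here, which gives a simple sanity check for a pipeline like this. The sketch below assumes a seeded generator (np.random.default_rng(0), chosen only for reproducibility) and uses the illustrative names x_back and y_back. It splits the feature matrix after column 5 to recover the two blocks.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only so the example is reproducible
x = rng.random((100, 5))         # 100 samples, 5 features
y = rng.random((100, 2))         # 100 samples, 2 extra features

features = np.concatenate((x, y), axis=1)   # shape (100, 7)

# Cutting after column 5 recovers the original two blocks.
x_back, y_back = np.split(features, [5], axis=1)

print(features.shape, x_back.shape, y_back.shape)   # (100, 7) (100, 5) (100, 2)
print(np.array_equal(x_back, x), np.array_equal(y_back, y))   # True True
```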
Recipe C — Split dataset into minibatches (may return views)
data = np.arange(1000).reshape(100, 10)
batches = np.array_split(data, 20, axis=0) # 20 batches of exactly 5 rows here (100 / 20)
# batches is a list of arrays; each one is a view, so modifying a batch in place also changes 'data'
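When the row count does not divide evenly, array_split gives the first few batches one extra row. The sketch below uses a hypothetical 103-row dataset to show the batch sizes, and then shows that writing to a batch reaches the source array unless the batch is copied first. The names sizes and safe are illustrative.

```python
import numpy as np

data = np.arange(1030).reshape(103, 10)      # 103 rows: not divisible by 20
batches = np.array_split(data, 20, axis=0)

# 103 = 20 * 5 + 3, so the first 3 batches get 6 rows and the other 17 get 5.
sizes = [len(b) for b in batches]
print(sizes[:4], sum(sizes))                 # [6, 6, 6, 5] 103

# Each batch is a view: an in-place write shows up in data.
batches[0][0, 0] = -1
print(data[0, 0])                            # -1

# Copy the batches when they must be independent of the source.
safe = [b.copy() for b in batches]
safe[0][0, 0] = 0
print(data[0, 0])                            # still -1
```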
Performance note (because you will care)
- Frequent stacking in a loop is expensive because each stack creates a new array. Prefer collecting arrays in a Python list and calling one np.concatenate or np.stack once at the end.
Bad:
out = np.empty((0, 10))
for chunk in chunks:
    out = np.vstack((out, chunk)) # repeated copying — slow
Good:
pieces = list(chunks)
out = np.vstack(pieces) # single allocation
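The sketch below puts both patterns next to each other on hypothetical data (50 chunks of shape (4, 10)) and confirms they produce the same array. It also shows a third option that is not in the lesson above, offered as an assumption-labeled alternative: when the total size is known up front, preallocate the output and fill slices. No timings are claimed here; timeit on your own data is the reliable way to compare.

```python
import numpy as np

# Hypothetical input: 50 chunks, each of shape (4, 10).
chunks = [np.full((4, 10), i, dtype=float) for i in range(50)]

# Pattern to avoid: every iteration copies everything accumulated so far.
grown = np.empty((0, 10))
for chunk in chunks:
    grown = np.vstack((grown, chunk))

# Preferred: one call, one output allocation.
once = np.vstack(chunks)

# Alternative (assumes total size is known in advance): preallocate, then fill.
total_rows = sum(len(c) for c in chunks)
filled = np.empty((total_rows, 10))
row = 0
for chunk in chunks:
    filled[row:row + len(chunk)] = chunk
    row += len(chunk)

print(once.shape, np.array_equal(grown, once), np.array_equal(filled, once))
# (200, 10) True True
```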
Quick reference table (mental cheat-sheet)
- concatenate: join along existing axis — arrays must match other dims
- stack: add a new axis and combine — shapes must match
- vstack/hstack: convenience for common vertical/horizontal stacking
- split/hsplit/vsplit: cut arrays into pieces (split returns views when possible)
- array_split: like split but allows uneven pieces
Key takeaways
- Stack = create a new axis (higher dimensional grouping). Use np.stack when you need a new dimension.
- Concatenate = join along an existing axis (no new axis). Use np.concatenate to append rows/columns.
- Split = efficient slicing into pieces (often views). Use array_split when sizes don't divide cleanly.
- Collect arrays then do one concatenate/stack — avoid repeated stacking in loops.
"Stack when you want a new perspective (axis). Concatenate when you want to extend the same dimension. Split when you need parts."