Numerical Computing with NumPy

Leverage NumPy for fast array programming, broadcasting, vectorization, and linear algebra operations.

Stacking and Splitting

NumPy Stacking and Splitting Explained (With Examples)

NumPy Stacking and Splitting — Glue and Scissors for Arrays

"Think of arrays as LEGO plates and bricks. Stacking is how you build towers; splitting is how you pry them apart without losing your mind."

You've already learned how to reshape, transpose, and reduce arrays, so you know how to change an array's shape and summarize its contents. Now we're learning how to combine and separate arrays cleanly. This is the practical next step when you move from toy arrays to real data pipelines: joining sensor streams, merging feature columns, or slicing up predictions into batches.

Why stacking and splitting matter (in one dramatic sentence)

When you're prepping data or assembling model inputs, you need to combine arrays in precise ways (rows vs columns, new axes vs existing axes). When postprocessing or sharding work, you need to split arrays predictably and efficiently. Get these right and your code becomes readable, fast, and Google-searchable by future-you.

The basic verbs: concatenate, stack, vstack/hstack/column_stack, split, array_split

1) concatenate — glue along an existing axis

Use when you want to join arrays along one of their existing dimensions.
Requirement: arrays must match in all dimensions except the one you concatenate over.

Example (1D):

import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.concatenate((a, b))  # -> array([1,2,3,4,5,6]) shape (6,)

Example (2D):

A = np.array([[1,2],[3,4]])
B = np.array([[5,6],[7,8]])
np.concatenate((A, B), axis=0)  # stacks rows -> shape (4,2)
np.concatenate((A, B), axis=1)  # stacks columns -> shape (2,4)

Analogy: concatenate is gluing tiles edge-to-edge along a chosen seam.
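
To make the shape requirement concrete, here is a minimal sketch of what happens when the non-joined dimensions disagree. The arrays C and D are made up for illustration and are not part of the lesson's running example.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])        # shape (2, 2)
C = np.array([[5, 6, 7]])             # shape (1, 3), hypothetical mismatched array

# Joining along axis=0 needs matching column counts (2 vs 3), so this fails.
try:
    np.concatenate((A, C), axis=0)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# A (1, 2) array agrees with A in every dimension except axis 0, so it joins fine.
D = np.array([[5, 6]])                # shape (1, 2)
joined = np.concatenate((A, D), axis=0)

print(mismatch_raised, joined.shape)  # True (3, 2)
```

Only the length along the joined axis is allowed to differ; everything else has to line up like tile edges.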

2) stack — add a new axis and bundle arrays into a single higher-dim array

Use when you want a new dimension that groups the arrays.
stack creates a new axis; shapes must be exactly equal.

# 1D arrays stacked -> 2D
np.stack((a, b), axis=0)  # shape (2, 3)  (two rows)
np.stack((a, b), axis=1)  # shape (3, 2)  (two columns)

# 2D arrays stacked -> 3D
np.stack((A, B), axis=0)  # shape (2, 2, 2)  e.g., two 'layers'

Insight: stack is like adding a new book onto a bookshelf dimension — each book preserves its pages (the inner dimensions).
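
A short sketch of how the position of the new axis changes the result, and how that differs from concatenate. The arrays P and Q are illustrative stand-ins with a non-square shape so the axes are easy to tell apart.

```python
import numpy as np

P = np.zeros((2, 3))   # hypothetical inputs, both shape (2, 3)
Q = np.ones((2, 3))

stacked_first = np.stack((P, Q), axis=0)    # new axis in front
stacked_last = np.stack((P, Q), axis=-1)    # new axis at the end
glued = np.concatenate((P, Q), axis=0)      # no new axis, just more rows

print(stacked_first.shape, stacked_last.shape, glued.shape)
# (2, 2, 3) (2, 3, 2) (4, 3)

# Indexing along the new axis hands back each input unchanged.
first_is_P = np.array_equal(stacked_first[0], P)
last_is_Q = np.array_equal(stacked_last[..., 1], Q)
print(first_is_P, last_is_Q)                # True True
```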

3) vstack / hstack / column_stack — convenience wrappers

vstack((a,b)) is like np.concatenate((a,b), axis=0) but will first turn 1D arrays into row vectors.
hstack((a,b)) concatenates along axis=1 for inputs with 2 or more dimensions; for 1D inputs it concatenates along axis 0, the same as np.concatenate((a, b)).
column_stack stacks 1D arrays as columns (equivalent to np.stack(arrs, axis=1) for 1D inputs).

np.vstack((a, b))        # shape (2,3) from 1D inputs -> two rows
np.column_stack((a, b)) # shape (3,2) -> columns
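
One way to check the "convenience wrapper" claim for 1D inputs is to compare each helper against the explicit call it stands in for. This sketch reuses the a and b arrays from the earlier examples.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

rows = np.vstack((a, b))          # shape (2, 3)
cols = np.column_stack((a, b))    # shape (3, 2)
flat = np.hstack((a, b))          # shape (6,)

# vstack on 1D inputs: promote each to a row, then concatenate along axis 0.
explicit_rows = np.concatenate((a[np.newaxis, :], b[np.newaxis, :]), axis=0)

same_rows = np.array_equal(rows, explicit_rows)
same_cols = np.array_equal(cols, np.stack((a, b), axis=1))
same_flat = np.array_equal(flat, np.concatenate((a, b)))

print(same_rows, same_cols, same_flat)   # True True True
```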

Splitting: np.split, np.hsplit, np.vsplit, np.array_split

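np.split(array, indices_or_sections, axis=0) divides an array into sub-arrays. An integer asks for that many equal pieces and must divide the axis length evenly, otherwise np.split raises a ValueError; a list of indices gives the cut points instead.
np.array_split takes the same arguments but allows uneven splits (useful when the length isn't divisible).
np.hsplit, np.vsplit, and np.dsplit are convenient variants that split along axis 1, axis 0, and axis 2 respectively (np.hsplit uses axis 0 for 1D input).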

Examples:

M = np.arange(12).reshape(3,4)
# M = [[ 0, 1, 2, 3],
#      [ 4, 5, 6, 7],
#      [ 8, 9,10,11]]

np.split(M, 3, axis=0)   # -> 3 arrays, each shape (1,4) (views)
np.hsplit(M, 2)          # -> 2 arrays, each shape (3,2)
np.array_split(M, 4, axis=1)  # -> 4 arrays, each shape (3,1); with a count that doesn't divide 4, the first pieces get the extra columns
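
The examples above only pass an integer. As a rough sketch, this is what index-based cutting and an uneven request look like on the same M; the cut points [1, 3] are arbitrary values picked for illustration.

```python
import numpy as np

M = np.arange(12).reshape(3, 4)

# A list of indices gives cut points: columns [0:1], [1:3], and [3:4].
left, middle, right = np.split(M, [1, 3], axis=1)
print(left.shape, middle.shape, right.shape)   # (3, 1) (3, 2) (3, 1)

# An integer that does not divide the axis length makes np.split raise.
try:
    np.split(M, 3, axis=1)      # 4 columns cannot form 3 equal parts
    uneven_raised = False
except ValueError:
    uneven_raised = True

# np.array_split accepts the same request; the first piece absorbs the extra column.
pieces = np.array_split(M, 3, axis=1)
print(uneven_raised, [p.shape for p in pieces])
# True [(3, 2), (3, 1), (3, 1)]
```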

Important: split returns views (slices) into the original array, so nothing is copied and writing to a piece also changes the original. stack/concatenate always produce a new array (copy).
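
The view-versus-copy distinction can be checked directly with np.shares_memory. This is a small sketch on the same M as above, not a complete guide to NumPy memory semantics.

```python
import numpy as np

M = np.arange(12).reshape(3, 4)
top, middle, bottom = np.split(M, 3, axis=0)

# The pieces are windows onto M's buffer, so writes show through.
split_shares = np.shares_memory(top, M)
top[0, 0] = 99
print(split_shares, M[0, 0])        # True 99

# concatenate allocates fresh memory, so writes stay local.
joined = np.concatenate((top, middle), axis=0)
joined_shares = np.shares_memory(joined, M)
joined[0, 0] = -1
print(joined_shares, M[0, 0])       # False 99

# If a piece must be independent of M, copy it explicitly.
independent = top.copy()
```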

Rules of the road (common pitfalls)

Shape compatibility: for concatenate, all dims except the concatenation axis must match. For stack, shapes must match exactly.
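Axis surprises: for 2D arrays, axis=0 runs down the rows (joining along it adds rows) and axis=1 runs across the columns (joining along it adds columns). Helpers such as vstack and column_stack quietly promote 1D arrays to 2D, so be explicit with np.newaxis (a[:, np.newaxis], or the shorthand a[:, None]) when orientation matters.
Copies vs views: splitting returns views; stacking/concatenate creates new arrays, so watch memory on large inputs.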
Use array_split when partition sizes aren't even.
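
A common way to hit the first two pitfalls at once is trying to bolt a 1D vector onto a 2D matrix. The sketch below uses made-up arrays X and v to show the failure and the np.newaxis fix.

```python
import numpy as np

X = np.zeros((3, 2))              # hypothetical feature matrix
v = np.array([1.0, 2.0, 3.0])     # shape (3,), meant to become a third column

# concatenate refuses to mix 2D and 1D inputs.
try:
    np.concatenate((X, v), axis=1)
    ndim_raised = False
except ValueError:
    ndim_raised = True

col = v[:, np.newaxis]            # shape (3, 1): v as a column
row = v[np.newaxis, :]            # shape (1, 3): v as a row

with_col = np.concatenate((X, col), axis=1)
print(ndim_raised, col.shape, row.shape, with_col.shape)
# True (3, 1) (1, 3) (3, 3)
```

Spelling out the orientation with np.newaxis costs a few characters and removes the guesswork about which way the vector will land.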

Why do people keep misunderstanding this?

Because 1D vs 2D inputs change how helpers behave (vstack, hstack, column_stack) and because axis numbering is zero-based. Always print shapes after stacking/splitting while you're learning.
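
To see the 1D-versus-2D difference side by side, a loop like the following prints the result shape of each helper for both kinds of input. The variable names are illustrative.

```python
import numpy as np

a1, b1 = np.arange(3), np.arange(3, 6)        # 1D inputs, shape (3,)
a2, b2 = np.zeros((2, 3)), np.ones((2, 3))    # 2D inputs, shape (2, 3)

helpers = {"vstack": np.vstack, "hstack": np.hstack, "column_stack": np.column_stack}

shapes = {}
for name, func in helpers.items():
    shapes[name] = (func((a1, b1)).shape, func((a2, b2)).shape)
    print(f"{name:13s} 1D -> {shapes[name][0]}   2D -> {shapes[name][1]}")

# vstack        1D -> (2, 3)   2D -> (4, 3)
# hstack        1D -> (6,)   2D -> (2, 6)
# column_stack  1D -> (3, 2)   2D -> (2, 6)
```

Note that hstack and column_stack agree on 2D inputs but disagree on 1D inputs, which is exactly the kind of thing that bites when a reshape upstream changes.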

Walkthroughs (practical recipes)

Recipe A — Build a batch matrix from multiple 1D feature vectors

vectors = [np.random.rand(10) for _ in range(100)]
# Option 1: stack into shape (100, 10)
batch = np.stack(vectors, axis=0)

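# Option 2: np.vstack promotes each 1D vector to a (1, 10) row before joining
batch2 = np.vstack(vectors)

np.stack is the more explicit choice here because the axis argument states where the new dimension goes; both calls produce the same (100, 10) array.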

Recipe B — Concatenate time-series horizontally to form a feature matrix

x = np.random.rand(100, 5)   # 100 samples, 5 features
y = np.random.rand(100, 2)   # 100 samples, 2 extra features
features = np.concatenate((x, y), axis=1)  # -> shape (100, 7)

Recipe C — Split dataset into minibatches (may return views)

data = np.arange(1000).reshape(100, 10)
batches = np.array_split(data, 20, axis=0)  # 20 batches of exactly 5 rows each (100 / 20)
# batches is a list of views; modifying a batch in place also modifies 'data'
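
Real datasets rarely divide evenly. As a sketch under assumed sizes (103 rows and a batch size of 32, both picked for illustration), here are two ways to batch: a fixed number of batches, or a fixed batch size with a shorter final batch.

```python
import numpy as np

data = np.arange(1030).reshape(103, 10)   # 103 rows: not divisible by 20

# Fixed number of batches: array_split spreads the remainder over the first pieces.
batches = np.array_split(data, 20, axis=0)
sizes = [len(b) for b in batches]
print(sizes[:4], sizes[-1], sum(sizes))   # [6, 6, 6, 5] 5 103

# Fixed batch size: cut at every multiple of batch_size; the last batch is shorter.
batch_size = 32
cuts = np.arange(batch_size, len(data), batch_size)   # array([32, 64, 96])
fixed = np.split(data, cuts, axis=0)
fixed_sizes = [len(b) for b in fixed]
print(fixed_sizes)                        # [32, 32, 32, 7]
```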

Performance note (because you will care)

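Frequent stacking in a loop is expensive because each call allocates a new array and copies everything accumulated so far, so the total copying grows quadratically with the number of chunks. Prefer collecting arrays in a Python list and calling np.concatenate or np.stack once at the end.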

Bad:

out = np.empty((0, 10))
for chunk in chunks:
    out = np.vstack((out, chunk))   # repeated copying — slow

Good:

pieces = list(chunks)              # collect first; chunks can be any iterable of arrays
out = np.vstack(pieces)            # single allocation
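
For anyone who wants to measure the difference, here is a rough timing sketch. The chunk count, chunk shape, and function names are assumptions chosen for illustration, and actual timings will vary by machine and NumPy version.

```python
import timeit
import numpy as np

# Hypothetical workload: 200 chunks of 50 rows x 10 columns.
chunks = [np.random.rand(50, 10) for _ in range(200)]

def grow_in_loop(chunks):
    """Re-stack on every iteration: copies the accumulated array each time."""
    out = np.empty((0, 10))
    for chunk in chunks:
        out = np.vstack((out, chunk))
    return out

def join_once(chunks):
    """Collect first, then join with a single call."""
    return np.vstack(chunks)

# Both approaches build the same (10000, 10) array.
same_result = np.array_equal(grow_in_loop(chunks), join_once(chunks))

slow = timeit.timeit(lambda: grow_in_loop(chunks), number=5)
fast = timeit.timeit(lambda: join_once(chunks), number=5)
print(f"same result: {same_result}, loop: {slow:.4f}s, single join: {fast:.4f}s")
```

The gap tends to widen as the number of chunks grows, since the loop version re-copies an ever larger array.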

Quick reference table (mental cheat-sheet)

concatenate: join along existing axis — arrays must match other dims
stack: add a new axis and combine — shapes must match
vstack/hstack: convenience for common vertical/horizontal stacking
split/hsplit/vsplit: cut arrays into pieces (split returns views when possible)
array_split: like split but allows uneven pieces

Key takeaways

Stack = create a new axis (higher dimensional grouping). Use np.stack when you need a new dimension.
Concatenate = join along an existing axis (no new axis). Use np.concatenate to append rows/columns.
Split = efficient slicing into pieces (often views). Use array_split when sizes don't divide cleanly.
Collect arrays then do one concatenate/stack — avoid repeated stacking in loops.

"Stack when you want a new perspective (axis). Concatenate when you want to extend the same dimension. Split when you need parts."
