Numerical Computing with NumPy
Leverage NumPy for fast array programming, broadcasting, vectorization, and linear algebra operations.
Indexing and Slicing
NumPy Indexing and Slicing — Find the Data Fast (and Keep It)
Imagine your ndarray is a layered cake. You don't need to eat the whole cake — you want the chocolate layer, the corner piece, or every second berry on top. Indexing and slicing are your fork and laser scalpel.
Why this matters (building on what you already know)
You already know how to create ndarrays (see: ndarray Creation) and why dtype choices matter (see: Dtypes and Casting). Now you need to get at the data inside efficiently. Indexing and slicing let you: extract features, create masks, prepare batches for models, and write vectorized operations instead of painful Python loops (remember Data Structures and Iteration?). This is where speed and readability meet.
Quick vocabulary — the cheat sheet
- Indexing: selecting individual elements (like arr[2, 3]).
- Slicing: selecting ranges with start:stop:step (like arr[:, 1:5:2]).
- Fancy indexing / advanced indexing: using integer arrays or boolean arrays to select elements. The result is always a copy.
- View vs Copy: slices produce views (no copy) — changes affect the original. Fancy indexing and boolean masking produce copies.
Basic indexing: the coordinates of your data
NumPy indexing looks like nested Python lists but with extra power.
import numpy as np
arr = np.arange(12).reshape(3,4)
# arr = [[ 0  1  2  3]
#        [ 4  5  6  7]
#        [ 8  9 10 11]]
print(arr[1,2]) # 6 -> row 1, col 2
print(arr[1]) # [4 5 6 7] -> row slice (1D view)
print(arr[1, :2]) # [4 5] -> row 1, first two columns
Micro explanation: a single integer index removes that dimension from the result. Use a slice (for example arr[1:2] instead of arr[1]) when you need to keep it.
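To make the "collapse vs keep" point concrete, here is a minimal sketch that compares shapes. The variable names (row_1d, row_2d, and so on) are illustrative only and not part of the lesson's code.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

row_1d = arr[1]        # integer index: the row axis is dropped
row_2d = arr[1:2]      # length-1 slice: the row axis is kept
col_1d = arr[:, 2]     # integer index: the column axis is dropped
col_2d = arr[:, 2:3]   # length-1 slice: the column axis is kept

print(row_1d.shape)  # (4,)
print(row_2d.shape)  # (1, 4)
print(col_1d.shape)  # (3,)
print(col_2d.shape)  # (3, 1)
```

The values are the same in each pair; only the number of axes differs, which matters as soon as you broadcast or stack the result.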
Slicing: start:stop:step — the range operator
- Syntax: start:stop:step (stop is exclusive).
- Negative indices count from the end.
- Negative step reverses.
print(arr[::-1]) # reverse rows (step = -1)
print(arr[:, ::-1]) # reverse columns
print(arr[0:3:2]) # every other row: rows 0 and 2
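Why is stop exclusive? Half-open intervals keep the arithmetic simple: with a step of 1, the slice length is exactly stop - start, and adjacent slices such as a[:k] and a[k:] split an array with no overlap.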
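A short sketch of the half-open rule and of negative indices, using a small 1D array. The names a, k, head, and tail are made up for this illustration.

```python
import numpy as np

a = np.arange(10)

# Half-open: a[2:7] holds indices 2..6, so its length is 7 - 2.
print(len(a[2:7]))  # 5

# Adjacent slices split the array with no overlap and no gap.
k = 4
head, tail = a[:k], a[k:]
print(len(head) + len(tail) == len(a))  # True

# With a step, stop is still exclusive.
print(a[1:8:3])  # [1 4 7]

# Negative indices count from the end.
print(a[-1])   # 9
print(a[-3:])  # [7 8 9]
```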
Boolean masking — select by condition (very Data Science)
Create a boolean array from a condition and use it to filter rows or elements. This is essential for feature selection and cleaning.
ages = np.array([18, 22, 15, 45, 34])
adult_mask = ages >= 18
adults = ages[adult_mask] # [18 22 45 34]
# Chain with other arrays
scores = np.array([55, 80, 40, 90, 70])
print(scores[ages >= 18]) # scores for adults
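Micro explanation: boolean indexing returns a copy, so modifying the extracted result won't change the original array. Assigning through the mask (for example ages[adult_mask] = 0) is different: that writes into the original.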
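The sketch below shows three things that tend to come up together with masks: combining conditions, the copy behaviour of extraction, and in-place assignment through a mask. The thresholds (60 and 75) and the names passed_adults and capped are assumptions chosen just for the example.

```python
import numpy as np

ages = np.array([18, 22, 15, 45, 34])
scores = np.array([55, 80, 40, 90, 70])

# Combine conditions with & (and), | (or), ~ (not).
# Parentheses are required because & binds tighter than >=.
passed_adults = scores[(ages >= 18) & (scores >= 60)]
print(passed_adults)  # [80 90 70]

# Extraction gives a copy: editing it leaves the original alone.
adults = ages[ages >= 18]
adults[0] = 0
print(ages)  # [18 22 15 45 34]

# Assignment through a mask writes in place.
capped = scores.copy()
capped[capped > 75] = 75
print(capped)  # [55 75 40 75 70]
```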
Fancy indexing: pick arbitrary items
Use integer arrays (or lists) to select arbitrary rows/cols.
arr = np.arange(16).reshape(4,4)
rows = np.array([0,2])
cols = np.array([1,3])
print(arr[rows]) # selects rows 0 and 2
print(arr[rows, cols]) # selects elements (0,1) and (2,3) -> [1 11]
Important: fancy indexing produces a copy, not a view.
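One thing that often surprises people is that arr[rows, cols] pairs the indices up instead of selecting a sub-block. A minimal sketch of the difference follows; np.ix_ is a real NumPy helper for the "every row with every column" case, and the names paired and block are illustrative.

```python
import numpy as np

arr = np.arange(16).reshape(4, 4)
rows = np.array([0, 2])
cols = np.array([1, 3])

# Paired: element (0, 1) and element (2, 3).
paired = arr[rows, cols]
print(paired)  # [ 1 11]

# Crossed: rows 0 and 2 combined with columns 1 and 3.
block = arr[np.ix_(rows, cols)]
print(block)
# [[ 1  3]
#  [ 9 11]]

# Index arrays may repeat or reorder entries.
print(arr[[3, 0, 0], 0])  # [12  0  0]
```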
View vs Copy — the gotcha you must remember
- Basic slices (using : and integers) -> always a view. Modifying it changes the original.
- Fancy indexing and boolean masks -> copies. Modifying them does not affect the original.
a = np.arange(6)
view = a[2:5]
view[0] = 999
print(a) # a changed -> [ 0 1 999 3 4 5]
b = a[[0,1]]
b[0] = -1
print(a) # a unchanged by fancy indexing copy
If you want an independent copy use .copy():
safe = a[2:5].copy()
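If you are unsure whether something is a view, you can ask NumPy instead of guessing. This is a small sketch using np.shares_memory and the .base attribute; the name fancy is made up for the example.

```python
import numpy as np

a = np.arange(6)
view = a[2:5]          # basic slice
fancy = a[[2, 3, 4]]   # fancy indexing, same elements
safe = a[2:5].copy()   # explicit copy

print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, fancy))  # False
print(np.shares_memory(a, safe))   # False

# A view keeps a reference to the array that owns its data.
print(view.base is a)  # True
```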
Dimension tricks: np.newaxis, None, and Ellipsis
- Increase dims: arr[:, np.newaxis] or arr[:, None] adds an axis (useful for broadcasting).
- Ellipsis ... fills in missing ':' for higher-rank arrays.
v = np.array([1,2,3])
print(v.shape) # (3,)
v2 = v[:, None] # shape (3,1)
# Ellipsis:
big = np.zeros((2,3,4,5))
print(big[0, ..., 1]) # same as big[0, :, :, 1]; the result has shape (3, 4)
Use these when preparing data: many machine learning libraries (scikit-learn, for example) expect 2D input of shape (n_samples, n_features), so a 1D feature vector needs an extra axis first.
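Checking shapes is the quickest way to see what these helpers do. The sketch below is illustrative; the names col, row, and X are assumptions, and X stands in for a single-feature input with one sample per row.

```python
import numpy as np

v = np.array([1, 2, 3])
col = v[:, None]   # shape (3, 1)
row = v[None, :]   # shape (1, 3)

# Broadcasting a column against a row gives a full grid.
print((col + row).shape)  # (3, 3)

# np.newaxis is just an alias for None.
X = v[:, np.newaxis]
print(X.shape)  # (3, 1)

big = np.zeros((2, 3, 4, 5))
print(big[0, ..., 1].shape)  # (3, 4)
print(big[..., 0].shape)     # (2, 3, 4)
```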
Common patterns you'll use daily
- select a column: X[:, 2]
- select columns 1..3: X[:, 1:4]
- select rows satisfying condition: X[X[:,0] > 0]
- add axis for broadcasting: x[:, None] + y[None, :]
| Task | Indexing pattern | Returns |
|---|---|---|
| Row slice | arr[2] or arr[2,:] | 1D view |
| Column slice | arr[:,2] | 1D view |
| Submatrix | arr[1:3, 2:4] | 2D view |
| Arbitrary picks | arr[[0,2,3]] | copy |
| Condition | arr[arr>0] | copy |
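Here are the daily patterns from this section run on a tiny array, so you can see the results. The values in X, x, and y are invented for the illustration.

```python
import numpy as np

X = np.array([[ 1.0, -2.0,  3.0,  4.0],
              [-1.0,  5.0,  6.0,  7.0],
              [ 2.0,  8.0, -9.0, 10.0]])

col2 = X[:, 2]             # one column, 1D view
cols_1_to_3 = X[:, 1:4]    # columns 1, 2, 3 as a 2D view
pos_rows = X[X[:, 0] > 0]  # rows whose first value is positive (copy)

print(col2)               # [ 3.  6. -9.]
print(cols_1_to_3.shape)  # (3, 3)
print(pos_rows.shape)     # (2, 4)

# Add axes so a length-3 and a length-2 vector broadcast to a grid.
x = np.array([0, 10, 20])
y = np.array([1, 2])
grid = x[:, None] + y[None, :]
print(grid)
# [[ 1  2]
#  [11 12]
#  [21 22]]
```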
Performance notes (brief but crucial)
- Views are cheap (no copy). Use slices when possible for memory-critical workloads.
- Fancy indexing and boolean masks copy data; they can be expensive on large arrays.
- Prefer vectorized boolean masks to Python loops: the comparison and the selection run in compiled C code over the array's buffer instead of element by element in the Python interpreter.
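If you want to measure the loop-versus-mask difference yourself, a rough sketch with timeit could look like this. The helper names loop_filter and mask_filter, the array size, and the 0.5 threshold are all assumptions for the example, and the timings will vary by machine, so no expected numbers are shown.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(100_000)

def loop_filter(values):
    # Pure-Python filtering, one element at a time.
    return [v for v in values if v > 0.5]

def mask_filter(values):
    # Vectorized filtering with a boolean mask (returns a copy).
    return values[values > 0.5]

# Both approaches select the same elements in the same order.
same = np.array_equal(np.array(loop_filter(data)), mask_filter(data))
print(same)  # True

loop_time = timeit.timeit(lambda: loop_filter(data), number=5)
mask_time = timeit.timeit(lambda: mask_filter(data), number=5)
print(f"loop: {loop_time:.4f}s  mask: {mask_time:.4f}s")
```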
Closing: key takeaways (so you can flex in interviews)
- Slicing gives views; fancy indexing/boolean masks give copies — remember this or you'll debug for hours.
- Use negative indices and negative steps to count from the end or reverse quickly.
- np.newaxis (or None) and Ellipsis are small helpers that unlock broadcasting and concise slicing in high dimensions.
- Combine indexing with what you learned about dtypes and ndarray creation: correct dtype + correct slice = fast, memory-efficient pipelines.
"When you can slice the data the right way, you don't need to loop; you just need to think." — something your future self will be grateful you learned.
Try this (2-minute practice)
- Create a (1000, 10) array of random floats.
- Extract rows where column 0 > 0.5 and column 3 < 0.2.
- From that subset, take every other column starting at column 1.
If you get stuck, remember: boolean masks combine with & and parentheses, and columns slice with start:stop:step.
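One possible way to work through the practice task is sketched below; try it yourself first. The seed and the names mask, subset, and result are arbitrary choices, and how many rows survive depends on the random data.

```python
import numpy as np

rng = np.random.default_rng(42)   # the seed is an arbitrary choice
X = rng.random((1000, 10))        # floats in [0, 1)

# Rows where column 0 > 0.5 and column 3 < 0.2.
mask = (X[:, 0] > 0.5) & (X[:, 3] < 0.2)
subset = X[mask]                  # boolean mask: a copy

# Every other column starting at column 1: columns 1, 3, 5, 7, 9.
result = subset[:, 1::2]          # slice: a view of subset

print(subset.shape, result.shape)
```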
Happy slicing — may your views be fast and your copies deliberate.