Numerical Computing with NumPy
Leverage NumPy for fast array programming, broadcasting, vectorization, and linear algebra operations.
Boolean Masking in NumPy — Filter Arrays Like a Pro
"Want to pick the red M&Ms out of a million candies without touching each one? Welcome to Boolean masking."
You already know how to grab elements by position (indexing and slicing) and how NumPy stores values in typed memory (dtypes and casting). Boolean masking is the next trick in the magician's hat: instead of selecting by where something is, you select by what it is. It's content-based selection — fast, expressive, and vectorized.
Why it matters
- Real analytics often asks: "Give me all values > threshold" or "Drop rows where condition holds." Boolean masks answer these directly without Python loops.
- Masks combine naturally with prior skills: use them after slicing, or before casting/aggregation.
- They are the foundation for filtering, conditional assignment, and generating summary statistics efficiently.
What is a Boolean mask?
- A Boolean mask is a NumPy array of dtype bool (True/False) whose shape matches the array you're filtering (or its leading axes, for example one flag per row).
- Applying the mask to the original array returns only the elements where the mask is True — think of it as a stencil.
Micro explanation
- arr: [10, 3, 7, 12]
- mask = arr > 6 -> [True, False, True, True]
- arr[mask] -> [10, 7, 12]
Simple. Delicious. Fast.
Quick examples (code you can run now)
import numpy as np
arr = np.array([10, 3, 7, 12, 5, 20])
mask = arr > 6 # boolean array: [ True, False, True, True, False, True ]
filtered = arr[mask] # array([10, 7, 12, 20])
# Combine conditions (remember parentheses!)
mask2 = (arr > 6) & (arr < 15)
arr[mask2] # array([10, 7, 12])
# Negation
arr[~mask] # values <= 6
# Assign using a mask
arr[arr < 6] = 0 # set small values to 0
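Why parentheses? Because & and | bind more tightly than comparisons. Without parentheses, arr > 6 & arr < 15 is parsed as arr > (6 & arr) < 15, which raises a ValueError (or, in other expressions, silently applies the wrong logic).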
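To see the parsing issue in action, here is a small sketch (variable names such as `good` and `error_message` are illustrative). The unparenthesized expression is read by Python as the chained comparison `arr > (6 & arr) < 15`, which needs the truth value of a whole array and so fails.

```python
import numpy as np

arr = np.array([10, 3, 7, 12, 5, 20])

# Correct: each comparison is parenthesized, so & combines two boolean arrays.
good = (arr > 6) & (arr < 15)

# Incorrect: parsed as arr > (6 & arr) < 15, a chained comparison that
# calls bool() on an array and therefore raises ValueError.
try:
    bad = arr > 6 & arr < 15
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(good.tolist())
print(error_message)
```

The error is the lucky case: with other operand combinations the expression can run and quietly compute something you did not intend, so parenthesizing every comparison is the safer habit.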
Boolean masks vs. slicing vs. fancy indexing
- Slicing (arr[2:5]) selects by position (contiguous ranges).
- Fancy indexing (arr[[0, 2, 5]]) selects specific indices.
- Boolean masking selects by condition. It's the content-filtering tool.
| Operation | Use case | Returns | Typical cost |
|---|---|---|---|
| Slicing | contiguous block | view (cheap) | O(1) view |
| Fancy index | arbitrary indices | copy | O(k) |
| Boolean mask | condition-based selection | copy of matching values | O(n) to build and apply the mask + O(k) to copy |
Note: Mask creation is vectorized and implemented in C — much faster than building lists in Python loops.
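The view-versus-copy column in the table can be checked directly. The following is a minimal sketch using `np.shares_memory`; the array and variable names are made up for illustration.

```python
import numpy as np

arr = np.arange(10)

sliced = arr[2:5]        # basic slice: a view onto arr's buffer
fancy = arr[[0, 2, 5]]   # integer-array index: a new array
masked = arr[arr > 6]    # boolean mask: a new array

print(np.shares_memory(arr, sliced))  # True
print(np.shares_memory(arr, fancy))   # False
print(np.shares_memory(arr, masked))  # False

# Writing through the view changes the original; writing to a copy does not.
sliced[0] = -1
masked[0] = 99
print(arr.tolist())
```

Note that this applies to reading with `arr[mask]`. Assigning with `arr[mask] = value` writes into the original array in place.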
Multi-dimensional arrays & broadcasting
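Boolean masks work on arrays of any shape. A mask with the same shape as the array returns the matching elements as a flattened 1D array. Note that NumPy does not broadcast a Boolean index: its shape must match the array's shape, or the shape of the leading axes it indexes.
M = np.array([[1, 8, 3], [4, 10, 2]])
mask = M > 3 # shape (2, 3)
M[mask] # array([ 8,  4, 10]) -> flattened matches
# A (2, 1) Boolean index is NOT broadcast against a (2, 3) array
mask2 = np.array([[True], [False]]) # shape (2, 1)
# M[mask2] raises IndexError: the boolean index does not match along axis 1
row_mask = mask2[:, 0] # shape (2,): one flag per row
M[row_mask] # array([[1, 8, 3]]) -> first row, still 2D
Tip: To select whole rows (like Pandas), build a 1D mask of length n_rows and index with it: data[mask, :]. If you really want a broadcast mask, expand it explicitly with np.broadcast_to(mask2, M.shape) before indexing.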
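Broadcasting is most useful when building a mask from a comparison, after which the mask can be reduced to one flag per row or per column. A minimal sketch, assuming a hypothetical per-column threshold array named `limits`:

```python
import numpy as np

M = np.array([[1, 8, 3],
              [4, 10, 2]])
limits = np.array([3, 9, 1])   # hypothetical per-column thresholds, shape (3,)

# The comparison broadcasts (2, 3) against (3,), giving a (2, 3) mask.
over = M > limits
print(over.tolist())

# Keep rows where every column exceeds its limit.
row_mask = over.all(axis=1)     # shape (2,)
print(M[row_mask, :].tolist())

# Keep columns where every row exceeds the limit.
col_mask = over.all(axis=0)     # shape (3,)
print(M[:, col_mask].tolist())
```

Using `any` instead of `all` would keep rows or columns with at least one match; which reduction is appropriate depends on the question being asked.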
Practical patterns you'll use every day
- Filtering out invalid values
x = np.array([1.2, np.nan, 3.4, np.nan, 2.2])
valid = ~np.isnan(x)
x_valid = x[valid]
mean = x_valid.mean()
- Conditional assignment (in-place):
scores = np.array([55, 70, 90, 40])
scores[scores < 60] = 0 # fail becomes 0
- Combining masks with logical ops (&, |, ~):
low, high = 2.0, 3.0 # example bounds
mask = (x > low) & (x < high) # intersection: strictly inside the range
mask = (x < low) | (x > high) # union: outside the range
- Use with structured arrays or multiple columns:
data = np.array([(1, 2.0), (2, -1.5), (3, 4.2)], dtype=[('id','i4'), ('val','f4')])
mask = data['val'] > 0
data[mask] # rows with positive 'val'
Performance notes
- Creating mask = arr > threshold is vectorized C code — very fast compared with a Python loop.
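- However, building the mask still scans the whole array (O(n)). np.argmax(mask) or np.nonzero(mask)[0] gives the position of the first match, but only after the full mask exists (and np.argmax returns 0 when nothing matches). To stop early on a very large array, test it in chunks and break at the first chunk that contains a match.
- Memory: the mask is a separate bool array of 1 byte per element, and arr[mask] copies the selected values. For very large arrays, process the data in chunks so that only a small mask exists at a time.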
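The chunking idea can be sketched as follows. The helper `first_index_where_greater` and its `chunk_size` parameter are hypothetical names for illustration, not NumPy functions, and the sketch assumes a 1D input.

```python
import numpy as np

def first_index_where_greater(arr, threshold, chunk_size=1_000_000):
    """Return the index of the first element > threshold, or -1 if none.

    Hypothetical helper: scans a 1D array in chunks so that only a
    chunk-sized mask exists at a time and the scan stops at the first hit.
    """
    for start in range(0, arr.size, chunk_size):
        chunk_mask = arr[start:start + chunk_size] > threshold
        if chunk_mask.any():
            # argmax on a boolean array returns the position of the first True.
            return start + int(np.argmax(chunk_mask))
    return -1

data = np.arange(10)
print(first_index_where_greater(data, 6, chunk_size=4))    # 7
print(first_index_where_greater(data, 100, chunk_size=4))  # -1
```

Checking `chunk_mask.any()` before calling `np.argmax` avoids the ambiguity of `argmax` returning 0 for an all-False mask. Whether chunking pays off depends on how early matches tend to appear; for small arrays the plain full-array mask is simpler and likely just as fast.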
Useful functions: np.where, np.nonzero, np.count_nonzero
indices = np.nonzero(arr > 6)[0] # positions where condition holds
np.count_nonzero(arr > 6) # how many
np.where(arr > 6, arr, -1) # vectorized select/replace
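How these helpers relate to one another is easiest to see side by side. A short sketch on the same example array (the names `cond`, `idx`, and `replaced` are illustrative):

```python
import numpy as np

arr = np.array([10, 3, 7, 12, 5, 20])
cond = arr > 6

# One-argument np.where behaves like np.nonzero: a tuple of index arrays.
idx = np.where(cond)[0]
print(idx.tolist())                    # [0, 2, 3, 5]

# np.flatnonzero returns the same positions for the flattened array.
print(np.flatnonzero(cond).tolist())   # [0, 2, 3, 5]

# Counting True values: count_nonzero and sum agree on a boolean array.
print(int(np.count_nonzero(cond)), int(cond.sum()))   # 4 4

# Three-argument np.where keeps the original shape, unlike arr[cond].
replaced = np.where(cond, arr, -1)
print(replaced.tolist())               # [10, -1, 7, 12, -1, 20]
```

Reach for `arr[cond]` when you want only the matching values, and for three-argument `np.where` when positions must stay aligned with the original array.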
Common gotchas and how to avoid them
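- Using Python's and/or instead of &/|: you'll get a ValueError ("The truth value of an array with more than one element is ambiguous") because and/or need a single True/False, not an array. Use & and | for elementwise boolean operations and wrap each condition in parentheses.
- Forgetting mask shape: a Boolean index is not broadcast. It must match the array's shape, or the length of the axis it indexes. A 1D mask of the wrong length raises an IndexError, and a 1D mask used as arr[mask] on a 2D array selects rows, not columns; use arr[:, mask] for columns.
- Dtype surprises: assigning np.nan into an integer array does not upcast it; NumPy raises a ValueError (cannot convert float NaN to integer). To mark missing entries with NaN, convert to a float dtype first (remember dtypes/casting from earlier!).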
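Each of these gotchas can be reproduced in a few lines. The sketch below catches the exceptions so it runs end to end; the variable names are illustrative, and the exact error messages may differ between NumPy versions.

```python
import numpy as np

arr = np.array([10, 3, 7, 12])

# Gotcha 1: Python's `and` asks for the truth value of a whole array.
try:
    (arr > 6) and (arr < 15)
    and_error = None
except ValueError as exc:
    and_error = type(exc).__name__

# Gotcha 2: a mask must match the axis it indexes; a length-3 mask
# cannot index a length-4 array.
try:
    arr[np.array([True, False, True])]
    shape_error = None
except IndexError as exc:
    shape_error = type(exc).__name__

# Gotcha 3: NaN cannot be stored in an integer array.
try:
    arr[arr < 6] = np.nan
    nan_error = None
except ValueError as exc:
    nan_error = type(exc).__name__

# Fix for gotcha 3: cast to float first, then assign NaN through the mask.
as_float = arr.astype(float)
as_float[as_float < 6] = np.nan

print(and_error, shape_error, nan_error)
print(as_float)
```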
Small real-world example: filter sensor readings
Imagine an array of sensor readings, and you want only the readings that fall within the safe range and are not flagged as bad:
readings = np.array([0.2, 5.5, -1.0, 10.2, 7.7])
flags = np.array([False, False, True, False, False]) # True means 'bad'
mask = (~flags) & (readings >= 0) & (readings <= 10)
safe = readings[mask]
Now compute statistics such as safe.mean() and safe.std() without the bad values affecting the result.
Key takeaways
- Boolean masks let you filter arrays by content, not position: think arr[condition].
- Masks are fast (vectorized) and integrate cleanly with slicing, fancy indexing, and broadcasting.
- Use &, |, ~ for elementwise logic and always wrap comparisons in parentheses.
- Watch dtype interactions when assigning with masks (remember dtypes & casting).
"Boolean masking is the difference between shouting at a million candies and using a magnet that only pulls the red ones — both dramatic, but only one is efficient."
If you liked this, next up: using masks for group-wise operations and combining masks with np.take_along_axis — the party keeps getting nerdier.