jypi
  • Explore
ChatWays to LearnMind mapAbout

jypi

  • About Us
  • Our Mission
  • Team
  • Careers

Resources

  • Ways to Learn
  • Mind map
  • Blog
  • Help Center
  • Community Guidelines
  • Contributor Guide

Legal

  • Terms of Service
  • Privacy Policy
  • Cookie Policy
  • Content Policy

Connect

  • Twitter
  • Discord
  • Instagram
  • Contact Us
jypi

© 2026 jypi. All rights reserved.

Python for Data Science, AI & Development
Chapters

1Python Foundations for Data Work

2Data Structures and Iteration

3Numerical Computing with NumPy

4Data Analysis with pandas

Series and DataFrame BasicsReading CSV and ExcelIndexing and SelectionFiltering and queryHandling Missing ValuesType Conversion and CategoriesSorting and RankingGroupBy and AggregationsApply and Vectorized OpsMerge, Join, and ConcatPivot Tables and CrosstabsTime Series with pandasWindow and Rolling OpsString Methods and RegexDatabase I/O with SQLAlchemy

5Data Cleaning and Feature Engineering

6Data Visualization and Storytelling

7Statistics and Probability for Data Science

8Machine Learning with scikit-learn

9Deep Learning Foundations

10Data Sources, Engineering, and Deployment

Courses/Python for Data Science, AI & Development/Data Analysis with pandas

Data Analysis with pandas

42399 views

Manipulate and analyze tabular data using pandas for indexing, joins, time series, and robust I/O.

Content

4 of 15

Filtering and query

Pandas Filtering and Query Explained: Boolean Masks
6269 views
beginner
pandas
data-analysis
humorous
gpt-5-mini
6269 views

Versions:

Pandas Filtering and Query Explained: Boolean Masks

Watch & Learn

AI-discovered learning video

Sign in to watch the learning video for this topic.

Sign inSign up free

Start learning for free

Sign up to save progress, unlock study materials, and track your learning.

  • Bookmark content and pick up later
  • AI-generated study materials
  • Flashcards, timelines, and more
  • Progress tracking and certificates

Free to join · No credit card required

Filtering and query in pandas — Make your DataFrame obey

"Filtering is just selection wearing a disguise — with extra attitude." — Your future self when debugging a broken mask

You've already seen how to load data (Reading CSV and Excel) and how to point at rows and columns (Indexing and Selection). You also learned fast numerical tricks using NumPy (vectorized ops, broadcasting). Now we put those together: how to slice, dice, and surgically extract the exact rows you want using boolean masks, pandas convenience methods, and the handy .query() language.


Why this matters (and where it shows up)

  • Need the Sales team only? Filter.
  • Want rows where score is missing to impute later? Filter.
  • Running group-level analytics on a subset? Filter.

Filtering is the gatekeeper of downstream analysis. If your filter is wrong, everything downstream is wrong — and that cup of coffee won't save you.


Quick reminder (you've seen these already)

  • From Indexing and Selection: use .loc for label-based selection and .iloc for position-based selection.
  • From NumPy: boolean arrays are fast; pandas uses them under the hood. Combine pandas masks with NumPy logical operations for speed when needed.

Boolean masks: the bread and butter

Basic pattern

# assume df is already loaded
mask = df['age'] > 30
df[mask]            # returns rows where mask is True
# or equivalently
df.loc[mask]
  • mask is a Series of True/False values (like a NumPy boolean array).
  • Use .loc when you plan to select columns too: df.loc[mask, ['name','salary']]

Combine conditions

Always use parentheses and bitwise operators (& for AND, | for OR, ~ for NOT):

df[(df['age'] > 30) & (df['department'] == 'Sales')]
# or with .loc
df.loc[(df['age'] > 30) & (df['department'] == 'Sales'), 'salary']

Why not use Python and/or? Because pandas overrides bitwise operators to work elementwise; Python's and/or try to evaluate truthiness of entire Series and explode spectacularly.


Useful filtering functions and idioms

  • isin: filter by membership (like SQL IN)
df[df['department'].isin(['Sales', 'HR'])]
  • between: inclusive numeric range
df[df['salary'].between(50000, 80000)]
  • str.contains for text columns (remember na=False)
df[df['email'].str.contains('@example.com', na=False)]
  • isnull / notnull for missing values
df[df['joined'].isnull()]
  • numpy logical functions (occasionally useful)
import numpy as np
mask = np.logical_and(df['age'] > 30, df['salary'] > 50000)
df.loc[mask]

This leverages NumPy's vectorized speed when you want to compose masks externally.


The .query() method — readable and fast

.query() lets you write boolean logic as a string expression. It's tidy for complex filters:

min_age = 30
df.query('age >= @min_age and department == "Sales"')
  • Use @ to reference Python variables inside the query string.
  • Use backticks for column names with spaces: df.query('Total Sales > 1000')
  • Under the hood pandas can use the 'numexpr' engine which is faster for large DataFrames.

Limitations: .query can't call arbitrary pandas methods (like .str.contains) with the default engine. You can set engine='python' to use Python expressions, but that may be slower.


Avoid the chained indexing trap

Bad:

df[df['A'] > 0]['B'] = 5   # may raise SettingWithCopyWarning or silently fail

Good:

mask = df['A'] > 0
df.loc[mask, 'B'] = 5     # safe, does in-place assignment

Chained indexing sometimes returns a view, sometimes a copy — pandas won't tell you which. Use .loc to be explicit.


Performance tips

  • Boolean indexing is vectorized and fast; avoid Python loops.
  • For very large frames, .query() / eval() with numexpr can be faster.
  • Convert high-cardinality repeated strings to categorical dtype to speed up comparisons and reduce memory.
  • If you build masks repeatedly, compute them once: mask = (cond); df.loc[mask, cols]

Example: using NumPy speed

# when combining many conditions, construct masks with numpy
import numpy as np
m1 = df['age'].values > 30
m2 = ~pd.isnull(df['salary'].values)
mask = np.logical_and(m1, m2)
df.loc[mask]

This avoids constructing many intermediate Series objects.


Real-world mini example

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name': ['Alice','Bob','Carla','Dan'],
    'age': [29, 42, 35, 28],
    'department': ['Sales','HR','Sales','IT'],
    'salary': [70000, 54000, np.nan, 48000]
})

# Who in Sales older than 30 with known salary?
mask = (df['department'] == 'Sales') & (df['age'] > 30) & df['salary'].notnull()
df.loc[mask]

# Same with query
df.query('department == "Sales" and age > 30 and salary == salary')
# salary == salary is a quick trick to test not-null in query (or use 'salary == salary')

Why people keep misunderstanding this

Because filtering looks simple until you mix up boolean operators, forget parentheses, or accidentally trigger chained indexing. Also because a Series is not a Python bool — it’s a vector of them. Treat it like the vector that it is.


Quick checklist before you run a filter

  • Did I use parentheses around conditions?
  • Did I use & / | / ~ instead of and / or / not?
  • If assigning, am I using .loc to avoid SettingWithCopyWarning?
  • Does .query() improve readability/performance for this case?
  • Are text comparisons using .str methods with na=False when necessary?

Key takeaways

  • Filtering = Boolean masks + selection. Build masks with Series operations or NumPy for speed.
  • Use .loc for selection and assignment to be explicit and safe.
  • .query() is great for readable filters and can be faster via numexpr; use @ to inject Python variables.
  • Avoid chained indexing and prefer masks computed once.

Final thought: a correct filter is the single most influential line in your analysis script. Treat it like a tiny truth oracle — check it, test it, and laugh at the rows you excluded.


If you want, I can generate a small cheat-sheet image or a runnable Jupyter notebook with these examples and more corner cases (e.g., backticks, engine options, performance benchmarks). Say the word and I’ll conjure it like a debugging genie.

Flashcards
Mind Map
Speed Challenge

Comments (0)

Please sign in to leave a comment.

No comments yet. Be the first to comment!

Ready to practice?

Sign up now to study with flashcards, practice questions, and more — and track your progress on this topic.

Study with flashcards, timelines, and more
Earn certificates for completed courses
Bookmark content for later reference
Track your progress across all topics