© 2026 jypi. All rights reserved.

Python for Data Science, AI & Development

Data Analysis with pandas


Manipulate and analyze tabular data using pandas for indexing, joins, time series, and robust I/O.


Sorting and Ranking in pandas for Data Analysis (Beginner Guide)
beginner
humorous
visual
python
pandas
gpt-5-mini

Sorting and Ranking in pandas — Make Your Data Line Up Neatly

"If your data were children at recess, sorting is lining them up by height; ranking is handing out medals. Both matter."

You're already comfortable converting types, working with categories, handling missing values, and doing fast numeric work with NumPy. This lesson plugs into that flow: we sort to see structure and rank to quantify position. These operations are tiny tools that make big differences in data cleaning, exploratory analysis, and feature engineering for ML.


What this covers (quickly)

  • How to sort DataFrames and Series with pandas' sort_values and sort_index
  • Multi-column sorts, stable sorts, and the handy key= transform
  • Ranking with rank() — methods, pct, grouping, and handling NaNs
  • When to fall back to NumPy for speed

This tutorial uses small example snippets so you can copy-paste and play.


Why sorting matters (and when ranking is better)

  • Sorting: visual ordering. Useful for inspection, slicing top-k, and ordering before time-series operations.
  • Ranking: relative position. Useful for percentiles, tie-handling in leaderboards, or model features.

Imagine you want the top 3 performers per group. Sorting gets you the rows; ranking gives each entry a numeric place so you can filter with rank <= 3.
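Both routes to "top k per group" can be sketched side by side. The `sales` frame, its columns and the value of `k` are hypothetical, made up for illustration; only `sort_values`, `groupby(...).head` and `groupby(...).rank` are real pandas calls. method='first' is one possible choice here: it guarantees exactly k rows per group even when scores tie.

```python
import pandas as pd

# Hypothetical data: two groups with four rows each
sales = pd.DataFrame({
    'group': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'rep':   ['ann', 'ben', 'cat', 'dan', 'eve', 'fay', 'gus', 'hal'],
    'score': [70, 95, 88, 60, 81, 99, 75, 90],
})
k = 3  # hypothetical cut-off

# Route 1 (sorting): order rows, then keep the first k rows of each group
top_by_sort = (
    sales.sort_values(['group', 'score'], ascending=[True, False])
         .groupby('group')
         .head(k)
)

# Route 2 (ranking): give every row a place within its group (1 = best), then filter
sales['rank'] = sales.groupby('group')['score'].rank(ascending=False, method='first')
top_by_rank = sales[sales['rank'] <= k]

print(top_by_sort)
print(top_by_rank.sort_values(['group', 'rank']))
```

Both select the same six rows; the ranked version also keeps the position as a column you can reuse later as a feature.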


Quick example DataFrame

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'id': [11, 22, 33, 44, 55, 66],
    'group': ['A', 'B', 'A', 'B', 'A', 'B'],
    'score': [88, np.nan, 92, 88, 92, 75],
    'name': ['alice', 'Bob', 'ALAN', 'bob', 'zoe', 'zoe']
})

This has: NaNs in score, mixed-case names (a job for key=), repeated scores (ties), and groups for grouped operations.


Sorting basics

  • df.sort_values(by='score', ascending=False) sorts by a column
  • df.sort_index() sorts by row labels
  • df.sort_values(['group', 'score'], ascending=[True, False]) is multi-column

Examples:

# highest scores first; NaNs go to the bottom by default (na_position='last')
df.sort_values(by='score', ascending=False)

# the default kind='quicksort' is not stable; 'mergesort' (or 'stable') keeps tied rows in their original order
# (kind only applies when sorting by a single column)
df.sort_values(by='score', ascending=False, kind='mergesort')

# multi-column: group asc, score desc
df.sort_values(by=['group', 'score'], ascending=[True, False])
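To see what stability and NaN placement actually do, here is a small sketch on a copy of the lesson's df. It also shows sort_index(), listed above but not yet demonstrated, putting rows back in their original label order. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Copy of the lesson's df
df = pd.DataFrame({
    'id': [11, 22, 33, 44, 55, 66],
    'group': ['A', 'B', 'A', 'B', 'A', 'B'],
    'score': [88, np.nan, 92, 88, 92, 75],
    'name': ['alice', 'Bob', 'ALAN', 'bob', 'zoe', 'zoe'],
})

# Stable descending sort: tied scores (92/92, 88/88) keep their original row order
stable = df.sort_values('score', ascending=False, kind='mergesort')

# na_position='first' moves the NaN score to the top instead of the bottom
nan_first = df.sort_values('score', ascending=False, kind='mergesort', na_position='first')

# sort_index() restores the original order after any sort
restored = stable.sort_index()

print(stable['id'].tolist())
print(nan_first['id'].tolist())
print(restored.equals(df))
```

With the stable sort, id 33 always comes before id 55 (both 92) and id 11 before id 44 (both 88), which matters when you later take "the first" row per tie.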

Case-insensitive sorting with key=

If strings have mixed case and you want a case-insensitive sort, pass key=: a function that receives each sort column as a Series and returns a transformed Series of the same length (vectorized string methods such as .str.lower() fit well here):

# sort by name case-insensitively
df.sort_values(by='name', key=lambda col: col.str.lower())

key= is great because it applies the transform only while sorting, without permanently changing the column (so you avoid the unnecessary type conversions covered earlier).
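One detail worth a sketch: when you sort by several columns, key= is called once per column, so the function has to cope with non-string columns too. The helper `casefold_strings` below is a hypothetical name; it lower-cases string columns and passes everything else through unchanged.

```python
import numpy as np
import pandas as pd

# Copy of the lesson's df
df = pd.DataFrame({
    'id': [11, 22, 33, 44, 55, 66],
    'group': ['A', 'B', 'A', 'B', 'A', 'B'],
    'score': [88, np.nan, 92, 88, 92, 75],
    'name': ['alice', 'Bob', 'ALAN', 'bob', 'zoe', 'zoe'],
})

def casefold_strings(col):
    """Hypothetical key helper: lower-case string columns, leave others as-is."""
    if pd.api.types.is_string_dtype(col):
        return col.str.lower()
    return col

# Case-insensitive name first, then id to break ties such as 'Bob' vs 'bob'
by_name = df.sort_values(by=['name', 'id'], key=casefold_strings)
print(by_name[['name', 'id']])

# The column itself is untouched: the original casing is still there
print(df['name'].tolist())
```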


Categories and sorting

If you followed the previous 'Type Conversion and Categories' lesson, you know the categorical dtype can enforce a custom order. Sorting follows that declared category order instead of alphabetical order:

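# spell the order out explicitly: letting pandas infer the categories would sort them alphabetically (high < low < medium)
priority_order = ['low', 'medium', 'high']
df['priority'] = pd.Categorical(['high', 'low', 'medium', 'low', 'high', 'medium'], categories=priority_order, ordered=True)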

df.sort_values(by='priority')

This is cleaner than ad-hoc mapping to ints.
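A common trap is to build the order from values alone: pandas then infers the categories and sorts them alphabetically. The sketch below shows that pitfall and one way around it, a reusable pd.CategoricalDtype. The `tasks` frame and `priority_dtype` name are hypothetical.

```python
import pandas as pd

# Pitfall: without explicit categories, pandas infers them in alphabetical order
inferred = pd.Categorical(['low', 'medium', 'high'], ordered=True)
print(list(inferred.categories))  # ['high', 'low', 'medium']

# Declare the business order once, in a dtype you can reuse across columns
priority_dtype = pd.CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)

# Hypothetical data
tasks = pd.DataFrame({
    'task': ['t1', 't2', 't3', 't4', 't5', 't6'],
    'priority': ['high', 'low', 'medium', 'low', 'high', 'medium'],
})
tasks['priority'] = tasks['priority'].astype(priority_dtype)

# Sorting follows low < medium < high; ascending=False puts 'high' first
by_priority = tasks.sort_values('priority', ascending=False)
print(by_priority)
```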


Ranking: numeric positions, ties, and percentiles

Series.rank() returns the rank of each value as a float (by default the smallest value gets rank 1), aligned with the original index.

Parameters to know:

  • method: 'average' (default), 'min', 'max', 'first', 'dense'
  • ascending: True/False
  • pct: if True, divides each rank by the number of ranked values, giving a percentile rank in (0, 1]
  • na_option: 'keep'|'top'|'bottom' — where to place NaNs

s = df['score']
print(s.rank())                # default 'average': tied values share the mean of their positions
print(s.rank(ascending=False)) # higher score = rank 1
print(s.rank(method='dense'))  # dense: like 'min', but ranks don't skip numbers after ties
print(s.rank(pct=True))        # percentile rank in (0, 1]
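Tie behavior is easiest to grasp side by side. This sketch ranks the lesson's scores (ties at 88 and 92, plus one NaN) with every method and collects the results in one table; the `comparison` name is illustrative.

```python
import numpy as np
import pandas as pd

# Same scores as the lesson's df
s = pd.Series([88, np.nan, 92, 88, 92, 75], name='score')

# One column per tie-handling method; ascending=True means the lowest score gets rank 1
methods = ['average', 'min', 'max', 'first', 'dense']
comparison = pd.DataFrame({m: s.rank(method=m) for m in methods})
comparison.insert(0, 'score', s)
print(comparison)
```

Reading the two 88 rows: 'average' gives both 2.5, 'min' gives both 2, 'max' gives both 3, 'first' gives 2 and 3 by order of appearance, and 'dense' gives both 2 while the 92s get 3 instead of 4. The NaN row stays NaN in every column (na_option='keep').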

Ranking per group (very common)

# give top performer rank 1 within each group
df['rank_in_group'] = df.groupby('group')['score'].rank(ascending=False, method='dense')

This is your go-to for leaderboards, per-segment scoring, or feature creation in ML pipelines.
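For ML features, a within-group percentile is often handier than a raw position because it is comparable across groups of different sizes. A minimal sketch on the lesson's data; the `score_pct_in_group` column name is hypothetical.

```python
import numpy as np
import pandas as pd

# Relevant columns from the lesson's df
df = pd.DataFrame({
    'id': [11, 22, 33, 44, 55, 66],
    'group': ['A', 'B', 'A', 'B', 'A', 'B'],
    'score': [88, np.nan, 92, 88, 92, 75],
})

# Percentile within each group: rank / number of non-NaN scores in that group
df['score_pct_in_group'] = df.groupby('group')['score'].rank(pct=True)

# Dense descending rank from the lesson: 1 = top score in the group
df['rank_in_group'] = df.groupby('group')['score'].rank(ascending=False, method='dense')
print(df)
```

In group A the two 92s share rank 1 and the 88 is rank 2; in group B the NaN stays NaN and only the two real scores are ranked.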

Handling NaNs when ranking

You can use na_option or pre-process:

# keep NaNs (rank returns NaN)
df['score'].rank(na_option='keep')

# treat NaNs as worst: with ascending=False (1 = best), na_option='bottom' gives NaNs the last place
df['score'].rank(ascending=False, na_option='bottom')

# or fill NaNs before ranking if they should count as the lowest score (the sentinel must be below every real value)
df['score'].fillna(-999).rank(ascending=False)

Refer back to 'Handling Missing Values' for patterns on imputation vs. keeping NaNs.
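The three na_option values are easiest to compare on the same data. This sketch uses descending ranks (1 = best score) on the lesson's scores; `nan_options` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Same scores as the lesson's df
s = pd.Series([88, np.nan, 92, 88, 92, 75], name='score')

# Descending ranks with each na_option: 'top' -> NaN gets rank 1, 'bottom' -> NaN gets the last rank
nan_options = pd.DataFrame({
    opt: s.rank(ascending=False, na_option=opt) for opt in ['keep', 'top', 'bottom']
})
nan_options.insert(0, 'score', s)
print(nan_options)
```

Note how 'top' pushes every real score down one place: the 92s move from 1.5 to 2.5. That shift is why the choice should follow what a missing value means in your data.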


When you might use NumPy instead

If you need maximum speed for large arrays, leverage NumPy's argsort and vectorized ops (we covered this in 'Numerical Computing with NumPy'). Example: get ranking by position (fast):

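# positional ranks, highest score = 1; kind='stable' breaks ties by original row order
scores = df['score'].to_numpy()
order = np.argsort(-scores, kind='stable')   # descending; NaNs are placed last
ranks = np.empty_like(order)
ranks[order] = np.arange(1, len(scores) + 1)
# ranks now contains 1..N positions (no tie-handling, and a NaN silently gets the last position)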

Use NumPy when you are indexing large numeric arrays and want minimal Python overhead. Use pandas' rank() when you want tie-handling, grouping, or NaN-aware behavior — it's built for that.
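If you stay in NumPy but still need ties handled, np.sort plus np.searchsorted can reproduce pandas' 'min', 'max' and 'average' ranks, and np.unique(..., return_inverse=True) gives 'dense'. This is a sketch under one stated assumption: the array has no NaNs (drop or fill them first). It is cross-checked against pandas on the same data.

```python
import numpy as np
import pandas as pd

# Assumption: a NaN-free numeric array
a = np.array([88, 92, 88, 75, 92], dtype=float)

sorted_a = np.sort(a)
below = np.searchsorted(sorted_a, a, side='left')         # count of values strictly smaller
at_or_below = np.searchsorted(sorted_a, a, side='right')  # count of values smaller or equal

min_rank = below + 1                                   # like method='min'
max_rank = at_or_below                                 # like method='max'
avg_rank = (min_rank + max_rank) / 2                   # like method='average'
dense_rank = np.unique(a, return_inverse=True)[1] + 1  # like method='dense'

print(avg_rank, pd.Series(a).rank().to_numpy())
print(dense_rank, pd.Series(a).rank(method='dense').to_numpy())
```

Each step is a vectorized O(n log n) operation, so this scales to large arrays; what you give up compared with rank() is grouping and NaN handling.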


Small checklist / tips

  • Use sort_values to reorder rows for human inspection, top-k selection, or ordering rows before group-wise and time-series steps.
  • Use rank() to create numeric positions (with chosen tie behavior) for features and filters.
  • Use key= for temporary transforms like case folding — avoids permanent dtype changes.
  • When performance matters, consider NumPy argsort for raw arrays, but prefer pandas when you need group-aware or NaN-aware behavior.
  • Remember the categorical dtype supports a custom order and sorts on its integer codes, which keeps repeated sorts fast.

Key takeaways

  1. Sort to see, rank to measure. Sorting is ordering; ranking assigns each row a numeric place.
  2. key= is your friend for temporary transformations before sorting (case-insensitive sorts).
  3. Choose rank method carefully: 'average' vs 'dense' vs 'first' change how ties behave.
  4. Handle NaNs intentionally: decide whether to keep, treat as top/bottom, or impute.
  5. Use NumPy for raw speed when you only need positional rankings and no tie logic.

Remember: ordering and ranking are deceptively powerful. Many data-cleaning and feature-engineering problems collapse into a small set of sort-and-rank operations. Next up, you might combine ranking with rolling/window calculations or convert ranks into categorical bins for models — both natural continuations from this lesson.

"If you ever feel lost in a messy table, sort it. If you need meaning beyond order, rank it."
