© 2026 jypi. All rights reserved.

Python for Data Science, AI & Development

Data Structures and Iteration


Use Python collections and iteration patterns to write expressive, efficient, and readable data-oriented code.


Sorting with Custom Keys in Python for Data Science
beginner
python
data-science
sorting
gpt-5-mini

Sorting and Custom Keys — Make Python Sort Like a Data Scientist

Ever noticed how a leaderboard can be beautifully ordered one moment and catastrophically wrong the next because someone used lexicographic sorting on numbers? Welcome to the low-key chaos of sorting data. We're going to tame it.
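A minimal sketch of that leaderboard bug, assuming the scores arrived as strings (for example, read from a CSV file); the player names and values are made up. Sorting the raw strings compares them character by character, while converting inside the key compares their numeric values.

```python
# Hypothetical leaderboard rows: (player, score), with scores stored as text
leaderboard = [('ana', '100'), ('ben', '9'), ('cara', '25')]

# Lexicographic: '9' > '25' > '100' because the first characters decide
by_text = sorted(leaderboard, key=lambda row: row[1], reverse=True)

# Numeric: convert inside the key, leaving the original data untouched
by_number = sorted(leaderboard, key=lambda row: int(row[1]), reverse=True)

print([name for name, _ in by_text])    # ['ben', 'cara', 'ana']
print([name for name, _ in by_number])  # ['ana', 'cara', 'ben']
```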

"Sorting is the quiet magician of data pipelines — it doesn't get applause, but everything breaks without it."

This lesson assumes you already know how to iterate with tools like enumerate and zip, and how to produce data lazily with generators. We'll build on that: use enumerate to keep original indices, zip to sort parallel arrays, and keep in mind that sorting a generator materializes it (sorted(gen) consumes it into a new list; generators have no .sort() method).


What this page covers

  • How Python's sorted() and list.sort() use key functions to control ordering.
  • Practical patterns for data science: sorting dicts, rankings, top-k, stable multi-key sorts.
  • Performance tips (decorate-sort-undecorate, operator helpers, heapq for top-k).

Why this matters: sorting underpins ranking, sampling, deduplication, and feature engineering. If you sort wrong, your model gets garbage, your dashboard lies, and your product manager cries.


The basics: sorted(), list.sort(), and the key parameter

  • sorted(iterable, key=..., reverse=False) returns a new list.
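  • list.sort(key=..., reverse=False) sorts the list in place and returns None; because it skips the copy, it is slightly faster and uses less memory. It exists only on lists, while sorted() accepts any iterable.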

The magic is the key function: Python computes key(item) for each item once and sorts by those keys. That single-call-per-item behavior is a performance superpower — use it.
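One way to see the once-per-item behavior is to count calls. This is a small sketch where the hypothetical counted_key stands in for an expensive key function; the counter should equal the number of elements, not the number of comparisons.

```python
calls = 0

def counted_key(x):
    """Stand-in for an expensive key function; counts its own calls."""
    global calls
    calls += 1
    return x % 7

data = list(range(1000))
result = sorted(data, key=counted_key)

print(calls)       # 1000: one call per element, not one per comparison
print(result[:3])  # [0, 7, 14]: ties on the key keep their original order
```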

Example: sort people by age

people = [
    {'name': 'Ana', 'age': 34},
    {'name': 'Ben', 'age': 28},
    {'name': 'Cara', 'age': 42}
]

# ascending by age
sorted_people = sorted(people, key=lambda p: p['age'])

# descending by age
sorted_people_desc = sorted(people, key=lambda p: p['age'], reverse=True)

Fast alternative with operator.itemgetter

Use operator.itemgetter when extracting dict keys or tuple indices; it's slightly faster and clearer.

from operator import itemgetter
sorted_people = sorted(people, key=itemgetter('age'))
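itemgetter also accepts several fields and then returns a tuple, which gives a multi-key sort for free; operator.attrgetter does the same job for objects with attributes. In this sketch the data differs slightly from above (two people share an age, to show the tie-break), and the Person dataclass is a hypothetical example type.

```python
from dataclasses import dataclass
from operator import attrgetter, itemgetter

# Ana and Cara share an age here to show tie-breaking by name
people = [
    {'name': 'Cara', 'age': 34},
    {'name': 'Ben', 'age': 28},
    {'name': 'Ana', 'age': 34},
]

# With several fields, itemgetter returns a tuple: sort by age, then by name
by_age_then_name = sorted(people, key=itemgetter('age', 'name'))
print([p['name'] for p in by_age_then_name])  # ['Ben', 'Ana', 'Cara']

@dataclass
class Person:  # hypothetical record type for illustration
    name: str
    age: int

staff = [Person('Ana', 34), Person('Ben', 28), Person('Cara', 42)]

# attrgetter is the attribute-based counterpart, handy for dataclasses
by_age = sorted(staff, key=attrgetter('age'))
print([p.name for p in by_age])  # ['Ben', 'Ana', 'Cara']
```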

Sorting parallel lists — zip to the rescue

If you have features and labels in two lists and want to reorder them together, zip + sort is your friend.

features = [3.2, 1.5, 4.8]
labels   = ['C', 'A', 'B']

paired = list(zip(features, labels))
paired_sorted = sorted(paired, key=lambda x: x[0])  # sorts by feature
features_sorted, labels_sorted = zip(*paired_sorted)  # unzip; each result is a tuple (empty input would raise ValueError)

If you used this kind of pairing earlier with enumerate/zip, you already know how neat this is.
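When more than two lists must stay aligned, an alternative is to sort the positions once (an "argsort") and apply that permutation to every list. This is a sketch; the weights list is a hypothetical third column, and NumPy's np.argsort plays the same role for arrays. enumerate yields the same information as (original_index, value) pairs.

```python
from operator import itemgetter

features = [3.2, 1.5, 4.8]
labels   = ['C', 'A', 'B']
weights  = [0.5, 0.2, 0.3]   # hypothetical third list aligned with the others

# "argsort": positions 0..n-1, ordered by the feature at each position
order = sorted(range(len(features)), key=features.__getitem__)
print(order)  # [1, 0, 2]

# Apply the same permutation to every aligned list
features_sorted = [features[i] for i in order]
labels_sorted   = [labels[i] for i in order]
weights_sorted  = [weights[i] for i in order]

# enumerate keeps each value's original position alongside it
ranked = sorted(enumerate(features), key=itemgetter(1))
print(ranked)  # [(1, 1.5), (0, 3.2), (2, 4.8)]
```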


Multi-key sorts and stability — why order of operations matters

Python's sort is stable: when keys are equal, the original order is preserved. You can use stability for multi-key sorts by sorting on the least significant key first and the most significant key last, or by returning a tuple of keys in a single pass.

rows = [
    ('NY', 2019, 10),
    ('NY', 2020, 9),
    ('CA', 2020, 12),
    ('CA', 2019, 15)
]

# Sort by state then year (both ascending)
sorted_by_state_year = sorted(rows, key=lambda r: (r[0], r[1]))

# Or the stable two-pass approach: sort by the secondary key (year) first, then the primary key (state)
rows_copy = rows[:]  # copy so the original list keeps its order
rows_copy.sort(key=lambda r: r[1])   # year
rows_copy.sort(key=lambda r: r[0])   # state => final sort is state, ties resolved by year

Tuple keys are concise and common in data work.
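Stability also handles mixed directions, such as state ascending but count descending. This sketch reuses the rows above and shows two equivalent options: negating a numeric field inside a tuple key, or two stable passes where the reverse=True pass still keeps equal keys in their original order.

```python
rows = [
    ('NY', 2019, 10),
    ('NY', 2020, 9),
    ('CA', 2020, 12),
    ('CA', 2019, 15),
]

# Goal: state ascending, count (third field) descending.

# Option 1: negate the numeric part of a tuple key (works only for numbers)
one_pass = sorted(rows, key=lambda r: (r[0], -r[2]))

# Option 2: two stable passes, secondary key first, primary key last
two_pass = sorted(rows, key=lambda r: r[2], reverse=True)
two_pass.sort(key=lambda r: r[0])

print(one_pass)
# [('CA', 2019, 15), ('CA', 2020, 12), ('NY', 2019, 10), ('NY', 2020, 9)]
print(one_pass == two_pass)  # True
```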


Case-insensitive sorts, lengths, and other practical keys

  • Case-insensitive string sort: key=str.lower (str.casefold is the more thorough Unicode option)
  • Sort by string length: key=len
  • Complex key: count of missing values in a row: key=lambda r: sum(1 for v in r if v is None)

names = ['alice', 'Bob', 'Álvaro']
# case-insensitive
sorted(names, key=str.lower)
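A sketch of the keys listed above on small made-up data. Two assumptions worth flagging: accent_insensitive is a hypothetical helper built on the standard unicodedata module (it is not locale-aware collation), and the None-last tuple key is one common way to sort values that contain missing entries, since comparing None with a number raises TypeError.

```python
import unicodedata

names = ['alice', 'Bob', 'Álvaro']

# str.lower fixes case, but accented letters still compare by code point:
# 'á' (U+00E1) sorts after every ASCII letter, so 'Álvaro' lands last.
print(sorted(names, key=str.lower))  # ['alice', 'Bob', 'Álvaro']

def accent_insensitive(s):
    """Hypothetical helper: drop accents and casefold, for comparison only."""
    decomposed = unicodedata.normalize('NFKD', s)
    base = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()

print(sorted(names, key=accent_insensitive))  # ['alice', 'Álvaro', 'Bob']

# Rows with missing values (None): fewest missing first
rows = [[1, None, 3], [None, None, 6], [7, 8, 9]]
by_missing = sorted(rows, key=lambda r: sum(1 for v in r if v is None))
print(by_missing)  # [[7, 8, 9], [1, None, 3], [None, None, 6]]

# sorted([34, None, 28]) raises TypeError; a tuple key pushes None to the end
ages = [34, None, 28, 42]
ages_sorted = sorted(ages, key=lambda v: (v is None, v))
print(ages_sorted)  # [28, 34, 42, None]
```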

Avoid redundant work: Decorate-Sort-Undecorate (but Python already does it)

If your key is expensive, you want it computed once per item. Good news: Python's sort already computes key(item) exactly once per element, so decorate-sort-undecorate is effectively built in and there is rarely a reason to write it by hand.
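For comparison, here is the manual decorate-sort-undecorate idiom that key= replaces, on a made-up word list. The enumerate index in each triple breaks ties, which keeps the result stable and means the original items are never compared directly.

```python
words = ['banana', 'Apple', 'cherry', 'apple']

# Decorate: (key, original index, item) triples
decorated = [(w.lower(), i, w) for i, w in enumerate(words)]
# Sort: tuples compare by key first, then by index on ties
decorated.sort()
# Undecorate: keep only the items
undecorated = [w for _, _, w in decorated]

print(undecorated)                             # ['Apple', 'apple', 'banana', 'cherry']
print(undecorated == sorted(words, key=str.lower))  # True
```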

If you do need advanced control (e.g., comparator-based logic), use functools.cmp_to_key.

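from functools import cmp_to_key

strings = ['ccc', 'a', 'bb']

def compare_by_weird_rule(a, b):
    # return negative if a sorts before b, positive if after, zero if tied
    return len(a) - len(b)

# equivalent to key=len here; a comparator pays off only when no simple key exists
sorted_strings = sorted(strings, key=cmp_to_key(compare_by_weird_rule))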

But prefer key= whenever possible for clarity and speed.
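A case where a comparator is the natural fit: arranging numbers so their concatenation is as large as possible. Whether a goes before b depends on comparing a+b with b+a, which is a pairwise rule rather than a per-item key. This is a sketch with made-up numbers; concat_order is a hypothetical name.

```python
from functools import cmp_to_key

def concat_order(a: str, b: str) -> int:
    """Put a before b when the concatenation a+b is the larger string."""
    if a + b > b + a:
        return -1   # a should come first
    if a + b < b + a:
        return 1    # b should come first
    return 0

nums = [3, 30, 34, 5, 9]
digits = sorted(map(str, nums), key=cmp_to_key(concat_order))
largest = ''.join(digits)
print(largest)  # 9534330
```

Because a+b and b+a always have the same length, comparing them as strings matches comparing them as numbers.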


Top-k: use heapq for efficiency

If you only need the top 10 items, don't sort the entire dataset (O(n log n)). Use heapq.nlargest / nsmallest (O(n log k)).

import heapq
import random

vals = [random.random() for _ in range(1000000)]
# top 5
top5 = heapq.nlargest(5, vals)

# top 5 by a key on complex objects
top5_people = heapq.nlargest(5, people, key=lambda p: p['age'])
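heapq.nlargest(k, data) returns the same items as sorting everything in descending order and slicing the first k. This sketch checks that on a smaller, seeded random sample. The heapq documentation notes that these functions shine for small k; for k close to the data size a plain sorted() is usually faster, and for k == 1 max() or min() is simplest.

```python
import heapq
import random

random.seed(42)  # fixed seed so the check is repeatable
vals = [random.random() for _ in range(10_000)]

top5_heap = heapq.nlargest(5, vals)          # bounded heap of size 5
top5_sort = sorted(vals, reverse=True)[:5]   # full sort, then slice
print(top5_heap == top5_sort)  # True

# For k == 1, the built-in max() is the simplest choice
print(max(vals) == top5_heap[0])  # True
```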

Sorting dictionaries: most common pattern — sort by value

Dictionaries are not kept sorted: since Python 3.7 they preserve insertion order, which is not the same as ordering by key or value. To rank keys by their values:

scores = {'a': 10, 'b': 3, 'c': 30}
# keys sorted by value desc
ranked_keys = sorted(scores, key=scores.get, reverse=True)

# or items sorted by value
ranked_items = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

If you need the top-k keys, combine with heapq.nlargest: heapq.nlargest(k, scores, key=scores.get)
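If you want the ranking stored back in a dictionary, one option is to rebuild it from the sorted items, since a dict remembers insertion order. This sketch uses the scores dict above; note that the new dict is only ordered at creation, and keys added later go at the end.

```python
from heapq import nlargest
from operator import itemgetter

scores = {'a': 10, 'b': 3, 'c': 30}

# Rebuild a dict whose insertion order follows descending score
ranked = dict(sorted(scores.items(), key=itemgetter(1), reverse=True))
print(ranked)  # {'c': 30, 'a': 10, 'b': 3}

# Top-2 keys by value without a full sort
top2 = nlargest(2, scores, key=scores.get)
print(top2)  # ['c', 'a']
```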


Generators and sorting — materialize first

Generators are lazy. sorted(generator) will consume the generator and sort the results; you cannot sort a generator without materializing it. If memory is a concern, consider streaming top-k (heapq) or external sorting.
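A small sketch of both behaviors, using a hypothetical readings() generator: sorted() drains the generator, so a second sort sees nothing, while heapq.nlargest can scan a fresh stream and keep only k items in its heap.

```python
import heapq

def readings():
    """Hypothetical stream of sensor readings produced lazily."""
    for i in range(10):
        yield (i * 7) % 10   # 0, 7, 4, 1, 8, 5, 2, 9, 6, 3

gen = readings()
first = sorted(gen)    # consumes the generator into a new sorted list
second = sorted(gen)   # generator is exhausted, so this is []

# Top-k over a fresh stream without materializing it
top3 = heapq.nlargest(3, readings())

print(first)   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(second)  # []
print(top3)    # [9, 8, 7]
```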


Practical tips & gotchas

  • Key functions are called once per element by Python's sort — use that to your advantage.
  • Use operator.itemgetter/attrgetter for readability and speed.
  • For descending numeric sorts, prefer reverse=True over returning negative numbers (clearer). Negation is mainly useful inside a tuple key when directions are mixed, and it only works for numbers.
  • Sort stability is your friend for multi-step sorts and tie-breaking.
  • For huge datasets where memory is limited: use heapq for top-k or external merge sorts.
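The external merge sort idea can be sketched with the standard library: sort fixed-size chunks in memory, spill each sorted run to a temporary file, then lazily k-way merge the runs with heapq.merge. This is a minimal sketch assuming integer data and a writable temp directory; the helper names are hypothetical, and real external sorts (or a database) deal with many more concerns such as file handles, formats and failure cleanup.

```python
import heapq
import itertools
import os
import random
import tempfile
from typing import Iterable, Iterator, List


def _spill_sorted_run(chunk: List[int], directory: str) -> str:
    """Sort one in-memory chunk and write it to a temp file, one int per line."""
    chunk.sort()
    fd, path = tempfile.mkstemp(dir=directory, suffix='.run')
    with os.fdopen(fd, 'w') as f:
        f.writelines(f'{v}\n' for v in chunk)
    return path


def _stream_run(path: str) -> Iterator[int]:
    """Lazily read a sorted run back from disk."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(values: Iterable[int], chunk_size: int = 1000) -> Iterator[int]:
    """Yield values in ascending order, sorting at most chunk_size items in memory at once."""
    if chunk_size < 1:
        raise ValueError('chunk_size must be positive')
    with tempfile.TemporaryDirectory() as tmpdir:
        it = iter(values)
        paths = []
        # Phase 1: cut the input into sorted runs on disk
        while True:
            chunk = list(itertools.islice(it, chunk_size))
            if not chunk:
                break
            paths.append(_spill_sorted_run(chunk, tmpdir))
        # Phase 2: heapq.merge lazily merges the already-sorted runs
        yield from heapq.merge(*(_stream_run(p) for p in paths))


data = [random.randint(0, 10_000) for _ in range(5_000)]
result = list(external_sort(data, chunk_size=500))
print(result == sorted(data))  # True
```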

Quick reference examples

  • Sort list of tuples by second element: sorted(lst, key=itemgetter(1))
  • Case-insensitive sort: sorted(names, key=str.lower)
  • Top-5 by score: heapq.nlargest(5, people, key=lambda p: p['score'])

Key takeaways — what to remember

  • Use key= to tell Python what to sort by; it’s called once per item, so it's efficient.
  • Prefer operator helpers and tuple-keys for clarity.
  • For parallel arrays, zip them, sort the pairs, then unzip.
  • Use heapq for top-k to avoid full sorts.
  • Remember sort stability for multi-key logic — it lets you build complex ordering with simple steps.

"Sorting well is like arranging your life: do the expensive decisions once, keep order stable, and don’t re-sort the whole universe when you just want the top five."

Go forth and sort responsibly — your dashboards and models will thank you.
