Data Structures and Iteration
Use Python collections and iteration patterns to write expressive, efficient, and readable data-oriented code.
Dictionaries and Dict Comprehensions — Fast, Friendly, and Functional
"If lists are grocery bags and tuples are sealed Tupperware, dictionaries are labeled spice jars — wildly useful when you need to find the thing that matches a name."
You're coming from Python Foundations for Data Work and have already met lists (and their shiny list comprehensions) and tuples (those immutably reliable friends). Now we move to the structure that makes lookups instantaneous and your code smell less like a dumpster fire: dictionaries and dict comprehensions.
What is a dictionary and why it matters for data work
- Dictionary: a mutable mapping of keys → values. Keys must be hashable (strings, numbers, tuples... not lists).
- Where it appears in data tasks:
- Feature lookup: map category → index or one-hot vector
- Frequency tables: token → count
- Metadata: column_name → dtype / normalization factor
- Fast joins/merges when you don't want the overhead of pandas
If lists are great for ordered sequences and tuples guarantee safety (immutability), dictionaries are unbeatable for keyed access — O(1) average-time lookups. That’s why they’re everywhere in data pipelines.
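That "fast joins" point can be sketched with a plain dict standing in for a lookup table; the `users`/`orders` data here is invented for illustration:

```python
# A tiny "join" using a dict for O(1) average-time key lookups.
users = {101: 'Ada', 102: 'Grace'}                          # user_id -> name
orders = [(1, 101, 9.99), (2, 102, 4.50), (3, 101, 2.25)]   # (order_id, user_id, total)

# Attach the user name to each order with one dict lookup per row
joined = [(order_id, users.get(user_id, 'unknown'), total)
          for order_id, user_id, total in orders]
```

Each row costs a single hash lookup, so the whole join is linear in the number of orders — no nested scan over `users`.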
Quick reminders from earlier (lists & tuples)
- You used list comprehensions to transform sequences: [x**2 for x in nums]. Expect the same elegant expressiveness with dict comprehensions: {k: v for ...}.
- Tuples can be used as dictionary keys because they’re immutable; lists cannot.
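A quick demonstration of that hashability rule (`grid` is a made-up example):

```python
# Tuples are hashable, so they work as composite keys; lists raise TypeError.
grid = {}
grid[(0, 0)] = 'origin'      # fine: tuple key
grid[(2, 3)] = 'treasure'

is_hashable_error = False
try:
    grid[[2, 3]] = 'nope'    # lists are mutable, hence unhashable
except TypeError:
    is_hashable_error = True
```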
Basic dictionary usage (the easy bits)
Create from literals or two lists:
# literal
d = {'a': 1, 'b': 2}
# from two lists
cols = ['id', 'name', 'age']
values = [101, 'Ada', 29]
row = dict(zip(cols, values)) # {'id':101,'name':'Ada','age':29}
Access safely:
# may raise KeyError
x = d['c']
# safe with default
x = d.get('c', 0)
Update/merge:
d.update({'b': 3, 'c': 4})
# or Python 3.5+: new_d = {**d, **other}
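A minimal sketch of the overwrite semantics when merging — on colliding keys, the right-hand mapping wins, and unpacking leaves the originals untouched (`other` is an illustrative name):

```python
d = {'a': 1, 'b': 2}
other = {'b': 3, 'c': 4}

merged = {**d, **other}     # Python 3.5+ unpacking; 'b' comes from `other`
# merged = d | other        # Python 3.9+ union operator, same overwrite rule
```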
Iteration patterns — choose your weapon:
for k in d: ...                              # keys
for v in d.values(): ...                     # values
for k, v in d.items(): ...                   # both
for i, (k, v) in enumerate(d.items()): ...   # index + items
Sort while iterating:
for k in sorted(d):
    print(k, d[k])
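If you need order by *value* rather than by key, `sorted` with a key function works on `items()` — a small sketch:

```python
counts = {'a': 3, 'b': 7, 'c': 1}
# rank (key, value) pairs by value, descending
top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
```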
Dict comprehensions: list comprehension's wilder cousin
Syntax mirrors list comprehensions but builds a mapping:
# basic: feature -> normalized value
counts = {'a': 3, 'b': 7, 'c': 0}
total = sum(counts.values())
norm = {k: v/total for k, v in counts.items()}
Filter while building:
# keep only frequent features
freq_filtered = {k: v for k, v in counts.items() if v >= 2}
Conditionals inside values:
# bucketize
buckets = {k: ('high' if v > 5 else 'low') for k, v in counts.items()}
Nested comprehensions (grouping/inverting):
# invert mapping: value -> list of keys that had that value
inv = {}
for k, v in d.items():
    inv.setdefault(v, []).append(k)
# or using a nested comprehension (less efficient — rescans d per value):
inv = {v: [k for k, val in d.items() if val == v] for v in set(d.values())}
When to prefer dict comprehension: when you can build the mapping in a single, readable expression. If you need complex aggregation, a loop or collections.defaultdict/Counter is often clearer.
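As a sketch of that advice, here's a grouping task done with collections.defaultdict — the loop reads more clearly than a nested comprehension would (`rows` is invented sample data):

```python
from collections import defaultdict

rows = [('fruit', 'apple'), ('veg', 'carrot'), ('fruit', 'banana')]

# defaultdict(list) creates the empty list on first access, so no setdefault needed
groups = defaultdict(list)
for category, item in rows:
    groups[category].append(item)
```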
Data-science flavored examples (so you can flex in notebooks)
- Map categorical values to indices (useful before feeding into models):
cats = ['apple', 'banana', 'apple', 'cherry']
cat_to_idx = {cat: i for i, cat in enumerate(sorted(set(cats)))}
# {'apple': 0, 'banana': 1, 'cherry': 2}
- Frequency counts — idiomatic way (Counter) vs manual dict:
from collections import Counter
Counter(cats) # quickest
# manual (good exercise):
counts = {}
for c in cats:
    counts[c] = counts.get(c, 0) + 1
# Normalize with a dict comprehension (compute the total once,
# not inside the comprehension where it would be re-summed per key)
total = sum(counts.values())
normalized = {k: v/total for k, v in counts.items()}
- Feature engineering — rename columns
raw_cols = ['Age (yrs)', 'Salary USD']
clean = {c: c.lower().replace(' ', '_').replace('(', '').replace(')', '')
         for c in raw_cols}
# {'Age (yrs)': 'age_yrs', 'Salary USD': 'salary_usd'}
Advanced tips & gotchas
- Keys must be hashable: strings, numbers, tuples ok; lists and dicts not allowed.
- If you need multiple values per key, store lists or use defaultdict(list).
- Performance: dict lookups are O(1) on average — perfect for joins and lookups.
- Beware colliding keys when merging: later keys overwrite earlier ones.
- For frequency tasks, prefer collections.Counter or defaultdict for clarity and speed.
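For instance, Counter gives you ranked frequencies for free (`tokens` is a toy example):

```python
from collections import Counter

tokens = ['to', 'be', 'or', 'not', 'to', 'be']
freq = Counter(tokens)           # token -> count, just like a manual dict
top_two = freq.most_common(2)    # highest counts first; ties keep first-seen order
```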
Quick comparison: list vs tuple vs dict (in one glance)
- List: ordered, mutable — good for sequences
- Tuple: ordered, immutable — safe as dict keys
- Dictionary: insertion-ordered (since Python 3.7) mapping key→value — fast lookups and labeled data
Best practices for data projects
- Use dict comprehensions for readable mapping transforms and small lookups.
- Use Counter/defaultdict for aggregations; use comprehensions for final transformations.
- Keep keys simple and consistent (strings or tuples). Keys that are objects can be fragile when pickling or across sessions.
- Document what keys mean — dictionaries are flexible but can become cryptic messes if keys are used inconsistently.
Key takeaways
- Dictionaries are the go-to structure for labeled, fast-access data.
- Dict comprehensions give you declarative power like list comprehensions, letting you map and filter in one line.
- Use tuple keys when you need composite keys (they're immutable and hashable). Use defaultdict/Counter when aggregating.
"Think of a dict as the indexed index of your data — you can call things by name instead of rummaging through every row."
Go practice: convert a CSV header & row into a dict (zip), then write a dict comprehension to normalize numeric columns and filter out low-quality features. That combo bridges your Python Foundations into real, clean data work.
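One possible sketch of that exercise, with illustrative column names — here "normalize" means dividing by the maximum value, and dropping the non-numeric `id` column stands in for filtering low-quality features:

```python
# Header and row as they might come from csv.reader (all strings)
header = ['id', 'score', 'weight']
row = ['7', '0.5', '12.0']
record = dict(zip(header, row))

# Keep numeric columns, convert, then normalize by the max value
numeric = {k: float(v) for k, v in record.items() if k != 'id'}
max_val = max(numeric.values())
normalized = {k: v / max_val for k, v in numeric.items()}
```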
Happy mapping. When in doubt, enumerate + items() + a bit of comprehension will rescue 90% of your code smell.