Data Structures and Iteration
Use Python collections and iteration patterns to write expressive, efficient, and readable data-oriented code.
Lists and List Comprehensions
Lists and List Comprehensions — Your Swiss Army Knife for Python Data Work
Imagine your data as a messy pile of socks. Lists are the drawers. List comprehensions are the magical folding robot that sorts, filters, and politely pairs socks while you sip coffee.
You already know the basics of writing tidy Python from the previous module on coding style and PEP 8, and you can open the REPL or call help when you forget a method. Now let's level up: lists are the most common collection type you'll use in data tasks, and list comprehensions are a concise, Pythonic way to build them. This is where we go from fiddling with files (remember File I/O essentials?) to shaping data into analysis-ready formats.
What is a list? Why care in data work?
- List: an ordered, mutable collection of items (can be mixed types). Think of it as a column of values, a batch of filenames, or a pipeline of transformed numbers.
- Lists show up everywhere: when you read lines from a CSV, when you collect features, or when you store model predictions.
Key properties:
- Ordered: items keep their positions, so indexing and slicing make sense.
- Mutable: you can change, append, or pop elements.
- Heterogeneous: you can mix ints, strings, and objects (but consistent types are cleaner for data ops).
Quick examples (REPL-friendly)
```python
# create lists
nums = [1, 2, 3, 5, 8]
names = ['alex', 'bella', 'carlos']

# indexing
first = nums[0]   # 1
last = nums[-1]   # 8

# slicing
slice_mid = nums[1:4]  # [2, 3, 5]

# mutating
nums.append(13)
nums[2] = 4  # change 3 -> 4
```
Pro tip: in the REPL, use help(list) or dir(list) to explore methods; you reviewed the REPL and help in the previous module, so you know this drill.
Common list methods (and when to use them)
| Method | What it does | Data use case |
|---|---|---|
| append(x) | add x to end | collecting streaming results |
| extend(iterable) | append all items from iterable | merging lists of rows |
| insert(i, x) | insert at index i | building ordered batches |
| pop(i=-1) | remove and return item at index i (default: last) | stack-like operations; for queue-like use, collections.deque avoids the O(n) cost of pop(0) |
| remove(x) | remove first matching x | cleaning specific bad values |
| sort(key=..., reverse=...) | in-place sort | ordering records |
| sorted(iterable) | built-in function (not a list method): return new sorted list | preserve original data |
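To see the table in action, here is a minimal sketch that walks through the methods on one small list. The name `readings` and its values are made up purely for illustration.

```python
# Hypothetical sample data, chosen only for illustration.
readings = [3.2, 1.5, 4.8]

readings.append(2.1)          # add one item to the end
readings.extend([9.9, 1.5])   # add every item from another iterable
readings.insert(0, 0.0)       # insert at index 0
# readings: [0.0, 3.2, 1.5, 4.8, 2.1, 9.9, 1.5]

last = readings.pop()         # removes and returns the last item (1.5)
readings.remove(9.9)          # removes the first 9.9 it finds
# readings: [0.0, 3.2, 1.5, 4.8, 2.1]

ordered_copy = sorted(readings)  # new list; readings itself is unchanged
readings.sort(reverse=True)      # sorts in place and returns None
# ordered_copy: [0.0, 1.5, 2.1, 3.2, 4.8]
# readings:     [4.8, 3.2, 2.1, 1.5, 0.0]
```

A common slip is writing `readings = readings.sort()`, which replaces the list with None because sort() works in place.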
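Remember: PEP 8 favors clear variable names. It specifically advises against 'l' (lowercase L) as a single-character name because it is easily confused with the digit 1, and vague names like 'list1' tell the reader nothing, so save them for a 2-line REPL demo at most.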
List comprehensions: the concise, readable power move
Definition: a compact syntax for constructing lists from iterables, optionally with filtering and expressions.
Basic form:
```python
[expression for item in iterable if condition]
```
Example: square the first 10 integers
```python
squares = [x * x for x in range(10)]
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```
Why use them?
- Shorter than a for-loop + append
- Often faster than a pure Python loop
- Readable once you learn the pattern
But: don't cram too much logic into one comprehension. If it grows beyond two clauses or needs a long expression, prefer a loop or a helper function for clarity.
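To make the "for-loop + append" comparison concrete, here is a small sketch showing both versions side by side, plus one way to keep a comprehension short by moving logic into a helper. The helper `normalize_score` and the sample scores are hypothetical, invented only for this illustration.

```python
# Loop version: create an empty list, then append inside the loop.
squares_loop = []
for x in range(10):
    squares_loop.append(x * x)

# Comprehension version: the same result in a single expression.
squares_comp = [x * x for x in range(10)]


def normalize_score(raw_score):
    """Hypothetical helper: clamp a raw score into the 0-100 range."""
    return max(0, min(100, raw_score))


# The comprehension stays readable because the logic has a name.
scores = [normalize_score(s) for s in [105, 87, -3, 42]]
# [100, 87, 0, 42]
```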
With conditionals
```python
# keep only even squares
even_squares = [x * x for x in range(10) if x % 2 == 0]
# [0, 4, 16, 36, 64]
```
Transforming strings (data cleaning example)
```python
raw = [' Alice ', 'BOB', 'charlie ']
clean = [name.strip().title() for name in raw]
# ['Alice', 'Bob', 'Charlie']
```
This is where your File I/O lesson meets list comprehensions: readlines() -> list -> comprehension to clean lines.
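Here is a minimal sketch of that readlines() -> list -> comprehension flow. To keep it self-contained, `io.StringIO` stands in for a real file object, and the contents are made-up sample data; with a real file you would use `open(...)` in a `with` block instead.

```python
import io

# io.StringIO stands in for a real file opened with open(path);
# the contents are made-up sample data.
fake_file = io.StringIO("  Alice \nBOB\n\ncharlie  \n")

lines = fake_file.readlines()  # each item still ends with "\n"

# Strip whitespace, fix capitalization, and skip blank lines.
clean_names = [line.strip().title() for line in lines if line.strip()]
# ['Alice', 'Bob', 'Charlie']
```

The `if line.strip()` filter drops empty lines, since an empty string is falsy.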
Nested comprehensions and flattening
You can nest comprehensions for 2D data, but readability drops quickly.
```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flattened = [val for row in matrix for val in row]
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```
Read it left-to-right: for each row in matrix, for each val in row, take val.
If you need complex nesting, sometimes numpy arrays or pandas DataFrames are a more natural fit in data science workflows.
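If the clause order feels backwards, it can help to write the flattening comprehension as the nested loop it corresponds to. The sketch below also shows the other kind of nesting, a comprehension inside a comprehension, which keeps the 2D shape; the doubling operation is just an arbitrary example.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# The comprehension's for-clauses nest in the order they are written.
flattened_loop = []
for row in matrix:
    for val in row:
        flattened_loop.append(val)

flattened_comp = [val for row in matrix for val in row]
# both: [1, 2, 3, 4, 5, 6, 7, 8, 9]

# A comprehension inside a comprehension builds a new 2D list instead,
# here keeping the row structure while doubling each value.
doubled = [[val * 2 for val in row] for row in matrix]
# [[2, 4, 6], [8, 10, 12], [14, 16, 18]]
```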
When NOT to use list comprehensions
- If the expression exists only for its side effects (I/O, logging), use a for-loop.
- If the comprehension becomes unreadable.
- If you need lazy, one-item-at-a-time evaluation (use a generator expression instead).
Generator expression example (lazy, memory-efficient):
```python
sum_sq = sum(x * x for x in range(10**7))  # doesn't build the whole list
```
Avoid building huge lists when you can process items as a stream, or use numpy/pandas, whose numeric arrays are implemented in C and store values more compactly than a list of Python objects.
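A rough way to see the difference is to compare the size of a list with the size of the equivalent generator object. This is only a sketch: `n` is an arbitrary size, and `sys.getsizeof` reports the container alone, not the integers it refers to, so the real gap is larger than the numbers suggest.

```python
import sys

n = 100_000  # arbitrary size chosen for the demo

squares_list = [x * x for x in range(n)]  # all values exist in memory now
squares_gen = (x * x for x in range(n))   # values are produced on demand

list_size = sys.getsizeof(squares_list)  # grows with n
gen_size = sys.getsizeof(squares_gen)    # small, and independent of n

total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)  # same total, computed lazily

# A generator can be consumed only once; a second pass yields nothing.
second_pass = sum(squares_gen)  # 0
```

The single-use behavior is the trade-off: if you need to iterate more than once or index into the results, a list is still the right tool.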
Performance note
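- List comprehensions are usually faster than equivalent for-loops that call append. In CPython, the comprehension appends with a dedicated bytecode instruction instead of looking up and calling list.append on every iteration.
- But for heavy numeric work, use numpy arrays or pandas Series. Their vectorized operations run in compiled code and beat Python lists by a wide margin.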
If you ever want to micro-profile, use the timeit module, either from a script or directly in the REPL. A small experiment settles most performance questions faster than guessing, just as it did when you learned to call help() earlier.
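A minimal timeit experiment might look like the sketch below. The function names, the list size, and the repeat count are all arbitrary choices for illustration, and the timings you see will vary by machine and Python version, so treat the result as a rough comparison rather than a benchmark.

```python
import timeit


def squares_with_loop(n):
    """Build the list with an explicit loop and append."""
    result = []
    for x in range(n):
        result.append(x * x)
    return result


def squares_with_comprehension(n):
    """Build the same list with a comprehension."""
    return [x * x for x in range(n)]


# n=1_000 and number=200 are arbitrary; adjust them for your machine.
loop_time = timeit.timeit(lambda: squares_with_loop(1_000), number=200)
comp_time = timeit.timeit(lambda: squares_with_comprehension(1_000), number=200)

print(f"loop:          {loop_time:.4f} s")
print(f"comprehension: {comp_time:.4f} s")
```

Run it a few times before drawing conclusions, since a single measurement can be skewed by whatever else your machine is doing.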
Short exercises (try these in the REPL)
- Read lines from a small CSV (using what you learned in File I/O), split each line, and create a list of floats from the 3rd column using a list comprehension.
- Given a list of dictionaries with a 'score' key, produce a list of names for entries with score > 80.
- Flatten a list of lists of varying lengths, then deduplicate the result while preserving order.
Need hints? Use help(list), and check PEP 8 for variable naming in your solutions.
Key takeaways
- Lists are your go-to mutable sequence for many small-to-medium sized data tasks.
- List comprehensions give concise, readable constructs for building lists: map + filter in one line.
- Prefer readability: if comprehension logic grows complex, extract a function or use a loop.
- Use generators or numpy/pandas for memory-heavy or numeric tasks.
"This is the moment where the concept finally clicks." — when you stop thinking of comprehensions as clever syntax and start seeing them as tools to express your data transformation intent clearly.
Go experiment in the REPL: mix your File I/O skills, call help when you need it, and keep PEP 8 style in mind. Next up: tuples, sets, and dicts — the rest of the Python collection family.