Data Structures and Iteration
Use Python collections and iteration patterns to write expressive, efficient, and readable data-oriented code.
Slicing and Views
Slicing and Views — Why Your Slice Might Be a Copy, a View, or a Tiny Betrayal
Ever sliced a list, changed the slice, and watched the original remain stubbornly intact — and then sliced a NumPy array, changed the slice, and your original screamed in pain? Welcome to the dramatic world of slicing and views in Python. This is where memory, performance, and subtle bugs meet in a smoky jazz club and occasionally fight.
"Slicing isn't just syntax — it's a contract about who owns the data."
We're building directly on what you already know: Python Foundations for Data Work (your toolbox and IDE habits), plus earlier Data Structures topics like Dictionaries and Sets. Those taught you what collections are. Now we’ll learn how slicing behaves differently across types, why it matters for data science, and how to avoid nasty surprises when manipulating data.
What is slicing? Quick refresher
Slicing is the sequence operation using [start:stop:step]. It creates a subsequence. Syntax summary:
- seq[start:stop] — elements start..stop-1
- seq[start:stop:step] — with step (skip or reverse if negative)
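- Omitted indices use defaults: step=1, and for a positive step start=0 and stop=len(seq). With a negative step the defaults flip, so the slice runs from the last element back to the first.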
Micro explanation: Under the hood, Python hands the object a slice object (slice(start, stop, step)), which each type's __getitem__ implementation is free to interpret in its own way.
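To make the slice object visible, here is a minimal sketch. The class name ShowSlice is hypothetical and exists only for illustration; slice.indices() is the standard method that fills in defaults for a given length.

```python
class ShowSlice:
    """Hypothetical helper: reports what __getitem__ actually receives."""

    def __init__(self, length):
        self.length = length

    def __getitem__(self, key):
        if isinstance(key, slice):
            # slice.indices() resolves omitted values and clamps to the length
            return key.indices(self.length)
        return key  # a plain integer index is passed through unchanged


probe = ShowSlice(6)
print(probe[1:5:2])  # (1, 5, 2)
print(probe[:])      # (0, 6, 1)
print(probe[::-1])   # (5, -1, -1)
print(probe[3])      # 3
```

Because the container decides what to do with that slice object, a list can answer with a copy while a NumPy array answers with a view.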
s = [0,1,2,3,4,5]
print(s[1:5:2]) # [1, 3]
print(s[::-1]) # [5, 4, 3, 2, 1, 0] (reversed)
Copy vs View: The core distinction
- Copy: a new object with new memory. Mutating the slice does not affect the original.
- View: a different object sharing the same memory. Mutating the slice does affect the original.
Why this is important: in data work you either want fast, memory-efficient views or safe independent copies. Pick the right tool.
How common types behave
| Type | slice returns | Mutable? | Notes |
|---|---|---|---|
| list | new list (shallow copy) | yes | Slicing makes a fresh list (id differs), but nested objects are shared, not duplicated. |
| tuple | new tuple (copy) | no | Tuples are immutable, so copy vs view makes no observable difference. |
| str | new str (copy) | no | Immutable; same reasoning as tuple. |
| bytes | new bytes (copy) | no | Immutable. Use bytearray for mutability. |
| bytearray | new bytearray (copy) | yes | Slice a memoryview(obj) instead to get a view. |
| NumPy ndarray | view (basic slicing) | yes | Basic slices return a view (no copy); fancy (integer-array) and boolean-mask indexing return a copy. |
| pandas DataFrame | sometimes view, sometimes copy | yes | Beware of chained indexing; use .loc/.iloc and .copy() if needed. |
Micro tip: List slices are safe but costly for large data; NumPy slices are cheap but can bite you if you mutate them unintentionally.
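The bytearray row in the table is easy to check for yourself. This is a small sketch using only the standard library; the variable names are illustrative.

```python
buf = bytearray(b"abcdef")

copied = buf[1:4]              # bytearray slice: an independent copy
copied[0] = ord("X")
print(buf)                     # bytearray(b'abcdef')

window = memoryview(buf)[1:4]  # memoryview slice: shares buf's memory
window[0] = ord("X")
print(buf)                     # bytearray(b'aXcdef')
print(bytes(window))           # b'Xcd'
```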
Examples you will absolutely make at 2AM
Python list: a safe copy
L = list(range(10))
sub = L[2:6]
sub[0] = 999
print(L) # original unchanged
print(sub) # mutated
Lists make a fresh object. This is predictable and safe — but copying big lists repeatedly is slow.
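One caveat on "safe": a list slice is a shallow copy. The outer list is new, but its elements are the same objects as in the original. The sketch below uses made-up nested data to show where that matters, with copy.deepcopy as one way to get full independence.

```python
import copy

rows = [[1, 2], [3, 4], [5, 6]]
sub = rows[0:2]        # new outer list, same inner lists

sub[0] = ["replaced"]  # rebinding a slot in the copy: rows is untouched
sub[1].append(99)      # mutating a shared inner list: rows sees it

print(rows)            # [[1, 2], [3, 4, 99], [5, 6]]
print(sub)             # [['replaced'], [3, 4, 99]]

independent = copy.deepcopy(rows[0:2])  # copies the inner lists too
independent[1].append(-1)
print(rows[1])         # [3, 4, 99]
```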
NumPy arrays: efficient views (and dangerous magic)
import numpy as np
A = np.arange(10)
view = A[2:6]
view[0] = 999
print(A) # A is changed! view shares memory with A
print(view)
# If you need a copy explicitly:
copy = A[2:6].copy()
copy[0] = -1
print(A) # unchanged now
Check whether two arrays share memory:
np.shares_memory(A, view) # True for views
And internals: view.base refers to the array that owns the memory when view is a true view, and is None when the array owns its own data.
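Putting both checks side by side gives a quick, runnable sanity test. This sketch assumes A is freshly created with np.arange, so it owns its own buffer.

```python
import numpy as np

A = np.arange(10)
view = A[2:6]
dup = A[2:6].copy()

print(np.shares_memory(A, view))  # True
print(np.shares_memory(A, dup))   # False
print(view.base is A)             # True
print(dup.base is None)           # True
print(A.base is None)             # True
```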
Fancy indexing vs slicing in NumPy
Fancy indexing with arrays of indices returns a copy (not a view):
indices = np.array([1,3,5])
sel = A[indices] # copy, not a view
This trip-up is common: slicing (A[1:6]) -> view; fancy indexing (A[[1,3,5]]) -> copy.
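The difference shows up as soon as you write into each result. This is a minimal sketch; the boolean-mask case is added here because it also goes through advanced indexing and therefore copies.

```python
import numpy as np

A = np.arange(10)

sliced = A[1:6]         # basic slicing -> view
fancy = A[[1, 3, 5]]    # integer-array indexing -> copy
masked = A[A % 2 == 1]  # boolean-mask indexing -> copy

sliced[0] = 100         # writes through to A
fancy[0] = -1           # stays local to the copy
masked[0] = -2          # stays local to the copy
print(A.tolist())       # [0, 100, 2, 3, 4, 5, 6, 7, 8, 9]

A[[3, 5]] = 0           # assigning *through* fancy indexing still writes into A
print(A.tolist())       # [0, 100, 2, 0, 4, 0, 6, 7, 8, 9]
```

Note the last two lines: fancy indexing copies only when you read. Used on the left-hand side of an assignment, it modifies the original array in place.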
Pandas: the land of ambiguity (aka SettingWithCopyWarning)
DataFrame slicing returns a view or a copy depending on internal memory layout. Pandas warns you with SettingWithCopyWarning when it suspects you're assigning to a copy:
Bad pattern (assigning into a filtered subset, a close cousin of chained indexing):
import pandas as pd
df = pd.DataFrame(...)
subset = df[df['col'] > 0]
subset['new'] = 1 # might be assigning to a copy -> warning
Better: use .loc and explicitly copy when required:
subset = df.loc[df['col'] > 0].copy()
subset['new'] = 1 # safe; no surprises
Rule of thumb: if you plan to modify, call .copy() on the DataFrame slice.
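Here is a small end-to-end sketch of both intentions: an independent subset, and a deliberate in-place update. The toy data is hypothetical; only the column name 'col' comes from the snippet above. What the un-copied version would do depends on your pandas version and its Copy-on-Write setting, which is one more reason to state your intent explicitly.

```python
import pandas as pd

# Hypothetical toy data for illustration.
df = pd.DataFrame({"col": [-1, 2, -3, 4]})
mask = df["col"] > 0

# Independent subset: changes stay local to `subset`.
subset = df.loc[mask].copy()
subset["new"] = 1
print("new" in df.columns)    # False

# Intentional in-place change: a single .loc call on the original frame.
df.loc[mask, "col"] = 0
print(df["col"].tolist())     # [-1, 0, -3, 0]
print(subset["col"].tolist()) # [2, 4]
```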
When to prefer views vs copies
- Use views when: data is large, you need speed and lower memory usage, and you won't accidentally mutate the original (or you intend to). Common in model inference, windowing and feature selection.
- Use copies when: you need safe independent manipulations without side effects (data cleaning, feature engineering drafts).
Performance note: every copy of a large array allocates and fills a new buffer, so repeated copying adds both run time and peak memory. Default to views, and copy explicitly at the points where you need independence.
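As a rough sketch of the windowing case, the loop below walks a large array in chunks without allocating a new buffer per chunk, then takes one explicit copy where mutation is intended. The array size and chunk_size are arbitrary values chosen for illustration.

```python
import numpy as np

data = np.arange(1_000_000, dtype=np.float64)
chunk_size = 250_000  # hypothetical window size

chunk_sums = []
for start in range(0, data.size, chunk_size):
    chunk = data[start:start + chunk_size]  # view: no new data buffer
    chunk_sums.append(float(chunk.sum()))   # read-only use, so a view is fine

print(len(chunk_sums))   # 4
print(sum(chunk_sums))   # 499999500000.0

scratch = data[:chunk_size].copy()  # explicit copy before mutating
scratch *= 2
print(data[1])           # 1.0
```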
Practical patterns for data science
- For feature slicing in NumPy: prefer views by default, but call .copy() if you'll mutate.
- In pandas, avoid chained indexing. Use df.loc[row_mask, col_list] and .copy() when you plan to change values.
- Use memoryview(bytearray) when you need a buffer-like view into binary data.
- Check np.shares_memory or ndarray.base when debugging mysterious mutations.
Short debugging checklist
- If a change to your slice unexpectedly modifies the original: you probably have a view.
- If slicing is slow and memory-heavy: you are probably copying large objects; consider views.
- If pandas raises a SettingWithCopyWarning: make a deliberate .copy(), or assign through .loc on the original DataFrame.
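When you are hunting for the line that mutates a shared NumPy array, one optional trick is to mark the view read-only so the offending write fails loudly. This is a debugging aid rather than a required pattern; it relies on the standard flags.writeable attribute.

```python
import numpy as np

A = np.arange(10)
guarded = A[2:6]
guarded.flags.writeable = False  # only this view becomes read-only

try:
    guarded[0] = 999             # the accidental write now raises
except ValueError as exc:
    print("blocked:", exc)

print(A[2])         # 2
A[2] = 50           # the original array is still writable
print(guarded[0])   # 50
```

The view still reflects changes made through the original, so this guards against writes in one direction only.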
Key takeaways (so you can recite at the next study group)
- Slicing semantics differ by type: lists and strings create copies (shallow ones, for lists); basic NumPy slices are views; pandas can be ambiguous.
- Views save memory and time — but they share data, so mutating a view mutates the original.
- When in doubt, copy explicitly with .copy() if you need an independent object.
"Treat slices like borrowing a book from a friend: if you dog-ear the pages (mutate), you should know whose book it is."
If you're building pipelines from the previous topics (dict-driven feature maps, set-based de-duplication), keep this in mind: choosing copy vs view affects both correctness and performance. Now go slice responsibly — and remember to .copy() when you're messy.
Further reading and commands to try
- Try: np.shares_memory, arr.base, df.loc[...], my_list[:].append(x) (note: the slice is a copy, so the append doesn't affect the original), memoryview(bytearray(b'abc')).
- Read more in the pandas SettingWithCopy docs and the NumPy indexing docs when you want to level up.
Happy slicing! Your data — and your future self debugging at 3 AM — will thank you.