Python Foundations for Data Work
Master core Python syntax and tooling for data tasks, from environments and notebooks to clean, reliable scripts.
Strings and f-Strings — Python Foundations for Data Work
"Strings are everywhere. If your data isn't numbers, it's probably text pretending to be complicated."
You're already comfortable creating variables and casting types from the previous lesson (remember when we turned floats into ints and panicked?). Now let's make friends with Python's textual universe: strings and the shiny, powerful way to format them — f-strings. These are essential for cleaning, reporting, and producing human-readable outputs from your data pipelines and notebooks.
Why strings matter for data work
- CSVs, JSON, log messages, column names, user input, SQL queries — almost all of these are strings.
- Good string skills let you clean messy data, format numbers for reports, and build templated queries or messages quickly.
Imagine you have a model prediction 0.82345 and you need to print: "Model accuracy: 82.35%" — f-strings do this elegantly and readably.
Quick reminder: strings are a type you already met
You saw types and casting earlier. Strings in Python are of type str. You can convert other types to strings with str() — useful when concatenating things like numbers and text.
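As a minimal sketch of why str() matters when concatenating, the snippet below joins text with a number. The variable names and values are made up for illustration.

```python
# Hypothetical values, chosen only for illustration.
count = 3
label = "rows loaded: "

# Concatenating a str and an int directly raises TypeError.
try:
    message = label + count
except TypeError as exc:
    error_text = str(exc)

# Converting with str() first makes the concatenation valid.
message = label + str(count)
print(message)  # rows loaded: 3
```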
Creating strings (fast cheat-sheet)
- Single-quoted: 'hello'
- Double-quoted: "hello" (useful when your text contains apostrophes)
- Triple-quoted for multi-line text or docstrings: '''multi
line''' or """multi
line"""
- Raw strings: r"C:\data\file.csv" (useful for Windows paths and regex escapes)
Example (in a notebook or script):
name = "Ada"
age = 29
s = "Hello, " + name + "!"
# but this is clunky for lots of pieces
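The quoting styles from the cheat-sheet can be compared side by side. This is a small sketch with an invented file path, not a recommendation for how to store paths.

```python
# Double quotes avoid escaping an apostrophe inside the text.
note = "Ada's notebook"

# A triple-quoted string keeps its line breaks.
multi = """first line
second line"""

# In a raw string, backslashes are kept literally, so \n is not a newline.
raw_path = r"C:\data\new_file.csv"       # hypothetical path
escaped_path = "C:\\data\\new_file.csv"  # same text, escaped by hand

print(note)                      # Ada's notebook
print(multi.splitlines())        # ['first line', 'second line']
print(raw_path == escaped_path)  # True
```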
Enter f-strings (formatted string literals)
Introduced in Python 3.6, f-strings let you embed expressions inside string literals using curly braces. They are concise, readable, and often faster than older methods.
Basic example:
name = "Ada"
score = 0.9567
print(f"{name} got {score:.2%} on the test")
# Ada got 95.67% on the test
Notes:
- Prefix with f or F: f"..."
- Inside {}, you can write expressions: {score * 100}, calls: {math.sqrt(x):.3f}, or attribute access: {user.name}.
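The three kinds of embedded expression mentioned in the notes might look like this in practice. The User class and the values of x and score are assumptions made up for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class User:  # hypothetical class, only for the attribute-access example
    name: str


user = User(name="Ada")
x = 2.0
score = 0.5

expr_text = f"{score * 100}"       # arithmetic expression
call_text = f"{math.sqrt(x):.3f}"  # function call plus a format spec
attr_text = F"{user.name}"         # attribute access, with the uppercase prefix

print(expr_text, call_text, attr_text)  # 50.0 1.414 Ada
```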
Why f-strings beat manual concatenation
- Readability: the template reads like the output
- Less casting: no str() around every variable
- Expression support: compute inline
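To make the comparison concrete, here is a rough side-by-side of the two styles producing the same text, reusing the name and age values from the earlier example.

```python
name = "Ada"
age = 29

# Concatenation needs str() for the int and careful manual spacing.
concatenated = "Name: " + name + ", age next year: " + str(age + 1)

# The f-string reads like the output and computes age + 1 inline.
formatted = f"Name: {name}, age next year: {age + 1}"

print(formatted)                  # Name: Ada, age next year: 30
print(concatenated == formatted)  # True
```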
Format mini-language (the part that sells f-strings)
Inside {expr:format_spec} you can control presentation. Very common in data work.
- :.2f → fixed 2 decimal places
- :, → thousands separator
- :.1% → percentage with one decimal
- :>10, :<10, :^10 → alignment (right, left, center) with width 10
- :0>5 → zero-pad to width 5
Examples:
value = 12345.6789
print(f"Value: {value:,.2f}") # 'Value: 12,345.68'
print(f"Pct: {0.1734:.1%}") # 'Pct: 17.3%'
print(f"ID: {42:0>6}") # 'ID: 000042'
from datetime import datetime
now = datetime(2026, 3, 13)
print(f"Date: {now:%Y-%m-%d}") # 'Date: 2026-03-13'
These specs are indispensable for generating CSV/JSON-ready strings, reports, or pretty CLI output.
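The alignment specs are easiest to see with visible brackets around the padded field. A short sketch with made-up values, also showing that width, separator, and precision can be combined in one spec:

```python
label = "id"

right = f"[{label:>6}]"   # '[    id]'
left = f"[{label:<6}]"    # '[id    ]'
center = f"[{label:^6}]"  # '[  id  ]'
padded = f"{7:0>5}"       # '00007'

# Right-align in 12 characters, with thousands separator and 2 decimals.
combined = f"{1234.5:>12,.2f}"  # '    1,234.50'

for text in (right, left, center, padded, combined):
    print(text)
```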
String methods you’ll use 90% of the time
Strings are immutable — methods return new strings.
- .strip(), .lstrip(), .rstrip(): clean whitespace
- .lower(), .upper(), .title(): normalization
- .replace(old, new): quick fixes
- .split(sep) and 'sep'.join(list): tokenization and reassembly
- .find(), .count(), .startswith(), .endswith(): tests and checks
Example: cleaning a CSV header:
header = ' User Name , Age, Salary\n'
clean = header.strip().lower().replace(' ', '_')
# 'user_name_,_age,_salary' -> underscores leak around the commas, so split first
cols = [c.strip().lower().replace(' ', '_') for c in header.split(',')]
# ['user_name', 'age', 'salary']
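The header example covers cleaning and splitting; the test-and-check methods from the list can be sketched on a log-style line. The log text here is invented for illustration.

```python
line = "  2026-03-13 ERROR disk full; ERROR retrying  "

cleaned = line.strip()  # returns a new string; line itself is unchanged

starts = cleaned.startswith("2026")    # True
ends = cleaned.endswith("retrying")    # True
n_errors = cleaned.count("ERROR")      # 2
first_error = cleaned.find("ERROR")    # 11 (index of the first match)
missing = cleaned.find("WARN")         # -1 when the text is absent

# Split into at most three parts, then reassemble with a different separator.
parts = cleaned.split(" ", 2)
rejoined = "|".join(parts)
print(rejoined)  # 2026-03-13|ERROR|disk full; ERROR retrying
```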
Tip: for large concatenations, prefer join over repeated + to avoid O(n^2) behavior.
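A small sketch of the tip, with both approaches producing the same result. Note that CPython can sometimes optimize repeated += in place, so the quadratic cost is a worst case rather than a guarantee; join is simply the more dependable habit.

```python
pieces = [str(i) for i in range(5)]

# Repeated + may copy the growing string on every step.
slow = ""
for p in pieces:
    slow += p + ","
slow = slow.rstrip(",")

# join builds the result from all pieces in one call.
fast = ",".join(pieces)

print(fast)          # 0,1,2,3,4
print(slow == fast)  # True
```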
Advanced f-string tricks for data work
- Embed dictionary lookups: f"{row['amount']:,.2f}" (useful when iterating rows)
- Multi-line f-strings for SQL or templates:
query = f"""
SELECT id, name, revenue
FROM accounts
WHERE region = '{region}' AND date >= '{start:%Y-%m-%d}'
ORDER BY revenue DESC
LIMIT {top_n}
"""
(But — warning — don't inject raw user input into SQL! Use parameterized queries in DB libraries.)
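One possible shape for the parameterized alternative, sketched with the standard-library sqlite3 module. The table contents and the hostile region value are invented, the date filter is left out to keep the sketch short, and other database libraries may use a different placeholder style than ?.

```python
import sqlite3

# In-memory database with a hypothetical accounts table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, name TEXT, region TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?, ?)",
    [(1, "Acme", "EMEA", 120.0), (2, "Globex", "EMEA", 300.0), (3, "Initech", "APAC", 50.0)],
)

hostile_region = "EMEA' OR '1'='1"  # input that would alter an f-string query
top_n = 5

# Placeholders (?) keep the values separate from the SQL text.
query = """
SELECT id, name, revenue
FROM accounts
WHERE region = ?
ORDER BY revenue DESC
LIMIT ?
"""
hostile_rows = conn.execute(query, (hostile_region, top_n)).fetchall()
safe_rows = conn.execute(query, ("EMEA", top_n)).fetchall()

print(hostile_rows)  # [] because the whole value is compared as plain text
print(safe_rows)     # [(2, 'Globex', 300.0), (1, 'Acme', 120.0)]
conn.close()
```

The multi-line template stays readable, but the driver substitutes the values, so quotes inside the input cannot change the query.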
- Conditional expression inside f-strings:
x = None
msg = f"Value: {x if x is not None else 'N/A'}"
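Dictionary lookups and the conditional expression combine naturally when iterating rows. The sketch below uses made-up rows; note that a numeric spec such as :,.2f fails on None, so the fallback text is chosen before formatting. The inner quotes differ from the outer ones because reusing the same quote type inside an f-string only works from Python 3.12 onward.

```python
# Hypothetical rows, for example as read from a CSV and already cast to numbers.
rows = [
    {"name": "Alice", "amount": 1234.5},
    {"name": "Bob", "amount": None},
]

lines = []
for row in rows:
    amount = row["amount"]
    # Pick the display text first: a numeric spec cannot be applied to None.
    amount_text = f"{amount:,.2f}" if amount is not None else "N/A"
    lines.append(f"{row['name']:<8}{amount_text:>10}")

for text in lines:
    print(text)
```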
Performance and safety notes
- f-strings are evaluated at runtime; they're usually faster than .format() and clearer than concatenation.
- Security: never build SQL or shell commands by directly inserting untrusted input via f-strings. Parameterize queries or sanitize inputs.
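If you want to check the speed claim on your own machine, a rough timeit sketch like the following can help. The function names are invented, and the timings depend on hardware and Python version, so treat the numbers as guidance only.

```python
import timeit

name = "Ada"
score = 0.9567


def with_fstring():
    return f"{name} got {score:.2%}"


def with_format():
    return "{} got {:.2%}".format(name, score)


def with_concat():
    return name + " got " + format(score, ".2%")


# All three build the same text; only the time taken differs.
for fn in (with_fstring, with_format, with_concat):
    seconds = timeit.timeit(fn, number=100_000)
    print(f"{fn.__name__:<14}{seconds:.4f}s")
```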
Small cookbook (copy-paste friendly)
- Format a currency with commas and two decimals:
amount = 1234567.891
f"${amount:,.2f}" # '$1,234,567.89'
- Align columns for a simple CLI table:
rows = [("Alice", 95.4), ("Bob", 82.1)]
for name, score in rows:
    print(f"{name:<10} | {score:>6.1f}")
- Inline math and function calls:
import math
r = 3
f"Circle area: {math.pi * r**2:.3f}"
Quick summary — the useful bits to memorize
- Use f-strings for readable, concise interpolation: f"{var}"
- Format spec is your friend: :,.2f, :.1%, :>10
- Strings are immutable, so use .join() for heavy concatenation
- Use triple-quoted f-strings for multi-line templates (but be careful with untrusted inputs)
- Remember types and casting from earlier: sometimes you still need str() or int() before formatting
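The last point shows up most often with numbers that arrive as text. A minimal sketch, assuming a value read from a CSV as a string:

```python
raw = "1234.5"  # numeric text, as it often arrives from a CSV

# A numeric format spec applied to a str is rejected.
try:
    f"{raw:,.2f}"
except ValueError as exc:
    error_text = str(exc)

# Cast first, then format.
value = float(raw)
formatted = f"{value:,.2f}"
print(formatted)  # 1,234.50
```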
"This is the moment where the concept finally clicks." — print an f-string with formatted data and it suddenly makes sense.
Now go open your notebook (yes, the one you used when running scripts earlier) and try these on your real dataset: format the numeric columns for presentation, clean the string columns with .strip() and .lower(), and build a small f-string report to print the top 5 rows neatly. If the output makes you smile, congratulations — you're officially a string whisperer.