Python Foundations for Data Work
Master core Python syntax and tooling for data tasks, from environments and notebooks to clean, reliable scripts.
Variables, Types, and Casting
Python Variables, Types, and Casting — Foundations for Data Work
Ever tried adding a string to a number and felt like Python judged you silently? Welcome to the spicy world of variables, types, and casting — the part of data work where Python tells you what your data is, not what you wish it were.
You've already learned how to run code in Jupyter and VS Code and how to execute scripts and notebooks. Now we'll build directly on that: think of this lesson as the backstage pass to what those notebooks are actually holding in memory.
What this is and why it matters
- Variables are names that hold values. They're your labeled data boxes.
- Types describe what kind of thing is inside the box (number, text, list, etc.).
- Casting (or type conversion) is changing a value from one type to another.
Why it matters for data work:
- Data cleaning: you'll frequently convert strings to numbers (and back).
- Avoiding bugs: operations depend on types ("3" * 2.0 raises TypeError, and "3" * 2 gives "33", not 6).
- Performance & correctness: pandas and NumPy dtypes determine how much memory a column uses and which operations are valid on it.
"If your values don't have the right type, your analysis will be doing impressionist art instead of statistics."
Quick tour: core built-in types
Numbers
- int — integers: 1, -3, 0
- float — decimals: 3.14, -0.001
- complex — complex numbers: 1+2j (rare in most data science workflows)
Text
- str — strings: "hello", "123"
Boolean
- bool — True or False (note: bool is a subclass of int)
Collections
- list — ordered, mutable: [1, 2, 3]
- tuple — ordered, immutable: (1, 2)
- dict — key-value mapping: {"name": "Ada", "age": 30}
- set — unordered unique items: {1, 2, 3}
Missing value
- NoneType — None (often used to represent missing/unknown)
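A few of the notes in this tour are easier to trust once you see them run. The sketch below uses only built-ins; the variable names are illustrative and not part of the lesson.

```python
# bool is a subclass of int, so True/False behave like 1/0 in arithmetic.
print(isinstance(True, int))    # True
print(True + True)              # 2

# Summing booleans counts how many are True, a common data-work idiom.
flags = [True, False, True, True]
true_count = sum(flags)
print(true_count)               # 3

# A set keeps only unique items.
unique_ids = set([3, 1, 3, 2, 1])
print(unique_ids == {1, 2, 3})  # True

# A tuple cannot be modified in place.
point = (1, 2)
try:
    point[0] = 99
except TypeError as exc:
    tuple_error = str(exc)
    print("TypeError:", tuple_error)

# None is a singleton, so test for it with `is`, not `==`.
missing = None
print(missing is None)          # True
print(type(missing).__name__)   # NoneType
```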
Hands-on examples (run these in your notebook or VS Code REPL)
x = 7
y = "7"
print(type(x)) # <class 'int'>
print(type(y)) # <class 'str'>
# Operations
print(x + 3) # 10
# print(x + y) # TypeError: unsupported operand type(s) for +: 'int' and 'str'
# Casting
y_int = int(y) # convert string '7' -> int 7
print(type(y_int)) # <class 'int'>
print(x + y_int) # 14
Micro-explanation: Python won't implicitly turn a string into a number. You must ask it to convert.
Casting functions cheat-sheet
| Goal | Function | Example |
|---|---|---|
| To integer | int() | int("42") -> 42 |
| To float | float() | float("3.14") -> 3.14 |
| To string | str() | str(99) -> "99" |
| To bool | bool() | bool(0) -> False; bool("") -> False |
| To list/tuple/set | list(), tuple(), set() | list((1,2)) -> [1,2] |
Watchouts:
- int("3.14") raises ValueError: convert with float("3.14") first, then int(), which truncates toward zero and gives 3
- bool("False") is True because any non-empty string is truthy
print(bool("False")) # True — non-empty strings evaluate True
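Both watchouts can be checked in a few lines. The `parse_bool` helper below is a hypothetical illustration, not a built-in, and the set of accepted spellings is an assumption you would adapt to your own data.

```python
# int() truncates toward zero; it does not round.
print(int(float("3.99")))    # 3
print(int(float("-3.99")))   # -3
print(round(float("3.99")))  # 4


def parse_bool(text):
    """Hypothetical helper: map common true/false spellings to a bool."""
    normalized = text.strip().lower()
    if normalized in {"true", "t", "yes", "y", "1"}:
        return True
    if normalized in {"false", "f", "no", "n", "0"}:
        return False
    raise ValueError(f"cannot interpret {text!r} as a boolean")


print(bool("False"))        # True  (any non-empty string is truthy)
print(parse_bool("False"))  # False (parsed by meaning)
```

Raising on unknown spellings, rather than guessing, keeps bad values visible instead of silently turning them into True.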
Data-focused gotchas & best practices
Strings that look like numbers: CSV imports often produce strings. Always check types with .dtypes for pandas DataFrames.
- In pandas: df['age'].astype(int) is strict and raises on any bad value; pd.to_numeric(df['age'], errors='coerce') converts unparseable values to NaN instead.
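To see the difference between the two conversions, here is a minimal sketch on a made-up DataFrame (the data is invented for illustration; a real CSV import would give you similar string values).

```python
import pandas as pd

# Made-up data standing in for a column read from a CSV file.
df = pd.DataFrame({"age": ["34", "27", "unknown", "41"]})
print(df.dtypes)  # the age column holds strings, not numbers

# astype(int) is strict: a single bad value raises ValueError.
try:
    df["age"].astype(int)
    strict_failed = False
except ValueError as exc:
    strict_failed = True
    print("astype(int) failed:", exc)

# to_numeric with errors="coerce" turns unparseable values into NaN instead.
ages = pd.to_numeric(df["age"], errors="coerce")
print(ages)
print(ages.isna().sum())  # 1
```

Counting the NaN values right after coercion is a cheap way to find out how many rows failed to parse before you decide what to do with them.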
Hidden whitespace: int(" 42\n") works because int() ignores surrounding whitespace, but any other stray character (as in "42 kg") raises ValueError. In pandas, call .str.strip() on the column first.
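A quick way to convince yourself which inputs survive conversion is to try a few. The sample strings below are made up for illustration.

```python
# int() and float() ignore leading and trailing whitespace on their own.
print(int(" 42\n"))      # 42
print(float("\t3.5  "))  # 3.5

# Any other stray character makes the conversion fail with ValueError.
failures = []
for raw in ["42 kg", "4 2", "1,000", ""]:
    try:
        int(raw)
    except ValueError:
        failures.append(raw)
print(failures)  # ['42 kg', '4 2', '1,000', '']

# Cleaning depends on what the stray characters mean, e.g. thousands separators.
print(int("1,000".replace(",", "")))  # 1000
```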
Missing values: Converting a column that contains NaN to int raises an error in pandas; use the nullable integer dtype ("Int64", with a capital I) or call fillna() first.
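The two options for missing values look like this on a small made-up Series. Filling with 0 is only a placeholder choice here; the right fill value depends on your data.

```python
import pandas as pd

# Made-up column with one missing value; NaN forces float64 by default.
ages = pd.Series([34, None, 41])
print(ages.dtype)  # float64

# A plain int column cannot hold a missing value, so this raises.
try:
    ages.astype(int)
    plain_cast_failed = False
except ValueError as exc:
    plain_cast_failed = True
    print("astype(int) failed:", exc)

# Option 1: the nullable integer dtype keeps the gap as <NA>.
nullable = ages.astype("Int64")
print(nullable.dtype)         # Int64
print(nullable.isna().sum())  # 1

# Option 2: fill the gap first, then cast to a plain int.
filled = ages.fillna(0).astype(int)
print(filled.tolist())        # [34, 0, 41]
```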
Precision: floats are binary approximations, so 0.1 + 0.2 is not exactly 0.3. Use decimal.Decimal when you need exact decimal arithmetic (for example, currency); in ML, float64 is usually fine.
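The rounding issue is easy to reproduce with the standard library alone. This sketch compares plain floats, Decimal, and a tolerance-based comparison.

```python
import math
from decimal import Decimal

# Binary floats cannot represent 0.1 exactly, so small errors appear.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)  # False
print(float_sum)         # 0.30000000000000004

# Decimal values built from strings do exact decimal arithmetic.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True

# For floats, compare with a tolerance instead of ==.
print(math.isclose(float_sum, 0.3))   # True
```

Build Decimal values from strings rather than floats; Decimal(0.1) would carry the float's approximation error along with it.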
NumPy / pandas types: numpy arrays and pandas Series have their own dtypes (e.g., float32, int64, object). Use .astype() and be mindful of memory/performance.
import numpy as np
arr = np.array([1, 2, 3], dtype=np.int32)
arr = arr.astype(np.float64) # upcast to float
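To make the memory point concrete, the sketch below compares the same values stored as int64 and int32, and shows why downcasting needs a range check first. The array sizes and values are arbitrary.

```python
import numpy as np

# The same one million integers, stored at two different widths.
values = np.arange(1_000_000, dtype=np.int64)
small = values.astype(np.int32)
print(values.nbytes)  # 8000000 (8 bytes per item)
print(small.nbytes)   # 4000000 (4 bytes per item)

# Downcasting is only safe when every value fits the target dtype.
limit = np.iinfo(np.int8).max
print(limit)          # 127

too_big = np.array([300], dtype=np.int64)
wrapped = too_big.astype(np.int8)  # no error: the value silently wraps around
print(wrapped[0] == 300)           # False

# Check the range before downcasting.
fits = bool((too_big.min() >= np.iinfo(np.int8).min) and (too_big.max() <= limit))
print(fits)           # False
```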
Practical mini-workflow: Clean a CSV column of ages
- Inspect: df['age'].head(), df['age'].dtype
- Trim: df['age'] = df['age'].str.strip()
- Convert safely: df['age'] = pd.to_numeric(df['age'], errors='coerce')
- Handle missing: df['age'] = df['age'].fillna(df['age'].median()).astype(int) (note that astype(int) truncates, so a fractional median such as 30.5 becomes 30)
Why this sequence? String cleanup -> safe numeric parsing -> handle NaNs -> set dtype.
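Put together, the four steps might look like the sketch below. The DataFrame is made up to stand in for a CSV column, and the `.round()` call before the final cast is an addition of this sketch (so a fractional median is rounded rather than truncated), not part of the steps as listed.

```python
import pandas as pd

# Made-up stand-in for a column read from a CSV file.
df = pd.DataFrame({"age": [" 34", "27 ", "n/a", "41", None]})

# 1. Inspect: look at sample values and the current dtype.
print(df["age"].head())
print(df["age"].dtype)

# 2. Trim: remove surrounding whitespace (missing entries stay missing).
df["age"] = df["age"].str.strip()

# 3. Convert safely: unparseable values such as "n/a" become NaN.
df["age"] = pd.to_numeric(df["age"], errors="coerce")
n_missing = int(df["age"].isna().sum())
print(n_missing)  # 2

# 4. Handle missing: fill with the median, then set the final dtype.
median_age = df["age"].median()
df["age"] = df["age"].fillna(median_age).round().astype(int)
print(df["age"].tolist())  # [34, 27, 34, 41, 34]
```

Whether the median is a sensible fill value depends on the analysis; recording `n_missing` before filling keeps the imputation visible.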
Useful checks and tools
- isinstance(x, int) — preferred for type-checking in code paths
- type(x) is int — strict identity check (rarely needed)
- dir(x) / help(x) — inspect methods
- Jupyter / VS Code: hover over variables, run the %whos magic in a notebook cell, or use the debugger's Watch panel to inspect runtime types
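The difference between the two type checks shows up with subclasses, and bool is the one you are most likely to meet. The `is_number` helper is a hypothetical example, not a standard function.

```python
value = True

print(isinstance(value, int))  # True: bool is a subclass of int
print(type(value) is int)      # False: the exact type is bool


def is_number(x):
    """Hypothetical helper: True for int or float, but not bool."""
    # isinstance accepts a tuple of types to check against.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


print(is_number(3))     # True
print(is_number(2.5))   # True
print(is_number(True))  # False
print(is_number("3"))   # False
```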
Small reference: common errors and how to fix them
- ValueError: invalid literal for int() — fix: strip and ensure numeric-only characters, or use to_numeric(errors='coerce')
- TypeError: unsupported operand types — fix: check types and cast appropriately
- Unexpected truthiness (e.g., bool("False") == True) — fix: explicitly compare strings or parse them
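In plain Python, the usual way to survive these errors is to catch them at the point of conversion. A minimal sketch, with `to_int_or_none` as a hypothetical helper name:

```python
def to_int_or_none(raw):
    """Hypothetical helper: return an int, or None if raw cannot be parsed."""
    try:
        return int(raw)
    except (ValueError, TypeError):
        # ValueError: text that is not an integer literal, e.g. "abc" or "3.14"
        # TypeError: an unsupported input type, e.g. None
        return None


samples = ["42", " 7 ", "abc", None, "3.14"]
parsed = [to_int_or_none(s) for s in samples]
print(parsed)  # [42, 7, None, None, None]
```

Catching only the two expected exception types, rather than a bare `except`, lets genuinely unexpected errors still surface.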
Quick exercises (5–10 minutes each)
- In a Jupyter cell: assign a = " 12 ", b = 3. Convert a to an int and add to b.
- Create a list, for example items = ["1", "2.5", "three", "4"] (avoid naming it list, which would shadow the built-in), and use a try/except to convert items to floats, storing NaN for failures.
- Load a small CSV in pandas, inspect dtypes, convert the columns that should be numeric, and print df.dtypes before/after.
Run these in Jupyter or VS Code — remember, you already know how to run cells and scripts from earlier sections.
Key takeaways
- Types describe the nature of data; variables are labels pointing to typed values.
- Python is dynamically typed: the same variable can later point to a different type, but that doesn't remove the need for explicit, careful casting.
- Casting is explicit: use int(), float(), str(), or pandas/NumPy conversions for data work.
- Watch out for truthiness and missing values — they bite in real datasets.
Final thought: treat types like contracts. If your function promises numbers, enforce the contract early (convert, validate, or raise an informative error).
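One way to read "enforce the contract early" is sketched below. The function name, the accepted inputs, and the error wording are all assumptions made for illustration.

```python
def mean_age(ages):
    """Hypothetical example: average ages given as numbers or numeric strings."""
    cleaned = []
    for position, raw in enumerate(ages):
        try:
            cleaned.append(float(raw))
        except (TypeError, ValueError):
            # Fail at the boundary with a message that names the bad input.
            raise ValueError(f"ages[{position}] = {raw!r} is not numeric") from None
    if not cleaned:
        raise ValueError("ages is empty")
    return sum(cleaned) / len(cleaned)


print(mean_age([30, "40", 50.0]))  # 40.0

try:
    mean_age([30, "forty"])
except ValueError as exc:
    contract_error = str(exc)
    print(contract_error)  # ages[1] = 'forty' is not numeric
```

Converting and validating in one place means the rest of the function can assume it is working with floats.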
If you want, next we'll do a rapid follow-up showing how pandas and NumPy dtypes interact with scikit-learn pipelines — a practical bridge from types & casting to modeling.