Python Foundations for Data Work
Master core Python syntax and tooling for data tasks, from environments and notebooks to clean, reliable scripts.
Modules and Imports — Practical Tools for Reusable Data Code
You've already seen how to wrap logic in functions and document them with docstrings, and how conditionals steer program flow. Now imagine those tidy, well-documented functions living in a tidy apartment building where anyone on your team (or your future self at 3 AM) can rent them — that's what modules and imports give you in Python.
"This is the moment where the concept finally clicks: functions are friends, modules are neighborhoods, and imports are the subway."
What a module is (and why you care)
- Module = a single Python file (e.g., utils.py) containing values, functions, and classes.
- Package = a folder containing modules plus an __init__.py file (it makes the folder importable as a single unit).
Why it matters for data work:
- Reuse common ETL functions across projects (cleaning, validation, feature engineering).
- Keep notebooks readable by importing tested utilities instead of copy-pasting code.
- Share code across teams and version control.
Real-world analogy
Think of a module as a cookbook chapter called "data_cleaning.py". Instead of rewriting the recipe each time, you import the recipe and apply it. Your data becomes edible faster.
Basic import patterns (and when to use them)
- Import the whole module:
import math_utils
math_utils.normalize_column(df, "age")
- Keeps namespace explicit (recommended for clarity).
- Import with an alias (very common in data science):
import numpy as np
import pandas as pd
- Shortens long names; standard conventions (np, pd) help readability.
- Import specific names:
from math_utils import normalize_column, scale_values
normalize_column(df, "age")
- Good for grabbing just what you need; watch for name collisions.
- Wildcard import (don't do this in production):
from math_utils import *  # makes debugging a nightmare
- Pollutes your namespace and hides origins of symbols.
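The recommended patterns above can be tried directly with the standard library. A minimal sketch using the stdlib math module; all three routes reach the same function:

```python
import math                      # whole module: explicit namespace
import math as m                 # alias: shorter name, same module object
from math import sqrt            # specific name: no prefix needed

print(math.sqrt(16.0))   # 4.0
print(m.sqrt(16.0))      # 4.0
print(sqrt(16.0))        # 4.0

# An alias is just another name bound to the same module object.
print(m is math)         # True
```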
Example: building a tiny module (and using docstrings)
Create a file math_utils.py:
# math_utils.py
"""Utilities for numeric columns in dataframes.

Functions:
- mean_ignore_na
- normalize_column
"""
import numpy as np

def mean_ignore_na(arr):
    """Return mean ignoring missing values."""
    return np.nanmean(arr)

def normalize_column(df, col):
    """Scale a DataFrame column to zero mean and unit variance."""
    mu = mean_ignore_na(df[col])
    sigma = np.nanstd(df[col])
    df[col] = (df[col] - mu) / sigma
    return df

if __name__ == "__main__":
    # quick local tests — won't run when imported
    import pandas as pd
    df = pd.DataFrame({"x": [1, 2, None, 4]})
    print(normalize_column(df, "x"))
Notes:
- We used docstrings in the module and functions — remember your Functions & Docstrings lesson.
- The if __name__ == "__main__": block is a great place for small demos or smoke tests.
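You can see the whole create-then-import cycle in one runnable snippet. This sketch writes a tiny stdlib-only module to a temporary directory (the file name stats_utils.py and the dependency-free mean_ignore_na are illustrative stand-ins for the numpy-based math_utils above), then imports and calls it:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# A dependency-free module, written to disk just for this demo.
module_source = '''
"""Tiny numeric helpers (dependency-free demo)."""

def mean_ignore_na(values):
    """Return the mean of values, skipping None entries."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept)
'''

tmp = tempfile.mkdtemp()
Path(tmp, "stats_utils.py").write_text(module_source)

sys.path.insert(0, tmp)          # make the directory importable
stats_utils = importlib.import_module("stats_utils")

print(stats_utils.mean_ignore_na([1, 2, None, 4]))  # 7/3, about 2.33
```

In a real project you would of course keep the module as a normal file under version control; writing it at runtime is only to keep this example self-contained.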
Packages and relative imports (project structure)
Example layout for a data pipeline:
project/
    data_pipeline/
        __init__.py
        extract.py
        transform.py
        load.py
    scripts/
        run_pipeline.py
Inside transform.py you can reference sibling module functions with relative imports:
# transform.py
from .extract import load_raw_data
from .load import save_clean
Relative imports are perfect when organizing a package that will be installed or reused.
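To see a relative import resolve end to end, this sketch recreates a slice of the layout above in a temporary directory (the function names load_raw_data and clean_names are illustrative) and imports through the package:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate a minimal data_pipeline package on disk.
root = Path(tempfile.mkdtemp())
pkg = root / "data_pipeline"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "extract.py").write_text(
    "def load_raw_data():\n"
    "    return [' Alice ', ' Bob ']\n"
)
(pkg / "transform.py").write_text(
    "from .extract import load_raw_data  # relative import of a sibling\n"
    "\n"
    "def clean_names():\n"
    "    return [name.strip() for name in load_raw_data()]\n"
)

sys.path.insert(0, str(root))
transform = importlib.import_module("data_pipeline.transform")
print(transform.clean_names())  # ['Alice', 'Bob']
```

Note that the relative import works because transform.py is imported as part of the data_pipeline package; running it directly as a script would fail with "attempted relative import".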
How Python finds modules
Python searches directories in sys.path. Typical entries include:
- Directory containing the running script.
- Entries from the PYTHONPATH environment variable.
- Standard library directories and site-packages.
import sys
print(sys.path)
If your module isn't found, check your current working directory and virtual environment.
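When debugging "module not found" errors, it helps to ask Python where it would load a module from without actually importing it. A small sketch using importlib.util.find_spec (where_is is a hypothetical helper name):

```python
import importlib.util
import sys

def where_is(module_name):
    """Return the file a module would load from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec else None

print(where_is("json"))             # a path into the standard library
print(where_is("no_such_mod_xyz"))  # None: nowhere on sys.path
print(sys.path[0])                  # first place Python looks
```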
Tips and best practices for data work
- Use explicit imports in scripts and notebooks to make origins clear.
- Follow common alias conventions (np, pd, plt) to help collaborators.
- Keep utility modules small and focused — e.g., cleaning.py, viz.py, metrics.py.
- Use if __name__ == "__main__": for module-level quick checks and demos.
- Avoid from module import * — it's a readability and debugging hazard.
- For long imports that slow startup, consider lazy imports inside functions:
def heavy_transform(df):
    import pyarrow as pa  # only imported when needed
    ...
- Use virtual environments and requirements.txt (or pyproject.toml) to lock dependencies for reproducibility.
Interactive workflow: editing a module while a REPL is open
When you change a module during an interactive session (like a Jupyter notebook), reload it:
import importlib
import math_utils
importlib.reload(math_utils)
This avoids restarting the kernel for small edits.
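The edit-then-reload cycle can be simulated in one script. This sketch writes a throwaway module (greeter is an illustrative name), imports it, rewrites the file, and reloads; bumping the mtime and calling importlib.invalidate_caches() guards against Python reusing a stale bytecode cache when edits happen within the same second:

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

tmp = tempfile.mkdtemp()
sys.path.insert(0, tmp)
mod_path = Path(tmp, "greeter.py")

mod_path.write_text("def hello():\n    return 'v1'\n")
import greeter
print(greeter.hello())  # v1

# Simulate editing the file during the session.
mod_path.write_text("def hello():\n    return 'v2'\n")
stat = os.stat(mod_path)
os.utime(mod_path, (stat.st_atime, stat.st_mtime + 10))
importlib.invalidate_caches()

importlib.reload(greeter)  # re-executes the updated source in place
print(greeter.hello())  # v2
```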
Watch out for circular imports
If module A imports B and B imports A at top level, Python can fail with an ImportError or hand one module a partially initialized version of the other. Fixes:
- Move imports inside functions (deferred import).
- Refactor shared code into a new module C that both can import.
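The deferred-import fix can be demonstrated concretely. In this sketch (pipeline_a and pipeline_b are illustrative names), pipeline_a imports pipeline_b at the top, while pipeline_b defers its import of pipeline_a into the function body, so the cycle never bites at load time:

```python
import importlib
import sys
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp))

(tmp / "pipeline_a.py").write_text(
    "import pipeline_b\n"
    "def value():\n"
    "    return 1\n"
)
(tmp / "pipeline_b.py").write_text(
    "def use_a():\n"
    "    import pipeline_a  # deferred: runs after both modules exist\n"
    "    return pipeline_a.value() + 1\n"
)

pipeline_a = importlib.import_module("pipeline_a")
print(pipeline_a.pipeline_b.use_a())  # 2
```

With the import at the top of pipeline_b instead, loading either module would trigger the other mid-initialization; deferring it means the lookup happens only after both modules are fully loaded.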
Quick practical checklist before you commit code
- Are imports explicit and clear?
- Are utility functions documented with docstrings?
- Did you avoid wildcard imports?
- Is package layout logical for reuse?
- Are heavy imports deferred if they slow tests or CLI tools?
Key takeaways
- Modules let you bundle and reuse functions, classes, and constants — essential for tidy data code.
- Use explicit imports to keep namespaces clear; alias standard libs (np, pd) for readability.
- if __name__ == "__main__": is your friend for quick demos and module-level tests.
- Keep packages well-structured and prefer relative imports inside a project.
- For interactive work, use importlib.reload() to refresh edited modules.
Remember: good modules are like tidy toolboxes. When your workflow needs a hammer, you should be able to pull one out without rummaging through a pile of half-broken scripts.
"The best code is the code you can find in the dark at 2 AM." — maybe you, after writing a solid module structure.