Python Foundations for Data Work
Master core Python syntax and tooling for data tasks, from environments and notebooks to clean, reliable scripts.
Modules and Imports — Practical Tools for Reusable Data Code
You've already seen how to wrap logic in functions and document them with docstrings, and how conditionals steer program flow. Now imagine those tidy, well-documented functions living in a tidy apartment building where anyone on your team (or your future self at 3 AM) can rent them — that's what modules and imports give you in Python.
"This is the moment where the concept finally clicks: functions are friends, modules are neighborhoods, and imports are the subway."
What a module is (and why you care)
- Module = a single Python file (e.g., utils.py) containing values, functions, and classes.
- Package = a folder containing modules plus an __init__.py file (it makes the folder importable as a single unit).
Why it matters for data work:
- Reuse common ETL functions across projects (cleaning, validation, feature engineering).
- Keep notebooks readable by importing tested utilities instead of copy-pasting code.
- Share code across teams and version control.
Real-world analogy
Think of a module as a cookbook chapter called "data_cleaning.py". Instead of rewriting the recipe each time, you import the recipe and apply it. Your data becomes edible faster.
Basic import patterns (and when to use them)
- Import the whole module:
import math_utils
math_utils.normalize_column(df, "age")
- Keeps namespace explicit (recommended for clarity).
- Import with an alias (very common in data science):
import numpy as np
import pandas as pd
- Shortens long names; standard conventions (np, pd) help readability.
- Import specific names:
from math_utils import normalize_column, scale_values
normalize_column(df, "age")
- Good for grabbing just what you need; watch for name collisions.
- Wildcard import (don't do this in production):
from math_utils import *  # makes debugging a nightmare
- Pollutes your namespace and hides origins of symbols.
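The recommended patterns above can be tried directly with the standard library. A minimal sketch using the stdlib math module; all three routes reach the same function:

```python
import math                      # whole module: explicit namespace
import math as m                 # alias: shorter name, same module object
from math import sqrt            # specific name: no prefix needed

print(math.sqrt(16.0))   # 4.0
print(m.sqrt(16.0))      # 4.0
print(sqrt(16.0))        # 4.0

# An alias is just another name bound to the same module object.
print(m is math)         # True
```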
Example: building a tiny module (and using docstrings)
Create a file math_utils.py:
# math_utils.py
"""Utilities for numeric columns in dataframes.

Functions:
- mean_ignore_na
- normalize_column
"""
import numpy as np

def mean_ignore_na(arr):
    """Return mean ignoring missing values."""
    return np.nanmean(arr)

def normalize_column(df, col):
    """Scale a DataFrame column to zero mean and unit variance."""
    mu = mean_ignore_na(df[col])
    sigma = np.nanstd(df[col])
    df[col] = (df[col] - mu) / sigma
    return df

if __name__ == "__main__":
    # quick local tests — won't run when imported
    import pandas as pd
    df = pd.DataFrame({"x": [1, 2, None, 4]})
    print(normalize_column(df, "x"))
Notes:
- We used docstrings in the module and functions — remember your Functions & Docstrings lesson.
- The if __name__ == "__main__": block is a great place for small demos or smoke tests.
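You can see the whole create-then-import cycle in one runnable snippet. This sketch writes a tiny stdlib-only module to a temporary directory (the file name stats_utils.py and the dependency-free mean_ignore_na are illustrative stand-ins for the numpy-based math_utils above), then imports and calls it:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# A dependency-free module, written to disk just for this demo.
module_source = '''
"""Tiny numeric helpers (dependency-free demo)."""

def mean_ignore_na(values):
    """Return the mean of values, skipping None entries."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept)
'''

tmp = tempfile.mkdtemp()
Path(tmp, "stats_utils.py").write_text(module_source)

sys.path.insert(0, tmp)          # make the directory importable
stats_utils = importlib.import_module("stats_utils")

print(stats_utils.mean_ignore_na([1, 2, None, 4]))  # 7/3, about 2.33
```

In a real project you would of course keep the module as a normal file under version control; writing it at runtime is only to keep this example self-contained.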
Packages and relative imports (project structure)
Example layout for a data pipeline:
project/
    data_pipeline/
        __init__.py
        extract.py
        transform.py
        load.py
    scripts/
        run_pipeline.py
Inside transform.py you can reference sibling module functions with relative imports:
# transform.py
from .extract import load_raw_data
from .load import save_clean
Relative imports are perfect when organizing a package that will be installed or reused.
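To see a relative import resolve end to end, this sketch recreates a slice of the layout above in a temporary directory (the function names load_raw_data and clean_names are illustrative) and imports through the package:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate a minimal data_pipeline package on disk.
root = Path(tempfile.mkdtemp())
pkg = root / "data_pipeline"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "extract.py").write_text(
    "def load_raw_data():\n"
    "    return [' Alice ', ' Bob ']\n"
)
(pkg / "transform.py").write_text(
    "from .extract import load_raw_data  # relative import of a sibling\n"
    "\n"
    "def clean_names():\n"
    "    return [name.strip() for name in load_raw_data()]\n"
)

sys.path.insert(0, str(root))
transform = importlib.import_module("data_pipeline.transform")
print(transform.clean_names())  # ['Alice', 'Bob']
```

Note that the relative import works because transform.py is imported as part of the data_pipeline package; running it directly as a script would fail with "attempted relative import".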
How Python finds modules
Python searches directories in sys.path. Typical entries include:
- Directory containing the running script.
- Entries from the PYTHONPATH environment variable.
- Standard library directories and site-packages.
import sys
print(sys.path)
If your module isn't found, check your current working directory and virtual environment.
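When debugging "module not found" errors, it helps to ask Python where it would load a module from without actually importing it. A small sketch using importlib.util.find_spec (where_is is a hypothetical helper name):

```python
import importlib.util
import sys

def where_is(module_name):
    """Return the file a module would load from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec else None

print(where_is("json"))             # a path into the standard library
print(where_is("no_such_mod_xyz"))  # None: nowhere on sys.path
print(sys.path[0])                  # first place Python looks
```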
Tips and best practices for data work
- Use explicit imports in scripts and notebooks to make origins clear.
- Follow common alias conventions (np, pd, plt) to help collaborators.
- Keep utility modules small and focused — e.g., cleaning.py, viz.py, metrics.py.
- Use if __name__ == "__main__": for module-level quick checks and demos.
- Avoid from module import * — it's a readability and debugging hazard.
- For long imports that slow startup, consider lazy imports inside functions:
def heavy_transform(df):
    import pyarrow as pa  # only imported when needed
    ...
- Use virtual environments and requirements.txt (or pyproject.toml) to lock dependencies for reproducibility.
Interactive workflow: editing a module while a REPL is open
When you change a module during an interactive session (like a Jupyter notebook), reload it:
import importlib
import math_utils
importlib.reload(math_utils)
This avoids restarting the kernel for small edits.
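The edit-then-reload cycle can be simulated in one script. This sketch writes a throwaway module (greeter is an illustrative name), imports it, rewrites the file, and reloads; bumping the mtime and calling importlib.invalidate_caches() guards against Python reusing a stale bytecode cache when edits happen within the same second:

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

tmp = tempfile.mkdtemp()
sys.path.insert(0, tmp)
mod_path = Path(tmp, "greeter.py")

mod_path.write_text("def hello():\n    return 'v1'\n")
import greeter
print(greeter.hello())  # v1

# Simulate editing the file during the session.
mod_path.write_text("def hello():\n    return 'v2'\n")
stat = os.stat(mod_path)
os.utime(mod_path, (stat.st_atime, stat.st_mtime + 10))
importlib.invalidate_caches()

importlib.reload(greeter)  # re-executes the updated source in place
print(greeter.hello())  # v2
```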
Watch out for circular imports
If module A imports B and B imports A at top level, Python can fail with an ImportError or hand one module a partially initialized version of the other. Fixes:
- Move imports inside functions (deferred import).
- Refactor shared code into a new module C that both can import.
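The deferred-import fix can be demonstrated concretely. In this sketch (pipeline_a and pipeline_b are illustrative names), pipeline_a imports pipeline_b at the top, while pipeline_b defers its import of pipeline_a into the function body, so the cycle never bites at load time:

```python
import importlib
import sys
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp))

(tmp / "pipeline_a.py").write_text(
    "import pipeline_b\n"
    "def value():\n"
    "    return 1\n"
)
(tmp / "pipeline_b.py").write_text(
    "def use_a():\n"
    "    import pipeline_a  # deferred: runs after both modules exist\n"
    "    return pipeline_a.value() + 1\n"
)

pipeline_a = importlib.import_module("pipeline_a")
print(pipeline_a.pipeline_b.use_a())  # 2
```

With the import at the top of pipeline_b instead, loading either module would trigger the other mid-initialization; deferring it means the lookup happens only after both modules are fully loaded.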
Quick practical checklist before you commit code
- Are imports explicit and clear?
- Are utility functions documented with docstrings?
- Did you avoid wildcard imports?
- Is package layout logical for reuse?
- Are heavy imports deferred if they slow tests or CLI tools?
Key takeaways
- Modules let you bundle and reuse functions, classes, and constants — essential for tidy data code.
- Use explicit imports to keep namespaces clear; alias standard libs (np, pd) for readability.
- if __name__ == "__main__": is your friend for quick demos and module-level tests.
- Keep packages well-structured and prefer relative imports inside a project.
- For interactive work, use importlib.reload() to refresh edited modules.
Remember: good modules are like tidy toolboxes. When your workflow needs a hammer, you should be able to pull one out without rummaging through a pile of half-broken scripts.
"The best code is the code you can find in the dark at 2 AM." — maybe you, after writing a solid module structure.