Python Foundations for Data Work
Master core Python syntax and tooling for data tasks, from environments and notebooks to clean, reliable scripts.
Functions and Docstrings — Your Python Superpowers for Data Work
"Functions are the recipes; docstrings are the little sticky notes that save you from burning the soufflé."
You already know how to make decisions in code (remember: booleans and logic, and the glorious if/else branching from Conditionals and Control Flow). Functions are the next level: reusable bundles of behavior that stop you from copy-pasting the same logic into twelve different cells and then crying when a bug appears.
In this lesson we'll cover: what functions are, why they matter for data work, function anatomy, parameter types (including *args and **kwargs), scope and side effects, lambdas and higher-order functions, and — crucially — how to write docstrings that actually help future-you (and your teammates).
Why functions matter for data work
- Reusability: Clean, tested behavior you can call everywhere (e.g., data cleaning steps).
- Readability: A well-named function turns a block of code into a readable sentence.
- Testing: Small functions are easy to unit test.
- Composition: Combine functions like building blocks for pipelines.
Imagine you're preparing a dataset for ML. Instead of repeating the same normalization/cleaning code in multiple notebooks, wrap it in a function and call it from every experiment. Your future self will thank you; your past self who didn't write tests will get roasted by your teammates.
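As a concrete sketch of such a reusable cleaning step (the column contents and cleaning rules here are made up for illustration):

```python
def clean_prices(raw):
    """Turn raw price strings into floats, skipping missing entries.

    (Illustrative cleaning step -- write it once, call it from every notebook.)
    """
    cleaned = []
    for item in raw:
        if item is None or item == "":
            continue  # skip missing values
        # strip currency symbols and thousands separators, then convert
        cleaned.append(float(str(item).replace("$", "").replace(",", "")))
    return cleaned

clean_prices(["$1,200", None, "300", ""])  # [1200.0, 300.0]
```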
Anatomy of a function (quick tour)
Micro explanation: basic structure
```python
def greet(name):
    """Return a friendly greeting string for name."""
    return f"Hi, {name}!"

print(greet('Ada'))  # Hi, Ada!
```
- `def` introduces the function.
- `greet` is the function name (use verbs for actions: `calculate_mean`, `filter_outliers`).
- `(name)` is the parameter list.
- The triple-quoted string inside is the docstring, the built-in help for the function.
- `return` sends a value back to the caller.
Docstrings: the non-negotiable sticky note
A docstring should answer: What does this do? What are the inputs? What does it return? Any side effects or exceptions?
Good minimal docstring style (one-line + optional details):
```python
def mean(values):
    """Compute the arithmetic mean of a sequence of numbers.

    Args:
        values (Sequence[float]): Iterable of numbers.

    Returns:
        float: The mean value.
    """
    return sum(values) / len(values)
```
Use styles your team prefers — NumPy style, Google style, or reStructuredText for Sphinx. The key: be consistent.
You can access it with help(mean) or mean.__doc__.
Parameters — the flavors
Positional and keyword arguments
```python
def scale(x, factor=1.0):
    """Scale x by factor (default 1.0)."""
    return x * factor

scale(5, 2)         # positional
scale(5, factor=2)  # keyword
```
Default argument gotcha (mutable defaults!)
```python
def add_tag(record, tags=[]):
    tags.append('new')
    return tags

# The default list is created once and shared across calls:
add_tag({})  # ['new']
add_tag({})  # ['new', 'new']  <-- oops!
```
Fix with None sentinel:
```python
def add_tag(record, tags=None):
    if tags is None:
        tags = []
    tags.append('new')
    return tags
```
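To confirm the sentinel fix really gives each call its own list:

```python
def add_tag(record, tags=None):
    if tags is None:
        tags = []  # a fresh list is created on every call
    tags.append('new')
    return tags

print(add_tag({}))  # ['new']
print(add_tag({}))  # ['new'] -- no state shared between calls
```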
*args and **kwargs
Use these when you don't know how many args might be passed (common in wrappers):
```python
def concat(*arrays, axis=0):
    # arrays is a tuple of all positional arguments
    pass

def plot(series, **plot_kwargs):
    # plot_kwargs is a dict of extra keyword arguments,
    # forwarded to the plotting library
    pass
```
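Here's a minimal sketch of the wrapper pattern mentioned above: a hypothetical `logged` decorator that forwards any signature untouched via `*args`/`**kwargs`:

```python
def logged(func):
    """Wrap func, forwarding positional and keyword arguments unchanged."""
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__} with {args} {kwargs}")
        return func(*args, **kwargs)
    return wrapper

@logged
def scale(x, factor=1.0):
    return x * factor

scale(5, factor=2)  # prints the call details, then returns 10
```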
Scope, side effects, and pure functions
- Local scope: Variables inside a function don't touch the outside unless returned.
- Global variables: Can be read, but modifying them requires `global`, which is usually a code smell.
- Side effects: Printing, writing files, mutating inputs. OK when intentional, bad when hidden.
Prefer pure functions (no side effects, consistent outputs for same inputs) for testing and reasoning. But in data work you often need side effects (writing CSVs). Just keep them explicit.
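A quick sketch of the distinction (the function names here are illustrative):

```python
import csv

def center(values):
    """Pure: same input always gives the same output; nothing is mutated."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def write_rows(rows, path):
    """Explicit side effect: the name and the path parameter make the
    file write obvious to the caller."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```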
Example using previous lessons: a predicate function with booleans
```python
def is_outlier(x, lower, upper):
    """Return True if x is outside [lower, upper]. Uses boolean logic."""
    return (x < lower) or (x > upper)

# Used in a comprehension (control-flow knowledge applies when inspecting values)
values = [1, 20, 3, 100]
filtered = [v for v in values if not is_outlier(v, 0, 50)]  # [1, 20, 3]
```
Lambdas and higher-order functions
- Lambdas: tiny anonymous functions, use sparingly for simple transforms.
```python
squared = lambda x: x * x
list(map(lambda x: x * x, [1, 2, 3]))  # [1, 4, 9]
```
- Higher-order functions: functions that accept or return functions. Useful for pipelines.
```python
def make_multiplier(factor):
    def multiply(x):
        return x * factor
    return multiply

double = make_multiplier(2)
double(5)  # 10
```
Map/filter/reduce or comprehensions are your friends for readable data transformations.
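For instance, the same transformation written both ways (`reduce` lives in the stdlib `functools` module):

```python
from functools import reduce

values = [1, 20, 3, 100]

# Comprehension: usually the most readable option
doubled = [v * 2 for v in values if v < 50]  # [2, 40, 6]

# Equivalent map/filter pipeline
doubled_too = list(map(lambda v: v * 2, filter(lambda v: v < 50, values)))

# reduce folds a sequence down to a single value
total = reduce(lambda acc, v: acc + v, doubled, 0)  # 48
```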
Docstring conventions that actually help
A practical template (Google style):
```text
Short one-line summary.

Args:
    param1 (type): Description.
    param2 (type, optional): Description. Defaults to something.

Returns:
    type: What is returned.

Raises:
    ErrorType: When something goes wrong.
```
Tip: include examples. Many will copy-paste your example and expect it to work.
```python
def normalize(col):
    """Normalize a numeric column to mean 0 and sd 1.

    Args:
        col (Sequence[float]): Input values.

    Returns:
        list[float]: Normalized values.

    Example:
        >>> normalize([1, 2, 3])
        [-1.0, 0.0, 1.0]
    """
    pass  # implementation omitted
```
For libraries, follow NumPy/SciPy docstring conventions to integrate with Sphinx.
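A nice payoff of example-bearing docstrings: the stdlib `doctest` module can run those `>>>` examples as tests, so your documentation can't silently rot.

```python
import doctest

def mean(values):
    """Compute the arithmetic mean.

    >>> mean([1, 2, 3])
    2.0
    """
    return sum(values) / len(values)

# Runs every >>> example found in this module's docstrings
doctest.testmod()
```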
Quick checklist before you commit a function
- Name: clear, verb-based (e.g., `compute_rmse`).
- Docstring: one-line summary + args + returns + example.
- Side effects: explicit or none.
- Tests: small unit tests for edge cases (empty input, NaNs).
- Avoid mutable defaults.
- Keep functions short (single responsibility).
Key takeaways
- Functions package behavior for reuse, readability, and testing — essential in data work.
- Docstrings are the user manual for your function. One-liners are fine, but examples + parameter/return descriptions make your life easier.
- Watch out for mutable defaults and hidden side effects.
- Use *args/**kwargs, lambdas, and higher-order functions when they simplify your pipeline; don't overuse them.
"This is the moment where the concept finally clicks." — you, after writing one clean reusable function that saves you hours across experiments.
Want a quick exercise? Create a function clean_and_summarize(df) that (1) drops NaNs, (2) casts a date column to datetime, (3) computes column means, and (4) includes a helpful docstring and an example. Use small, testable helper functions where it makes sense.
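If you want a starting point before reaching for pandas, here is one possible stdlib-only skeleton: a list of dicts stands in for the DataFrame, and the column names `'date'` and `'x'` are assumptions for the example.

```python
from datetime import datetime

def drop_missing(rows, columns):
    """Keep only rows where every listed column is present and non-None."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]

def parse_dates(rows, date_col, fmt="%Y-%m-%d"):
    """Return new rows with date_col cast to datetime (ISO format assumed)."""
    return [{**r, date_col: datetime.strptime(r[date_col], fmt)} for r in rows]

def column_means(rows, numeric_cols):
    """Mean of each numeric column across all rows."""
    return {c: sum(r[c] for r in rows) / len(rows) for c in numeric_cols}

def clean_and_summarize(rows):
    """Drop rows with missing values, parse 'date', and return column means.

    Example:
        >>> clean_and_summarize([{'date': '2024-01-01', 'x': 1.0},
        ...                      {'date': '2024-01-02', 'x': 3.0}])
        {'x': 2.0}
    """
    rows = drop_missing(rows, ['date', 'x'])
    rows = parse_dates(rows, 'date')
    return column_means(rows, ['x'])
```

Each helper is small, pure, and testable on its own, which is exactly the point of the exercise.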
Go forth and modularize. Your notebooks (and teammates) will breathe easier.