
Python for Data Science, AI & Development

Python Foundations for Data Work

Master core Python syntax and tooling for data tasks, from environments and notebooks to clean, reliable scripts.

Functions and Docstrings

Functions and Docstrings — Your Python Superpowers for Data Work

"Functions are the recipes; docstrings are the little sticky notes that save you from burning the soufflé."


You already know how to make decisions in code (remember: booleans and logic, and the glorious if/else branching from Conditionals and Control Flow). Functions are the next level: reusable bundles of behavior that stop you from copy-pasting the same logic into twelve different cells and then crying when a bug appears.

In this lesson we'll cover: what functions are, why they matter for data work, function anatomy, parameter types (including *args and **kwargs), scope and side effects, lambdas and higher-order functions, and — crucially — how to write docstrings that actually help future-you (and your teammates).


Why functions matter for data work

  • Reusability: Clean, tested behavior you can call everywhere (e.g., data cleaning steps).
  • Readability: A well-named function turns a block of code into a readable sentence.
  • Testing: Small functions are easy to unit test.
  • Composition: Combine functions like building blocks for pipelines.

Imagine you're preparing a dataset for ML. Instead of repeating the same normalization/cleaning code in multiple notebooks, wrap it in a function and call it from every experiment. Your future self will thank you; your past self who didn't write tests will get roasted by your teammates.
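
As a minimal sketch of that idea (the dataset and helper names here are hypothetical, not from any particular library), two tiny cleaning functions can be composed into a pipeline you call from every notebook:

```python
def drop_missing(records, field):
    """Return only the records where field is present and not None."""
    return [r for r in records if r.get(field) is not None]

def normalize_names(records):
    """Lowercase and strip whitespace in each record's 'name' field."""
    return [{**r, "name": r["name"].strip().lower()} for r in records]

# One pipeline, reused across notebooks instead of copy-pasted cells.
raw = [{"name": "  Ada "}, {"name": None}, {"name": "Grace"}]
clean = normalize_names(drop_missing(raw, "name"))
print(clean)  # [{'name': 'ada'}, {'name': 'grace'}]
```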


Anatomy of a function (quick tour)

Micro explanation: basic structure

def greet(name):
    """Return a friendly greeting string for name."""
    return f"Hi, {name}!"

print(greet('Ada'))  # Hi, Ada!

  • def introduces the function.
  • greet is the function name (use verbs for actions: calculate_mean, filter_outliers).
  • (name) are parameters.
  • The triple-quoted string inside is the docstring — the built-in help for the function.
  • return sends a value back to the caller.

Docstrings: the non-negotiable sticky note

A docstring should answer: What does this do? What are the inputs? What does it return? Any side effects or exceptions?

Good minimal docstring style (one-line + optional details):

def mean(values):
    """Compute the arithmetic mean of a sequence of numbers.

    Args:
        values (Sequence[float]): Iterable of numbers.

    Returns:
        float: The mean value.

    Raises:
        ZeroDivisionError: If values is empty.
    """
    return sum(values) / len(values)

Use styles your team prefers — NumPy style, Google style, or reStructuredText for Sphinx. The key: be consistent.

You can access it with help(mean) or mean.__doc__.
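
For instance, reusing the mean example from above, the docstring is attached to the function object itself:

```python
def mean(values):
    """Compute the arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)

# The docstring lives on the function object.
print(mean.__doc__)  # Compute the arithmetic mean of a sequence of numbers.
help(mean)           # pretty-prints the same text in the REPL
```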


Parameters — the flavors

Positional and keyword arguments

def scale(x, factor=1.0):
    """Scale x by factor (default 1.0)."""
    return x * factor

# positional
scale(5, 2)
# keyword
scale(5, factor=2)

Default argument gotcha (mutable defaults!)

def add_tag(record, tags=[]):
    tags.append('new')
    return tags

# unexpected shared list across calls
add_tag({})  # ['new']
add_tag({})  # ['new', 'new']  <-- oops!

Fix with None sentinel:

def add_tag(record, tags=None):
    if tags is None:
        tags = []
    tags.append('new')
    return tags

*args and **kwargs

Use these when you don't know how many args might be passed (common in wrappers):

def concat(*arrays, axis=0):
    # arrays is a tuple of positional arguments
    pass

def plot(series, **plot_kwargs):
    # plot_kwargs forwarded to plotting library
    pass
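
The stubs above only show the signatures; here is a runnable sketch of the wrapper pattern they hint at (the `logged` decorator-style helper is a hypothetical name for illustration):

```python
def logged(func):
    """Wrap func, forwarding any arguments and reporting each call."""
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)  # forward everything unchanged
        print(f"{func.__name__}{args} {kwargs} -> {result}")
        return result
    return wrapper

def scale(x, factor=1.0):
    return x * factor

logged_scale = logged(scale)
logged_scale(5, factor=2)  # prints: scale(5,) {'factor': 2} -> 10
```

Because the wrapper only forwards `*args` and `**kwargs`, it works for any function, whatever its signature.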

Scope, side effects, and pure functions

  • Local scope: Variables inside a function don't touch the outside unless returned.
  • Global variables: Can be read, but modifying requires global — usually a smell.
  • Side effects: Printing, writing files, mutating inputs. OK when intentional, bad when hidden.

Prefer pure functions (no side effects, consistent outputs for same inputs) for testing and reasoning. But in data work you often need side effects (writing CSVs). Just keep them explicit.
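
A small sketch of that split (the function names and file path are made up for illustration): keep the computation pure, and give the side effect its own clearly named function.

```python
import csv

def summarize(rows):
    """Pure: same rows in, same summary out, nothing else touched."""
    return {"count": len(rows), "total": sum(rows)}

def write_summary(summary, path):
    """Impure on purpose: writing the file is this function's whole job."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(summary.items())

stats = summarize([1, 2, 3])         # easy to test: no setup, no cleanup
write_summary(stats, "summary.csv")  # the side effect is named and visible
```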

Example using previous lessons: a predicate function with booleans

def is_outlier(x, lower, upper):
    """Return True if x is outside [lower, upper]. Uses boolean logic."""
    return (x < lower) or (x > upper)

# used in a comprehension to filter values (the boolean logic from earlier lessons applies)
values = [1, 20, 3, 100]
filtered = [v for v in values if not is_outlier(v, 0, 50)]

Lambdas and higher-order functions

  • Lambdas: tiny anonymous functions; use sparingly for simple transforms.

squared = lambda x: x*x
list(map(lambda x: x*x, [1, 2, 3]))  # [1, 4, 9]

  • Higher-order functions: functions that accept or return functions. Useful for pipelines.

def make_multiplier(factor):
    def multiply(x):
        return x * factor
    return multiply

double = make_multiplier(2)
double(5)  # 10

Map/filter/reduce or comprehensions are your friends for readable data transformations.
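
To see the trade-off concretely, here is the same transform both ways (toy data, for illustration only):

```python
values = [1, 20, 3, 100]

# map/filter style
squares = list(map(lambda x: x * x, filter(lambda x: x < 50, values)))

# comprehension style: usually reads better for data transforms
squares_alt = [x * x for x in values if x < 50]

print(squares)      # [1, 400, 9]
print(squares_alt)  # [1, 400, 9]
```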


Docstring conventions that actually help

A practical template (Google style):

Short one-line summary.

Args:
    param1 (type): Description.
    param2 (type, optional): Description. Defaults to something.

Returns:
    type: What is returned.

Raises:
    ErrorType: When something goes wrong.

Tip: include examples. Many will copy-paste your example and expect it to work.

def normalize(col):
    """Normalize a numeric column to mean 0 and sd 1.

    Args:
        col (Sequence[float]): Input values.

    Returns:
        list[float]: Normalized values.

    Example:
        >>> normalize([1,2,3])
        [-1.0, 0.0, 1.0]
    """
    # implementation omitted
    pass

For libraries, follow NumPy/SciPy docstring conventions to integrate with Sphinx.
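
As a sketch of what that looks like, here is the earlier normalize example rewritten with a NumPy-style docstring and a plain-Python implementation filled in (using the sample standard deviation, so the doctest output matches):

```python
def normalize(col):
    """Normalize a numeric column to mean 0 and standard deviation 1.

    Parameters
    ----------
    col : sequence of float
        Input values.

    Returns
    -------
    list of float
        Normalized values, same length as ``col``.

    Examples
    --------
    >>> normalize([1, 2, 3])
    [-1.0, 0.0, 1.0]
    """
    n = len(col)
    m = sum(col) / n
    sd = (sum((x - m) ** 2 for x in col) / (n - 1)) ** 0.5  # sample sd
    return [(x - m) / sd for x in col]
```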


Quick checklist before you commit a function

  • Name: clear verb-based name (e.g., compute_rmse).
  • Docstring: one-line summary + args + returns + example.
  • Side effects: explicit or none.
  • Tests: small unit tests for edge cases (empty input, NaNs).
  • Avoid mutable defaults.
  • Keep functions short (single responsibility).
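
The testing item on that checklist can be as lightweight as a few asserts. Here is a sketch for the is_outlier predicate from earlier, covering the boundary and NaN edge cases:

```python
import math

def is_outlier(x, lower, upper):
    """Return True if x is outside [lower, upper]."""
    return (x < lower) or (x > upper)

# Edge cases first: boundaries are inclusive, and NaN compares False everywhere.
assert is_outlier(-1, 0, 50)
assert not is_outlier(0, 0, 50)         # boundary value is not an outlier
assert not is_outlier(50, 0, 50)
assert is_outlier(100, 0, 50)
assert not is_outlier(math.nan, 0, 50)  # NaN slips through — decide if that's OK
```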

Key takeaways

  • Functions package behavior for reuse, readability, and testing — essential in data work.
  • Docstrings are the user manual for your function. One-liners are fine, but examples + parameter/return descriptions make your life easier.
  • Watch out for mutable defaults and hidden side effects.
  • Use *args/**kwargs, lambdas, and higher-order functions when they simplify your pipeline; don't overuse them.

"This is the moment where the concept finally clicks." — you, after writing one clean reusable function that saves you hours across experiments.


Want a quick exercise? Create a function clean_and_summarize(df) that (1) drops NaNs, (2) casts a date column to datetime, (3) computes column means, and (4) includes a helpful docstring and an example. Use small, testable helper functions where it makes sense.

Go forth and modularize. Your notebooks (and teammates) will breathe easier.
