Statistics and Probability for Data Science
Develop statistical intuition for inference, experimentation, and uncertainty-aware decisions.
Hypothesis Testing
Hypothesis Testing for Data Science — Make Decisions, Not Wild Guesses
"If sampling and the Central Limit Theorem are your microscope, hypothesis testing is your decision-making lab coat."
You're already familiar with Sampling and the Central Limit Theorem and how distributions behave from Probability Distributions. Now we move from "what could happen" to "what we can decide based on data." Hypothesis testing is how a data scientist turns noisy samples into confident statements — and learns to say "there's evidence" without sounding like a podcast host spouting nonsense.
What is Hypothesis Testing? (Short and spicy)
Hypothesis testing is a formal framework for deciding whether observed data are consistent with a baseline assumption (the null hypothesis, H0) or whether they better support an alternative claim (the alternative hypothesis, H1).
Think of it like a courtroom: H0 is "the defendant is innocent" (no effect), and your sample data are the evidence. Hypothesis testing tells you whether the evidence is strong enough to reject innocence — but never to prove guilt beyond all doubt.
Why it matters for Data Science
- It helps you avoid chasing noise (false positives).
- It turns visual patterns (from your plots) into decisions you can report.
- It connects to confidence intervals, effect sizes, and power — all essential for reproducibility.
(And yes — after visualizing your differences in Seaborn or Plotly as in the Data Visualization module, you should test them, not just stare and nod.)
The 6-step recipe for hypothesis testing (apply this like a boss)
- State H0 and H1 — Null is usually no change/no effect. Alternative is directional or two-sided.
- Choose a significance level (α) — Commonly 0.05. Lower α reduces false positives but raises false negatives.
- Pick a test and check assumptions — t-test for means, z-test for large-sample proportions, chi-square for independence, etc. Assumptions: independence, normality (or CLT), equal variances (maybe).
- Compute the test statistic — Standardized score comparing estimate to null (e.g., t, z).
- Calculate p-value or critical value — p-value = probability of observing data at least as extreme as yours, assuming H0 is true.
- Decide and report — Reject H0 if p < α, else fail to reject. Report effect size and confidence interval.
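The recipe above can be sketched in a few lines of Python. This is a minimal illustration using simulated data and Welch's two-sample t-test (the sample sizes, means, and seed are made up for the example):

```python
# A minimal sketch of the 6-step recipe with a two-sample t-test.
# All data here are simulated purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Step 1: H0: mean_a == mean_b; H1: mean_a != mean_b (two-sided)
# Step 2: choose significance level
alpha = 0.05

# Two illustrative samples
a = rng.normal(loc=10.0, scale=2.0, size=50)
b = rng.normal(loc=11.0, scale=2.0, size=50)

# Steps 3-5: Welch's t-test (does not assume equal variances)
t_stat, p_val = stats.ttest_ind(a, b, equal_var=False)

# Step 6: decide and report (and report effect size + CI in practice!)
decision = "reject H0" if p_val < alpha else "fail to reject H0"
print(f"t = {t_stat:.3f}, p = {p_val:.4f} -> {decision}")
```

Note the `equal_var=False`: Welch's version is a safe default when you can't justify the equal-variance assumption from step 3.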
Micro explanation: p-value vs. effect size
- p-value tells you whether an effect is unlikely under H0.
- Effect size (Cohen's d, difference in proportions) tells you if the effect is useful, not just statistically detectable.
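To see the distinction in action, here is a hedged sketch where a big sample makes a tiny effect statistically significant; the population parameters are invented for illustration:

```python
# Illustrative: statistically significant but practically tiny.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Tiny true difference (0.5 on a scale with SD 15), huge samples
a = rng.normal(100.0, 15.0, size=20_000)
b = rng.normal(100.5, 15.0, size=20_000)

t_stat, p_val = stats.ttest_ind(a, b)

# Cohen's d: mean difference divided by the pooled standard deviation
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
d = (b.mean() - a.mean()) / pooled_sd
print(f"p = {p_val:.4g}, Cohen's d = {d:.3f}")  # d on the order of 0.03: tiny
```

A d around 0.03 is far below the conventional "small effect" threshold of 0.2, no matter how small the p-value gets.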
Common tests and when to use them
| Problem | Test | Key assumptions |
|---|---|---|
| Compare two sample means (small samples) | Student's t-test (independent) | Samples independent, approx normal or CLT kicks in |
| Compare proportions | Two-proportion z-test | Large samples so sampling distribution ~ Normal |
| Paired data (before/after) | Paired t-test | Differences approx normal |
| Categorical association | Chi-square test | Expected counts not too small |
Note: The CLT you learned earlier justifies using normal-based tests even when raw data are not normal — so long as your sample size is large enough and sampling is independent.
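Most of the tests in the table are a single SciPy call. As one hedged example, here is a chi-square test of independence; the contingency table is made up for demonstration:

```python
# Chi-square test of independence (illustrative data).
import numpy as np
from scipy.stats import chi2_contingency

# Rows: device (mobile, desktop); columns: converted (no, yes)
table = np.array([[420, 80],
                  [390, 110]])

chi2, p_val, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_val:.4f}")

# Check the "expected counts not too small" assumption from the table above
print("All expected counts >= 5:", (expected >= 5).all())
```

The `expected` array is handy for verifying the small-counts assumption before trusting the p-value.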
A practical example: A/B test for click-through rate (CTR)
Scenario: You run an experiment comparing a new button (B) to the old button (A). You observe:
- A: 800 visits, 48 clicks (p_A = 0.06)
- B: 820 visits, 74 clicks (p_B ≈ 0.0902)
Null hypothesis: H0: p_A = p_B (no difference). Alternative: H1: p_B > p_A (one-sided).
Here's a compact Python example (scipy + statsmodels) showing a two-proportion z-test and a visual of the null distribution:
```python
# Two-proportion z-test (normal approximation)
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

counts = np.array([74, 48])   # successes: B, A
nobs = np.array([820, 800])   # visits:    B, A

# alternative='larger' tests H1: p_B > p_A (one-sided)
stat, pval = proportions_ztest(counts, nobs, alternative='larger')
print('z-stat:', stat, 'p-value:', pval)
```
Interpretation: If p < 0.05, you have sufficient evidence to claim B > A at 5% significance. But also compute the difference in proportions and its confidence interval and plot it to show practical significance (remember Data Visualization!).
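For the practical-significance half, here is a minimal sketch of a Wald 95% confidence interval for the difference in proportions, computed by hand from the same A/B numbers:

```python
# Wald CI for the difference in proportions (same A/B numbers as above).
import numpy as np
from scipy.stats import norm

clicks_b, visits_b = 74, 820
clicks_a, visits_a = 48, 800

p_b, p_a = clicks_b / visits_b, clicks_a / visits_a
diff = p_b - p_a  # observed lift

# Unpooled standard error of the difference (appropriate for a CI)
se = np.sqrt(p_b * (1 - p_b) / visits_b + p_a * (1 - p_a) / visits_a)
z = norm.ppf(0.975)  # 95% two-sided critical value
lo, hi = diff - z * se, diff + z * se
print(f"diff = {diff:.4f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

If the whole interval sits above zero, you can report not just "significant" but roughly how large the lift plausibly is.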
Visualize the test (because humans love pictures)
Plot the null distribution of the test statistic (a Normal or t distribution), mark the observed statistic, and shade the p-value area. This makes the decision obvious and communicates uncertainty to stakeholders.
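A minimal Matplotlib sketch of that picture, assuming a standard-normal null and an observed z of about 2.32 (roughly what the A/B example above produces):

```python
# Visualize the null distribution, observed statistic, and p-value area.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
from scipy.stats import norm

z_obs = 2.32  # illustrative observed z-statistic
x = np.linspace(-4, 4, 400)

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(x, norm.pdf(x), label="Null distribution N(0, 1)")
tail = x[x >= z_obs]
ax.fill_between(tail, norm.pdf(tail), alpha=0.4, label="p-value area")
ax.axvline(z_obs, color="red", linestyle="--", label=f"observed z = {z_obs}")
ax.legend()
fig.savefig("null_distribution.png", dpi=120)

p_one_sided = norm.sf(z_obs)  # the shaded tail area
print(f"one-sided p-value = {p_one_sided:.4f}")
```

The shaded tail is the one-sided p-value; stakeholders grasp that shape far faster than a bare number.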
"A p-value without a picture is like a joke without a punchline — you might get it, but you won't feel it."
Assumptions, pitfalls, and best practices
- Pre-register tests when possible. Don’t peek repeatedly without correction — that's how p-hacking parties start.
- Check assumptions. If normality is violated for small n, consider nonparametric tests (Mann–Whitney) or bootstrap methods.
- Report effect sizes and CI, not just p-values. A tiny effect can be significant with big n.
- Power matters. Low-powered tests often fail to detect true effects; plan sample sizes before experiments.
Quick checklist before you publish a test
- H0/H1 clearly stated
- α chosen and justified
- Appropriate test selected with assumptions checked
- Effect size and confidence intervals reported
- Visualizations show distributions and observed statistic
- Consider multiple-testing corrections if many tests
Closing: TL;DR and memorable insight
- Hypothesis testing turns sample evidence into decisions using a standard set of steps.
- Use the CLT and your knowledge of probability distributions to justify tests and interpret statistics.
- Visualize the null distribution and observed statistic — make your results feel as well as sound.
"P-values tell you how surprising the data are under the null; effect sizes tell you how meaningful the surprise is. You want both — unless you enjoy being both statistically significant and practically irrelevant."
Key takeaways
- Formulate clear hypotheses and pick tests whose assumptions you can justify.
- Always complement p-values with effect sizes and visualizations (remember the Data Visualization module).
- Plan experiments for power and report transparently — reproducibility is not optional.
Want a follow-up? I can show a full Jupyter notebook that runs the A/B test, bootstrap CIs, and draws the null distribution with Seaborn/Matplotlib so your reports look like art and your conclusions actually hold up.