Statistics and Probability for Data Science
Develop statistical intuition for inference, experimentation, and uncertainty-aware decisions.
Nonparametric Tests: When Your Data Hates Normality (and What to Do About It)
“The t-test and ANOVA are great — until your data shows up in sweatpants.”
You just learned t-tests, ANOVA, and confidence intervals — neat tools when your residuals are behaving (i.e., roughly normal, homoscedastic, independent). But real-world data often refuses to dress up for the occasion: skewed distributions, outliers, ordinal measures, tiny sample sizes. That’s where nonparametric tests come in — the toolbox for when assumptions fail, or when you just don’t trust means.
What are nonparametric tests? (Short, sweet, and dramatic)
Nonparametric tests are statistical methods that don't assume a specific parametric form (like normality) for the population distribution. Instead of modeling means under normal assumptions, they often use medians, ranks, or resampling. They’re robust, flexible, and a little rebellious.
Why they matter: In data science you’ll see skewed revenue, ordinal survey responses (“agree / neutral / disagree”), and tiny A/B test buckets. Use nonparametric tests when parametric assumptions (normality, equal variances) are violated or when your metric is ordinal.
Quick reminder: When you might prefer nonparametric over t-tests/ANOVA
- Small sample sizes (n < ~30) and non-normal distributions
- Heavy outliers that distort means
- Ordinal data (Likert scales)
- Heteroscedasticity that can’t be fixed by transformation
Remember: earlier, we used visual checks (from Data Visualization & Storytelling) — histograms, QQ-plots, boxplots, and violin plots — to verify normality and spot outliers. If those visuals scream “nope”, nonparametric tests are your friend.
The main nonparametric tests you’ll use (and when to pick them)
Mann–Whitney U test (a.k.a. Wilcoxon rank-sum)
- Use: Compare two independent groups (like two separate A/B variants) when you can’t assume normality.
- Works with: continuous or ordinal data.
- Intuition: Rank all observations across both groups; test whether ranks differ between groups.
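The rank intuition is easy to check by hand. A minimal sketch (with made-up numbers) that computes the U statistic from pooled ranks and confirms it matches SciPy:

```python
import numpy as np
from scipy import stats

# hypothetical small samples (illustrative values only)
group_A = np.array([3.1, 4.8, 2.9, 5.6])
group_B = np.array([6.2, 7.1, 5.9])

# rank all observations pooled across both groups
ranks = stats.rankdata(np.concatenate([group_A, group_B]))
ranks_A = ranks[:len(group_A)]

# U for group A = its rank sum minus the minimum possible rank sum
U_A = ranks_A.sum() - len(group_A) * (len(group_A) + 1) / 2

res = stats.mannwhitneyu(group_A, group_B, alternative='two-sided')
print(U_A, res.statistic)  # both 0.0 here: every A value ranks below every B value
```

With this toy data the groups are completely separated, so U hits its minimum of 0 — the extreme case the test is sensitive to.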
Wilcoxon signed-rank test
- Use: Paired data (before/after on same users) where normality for paired differences is suspect.
- Intuition: Rank absolute differences and consider signs — testing if median difference is zero.
Kruskal–Wallis H test
- Use: More than two independent groups (nonparametric alternative to one-way ANOVA).
- Intuition: Generalizes rank-based comparison to k groups.
Friedman test
- Use: Repeated measures (like ANOVA repeated measures) when normality fails.
- Intuition: Ranks within each block (subject) and compares treatments across blocks.
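SciPy exposes this as `friedmanchisquare`, which takes one sequence per treatment with rows aligned by subject. A small sketch with invented ratings (five subjects, three treatments — values are illustrative):

```python
from scipy import stats

# hypothetical: 5 subjects each rate 3 treatments (position i = subject i)
treat_1 = [4, 5, 3, 4, 5]
treat_2 = [2, 3, 2, 3, 3]
treat_3 = [5, 5, 4, 5, 4]

# ranks within each subject, then compares rank sums across treatments
stat, p = stats.friedmanchisquare(treat_1, treat_2, treat_3)
print(f"chi2={stat:.2f}, p={p:.3f}")
```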
Spearman rank correlation
- Use: Correlation when linearity or normality is questionable; measures monotonic relationships.
Sign test
- Use: Extremely simple paired test based solely on direction (sign) of differences — very robust but low power.
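SciPy has no dedicated sign-test function, but the test reduces to an exact binomial test on the count of positive differences (a sketch with toy paired scores):

```python
from scipy import stats

# hypothetical paired scores for the same users before/after a change
before = [3, 4, 2, 5, 3, 4, 2, 3]
after  = [4, 5, 3, 5, 4, 5, 3, 4]

diffs = [a - b for a, b in zip(after, before)]
n_pos = sum(d > 0 for d in diffs)
n = sum(d != 0 for d in diffs)  # zero differences (ties) are dropped

# under H0 (median difference = 0), n_pos ~ Binomial(n, 0.5)
res = stats.binomtest(n_pos, n, p=0.5, alternative='two-sided')
print(res.pvalue)
```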
Bootstrap methods (nonparametric resampling)
- Use: Estimate confidence intervals for medians, percentiles, or complex statistics when analytic CIs aren’t available.
- Intuition: Resample your data with replacement many times and use the empirical distribution to build CIs — remember our earlier lessons on confidence intervals? Bootstrapping builds them without parametric assumptions.
How to choose: a mini decision tree
- Is your outcome ordinal or non-normal? → Consider nonparametric.
- Are groups independent? → Mann–Whitney (2 groups) or Kruskal–Wallis (k groups).
- Are observations paired/repeated? → Wilcoxon signed-rank or Friedman.
- Need correlation? → Spearman.
- Want CI for median or complex stat? → Bootstrap.
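The tree above is small enough to encode as a toy helper — the function name and arguments here are purely illustrative, not a library API:

```python
def suggest_test(paired: bool, n_groups: int = 2,
                 want_correlation: bool = False, want_ci: bool = False) -> str:
    """Encode the mini decision tree above (illustrative sketch only)."""
    if want_ci:
        return "bootstrap"
    if want_correlation:
        return "Spearman"
    if paired:
        return "Wilcoxon signed-rank" if n_groups == 2 else "Friedman"
    return "Mann-Whitney U" if n_groups == 2 else "Kruskal-Wallis"

print(suggest_test(paired=False, n_groups=3))  # Kruskal-Wallis
```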
Tiny Python recipes (so you can stop reading and start testing)
Note: We previously used Matplotlib/Seaborn to explore distributions. Visualize first! Then run these.
- Mann–Whitney U (scipy)
from scipy import stats
u_stat, p = stats.mannwhitneyu(group_A, group_B, alternative='two-sided')  # two independent groups
- Wilcoxon signed-rank (paired)
w_stat, p = stats.wilcoxon(before, after)  # tests whether the median paired difference is zero
- Kruskal–Wallis (k groups)
h_stat, p = stats.kruskal(group1, group2, group3)  # k independent groups
- Spearman correlation
rho, p = stats.spearmanr(x, y)  # rank-based, captures monotonic association
- Simple bootstrap for median CI
import numpy as np
# resample the median 5000 times, then take the 2.5th and 97.5th percentiles
boots = [np.median(np.random.choice(data, size=len(data), replace=True)) for _ in range(5000)]
ci = np.percentile(boots, [2.5, 97.5])  # 95% bootstrap CI for the median
Tip: SciPy functions return p-values like the parametric tests; interpret them the same way, but remember nonparametric tests often have less power (harder to detect small effects).
Practical examples (real talk)
- You have customer satisfaction scores (1–5 Likert). Comparing two design prototypes? Use Mann–Whitney rather than t-test.
- You’re comparing revenue per user but distributions are long-tailed. Consider median differences with bootstrap CIs.
- A/B test with users paired across time (same users before/after feature). Wilcoxon signed-rank beats paired t-test if differences are non-normal.
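For the long-tailed revenue case, a bootstrap CI for the difference in medians might look like this — the lognormal data, sample sizes, and 2,000 resamples are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
# synthetic long-tailed "revenue per user" for two variants
rev_A = rng.lognormal(mean=3.0, sigma=1.0, size=500)
rev_B = rng.lognormal(mean=3.2, sigma=1.0, size=500)

# resample each group independently and record the median difference
boot_diffs = [
    np.median(rng.choice(rev_B, size=rev_B.size, replace=True))
    - np.median(rng.choice(rev_A, size=rev_A.size, replace=True))
    for _ in range(2000)
]
ci = np.percentile(boot_diffs, [2.5, 97.5])  # 95% CI for the median difference
print(ci)
```

If the CI excludes zero, you can report a robust difference without leaning on means that the long tail would distort.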
Pitfalls and things your stats professor will quietly sigh about
- Nonparametric doesn't mean assumption-free. Many rely on exchangeability and independence.
- You lose power: nonparametric tests can need larger samples to detect the same effect size.
- Reporting: Don’t just give p-values. Report effect sizes (median difference, rank-biserial correlation) and visuals (boxplots, violin plots, bootstrap CIs).
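For Mann–Whitney, a common effect size is the rank-biserial correlation, r = 1 − 2U/(n₁n₂). A sketch with toy values (group contents are illustrative):

```python
from scipy import stats

# hypothetical groups (illustrative values)
group_A = [3.1, 4.8, 2.9, 5.6]
group_B = [6.2, 7.1, 5.9, 6.8]

res = stats.mannwhitneyu(group_A, group_B, alternative='two-sided')
n1, n2 = len(group_A), len(group_B)
r = 1 - 2 * res.statistic / (n1 * n2)  # |r| near 1 = near-complete separation
print(r)  # 1.0 here: every A value is below every B value
```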
Visuals + robust statistics = honest storytelling. You’ve already learned to communicate insights with Matplotlib/Seaborn — now use those plots to explain why you chose nonparametric methods.
Quick summary — the nonparametric cheat-sheet
- Use nonparametric tests when normality or equal variance assumptions fail, or data is ordinal.
- Mann–Whitney (two independent groups), Wilcoxon (paired), Kruskal–Wallis (k groups), Friedman (repeated), Spearman (correlation), Bootstrap (CIs).
- Visualize first. Report effect sizes and CIs (bootstrap if necessary).
"This is the moment where the concept finally clicks": nonparametric tests are not weaker cousins of t-tests — they’re the rugged off-road vehicles for messy, real-world data. When parametric roads disappear, nonparametric tools keep you moving.
Final lab assignment (tiny, satisfying)
- Load a skewed dataset (e.g., revenue per user), plot distribution (hist, boxplot, violin).
- Compare two groups with both t-test and Mann–Whitney. Report both p-values and a bootstrap CI for the median difference. Explain why results differ.
- Write 2–3 sentences justifying the test you chose for a stakeholder who only skims emails.
Keep your plots clean, your explanations crisp, and your statistical choices defensible. If a stakeholder asks why you didn’t use a t-test, show them the violin plot. If they still ask, show them the bootstrap CI and smile.