Supervised Machine Learning: Regression and Classification
Exploratory Data Analysis for Predictive Modeling


EDA methods tailored to supervised tasks to reveal signal, distribution shifts, and modeling risks.

Univariate Distributions and Summary Stats

The No-Chill Breakdown: Univariate EDA for Predictive Modeling

Univariate Distributions & Summary Stats — The Sexy First Date of Your Features

"If you skip univariate EDA, your model will judge you in subtle, career-limiting ways." — Probably me, but also your model

You're coming off a sprint through Data Wrangling and Feature Engineering — you tamed high-cardinality beasts with feature hashing, debated sparse vs dense like a caffeinated philosopher, and built features that actually mean something without leaking the answers. Now, before you hand your lovingly engineered features to a hungry algorithm, we need the unglamorous but essential ritual: Univariate Exploratory Data Analysis (EDA).

This is where each feature gets a solo performance. We ask: who are you? How do you behave? Are you lying to me? Will you explode my model if I standardize you? Let's find out.


Why univariate EDA matters (and why your future self will send you a thank-you meme)

  • Catch garbage early: Skew, heavy tails, or a pile of zeros can torpedo assumptions behind linear models, distance metrics, and many preprocessing steps. Remember when you hashed high-cardinality categories and got a bunch of sparse columns? Those sparsity patterns deserve a univariate check too.
  • Guide transformations: Log, sqrt, Box–Cox? You won’t know until you inspect the distribution.
  • Robust scaling choices: Mean/SD vs median/IQR — pick your fighter based on the distribution.
  • Feature importance sanity check: A constant or near-constant feature is noise. A highly skewed feature might dominate a distance-based model.

The toolkit: What to compute and why

1) Core summary statistics (the essentials)

  • Count / n: How many non-missing observations? (Don’t forget missingness — it’s information)
  • Mean: Average. Sensitive to outliers.
  • Median: Middle value. Robust.
  • Std (σ): Spread around the mean. Use cautiously when skewed.
  • IQR (Q3 − Q1): Spread of the middle 50%. Robust.
  • Min / Max: Show the range and potential data-entry errors.
  • Percentiles (e.g., 1st, 5th, 95th, 99th): Help detect heavy tails.
  • Skewness: Direction and degree of asymmetry.
  • Kurtosis: Tail heaviness (not just “peakedness”).

Why both mean and median? Because if mean ≫ median, you’ve got a right tail stretching like a bad plotline.
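Here is that plotline as a tiny synthetic sketch (the dollar scale and mixture are invented; any right-skewed sample behaves the same way):

```python
import numpy as np

rng = np.random.default_rng(0)
# Mostly modest values plus a small wealthy tail -> right skew.
values = np.concatenate([
    rng.exponential(scale=30_000, size=950),
    rng.exponential(scale=500_000, size=50),
])

print(f"mean   = {values.mean():,.0f}")
print(f"median = {np.median(values):,.0f}")
# The mean lands well above the median: the classic right-tail signature.
```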

2) Robust measures and outlier detectors

  • MAD (Median Absolute Deviation): Robust analog of standard deviation.
  • IQR-based rule: Outlier if x < Q1 − 1.5·IQR or x > Q3 + 1.5·IQR.
  • Robust z-score: (x − median) / MAD.

3) Visuals (your eyes are powerful validators)

  • Histogram + KDE: Shape, modality, tails.
  • Boxplot (with notches): Quick outlier view; medians & IQR.
  • Violin plot: If you like drama and density.
  • ECDF (Empirical CDF): Great for comparing distributions.
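If you want the ECDF as numbers rather than a picture, it is a three-line NumPy job (a sketch; `sns.ecdfplot` handles the plotted version):

```python
import numpy as np

def ecdf(x):
    """Return sorted values and the cumulative proportion at or below each."""
    xs = np.sort(np.asarray(x))
    ps = np.arange(1, len(xs) + 1) / len(xs)
    return xs, ps

rng = np.random.default_rng(42)
sample = rng.exponential(scale=2.0, size=500)
xs, ps = ecdf(sample)

# Sanity check: about half the mass sits at or below the sample median.
i = np.searchsorted(xs, np.median(sample))
print(round(ps[i], 2))
```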

Quick Python cheatsheet (pandas + seaborn vibes)

```python
# assumed imports for the whole cheatsheet; df is your DataFrame
import numpy as np
import pandas as pd
import seaborn as sns

# pandas summary with tail percentiles
df['age'].describe(percentiles=[.01, .05, .25, .5, .75, .95, .99])

# skewness and kurtosis (pandas reports *excess* kurtosis, i.e. Fisher's
# definition: a normal distribution scores ~0, not ~3)
df['age'].skew(), df['age'].kurtosis()

# IQR and the 1.5*IQR outlier rule
Q1 = df['age'].quantile(0.25)
Q3 = df['age'].quantile(0.75)
IQR = Q3 - Q1
outliers = df[(df['age'] < Q1 - 1.5 * IQR) | (df['age'] > Q3 + 1.5 * IQR)]

# MAD and robust z-score; 1.4826 rescales MAD to match the standard
# deviation under normality
mad = (df['age'] - df['age'].median()).abs().median()
robust_z = (df['age'] - df['age'].median()) / (1.4826 * mad)

# quick look at the distribution's shape
sns.histplot(df['age'], kde=True)
```

Examples & interpretation (read like a drama script)

Scenario A — Income is wildly right-skewed

  • Mean ($120k) ≫ median ($45k), heavy right tail.
  • Model implication: A linear model might be pulled toward the wealthy outliers.
  • Fixes: Log transform, winsorize the top 1%, or use tree-based models that are less sensitive to monotonic transformations.
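A sketch of the log-transform fix in action, on a synthetic log-normal "income" column (the column name and distribution parameters are invented):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical income column: log-normal, hence heavily right-skewed.
income = pd.Series(rng.lognormal(mean=10.5, sigma=1.2, size=5000), name="income")

print(f"raw skew:   {income.skew():.2f}")
print(f"log1p skew: {np.log1p(income).skew():.2f}")  # log1p also tolerates zeros

# Winsorizing the top 1% is the gentler alternative mentioned above:
capped = income.clip(upper=income.quantile(0.99))
```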

Scenario B — A predictor is almost always zero

  • 95% zeros, 5% positive values (sparse).
  • If you created this via feature hashing or one-hot expansion, that's expected. Otherwise, drop near-constant features or store them in a sparse format, and consider collapsing the column to a binary present/absent encoding.
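The "mostly zeros" check is easy to automate. A sketch with an invented frame and a 90% most-common-value threshold (illustrative, not canonical):

```python
import pandas as pd

df = pd.DataFrame({
    "clicks": [0] * 95 + [3, 1, 7, 2, 5],   # 95% zeros
    "spend": range(100),                     # all distinct values
})

# Share of rows taken by each column's single most common value.
top_freq = df.apply(lambda col: col.value_counts(normalize=True).iloc[0])
near_constant = top_freq[top_freq > 0.90].index.tolist()
print(near_constant)  # → ['clicks']

# Often all the signal that's left is presence/absence:
df["clicks_present"] = (df["clicks"] > 0).astype(int)
```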

Scenario C — Numeric column with two peaks (bimodal)

  • Could represent two distinct populations (e.g., novice vs expert users).
  • Consider: splitting into two features, adding an interaction, or binning into categories.
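One low-tech way to act on a bimodal feature, sketched on synthetic data (the feature name, the two modes, and the threshold are all invented):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
# Hypothetical bimodal feature: novices cluster near 5 sessions, experts near 50.
sessions = np.concatenate([rng.normal(5, 1.5, 600), rng.normal(50, 8, 400)])

# Simplest split: threshold at the valley between the modes (read off the
# histogram here; a 1-D k-means would locate it automatically).
is_expert = (sessions > 20).astype(int)
print(pd.Series(is_expert).value_counts().to_dict())
```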

Rules of thumb (do not ignore these)

  • If skewness magnitude > 1: consider a transformation.
  • If kurtosis ≫ 3 (note that pandas reports *excess* kurtosis, so compare against 0 there): inspect the tail percentiles (95/99) before trusting mean/SD.
  • If > 90% of values are identical: drop or re-encode — a near-constant column carries almost no signal for supervised learning.
  • If missingness correlates with the target: create a missing indicator — missingness can be predictive.
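Those rules wrap naturally into a small screening loop. The thresholds follow the rules above; the frame and column names are illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "symmetric": rng.normal(size=1000),
    "skewed": rng.lognormal(sigma=1.0, size=1000),
    "constant": np.zeros(1000),
})

report = {
    col: {
        "needs_transform": abs(df[col].skew()) > 1,
        "near_constant": df[col].value_counts(normalize=True).iloc[0] > 0.90,
    }
    for col in df.columns
}
print(report)
```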

Table: Choosing central tendency & spread — quick lookup

| Situation | Central tendency | Spread measure | Why |
| --- | --- | --- | --- |
| Symmetric, light tails | Mean | Std | Efficient for Gaussian-like data |
| Skewed | Median | IQR / MAD | Robust to outliers |
| Heavy tails | Median | MAD / percentiles | Captures extreme behavior without distortion |
| Sparse with zeros | Median or non-zero rate | Proportion non-zero + IQR | Zero inflation needs special handling |
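The table collapses into a small helper function. The |skew| > 1 cutoff is the earlier rule of thumb, not gospel, and the test series is synthetic:

```python
import numpy as np
import pandas as pd

def center_and_spread(s: pd.Series) -> dict:
    """Pick summaries per the lookup table above."""
    if abs(s.skew()) > 1:  # skewed / heavy-tailed -> robust pair
        mad = (s - s.median()).abs().median()
        return {"center": s.median(), "spread": 1.4826 * mad}
    return {"center": s.mean(), "spread": s.std()}  # Gaussian-like -> classical

rng = np.random.default_rng(5)
heavy = pd.Series(rng.lognormal(sigma=1.5, size=2000))
summary = center_and_spread(heavy)   # robust branch fires for this one
print(summary)
```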

Practical checklist before modeling (your pre-flight inspection)

  1. For each numeric feature: compute count, missing%, mean, median, std, IQR, skew, kurtosis, 1/99 percentiles.
  2. Visualize with histogram + boxplot (or violin). Spot-check distributions across target classes.
  3. If extreme skew or heavy tails: try log/Box–Cox/Yeo–Johnson; re-evaluate.
  4. Mark near-constant features for removal or special encoding.
  5. For sparse features (e.g., after hashing or one-hot): consider sparse matrices and check density; aggregate rare levels.
  6. Create a small transformation pipeline (fit on train only) and test its effect on validation performance.
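Step 6 in miniature, assuming scikit-learn is available: the transformer lives inside a Pipeline, so `fit()` only ever sees training rows and the transform's parameters cannot leak from the validation split. The data and coefficients are invented:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X = rng.lognormal(sigma=1.0, size=(500, 3))            # skewed features
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=500)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Fitting the pipeline fits the Yeo-Johnson transform on X_tr only;
# X_val is merely transformed with those frozen parameters at score time.
pipe = Pipeline([
    ("power", PowerTransformer(method="yeo-johnson")),
    ("model", Ridge()),
])
pipe.fit(X_tr, y_tr)
print(f"validation R^2: {pipe.score(X_val, y_val):.3f}")
```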

Final pep talk + key takeaways

Univariate EDA isn’t decorative — it’s the stabilizer that keeps your predictive models from lurching into nonsensical behavior. It's the difference between a model that generalizes and one that memorizes weird artifacts (or screams at your validation set). You've already learned to wrestle high-cardinality monsters and choose between sparse and dense representations; now look at the face of every feature and ask it the important questions:

  • Are you skewed? Transformable?
  • Are you a tiny but predictive minority (sparsity)?
  • Do you hide missingness that’s actually a signal?

Do these checks early and document your decisions. Your future reproducible-self (and whoever inherits your notebook) will thank you — possibly with a GIF.

TL;DR: Summarize, visualize, decide. Mean vs median is not an aesthetic choice — it's a battle plan.


