Exploratory Data Analysis for Predictive Modeling
EDA methods tailored to supervised tasks to reveal signal, distribution shifts, and modeling risks.
Visualization for Class Imbalance — The Little Class That Could (and Often Can't)
"If your positive class is rarer than a unicorn sighting, you're not doing EDA — you're performing archeology."
You're arriving at the party having already seen how to visualize regression targets and pairwise relationships. Great — you've got context. You also just finished wrangling and feature-engineering your dataset, so features are clean, encoded, and not leaking like a sieve. Now the last obnoxious guest: class imbalance. This chapter teaches you how to visualize it so you can decide whether to resample, reweight, engineer new features, or simply be wiser about metrics.
Why this matters (quick refresher)
- Class imbalance biases learning algorithms, evaluation metrics, and even your intuition.
- Visualizations help you see how imbalanced things are, how imbalance interacts with features, and whether minority-class patterns exist or are just noise.
- Builds on pairwise relationships: instead of looking at correlations across the whole dataset, visualize them by class.
Ask yourself: "Does the minority class live in a different part of feature space, or is it completely mixed with the majority?" The answer determines strategy.
Core plots and what they tell you
1) Simple class-count bar chart (start here)
Why: the most honest picture of imbalance. Use absolute counts and percentages together.
- What to show: bars for absolute counts, with text labels overlaid giving each class's percentage.
- Pitfalls: log scales hide absolute scarcity — show both linear and log if counts vary hugely.
Code snippet (seaborn/matplotlib):
import seaborn as sns
import matplotlib.pyplot as plt

# Assumes a DataFrame `df` with a 'target' column
ax = sns.countplot(x='target', data=df)
for p in ax.patches:
    # Label each bar with its absolute count and its share of the dataset
    ax.annotate(f"{int(p.get_height())}\n({p.get_height() / len(df):.1%})",
                (p.get_x() + p.get_width() / 2, p.get_height()),
                ha='center', va='bottom')
ax.set_title('Class counts (absolute and percent)')
2) Class vs. feature distribution (numerical)
- Plot KDEs, histograms, or violin plots by class to see whether the minority class has a different distribution.
- Use transparency and same x-axis limits for fair comparison.
Why: If minority-class density overlaps heavily with majority, resampling alone may not help much — you may need feature engineering.
3) Class vs. feature distribution (categorical)
- Use stacked bar charts showing counts or proportions of categories per class.
- Mosaic plots are great when you want to see joint proportions at a glance.
Important: use proportions (within-class) and absolute counts side-by-side — a rare class may have a strong proportion in a category but still be few in absolute terms.
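The counts-plus-proportions advice can be sketched with `pd.crosstab` on synthetic data (the `channel` column name is hypothetical). One table holds absolute counts, the other within-category proportions via `normalize="index"`:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: 'branch' has a high positive *rate* but few rows
df = pd.DataFrame({
    "channel": ["web"] * 600 + ["phone"] * 300 + ["branch"] * 100,
    "target":  [0] * 580 + [1] * 20      # web: 20/600 positive
             + [0] * 290 + [1] * 10      # phone: 10/300 positive
             + [0] * 70  + [1] * 30,     # branch: 30/100 positive
})

counts = pd.crosstab(df["channel"], df["target"])                     # absolute counts
props = pd.crosstab(df["channel"], df["target"], normalize="index")   # within-category shares

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
counts.plot(kind="bar", stacked=True, ax=axes[0], title="Counts per category")
props.plot(kind="bar", stacked=True, ax=axes[1], title="Class proportion per category")
```

Here `branch` is 30% positive, the strongest rate by far, yet contributes only 30 positive rows in absolute terms; seeing both panels side by side is what keeps that distinction honest.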
4) Pairwise plots with stratified sampling
- Pairplots colored by class are excellent, but the number of panels grows quadratically with the number of features, and they can get overwhelmed by many points.
- Strategy: subsample majority class to match minority count for visibility, or use alpha and hex/bin plots.
- This is where you build on the previous Pairwise Relationships topic: look at pairwise separation conditioned on class.
5) Dimensionality reduction visualizations (PCA / t-SNE / UMAP)
- Run PCA/t-SNE/UMAP on features and color points by class.
- Use this to explore separability: are minority points clustered or randomly sprinkled?
- Caveats: these techniques can distort distances and invent apparent structure — don’t over-interpret cluster shapes or gaps.
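A minimal PCA sketch on synthetic data (all array names and the 2-feature shift are made up for illustration). Scaling before PCA matters so that one feature's units don't dominate the components:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# Hypothetical 5-feature data; the minority class is shifted in two features
X_maj = rng.normal(0, 1, size=(950, 5))
X_min = rng.normal(0, 1, size=(50, 5)) + np.array([3, 3, 0, 0, 0])
X = np.vstack([X_maj, X_min])
y = np.array([0] * 950 + [1] * 50)

# Standardize, then project to the first two principal components
X2 = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

plt.scatter(X2[y == 0, 0], X2[y == 0, 1], alpha=0.3, label="majority")
plt.scatter(X2[y == 1, 0], X2[y == 1, 1], alpha=0.9, label="minority")
plt.legend()
plt.title("PCA projection colored by class")
```

If the orange minority points form their own blob, you have evidence of separability; if they're sprinkled uniformly through the blue cloud, no amount of resampling will conjure a boundary.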
6) Feature importance / class-conditional feature ranking
- Train a quick tree-based model (with cross-validation) and plot feature importances, evaluating with a minority-sensitive metric such as average precision.
- This is borderline modeling, but it’s useful as a diagnostic during EDA.
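A sketch of this diagnostic using scikit-learn on a synthetic imbalanced problem (all parameters here are illustrative, not a recipe): fit a class-weighted random forest, check cross-validated average precision so the score reflects the minority class, then rank features by importance:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic problem with ~10% positives and 3 informative features
X, y = make_classification(n_samples=2000, n_features=8, n_informative=3,
                           weights=[0.9, 0.1], random_state=0)

clf = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                             random_state=0)
# Average precision summarizes precision-recall, which is what matters
# for the rare class (accuracy would look great even for a useless model)
cv_ap = cross_val_score(clf, X, y, cv=5, scoring="average_precision").mean()

clf.fit(X, y)
ranking = np.argsort(clf.feature_importances_)[::-1]  # most important first
print(f"CV average precision: {cv_ap:.3f}")
print("Features ranked by importance:", ranking)
```

Treat the importances as a pointer for which features deserve the per-class distribution plots above, not as a final model.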
7) Correlation and contingency heatmaps per class
- Compute correlation matrices for majority and minority separately and visualize differences.
- For categorical pairs, use Cramér’s V heatmaps by class to see structural differences.
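The per-class correlation comparison can be sketched as a single difference heatmap on synthetic data (column names `f1`/`f2` are hypothetical). Here two features are constructed to correlate only inside the minority class:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
n_maj, n_min = 900, 100
# f1 and f2 are independent in the majority, tightly linked in the minority
f1_min = rng.normal(0, 1, n_min)
df = pd.DataFrame({
    "f1": np.concatenate([rng.normal(0, 1, n_maj), f1_min]),
    "f2": np.concatenate([rng.normal(0, 1, n_maj),
                          f1_min + rng.normal(0, 0.3, n_min)]),
    "target": [0] * n_maj + [1] * n_min,
})

# Minority correlation matrix minus majority correlation matrix
corr_diff = (df[df.target == 1].drop(columns="target").corr()
             - df[df.target == 0].drop(columns="target").corr())
sns.heatmap(corr_diff, annot=True, cmap="coolwarm", center=0)
plt.title("Minority minus majority correlation")
```

A strongly colored off-diagonal cell is a flag that the relationship between those two features is class-dependent — often a hint for an interaction feature.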
Practical examples & metaphors
Imagine the minority class is a hidden speakeasy in a city: stacked bar charts tell you which neighborhoods (categories) it prefers; KDEs tell you whether it sneaks into similar price brackets as the majority; PCA shows whether it exists in one small cluster or is a bunch of people scattered across the metropolis.
If the minority is clustered in PCA space, synthetic oversampling (SMOTE variants) might work. If it's scattered and indistinguishable, oversampling could make your model hallucinate.
Visualizing sampling strategies (Before / After)
Always visualize the effect of resampling (undersample/oversample/SMOTE) on class counts and feature distributions.
- Plot counts before and after.
- Overlay feature distributions before and after — does SMOTE create unrealistic synthetic examples? If a new synthetic density looks unnaturally smooth or extends into feature regions with zero real points, be suspicious.
Code sketch:
# Assumes `df` (original) and `df_resampled` (after SMOTE/oversampling)
ax = sns.kdeplot(data=df, x='feature1', hue='target', common_norm=False)
# Dashed curves overlay the post-resampling densities on the same axes
sns.kdeplot(data=df_resampled, x='feature1', hue='target',
            common_norm=False, linestyle='--', ax=ax)
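For a self-contained before/after view, here is a sketch using naive random oversampling with plain pandas (synthetic data; SMOTE itself lives in the separate `imbalanced-learn` package, not shown here):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
# Hypothetical imbalanced frame: 950 majority rows, 50 minority rows
df = pd.DataFrame({
    "feature1": np.concatenate([rng.normal(0, 1, 950), rng.normal(2, 1, 50)]),
    "target": [0] * 950 + [1] * 50,
})

# Naive random oversampling: sample each class (with replacement)
# up to the majority count
n_max = df["target"].value_counts().max()
df_resampled = df.groupby("target").sample(n=n_max, replace=True,
                                           random_state=0)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
df["target"].value_counts().plot(kind="bar", ax=axes[0], title="Before")
df_resampled["target"].value_counts().plot(kind="bar", ax=axes[1],
                                           title="After")
```

Random oversampling only duplicates existing minority rows, so its feature distributions stay inside real support; SMOTE interpolates new points, which is exactly why the overlay check above is worth doing.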
Pitfalls & how to avoid lying with plots
- Never plot percentages alone when absolute scale matters. A 1% minority sounds negligible, yet in a 10-million-row dataset that's still 100,000 examples; conversely, 10% of a 500-row dataset is only 50 examples. Always show absolute counts alongside percentages.
- Avoid plotting tiny minority class with same alpha/marker size as majority — it disappears.
- Log scale can be useful but show linear version too so stakeholders understand absolute impact.
- Be careful with overplotting. Hexbin/contour or subsampling for pairplots keeps visual noise down.
Quick decision map (visualization → action)
- Minority concentrated in distinct cluster(s): consider oversampling (SMOTE variants), class-weighted loss, or targeted feature engineering for that cluster.
- Minority overlaps heavily with majority: focus on richer features, better feature transforms, or domain-specific signals rather than naive resampling.
- Minority concentrated in certain categories: create interaction features (category x numeric) or target-specific encodings.
- Resampling creates unrealistic feature support: don’t use the synthetic data blindly — consider cost-sensitive learning instead.
Checklist: What to plot during your EDA for imbalance
- Class count bar chart (absolute + percent) — always.
- KDE/violin/boxplot of top numeric features by class.
- Stacked bar / mosaic plots for important categorical features.
- Pairwise scatter (subsampled) or hex/bin plots colored by class.
- PCA / UMAP / t-SNE colored by class (with caution).
- Before/After plots for any sampling strategy you might try.
- Correlation/contingency differences between classes.
Closing mic-drop
If you only remember two things:
- Visualize counts in absolute terms, then explore conditional feature distributions by class.
- Use dimensionality reduction and pairwise plots to answer this simple but decisive question: "Do the minority cases live in a different place in feature space, or are they just fewer noisy copies of the majority?"
Do that, and you’ll skip a ton of bad modeling decisions. Go forth, plot ferociously, and never let an imbalanced dataset surprise you at the validation step.