Exploratory Data Analysis for Predictive Modeling
EDA methods tailored to supervised tasks to reveal signal, distribution shifts, and modeling risks.
Pairwise Relationships and Correlations — The Romantic (and Sometimes Toxic) Lives of Features
"If a model had a group chat, pairwise relationships would be the gossip: who’s BFFs with whom, who’s secretly copying homework, and who should definitely be blocked."
You already know the solo acts — univariate distributions and summary stats. You also know how to wrestle messy features into submission (encoding, scaling, hashing, and avoiding leakage). Now we go to the group therapy session: how features behave in pairs. This is where you discover collusion, redundancy, interaction, and the occasional soulmate pair that lifts predictive power.
Why pairwise relationships matter for predictive modeling
- Redundancy: Two features that say the same thing in different words can bloat your model, cause multicollinearity, and make coefficients noisy. (Think: `total_spend` and `avg_spend_per_txn * n_txns`.)
- Signal discovery: A hidden relationship between A and B might explain the target better than either alone.
- Feature engineering cues: Strong nonlinear pairwise patterns scream for transformations or interaction terms.
- Model choice: Linear model vs tree-based — patterns in pair plots help decide which will work better.
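To make the redundancy point concrete, here is a minimal sketch using the `total_spend` example above, with invented column names and synthetic data: a column built as the product of two others is a perfect duplicate, and correlation flags it instantly.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_txns = rng.integers(1, 50, size=500)
avg_spend = rng.gamma(shape=2.0, scale=20.0, size=500)

df = pd.DataFrame({
    'n_txns': n_txns,
    'avg_spend_per_txn': avg_spend,
    'total_spend': n_txns * avg_spend,  # pure derivative of the other two
})

# total_spend IS the product of its parents, so the correlation is exactly 1:
# a duplicate feature wearing a different column name
r = df['total_spend'].corr(df['n_txns'] * df['avg_spend_per_txn'])
print(round(r, 3))  # 1.0
```

In real data the match is rarely this exact (rounding, missing values), but anything with |r| above roughly 0.95 deserves the same suspicion.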
Visual first: Exploratory plots you should actually use
1) Scatterplot (continuous vs continuous)
- Use scatterplots with a smooth line (LOESS) and color by the target.
- Watch for heteroscedasticity, clusters, and nonlinear trends.
Code snippet (pandas / seaborn):
import pandas as pd
import seaborn as sns
# Scatter colored by the target, with a LOWESS smoother overlaid
# (lowess=True requires statsmodels to be installed)
sns.scatterplot(x='feature_a', y='feature_b', hue='target', data=df, alpha=0.6)
sns.regplot(x='feature_a', y='feature_b', data=df, scatter=False, lowess=True, color='k')
2) Pairplot / scatter matrix
- Great for small feature sets (<= 10). Shows marginal distributions and pairwise scatter.
- For bigger sets: sample rows or plot a correlation-sorted subset.
3) Heatmap of correlation matrix
- Easy global view. Beware: Pearson-only view can lie if nonlinearity exists.
4) Categorical vs continuous: boxplots / violin plots
- Boxplots give median and spread differences across categories; violin shows density.
5) Categorical vs categorical: mosaic plots / contingency tables
- Look for dependencies; augment with chi-square or Cramér’s V.
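A sketch of the Cramér's V computation, built on `scipy.stats.chi2_contingency` with hypothetical category data (the column names and values are invented for illustration):

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x, y):
    """Cramér's V from the contingency table of two categorical series."""
    table = pd.crosstab(x, y)
    chi2 = chi2_contingency(table)[0]
    n = table.to_numpy().sum()
    r, k = table.shape
    return np.sqrt(chi2 / (n * (min(r, k) - 1)))

# Hypothetical example: plan tier perfectly determines support channel
plan = pd.Series(['basic', 'basic', 'pro', 'pro', 'ent', 'ent'] * 100)
channel = pd.Series(['email', 'email', 'chat', 'chat', 'phone', 'phone'] * 100)
print(round(cramers_v(plan, channel), 3))  # 1.0 — perfect association
```

V ranges from 0 (independent) to 1 (one variable fully determines the other), which makes it easier to compare across tables of different sizes than the raw chi-square statistic.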
Numbers speak: correlation measures and when to use them
| Relationship | Best measure(s) | When to use | Notes |
|---|---|---|---|
| Continuous — Continuous | Pearson | Linear association | Sensitive to outliers and nonlinearity |
| Continuous — Continuous | Spearman / Kendall | Monotonic but not linear | Robust to monotonic nonlinearity |
| Continuous — Binary target | Point-biserial (Pearson variant) | Quick check for classification | Equivalent to Pearson when target is 0/1 |
| Categorical — Categorical | Cramér’s V | Association strength | Based on chi-square; handles >2 categories |
| Mixed types | Mutual Information | Nonlinear / complex relations | Nonparametric; continuous features are typically discretized or estimated via nearest neighbors |
Quick tip: If Pearson says 0 but scatter looks curved, Pearson is ghosting you — check Spearman or mutual information.
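A quick synthetic demonstration of the ghosting: on a symmetric U-shape, Pearson lands near zero, and so does Spearman (the relation is not even monotonic), while mutual information clearly flags it. Data here is invented for illustration.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=2000)
y = x ** 2 + rng.normal(0, 0.1, size=2000)  # strong but symmetric U-shape

r_pearson = pearsonr(x, y)[0]
r_spearman = spearmanr(x, y)[0]
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]

print(f"Pearson  {r_pearson:+.3f}")   # near zero: blind to the curve
print(f"Spearman {r_spearman:+.3f}")  # also near zero: not monotonic either
print(f"MI       {mi:.3f}")           # clearly positive: relation detected
```

Moral: Spearman rescues you from monotonic nonlinearity, but for non-monotonic shapes (U, V, sinusoid) you need mutual information or, better, a plot.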
Practical workflow: from plots to actionable steps
- Start with a correlation heatmap for continuous features and the continuous target (if regression). Use Spearman if you suspect monotonic nonlinearity.
- For pairs with high absolute correlation (|r| > 0.8), inspect scatterplots and marginal distributions. Ask: are they duplicates/derivatives? (If yes: consider dropping or combining.)
- For classification targets, compute point-biserial correlations or mutual information between each feature and the target. Follow up with boxplots / violin plots.
- For categorical features, compute Cramér’s V and show contingency tables for the most dependent pairs.
- Compute VIF (Variance Inflation Factor) if you plan a linear model. VIF > 5 (or >10) signals multicollinearity.
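The VIF for feature j is 1 / (1 - R²), where R² comes from regressing feature j on all the other features. A minimal sketch using only NumPy (`statsmodels.stats.outliers_influence.variance_inflation_factor` does the same job; the demo data below is invented):

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R^2) of regressing that column on the rest."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # include an intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Hypothetical demo: x2 is nearly a copy of x1, x3 is independent
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.05, size=500)   # near-duplicate
x3 = rng.normal(size=500)                    # independent
vifs = vif(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 1))  # x1 and x2 blow far past 5; x3 stays near 1
```

Note that VIF catches *multivariate* redundancy (one feature predicted by a combination of several others), which a pairwise correlation heatmap can miss entirely.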
Code snippet: correlation + mutual info (sklearn)
import numpy as np
from sklearn.feature_selection import mutual_info_regression, mutual_info_classif
# Pearson / Spearman over the numeric columns of the DataFrame
pearson = df.corr(method='pearson', numeric_only=True)
spearman = df.corr(method='spearman', numeric_only=True)
# Mutual information between each column of the feature matrix X and the target
mi_reg = mutual_info_regression(X, y_reg)  # continuous target
mi_clf = mutual_info_classif(X, y_clf)     # discrete target
Pitfalls (the ones you'll definitely hit on your first project)
- Spurious correlations: Two features correlate because of a confounder (time, seasonality) or pure chance. Scatter + domain sense = reality check.
- Leaked features: A variable that looks predictive only because it was generated after the target (or uses target info). You already know to avoid leakage — apply the same vigilance here.
- Ignoring nonlinearity: Pearson = 0 doesn’t mean no relation. Plot it.
- Outliers driving correlation: A single influential point can inflate r. Use robust stats or visualize.
- High-cardinality categorical features: Don’t attempt an all-pairs chi-square matrix with thousands of levels. Use target-encoding or hashing (remember feature hashing from earlier) before pairwise checks, or sample levels.
From insight to feature engineering
- If features are highly correlated and semantically redundant: combine them (sum, ratio, PCA) or drop the weaker one.
- If you see a nonlinear but consistent relationship: transform (log, Box-Cox) or add polynomial / spline terms.
- If two weak features together explain the target: add an interaction term (product, difference, ratio).
- If multicollinearity hurts interpretation (coefficients jumping all over): prefer regularization (Ridge/Lasso) or dimensionality reduction.
Example: You spot weight vs BMI vs height. Instead of keeping all three, keep the most interpretable one (BMI already encodes weight relative to height) or run PCA on the trio.
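A small sketch of turning that insight into columns, using synthetic height/weight data (the column names and distributions are invented): a ratio feature that collapses a correlated pair, and an interaction term built as a product.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame({
    'height_m': rng.normal(1.7, 0.1, size=300),
    'weight_kg': rng.normal(70, 12, size=300),
})

# Ratio feature: BMI folds the correlated pair into one interpretable number
df['bmi'] = df['weight_kg'] / df['height_m'] ** 2

# Interaction feature: the product of two features, in case they matter jointly
df['wh_interaction'] = df['weight_kg'] * df['height_m']

print(df[['height_m', 'weight_kg', 'bmi']].corr().round(2))
```

After adding engineered columns, rerun the same pairwise checks: a new feature that correlates at 0.99 with one of its parents hasn't earned its place.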
A couple of advanced moves (because you’re getting greedy)
- Partial correlation: Measures correlation between A and B controlling for C. Useful for teasing apart direct vs mediated relationships.
- Hierarchical clustering of features (correlation distance): Cluster features by 1-|r|, cut tree to pick representative features from each cluster.
- Mutual information with permutation importance: Check whether the pairwise signal genuinely adds predictive power by seeing how shuffling a feature hurts a model.
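The feature-clustering move can be sketched with SciPy: build a 1 - |r| distance matrix, run hierarchical clustering, and cut the tree so that redundant features land in the same cluster. The data below is synthetic, constructed so two pairs of features are near-duplicates and one is a loner.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(3)
base = rng.normal(size=(400, 2))
# Two redundant feature pairs plus one independent loner (synthetic)
X = np.column_stack([
    base[:, 0], base[:, 0] + rng.normal(scale=0.1, size=400),  # pair A
    base[:, 1], base[:, 1] + rng.normal(scale=0.1, size=400),  # pair B
    rng.normal(size=400),                                      # loner
])

corr = np.corrcoef(X, rowvar=False)
dist = 1 - np.abs(corr)          # correlation distance: 1 - |r|
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method='average')
labels = fcluster(Z, t=0.5, criterion='distance')
print(labels)  # features 0-1 share a label, 2-3 share another, 4 stands alone
```

From each cluster you then keep one representative (the most interpretable, or the one with the strongest target relationship) and drop the rest.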
Closing: the emotional arc of pairwise EDA
- Start curious. Plot generously. Conclude cautiously.
- Replace fear of correlation with healthy skepticism: visualize everything, compute the right statistic, and always ask if the relation makes domain sense.
Key takeaways:
- Use the right correlation metric for the data types and suspected shape of relationship.
- Visual inspection + summary statistics beats blind thresholds.
- Pairwise analysis guides feature pruning, combination, transformation, and model choice — but it’s only the beginning (higher-order interactions exist).
Final commandment: Do not let a shiny high correlation seduce you into adding leaked or redundant features. Resist the temptation, and your stakeholders will call you a wizard instead of a magician who pulled a rabbit out of the target.