
Python for Data Science, AI & Development

Data Cleaning and Feature Engineering

Prepare high-quality datasets with robust transformations and informative features while avoiding leakage.


Scaling and Normalization — Making Features Play Nicely Together

You already dealt with missing values (imputation) and smacked down the outliers. Nice. Now imagine you’ve invited features to a party: some show up wearing stilts, others are toddlers. Scaling is the bouncer who makes everyone appear the same height so they stop stealing the limelight.

"Scaling doesn't make features better — it makes them comparable."


Why scaling/normalization matters (quick reminder)

  • Many ML algorithms assume features are on similar scales: KNN, K-means, SVM, logistic regression with gradient descent, neural networks, PCA. If one feature ranges 0–1 and another 0–1,000,000, the latter dominates distances and gradients.
  • Tree-based models (random forest, XGBoost) are mostly scale-invariant — they split on thresholds, so scaling rarely changes performance.
  • Always scale after imputation and outlier handling, and remember: fit only on training data to avoid data leakage.

Common scalers and when to use them

1) Standardization (Z-score)

What: subtract mean and divide by standard deviation → mean≈0, std≈1.
Use when: features are roughly Gaussian or you want zero-centered data. Good for algorithms assuming normality or using dot-products (SVMs, logistic regression, neural nets).
sklearn: StandardScaler
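A minimal sketch with made-up numbers (a dollar-scale column next to a years-scale column) showing what StandardScaler does to each column:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One feature in dollars, one in years -- wildly different scales.
X = np.array([[50_000.0, 25.0],
              [60_000.0, 35.0],
              [70_000.0, 45.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Each column now has mean ~0 and (population) std ~1,
# so neither feature dominates distances or gradients.
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
```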

2) Min–Max Scaling (Normalization)

What: scales to [0,1] (or other range) using (x - min) / (max - min).
Use when: you need bounded features (e.g., image pixels), or algorithms that require positive inputs. Be careful with outliers — they compress the remaining data.
sklearn: MinMaxScaler
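The outlier caveat is easy to demonstrate with a toy column: one extreme value claims the top of the [0, 1] range and squashes everything else toward 0.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0], [2.0], [3.0], [100.0]])  # one outlier

X_scaled = MinMaxScaler().fit_transform(X)

# The outlier maps to 1.0; the three inliers are compressed
# into a tiny sliver near 0 (roughly 0.00, 0.01, 0.02).
print(X_scaled.ravel())
```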

3) Robust Scaling

What: subtract median and divide by IQR (interquartile range).
Use when: your data contains outliers you already detected but prefer a method resilient to them. Great if outliers weren't removed but you don't want them to dominate.
sklearn: RobustScaler
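A quick side-by-side with invented data: one extreme value inflates StandardScaler's std and flattens the inliers, while RobustScaler (median and IQR) keeps their spread sensible.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])  # one extreme outlier

robust = RobustScaler().fit_transform(X)      # (x - median) / IQR
standard = StandardScaler().fit_transform(X)  # (x - mean) / std

# RobustScaler: median (3.0) maps to 0, inliers span about [-1, 0.5].
# StandardScaler: the huge std squashes all four inliers together.
print(robust.ravel())
print(standard.ravel())
```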

4) Max-Abs Scaling

What: divides by maximum absolute value, scales to [-1,1]. Preserves sparsity.
Use when: data is sparse (e.g., TF-IDF). Useful for linear models working with sparse matrices.
sklearn: MaxAbsScaler
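A small sparse example (toy counts, not real TF-IDF): MaxAbsScaler divides each column by its max absolute value and never shifts the data, so zeros stay zero and the matrix stays sparse.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.preprocessing import MaxAbsScaler

X = csr_matrix([[0.0, 2.0, 0.0],
                [4.0, 0.0, 1.0]])

X_scaled = MaxAbsScaler().fit_transform(X)

# Output is still a sparse matrix with the same sparsity pattern;
# each column's largest |value| is now 1.0.
print(X_scaled.toarray())
```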

5) Unit Vector Scaling (Normalizer)

What: scales each sample (row) to unit norm (L1 or L2). Not feature-wise; it operates on rows.
Use when: you care about direction in feature space (cosine similarity), e.g., text embeddings.
sklearn: Normalizer
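To see the row-wise (not column-wise) behavior, consider two made-up "documents" with identical word proportions but different lengths; after L2 normalization they become the same unit vector, which is exactly what cosine similarity cares about.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Same direction, different magnitude.
X = np.array([[1.0, 2.0, 2.0],
              [10.0, 20.0, 20.0]])

X_unit = Normalizer(norm='l2').fit_transform(X)

# Each row now has L2 norm 1, and both rows are identical.
print(X_unit)
```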

6) Power & Quantile Transforms

PowerTransformer (Box-Cox or Yeo-Johnson): makes distributions more Gaussian. Box-Cox requires strictly positive values; Yeo-Johnson also handles zeros and negatives. Useful before scaling if features are skewed.
QuantileTransformer: maps data to a uniform or normal distribution using ranks; robust to outliers and nonlinear.
Use when: you need to stabilize variance or make feature distributions approximately Gaussian (helpful for models sensitive to normality).
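A sketch on synthetic data: lognormal samples are heavily right-skewed, and a Yeo-Johnson PowerTransformer (which also standardizes its output by default) pulls them toward a symmetric, roughly Gaussian shape.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))  # right-skewed

pt = PowerTransformer(method='yeo-johnson')  # standardize=True by default
X_t = pt.fit_transform(X)

# The transformed data has mean ~0, std ~1, and far less skew
# than the raw lognormal samples.
print(X_t.mean(), X_t.std())
```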


Quick analogies (so it sticks)

  • Min–Max: "Squeeze into the same onesie." Everyone ends at the same endpoints.
  • StandardScaler: "Center on the stage and adjust volume." Mean zero, comparable amplitude.
  • RobustScaler: "Ignore the loudest friend and talk about the group median." Outlier-resistant.
  • Normalizer: "Normalize each sentence to its direction — we care about wording pattern, not length."

Practical: Implementing scalers with sklearn and pandas

Assume you have training data X_train/y_train where the numeric columns (numeric_cols) and categorical columns (cat_cols) have already been imputed and cleaned.

from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

numeric_cols = ['age', 'income', 'num_items']
cat_cols = ['country', 'device']

numeric_pipeline = Pipeline([
    ('scaler', StandardScaler())
])

preprocessor = ColumnTransformer([
    ('num', numeric_pipeline, numeric_cols),
    ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols)
])

# Then wrap with model pipeline
from sklearn.linear_model import LogisticRegression
model = Pipeline([
    ('preproc', preprocessor),
    ('clf', LogisticRegression())
])

model.fit(X_train, y_train)

Key points:

  • Fit scalers only on training data (Pipeline/ColumnTransformer ensures that during cross-validation).
  • Use appropriate scaler for sparse data (MaxAbsScaler or avoid dense conversions).

Pitfalls & best practices

  1. Data leakage: never fit a scaler on the full dataset. Always fit on train and transform test/validation.
  2. Order matters: impute missing values and handle outliers before scaling. For example, if you replace missing with median, do that first, then scale.
  3. Polynomials & interaction terms: if you create polynomial features, scale after generating them (or use a pipeline step to do polynomial feature generation then scaling). Otherwise, feature magnitudes explode.
  4. Interpretability: scaling changes feature units. Keep track (save your scaler) and use inverse_transform for interpreting coefficients or predictions.
  5. Sparse features: many scalers convert data to dense arrays. Use MaxAbsScaler or sparse-aware implementations when working with large sparse matrices.
  6. Tree-based models: usually don't require scaling, but if you plan to use the same pipeline for many models (some requiring scaling), include it conditionally.
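Point 4 above in code: keep the fitted scaler around and use inverse_transform to map scaled values back to original units (toy numbers for illustration).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[10.0], [20.0], [30.0]])

scaler = StandardScaler().fit(X)   # save this object (e.g., with joblib)
X_scaled = scaler.transform(X)

# Round-trip back to the original units for interpretation.
X_back = scaler.inverse_transform(X_scaled)
print(X_back.ravel())  # original values recovered
```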

Example: Avoiding leakage in cross-validation

Bad: fit scaler on entire dataset, then CV → inflated performance.

Good: include scaler inside Pipeline. Scikit-learn's cross_val_score will call fit on the training fold only, preventing leakage.

from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

pipeline = Pipeline([('scaler', StandardScaler()), ('clf', KNeighborsClassifier())])
scores = cross_val_score(pipeline, X, y, cv=5)  # scaler is refit on each training fold only

When scaling won't help much

  • Decision trees and ensembles of trees (unless you combine with algorithms that care about scale).
  • When a feature's absolute scale is semantically meaningful and you want to preserve it (and your model doesn't require scaling).

Quick decision cheat-sheet

  • Skewed numeric features → consider PowerTransformer or log transform then StandardScaler.
  • Outliers present → RobustScaler.
  • Sparse features → MaxAbsScaler.
  • Need bounded 0–1 → MinMaxScaler (watch out for outliers).
  • Row-wise similarity (cosine) → Normalizer.

Closing: Summary & takeaways

  • Scaling makes feature magnitudes comparable; it's essential for distance-based and gradient-based algorithms.
  • Fit scalers only on training data and include them inside Pipelines to prevent leakage.
  • Choose the scaler to match your data: use RobustScaler for outliers, StandardScaler for roughly normal data, MinMax for bounded ranges, MaxAbs for sparse data, and Power/Quantile transforms for heavy skew.
  • Keep a saved scaler for inverse_transform so your results remain interpretable.

"Scale wisely. Fit on training. Transform everywhere else. Then go build something that actually amazes people."


Suggested next steps in this course

  • Apply StandardScaler vs RobustScaler to a dataset you cleaned earlier (after imputation & outlier steps) and compare model performance for KNN and RandomForest.
  • Practice building ColumnTransformer pipelines combining numeric scalers and OneHotEncoder for categorical features.