Data Visualization and Storytelling

Explore and communicate insights with clear, accessible visuals using Matplotlib, Seaborn, and Plotly.

Matplotlib Essentials for Data Visualization in Python

Matplotlib Essentials — Make Your Clean Data Look Brilliant

"This is the moment where the concept finally clicks."

You already learned how to clean data and engineer features without leaking the future into your models. You also studied visualization principles. Now it’s time to use Matplotlib — the Swiss Army knife of plotting in Python — to turn those high-quality datasets into clear, honest, and compelling visual stories.

Why Matplotlib? (Even if you love Seaborn and Plotly)

Matplotlib is the foundation. Libraries like Seaborn build on it. Learn the core and you can customize anything.
Fine-grained control. Want an off-grid, hand-drawn feel or a publication-ready figure? Matplotlib does both.
Great for reproducible reports and static images (PNG, SVG, PDF).

Think of Matplotlib like learning to ride a bike with manual gears before using an e-bike. Once you know it, the fancy tools feel like icing on a very stable cake.

Quick import & first plot

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.figure(figsize=(8,4))
plt.plot(x, y, label='sin(x)', color='tab:blue')
plt.title('Sine Wave')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.legend()
plt.grid(alpha=0.3)
plt.show()

Micro explanation

plt.figure(figsize=(w,h)) sets canvas size (in inches).
label + plt.legend() → essential for multi-line clarity.
grid(alpha=...) softens the grid lines so they help, not hog the stage.
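
The same calls are also available as methods on Figure and Axes objects, which is the form the later examples in this lesson use. A minimal sketch of that object-oriented style; the second cosine line is an addition for illustration, included only to show why labels and a legend matter once there is more than one line:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# plt.subplots returns the Figure (canvas) and one Axes (plot area).
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, np.sin(x), label='sin(x)', color='tab:blue')
ax.plot(x, np.cos(x), label='cos(x)', color='tab:orange')
ax.set_title('Sine and Cosine')
ax.set_xlabel('x')
ax.set_ylabel('value')
ax.legend()          # built from the label= strings passed to plot()
ax.grid(alpha=0.3)   # faint grid lines
```

Call plt.show() afterwards to display the figure in a script.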

Core plot types and when to use them

Line plot — time series, trends.
Scatter — relationship between two continuous vars (great after PCA).
Bar — categorical comparisons (remember sorting!).
Histogram / KDE — distribution shapes (useful after feature engineering to inspect transformed variables).
Boxplot / Violin — distribution + outliers by group.
Heatmap — correlation matrices and confusion matrices.
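
As a small illustration of the "remember sorting" note on bar charts, here is a sketch that orders categories by value before plotting. The category names and counts are made up for the example:

```python
import matplotlib.pyplot as plt

# Hypothetical category counts, invented for illustration.
counts = {'pear': 12, 'apple': 30, 'kiwi': 7, 'plum': 19}

# Sort by value so the ranking is readable at a glance.
ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
names = [name for name, _ in ordered]
values = [value for _, value in ordered]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(names, values, color='tab:blue')
ax.set_xlabel('Fruit')
ax.set_ylabel('Count')
ax.set_title('Counts by category (sorted)')
```

Sorting is a presentation choice: for categories with a natural order (months, age bands) keeping that order is usually clearer.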

Example: correlation heatmap (linking back to Multicollinearity & Correlation):

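import matplotlib.pyplot as plt
import seaborn as sns

# Assumes df is a pandas DataFrame and numeric_cols lists its numeric columns.
corr = df[numeric_cols].corr()
plt.figure(figsize=(10,8))
# vmin/vmax pin the color scale to the full correlation range.
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1, center=0)
plt.title('Feature Correlation (Watch for multicollinearity)')
plt.show()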

Tip: if you saw strong correlation earlier in feature engineering, highlight it in visuals — it justifies dimensionality reduction like PCA.
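
If you want the same kind of heatmap with Matplotlib alone, one possible approach is imshow plus text labels. This is a sketch on synthetic data: the feature names f0, f1, f2 and the random values are assumptions for illustration, not part of any real dataset.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: column 1 is built from column 0, so the two correlate strongly.
a = rng.normal(size=200)
data = np.column_stack([a, 2 * a + rng.normal(scale=0.5, size=200), rng.normal(size=200)])
names = ['f0', 'f1', 'f2']  # hypothetical feature names

corr = np.corrcoef(data, rowvar=False)  # 3x3 correlation matrix

fig, ax = plt.subplots(figsize=(5, 4))
# Diverging colormap with symmetric limits, so 0 sits at the neutral color.
im = ax.imshow(corr, cmap='coolwarm', vmin=-1, vmax=1)
ax.set_xticks(range(len(names)))
ax.set_xticklabels(names)
ax.set_yticks(range(len(names)))
ax.set_yticklabels(names)
for i in range(len(names)):
    for j in range(len(names)):
        ax.text(j, i, f'{corr[i, j]:.2f}', ha='center', va='center')
fig.colorbar(im, ax=ax, label='Pearson r')
ax.set_title('Feature correlation')
```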

Subplots, grids, and composition

When you need multiple plots on one canvas:

fig, axes = plt.subplots(2, 2, figsize=(12,8))
axes[0,0].plot(...)
axes[0,1].scatter(...)
# ...
plt.tight_layout()
plt.show()

Use plt.tight_layout() to prevent overlapping labels.
For complex layouts, explore GridSpec.
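
A runnable sketch of both approaches, using synthetic data. The first figure is a regular 2x2 grid; the second uses GridSpec for an uneven layout. Which plot goes in which panel is an arbitrary choice made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.gridspec import GridSpec

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
samples = rng.normal(size=300)  # synthetic data for illustration

# Regular 2x2 grid: axes is a 2D NumPy array of Axes objects.
fig, axes = plt.subplots(2, 2, figsize=(10, 7))
axes[0, 0].plot(x, np.sin(x))
axes[0, 0].set_title('Line')
axes[0, 1].scatter(x, np.cos(x), s=15)
axes[0, 1].set_title('Scatter')
axes[1, 0].hist(samples, bins=20)
axes[1, 0].set_title('Histogram')
axes[1, 1].boxplot(samples)
axes[1, 1].set_title('Boxplot')
fig.tight_layout()

# GridSpec: one wide panel on top, two narrower panels below.
fig2 = plt.figure(figsize=(10, 6))
gs = GridSpec(2, 2, figure=fig2)
ax_top = fig2.add_subplot(gs[0, :])   # spans both columns
ax_left = fig2.add_subplot(gs[1, 0])
ax_right = fig2.add_subplot(gs[1, 1])
ax_top.plot(x, np.sin(x))
ax_left.hist(samples, bins=20)
ax_right.scatter(x, np.cos(x), s=15)
fig2.tight_layout()
```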

Styling: make it readable (not flashy)

Use clear labels and units. xlabel('Weight (kg)') beats xlabel('wt').
Avoid chartjunk. Keep grids light and avoid unnecessary 3D.
Color wisely. Use perceptually uniform colormaps (e.g., viridis) for quantitative data.
Fonts & sizes. Use larger fonts for presentation; smaller, precise ones for papers.

# Matplotlib 3.6 renamed the seaborn styles; older releases use 'seaborn-whitegrid'.
plt.style.use('seaborn-v0_8-whitegrid')

Pro tip: Set a style and stick to it for consistency across a report.
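
Because style names vary between Matplotlib versions, it can help to check plt.style.available before applying one. The sketch below does that and applies the style only inside a with-block; the fallback to 'default' and the plotted numbers are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Style names differ between Matplotlib versions, so check before using one.
preferred = 'seaborn-v0_8-whitegrid'
style = preferred if preferred in plt.style.available else 'default'

x = np.linspace(0, 5, 50)
# style.context applies the style only inside the with-block.
with plt.style.context(style):
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(x, 70 + 2 * x, color='tab:blue')  # made-up numbers
    ax.set_xlabel('Time (weeks)', fontsize=12)
    ax.set_ylabel('Weight (kg)', fontsize=12)
    ax.set_title('Labels with units', fontsize=14)
```

For a whole report, calling plt.style.use(style) once at the top of the script has the same effect globally.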

Practical: Plot PCA results (building from Dimensionality Reduction)

Imagine you ran PCA on a cleaned dataset and want to show clusters in 2D.

# Assumes X_pca is a NumPy array of shape (n_samples, 2) holding PC1 and PC2,
# and y is an array of integer class labels.
fig, ax = plt.subplots(figsize=(8,6))
scatter = ax.scatter(X_pca[:,0], X_pca[:,1], c=y, cmap='tab10', s=40, alpha=0.8)
ax.set_xlabel('PC1')
ax.set_ylabel('PC2')
ax.set_title('PCA: PC1 vs PC2')
# legend_elements() builds one legend entry per unique class label.
ax.legend(*scatter.legend_elements(), title='Classes')
ax.grid(alpha=0.2)
plt.show()

This ties together feature engineering, dimensionality reduction, and visualization — showing how engineered features and PCA can reveal structure that a model can exploit.
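
To try this end to end without a prepared dataset, here is a self-contained sketch. It is built on assumptions: the two-class data is synthetic, and PCA is computed with a plain NumPy SVD rather than with whatever library you used in the dimensionality reduction lesson.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
# Synthetic data: two classes in 5 dimensions, shifted apart (illustration only).
X = np.vstack([rng.normal(0, 1, size=(50, 5)), rng.normal(3, 1, size=(50, 5))])
y = np.array([0] * 50 + [1] * 50)

# PCA via SVD of the centered data; rows of Vt are the principal directions.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_pca = Xc @ Vt[:2].T                    # shape (100, 2)
explained = (S ** 2) / np.sum(S ** 2)    # variance ratio per component

fig, ax = plt.subplots(figsize=(8, 6))
scatter = ax.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='tab10', s=40, alpha=0.8)
ax.set_xlabel(f'PC1 ({explained[0]:.0%} of variance)')
ax.set_ylabel(f'PC2 ({explained[1]:.0%} of variance)')
ax.set_title('PCA: PC1 vs PC2')
ax.legend(*scatter.legend_elements(), title='Classes')
ax.grid(alpha=0.2)
```

Putting the explained variance in the axis labels tells the reader how much of the original spread the 2D picture actually shows.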

Annotations & emphasis — tell a story

Annotations help point out the interesting stuff. Example:

# Assumes ax is an existing Axes and (x_out, y_out) is the point to highlight.
ax.annotate('Outlier', xy=(x_out, y_out), xytext=(x_out+1, y_out+1),
            arrowprops=dict(arrowstyle='->', color='black'))

Use annotations sparingly to guide the reader’s eye to the insight, not to distract.
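
A complete sketch of the idea, assuming synthetic measurements with one planted outlier. The data, the outlier position, and the text offset are all invented for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
x = np.arange(30)
y = rng.normal(10, 1, size=30)  # synthetic measurements
y[17] = 20                      # plant one obvious outlier

i_out = int(np.argmax(y))       # locate the point to highlight
x_out, y_out = x[i_out], y[i_out]

fig, ax = plt.subplots(figsize=(7, 4))
ax.scatter(x, y, s=25)
note = ax.annotate(
    'Outlier',
    xy=(x_out, y_out),              # the point the arrow points to
    xytext=(x_out + 3, y_out - 2),  # where the text is placed
    arrowprops=dict(arrowstyle='->', color='black'),
)
ax.set_xlabel('Sample index')
ax.set_ylabel('Value')
```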

Save figures correctly

plt.savefig('figure.png', dpi=300, bbox_inches='tight')

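dpi=300 gives print-quality raster output, and bbox_inches='tight' trims the margins so labels are not clipped. Call savefig before plt.show(): in a script, the figure can already be closed once show() returns, which leaves you with a blank file.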
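
A short sketch that writes one figure in the three static formats mentioned earlier (PNG, SVG, PDF). The temporary output folder and the file name sine are placeholders chosen for the example.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, np.sin(x))
ax.set_xlabel('x')
ax.set_ylabel('sin(x)')

out_dir = tempfile.mkdtemp()  # throwaway folder; use your report folder instead
saved = []
for ext in ('png', 'svg', 'pdf'):
    path = os.path.join(out_dir, f'sine.{ext}')
    # dpi mainly affects raster output such as PNG; SVG and PDF are vector formats.
    fig.savefig(path, dpi=300, bbox_inches='tight')
    saved.append(path)
plt.close(fig)  # release the figure once it is on disk
```

Matplotlib picks the output format from the file extension, so no extra argument is needed.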

Avoid misleading visuals (ethical plotting)

Never truncate axes to exaggerate effects unless explicitly justified and labeled.
Use appropriate scales (log when data spans orders of magnitude).
Keep an equal aspect ratio in spatial plots (ax.set_aspect('equal')) to avoid skewing perception.

Remember: a misleading plot is like bad seasoning — it ruins trust.
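
The three guidelines above can be sketched in a few lines. All numbers here are made up; the point is only to show which Axes methods are involved.

```python
import matplotlib.pyplot as plt
import numpy as np

values = np.array([3, 40, 500, 6000, 70000])  # made-up values spanning several orders
x = np.arange(len(values))

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))
ax_lin.plot(x, values, marker='o')
ax_lin.set_title('Linear scale: small values vanish')
ax_log.plot(x, values, marker='o')
ax_log.set_yscale('log')
ax_log.set_title('Log scale (stated in the title)')

# Bars encode value by height, so start the axis at zero.
fig2, ax_bar = plt.subplots(figsize=(5, 4))
ax_bar.bar(['A', 'B'], [98, 100])
ax_bar.set_ylim(bottom=0)

# Spatial data: equal aspect keeps one x unit as long as one y unit.
fig3, ax_map = plt.subplots(figsize=(4, 4))
ax_map.plot([0, 1, 1, 0, 0], [0, 0, 1, 1, 0])
ax_map.set_aspect('equal')
```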

Debugging & reproducibility

Fix random seeds for jittered/animated plots when reproducing.
Use plt.close(fig) in loops to free memory; pyplot keeps every figure alive until it is closed.
Save raw numeric outputs (CSV/JSON) along with images for auditability.
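
One way these three habits might fit together in a batch script is sketched below. The helper jitter_plot, its file names, and the temporary output folder are hypothetical, introduced only for this example.

```python
import csv
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend, a common choice for batch scripts
import matplotlib.pyplot as plt
import numpy as np

def jitter_plot(seed, out_dir):
    """Draw jittered points; return (image path, csv path, jitter values)."""
    rng = np.random.default_rng(seed)  # seeded generator: same jitter every run
    values = np.arange(10, dtype=float)
    jitter = rng.uniform(-0.2, 0.2, size=values.size)

    fig, ax = plt.subplots(figsize=(5, 3))
    ax.scatter(jitter, values, s=20)
    img_path = os.path.join(out_dir, f'jitter_{seed}.png')
    fig.savefig(img_path, dpi=150, bbox_inches='tight')
    plt.close(fig)  # free the figure's memory before the next iteration

    # Store the plotted numbers next to the image so the figure can be audited.
    csv_path = os.path.join(out_dir, f'jitter_{seed}.csv')
    with open(csv_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['jitter', 'value'])
        writer.writerows(zip(jitter, values))
    return img_path, csv_path, jitter

out_dir = tempfile.mkdtemp()  # throwaway folder for the sketch
results = [jitter_plot(seed, out_dir) for seed in (0, 1, 2)]
```

Rerunning jitter_plot with the same seed reproduces the same point positions, which is what makes a regenerated figure comparable to the original.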

Quick checklist before you show a plot

Is the question clear? (What story does this plot answer?)
Are axes labeled with units?
Is the legend readable and necessary?
Does the color scale match data type (categorical vs continuous)?
Have you referenced earlier data cleaning or feature transformations that affect interpretation?

Key takeaways

Matplotlib is powerful: learn it to control the narrative of your plots.
Make visuals honest: labels, scales, and colormaps matter as much as markers and lines.
Connect to prior steps: show how cleaning, correlation checks, and PCA affect what you visualize.
Style consistently for professional reports.

Final memorable insight: Good visualizations are arguments made visible — support them with clean data and clear choices.

Ready to practice? Try recreating a figure from a paper using your cleaned dataset and Matplotlib styles. It's the best way to internalize what keeps a plot truthful and persuasive.
