Data Visualization and Storytelling
Explore and communicate insights with clear, accessible visuals using Matplotlib, Seaborn, and Plotly.
Matplotlib Essentials — Make Your Clean Data Look Brilliant
"This is the moment where the concept finally clicks."
You already learned how to clean data and engineer features without leaking the future into your models. You also studied visualization principles. Now it’s time to use Matplotlib — the Swiss Army knife of plotting in Python — to turn those high-quality datasets into clear, honest, and compelling visual stories.
Why Matplotlib? (Even if you love Seaborn and Plotly)
- Matplotlib is the foundation. Libraries like Seaborn build on it. Learn the core and you can customize anything.
- Fine-grained control. Want an off-grid, hand-drawn feel or a publication-ready figure? Matplotlib does both.
- Great for reproducible reports and static images (PNG, SVG, PDF).
Think of Matplotlib like learning to ride a bike with manual gears before using an e-bike. Once you know how it works, the higher-level tools feel like a convenience rather than a mystery.
Quick import & first plot
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.figure(figsize=(8,4))
plt.plot(x, y, label='sin(x)', color='tab:blue')
plt.title('Sine Wave')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.legend()
plt.grid(alpha=0.3)
plt.show()
Micro explanation
- plt.figure(figsize=(w,h)) sets canvas size (in inches).
- label + plt.legend() → essential for multi-line clarity.
- grid(alpha=...) softens the grid lines so they help, not hog the stage.
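The same plot can also be written with Matplotlib's object-oriented interface, which the later examples in this lesson rely on (fig, ax). The following is a minimal sketch of that style; the second curve, cos(x), is added here only to show why a legend matters with more than one line.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# plt.subplots() returns the figure (canvas) and one Axes (plot area).
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, np.sin(x), label="sin(x)", color="tab:blue")
ax.plot(x, np.cos(x), label="cos(x)", color="tab:orange")

# Axes methods mirror the pyplot calls: plt.title becomes ax.set_title, and so on.
ax.set_title("Sine and Cosine")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.legend()
ax.grid(alpha=0.3)
```

Call plt.show() afterwards to display the figure in an interactive session.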
Core plot types and when to use them
- Line plot — time series, trends.
- Scatter — relationship between two continuous vars (great after PCA).
- Bar — categorical comparisons (remember sorting!).
- Histogram / KDE — distribution shapes (useful after feature engineering to inspect transformed variables).
- Boxplot / Violin — distribution + outliers by group.
- Heatmap — correlation matrices and confusion matrices.
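Two of the plot types above can be sketched side by side: a bar chart sorted by value, and a histogram of a transformed feature. The category totals and the skewed feature below are made up for illustration; substitute your own columns.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical category totals, made up for illustration.
categories = ["A", "B", "C", "D"]
totals = np.array([23, 45, 12, 37])
order = np.argsort(totals)[::-1]  # indices from largest to smallest
sorted_labels = [categories[i] for i in order]

# Hypothetical right-skewed feature, as you might see before a log transform.
feature = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

ax_bar.bar(sorted_labels, totals[order], color="tab:blue")
ax_bar.set_title("Totals by category (sorted)")
ax_bar.set_ylabel("Total")

ax_hist.hist(np.log(feature), bins=30, color="tab:green", alpha=0.8)
ax_hist.set_title("log(feature) distribution")
ax_hist.set_xlabel("log(feature)")
ax_hist.set_ylabel("Count")

fig.tight_layout()
```

Sorting the bars makes the ranking readable at a glance, and the histogram is a quick check that the transform produced a roughly symmetric shape.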
Example: correlation heatmap (linking back to Multicollinearity & Correlation):
import seaborn as sns
# Assumes df is a pandas DataFrame and numeric_cols lists its numeric column names
corr = df[numeric_cols].corr()
plt.figure(figsize=(10,8))
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', center=0)
plt.title('Feature Correlation (Watch for multicollinearity)')
plt.show()
Tip: if you saw strong correlation earlier in feature engineering, highlight it in visuals. It helps motivate dimensionality reduction such as PCA.
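If Seaborn is not available, a similar heatmap can be drawn with Matplotlib alone. This is a rough sketch on synthetic data in which variable "b" is deliberately built to track "a"; the variable names and the 0.8 threshold are assumptions chosen for illustration, not a rule.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for df[numeric_cols]: "b" is built to track "a" closely.
n = 500
a = rng.normal(size=n)
b = a + rng.normal(scale=0.1, size=n)
c = rng.normal(size=n)
names = ["a", "b", "c"]
corr = np.corrcoef(np.vstack([a, b, c]))  # each row is one variable

fig, ax = plt.subplots(figsize=(5, 4))
# Symmetric limits keep the diverging colormap centered at 0.
im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(names)))
ax.set_xticklabels(names)
ax.set_yticks(range(len(names)))
ax.set_yticklabels(names)
fig.colorbar(im, ax=ax, label="Pearson r")

# Write each coefficient into its cell, similar to annot=True in Seaborn.
for i in range(len(names)):
    for j in range(len(names)):
        ax.text(j, i, f"{corr[i, j]:.2f}", ha="center", va="center")

# Collect pairs above an illustrative threshold (0.8 is an assumption).
threshold = 0.8
strong_pairs = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > threshold
]
ax.set_title("Feature correlation")
```

The strong_pairs list gives you the pairs worth calling out in the figure caption or in an annotation.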
Subplots, grids, and composition
When you need multiple plots on one canvas:
fig, axes = plt.subplots(2, 2, figsize=(12,8))
axes[0,0].plot(...)
axes[0,1].scatter(...)
# ...
plt.tight_layout()
plt.show()
- Use plt.tight_layout() to prevent overlapping labels.
- For complex layouts, explore GridSpec.
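The skeleton above leaves the plot calls elided. One possible filled-in version, using synthetic data and four arbitrary plot types chosen only for illustration, might look like this:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
values = rng.normal(loc=50, scale=10, size=500)  # synthetic data for illustration

fig, axes = plt.subplots(2, 2, figsize=(12, 8))  # axes is a 2x2 array of Axes

axes[0, 0].plot(x, np.sin(x))
axes[0, 0].set_title("Line")

axes[0, 1].scatter(values[:100], values[100:200], s=15, alpha=0.7)
axes[0, 1].set_title("Scatter")

axes[1, 0].hist(values, bins=25)
axes[1, 0].set_title("Histogram")

axes[1, 1].boxplot([values[:250], values[250:]])
axes[1, 1].set_xticks([1, 2])  # boxplot places the boxes at positions 1 and 2
axes[1, 1].set_xticklabels(["first half", "second half"])
axes[1, 1].set_title("Boxplot")

fig.tight_layout()
```

Each panel is addressed by its row and column index, so titles and labels are set per Axes rather than through the global pyplot state.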
Styling: make it readable (not flashy)
- Use clear labels and units. xlabel('Weight (kg)') beats xlabel('wt').
- Avoid chartjunk. Keep grids light and avoid unnecessary 3D.
- Color wisely. Use perceptually uniform colormaps (e.g., viridis) for quantitative data.
- Fonts & sizes. Use larger fonts for presentation; smaller, precise ones for papers.
# 'seaborn-whitegrid' was renamed in Matplotlib 3.6; newer versions expect this name
plt.style.use('seaborn-v0_8-whitegrid')
Pro tip: Set a style and stick to it for consistency across a report.
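Style names have changed between Matplotlib releases, so a report script can check which ones are installed before picking one. The sketch below falls back to the built-in "default" style when neither Seaborn-like name is found; the preference order is an assumption for illustration.

```python
import matplotlib.pyplot as plt

# Style names differ across Matplotlib versions, so check before using one.
preferred = ["seaborn-v0_8-whitegrid", "seaborn-whitegrid"]
style_name = next((s for s in preferred if s in plt.style.available), "default")

# A context manager applies the style only inside the block,
# leaving the global settings untouched.
with plt.style.context(style_name):
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
    ax.set_xlabel("x")
    ax.set_ylabel("x squared")
```

For a whole report, calling plt.style.use(style_name) once at the top has the same effect for every figure that follows.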
Practical: Plot PCA results (building from Dimensionality Reduction)
Imagine you ran PCA on a cleaned dataset and want to show clusters in 2D.
# Assume X_pca is a NumPy array of shape (n_samples, 2) and y holds integer class labels
fig, ax = plt.subplots(figsize=(8,6))
scatter = ax.scatter(X_pca[:,0], X_pca[:,1], c=y, cmap='tab10', s=40, alpha=0.8)
ax.set_xlabel('PC1')
ax.set_ylabel('PC2')
ax.set_title('PCA: PC1 vs PC2')
# One legend entry per class; ax.add_artist() is only needed when adding a second legend
ax.legend(*scatter.legend_elements(), title='Classes')
ax.grid(alpha=0.2)
plt.show()
This ties together feature engineering, dimensionality reduction, and visualization — showing how engineered features and PCA can reveal structure that a model can exploit.
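The snippet above assumes X_pca and y already exist. For experimenting, one way to produce them is sketched below on synthetic three-class data, computing PCA with NumPy's SVD rather than any particular library's PCA class. The data, class centers, and seed are all made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(7)

# Synthetic data: three classes in 5 dimensions with shifted centers.
n_per_class = 50
centers = np.array(
    [[0, 0, 0, 0, 0], [4, 4, 0, 0, 0], [0, 4, 4, 0, 0]], dtype=float
)
X = np.vstack([c + rng.normal(size=(n_per_class, 5)) for c in centers])
y = np.repeat(np.arange(3), n_per_class)

# PCA via SVD: center the data, then project onto the top two
# right singular vectors (the principal directions).
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_pca = X_centered @ Vt[:2].T
explained = (S**2) / np.sum(S**2)  # fraction of variance per component

fig, ax = plt.subplots(figsize=(8, 6))
scatter = ax.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap="tab10", s=40, alpha=0.8)
ax.set_xlabel(f"PC1 ({explained[0]:.0%} of variance)")
ax.set_ylabel(f"PC2 ({explained[1]:.0%} of variance)")
ax.set_title("PCA: PC1 vs PC2")
ax.legend(*scatter.legend_elements(), title="Classes")
ax.grid(alpha=0.2)
```

Putting the explained variance in the axis labels tells the reader how much of the original structure the 2D view actually retains.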
Annotations & emphasis — tell a story
Annotations help point out the interesting stuff. Example:
# x_out, y_out are the coordinates of the point to highlight
ax.annotate('Outlier', xy=(x_out, y_out), xytext=(x_out+1, y_out+1),
            arrowprops=dict(arrowstyle='->', color='black'))
Use annotations sparingly to guide the reader’s eye to the insight, not to distract.
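A self-contained version of the annotation call might look like the following sketch. It plants one artificial outlier in synthetic data and locates it as the point farthest from the known trend line y = 2x; in real data you would choose the point to annotate by whatever criterion fits your analysis.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)

# Plant one obvious outlier so there is something to point at.
x = np.append(x, 2.5)
y = np.append(y, -6.0)

# Find the point farthest from the known trend line y = 2x.
idx = int(np.argmax(np.abs(y - 2 * x)))
x_out, y_out = x[idx], y[idx]

fig, ax = plt.subplots(figsize=(7, 5))
ax.scatter(x, y, s=20, alpha=0.7)
ax.annotate(
    "Outlier",
    xy=(x_out, y_out),              # the point being annotated
    xytext=(x_out - 2, y_out - 1),  # where the label text sits
    arrowprops=dict(arrowstyle="->", color="black"),
)
ax.set_xlabel("x")
ax.set_ylabel("y")
```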
Save figures correctly
plt.savefig('figure.png', dpi=300, bbox_inches='tight')
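- dpi=300 gives print-quality raster output. bbox_inches='tight' avoids clipped labels. Call savefig() before plt.show(), since the figure may be empty once the window has been closed.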
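To export the same figure in the three static formats mentioned earlier (PNG, SVG, PDF), a loop over file extensions is enough, because savefig() infers the format from the file name. The file names and the temporary output directory below are placeholders for this sketch.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, np.sin(x))
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")

# A temporary directory keeps this sketch from cluttering your project folder.
out_dir = Path(tempfile.mkdtemp())
saved = []
for ext in ("png", "svg", "pdf"):
    path = out_dir / f"figure.{ext}"
    # dpi matters mainly for raster output such as PNG; SVG and PDF are vector formats.
    fig.savefig(path, dpi=300, bbox_inches="tight")
    saved.append(path)

plt.close(fig)
```

Vector formats scale cleanly in papers and slides, while PNG is the safer choice for web pages and notebooks.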
Avoid misleading visuals (ethical plotting)
- Never truncate axes to exaggerate effects unless explicitly justified and labeled.
- Use appropriate scales (log when data spans orders of magnitude).
- Keep an equal aspect ratio in spatial plots (for example, ax.set_aspect('equal')) so distances are not distorted.
Remember: a misleading plot is like bad seasoning — it ruins trust.
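The effect of axis choices is easiest to see side by side. The sketch below uses two hypothetical accuracy scores to contrast a truncated y-axis with a zero baseline, and adds a third panel showing a log scale for values that span several orders of magnitude.

```python
import matplotlib.pyplot as plt
import numpy as np

labels = ["Model A", "Model B"]
scores = [0.91, 0.93]  # hypothetical accuracy values

fig, (ax_trunc, ax_full, ax_log) = plt.subplots(1, 3, figsize=(13, 4))

# Truncated axis: a 0.02 gap looks enormous.
ax_trunc.bar(labels, scores)
ax_trunc.set_ylim(0.90, 0.94)
ax_trunc.set_title("Truncated y-axis (exaggerates)")

# Zero baseline: bar heights are proportional to the values.
ax_full.bar(labels, scores)
ax_full.set_ylim(0, 1)
ax_full.set_title("Zero baseline (proportional)")

# Log scale for data spanning several orders of magnitude.
x = np.arange(1, 6)
ax_log.plot(x, 10.0 ** x, marker="o")
ax_log.set_yscale("log")
ax_log.set_title("Log scale")

fig.tight_layout()
```

If a truncated axis is genuinely needed, say so in the title or caption so the reader is not left to discover it.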
Debugging & reproducibility
- Fix random seeds for jittered/animated plots when reproducing.
- Use plt.close() in loops to free memory.
- Save raw numeric outputs (CSV/JSON) along with images for auditability.
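These three habits can be combined in one loop. The sketch below is one possible arrangement: a seeded generator makes the jitter repeatable, each figure is closed after saving, and the plotted numbers are written to a CSV next to the image. Group names, file names, and the output directory are placeholders.

```python
import csv
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2024)  # fixed seed so the jitter is reproducible
out_dir = Path(tempfile.mkdtemp())

for name in ("group_a", "group_b", "group_c"):
    values = rng.normal(size=50)
    jitter = rng.uniform(-0.1, 0.1, size=50)

    fig, ax = plt.subplots(figsize=(4, 3))
    ax.scatter(jitter, values, s=10)
    ax.set_title(name)
    fig.savefig(out_dir / f"{name}.png", dpi=100)
    plt.close(fig)  # release the figure before the next iteration

    # Store the plotted numbers next to the image for later auditing.
    with open(out_dir / f"{name}.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["jitter", "value"])
        writer.writerows(zip(jitter, values))

open_figures = plt.get_fignums()  # empty when every figure was closed
```

Without the plt.close(fig) call, pyplot keeps every figure alive, and long loops can exhaust memory.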
Quick checklist before you show a plot
- Is the question clear? (What question does this plot answer?)
- Are axes labeled with units?
- Is the legend readable and necessary?
- Does the color scale match data type (categorical vs continuous)?
- Have you referenced earlier data cleaning or feature transformations that affect interpretation?
Key takeaways
- Matplotlib is powerful: learn it to control the narrative of your plots.
- Make visuals honest: labels, scales, and colormaps matter as much as markers and lines.
- Connect to prior steps: show how cleaning, correlation checks, and PCA affect what you visualize.
- Style consistently for professional reports.
Final memorable insight: Good visualizations are arguments made visible — support them with clean data and clear choices.
Ready to practice? Try recreating a figure from a paper using your cleaned dataset and Matplotlib styles. It's the best way to internalize what keeps a plot truthful and persuasive.