Data Visualization and Storytelling
Explore and communicate insights with clear, accessible visuals using Matplotlib, Seaborn, and Plotly.
Bar Charts and Categorical Plots — Tell Better Stories with Categories
You cleaned your data, engineered sensible features, and eyeballed distributions with histograms and density plots. Now it's time to make categories sing.
If Scatterplots and Pair Plots help you explore relationships between continuous variables, and Histograms reveal distributions, then bar charts and categorical plots are the tools that speak the language of categories: product types, user segments, country names, and anything else that lives in the land of discrete labels.
Why this matters (and when to pick a bar chart)
- Comparison is king. Bar charts answer: "Which category is biggest? How do groups compare?"
- Counts and aggregated summaries. Great for raw counts (how many users per cohort) and for aggregated measures (average revenue by plan).
- Storytelling with discrete axes. Words on the x-axis are where the narrative lives.
Use bar charts when you want to compare categories or show an aggregate statistic per category. To break each category down by a second categorical variable, use grouped or stacked bars; to show how a continuous value is distributed within each category, switch to box or violin plots (we covered continuous distributions earlier).
Quick taxonomy — Which categorical plot should I use?
| Question you want to answer | Plot type | When to use |
|---|---|---|
| How many items per category? | Countplot / Simple bar (counts) | Raw frequencies — think: "users per country" |
| What’s the average/median by category? | Barplot / pointplot | Aggregate statistic with error bars — "avg. spend by segment" |
| Two categorical variables compared | Grouped (dodged) bar / heatmap | Compare subcategories, e.g., gender × plan |
| Composition of each category | Stacked bar / 100% stacked (normalized) | Share within category — market share per category |
| Many categories, need ordering | Sorted bar chart | Avoid alphabetical chaos — sort by value |
Code recipes (pandas + seaborn) — practical examples
1) Count of users per country (clean data assumed)
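```python
import matplotlib.pyplot as plt
import seaborn as sns

# Assumes df is a cleaned pandas DataFrame with a 'country' column
order = df['country'].value_counts().index  # most frequent country first

# hue=x with legend=False applies the palette without the seaborn >= 0.13 deprecation warning
sns.countplot(data=df, x='country', hue='country', order=order,
              palette='pastel', legend=False)
plt.xticks(rotation=45, ha='right')
plt.title('Users per Country')
plt.ylabel('Number of users')
plt.xlabel('Country')
plt.tight_layout()
plt.show()
```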
Tips: order= avoids alphabetical chaos; rotate labels when category names are long.
2) Average revenue per plan with 95% CI
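```python
# errorbar=('ci', 95) replaces the deprecated ci=95; hue=x + legend=False avoids palette warnings (seaborn >= 0.13)
sns.barplot(data=df, x='plan', y='revenue', hue='plan', estimator='mean',
            errorbar=('ci', 95), palette='Set2', legend=False)
plt.title('Average Revenue by Plan (95% CI)')
```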
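Seaborn's barplot computes the estimator (the mean here) for each category and draws bootstrapped confidence-interval error bars. If you've already computed aggregates during cleaning/feature engineering, pass the aggregated DataFrame and use plt.bar for more control: with only one row per category there is nothing left to resample, so barplot would show no error bars anyway.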
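To see what those error bars represent, or to keep full control once you have your own aggregates, here is a minimal sketch that computes a percentile-bootstrap 95% CI of the mean per plan with NumPy and draws it with Matplotlib's `bar(yerr=...)`. The toy DataFrame and the `bootstrap_mean_ci` helper are hypothetical; seaborn resamples on its own, so its intervals will not match these numbers exactly.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def bootstrap_mean_ci(values, n_boot=1000, ci=95, seed=0):
    """Hypothetical helper: return (mean, lower, upper) via a percentile bootstrap of the mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement n_boot times and take the mean of each resample
    boot_means = rng.choice(values, size=(n_boot, values.size), replace=True).mean(axis=1)
    alpha = (100 - ci) / 2
    lower, upper = np.percentile(boot_means, [alpha, 100 - alpha])
    return values.mean(), lower, upper

# Hypothetical toy data standing in for the lesson's df
df = pd.DataFrame({
    "plan": ["basic"] * 4 + ["pro"] * 4 + ["team"] * 4,
    "revenue": [10, 12, 9, 11, 30, 28, 35, 31, 20, 22, 19, 25],
})

# One row per plan with mean and CI bounds, sorted so the biggest bar comes first
rows = {plan: bootstrap_mean_ci(g) for plan, g in df.groupby("plan")["revenue"]}
stats = (pd.DataFrame.from_dict(rows, orient="index", columns=["mean", "lo", "hi"])
           .sort_values("mean", ascending=False))

# Asymmetric error bars: distance from the mean down to lo and up to hi
yerr = np.vstack([stats["mean"] - stats["lo"], stats["hi"] - stats["mean"]])

fig, ax = plt.subplots()
ax.bar(stats.index, stats["mean"], yerr=yerr, capsize=4, color="tab:blue")
ax.set_ylim(bottom=0)  # bars encode value by length, so keep the zero baseline
ax.set_title("Average Revenue by Plan (bootstrap 95% CI)")
ax.set_ylabel("Mean revenue")
fig.tight_layout()
```

Fixing the seed makes the intervals reproducible between runs; raising `n_boot` smooths them at the cost of speed.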
3) Grouped bars (dodged) for two categorical variables
```python
sns.catplot(data=df, x='plan', hue='gender', col='region', kind='count', height=4, aspect=1.2)
```
Or with aggregated metrics:
```python
agg = df.groupby(['plan', 'gender'])['revenue'].mean().reset_index()
sns.barplot(data=agg, x='plan', y='revenue', hue='gender')
```
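If you would rather stay in pandas, the same grouped (dodged) layout comes from pivoting the long aggregate to wide format and calling `DataFrame.plot(kind='bar')` without `stacked=True`. A minimal sketch with a hypothetical `agg` table; `reindex` fixes the plan order so it is not alphabetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-format aggregate, shaped like agg above (one row per plan x gender)
agg = pd.DataFrame({
    "plan":    ["basic", "basic", "pro", "pro", "team", "team"],
    "gender":  ["F", "M", "F", "M", "F", "M"],
    "revenue": [11.0, 10.0, 33.0, 29.0, 22.0, 21.0],
})

# Wide format: one row per plan (x position), one column per gender (bar within each group)
wide = agg.pivot(index="plan", columns="gender", values="revenue")

# Set the x order explicitly instead of relying on alphabetical sorting
wide = wide.reindex(["pro", "team", "basic"])

ax = wide.plot(kind="bar", rot=0)  # non-stacked bar plot of a DataFrame draws dodged bars
ax.set_ylabel("Mean revenue")
ax.legend(title="gender")
plt.tight_layout()
```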
4) 100% stacked bar (normalized shares)
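```python
import pandas as pd

ct = pd.crosstab(df['region'], df['plan'])   # counts: rows = region, columns = plan
ct_norm = ct.div(ct.sum(axis=1), axis=0)      # divide each row by its total, so each row sums to 1
ct_norm.plot(kind='bar', stacked=True, colormap='tab20')
plt.ylabel('Share of users')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
```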
Use 100% stacked bars to tell a composition story: “What percent of each region uses each plan?”
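`pd.crosstab` can also do the row normalization itself via `normalize='index'`. A minimal sketch on hypothetical toy data, scaled to percentages so the y-axis reads 0 to 100:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data: one row per user
df = pd.DataFrame({
    "region": ["EU", "EU", "EU", "US", "US", "US", "US", "APAC"],
    "plan":   ["basic", "pro", "pro", "basic", "basic", "team", "pro", "team"],
})

# normalize="index" divides each row by its total, same as ct.div(ct.sum(axis=1), axis=0)
shares = pd.crosstab(df["region"], df["plan"], normalize="index")

ax = shares.mul(100).plot(kind="bar", stacked=True, colormap="tab20", rot=0)
ax.set_ylabel("Share of users (%)")
ax.set_ylim(0, 100)
ax.legend(title="plan", bbox_to_anchor=(1.05, 1), loc="upper left")
plt.tight_layout()
```

Because every bar has the same height, readers compare the segment sizes, which is exactly the composition question.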
Practical storytelling tips (visual rhetoric, not just code)
- Start with the question. Don’t plot everything. Ask: What do I want the audience to see?
- Sort categories by value when comparison matters; alphabetical order is for filing cabinets, not insight.
- Annotate the bars with exact values when precision matters (use plt.text on top of bars).
- Use color intentionally. One color per category family; highlight the bar you want the audience to remember with a contrasting color.
- Show uncertainty when aggregating: error bars, bootstrapped CIs, or small violin-plot insets.
- Normalize when composition matters. Use percentages instead of counts to compare category makeup across groups of different sizes.
- Aggregate the tail. If you have dozens of low-frequency categories, group them into “Other” to keep the chart readable.
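Two of these tips, aggregating the tail and highlighting one bar, fit in a few lines of pandas and Matplotlib. `lump_rare` is a hypothetical helper (not a pandas function), and the country data is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

def lump_rare(series, top_n=3, other_label="Other"):
    """Hypothetical helper: keep the top_n most frequent labels, relabel the rest as other_label."""
    keep = series.value_counts().nlargest(top_n).index
    return series.where(series.isin(keep), other_label)

# Hypothetical toy data with a long tail of rare countries
countries = pd.Series(["US"] * 6 + ["DE"] * 4 + ["FR"] * 3 + ["NL", "BE", "AT"])

counts = lump_rare(countries, top_n=3).value_counts()  # sorted by count, descending
# Move "Other" to the end so it does not compete with the named categories
named = counts.drop("Other", errors="ignore")
counts = pd.concat([named, counts[counts.index == "Other"]])

# Muted grey for everything, one contrasting color for the bar the story is about
highlight = "DE"
colors = ["tab:orange" if c == highlight else "lightgrey" for c in counts.index]

fig, ax = plt.subplots()
ax.bar(counts.index, counts.values, color=colors)
ax.set_ylabel("Number of users")
ax.set_title("Users per Country (top 3 + Other)")
fig.tight_layout()
```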
"A bar chart without sorting is like a story with no protagonist — directionless and sleepy."
Pitfalls and how to avoid them
- Truncated y-axis. Bars encode values by their length, so the value axis should start at zero; clipping it exaggerates differences and is deceptive.
- Too many categories. If there are hundreds of bars, use a dot plot, table, interactive viz, or aggregate.
- Leaking future info. As covered in Data Cleaning & Feature Engineering, compute any aggregated feature you group or plot by using only the training data or the appropriate time window; otherwise future information leaks into both the chart and the model.
- Misleading stacking. Stacked bars are good for composition but bad for comparing individual categories across groups — use grouped bars instead.
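A minimal sketch of leakage-safe aggregation, assuming a time column (`month` here) defines the split: compute the per-category statistic on the training window only, then map it onto later rows. The data, the `cutoff`, and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical event log; "month" stands in for whatever time column defines your split
df = pd.DataFrame({
    "month":   [1, 1, 2, 2, 3, 3, 4, 4],
    "plan":    ["basic", "pro", "basic", "pro", "basic", "pro", "basic", "pro"],
    "revenue": [10, 30, 12, 34, 11, 90, 13, 95],
})

cutoff = 2  # assumption: months <= cutoff are the training history
train_df = df[df["month"] <= cutoff]
test_df = df[df["month"] > cutoff]

# Aggregate ONLY on the training window...
plan_means = train_df.groupby("plan")["revenue"].mean()

# ...then map those frozen values onto later rows, so future revenue never feeds the feature
test_df = test_df.assign(plan_avg_revenue=test_df["plan"].map(plan_means))

# For comparison: the leaky version averages over every month, including the test period
leaky_means = df.groupby("plan")["revenue"].mean()
```

Here the leaky average for `pro` is pulled up by the late spike in revenue, which a model or chart built at the cutoff could not have known about.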
Small cookbook: annotate bars with values
```python
ax = sns.barplot(data=agg, x='plan', y='revenue')
# One patch per bar: write its height just above the bar's top center
for p in ax.patches:
    ax.annotate(f'{p.get_height():.0f}',
                (p.get_x() + p.get_width() / 2, p.get_height()),
                ha='center', va='bottom')
```
This turns numbers into readable facts, not just eye candy.
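Matplotlib 3.4 and later also provide `Axes.bar_label`, which does the same job without computing label positions by hand. A minimal sketch with hypothetical pre-aggregated numbers:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical pre-aggregated data, one row per plan
agg = pd.DataFrame({"plan": ["pro", "team", "basic"], "revenue": [31.0, 21.5, 10.5]})

fig, ax = plt.subplots()
bars = ax.bar(agg["plan"], agg["revenue"], color="tab:blue")

# bar_label adds one text label per bar in the BarContainer (matplotlib >= 3.4)
labels = ax.bar_label(bars, fmt="%.1f", padding=2)
ax.set_ylabel("Mean revenue")
fig.tight_layout()
```

With a seaborn barplot, call `ax.bar_label` on each container in `ax.containers` (one container per hue level).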
Quick comparisons to previous visuals
- Vs. Histograms: Histograms and density plots show distribution within continuous variables; bar charts summarize categorical counts or aggregated statistics.
- Vs. Scatterplots / Pair Plots: Scatterplots explore relationships between continuous variables; categorical plots summarize categories and can be used alongside scatterplots — e.g., color (hue) categories in a scatter to mix both ideas.
Key takeaways (so you don’t forget them on exam day)
- Use bar charts to compare categories — counts or aggregated statistics.
- Sort, annotate, and highlight to make the narrative obvious.
- Choose grouped vs. stacked depending on whether you compare values or show composition.
- Control for leakage when your categories or aggregates come from engineered features.
- If the chart is busy, aggregate or switch forms (dot plots, heatmaps, interactive dashboards).
Final memorable insight
Bar charts are the stage; your categories are the actors. Arrange them (sort), give the lead actor a spotlight (color), and don’t let tiny extras (dozens of rare categories) clutter the scene. Then your audience will actually get the point.