Why Visualize Data?
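Data visualization turns abstract numbers into visual stories. People take in visual patterns far faster than they can read the same values in a table (the popular "60,000× faster than text" figure is widely quoted but has no well-documented source). Visualization helps us explore, analyze, and communicate data effectively.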
Three Purposes of Visualization
1. Exploratory: Discover patterns, trends, and outliers in the data
2. Explanatory: Communicate findings to stakeholders clearly
3. Confirmatory: Verify hypotheses and validate models
Visual Perception & Pre-attentive Attributes
The human visual system detects certain visual attributes almost instantly (in under roughly 250 ms), without conscious effort. These are called pre-attentive attributes.
- Position: Most accurate for quantitative data (use X/Y axes)
- Length: Bar charts leverage this effectively
- Color Hue: Best for categorical distinctions
- Color Intensity: Good for gradients/magnitude
- Size: Bubble charts, but humans underestimate area
- Shape: Useful for categories, but limit to 5-7 shapes
- Orientation: Lines, angles
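The caveat about size matters in code. As a minimal sketch, assuming Matplotlib and made-up values: `ax.scatter` interprets `s` as marker area in points squared, so scaling `s` linearly with the data keeps bubble area (rather than radius) proportional to the value. The `max_area` cap is a hypothetical choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

values = np.array([10, 40, 90])           # illustrative data, not from the text
max_area = 900                            # hypothetical cap, in points^2
sizes = values / values.max() * max_area  # marker area grows linearly with value

fig, ax = plt.subplots()
ax.scatter([1, 2, 3], [1, 1, 1], s=sizes, alpha=0.6)
for x_pos, v in zip([1, 2, 3], values):
    # Label each bubble, since readers tend to underestimate area
    ax.annotate(str(v), (x_pos, 1), ha="center", va="center")
ax.set_xlim(0, 4)
```

Labeling the bubbles directly is one way to compensate for the fact that area is judged less accurately than position or length.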
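Cleveland & McGill's Accuracy Ranking (most to least accurate)
1. Position on a common scale (scatter plot, bars on a shared baseline)
2. Position on non-aligned scales (separate panels or axes)
3. Length (bar segments without a shared baseline)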
4. Angle, Slope
5. Area
6. Volume, Curvature
7. Color saturation, Color hue
The Grammar of Graphics
The Grammar of Graphics (Wilkinson, 1999) is a framework for describing statistical graphics. It's the foundation of ggplot2 (R) and influences Seaborn, Altair, and Plotly.
- Data: The dataset being visualized
- Aesthetics (aes): Mapping data to visual properties (x, y, color, size)
- Geometries (geom): Visual elements (points, lines, bars, areas)
- Facets: Subplots by categorical variable
- Statistics: Transformations (binning, smoothing, aggregation)
- Coordinates: Cartesian, polar, map projections
- Themes: Non-data visual elements (fonts, backgrounds)
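Several of these components can be seen in a single call. The sketch below is one possible mapping, assuming seaborn and pandas are installed and using synthetic data; the column names (`x`, `y`, `group`, `panel`) are invented for illustration. Statistics and coordinates are left at their defaults here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame({                          # Data (synthetic)
    "x": rng.normal(size=60),
    "group": np.repeat(["a", "b", "c"], 20),
    "panel": np.tile(["left", "right"], 30),
})
df["y"] = 2 * df["x"] + rng.normal(scale=0.5, size=60)

sns.set_theme(style="whitegrid")             # Theme: non-data styling
g = sns.relplot(
    data=df,
    x="x", y="y", hue="group",               # Aesthetics: columns mapped to position and color
    kind="scatter",                          # Geometry: points
    col="panel",                             # Facets: one subplot per panel value
)
g.set_axis_labels("x value", "y value")
```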
Choosing the Right Chart
The best visualization depends on your data type and question. Here's a decision guide:
One Variable (Univariate):
• Continuous: Histogram, KDE, Box plot, Violin plot
• Categorical: Bar chart, Count plot
Two Variables (Bivariate):
• Both Continuous: Scatter plot, Line chart, Hexbin, 2D histogram
• Continuous + Categorical: Box plot, Violin, Strip, Swarm
• Both Categorical: Heatmap, Grouped bar chart
Multiple Variables (Multivariate):
• Pair plot (scatterplot matrix)
• Parallel coordinates
• Heatmap correlation matrix
• Faceted plots (small multiples)
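The decision guide above can be written down as a small lookup table. This is only a sketch: `CHART_GUIDE`, `MULTIVARIATE`, and `suggest_charts` are hypothetical names, and the function assumes you already know whether each variable is continuous or categorical.

```python
# Hypothetical helper: encodes the decision guide as a lookup table.
CHART_GUIDE = {
    ("continuous",): ["histogram", "KDE", "box plot", "violin plot"],
    ("categorical",): ["bar chart", "count plot"],
    ("continuous", "continuous"): ["scatter plot", "line chart", "hexbin", "2D histogram"],
    ("categorical", "continuous"): ["box plot", "violin plot", "strip plot", "swarm plot"],
    ("categorical", "categorical"): ["heatmap", "grouped bar chart"],
}
MULTIVARIATE = ["pair plot", "parallel coordinates", "correlation heatmap", "faceted plots"]


def suggest_charts(*kinds):
    """Return candidate chart types for the given variable kinds."""
    if len(kinds) > 2:
        return MULTIVARIATE
    key = tuple(sorted(kinds))  # the order of the two variables does not matter
    if key not in CHART_GUIDE:
        raise ValueError(f"unknown variable kinds: {kinds}")
    return CHART_GUIDE[key]


print(suggest_charts("continuous", "categorical"))
```

The result is a list of candidates rather than a single answer, because the final choice still depends on the question being asked and the size of the dataset.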
Common Chart Mistakes
Matplotlib Figure Anatomy
Understanding Matplotlib's object hierarchy is key to creating professional visualizations.
Figure → Axes → Axis → Tick → Label
• Figure: The overall window/canvas
• Axes: The actual plot area (NOT the X/Y axis!)
• Axis: The X or Y axis with ticks and labels
• Artist: Everything visible (lines, text, patches)
Two Interfaces
1. Pyplot (state-based): Implicit, convenient for quick plots
plt.plot(x, y)
plt.xlabel('Time')
plt.show()
2. Object-Oriented (OO): Explicit, recommended for complex plots
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel('Time')
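For comparison, here is the same plot written both ways as a runnable sketch. The data is synthetic and the `Agg` backend is an assumption made so the script runs without a display; in an interactive session you would call `plt.show()` instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)   # synthetic data
y = np.sin(x)

# Pyplot style: each call acts on the "current" figure and axes.
plt.figure()
plt.plot(x, y)
plt.xlabel("Time")
implicit_ax = plt.gca()      # the axes pyplot was drawing on

# Object-oriented style: keep explicit handles to the figure and axes.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("Time")
```

With explicit handles it is always clear which axes a call modifies, which becomes important once a figure holds more than one plot.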
Basic Matplotlib Plots
Master the fundamental plot types that form the foundation of data visualization.
Code Examples
Line Plot:
ax.plot(x, y, color='blue', linestyle='--', marker='o', label='Series A')
Scatter Plot:
ax.scatter(x, y, c=colors, s=sizes, alpha=0.7, cmap='viridis')
Bar Chart:
ax.bar(categories, values, color='steelblue', edgecolor='black')
Histogram:
ax.hist(data, bins=30, edgecolor='white', density=True)
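The four calls above need data to run. A minimal sketch with synthetic inputs follows; the variable names match the snippets, while the values and the 2x2 arrangement are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 30)
y = np.sin(x)
colors = rng.random(30)              # one value per point, mapped through the colormap
sizes = rng.uniform(20, 200, 30)     # marker areas in points^2
categories = ["A", "B", "C"]
values = [5, 3, 8]
data = rng.normal(size=500)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
ax_line, ax_scatter, ax_bar, ax_hist = axes.ravel()

ax_line.plot(x, y, color="blue", linestyle="--", marker="o", label="Series A")
ax_line.legend()
points = ax_scatter.scatter(x, y, c=colors, s=sizes, alpha=0.7, cmap="viridis")
fig.colorbar(points, ax=ax_scatter)  # explains what the colors encode
bars = ax_bar.bar(categories, values, color="steelblue", edgecolor="black")
counts, edges, _ = ax_hist.hist(data, bins=30, edgecolor="white", density=True)
fig.tight_layout()
```

With `density=True` the histogram bars are scaled so that their total area is 1, which makes the result comparable to a probability density curve.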
Subplots & Multi-panel Layouts
Combine multiple visualizations into a single figure for comprehensive analysis.
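fig, axes = plt.subplots(2, 2, figsize=(12, 10))  # 2x2 grid of axes
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)  # panels share axis limits
gs = fig.add_gridspec(3, 3)  # 3x3 grid for custom layouts
ax = fig.add_subplot(gs[0, :])  # one axes spanning the whole top row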
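A GridSpec layout is easier to follow in a complete script. The sketch below assumes a 2x2 grid with one wide panel on top and two panels underneath that share a y-axis; the data and panel names are invented.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig = plt.figure(figsize=(8, 6))
gs = fig.add_gridspec(2, 2)
ax_top = fig.add_subplot(gs[0, :])                     # top row spans both columns
ax_left = fig.add_subplot(gs[1, 0])
ax_right = fig.add_subplot(gs[1, 1], sharey=ax_left)   # shares the y-axis with its neighbor

ax_top.plot(x, np.sin(x))
ax_top.set_title("Overview")
ax_left.plot(x, np.cos(x))
ax_right.plot(x, 0.5 * np.cos(2 * x))
ax_left.set_ylim(-1.5, 1.5)   # also applies to ax_right because the axis is shared
fig.tight_layout()
```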
Styling & Professional Themes
Transform basic plots into publication-quality visualizations.
plt.style.available → Lists all built-in styles
plt.style.use('seaborn-v0_8-whitegrid') → Applies a style globally
with plt.style.context('dark_background'): → Applies a style only inside the with block
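The difference between a global and a temporary style shows up in `plt.rcParams`. A small sketch, assuming the built-in `dark_background` style: the figure background setting changes inside the `with` block and is restored afterwards.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

before = plt.rcParams["figure.facecolor"]

with plt.style.context("dark_background"):
    inside = plt.rcParams["figure.facecolor"]   # style is active only in this block
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_title("Styled temporarily")

after = plt.rcParams["figure.facecolor"]        # back to the previous setting
print(before, inside, after)
```

Using the context manager keeps one dark figure from changing the look of every later plot in the same session.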
Color Palettes
Sequential: Blues, Greens, Oranges (for magnitude)
Diverging: coolwarm, RdBu (for +/- deviations)
Categorical: tab10, Set2, Paired (discrete groups)
Seaborn: Statistical Visualization
Seaborn is a high-level library built on Matplotlib that makes statistical graphics beautiful and easy.
- Beautiful default styles and color palettes
- Works seamlessly with Pandas DataFrames
- Statistical estimation built-in (confidence intervals, regression)
- Faceting for multi-panel figures
- Functions organized by plot purpose
Seaborn Function Categories
Figure-level: Create and manage a whole figure, with built-in faceting (displot, relplot, catplot)
Axes-level: Draw on specific axes (histplot, scatterplot, boxplot)
By Purpose:
• Distribution: histplot, kdeplot, ecdfplot, rugplot
• Relationship: scatterplot, lineplot, regplot
• Categorical: stripplot, swarmplot, boxplot, violinplot, barplot
• Matrix: heatmap, clustermap
Distribution Plots
Visualize the distribution of a single variable or compare distributions across groups.
• Histogram: Discrete bins, shows raw counts
• KDE: Smooth curve, estimates probability density
• Use both together:
sns.histplot(data, kde=True)
Relationship Plots
Explore relationships between two or more continuous variables.
sns.scatterplot(data=df, x='x', y='y', hue='category', size='magnitude')
sns.regplot(data=df, x='x', y='y', scatter_kws={'alpha': 0.5})
sns.pairplot(df, hue='species', diag_kind='kde')
Categorical Plots
Visualize distributions and comparisons across categorical groups.
• Strip/Swarm: Show all data points (small datasets)
• Box: Summary statistics (median, quartiles, outliers)
• Violin: Full distribution shape + summary
• Bar: Mean per category with error bars (use a count plot for counts)
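These plot types can be compared side by side on the same data. A sketch with synthetic groups, assuming seaborn and pandas are installed; overlaying the strip plot on the box plot is one common way to show the raw points alongside the summary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(5)
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 30),
    "value": np.concatenate([rng.normal(m, 1.0, 30) for m in (0, 1, 3)]),
})

fig, (ax_box, ax_violin, ax_bar) = plt.subplots(1, 3, figsize=(13, 4))
sns.boxplot(data=df, x="group", y="value", color="lightgray", ax=ax_box)
sns.stripplot(data=df, x="group", y="value", color="black", size=3, ax=ax_box)  # raw points on top
sns.violinplot(data=df, x="group", y="value", ax=ax_violin)
sns.barplot(data=df, x="group", y="value", ax=ax_bar)   # bar height is the group mean
fig.tight_layout()
```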
Heatmaps & Correlation Matrices
Visualize matrices of values using color intensity. Essential for EDA correlation analysis.
• Annotate cells with values (most readable for small matrices):
annot=True
• Use a diverging colormap for correlation:
cmap='coolwarm', center=0
• Mask the redundant upper (or lower) triangle:
mask=np.triu(np.ones_like(corr, dtype=bool))
• Square cells:
square=True
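Put together, these options form a typical correlation heatmap. The sketch uses a synthetic DataFrame in which column `d` is deliberately built from column `a`; `vmin`, `vmax`, and `fmt` are additions not mentioned in the text, used here to pin the color scale to the full correlation range and to shorten the labels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(11)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("abcd"))  # synthetic data
df["d"] = 0.8 * df["a"] + rng.normal(scale=0.5, size=100)           # induce a correlation
corr = df.corr()

# True marks cells to hide: the upper triangle and the diagonal
mask = np.triu(np.ones_like(corr, dtype=bool))

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f",
            cmap="coolwarm", center=0, vmin=-1, vmax=1, square=True, ax=ax)
```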
Plotly Express: Interactive Visualization
Plotly creates interactive, web-based visualizations with zoom, pan, hover tooltips, and more.
- Interactive out of the box (zoom, pan, select)
- Hover tooltips with data details
- Export as HTML, PNG, or embed in dashboards
- Works in Jupyter, Streamlit, Dash
- plotly.express is the high-level API (like Seaborn for Matplotlib)
px.scatter(df, x='x', y='y', color='category', size='value', hover_data=['name'])
px.line(df, x='date', y='price', color='stock')
px.bar(df, x='category', y='count', color='group', barmode='group')
px.histogram(df, x='value', nbins=50, marginal='box')
Animated Visualizations
Add a time dimension to your visualizations with animations.
Plotly Express:
px.scatter(df, x='gdp', y='life_exp', animation_frame='year', animation_group='country', size='pop', color='continent')
Matplotlib Animation:
from matplotlib.animation import FuncAnimation
ani = FuncAnimation(fig, update_func, frames=100, interval=50)
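`FuncAnimation` calls the update function once per frame. A minimal sketch of what `update_func` might look like, assuming a travelling sine wave as the animated content; the wave and the phase step are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()
(line,) = ax.plot(x, np.sin(x))


def update_func(frame):
    """Shift the sine wave a little on every frame."""
    line.set_ydata(np.sin(x + 0.1 * frame))
    return (line,)


# Keep a reference to the animation object so it is not garbage-collected
ani = FuncAnimation(fig, update_func, frames=100, interval=50)
# ani.save("wave.gif", writer="pillow")  # optional: requires Pillow
```

The `interval` is in milliseconds, so 100 frames at 50 ms each play for about five seconds.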
Interactive Dashboards with Streamlit
Build interactive web apps for data exploration without web development experience.
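Run the app from a terminal:
streamlit run app.py
Minimal app.py:
import streamlit as st
st.title("My Dashboard")
st.slider("Select value", 0, 100, 50)
st.selectbox("Choose", ["A", "B", "C"])
st.plotly_chart(fig)  # fig is a Plotly figure created earlier in the script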
Geospatial Visualization
Visualize geographic data with maps, choropleth, and point plots.
• Plotly:
px.choropleth(df, locations='country', color='value')  # locations are read as ISO-3 country codes by default
• Folium: Interactive Leaflet maps
• Geopandas + Matplotlib: Static maps with shapefiles
• Kepler.gl: Large-scale geospatial visualization
3D Visualization
Visualize three-dimensional relationships with surface plots, scatter plots, and more.
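As one possible illustration, Matplotlib can draw both plot types through its `3d` projection. The sketch below uses a synthetic ripple surface; the function, grid size, and colormap are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-3, 3, 40)
y = np.linspace(-3, 3, 40)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))   # synthetic surface

fig = plt.figure(figsize=(10, 4))
ax_surface = fig.add_subplot(1, 2, 1, projection="3d")
ax_surface.plot_surface(X, Y, Z, cmap="viridis")

ax_points = fig.add_subplot(1, 2, 2, projection="3d")
ax_points.scatter(X.ravel(), Y.ravel(), Z.ravel(), c=Z.ravel(), cmap="viridis", s=5)

for ax in (ax_surface, ax_points):
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_zlabel("z")
```

Static 3D views can hide points behind the surface, so an interactive library or a 2D heatmap of the same data is sometimes easier to read.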
Data Storytelling
Transform visualizations into compelling narratives that drive action.
- Context: Why does this matter? Who is the audience?
- Data: What insights did you discover?
- Narrative: What's the storyline (beginning, middle, end)?
- Visual: Which chart best supports the story?
- Call to Action: What should the audience do?
Design Principles
Focus Attention: Use color strategically (grey + accent)
Think Like a Designer: Alignment, white space, hierarchy
Tell a Story: Title = conclusion, not description
Bad: "Sales by Region"
Good: "West Region Sales Dropped 23% in Q4"
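The "grey + accent" and "title = conclusion" principles can be sketched in a few lines of Matplotlib. The regional numbers below are made up purely to match the example title; they are not real data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

regions = ["North", "South", "East", "West"]
change_pct = [4, 2, -1, -23]   # invented Q4 changes, for illustration only
# Grey for context, a single accent color for the bar the story is about
colors = ["crimson" if r == "West" else "lightgray" for r in regions]

fig, ax = plt.subplots()
ax.bar(regions, change_pct, color=colors)
ax.axhline(0, color="black", linewidth=0.8)
ax.set_ylabel("Q4 change (%)")
ax.set_title("West Region Sales Dropped 23% in Q4")   # the title states the conclusion
for side in ("top", "right"):
    ax.spines[side].set_visible(False)                # remove non-data ink
```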