Why visualization matters in data science
Visualization turns raw numbers into pictures you can understand at a glance. With graphs, you can:
- See trends over time
- Compare groups
- Spot outliers (unusual values)
- Explain results to others quickly
In Python, one of the most common libraries for plotting is matplotlib, usually used through its pyplot module.
Getting started with matplotlib
To follow along, you should already know how to:
- Install and import Python libraries
- Work with basic data (lists, numbers)
- Use simple NumPy/pandas examples (from this chapter’s earlier sections)
Most examples will use this import style:
import matplotlib.pyplot as plt
This gives you access to plotting functions through plt.
A basic pattern you’ll see a lot:
import matplotlib.pyplot as plt
# 1. Prepare your data
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
# 2. Create a plot
plt.plot(x, y)
# 3. Show the plot window
plt.show()
plt.show() opens a window (or displays the figure in a notebook) with your chart.
Basic plot types
Line plots
Line plots are useful for showing how something changes, often over time.
import matplotlib.pyplot as plt
days = [1, 2, 3, 4, 5, 6, 7]
temperatures = [21, 23, 20, 22, 24, 25, 23]
plt.plot(days, temperatures) # x, y
plt.xlabel("Day")
plt.ylabel("Temperature (°C)")
plt.title("Temperature Over One Week")
plt.show()
You can customize the line style and color:
plt.plot(days, temperatures, color="red", linestyle="--", marker="o")
Common options:
- Colors: "red", "blue", "green", or shorthand like "r", "b", "g"
- Line styles: "-" (solid), "--" (dashed), "-." (dash-dot), ":" (dotted)
- Markers: "o" (circle), "s" (square), "x", "^" (triangle), etc.
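To compare these options side by side, here is a minimal sketch that draws the week of temperatures four times, each with a different color, line style, and marker. The 2-degree vertical offset (so the lines don't overlap), the `styles` and `lines` names, and the extra color "k" (black) are illustrative additions, not part of the original example.

```python
import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5, 6, 7]
temperatures = [21, 23, 20, 22, 24, 25, 23]

# One (color, linestyle, marker) combination per line; "k" is shorthand for black
styles = [("r", "-", "o"), ("b", "--", "s"), ("g", "-.", "x"), ("k", ":", "^")]

plt.figure()
lines = []
for offset, (color, linestyle, marker) in enumerate(styles):
    # Shift each copy up by 2 degrees so all four lines stay visible
    shifted = [t + 2 * offset for t in temperatures]
    # plt.plot returns a list of Line2D objects; here it contains exactly one
    (line,) = plt.plot(days, shifted, color=color, linestyle=linestyle,
                       marker=marker, label=f"{color} {linestyle} {marker}")
    lines.append(line)

plt.xlabel("Day")
plt.ylabel("Temperature (°C), shifted")
plt.title("Line Styles and Markers")
plt.legend()
plt.show()
```

Keeping the returned `Line2D` objects is optional, but it lets you inspect a line later (for example with `line.get_linestyle()`).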
Scatter plots
Scatter plots show individual points. They’re useful for exploring relationships between two variables.
import matplotlib.pyplot as plt
hours_studied = [1, 2, 3, 4, 5, 6]
exam_score = [50, 55, 65, 70, 75, 85]
plt.scatter(hours_studied, exam_score)
plt.xlabel("Hours Studied")
plt.ylabel("Exam Score")
plt.title("Study Time vs Exam Score")
plt.show()
You can change the marker style and color:
plt.scatter(hours_studied, exam_score, color="purple", marker="x")
Bar charts
Bar charts compare values across categories (the x-axis shows category labels instead of numbers).
import matplotlib.pyplot as plt
languages = ["Python", "JavaScript", "C++", "Java"]
popularity = [85, 75, 60, 70] # Made-up scores
plt.bar(languages, popularity)
plt.xlabel("Programming Language")
plt.ylabel("Popularity Score")
plt.title("Programming Language Popularity (Example Data)")
plt.show()
You can change the color, bar width, and orientation:
plt.bar(languages, popularity, color="orange", width=0.5) # width defaults to 0.8
For a horizontal bar chart, use plt.barh():
plt.barh(languages, popularity) # barh = horizontal bar chart
Histograms
Histograms show the distribution of a numeric variable—how often values fall into ranges (bins).
import matplotlib.pyplot as plt
ages = [18, 19, 21, 22, 23, 19, 34, 45, 29, 31, 22, 25, 26, 27, 30, 32, 40]
plt.hist(ages, bins=5) # Try changing number of bins
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.title("Age Distribution")
plt.show()
The bins parameter controls how many bars (ranges) the histogram will have.
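plt.hist() also returns what it computed, which makes it easy to check how the bins were formed. The sketch below reuses the ages list from the example above; the names `counts`, `edges`, and `patches` are just illustrative. With an integer `bins`, the range from the smallest to the largest value is split into equal-width bins (each bin includes its left edge; the last one also includes its right edge), so 5 bins have 6 edges.

```python
import matplotlib.pyplot as plt

ages = [18, 19, 21, 22, 23, 19, 34, 45, 29, 31, 22, 25, 26, 27, 30, 32, 40]

plt.figure()
# plt.hist returns the count in each bin, the bin edges, and the drawn bars
counts, edges, patches = plt.hist(ages, bins=5)
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.title("Age Distribution")

# 5 bins means 5 counts but 6 edges (each bin has a left and a right edge)
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {int(count)}")

# Every value lands in exactly one bin, so the counts add up to len(ages)
print("total:", int(counts.sum()), "of", len(ages))
plt.show()
```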
Customizing plots
Good plots are not just correct—they’re readable and clear. Here are some simple customizations.
Titles and axis labels
Use:
- plt.title("Your Title")
- plt.xlabel("X-axis label")
- plt.ylabel("Y-axis label")
Example:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [3, 5, 7])
plt.title("Simple Line")
plt.xlabel("X value")
plt.ylabel("Y value")
plt.show()
Legends
Legends explain different lines or markers in one figure.
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y1 = [2, 4, 6, 8]
y2 = [1, 3, 5, 7]
plt.plot(x, y1, label="Group A")
plt.plot(x, y2, label="Group B")
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Two Lines on One Plot")
plt.legend() # Show legend
plt.show()
You can choose where the legend appears with loc:
plt.legend(loc="upper left") # or "upper right", "lower right", "lower left", "best", etc.
Colors and styles
To keep multiple lines distinct:
plt.plot(x, y1, color="blue", linestyle="-", marker="o", label="Blue line")
plt.plot(x, y2, color="red", linestyle="--", marker="s", label="Red dashed line")
You can also use short format strings:
plt.plot(x, y1, "bo-") # blue, circle markers, solid line
plt.plot(x, y2, "r--") # red, dashed line
Figure size
Change the size of the whole figure:
plt.figure(figsize=(8, 4)) # width, height in inches
plt.plot(x, y1)
plt.title("Custom Size")
plt.show()
Call plt.figure() before the plotting commands: it starts a new figure, and the plt calls that follow draw on it.
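figsize is measured in inches, so the size in pixels also depends on the dots per inch (dpi): pixels = inches × dpi. The sketch below saves the same 8 × 4 inch figure at two resolutions into memory and reads the pixel size back. The `png_size` helper is hypothetical (written for this example, not part of matplotlib), and the numbers assume default settings (in particular, no `bbox_inches="tight"`, which trims the image).

```python
import io
import struct

import matplotlib.pyplot as plt


def png_size(data):
    """Hypothetical helper: read (width, height) in pixels from PNG bytes."""
    # PNG layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height
    return struct.unpack(">II", data[16:24])


x = [1, 2, 3, 4]
y1 = [2, 4, 6, 8]

plt.figure(figsize=(8, 4))  # width, height in inches
plt.plot(x, y1)
plt.title("Custom Size")

sizes = {}
for dpi in (100, 300):
    buffer = io.BytesIO()  # save to memory instead of a file
    plt.savefig(buffer, format="png", dpi=dpi)
    sizes[dpi] = png_size(buffer.getvalue())
    print(f"dpi={dpi}: {sizes[dpi][0]} x {sizes[dpi][1]} pixels")
plt.show()
```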
Using matplotlib with NumPy and pandas
In data science, you often load and manipulate data with NumPy and pandas, then plot it with matplotlib.
Plotting a NumPy array
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 100) # 100 points from 0 to 10
y = np.sin(x)
plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.title("Sine Wave")
plt.show()
Plotting a pandas DataFrame
pandas works smoothly with matplotlib: you can pass DataFrame columns directly to plt functions. A simple example:
import pandas as pd
import matplotlib.pyplot as plt
data = {
"day": [1, 2, 3, 4, 5],
"steps": [3000, 5000, 7000, 6500, 8000]
}
df = pd.DataFrame(data)
plt.plot(df["day"], df["steps"])
plt.xlabel("Day")
plt.ylabel("Steps")
plt.title("Daily Step Count")
plt.show()
pandas also has its own plotting methods (which use matplotlib under the hood), but here we focus on using plt directly.
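For comparison, here is a minimal sketch of the pandas route mentioned above, using DataFrame.plot() on the same step data. It shows the "under the hood" part: the method returns an ordinary matplotlib Axes, so the usual plt functions (here plt.ylabel) still apply afterwards.

```python
import matplotlib.axes
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "day": [1, 2, 3, 4, 5],
    "steps": [3000, 5000, 7000, 6500, 8000],
})

# DataFrame.plot draws with matplotlib and returns the Axes it drew on
ax = df.plot(x="day", y="steps", title="Daily Step Count")
plt.ylabel("Steps")  # plain pyplot calls act on that same (current) Axes
print(type(ax))
plt.show()
```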
Subplots: multiple charts in one figure
Subplots let you show several related charts side by side in a single figure.
Use plt.subplot(rows, columns, index), where:
- rows = number of rows of plots
- columns = number of columns
- index = which plot to draw in (starting at 1, counted left to right, then top to bottom)
Example with 2 subplots (1 row, 2 columns):
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y1 = [1, 4, 9, 16, 25]
y2 = [1, 8, 27, 64, 125]
plt.figure(figsize=(10, 4))
# Left plot
plt.subplot(1, 2, 1)
plt.plot(x, y1)
plt.title("Squares")
# Right plot
plt.subplot(1, 2, 2)
plt.plot(x, y2)
plt.title("Cubes")
plt.tight_layout() # Adjust layout to avoid overlap
plt.show()
tight_layout() helps prevent titles and labels from overlapping.
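The index counts across each row before moving down to the next one. This sketch fills a 2 × 2 grid and asks each subplot where it ended up. It assumes a recent matplotlib version, where the Axes returned by plt.subplot() has get_subplotspec() and the spec exposes rowspan/colspan ranges; the `positions` list is only for illustration.

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(6, 6))
positions = []
for index in range(1, 5):  # 2 rows x 2 columns gives indices 1 to 4
    ax = plt.subplot(2, 2, index)
    spec = ax.get_subplotspec()
    # rowspan/colspan are ranges; .start is the 0-based row or column
    row, col = spec.rowspan.start, spec.colspan.start
    positions.append((row, col))
    plt.title(f"index {index}: row {row}, column {col}")

plt.tight_layout()  # keep the four titles from overlapping
plt.show()
print(positions)
```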
Saving plots to files
Instead of (or in addition to) showing a figure on screen, you can save it to a file with plt.savefig().
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [2, 3, 5, 7]
plt.plot(x, y)
plt.title("Example Plot")
plt.savefig("example_plot.png") # Save as PNG
plt.show()
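Supported formats include .png, .jpg, .pdf, and .svg, among others; matplotlib picks the format from the file extension. Call plt.savefig() before plt.show(): in a script, the figure is usually closed when you dismiss the window, so saving afterwards can produce a blank image.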
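To see the extension doing the work, this sketch saves one figure as PNG, PDF, and SVG with no format= argument, then peeks at the first bytes of each file. It writes into a temporary folder (via tempfile) so nothing lands in your project; the folder and the `headers` dictionary are illustrative choices.

```python
import os
import tempfile

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [2, 3, 5, 7]

fig = plt.figure()
plt.plot(x, y)
plt.title("Example Plot")

# Extensions this figure's canvas can write, e.g. "pdf", "png", "svg", ...
print(sorted(fig.canvas.get_supported_filetypes()))

out_dir = tempfile.mkdtemp()  # throwaway folder for this sketch
headers = {}
for ext in ("png", "pdf", "svg"):
    path = os.path.join(out_dir, f"example_plot.{ext}")
    plt.savefig(path)  # the extension alone selects the format
    with open(path, "rb") as f:
        headers[ext] = f.read(512)  # the first bytes identify the file type

plt.show()
```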
You can control the resolution with dpi:
plt.savefig("example_plot_highres.png", dpi=300)
Common beginner pitfalls with matplotlib
- Forgetting plt.show(): In many environments, you must call plt.show() to actually display the figure.
- Plotting before creating a figure (in more complex scripts): While not always required, an explicit plt.figure() call can help keep complex scripts organized.
- Overlapping labels or text: If labels overlap, try plt.tight_layout(), or rotate the tick labels with plt.xticks(rotation=45).
- Data lengths don’t match: x and y in plt.plot(x, y) (or plt.scatter(x, y)) must have the same length.
- Too many colors and styles: Simpler is often better. Use a small, consistent set of colors and line styles.
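Two of these pitfalls are easy to reproduce. The sketch below first passes lists of different lengths to plt.plot(), which raises a ValueError, then rotates the category labels of a bar chart and confirms the rotation took effect. The `error_message` and `rotations` names are illustrative, and the scores are the made-up ones from the bar chart example.

```python
import matplotlib.pyplot as plt

# Pitfall: x and y with different lengths
x = [1, 2, 3, 4]
y = [2, 4, 6]  # one value short, on purpose

plt.figure()
error_message = None
try:
    plt.plot(x, y)
except ValueError as err:
    error_message = str(err)
    print("plot failed:", error_message)

# Pitfall: crowded category labels, fixed by rotating them and tightening the layout
languages = ["Python", "JavaScript", "C++", "Java"]
popularity = [85, 75, 60, 70]  # Made-up scores

plt.figure(figsize=(4, 3))
plt.bar(languages, popularity)
plt.xticks(rotation=45)
plt.tight_layout()
rotations = [label.get_rotation() for label in plt.gca().get_xticklabels()]
print(rotations)
plt.show()
```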
Small practice ideas
Here are a few mini tasks to practice:
- Line chart of monthly expenses
  - Create a list of 12 numbers (one for each month).
  - Plot them as a line chart with appropriate labels and a title.
- Histogram of random data
  - Use NumPy or Python’s random module to generate 100 random integers.
  - Plot a histogram of the values.
- Comparing two categories
  - Create two lists of numbers (e.g., scores of Class A and Class B).
  - Plot them as side-by-side bar charts, or as two lines on the same plot with a legend.
- Multiple subplots
  - In one figure, create:
    - A line plot in the first subplot
    - A histogram in the second
  - Use plt.subplot() and plt.tight_layout().
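If you get stuck on the side-by-side bar chart task, here is one possible approach, sketched with hypothetical subjects and scores. Each category gets a slot at 0, 1, 2, and the two bars are shifted half a bar width to the left and right of that slot; NumPy's arange makes the shifting a one-line array operation.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical example data for Class A and Class B
subjects = ["Math", "Science", "History"]
class_a = [78, 85, 69]
class_b = [82, 74, 88]

positions = np.arange(len(subjects))  # one slot per subject: 0, 1, 2
width = 0.4  # two bars of width 0.4 fit side by side in each slot

plt.figure()
bars_a = plt.bar(positions - width / 2, class_a, width, label="Class A")
bars_b = plt.bar(positions + width / 2, class_b, width, label="Class B")
plt.xticks(positions, subjects)  # put the subject names under each pair
plt.ylabel("Score")
plt.title("Class A vs Class B (Example Data)")
plt.legend()
plt.show()
```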
These exercises will help you get comfortable turning data into clear visual stories using matplotlib.