16.5 Data visualization with matplotlib

Why visualization matters in data science

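Visualization turns raw numbers into pictures you can understand at a glance. With graphs, you can spot trends over time, compare categories, see how values are distributed, and explore relationships between variables.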

In Python, one of the most common libraries for plotting is matplotlib, usually used through its pyplot module.

Getting started with matplotlib

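To follow along, you should already know how to run a Python script or notebook cell, and you need matplotlib installed (for example, with pip install matplotlib).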

Most examples will use this import style:

python
import matplotlib.pyplot as plt

This gives you access to plotting functions through plt.

A basic pattern you’ll see a lot:

python
import matplotlib.pyplot as plt
# 1. Prepare your data
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
# 2. Create a plot
plt.plot(x, y)
# 3. Show the plot window
plt.show()

plt.show() opens a window (or displays the figure in a notebook) with your chart.

Basic plot types

Line plots

Line plots are useful for showing how something changes, often over time.

python
import matplotlib.pyplot as plt
days = [1, 2, 3, 4, 5, 6, 7]
temperatures = [21, 23, 20, 22, 24, 25, 23]
plt.plot(days, temperatures)       # x, y
plt.xlabel("Day")
plt.ylabel("Temperature (°C)")
plt.title("Temperature Over One Week")
plt.show()

You can customize the line style and color:

python
plt.plot(days, temperatures, color="red", linestyle="--", marker="o")

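Common options:

  • color: a color name such as "red" or "blue", or a hex code such as "#ff8800"
  • linestyle: "-" (solid), "--" (dashed), ":" (dotted), "-." (dash-dot)
  • marker: "o" (circle), "s" (square), "x" (cross), "^" (triangle)
  • linewidth: line thickness in points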
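
As a sketch of how these styling options combine, the snippet below restyles the weekly temperature line with several keyword arguments at once. The particular color, marker, and sizes are arbitrary choices for illustration, and markersize is an extra option not used elsewhere in this chapter.

```python
import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5, 6, 7]
temperatures = [21, 23, 20, 22, 24, 25, 23]

# plt.plot returns a list with one Line2D object per plotted line
(line,) = plt.plot(
    days,
    temperatures,
    color="green",   # named color
    linestyle=":",   # dotted line
    marker="s",      # square markers
    linewidth=2,     # line thickness in points
    markersize=8,    # marker size in points
)
plt.xlabel("Day")
plt.ylabel("Temperature (°C)")
plt.title("Styled Temperature Line")
# Call plt.show() here to display the figure.
```

Keeping a reference to the returned line object is optional, but it lets you inspect or change the style later (for example, with line.set_color("blue")).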

Scatter plots

Scatter plots show individual points. They’re useful for exploring relationships between two variables.

python
import matplotlib.pyplot as plt
hours_studied = [1, 2, 3, 4, 5, 6]
exam_score =    [50, 55, 65, 70, 75, 85]
plt.scatter(hours_studied, exam_score)
plt.xlabel("Hours Studied")
plt.ylabel("Exam Score")
plt.title("Study Time vs Exam Score")
plt.show()

You can change marker style and color:

python
plt.scatter(hours_studied, exam_score, color="purple", marker="x")
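
Scatter plots can also encode a third variable through marker color. The sketch below assumes a made-up hours_slept list (it is not part of the original data) and uses the s, c, and cmap parameters of plt.scatter together with plt.colorbar.

```python
import matplotlib.pyplot as plt

hours_studied = [1, 2, 3, 4, 5, 6]
exam_score = [50, 55, 65, 70, 75, 85]
hours_slept = [5, 6, 6, 7, 8, 8]  # hypothetical third variable, made up for this example

# s sets the marker size, c maps one number per point onto a colormap
points = plt.scatter(hours_studied, exam_score, s=80, c=hours_slept, cmap="viridis")

# The colorbar explains what the colors mean
colorbar = plt.colorbar(points)
colorbar.set_label("Hours Slept")

plt.xlabel("Hours Studied")
plt.ylabel("Exam Score")
plt.title("Study Time vs Exam Score (colored by sleep)")
# Call plt.show() here to display the figure.
```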

Bar charts

Bar charts compare values across categories, so the x-axis shows labels instead of numbers.

python
import matplotlib.pyplot as plt
languages = ["Python", "JavaScript", "C++", "Java"]
popularity = [85, 75, 60, 70]   # Made-up scores
plt.bar(languages, popularity)
plt.xlabel("Programming Language")
plt.ylabel("Popularity Score")
plt.title("Programming Language Popularity (Example Data)")
plt.show()

You can change color, width, and orientation:

python
plt.bar(languages, popularity, color="orange", width=0.5)   # default width is 0.8

Horizontal bar chart:

python
plt.barh(languages, popularity)   # Note: barh = horizontal bar chart
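
The width parameter becomes especially useful when you want two sets of bars side by side. The following is a minimal sketch of one common approach: place the bars at numeric positions, shift each group by half a bar width, then relabel the ticks. The second set of scores and the "Survey A" / "Survey B" labels are invented for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

languages = ["Python", "JavaScript", "C++", "Java"]
scores_a = [85, 75, 60, 70]   # made-up scores from the example above
scores_b = [80, 78, 62, 72]   # second made-up set, for illustration only

positions = np.arange(len(languages))  # numeric x positions: 0, 1, 2, 3
width = 0.4                            # each bar takes 0.4 of a slot

# Shift each group left or right by half a bar width
bars_a = plt.bar(positions - width / 2, scores_a, width=width, label="Survey A")
bars_b = plt.bar(positions + width / 2, scores_b, width=width, label="Survey B")

# Replace the numeric ticks with the category names
plt.xticks(positions, languages)
plt.ylabel("Popularity Score")
plt.title("Grouped Bar Chart (Example Data)")
plt.legend()
# Call plt.show() here to display the figure.
```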

Histograms

Histograms show the distribution of a numeric variable: how often values fall into ranges (bins).

python
import matplotlib.pyplot as plt
ages = [18, 19, 21, 22, 23, 19, 34, 45, 29, 31, 22, 25, 26, 27, 30, 32, 40]
plt.hist(ages, bins=5)   # Try changing number of bins
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.title("Age Distribution")
plt.show()

The bins parameter controls how many bars (ranges) the histogram will have.
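
Besides a number, bins also accepts a list of bin edges, and plt.hist returns the counts it computed. The sketch below uses decade-wide edges chosen for illustration and prints how many ages fall into each range.

```python
import matplotlib.pyplot as plt

ages = [18, 19, 21, 22, 23, 19, 34, 45, 29, 31, 22, 25, 26, 27, 30, 32, 40]

# Explicit edges: 10-20, 20-30, 30-40, 40-50 (the last bin includes its right edge)
counts, bin_edges, _ = plt.hist(ages, bins=[10, 20, 30, 40, 50], edgecolor="black")

# Print the number of values in each bin
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:.0f}-{right:.0f}: {int(count)}")

plt.xlabel("Age")
plt.ylabel("Frequency")
plt.title("Age Distribution by Decade")
# Call plt.show() here to display the figure.
```

For this data the four counts are 3, 8, 4, and 2, which add up to the 17 values in the list.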

Customizing plots

Good plots are not just correct—they’re readable and clear. Here are some simple customizations.

Titles and axis labels

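Use:

  • plt.title() for the title at the top of the chart
  • plt.xlabel() for the x-axis label
  • plt.ylabel() for the y-axis label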

Example:

python
plt.plot([1, 2, 3], [3, 5, 7])
plt.title("Simple Line")
plt.xlabel("X value")
plt.ylabel("Y value")
plt.show()

Legends

Legends explain different lines or markers in one figure.

python
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y1 = [2, 4, 6, 8]
y2 = [1, 3, 5, 7]
plt.plot(x, y1, label="Group A")
plt.plot(x, y2, label="Group B")
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Two Lines on One Plot")
plt.legend()   # Show legend
plt.show()

You can position the legend:

python
plt.legend(loc="upper left")  # or "upper right", "lower right", "lower left", etc.

Colors and styles

To keep multiple lines distinct:

python
plt.plot(x, y1, color="blue", linestyle="-", marker="o", label="Blue line")
plt.plot(x, y2, color="red", linestyle="--", marker="s", label="Red dashed line")

You can also use short format strings:

python
plt.plot(x, y1, "bo-")  # blue, circle markers, solid line
plt.plot(x, y2, "r--")  # red, dashed line
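
A format string is shorthand for the color, marker, and linestyle keyword arguments. As a small check of that equivalence, the sketch below draws one line each way and compares the resulting styles. The "g^:" string (green, triangle markers, dotted line) is just an example choice.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

x = [1, 2, 3, 4]
y1 = [2, 4, 6, 8]
y2 = [1, 3, 5, 7]

# Short form: color "g", marker "^", linestyle ":"
(short_line,) = plt.plot(x, y1, "g^:")

# Long form: the same style spelled out with keyword arguments
(long_line,) = plt.plot(x, y2, color="g", marker="^", linestyle=":")

# Both lines end up with the same color, marker, and linestyle
same_style = (
    to_rgba(short_line.get_color()) == to_rgba(long_line.get_color())
    and short_line.get_marker() == long_line.get_marker()
    and short_line.get_linestyle() == long_line.get_linestyle()
)
print(same_style)
```

Format strings are quick to type, while keyword arguments tend to be easier to read when you come back to the code later.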

Figure size

Change the size of the whole figure:

python
plt.figure(figsize=(8, 4))  # width, height in inches
plt.plot(x, y1)
plt.title("Custom Size")
plt.show()

Call plt.figure() before plotting. Calling it afterwards creates a new, empty figure instead of resizing the one you already drew on.
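
Because figsize is given in inches, the size in pixels depends on the resolution (dpi). The sketch below works through that arithmetic for an 8 by 4 inch figure. The dpi value of 100 is an assumption chosen to keep the numbers simple.

```python
import matplotlib.pyplot as plt

width_in, height_in = 8, 4   # figure size in inches
dpi = 100                    # dots (pixels) per inch, chosen for illustration

fig = plt.figure(figsize=(width_in, height_in), dpi=dpi)
plt.plot([1, 2, 3, 4], [2, 4, 6, 8])
plt.title("Custom Size")

# Pixel dimensions are inches multiplied by dpi
width_px = width_in * dpi
height_px = height_in * dpi
print(width_px, height_px)
# Call plt.show() here to display the figure.
```

With these values the figure is roughly 800 by 400 pixels. Doubling dpi doubles both pixel dimensions without changing the size in inches.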

Using matplotlib with NumPy and pandas

In data science, you often load and manipulate data with NumPy and pandas, then plot it with matplotlib.

Plotting a NumPy array

python
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 100)   # 100 points from 0 to 10
y = np.sin(x)
plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.title("Sine Wave")
plt.show()

Plotting a pandas DataFrame

pandas integrates nicely with matplotlib. A simple example:

python
import pandas as pd
import matplotlib.pyplot as plt
data = {
    "day": [1, 2, 3, 4, 5],
    "steps": [3000, 5000, 7000, 6500, 8000]
}
df = pd.DataFrame(data)
plt.plot(df["day"], df["steps"])
plt.xlabel("Day")
plt.ylabel("Steps")
plt.title("Daily Step Count")
plt.show()

pandas also has its own plotting methods (which use matplotlib under the hood), but here we focus on using plt directly.
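
For comparison only, here is a minimal sketch of the same step-count chart drawn with the DataFrame's own plot method. It returns a matplotlib Axes object, so labels are set with the Axes methods set_xlabel, set_ylabel, and set_title rather than the plt functions.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "day": [1, 2, 3, 4, 5],
    "steps": [3000, 5000, 7000, 6500, 8000],
})

# DataFrame.plot draws with matplotlib and returns the Axes it drew on
ax = df.plot(x="day", y="steps", marker="o", legend=False)
ax.set_xlabel("Day")
ax.set_ylabel("Steps")
ax.set_title("Daily Step Count")
# Call plt.show() here to display the figure.
```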

Subplots: multiple charts in one figure

Subplots let you show several related charts side by side in a single figure.

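Use plt.subplot(rows, columns, index) where:

  • rows is the number of rows in the grid
  • columns is the number of columns in the grid
  • index is the position of the current subplot, counting from 1, left to right and then top to bottom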

Example with 2 subplots (1 row, 2 columns):

python
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y1 = [1, 4, 9, 16, 25]
y2 = [1, 8, 27, 64, 125]
plt.figure(figsize=(10, 4))
# Left plot
plt.subplot(1, 2, 1)
plt.plot(x, y1)
plt.title("Squares")
# Right plot
plt.subplot(1, 2, 2)
plt.plot(x, y2)
plt.title("Cubes")
plt.tight_layout()  # Adjust layout to avoid overlap
plt.show()

tight_layout() helps prevent titles and labels from overlapping.
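
An alternative you will often see in other tutorials is plt.subplots() (note the extra "s"), which creates the figure and all of its axes in one call. This sketch redraws the squares and cubes example that way. It is a matter of style rather than a replacement for plt.subplot().

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y1 = [1, 4, 9, 16, 25]
y2 = [1, 8, 27, 64, 125]

# One row, two columns: returns the figure and one Axes object per subplot
fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))

# Each Axes has its own plot and set_title methods
left.plot(x, y1)
left.set_title("Squares")

right.plot(x, y2)
right.set_title("Cubes")

fig.tight_layout()  # Adjust layout to avoid overlap
# Call plt.show() here to display the figure.
```

Working with named Axes objects makes it explicit which subplot each command applies to, which helps once a figure has more than two panels.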

Saving plots to files

Instead of (or in addition to) showing a figure on screen, you can save it to a file with plt.savefig().

python
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [2, 3, 5, 7]
plt.plot(x, y)
plt.title("Example Plot")
plt.savefig("example_plot.png")  # Save as PNG
plt.show()

Supported formats include .png, .jpg, .pdf, .svg, etc. The format is chosen from the file extension.

You can control the resolution with dpi:

python
plt.savefig("example_plot_highres.png", dpi=300)
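
To see the file extension and dpi options working together, the sketch below saves one figure as both PNG and SVG and then checks that the files exist. The temporary directory and the bbox_inches="tight" option (which trims extra whitespace around the plot) are choices made for this example.

```python
import os
import tempfile
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [2, 3, 5, 7]
plt.plot(x, y)
plt.title("Example Plot")

# Write into a temporary directory so the example leaves no files in your project
output_dir = tempfile.mkdtemp()
png_path = os.path.join(output_dir, "example_plot.png")
svg_path = os.path.join(output_dir, "example_plot.svg")

plt.savefig(png_path, dpi=150, bbox_inches="tight")  # raster image, dpi applies
plt.savefig(svg_path)                                # vector image, scales freely
plt.close()                                          # free the figure when done

print(os.path.exists(png_path), os.path.exists(svg_path))
```

In a script, call plt.savefig() before plt.show(), as in the example above. Once the window opened by show() is closed, the figure is usually gone and a later savefig() may write an empty image.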

Common beginner pitfalls with matplotlib

  • Overlapping labels or titles: let matplotlib adjust the spacing for you:

python
  plt.tight_layout()

or rotate labels:

python
  plt.xticks(rotation=45)
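
Both fixes can be combined. The sketch below uses invented category names that are long enough to collide, rotates the tick labels, and then calls tight_layout() so the rotated text still fits inside the figure. The ha="right" argument (horizontal alignment) is an optional extra that lines up the end of each label with its tick.

```python
import matplotlib.pyplot as plt

# Made-up categories with deliberately long names
categories = ["Customer support tickets", "Feature requests received", "Bug reports confirmed"]
values = [120, 45, 80]

plt.figure(figsize=(6, 4))
plt.bar(categories, values)

# Rotate the existing tick labels; xticks returns the tick locations and label objects
locations, labels = plt.xticks(rotation=45, ha="right")

plt.ylabel("Count")
plt.title("Long Category Labels")
plt.tight_layout()  # make room for the rotated labels
# Call plt.show() here to display the figure.
```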

Small practice ideas

Here are a few mini tasks to practice:

  1. Line chart of monthly expenses
    • Create a list of 12 numbers (one for each month).
    • Plot them as a line chart with appropriate labels and title.
  2. Histogram of random data
    • Use NumPy or Python’s random module to generate 100 random integers.
    • Plot a histogram of the values.
  3. Comparing two categories
    • Create two lists of numbers (e.g., scores of Class A and Class B).
    • Plot them as side-by-side bar charts or as two lines on the same plot with a legend.
  4. Multiple subplots
    • In one figure, create:
      • A line plot in the first subplot
      • A histogram in the second
    • Use plt.subplot() and plt.tight_layout().

These exercises will help you get comfortable turning data into clear visual stories using matplotlib.
