Kahibaro

Disk and I/O monitoring

Why Disk and I/O Monitoring Matters

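Disk and I/O (input/output) performance often becomes a bottleneck before CPU or RAM does. Slow disks can cause:

  • A system that feels slow and shows a high load average even though the CPU is mostly idle
  • Latency spikes in databases and other applications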

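Monitoring disk and I/O helps you answer:

  • Is disk I/O the bottleneck?
  • Which device or filesystem is busy or full?
  • Which processes are causing the I/O?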

This chapter focuses on the main tools and metrics used to monitor disk and I/O on Linux.

Key Disk and I/O Metrics

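You’ll see these metrics across most tools:

  • IOPS: read and write operations per second (r/s, w/s)
  • Throughput: data read and written per second (for example kB/s or MB/s)
  • Latency: average time a request takes to complete (await, in milliseconds)
  • Utilization: percentage of time the device was busy handling requests (%util)
  • Queue size: average number of requests waiting or in flight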

Tools Overview

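You’ll use a mix of:

  • Device-level tools: iostat, vmstat, dstat, atop, sar
  • Per-process tools: iotop, pidstat
  • Capacity tools: df, du
  • Raw kernel interfaces: /proc/diskstats, /sys/block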

We’ll focus on practical usage of the most common tools.

Device and Mount Layout Basics for Monitoring

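To interpret output correctly, you need to recognize:

  • Block devices (whole disks such as sda or nvme0n1)
  • Partitions on those devices (such as sda1)
  • Mount points, which show where each filesystem is attached (such as /var)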

Useful quick commands:

lsblk      # Block devices, partitions, mount points
lsblk -f   # + filesystem type, labels, UUIDs
findmnt    # Show mounts in a tree form

These help map “which disk” to “which filesystem” when you see them in monitoring tools.

Monitoring with iostat (sysstat)

iostat is one of the core disk monitoring tools. It comes from the sysstat package (install it if missing).

Basic usage:

iostat

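This prints CPU stats and basic device stats averaged since boot. For real monitoring you usually:

  • Add an interval in seconds, so each later report covers just that interval (the first report still shows averages since boot)
  • Add -x for extended per-device statistics
  • Optionally name a device to limit the output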

Common patterns:

# Extended stats for all devices every 2 seconds
iostat -x 2
# Extended stats for one device
iostat -x 2 /dev/sda
# Human-readable bytes (on some distros)
iostat -h -x 2

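Key columns in iostat -x (names vary slightly between sysstat versions):

  • r/s, w/s: read and write requests completed per second (IOPS)
  • rkB/s, wkB/s: kilobytes read and written per second (throughput)
  • r_await, w_await (a single await column in older versions): average time in milliseconds a request takes, including time spent queued
  • aqu-sz (avgqu-sz in older versions): average request queue length
  • %util: percentage of time the device was busy; values near 100% suggest saturation on a single disk, but can overstate load on SSDs and RAID arrays that serve requests in parallel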

When troubleshooting:

  1. Run iostat -x 2 during the slowdown.
  2. Look for disks with:
    • High %util
    • High await
    • High r/s or w/s (IOPS) and/or high throughput
  3. Map the busy device to a filesystem with lsblk/findmnt.
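
The second step above can be partly automated. The following is a minimal sketch, not part of sysstat: flag_busy_devices is a hypothetical helper that reads iostat -x output on standard input and prints each device whose %util exceeds a threshold (80 by default, an arbitrary choice). It looks up the %util column by name, on the assumption that column order differs between sysstat versions.

```shell
# flag_busy_devices [THRESHOLD]
# Hypothetical helper: reads `iostat -x` output on stdin and prints
# "<device> <%util>" for every row whose %util is above THRESHOLD.
flag_busy_devices() {
  awk -v limit="${1:-80}" '
    $1 ~ /^Device/ {                  # header row: find the %util column
      col = 0
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    NF == 0 { col = 0; next }         # a blank line ends the device table
    col && ($col + 0) > (limit + 0) { print $1, $col }
  '
}

# Example (assumes a locale that uses "." as the decimal separator):
#   LC_ALL=C iostat -x 2 | flag_busy_devices 80
```

Each report that iostat prints is checked separately, so a device appears once per interval for as long as it stays above the threshold.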

Monitoring with vmstat

vmstat focuses on memory and virtual memory but has an I/O section too. It’s often used as a lightweight first look.

Basic:

vmstat 2

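Key I/O columns:

  • bi: blocks received from block devices per second (reads)
  • bo: blocks sent to block devices per second (writes)
  • wa: percentage of CPU time spent waiting for I/O
  • b: number of processes blocked in uninterruptible sleep, often waiting on I/O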

These are coarse but useful for seeing whether disk is moving data at all, and how changes (e.g. starting a backup job) affect I/O rates over time.
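
To watch only the I/O-related columns, the output can be filtered. This is a rough sketch under the assumption that vmstat prints its usual two header rows; vmstat_io is a hypothetical helper name, and it finds columns by header name rather than by position.

```shell
# vmstat_io
# Hypothetical helper: reads `vmstat` output on stdin and prints only the
# blocked-process count (b), block reads/writes (bi, bo) and I/O wait (wa).
vmstat_io() {
  awk '
    $1 == "procs" { next }                       # group header row
    $1 == "r" && $2 == "b" {                     # column-name header row
      for (i = 1; i <= NF; i++) idx[$i] = i
      next
    }
    ("bi" in idx) {
      printf "blocked=%s bi=%s bo=%s wa=%s\n",
             $(idx["b"]), $(idx["bi"]), $(idx["bo"]), $(idx["wa"])
    }
  '
}

# Example:
#   vmstat 2 | vmstat_io
```

A steadily high wa together with a non-zero blocked count is a hint to move on to iostat for per-device detail.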

Using dstat and atop for Combined Views

dstat

dstat (if installed) gives a more customizable live view.

dstat -d -D sda,sdb 1      # Disk stats only, for sda and sdb
dstat -cdngy 1            # CPU, disk, net, paging, system

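Helpful flags:

  • -d: disk read/write totals
  • -D sda,sdb: limit disk stats to the listed devices
  • -c, -n, -g, -y: CPU, network, paging, and system (interrupts, context switches) stats
  • The trailing number is the refresh interval in seconds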

It shows per-second rates, which makes trends easier to see than cumulative counters.

atop

atop is an advanced monitoring tool that can show per-process disk usage and can also log to a file over time.

atop

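Look for the DSK section for device-level load, and (in some versions) per-process I/O statistics. You can:

  • Press d to switch the process list to disk-related columns
  • Record samples to a file with atop -w <file> <interval>
  • Replay a recorded file later with atop -r <file>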

It’s especially useful on servers for long-term performance analysis (when used with its logging mode).

Finding I/O-Heavy Processes with iotop and pidstat

When you know disks are busy, the next question is “who is causing this?”

iotop

iotop shows I/O usage by process/thread. You usually need root (or sudo) to see full details.

Install from your distro repo, then:

sudo iotop

or, to show only the processes and threads that are actually doing I/O:

sudo iotop -o

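Key options:

  • -o: show only processes or threads actually doing I/O
  • -a: show accumulated I/O since iotop started instead of current bandwidth
  • -P: show processes only, not individual threads
  • -b: batch (non-interactive) mode, useful for logging
  • -d <seconds>: delay between refreshes

Important columns (names may vary slightly):

  • DISK READ, DISK WRITE: current read and write bandwidth of the process
  • SWAPIN: percentage of time spent swapping in
  • IO>: percentage of time spent waiting on I/O
  • COMMAND: the process or thread name

Use it to identify:

  • Which processes are reading or writing the most
  • Background jobs (backups, log rotation, find jobs) competing for the same disk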

pidstat (from sysstat)

pidstat can show per-process I/O over time. For example:

pidstat -d 2

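Columns (may vary):

  • kB_rd/s: kilobytes the process read from disk per second
  • kB_wr/s: kilobytes the process wrote to disk per second
  • kB_ccwr/s: kilobytes of writes cancelled per second (for example when dirty page cache is truncated)
  • iodelay: block I/O delay of the process, in clock ticks (newer versions)
  • Command: the process name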

You can also monitor a single process:

pidstat -d -p <PID> 1

This is handy when you already suspect one application and want to quantify its disk usage.

Checking Space Usage: df and du

Performance and capacity are linked: a nearly full filesystem can slow down writes and cause application failures.

df: filesystem-level usage

df -h

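Shows:

  • Filesystem: the device or source
  • Size, Used, Avail: total, used, and available space
  • Use%: percentage of space used
  • Mounted on: the mount point

Points to watch:

  • Filesystems with Use% close to 100%
  • Avail can be smaller than Size minus Used, because some filesystems (ext4, for example) reserve blocks for root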

You can filter specific filesystems:

df -h /var
df -h /home

du: directory-level usage

To find which directories use the most space:

du -sh *        # in current directory
du -sh /var/*   # biggest users in /var

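Options:

  • -s: print one summary total per argument
  • -h: human-readable sizes
  • --max-depth=N: report directories at most N levels deep
  • -x: stay on one filesystem (skip other mounts)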

Example:

sudo du -h --max-depth=1 /var | sort -h

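This helps track down:

  • Growing log directories (such as /var/log)
  • Cache directories
  • Application-specific data directories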
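
For repeated use, the du and sort pipeline can be wrapped in a small function. This is only a sketch: top_dirs is a hypothetical name, and it assumes GNU du (for --max-depth). It reports sizes in KiB rather than human-readable units so that the numeric sort is exact.

```shell
# top_dirs DIR [COUNT]
# Hypothetical helper: lists the COUNT largest entries one level below DIR
# (plus DIR itself), largest first, sizes in KiB. -x stays on one filesystem.
top_dirs() {
  du -xk --max-depth=1 "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Example:
#   sudo sh -c '. ./top_dirs.sh; top_dirs /var 5'
```

The first line of the output is the total for DIR itself; the lines after it are its largest subdirectories.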

Monitoring Inodes

On some filesystems, you can run out of inodes (maximum number of files) even if there is free space.

Check inode usage:

df -i

Look at the IUse% column. A filesystem that is 100% full on inodes cannot create more files, even if Use% (space) is lower.

This is common when many small files are created (mail spools, caches, temporary files).
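
Both checks, space and inodes, can be scripted the same way, because df -P and df -Pi put the usage percentage in the fifth column. The sketch below is an illustration rather than a standard tool: over_threshold is a hypothetical helper, the default limit of 90 is arbitrary, and mount points containing spaces are not handled.

```shell
# over_threshold [PERCENT]
# Hypothetical helper: reads `df -P` (space) or `df -Pi` (inodes) output on
# stdin and prints "<mount point> <usage>" for rows at or above PERCENT.
over_threshold() {
  awk -v limit="${1:-90}" '
    NR == 1 { next }                         # skip the header row
    { pct = $5; sub(/%$/, "", pct) }         # strip the trailing % sign
    (pct ~ /^[0-9]+$/) && (pct + 0 >= limit + 0) { print $6, $5 }
  '
}

# Examples:
#   df -P  | over_threshold 90    # filesystems short on space
#   df -Pi | over_threshold 90    # filesystems short on inodes
```

Rows where df prints "-" instead of a percentage (filesystems that do not report inode counts) are skipped.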

Device-Level Stats from /proc and /sys

Many monitoring tools read from /proc and /sys. You can inspect them directly for custom scripts.

/proc/diskstats

cat /proc/diskstats

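Each line corresponds to a device or partition, with fields like:

  • Major and minor device numbers, then the device name
  • Reads completed, reads merged, sectors read, milliseconds spent reading
  • Writes completed, writes merged, sectors written, milliseconds spent writing
  • I/Os currently in progress, milliseconds spent doing I/O, weighted milliseconds spent doing I/O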

These are cumulative counters since boot. Tools like iostat simply sample this repeatedly and compute per-second differences.
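
The sample-and-subtract idea can be sketched in a few lines of shell. This is a simplified illustration, not how iostat is actually implemented: diskstats_rates is a hypothetical helper, the device name sda and the temporary file paths in the example are assumptions, and counter wrap-around is ignored. Sector counts in /proc/diskstats are in 512-byte units, so dividing by 2 gives KiB.

```shell
# diskstats_rates BEFORE AFTER SECONDS DEVICE
# Hypothetical helper: BEFORE and AFTER are two saved copies of
# /proc/diskstats taken SECONDS apart. Prints per-second rates for DEVICE.
diskstats_rates() {
  awk -v dev="$4" -v secs="$3" '
    $3 != dev { next }                       # only the requested device
    FILENAME == ARGV[1] {                    # first snapshot: remember counters
      r0 = $4; rs0 = $6; w0 = $8; ws0 = $10
      next
    }
    {                                        # second snapshot: print deltas
      printf "%s r/s=%.1f w/s=%.1f rkB/s=%.1f wkB/s=%.1f\n", dev,
             ($4 - r0) / secs, ($8 - w0) / secs,
             ($6 - rs0) / 2 / secs, ($10 - ws0) / 2 / secs
    }
  ' "$1" "$2"
}

# Example:
#   cat /proc/diskstats > /tmp/ds.before; sleep 5
#   cat /proc/diskstats > /tmp/ds.after
#   diskstats_rates /tmp/ds.before /tmp/ds.after 5 sda
```

Fields 4 and 8 are reads and writes completed, and fields 6 and 10 are sectors read and written, counting the major number as field 1.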

/sys/block

ls /sys/block

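Per-device directories (e.g. /sys/block/sda) contain:

  • stat: the same counters as the device's line in /proc/diskstats
  • size: device size in 512-byte sectors
  • queue/scheduler: the available I/O schedulers, with the active one in brackets
  • queue/rotational: 1 for rotational disks (HDD), 0 for non-rotational devices (SSD)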

This is more advanced, but it is useful to know it exists for deeper investigation or scripting.
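
As one scripting example, the kernel's rotational flag under /sys/block gives a quick hint about whether a device behaves like an HDD or an SSD. The sketch below uses a hypothetical helper name, disk_kind. Its optional second argument (an alternative sysfs root) exists only to make the function easy to try against a test directory. Virtual disks and RAID controllers may report either value, so treat the result as a hint.

```shell
# disk_kind DEVICE [SYSFS_ROOT]
# Hypothetical helper: prints "rotational", "non-rotational" or "unknown"
# based on <SYSFS_ROOT>/<DEVICE>/queue/rotational (default root: /sys/block).
disk_kind() {
  case "$(cat "${2:-/sys/block}/$1/queue/rotational" 2>/dev/null)" in
    1) echo "rotational" ;;          # typically an HDD
    0) echo "non-rotational" ;;      # typically an SSD or NVMe device
    *) echo "unknown" ;;             # file missing or unexpected content
  esac
}

# Example:
#   for d in /sys/block/*; do
#     printf '%s: %s\n' "${d##*/}" "$(disk_kind "${d##*/}")"
#   done
```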

Historical I/O Data with sar

sar (also from sysstat) can collect and display historical disk and I/O metrics, if the sysstat service/cron is enabled.

To view live or historical device activity:

sar -d 1 3          # live, like iostat
sar -d -f /var/log/sysstat/sa10   # historical, file name varies by distro/date

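Key columns:

  • tps: transfers (I/O requests) per second
  • rkB/s, wkB/s: kilobytes read and written per second (rd_sec/s and wr_sec/s in older versions)
  • await: average request time in milliseconds, including queue time
  • %util: percentage of time the device was busy

This is particularly useful when:

  • The slowdown happened earlier (for example overnight) and nobody was watching
  • You want to compare current I/O against a normal baseline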

Simple Workflows for Common Scenarios

Scenario 1: System feels slow, high load average

  1. Check if load is I/O-related:
    • top or uptime for load
    • vmstat 2 for bi/bo
  2. If I/O is active, use:
    • iostat -x 2 to see which device is saturated
    • df -h to ensure the filesystem is not full
  3. Use iotop -o or pidstat -d 2 to see which processes are causing heavy I/O.

Scenario 2: Database or application latency spikes

  1. Run iostat -x 2 and watch await and %util on the disk(s) where data is stored.
  2. If these are high, use iotop/pidstat to see whether:
    • The database itself is doing heavy I/O
    • Another process (backup, log rotation, find job) is competing for the same disk
  3. Consider whether the data is on HDD vs SSD, and if the workload pattern (random vs sequential) is stressing the device.

Scenario 3: Disk unexpectedly full, causing failures

  1. df -h to find full or near-full filesystems.
  2. du -sh /* and then drill down:
    • du -sh /var/*
  3. Identify growth in:
    • Log directories (/var/log)
    • Cache directories
    • Application-specific data directories
  4. Remove/rotate/compress data as appropriate, or move it to another volume.

Intro to I/O Performance Characteristics (HDD vs SSD)

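Understanding rough differences helps interpret numbers:

  • HDDs have moving parts, so every random access pays a seek cost; they handle sequential I/O much better than random I/O
  • SSDs have no seek cost, so they sustain far more IOPS at much lower latency, for both random and sequential I/O

This means:

  • An await of several milliseconds can be normal on an HDD but usually signals trouble on an SSD
  • A random-access workload can saturate an HDD at an IOPS rate an SSD would handle easily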

Basic Monitoring Tips and Practices

By combining these tools and metrics, you can quickly determine whether disk and I/O are your bottleneck, and identify the processes and filesystems involved.
