3.5 System Monitoring

Why System Monitoring Matters

Monitoring is about continuously observing your system so you can:

Detect problems early (high CPU, low disk space, failing services)
Understand “normal” behavior (baseline)
Troubleshoot issues faster (slowness, crashes, network problems)
Plan capacity (when to add CPU, RAM, storage)

For an administrator, monitoring is not optional—it’s how you keep systems healthy and predictable.

Key ideas that apply throughout this chapter:

Metrics vs logs: metrics are numeric (CPU %, memory usage); logs are detailed events/messages. Monitoring usually starts with metrics and then uses logs to investigate.
Real‑time vs historical: some tools show what’s happening now; others help you look back in time.
Local vs centralized: you can monitor a single machine locally, or send data to a central server for many machines.

The rest of this chapter introduces practical tools and basic workflows for monitoring a Linux system.

Types of Monitoring

System monitoring usually covers:

Resource monitoring

CPU, memory, swap
Disk space and I/O
Network throughput and connections

Service monitoring

Is the service running?
Is it responding correctly/quickly?

Performance monitoring

Which processes are heavy?
Where is the bottleneck (CPU, RAM, disk, network)?

In later sections of this part, you’ll go deeper into specific areas (CPU, memory, disk, services). Here we’ll build a general toolkit and mindset.

Core Monitoring Tools Overview

Linux provides a set of command‑line tools that almost every admin uses:

top, htop: real-time view of CPU, memory, processes
uptime: how long the system has been running, plus load averages
free, vmstat: memory and virtual memory statistics
iostat, iotop, df, du: disk usage and disk I/O (iostat usually comes from the sysstat package; iotop is often a separate package)
ps, pgrep, pidof: process listings and search
ss, ip, ping: basic network state
journalctl, /var/log/*: logs (covered in more detail in the log-specific chapter)
systemctl: service status (covered in the systemd chapter, but used heavily in monitoring)

You don’t have to memorize all of them immediately, but you should know which tool to reach for in common situations.

Real‑Time Interactive Monitors

`top`

top shows a live view of CPU, memory, and processes.

Run:

top

The display has two main parts:

Summary area (top lines)

Load averages, uptime
Number of tasks and their states
Overall CPU usage
Memory and swap usage

Process list

PID, USER, %CPU, %MEM, TIME+, COMMAND, etc.

Useful keys while top is running:

q – quit
P – sort by CPU usage
M – sort by memory usage
T – sort by total CPU time
k – kill a process (you enter PID and signal)
u – filter by user
h – help

Common use cases:

“Why is CPU so high?” → run top, press P, check the first few processes.
“Which processes are using most RAM?” → run top, press M.
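top can also run non-interactively: -b (batch mode) prints plain text instead of the interactive screen, and -n sets how many refreshes to print. That makes it easy to save what top saw for later. A minimal sketch, assuming the procps-ng top found on most Linux distributions; top_snapshot and its default of 20 lines are a hypothetical helper and choice, not a standard tool.

```shell
# Hypothetical helper: print the first N lines of a single batch-mode top refresh.
# -b = batch mode (plain text, no interactive screen), -n 1 = one refresh only.
top_snapshot() {
    lines="${1:-20}"   # summary area plus the busiest processes
    top -b -n 1 | head -n "$lines"
}
```

For example, top_snapshot 30 > top-now.txt saves the summary area and the top of the process list to a file.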

`htop` (if installed)

htop is a friendlier alternative to top, with colors, bars, and scrolling.

Run:

htop

Advantages compared to top:

Color bar graphs for each CPU, memory, and swap
Scrollable process list
Tree view of processes (parent/child relationships)
Easy mouse interaction

Common keys:

F6 – change sort column
F3 – search process by name
F5 – tree view on/off
F9 – kill selected process
F10 – quit

On most systems you need to install it (htop package) using your distribution’s package manager.

One‑Shot Snapshots

Interactive tools are useful when you are actively investigating. For scripts or quick checks, use one‑shot commands.

`ps` – process listing

ps prints information about processes at the moment you run it.

Some common forms:

ps aux – “BSD style” full list:

a – processes of all users that have a terminal (not just your own)
u – show user‑oriented output
x – include processes without a terminal

ps -ef – “UNIX style” full list

Examples:

# All processes, human‑friendly view
ps aux
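# Filter processes by name (the grep process itself usually shows up too)
ps aux | grep sshd
# Alternative without the grep noise: matching PIDs plus full command lines
pgrep -a sshd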
# Show processes for current user
ps ux

Use ps when you want to combine with grep, or in scripts where interactive tools like top don’t make sense.
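In scripts it is often easier to ask ps for just the columns you need with -o and sort the result yourself. A minimal sketch: top_by_cpu is a hypothetical helper that expects output shaped like ps -eo pid,pcpu,comm (a header line, then PID, %CPU, command name) on standard input.

```shell
# Hypothetical helper: print the N processes with the highest %CPU.
# Reads "ps -eo pid,pcpu,comm" style output (header line first) from stdin.
top_by_cpu() {
    n="${1:-5}"
    # Skip the header, sort numerically on column 2 (%CPU), highest first.
    tail -n +2 | sort -k2,2 -rn | head -n "$n"
}
```

Use it as ps -eo pid,pcpu,comm | top_by_cpu 5. On procps-ng systems, ps -eo pid,pcpu,comm --sort=-pcpu | head -n 6 gives a similar result without the helper.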

`uptime` – quick health glance

uptime shows how long the system has been running and the load averages:

uptime

Example output:

 14:22:01 up 3 days,  5:17,  2 users,  load average: 0.32, 0.45, 0.40

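The last three numbers are the 1-, 5-, and 15-minute load averages: roughly, the average number of processes running or waiting to run (on Linux this also counts processes in uninterruptible sleep, usually waiting on disk I/O). Trend and comparison to the number of CPU cores matter more than the raw numbers; as a rough rule, a load that stays above the core count suggests work is queuing. This is covered in detail in the CPU/memory monitoring chapter.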
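To make the comparison with CPU cores concrete, divide the 1-minute load average by the number of cores. A minimal sketch, assuming Linux, where the first field of /proc/loadavg is the 1-minute load average and nproc (GNU coreutils) prints the number of available processing units; load_per_core and its 1.0 cut-off are a hypothetical helper and rule of thumb, not an official threshold.

```shell
# Hypothetical helper: express a load average relative to the core count.
# Usage: load_per_core LOAD CORES   e.g. load_per_core 0.32 4
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        ratio = load / cores
        # Rule of thumb (assumption): above 1.0 per core, work is queuing.
        status = (ratio > 1.0) ? "busy" : "ok"
        printf "%.2f per core (%s)\n", ratio, status
    }'
}
```

On a live system: load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)".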

Monitoring Memory Usage

You’ll go deeper into memory monitoring later; here are the basic tools.

`free`

free summarizes memory and swap usage:

free -h

The -h option makes the numbers “human readable” (MB/GB). Watch:

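used vs free (a small free value is normal: Linux uses spare RAM for disk cache, shown as buff/cache)
available – an estimate of how much memory new applications can use without swapping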

This is your first stop when the system “feels slow” or is suspected of being out of memory.
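In scripts, the available column is a better signal than free. A minimal sketch, assuming the procps-ng layout of free, where the Mem: line lists total, used, free, shared, buff/cache and available (so available is the 7th field; older versions of free print a different layout); mem_available_pct is a hypothetical helper.

```shell
# Hypothetical helper: report "available" memory as a whole-number percentage of total.
# Reads the output of "free -m" (or plain "free") from stdin.
mem_available_pct() {
    # On the Mem: line, $2 is total and $7 is available.
    awk '/^Mem:/ { printf "%d\n", $7 * 100 / $2 }'
}
```

For example, free -m | mem_available_pct prints a percentage; a script could warn when it drops below a limit you choose, such as 10.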

`vmstat`

vmstat (virtual memory statistics) gives a compact overview of CPU, memory, and I/O activity:

vmstat 1

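This prints a line every second until you stop it with Ctrl+C. The first line shows averages since boot; the following lines show activity during each interval. Important columns include:

r – runnable processes (running or waiting for CPU)
si / so – memory swapped in from / out to disk per second
bi / bo – blocks received from / sent to block devices (disk I/O)
us, sy, id, wa – user, system, idle, and I/O wait CPU percentages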

Use vmstat when you want to see, over time, whether the system is actually swapping, waiting on disk, etc.
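vmstat's first data line reports averages since boot, so a script interested in current behaviour should skip it. A minimal sketch, assuming the usual vmstat column order (r b swpd free buff cache si so bi bo in cs us sy id wa ...), so si, so and wa are fields 7, 8 and 16; vmstat_summary is a hypothetical helper that averages si + so and wa over the remaining samples.

```shell
# Hypothetical helper: average swap activity and I/O wait from "vmstat 1 N".
# Header lines are ignored; the first data line (since-boot averages) is skipped.
vmstat_summary() {
    awk '
        $1 ~ /^[0-9]+$/ {              # data lines start with the "r" count
            samples++
            if (samples == 1) next     # since-boot averages, not current activity
            swap += $7 + $8            # si + so
            wa   += $16                # percent of CPU time waiting for I/O
            n++
        }
        END {
            if (n == 0) { print "not enough samples"; exit }
            printf "si+so=%.1f wa=%.1f\n", swap / n, wa / n
        }'
}
```

For example, vmstat 1 6 | vmstat_summary skips the since-boot line and averages the next five one-second samples.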

Monitoring Disk Usage and I/O

The “Disk and I/O monitoring” chapter goes into detail; here’s the basic toolkit.

`df` – filesystem usage

df shows how full each mounted filesystem is:

df -h

Columns:

Filesystem – device or name
Size, Used, Avail
Use% – percentage used
Mounted on

If Use% is close to 100% on / or /var, you must free space or expand storage.
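df's output can be turned into a simple alert by flagging filesystems above a threshold. A minimal sketch, assuming df -P (POSIX output: one line per filesystem, use percentage in column 5, mount point in column 6); mount points containing spaces would need more careful parsing. df_over and its 90% default are hypothetical.

```shell
# Hypothetical helper: list filesystems whose use percentage is at or above a limit.
# Reads "df -P" output from stdin; prints "MOUNTPOINT USE%".
df_over() {
    limit="${1:-90}"
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                  # "93%" -> "93"
        if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

For example: df -P | df_over 85.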

`du` – where disk space is used

du summarizes how much space directories use.

Examples:

# Total size of the current directory
du -sh .
# Size of each item in the current directory (* skips hidden entries)
du -sh *

Useful for tracking down which directory (logs, cache, user data) is filling the disk.
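When a disk is filling up, sorting du's output puts the biggest items first. A minimal sketch: it uses du -sk (sizes in kilobytes) so that a plain numeric sort works everywhere (GNU sort can also order du -sh output with sort -h); biggest_in is a hypothetical helper.

```shell
# Hypothetical helper: show the N largest entries directly under DIR (sizes in KB).
# Note: "$dir"/* skips hidden entries, just like "du -sh *".
biggest_in() {
    dir="${1:-.}"
    n="${2:-10}"
    # 2>/dev/null hides "permission denied" noise for unreadable directories.
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
}
```

For example, biggest_in /var/log 5 lists the five largest entries under /var/log (run it as root to include directories your user cannot read).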

`iostat` – disk I/O

iostat (from the sysstat package on many distros) shows I/O statistics for devices.

iostat -x 1

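-x shows extended statistics; 1 means update every second. The first report covers the time since boot, so look at the later ones. Look for:

%util – percentage of time the device was busy (near 100% usually means saturated; for SSDs and RAID arrays that serve requests in parallel it is a weaker signal)
r/s, w/s – read/write requests per second
rkB/s, wkB/s – kilobytes read/written per second
r_await, w_await – average time in milliseconds for read/write requests, including time spent queued (older sysstat versions also show a combined await)

When the system is slow, high %util and high await times on a device can indicate a disk bottleneck.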
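iostat's first report covers the time since boot, so for current activity request at least two reports (for example iostat -dx 1 2, where -d limits output to devices) and look at the last one. A minimal sketch: iostat_busy is a hypothetical helper that prints devices whose %util exceeds a limit in the final report. It assumes %util is the last column (as in common sysstat versions), that each report begins with a header line starting with Device, and that numbers use a dot as decimal separator (running iostat with LC_ALL=C helps).

```shell
# Hypothetical helper: print devices whose %util exceeds LIMIT in the LAST
# report of "iostat -dx INTERVAL COUNT" output read from stdin.
iostat_busy() {
    limit="${1:-80}"
    awk -v limit="$limit" '
        $1 ~ /^Device/ { in_dev = 1; flagged = ""; next }   # a new report starts
        NF == 0        { in_dev = 0; next }                 # blank line ends the block
        in_dev && ($NF + 0 > limit + 0) { flagged = flagged $1 " " $NF "\n" }
        END { printf "%s", flagged }'
}
```

For example: LC_ALL=C iostat -dx 1 2 | iostat_busy 80.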

`iotop` – I/O by process

iotop (if installed) is like top for disk I/O.

sudo iotop

It shows which processes are reading/writing the most to disk in real time, which is handy when “something is hammering the disk” and you don’t know what.

Monitoring Network Activity (Quick View)

Full networking tools are covered elsewhere; these are simple first‑look commands.

`ss` – sockets and connections

ss shows listening and established sockets:

# Listening TCP ports (-n shows port numbers instead of service names)
ss -ltn
# All established TCP connections
ss -tan state established

Useful to confirm whether a service is actually listening on a port, or to see rough connection counts.
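For rough connection counts per service, group established connections by local port. A minimal sketch, assuming the output of ss -tn state established: when a state filter is given, ss leaves out the State column, so the local address:port is the 3rd field. conns_per_port is a hypothetical helper.

```shell
# Hypothetical helper: count established TCP connections per local port.
# Reads "ss -tn state established" output (no State column) from stdin.
conns_per_port() {
    awk 'NR > 1 {
        port = $3
        sub(/.*:/, "", port)      # keep the text after the last ":" (also works for IPv6)
        count[port]++
    }
    END { for (p in count) print p, count[p] }' | sort -k2,2 -rn
}
```

For example: ss -tn state established | conns_per_port.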

`ip` and `ping`

ip a – show network interfaces and IP addresses
ping host – test basic connectivity and latency:

ping -c 4 example.com

If users report a service is “down”, check:

Is the service process running? (ps, systemctl status)
Is it listening on the expected port? (ss -ltn)
Is the network path OK? (ping, traceroute tools from the networking chapter)
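When several hosts matter, a small loop around ping gives a quick reachability report. A minimal sketch, assuming the iputils ping found on most Linux systems, where -c sets the number of packets and -W the seconds to wait for a reply; check_hosts is a hypothetical helper, and a failed ping only shows that ICMP echo did not get through (some hosts and firewalls block it).

```shell
# Hypothetical helper: ping each host once and report OK / UNREACHABLE.
check_hosts() {
    for host in "$@"; do
        # One packet, wait at most 2 seconds for the reply.
        if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "$host OK"
        else
            echo "$host UNREACHABLE"
        fi
    done
}
```

For example: check_hosts example.com some_important_host.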

Monitoring Services

Details on systemd and daemons are in another chapter, but monitoring often starts with:

systemctl status SERVICE_NAME

This tells you:

Whether the service is active / failed
Recent log lines
Start time and restart status

Common pattern during troubleshooting:

Use ss -ltn to see if the port is listening.
If not, run systemctl status name.service.
If failed, inspect logs with journalctl -u name.service.
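The pattern above can be wrapped into one quick check. A minimal sketch, assuming a systemd-based system: systemctl is-active --quiet exits with status 0 only when the unit is active, and ss -ltn prints listening TCP sockets with the local address:port in the 4th column. check_service is a hypothetical helper; for the actual error messages you still need journalctl -u.

```shell
# Hypothetical helper: is the unit active, and is anything listening on PORT?
# Usage: check_service UNIT PORT
check_service() {
    svc="$1"
    port="$2"
    # systemctl is-active --quiet prints nothing; exit status 0 means "active".
    if systemctl is-active --quiet "$svc"; then
        echo "$svc: active"
    else
        echo "$svc: NOT active (try: journalctl -u $svc)"
    fi
    # Column 4 of "ss -ltn" is the local address:port; look for one ending in ":PORT".
    if ss -ltn | awk -v p=":$port" '
        NR > 1 && substr($4, length($4) - length(p) + 1) == p { found = 1 }
        END { exit !found }'; then
        echo "port $port: listening"
    else
        echo "port $port: nothing listening"
    fi
}
```

For example: check_service nginx.service 80 (the unit name and port are placeholders for your own service).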

Historical vs Real‑Time Monitoring

Everything so far has been local and mostly real‑time. For larger environments you also need:

Historical metrics – CPU/memory/traffic graphs over days/months
Alerts – notifications when thresholds are exceeded

These are usually provided by dedicated monitoring systems (e.g. Prometheus, Grafana, Zabbix, Nagios, etc.). Setting them up is beyond this beginner chapter, but understanding local tools prepares you to interpret the data they collect.
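Dedicated systems do this at scale, but the core idea behind historical metrics, sampling numbers on a schedule and keeping them with a timestamp, fits in a few lines. A minimal sketch, assuming Linux /proc/loadavg (first field: 1-minute load) and /proc/meminfo (MemAvailable line, in kB); record_sample and its CSV layout are hypothetical.

```shell
# Hypothetical helper: append "epoch,load1,mem_available_kb" to a CSV file.
# The two /proc paths can be overridden, which is handy for testing.
record_sample() {
    out="$1"
    loadavg="${2:-/proc/loadavg}"
    meminfo="${3:-/proc/meminfo}"
    load1=$(cut -d' ' -f1 "$loadavg")
    avail=$(awk '/^MemAvailable:/ { print $2 }' "$meminfo")
    echo "$(date +%s),$load1,$avail" >> "$out"
}
```

Run it periodically, for example every few minutes from cron, and the CSV becomes a crude history you can graph or compare against.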

Basic Monitoring Workflows

When “the server is slow”

A simple step‑by‑step checklist:

Check overall load and uptime

   uptime

Check CPU and process usage

   top

Is some process at very high %CPU?

Check memory and swap

   free -h
   vmstat 1

Is available memory very low?
Is there sustained swap in/out?

Check disk space

   df -h

Check disk I/O

   iostat -x 1

Is %util near 100%?

Check network

   ss -ltn
   ping -c 4 some_important_host

This simple sequence already covers many common problems.
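The checklist above can also be run in one go and saved, which gives you something to compare against next time. A minimal sketch: run_check and slow_server_report are hypothetical helpers; tools that are not installed (iostat is often missing until sysstat is installed, for example) are reported and skipped, and the vmstat and iostat sample counts are arbitrary.

```shell
# Hypothetical helper: print a heading, then run the command if it exists.
run_check() {
    echo "== $* =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "($1 not installed)"
    fi
    echo
}

# Hypothetical wrapper for the "server is slow" checklist.
slow_server_report() {
    run_check uptime
    run_check top -b -n 1
    run_check free -h
    run_check vmstat 1 5
    run_check df -h
    run_check iostat -x 1 2
    run_check ss -ltn
}
```

For example: slow_server_report > "slow-$(date +%Y%m%d-%H%M%S).txt" 2>&1.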

When a specific service has issues

Is the process running?

   ps aux | grep SERVICE_NAME

Is systemd reporting problems?

   systemctl status SERVICE_NAME

Is the port open?

   ss -ltn | grep ':PORT '

Are there log errors?
Use journalctl -u SERVICE_NAME or check the relevant log file in /var/log (covered in the logs chapter).

Building Good Monitoring Habits

Regularly run uptime, top, free -h, df -h on healthy systems to learn what “normal” looks like.
Keep a simple text log of observations when you troubleshoot: times, symptoms, and key command outputs.
Watch how metrics change over time, not just single snapshots; commands like vmstat 1 and iostat -x 1 are helpful for this.
Don’t rely solely on one tool; combine:

top / htop for CPU and processes
free / vmstat for memory
df / du / iostat for disk
ss / ping for network
systemctl / logs for services

These fundamentals form the base for the more specialized monitoring chapters that follow.

3.5.1 CPU and memory monitoring

3.5.2 Disk and I/O monitoring

3.5.3 Log files in /var/log

3.5.4 Boot performance

3.5.5 Monitoring running services
