Why System Monitoring Matters
Monitoring is about continuously observing your system so you can:
- Detect problems early (high CPU, low disk space, failing services)
- Understand “normal” behavior (baseline)
- Troubleshoot issues faster (slowness, crashes, network problems)
- Plan capacity (when to add CPU, RAM, storage)
For an administrator, monitoring is not optional—it’s how you keep systems healthy and predictable.
Key ideas that apply throughout this chapter:
- Metrics vs logs: metrics are numeric (CPU %, memory usage); logs are detailed events/messages. Monitoring usually starts with metrics and then uses logs to investigate.
- Real‑time vs historical: some tools show what’s happening now; others help you look back in time.
- Local vs centralized: you can monitor a single machine locally, or send data to a central server for many machines.
The rest of this chapter introduces practical tools and basic workflows for monitoring a Linux system.
Types of Monitoring
System monitoring usually covers:
- Resource monitoring
  - CPU, memory, swap
  - Disk space and I/O
  - Network throughput and connections
- Service monitoring
  - Is the service running?
  - Is it responding correctly/quickly?
- Performance monitoring
  - Which processes are heavy?
  - Where is the bottleneck (CPU, RAM, disk, network)?
In later sections of this part, you’ll go deeper into specific areas (CPU, memory, disk, services). Here we’ll build a general toolkit and mindset.
Core Monitoring Tools Overview
Linux provides a set of command‑line tools that almost every admin uses:
- top, htop: real‑time view of CPU, memory, processes
- free, vmstat: memory and virtual memory statistics
- iostat, iotop, df, du: disk usage and disk I/O
- ps, pgrep, pidof: process listings and search
- ss, ip, ping: basic network state
- journalctl, /var/log/*: logs (covered in more detail in the log‑specific chapter)
- systemctl: service status (covered in systemd chapter, but used heavily in monitoring)
You don’t have to memorize all of them immediately, but you should know which tool to reach for in common situations.
Real-Time Interactive Monitors
Using `top`
top gives a live, updating view of processes and overall system usage.
Start it:
top
Key parts of the display:
- Load averages, CPU usage, tasks summary at the top
- Memory and swap usage
- Process list (PID, user, CPU %, memory %, command)
Useful interactive keys (press while top is running):
- q – quit
- P – sort by CPU usage
- M – sort by memory usage
- k – kill a process (you’ll be prompted for PID and signal)
- u – filter by user
- h – help
Typical use: “What is using all the CPU right now?” Sort by CPU (P) and look at the top entries.
Using `htop` (if installed)
htop is a more user‑friendly alternative to top:
htop
Advantages:
- Colorful bar graphs for CPU, memory, swap
- Scrollable list of processes
- Mouse support
- Easier to kill/renice processes
Common keys:
- F6 – change sort column
- F3 – search processes
- F9 – kill selected process
If you don’t have it, install with your package manager (details in the package management chapters).
Non-Interactive Snapshots
Sometimes you want a one‑off snapshot instead of a live view.
`ps`
Why System Monitoring Matters
System monitoring is how you:
- Notice problems before users complain
- Identify what’s actually slow (CPU, RAM, disk, network, or a specific process)
- Compare “today” to “normal” behavior
- Decide when to add or resize resources
In this chapter you’ll build a basic toolkit and workflow for watching what a Linux system is doing, in real time and over time.
Types of Things You Monitor
System monitoring usually focuses on:
- Resources
  - CPU usage and load
  - Memory and swap
  - Disk space and disk I/O
  - Network traffic
- Processes
  - Which processes exist
  - How much they use of each resource
- Services
  - Whether essential services are running
  - Whether they are responding quickly
Other chapters in this part cover CPU/memory/disk/logs in more depth; here the goal is to learn the general‑purpose tools and habits.
Core Local Monitoring Tools
You will repeatedly use these built‑in command‑line tools:
- top, htop – live, interactive system/process view
- ps – one‑shot process list
- free, vmstat – memory usage and virtual memory activity
- uptime – how long the system has been running and load averages
- df, du – disk usage
- iostat, iotop – disk I/O (often in sysstat and iotop packages)
- ss, ip, ping – basic networking state
- systemctl, journalctl – service status and logs (details in other chapters)
You don’t need all of them at once, but you should know which tool to pick when something looks wrong.
Real‑Time Interactive Monitors
`top`
top shows a live view of CPU, memory, and processes.
Run:
top
The display has two main parts:
- Summary area (top lines)
  - Load averages, uptime
  - Number of tasks and their states
  - Overall CPU usage
  - Memory and swap usage
- Process list
  - PID, USER, %CPU, %MEM, TIME+, COMMAND, etc.
Useful keys while top is running:
- q – quit
- P – sort by CPU usage
- M – sort by memory usage
- T – sort by total CPU time
- k – kill a process (you enter PID and signal)
- u – filter by user
- h – help
Common use cases:
- “Why is CPU so high?” → run top, press P, check the first few processes.
- “Which processes are using most RAM?” → run top, press M.
`htop` (if installed)
htop is a friendlier alternative to top, with colors, bars, and scrolling.
Run:
htop
Advantages compared to top:
- Color bar graphs for each CPU, memory, and swap
- Scrollable process list
- Tree view of processes (parent/child relationships)
- Easy mouse interaction
Common keys:
- F6 – change sort column
- F3 – search process by name
- F5 – tree view on/off
- F9 – kill selected process
- F10 – quit
On most systems you need to install it (htop package) using your distribution’s package manager.
One‑Shot Snapshots
Interactive tools are useful when you are actively investigating. For scripts or quick checks, use one‑shot commands.
`ps` – process listing
ps prints information about processes at the moment you run it.
Some common forms:
- ps aux – “BSD style” full list:
  - a – all users with a terminal
  - u – show user‑oriented output
  - x – include processes without a terminal
- ps -ef – “UNIX style” full list
Examples:
# All processes, human‑friendly view
ps aux
# Filter processes by name
ps aux | grep sshd
# Show processes for current user
ps ux
Use ps when you want to combine with grep, or in scripts where interactive tools like top don’t make sense.
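In a script, the same listing can be sorted rather than read by eye. The following is a minimal sketch, assuming the usual ps aux column order (%CPU in the third column); the function name top_cpu is made up for illustration.

```shell
# Print the N most CPU-hungry processes from `ps aux` output read on stdin.
# Assumes the usual `ps aux` layout: USER PID %CPU %MEM ... COMMAND.
top_cpu() {
    n="${1:-5}"
    # Drop the header line, then sort numerically on column 3 (%CPU), highest first.
    awk 'NR > 1' | sort -k3,3 -rn | head -n "$n"
}

# Usage: ps aux | top_cpu 3
```

Reading from stdin keeps the function easy to try on saved output as well as on a live system.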
`uptime` – quick health glance
uptime shows how long the system has been running and the load averages:
uptime
Example output:
14:22:01 up 3 days, 5:17, 2 users, load average: 0.32, 0.45, 0.40
The last three numbers are 1‑, 5‑, and 15‑minute load averages. Trend and comparison to CPU cores matter more than the raw numbers; that is covered in detail in the CPU/memory monitoring chapter.
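To make the comparison with CPU cores concrete, the load average can be divided by the core count. This is a rough sketch only; the helper name load_per_core is hypothetical, and treating values near 1.00 per core as "busy" is a rule of thumb, not a hard limit.

```shell
# Divide a load average by the number of CPU cores and print the result.
# As a rough rule of thumb, values near or above 1.00 per core deserve a closer look.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Usage on Linux (assumes /proc/loadavg and nproc are available):
#   load_per_core "$(cut -d ' ' -f1 /proc/loadavg)" "$(nproc)"
```

With the example output above on a 4-core machine, the 1-minute figure of 0.32 works out to 0.08 per core, which is a lightly loaded system.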
Monitoring Memory Usage
You’ll go deeper into memory monitoring later; here are the basic tools.
`free`
free summarizes memory and swap usage:
free -h
The -h option makes the numbers “human readable” (shown with unit suffixes such as Mi and Gi). Watch:
- used vs free
- available – how much memory can be used without swapping
This is your first stop when the system “feels slow” or is suspected of being out of memory.
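For a scripted check, the same "available" figure can be read from /proc/meminfo, where Linux exposes it as MemAvailable. A minimal sketch, assuming that field is present (it is on reasonably recent kernels); the function name mem_available_pct is illustrative.

```shell
# Print MemAvailable as a whole-number percentage of MemTotal.
# Reads /proc/meminfo-style text on stdin.
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { if (total > 0) printf "%d\n", avail * 100 / total }
    '
}

# Usage: mem_available_pct < /proc/meminfo
```

A single low reading is not necessarily a problem; what matters is how it compares with the machine's normal level.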
`vmstat`
vmstat (virtual memory statistics) gives a compact overview of CPU, memory, and I/O activity:
vmstat 1
This prints a line every second until you stop it with Ctrl+C. Important columns include:
- r – runnable processes (waiting for CPU)
- si/so – swap in / swap out
- bi/bo – blocks in / out (disk I/O)
- us, sy, id, wa – user, system, idle, and I/O wait CPU percentages
Use vmstat when you want to see, over time, whether the system is actually swapping, waiting on disk, etc.
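One way to turn "is the system actually swapping?" into a number is to count the samples where si or so is non-zero. The sketch below assumes the standard vmstat header row (starting with r and containing si and so); the function name swap_active_samples is hypothetical.

```shell
# Count vmstat samples that show swap activity (si or so greater than 0).
# Reads `vmstat` output on stdin and finds the si/so columns from the header row.
swap_active_samples() {
    awk '
        # Header row: remember which columns hold si and so.
        $1 == "r" {
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            next
        }
        # Data rows start with a number; count those with any swap traffic.
        si && $1 ~ /^[0-9]+$/ && ($si > 0 || $so > 0) { count++ }
        END { print count + 0 }
    '
}

# Usage: vmstat 1 10 | swap_active_samples
```

Note that the first line vmstat prints reports averages since boot, so a count of 1 from a short run may reflect old activity rather than current swapping.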
Monitoring Disk Usage and I/O
The “Disk and I/O monitoring” chapter goes into detail; here’s the basic toolkit.
`df` – filesystem usage
df shows how full each mounted filesystem is:
df -h
Columns:
- Filesystem – device or name
- Size, Used, Avail
- Use% – percentage used
- Mounted on
If Use% is close to 100% on / or /var, you must free space or expand storage.
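The "close to 100%" check is easy to script. A minimal sketch, assuming df -P output (POSIX format, one filesystem per line, usage in the fifth column); the function name full_filesystems and the default threshold of 90 are arbitrary choices for illustration.

```shell
# Print mount points whose usage is at or above a threshold (percent).
# Reads `df -P` output on stdin; -P keeps each filesystem on a single line.
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)                       # "93%" -> "93"
            if (pct + 0 >= limit + 0) print $6, $5  # mount point, then usage
        }
    '
}

# Usage: df -P | full_filesystems 90
```

Mount points containing spaces would be cut off at the first space; that case is ignored here to keep the example short.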
`du` – where disk space is used
du summarizes how much space directories use.
Examples:
# Rough size of current directory
du -sh .
# Top‑level usage in current directory
du -sh *
Useful for tracking down which directory (logs, cache, user data) is filling the disk.
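When a directory has many entries, sorting the du output makes the culprit obvious. A small sketch, with a hypothetical helper name largest_entries; it uses du -sk (sizes in KiB) so that a plain numeric sort works.

```shell
# List the N largest entries directly under a directory, biggest first.
# Sizes are in KiB (du -k) so that sort can compare them numerically.
largest_entries() {
    dir="${1:-.}"
    n="${2:-5}"
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
}

# Usage: largest_entries /var 10
```

The * glob skips hidden entries (names starting with a dot), so a large dot-directory would not appear in this listing.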
`iostat` – disk I/O
iostat (from the sysstat package on many distros) shows I/O statistics for devices.
iostat -x 1
-x shows extended statistics; 1 means update every second. Look for:
- %util – how busy the device is (near 100% = saturated)
- r/s, w/s – read/write requests per second
- rkB/s, wkB/s – KB per second
When the system is slow, high %util combined with high await times on a device (shown as r_await and w_await in recent sysstat versions) can indicate a disk bottleneck.
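To pick out saturated devices from a longer run, the %util column can be filtered with awk. This sketch looks the column up by name because its position differs between sysstat versions; the function name busy_devices and the threshold are illustrative assumptions.

```shell
# Print devices whose %util is at or above a threshold.
# Reads `iostat -x` output on stdin; the %util column is located by name
# because its position differs between sysstat versions.
busy_devices() {
    awk -v limit="${1:-90}" '
        # Device header row: find the %util column.
        $1 ~ /^Device/ {
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        # Device rows: long enough to contain the column, and over the limit.
        col && NF >= col && $col + 0 >= limit + 0 { print $1, $col }
    '
}

# Usage: iostat -x 1 5 | busy_devices 90
```

A device that appears in every interval of a run is a much stronger signal than one that shows up once.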
`iotop` – I/O by process
iotop (if installed) is like top for disk I/O.
sudo iotop
It shows which processes are reading/writing the most to disk in real time, which is handy when “something is hammering the disk” and you don’t know what.
Monitoring Network Activity (Quick View)
Full networking tools are covered elsewhere; these are simple first‑look commands.
`ss` – sockets and connections
ss shows listening and established sockets:
# Listening TCP ports
ss -lt
# All established TCP connections
ss -tan state established
Useful to confirm whether a service is actually listening on a port, or to see rough connection counts.
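The "is it listening?" check can be wrapped in a small helper for scripts. A sketch under the assumption that the input comes from ss -ltn, where the fourth column is Local Address:Port and -n keeps ports numeric; the function name is_listening is made up.

```shell
# Succeed (exit status 0) if something is listening on the given TCP port.
# Reads `ss -ltn` output on stdin; -n keeps ports numeric so they can be matched.
is_listening() {
    awk -v port="$1" '
        NR > 1 {
            # Column 4 is Local Address:Port, e.g. 0.0.0.0:22 or [::]:22.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage: ss -ltn | is_listening 22 && echo "port 22 is listening"
```

For a rough connection count, piping ss -tan state established through wc -l is usually enough (subtract one for the header line).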
`ip` and `ping`
- ip a – show network interfaces and IP addresses
- ping host – test basic connectivity and latency:
ping -c 4 example.com
If users report a service is “down”, check:
- Is the service process running? (ps, systemctl status)
- Is it listening on the expected port? (ss -lt)
- Is the network path OK? (ping, traceroute tools from the networking chapter)
Monitoring Services
Details on systemd and daemons are in another chapter, but monitoring often starts with:
systemctl status SERVICE_NAME
This tells you:
- Whether the service is active/failed
- Recent log lines
- Start time and restart status
Common pattern during troubleshooting:
- Use ss -lt to see if the port is listening.
- If not, run systemctl status name.service.
- If failed, inspect logs with journalctl -u name.service.
Historical vs Real‑Time Monitoring
Everything so far has been local and mostly real‑time. For larger environments you also need:
- Historical metrics – CPU/memory/traffic graphs over days/months
- Alerts – notifications when thresholds are exceeded
These are usually provided by dedicated monitoring systems (e.g. Prometheus, Grafana, Zabbix, Nagios, etc.). Setting them up is beyond this beginner chapter, but understanding local tools prepares you to interpret the data they collect.
Basic Monitoring Workflows
When “the server is slow”
A simple step‑by‑step checklist:
- Check overall load and uptime
  uptime
- Check CPU and process usage
  top
  - Is some process at very high %CPU?
- Check memory and swap
  free -h
  vmstat 1
  - Is available memory very low?
  - Is there sustained swap in/out?
- Check disk space
  df -h
- Check disk I/O
  iostat -x 1
  - Is %util near 100%?
- Check network
  ss -lt
  ping -c 4 some_important_host
This simple sequence already covers many common problems.
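The non-interactive parts of this checklist can be collected into one function so the whole sequence runs the same way every time. This is only a sketch: the names run_if_present and first_look are invented, top is left out because it is interactive, ping is left out because the host to test depends on your environment, and the sample counts (3) are arbitrary.

```shell
# Run a command under a heading, or say so if it is not installed.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $* =="
        "$@"
    else
        echo "== $1: not installed, skipping =="
    fi
}

# Walk through the checklist once, with bounded sample counts so it terminates.
first_look() {
    run_if_present uptime
    run_if_present free -h
    run_if_present vmstat 1 3
    run_if_present df -h
    run_if_present iostat -x 1 3
    run_if_present ss -lt
}

# Usage: first_look 2>&1 | tee "first_look_$(date +%F_%H%M).txt"
```

Saving the output with a timestamp, as in the usage line, gives you something to compare against the next time the server feels slow.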
When a specific service has issues
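- Is the process running?
  ps aux | grep SERVICE_NAME
- Is systemd reporting problems?
  systemctl status SERVICE_NAME
- Is the port open?
  ss -ltn | grep PORT
  (The -n option keeps ports numeric. Without it, ss prints service names such as ssh instead of 22, so a grep for the port number can find nothing.)
- Are there log errors?
  Use journalctl -u SERVICE_NAME or check the relevant log file in /var/log (covered in the logs chapter).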
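The systemd and log steps of this checklist can be combined into one helper. A minimal sketch, assuming a systemd-based system where systemctl and journalctl are available; the function name check_service and the choice of 20 log lines are illustrative.

```shell
# Report whether a systemd unit is active; if not, show its recent log lines.
# Assumes a systemd-based system with systemctl and journalctl available.
check_service() {
    unit="$1"
    if systemctl is-active --quiet "$unit"; then
        echo "$unit: active"
    else
        echo "$unit: NOT active, last log lines:"
        journalctl -u "$unit" -n 20 --no-pager
        return 1
    fi
}

# Usage: check_service name.service
```

Because the function returns a non-zero status when the unit is not active, it can be used directly in an if statement or a cron job that sends a notification.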
Building Good Monitoring Habits
- Regularly run uptime, top, free -h, df -h on healthy systems to learn what “normal” looks like.
- Keep a simple text log of observations when you troubleshoot: times, symptoms, and key command outputs.
- Learn the trend of metrics, not just single snapshots; commands like vmstat 1 and iostat -x 1 are helpful for this.
- Don’t rely solely on one tool; combine:
  - top/htop for CPU and processes
  - free/vmstat for memory
  - df/du/iostat for disk
  - ss/ping for network
  - systemctl / logs for services
These fundamentals form the base for the more specialized monitoring chapters that follow.