3.5.1 CPU and memory monitoring

Why CPU and Memory Monitoring Matters

On a running Linux system, CPU and memory are usually the first bottlenecks you’ll notice when things “feel slow.” Monitoring them helps you:

Detect overloaded or misbehaving applications.
Plan capacity: know when you need more resources.
Troubleshoot performance complaints (“the server is slow!”).
Verify the effect of configuration changes or deployments.

This chapter focuses on practical CPU and memory monitoring using standard tools on most Linux systems.

Key Concepts for CPU Monitoring

You don’t need deep kernel internals here, but a few concepts help interpret tool output correctly.

CPU core vs CPU thread

A core is a physical processing unit.
A thread (logical CPU) is what the OS schedules work on; with simultaneous multithreading such as Hyper-Threading, one core appears as two or more logical CPUs.

CPU usage percentage
Most tools show CPU usage per core or averaged across cores. A single-core CPU is 100% busy at full load; a 4-core CPU can show up to 400% if summed across all cores (in some tools).
User vs system vs idle time (common categories):

us (user): time spent running user-space processes.
sy (system): time spent in kernel code.
id (idle): CPU not doing work.
wa (iowait): CPU idle while waiting for disk I/O.
Others include ni (nice), hi/si (hardware/software interrupts), depending on the tool.
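
These categories come from cumulative counters the kernel exposes in /proc/stat, and tools derive percentages from the difference between two readings. A minimal sketch of that arithmetic follows. The function name cpu_busy_pct and the sample counters are made up for illustration, and counting iowait as idle is an assumption that matches the definition above.

```shell
# cpu_busy_pct LINE1 LINE2
# Given two aggregate "cpu" lines from /proc/stat taken some time apart,
# print the percentage of time the CPUs were not idle in between.
# Fields after the label: user nice system idle iowait irq softirq steal ...
cpu_busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      total = 0
      for (i = 2; i <= 9 && i <= NF; i++) total += $i   # user..steal
      idle = $5 + $6                                   # idle + iowait
      if (NR == 1) { t0 = total; i0 = idle }
      else         { dt = total - t0; di = idle - i0 }
    }
    END {
      if (dt <= 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * (dt - di) / dt
    }'
}

# Illustrative samples (made-up counters, not real measurements):
before="cpu  1000 0 500 8000 500 0 0 0 0 0"
after="cpu  1100 0 520 8800 580 0 0 0 0 0"
cpu_busy_pct "$before" "$after"
```

On a live system you could capture the two lines with grep '^cpu ' /proc/stat, one second apart, and pass them to the function.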

High CPU is not always bad; high CPU doing useful work can be expected. Problems usually arise when:

CPU is saturated (little idle time) and performance is poor.
System time is unusually high (possible kernel or I/O issues).
One process pegs a single core on a multicore system, creating a hotspot.

Key Concepts for Memory Monitoring

Linux memory usage can look “full” even on healthy systems because the kernel aggressively uses memory for cache. Important distinctions:

Total memory: installed RAM.
Used memory: what counts as “used” depends on the tool; older versions of free counted caches and buffers as used, while current ones do not. Raw “used” numbers can be misleading.
Free memory: truly unused RAM (often small on a healthy system).
Buffers and cache:

Buffers: metadata for block devices.
Cache (page cache): cached file contents.

Available memory: estimated memory that can be used without swapping; more useful than “free”.
Swap: disk-based extension of RAM; much slower than physical memory.

Warning signs:

Low available memory and growing swap usage.
Constant swapping and high I/O wait.
Out-of-memory (OOM) kills: kernel terminates processes due to lack of RAM.
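
The first warning sign can be checked mechanically. The sketch below is one possible way to do it, not a standard tool: mem_pressure_check is a hypothetical name, the 10% default threshold is arbitrary, and it only detects that swap is in use at one moment (detecting growth would need two samples).

```shell
# mem_pressure_check [MIN_AVAIL_PCT]
# Reads /proc/meminfo-style text on stdin. Prints "pressure" when
# MemAvailable is below MIN_AVAIL_PCT percent of MemTotal (default 10,
# an arbitrary illustrative threshold) and some swap is in use;
# otherwise prints "ok".
mem_pressure_check() {
  awk -v min_pct="${1:-10}" '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    /^SwapTotal:/    { stotal = $2 }
    /^SwapFree:/     { sfree = $2 }
    END {
      swap_used = stotal - sfree
      if (total > 0 && avail * 100 < total * min_pct && swap_used > 0)
        print "pressure"
      else
        print "ok"
    }'
}

# Illustrative input (made-up values in kB):
mem_pressure_check <<'EOF'
MemTotal:        8000000 kB
MemAvailable:     400000 kB
SwapTotal:       2000000 kB
SwapFree:        1500000 kB
EOF
```

On a real machine the input would be /proc/meminfo itself: mem_pressure_check < /proc/meminfo.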

Using `top` for Interactive Monitoring

top is installed on almost all Linux systems and shows a live view of processes and resource usage.

Run:

top

CPU Section in `top`

Near the top, you’ll see a line like:

%Cpu(s): 10.0 us,  2.0 sy,  0.0 ni, 80.0 id,  5.0 wa,  0.0 hi,  3.0 si,  0.0 st

Common fields:

us – user CPU time.
sy – system (kernel) time.
id – idle.
wa – iowait (time waiting for disk I/O).
si – soft interrupts.
st – steal time (time “stolen” by the hypervisor; relevant in virtual machines).

Reading it:

If id is close to 0% and wa is also low, the CPU is saturated.
High wa suggests CPU is often idle waiting for disk; the system may be I/O-bound, not truly CPU-bound.
High sy relative to us can indicate kernel or system call overhead issues.
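
If you want the idle figure in a script rather than on screen, the summary line can be parsed. The following is a rough sketch that assumes the procps-ng line format shown above and a locale that uses a decimal point; top_idle_pct is a hypothetical helper name.

```shell
# top_idle_pct
# Reads top batch output on stdin and prints the "id" value from the
# last %Cpu(s) summary line it sees.
top_idle_pct() {
  awk -F'[:,]' '
    /^%Cpu\(s\)/ {
      for (i = 2; i <= NF; i++) {
        n = split($i, kv, " ")     # e.g. " 80.0 id" -> kv[1]="80.0", kv[2]="id"
        if (n == 2 && kv[2] == "id") idle = kv[1]
      }
    }
    END { if (idle != "") print idle }'
}

# The sample line from above:
printf '%s\n' '%Cpu(s): 10.0 us,  2.0 sy,  0.0 ni, 80.0 id,  5.0 wa,  0.0 hi,  3.0 si,  0.0 st' | top_idle_pct
```

A live invocation might look like top -b -n 2 -d 1 | top_idle_pct. Because the function keeps the last summary line, this reports the second iteration, which covers the most recent one-second interval.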

Per-CPU View in `top`

By default, top shows overall averages. To split by CPU:

Press 1 while top is running to toggle per-CPU usage.

This is useful when:

One process is pegging a single core while others are idle.
You want to see if load is well-distributed across cores.

Memory Section in `top`

Typical lines:

MiB Mem :  7859.0 total,   500.0 free,  2000.0 used,  5359.0 buff/cache
MiB Swap:  2047.0 total,  2047.0 free,     0.0 used.  4000.0 avail Mem

Important numbers:

total – total system RAM.
free – completely unused RAM.
used – memory in use by processes and the kernel; in current procps versions this excludes buff/cache.
buff/cache – memory used for buffers and file cache.
avail Mem – estimated memory still available without swapping; more accurate for health checks than free.

If avail Mem is low and swap usage is increasing, the system is under memory pressure.

Focusing on CPU or Memory-Hungry Processes in `top`

In the process list:

PID – process ID.
%CPU – CPU usage as a percentage of one logical CPU by default, so a multithreaded process can exceed 100%.
%MEM – percentage of RAM used.

Useful shortcuts:

Press P – sort by CPU usage (descending).
Press M – sort by memory usage (descending).
Press E – cycle the memory units in the summary area (KiB, MiB, GiB, ...); lowercase e does the same for the process list.

This lets you quickly identify which processes consume the most CPU or RAM.
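
Because %CPU is relative to a single logical CPU, it helps to convert a process figure into a share of the whole machine before judging it. A small worked example follows, with share_of_machine as a made-up helper name.

```shell
# share_of_machine PCT NCPU
# Convert a per-process %CPU figure (relative to one logical CPU)
# into a percentage of the machine's total CPU capacity.
share_of_machine() {
  awk -v pct="$1" -v n="$2" 'BEGIN {
    if (n <= 0) exit 1          # guard against a bad CPU count
    printf "%.1f\n", pct / n
  }'
}

# A process showing 250% on a machine with 4 logical CPUs:
share_of_machine 250 4
```

On a live system the second argument could come from nproc, for example share_of_machine 250 "$(nproc)".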

Using `htop` for a More User-Friendly View

htop is an enhanced alternative to top (not always installed by default, but commonly available via your package manager).

Run:

htop

Key advantages over top:

Colorful per-CPU bars at the top.
Memory and swap usage bars.
Easier navigation with arrow keys, function keys, and mouse.
Can kill or renice processes from within the interface.

Typical bars:

CPU bars show overall load and distinguish user vs system time by color.
Memory bar shows used vs cache vs free memory.
Swap bar shows swap usage.

Sorting and Filtering in `htop`

Use F6 to change sort column (e.g., %CPU, RES).
Use F3 to search process names, or F4 to filter the list down to matching processes.
Use F9 to send signals (e.g., TERM, KILL) to the selected process.

htop is ideal for quickly spotting spikes, runaway processes, and verifying that all cores are being utilized.

Using `vmstat` for System-Wide Trends

vmstat provides a snapshot of virtual memory, processes, and CPU activity.

Run once (a single report shows averages since boot, not current activity):

vmstat

Or run repeatedly every second:

vmstat 1

Typical output:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 120000  20000 500000    0    0     5     2  100  200 10  5 80  5  0

Useful columns:

r – runnable processes (run queue length). If consistently higher than the number of CPUs, you might have CPU contention.
b – processes blocked (often on I/O).
swpd – total swap used.
si/so – swap in / swap out; sustained non-zero values indicate swapping.
us, sy, id, wa – similar meanings as in top.

vmstat is particularly useful to watch trends over time at a low overhead, for example when reproducing an issue.
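
To turn the run-queue rule of thumb into a number, you can average the r column over a few samples. This is only a sketch: avg_run_queue is a hypothetical helper, the sample rows below the header are invented, and the first sample is skipped for consistency with the CPU columns, which report since-boot averages on that line.

```shell
# avg_run_queue
# Reads `vmstat INTERVAL COUNT` output on stdin and prints the mean of
# the r column. Header lines are ignored (their first field is not a
# number) and the first data line is skipped.
avg_run_queue() {
  awk '$1 ~ /^[0-9]+$/ { if (seen++) { sum += $1; n++ } }
       END { if (n) printf "%.1f\n", sum / n; else print "n/a" }'
}

# Illustrative input (made-up samples):
avg_run_queue <<'EOF'
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 120000  20000 500000    0    0     5     2  100  200 10  5 80  5  0
 6  0      0 119000  20000 500000    0    0     0     0  900 1500 90  8  2  0  0
 8  0      0 118500  20000 500000    0    0     0     4  950 1600 92  7  1  0  0
EOF
```

In practice you might run vmstat 1 6 | avg_run_queue and compare the result with the output of nproc.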

Using `mpstat` for Per-CPU Monitoring

mpstat (from the sysstat package on many distros) focuses on CPU usage, especially per-CPU breakdowns.

To see overall usage every 2 seconds:

mpstat 2

To see per-CPU usage:

mpstat -P ALL 2

Typical fields:

%usr, %sys, %idle, %iowait, %irq, %soft, %steal.

Use cases:

Detect imbalanced workloads (one CPU overloaded while others idle).
Spot steal time (%steal) in virtualized environments indicating CPU contention from other VMs on the host.

Using `free` to Inspect Memory Usage

free shows a quick summary of RAM and swap.

Run:

free -h

Example:

              total        used        free      shared  buff/cache   available
Mem:           7.7G        2.0G        0.5G        0.2G        5.2G        4.0G
Swap:          2.0G          0B        2.0G

Interpretation:

used does not include buff/cache in current versions of free; older versions counted cache as used.
available is usually the best quick indicator of whether you’re actually close to memory exhaustion.
Check Swap:

If Swap used is steadily growing and available is low, the system is under memory pressure.
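
For scripts, the available column can be pulled out of the free output. The sketch below finds the column by its header name instead of hard-coding a position, since older versions of free have no available column; free_available is a made-up function name.

```shell
# free_available
# Reads `free` output on stdin and prints the "available" value from the
# Mem: row. The header row has no label for the first column, so header
# field k corresponds to data field k + 1.
free_available() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "available") col = i + 1 }
       /^Mem:/ && col { print $col }'
}

# The example output from above:
free_available <<'EOF'
              total        used        free      shared  buff/cache   available
Mem:           7.7G        2.0G        0.5G        0.2G        5.2G        4.0G
Swap:          2.0G          0B        2.0G
EOF
```

For numeric comparisons, feed it free -m instead of free -h so the value is a plain number of mebibytes: free -m | free_available.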

Using `/proc/meminfo` for Detailed Memory Stats

For more detailed memory information:

cat /proc/meminfo

This file contains many fields; common ones:

MemTotal – total physical RAM.
MemFree – completely free memory.
Buffers, Cached – memory used for buffers and cache.
SwapTotal, SwapFree.
Active, Inactive – memory actively in use vs less recently used.
Dirty – memory waiting to be written to disk.

Use /proc/meminfo when you need more precision than free gives, or when writing scripts that parse memory values.
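
When parsing this file, match the field name exactly. A loose pattern such as /Active/ also matches Inactive and Active(anon). A minimal sketch of an exact lookup, with meminfo_kb as a hypothetical helper name and invented sample values:

```shell
# meminfo_kb FIELD
# Reads /proc/meminfo-style text on stdin and prints the value (in kB)
# of exactly one field. Returns non-zero if the field is not present.
meminfo_kb() {
  awk -v key="$1:" '
    $1 == key { print $2; found = 1; exit }
    END       { if (!found) exit 1 }'
}

# Illustrative input (made-up values):
meminfo_kb Active <<'EOF'
MemTotal:        8047616 kB
Inactive:        3000000 kB
Active:          2100000 kB
Active(anon):    1200000 kB
Dirty:               128 kB
EOF
```

On a live system: meminfo_kb Dirty < /proc/meminfo.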

Using `ps` to Identify Heavy Processes

While top and htop are interactive, ps is good for one-off snapshots or scripts.

Examples:

Top 10 CPU consumers:

  ps aux --sort=-%cpu | head -n 11

Top 10 memory consumers:

  ps aux --sort=-%mem | head -n 11

Common columns:

%CPU – CPU time used divided by the time the process has been running (a lifetime average, not a recent sample as in top).
%MEM – percentage of physical RAM used.
RSS – resident set size (actual physical memory used).
VSZ – virtual memory size (address space, not all resident in RAM).

ps excels when you need to log or script resource checks, not just visually inspect them.
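
As an example of scripting with ps output, the sketch below totals RSS per command name, which helps when a service runs as many worker processes. The name rss_by_command and the sample values are illustrative. Note that summing RSS counts shared pages once per process, so the totals are an upper bound.

```shell
# rss_by_command
# Reads lines of "RSS_KB COMMAND" on stdin, sums RSS per command name,
# and prints "MiB COMMAND" sorted with the largest total first.
rss_by_command() {
  awk '{
         rss = $1
         name = $0
         sub(/^[ \t]*[0-9]+[ \t]+/, "", name)   # drop the RSS column, keep the name
         total[name] += rss
       }
       END { for (c in total) printf "%.1f %s\n", total[c] / 1024, c }' |
    sort -k1,1nr
}

# Illustrative input (made-up values):
rss_by_command <<'EOF'
 51200 nginx
204800 postgres
 51200 nginx
102400 postgres
EOF
```

Live input could come from ps -e -o rss= -o comm=, where the trailing = signs suppress the header line: ps -e -o rss= -o comm= | rss_by_command | head.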

Using `sar` for Historical CPU and Memory Data

Real-time tools only show the present moment. For historical analysis, sar (also in sysstat) can log CPU and memory usage over time.

Depending on your distribution, you may need to:

Install sysstat.
Enable and start its data collection service or cron job.

Examples (after data collection is enabled):

CPU usage for today, per collection interval, with an average at the end:

  sar -u

Detailed per-CPU usage:

  sar -P ALL

Memory usage over the day:

  sar -r

sar is useful to answer questions like “What was CPU and memory usage at 3 PM yesterday?” or to see patterns over days. By default it reads today’s data; use -f to point it at an earlier day’s data file.

Basic Patterns and How to Interpret Them

CPU Bottlenecks

Signs:

Very low idle (id) time in top, vmstat, or mpstat.
Run queue (r in vmstat) consistently above the number of CPUs.
High %CPU usage for one or more processes.

Possible actions (overview):

Identify the process(es) causing load (top, htop, ps).
Check if the workload can be parallelized across more cores.
Consider tuning or scaling (moving work to another machine, adding CPU resources).

Memory Bottlenecks

Signs:

Low available memory (free -h, top).
Growing and active swap usage (swpd, si, so in vmstat).
Slow performance with high I/O wait; frequent disk activity.
OOM kills (check the kernel log, e.g. with dmesg or journalctl -k, or the files under /var/log).

Possible actions (overview):

Identify memory-hungry processes (top, htop, ps).
Restart or reconfigure services using excessive memory.
Add more RAM or distribute workloads.
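
To see which processes the kernel has killed, you can scan the kernel log for its OOM messages. The sketch below assumes the "Killed process PID (name)" wording used by recent kernels, which may differ on yours. The name oom_victims and the sample log line are illustrative.

```shell
# oom_victims
# Reads kernel log text on stdin and prints "PID NAME" for each line
# that reports a process killed by the OOM killer.
oom_victims() {
  sed -n -E 's/.*Killed process ([0-9]+) \(([^)]*)\).*/\1 \2/p'
}

# Illustrative log line (not from a real system):
oom_victims <<'EOF'
[12345.678901] Out of memory: Killed process 4242 (java) total-vm:8123456kB, anon-rss:6123456kB, file-rss:0kB, shmem-rss:0kB
EOF
```

Typical live sources are dmesg | oom_victims or journalctl -k | oom_victims, depending on what your distribution provides.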

Simple Scripting Ideas for CPU/Memory Checks

Even at an intermediate level, you can automate basic checks.

Example: warn if CPU idle falls below 10%:

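#!/bin/bash
# Take two 1-second samples; the first reports averages since boot, so use the last line.
# Column 15 is "id" in the vmstat layout shown earlier; check it against your vmstat header.
idle=$(vmstat 1 2 | tail -n 1 | awk '{print $15}')
if [ "$idle" -lt 10 ]; then
  echo "Warning: low CPU idle: ${idle}%"
fi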

Example: warn if available memory is below 500 MiB:

#!/bin/bash
# MemAvailable is reported in kB; 512000 kB = 500 MiB.
avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
if [ "$avail" -lt 512000 ]; then
  echo "Warning: low available memory: $((avail / 1024)) MiB"
fi

These are simple examples; later chapters will cover more robust monitoring and alerting using dedicated tools.

Choosing the Right Tool

As a quick guide:

Need a live interactive overview with process list? Use top or htop.
Need quick one-line memory summary? Use free -h.
Need low-overhead trend view? Use vmstat or mpstat.
Need historical data (what happened earlier)? Use sar.
Need to script checks or logs? Use ps, /proc/meminfo, and simple shell scripts.

Experiment with these tools on a lightly loaded system and then under load (e.g., while compiling software or running a CPU/memory-intensive program) to build intuition about what “normal” and “problematic” states look like.
