Why CPU and Memory Monitoring Matters
On a running Linux system, CPU and memory are usually the first bottlenecks you’ll notice when things “feel slow.” Monitoring them helps you:
- Detect overloaded or misbehaving applications.
- Capacity-plan: know when you need more resources.
- Troubleshoot performance complaints (“the server is slow!”).
- Verify the effect of configuration changes or deployments.
This chapter focuses on practical CPU and memory monitoring using standard tools on most Linux systems.
Key Concepts for CPU Monitoring
You don’t need deep kernel internals here, but a few concepts help interpret tool output correctly.
- CPU core vs CPU thread
  - A core is a physical processing unit.
  - A thread (logical CPU) is typically what the OS sees, including things like Hyper-Threading.
- CPU usage percentage
  - Most tools show CPU usage per core or averaged across cores.
  - A single-core CPU is 100% busy at full load; a 4-core CPU can show up to 400% if summed across all cores (in some tools).
- User vs system vs idle time (common categories):
  - `us` (user): time spent running user-space processes.
  - `sy` (system): time spent in kernel code.
  - `id` (idle): CPU not doing work.
  - `wa` (iowait): CPU idle while waiting for disk I/O.
  - Others include `ni` (nice) and `hi`/`si` (hardware/software interrupts), depending on the tool.
High CPU is not always bad; high CPU doing useful work can be expected. Problems usually arise when:
- CPU is saturated (little idle time) and performance is poor.
- System time is unusually high (possible kernel or I/O issues).
- One process pegs a single core on a multicore system, creating a hotspot.
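To make these percentages concrete, here is a minimal sketch of how a busy percentage can be derived from the kernel's cumulative tick counters in `/proc/stat`: take two samples and compare the differences. The function name `cpu_busy_pct` is made up for this example, and counting iowait as idle time is an assumption chosen to match the categories above; real tools may differ in detail.

```bash
#!/bin/bash
# cpu_busy_pct: percentage of non-idle CPU time between two aggregate "cpu"
# lines taken from /proc/stat. The fields after the label are cumulative ticks:
#   user nice system idle iowait irq softirq steal guest guest_nice
cpu_busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      total = 0
      for (i = 2; i <= NF && i <= 9; i++) total += $i   # user through steal
      idle = $5 + $6                                    # idle + iowait (assumption)
      if (NR == 1) { t0 = total; i0 = idle }
      else         { t1 = total; i1 = idle }
    }
    END {
      dt = t1 - t0
      if (dt <= 0) { print "0.0"; exit }                # no ticks elapsed
      printf "%.1f\n", 100 * (dt - (i1 - i0)) / dt
    }'
}

# Usage on a live system (assumes Linux with /proc mounted):
#   a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
#   cpu_busy_pct "$a" "$b"
```

Because the counters only ever grow, a single reading says little; it is the difference between two readings that yields a percentage for that interval.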
Key Concepts for Memory Monitoring
Linux memory usage can look “full” even on healthy systems because the kernel aggressively uses memory for cache. Important distinctions:
- Total memory: installed RAM.
- Used memory: often includes caches and buffers; raw “used” numbers can be misleading.
- Free memory: truly unused RAM (often small on a healthy system).
- Buffers and cache:
  - Buffers: metadata for block devices.
  - Cache (page cache): cached file contents.
- Available memory: estimated memory that can be used without swapping; more useful than “free”.
- Swap: disk-based extension of RAM; much slower than physical memory.
Warning signs:
- Low `available` memory and growing swap usage.
- Constant swapping and high I/O wait.
- Out-of-memory (OOM) kills: kernel terminates processes due to lack of RAM.
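The gap between "used" and "available" is easier to see with numbers. The sketch below prints two ways of counting used memory from a file in `/proc/meminfo` format: one that treats cache as used, and one that subtracts the kernel's `MemAvailable` estimate. The function name `mem_used_views` and the output labels are invented for this illustration.

```bash
# mem_used_views: show two ways of counting "used" memory, in kB.
# Reads a file in /proc/meminfo format (default: /proc/meminfo).
mem_used_views() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemFree:"      { free  = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END {
      printf "used_counting_cache_kb %d\n", total - free    # cache counted as used
      printf "used_not_reclaimable_kb %d\n", total - avail  # reclaimable cache excluded
    }' "${1:-/proc/meminfo}"
}

# Usage: mem_used_views            (live system)
#        mem_used_views saved.txt  (a saved copy of /proc/meminfo)
```

On a system with a large page cache the first figure can look alarming while the second stays modest, which is why `available` is the safer number to watch.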
Using `top` for Interactive Monitoring
top is installed on almost all Linux systems and shows a live view of processes and resource usage.
Run:
top
CPU Section in `top`
Near the top, you’ll see a line like:
%Cpu(s): 10.0 us, 2.0 sy, 0.0 ni, 80.0 id, 5.0 wa, 0.0 hi, 3.0 si, 0.0 st
Common fields:
- `us` – user CPU time.
- `sy` – system (kernel) time.
- `id` – idle.
- `wa` – iowait (time waiting for disk I/O).
- `si` – soft interrupts.
- `st` – steal time (time “stolen” by the hypervisor; relevant in virtual machines).
Reading it:
- If `id` is close to 0%, CPU is saturated.
- High `wa` suggests CPU is often idle waiting for disk; the system may be I/O-bound, not truly CPU-bound.
- High `sy` relative to `us` can indicate kernel or system call overhead issues.
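If you want the idle figure in a script rather than on screen, `top` can run non-interactively with `-b` (batch mode). The sketch below extracts the `id` value from a summary line in the layout shown above; the function name `top_idle_pct` is hypothetical, and the parsing assumes the procps-ng format with a period as the decimal separator.

```bash
# top_idle_pct: read top's batch output on stdin and print the idle
# percentage from the last "%Cpu(s):" summary line seen.
top_idle_pct() {
  awk -F'[:,]' '/^%Cpu/ {
    for (i = 2; i <= NF; i++) {
      n = split($i, kv, " ")          # e.g. " 80.0 id" -> kv[1]="80.0", kv[2]="id"
      if (n == 2 && kv[2] == "id") idle = kv[1]
    }
  }
  END { if (idle != "") print idle }'
}

# Usage (LC_ALL=C keeps the decimal separator a period):
#   LC_ALL=C top -b -n 2 -d 1 | top_idle_pct
```

Taking two iterations and keeping the last line means the printed value covers the one-second interval between them.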
Per-CPU View in `top`
By default, top shows overall averages. To split by CPU:
- Press `1` while `top` is running to toggle per-CPU usage.
This is useful when:
- One process is pegging a single core while others are idle.
- You want to see if load is well-distributed across cores.
Memory Section in `top`
Typical lines:
MiB Mem :   7859.0 total,    500.0 free,   2000.0 used,   5359.0 buff/cache
MiB Swap:   2047.0 total,   2047.0 free,      0.0 used.   4000.0 avail Mem
Important numbers:
- `total` – total system RAM.
- `free` – completely unused RAM.
- `used` – memory used by processes and the kernel; in the output above it is reported separately from `buff/cache` (the three figures sum to `total`).
- `buff/cache` – memory used for buffers and file cache.
- `avail Mem` – estimated memory still available without swapping; more accurate for health checks than `free`.
If avail Mem is low and swap usage is increasing, the system is under memory pressure.
Focusing on CPU or Memory-Hungry Processes in `top`
In the process list:
- `PID` – process ID.
- `%CPU` – CPU usage as a percentage of one logical CPU, so a multithreaded process can exceed 100%.
- `%MEM` – percentage of RAM used.
Useful shortcuts:
- Press `P` – sort by CPU usage (descending).
- Press `M` – sort by memory usage (descending).
- Press `E` – toggle units for memory display (KB/MB/GB).
This lets you quickly identify which processes consume the most CPU or RAM.
Using `htop` for a More User-Friendly View
htop is an enhanced alternative to top (not always installed by default, but commonly available via your package manager).
Run:
htop
Key advantages over top:
- Colorful per-CPU bars at the top.
- Memory and swap usage bars.
- Easier navigation with arrow keys, function keys, and mouse.
- Can kill or renice processes from within the interface.
Typical bars:
- CPU bars show overall load and distinguish user vs system time by color.
- Memory bar shows used vs cache vs free memory.
- Swap bar shows swap usage.
Sorting and Filtering in `htop`
- Use `F6` to change sort column (e.g., `%CPU`, `RES`).
- Use `F3` to search process names.
- Use `F9` to send signals (e.g., `TERM`, `KILL`) to the selected process.
htop is ideal for quickly spotting spikes, runaway processes, and verifying that all cores are being utilized.
Using `vmstat` for System-Wide Trends
vmstat provides a snapshot of virtual memory, processes, and CPU activity.
Run once:
vmstat
Or run repeatedly every second:
vmstat 1
Typical output:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 120000  20000 500000    0    0     5     2  100  200 10  5 80  5  0
Useful columns:
- `r` – runnable processes (run queue length). If consistently higher than the number of CPUs, you might have CPU contention.
- `b` – processes blocked (often on I/O).
- `swpd` – total swap used.
- `si`/`so` – swap in / swap out; sustained non-zero values indicate swapping.
- `us`, `sy`, `id`, `wa` – similar meanings as in `top`.
vmstat is particularly useful to watch trends over time at a low overhead, for example when reproducing an issue.
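The "run queue higher than the number of CPUs" rule is easy to check mechanically. The following sketch counts how many `vmstat` samples exceed a given CPU count; `runq_over` is a made-up helper name, and the CPU count is passed in rather than detected so the function can also be run against saved output.

```bash
# runq_over: read vmstat output on stdin and count the samples whose run
# queue (first column, "r") exceeds the CPU count given as $1.
runq_over() {
  awk -v ncpu="$1" '
    $1 ~ /^[0-9]+$/ {              # data rows start with a number; header rows do not
      samples++
      if ($1 + 0 > ncpu + 0) over++
    }
    END { printf "%d of %d samples above %d CPUs\n", over, samples, ncpu }'
}

# Usage (assumes vmstat and nproc are installed). tail skips the two header
# lines and the first row, which reports averages since boot:
#   vmstat 1 10 | tail -n +4 | runq_over "$(nproc)"
```

A single sample above the CPU count is rarely meaningful; it is a majority of samples over a longer run that points to contention.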
Using `mpstat` for Per-CPU Monitoring
mpstat (from the sysstat package on many distros) focuses on CPU usage, especially per-CPU breakdowns.
To see overall usage every 2 seconds:
mpstat 2
To see per-CPU usage:
mpstat -P ALL 2
Typical fields:
- `%usr`, `%sys`, `%idle`, `%iowait`, `%irq`, `%soft`, `%steal`.
Use cases:
- Detect imbalanced workloads (one CPU overloaded while others idle).
- Spot steal time (`%steal`) in virtualized environments indicating CPU contention from other VMs on the host.
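Spotting an imbalanced workload by eye gets tedious on machines with many CPUs. As a rough sketch, the function below reads `mpstat -P ALL` output and prints only the CPUs whose busy time (100 minus `%idle`) reaches a threshold. The name `hot_cpus` and the default threshold of 90 are assumptions for this example; column positions are looked up from the header row because the timestamp can take one or two fields depending on locale.

```bash
# hot_cpus: read `mpstat -P ALL` output on stdin and print each CPU whose
# busy time (100 - %idle) is at or above the threshold in $1 (default 90).
hot_cpus() {
  awk -v limit="${1:-90}" '
    /^Average/ { next }                      # skip the end-of-run summary rows
    {
      hdr = 0
      for (i = 1; i <= NF; i++) {
        if ($i == "CPU")   { cpucol = i; hdr = 1 }
        if ($i == "%idle") { idlecol = i }
      }
      if (hdr) next                          # header row: remember columns only
    }
    cpucol && idlecol && $cpucol ~ /^[0-9]+$/ {   # numbered CPUs, not "all"
      busy = 100 - $idlecol
      if (busy >= limit) printf "cpu%s %.1f\n", $cpucol, busy
    }'
}

# Usage (assumes mpstat from the sysstat package):
#   mpstat -P ALL 2 1 | hot_cpus 90
```

If this prints one CPU while the `all` row in the raw output shows plenty of idle time, you are likely looking at a single-threaded hotspot.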
Using `free` to Inspect Memory Usage
free shows a quick summary of RAM and swap.
Run:
free -h
Example:
              total        used        free      shared  buff/cache   available
Mem:           7.7G        2.0G        0.5G        0.2G        5.2G        4.0G
Swap:          2.0G          0B        2.0G
Interpretation:
- `used` is reported separately from `buff/cache` in current versions of `free`; older versions counted cache as used, which made the number look worse than it was.
- `available` is usually the best quick indicator of whether you’re actually close to memory exhaustion.
- Check `Swap`: if `used` is steadily growing and `available` is low, the system is under memory pressure.
Using `/proc/meminfo` for Detailed Memory Stats
For more detailed memory information:
cat /proc/meminfo
This file contains many fields; common ones:
- `MemTotal` – total physical RAM.
- `MemFree` – completely free memory.
- `Buffers`, `Cached` – memory used for buffers and cache.
- `SwapTotal`, `SwapFree` – total and unused swap space.
- `Active`, `Inactive` – memory actively in use vs less recently used.
- `Dirty` – memory waiting to be written to disk.
Use /proc/meminfo when you need more precision than free gives, or when writing scripts that parse memory values.
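As an illustration of script-friendly parsing, the sketch below prints a chosen set of fields converted from kB to MiB. The helper name `meminfo_mib` and its default field list are assumptions for this example; the file argument exists so the same function can be pointed at a saved copy of `/proc/meminfo`.

```bash
# meminfo_mib: print selected /proc/meminfo fields converted from kB to MiB.
# $1 is the file to read (default /proc/meminfo); any further arguments are
# field names. The body runs in a subshell so its variables stay local.
meminfo_mib() (
  file="${1:-/proc/meminfo}"
  [ "$#" -gt 0 ] && shift
  fields="${*:-MemTotal MemAvailable SwapTotal SwapFree Dirty}"
  awk -v want="$fields" '
    BEGIN {
      n = split(want, names, " ")
      for (i = 1; i <= n; i++) pick[names[i] ":"] = 1   # keys match "Name:" labels
    }
    ($1 in pick) { sub(/:$/, "", $1); printf "%s %d\n", $1, $2 / 1024 }
  ' "$file"
)

# Usage: meminfo_mib                                   (defaults, live system)
#        meminfo_mib /proc/meminfo MemTotal Dirty      (chosen fields)
```

Matching on the exact field label avoids accidental hits such as `SwapCached` when you only asked for `Cached`.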
Using `ps` to Identify Heavy Processes
While top and htop are interactive, ps is good for one-off snapshots or scripts.
Examples:
- Top 10 CPU consumers:
ps aux --sort=-%cpu | head -n 11
- Top 10 memory consumers:
ps aux --sort=-%mem | head -n 11
Common columns:
- `%CPU` – CPU time used divided by the time the process has been running; a lifetime average, not an instantaneous reading like the one in `top`.
- `%MEM` – percentage of physical RAM used.
- `RSS` – resident set size (actual physical memory used).
- `VSZ` – virtual memory size (address space, not all resident in RAM).
ps excels when you need to log or script resource checks, not just visually inspect them.
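One scripting pattern that interactive tools make awkward is grouping: a service that runs many worker processes may not top any per-process list even though its total footprint is large. The sketch below sums `RSS` per command name from `ps` output. The function name `rss_by_command` is hypothetical, and because shared pages are counted once per process, the totals are an upper bound rather than an exact figure.

```bash
# rss_by_command: sum resident memory per command name and print the
# largest totals in MiB. Expects "RSS_in_kB command" pairs on stdin.
# $1 is how many rows to print (default 10).
rss_by_command() {
  awk '
    NF >= 2 {
      rss = $1
      $1 = ""; sub(/^ +/, "")          # the rest of the line is the command name
      total[$0] += rss
    }
    END { for (c in total) printf "%d %s\n", total[c] / 1024, c }
  ' | sort -rn | head -n "${1:-10}"
}

# Usage (the trailing "=" suppresses the ps header line for each column):
#   ps -eo rss= -o comm= | rss_by_command 5
```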
Using `sar` for Historical CPU and Memory Data
Real-time tools only show the present moment. For historical analysis, sar (also in sysstat) can log CPU and memory usage over time.
Depending on your distribution, you may need to:
- Install `sysstat`.
- Enable and start its data collection service or cron job.
Examples (after data collection is enabled):
- Average CPU usage for today:
sar -u
- Detailed per-CPU usage:
sar -P ALL
- Memory usage over the day:
sar -r
sar is useful to answer questions like “What was CPU and memory usage at 3 PM yesterday?” or to see patterns over days.
Basic Patterns and How to Interpret Them
CPU Bottlenecks
Signs:
- Very low idle (`id`) time in `top`, `vmstat`, or `mpstat`.
- Run queue (`r` in `vmstat`) consistently above the number of CPUs.
- High `%CPU` usage for one or more processes.
Possible actions (overview):
- Identify the process(es) causing load (`top`, `htop`, `ps`).
- Check if the workload can be parallelized across more cores.
- Consider tuning or scaling (moving work to another machine, adding CPU resources).
Memory Bottlenecks
Signs:
- Low `available` memory (`free -h`, `top`).
- Growing and active swap usage (`swpd`, `si`, `so` in `vmstat`).
- Slow performance with high I/O wait; frequent disk activity.
- OOM kills (check the kernel log with `dmesg`, or the logs in `/var/log`).
Possible actions (overview):
- Identify memory-hungry processes (`top`, `htop`, `ps`).
- Restart or reconfigure services using excessive memory.
- Add more RAM or distribute workloads.
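To confirm whether the kernel has actually killed anything, you can filter the kernel log for OOM-killer messages. This is a minimal sketch: the function name `oom_events` is invented here, and the patterns are based on message wording commonly seen in kernel logs, which can vary between kernel versions.

```bash
# oom_events: filter kernel log text on stdin down to OOM-killer messages.
# "|| true" keeps the exit status at 0 when nothing matches.
oom_events() {
  grep -Ei 'out of memory|oom-killer|oom_reaper' || true
}

# Usage (either command may need root, depending on the distribution):
#   dmesg | oom_events
#   journalctl -k --since yesterday | oom_events
```

An empty result suggests that slowness is more likely due to swapping or cache pressure than to processes being killed outright.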
Simple Scripting Ideas for CPU/Memory Checks
Even at an intermediate level, you can automate basic checks.
Example: warn if CPU idle falls below 10%:
#!/bin/bash
# Take the second vmstat sample; the first reports averages since boot.
idle=$(vmstat 1 2 | tail -1 | awk '{print $15}')  # column 15 is "id"
if [ "$idle" -lt 10 ]; then
  echo "Warning: low CPU idle: ${idle}%"
fi
Example: warn if available memory is below 500 MB:
#!/bin/bash
avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)  # in kB
if [ "$avail" -lt 512000 ]; then  # 512000 kB = 500 MB
  echo "Warning: low available memory: $((avail / 1024)) MB"
fi
These are simple examples; later chapters will cover more robust monitoring and alerting using dedicated tools.
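If you run such checks from cron, it helps to share one comparison helper and to signal problems through the exit status as well as the message. The sketch below is one possible shape, not a recommended standard: `warn_below` and `run_checks` are hypothetical names, and the thresholds simply repeat the ones used in the two examples.

```bash
#!/bin/bash
# warn_below LABEL VALUE MINIMUM
# Prints a timestamped warning and returns 1 when VALUE is below MINIMUM.
warn_below() {
  if [ "$2" -lt "$3" ]; then
    printf '%s WARNING %s=%s (minimum %s)\n' \
      "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$2" "$3"
    return 1
  fi
  return 0
}

# run_checks: gather live values (Linux with vmstat assumed) and return
# non-zero if any check produced a warning.
run_checks() {
  status=0
  idle=$(vmstat 1 2 | tail -1 | awk '{print $15}')
  avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  warn_below cpu_idle_pct "$idle" 10 || status=1
  warn_below mem_avail_kb "$avail" 512000 || status=1
  return "$status"
}

# Usage: call run_checks at the end of the script, e.g.  run_checks >> /tmp/resource-check.log
```

The log path in the usage line is only a placeholder; the non-zero return value is what lets a scheduler or wrapper script react to a failed check.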
Choosing the Right Tool
As a quick guide:
- Need a live interactive overview with process list? Use `top` or `htop`.
- Need quick one-line memory summary? Use `free -h`.
- Need low-overhead trend view? Use `vmstat` or `mpstat`.
- Need historical data (what happened earlier)? Use `sar`.
- Need to script checks or logs? Use `ps`, `/proc/meminfo`, and simple shell scripts.
Experiment with these tools on a lightly loaded system and then under load (e.g., while compiling software or running a CPU/memory-intensive program) to build intuition about what “normal” and “problematic” states look like.