System Monitoring

Why System Monitoring Matters

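System monitoring means continuously observing a system so you can keep it healthy and predictable. For an administrator it is not optional.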

In this chapter you’ll build a basic toolkit and workflow for watching what a Linux system is doing, in real time and over time.

Types of Things You Monitor

System monitoring usually focuses on CPU and processes, memory, disk space and disk I/O, network activity, and services.

Other chapters in this part cover CPU/memory/disk/logs in more depth; here the goal is to learn the general‑purpose tools and habits.

Core Local Monitoring Tools

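You will repeatedly use these command-line tools: top and htop for a live view, ps and uptime for quick snapshots, free and vmstat for memory, df, du, iostat, and iotop for disks, ss, ip, and ping for the network, and systemctl for services. Most are installed by default; htop, iostat, and iotop often have to be installed first.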

You don’t need all of them at once, but you should know which tool to pick when something looks wrong.

Real‑Time Interactive Monitors

`top`

top shows a live view of CPU, memory, and processes.

Run:

top

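The display has two main parts: a summary area at the top (uptime and load averages, task counts, CPU states, memory and swap) and, below it, the process list, sorted by CPU usage by default.

Useful keys while top is running: P sorts by CPU usage, M sorts by memory usage, 1 toggles the per-CPU view, k kills a process by PID, h shows help, and q quits.

Common use cases: finding what is using all the CPU right now (press P and look at the top entries) or what is using the most memory (press M).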

`htop` (if installed)

htop is a friendlier alternative to top, with colors, bars, and scrolling.

Run:

htop

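Advantages compared to top: colored CPU and memory bars, vertical and horizontal scrolling through the full process list, and a function-key menu shown along the bottom of the screen.

Common keys: F3 searches, F4 filters, F5 toggles the tree view, F6 chooses the sort column, F9 sends a signal to (kills) the selected process, and F10 or q quits.

On most systems you need to install it first (the package is usually named htop) using your distribution’s package manager.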

One‑Shot Snapshots

Interactive tools are useful when you are actively investigating. For scripts or quick checks, use one‑shot commands.

`ps` – process listing

ps prints information about processes at the moment you run it.

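Some common forms are ps aux (every process, BSD-style columns), ps -ef (every process, full-format listing), and ps ux (only your own processes).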

Examples:

# All processes, human‑friendly view
ps aux
# Filter processes by name
ps aux | grep sshd
# Show processes for current user
ps ux

Use ps when you want to combine its output with grep, or in scripts where interactive tools like top don’t make sense.
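One wrinkle with ps aux | grep sshd is that the grep process itself can show up in the results. The following is a minimal sketch of one way around that, filtering on the COMMAND column with awk. The helper name procs_named is hypothetical, and the field position assumes the usual ps aux column layout.

```shell
# procs_named NAME: read `ps aux` output on stdin and print the header plus
# every row whose COMMAND column (field 11) contains NAME.
# The helper name is illustrative, not a standard tool.
procs_named() {
    awk -v name="$1" 'NR == 1 || index($11, name) > 0'
}

# Typical use (not executed here):
#   ps aux | procs_named sshd
```

Where it is installed, pgrep sshd is a common alternative when you only need the PIDs.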

`uptime` – quick health glance

uptime shows how long the system has been running and the load averages:

uptime

Example output:

 14:22:01 up 3 days,  5:17,  2 users,  load average: 0.32, 0.45, 0.40

The last three numbers are 1‑, 5‑, and 15‑minute load averages. Trend and comparison to CPU cores matter more than the raw numbers; that is covered in detail in the CPU/memory monitoring chapter.
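To make the comparison with CPU cores concrete, here is a small sketch that divides a load average by the core count. The function name load_per_core is an assumption for illustration; the usage comment relies on /proc/loadavg and nproc, which are available on typical Linux systems.

```shell
# load_per_core LOAD CORES: print LOAD divided by CORES to two decimals.
# The helper name is illustrative, not a standard tool.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Typical use on Linux (not executed here): the first field of /proc/loadavg
# is the 1-minute load average, and nproc prints the number of usable CPUs.
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

As a rough rule of thumb, a sustained result above 1.00 means there are more runnable or I/O-blocked tasks than cores.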

Monitoring Memory Usage

You’ll go deeper into memory monitoring later; here are the basic tools.

`free`

free summarizes memory and swap usage:

free -h

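The -h option makes the numbers “human readable” (for example Mi and Gi instead of raw kibibytes). Watch the available column, which estimates how much memory can be handed to applications without swapping, and the Swap row, where steadily growing usage suggests memory pressure.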

This is your first stop when the system “feels slow” or is suspected of being out of memory.
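For scripts, a single number is often easier to act on than the full table. This sketch turns free output into the percentage of memory still available. It assumes the procps-ng layout in which available is the seventh field of the Mem: row (older versions may lack that column), and the name mem_available_pct is made up for the example.

```shell
# mem_available_pct: read `free` output on stdin and print the share of total
# memory reported as available, as a whole-number percentage.
# Assumes "available" is the 7th field of the "Mem:" row (procps-ng layout).
mem_available_pct() {
    awk '/^Mem:/ { printf "%.0f\n", $7 / $2 * 100 }'
}

# Typical use (not executed here):
#   free | mem_available_pct
```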

`vmstat`

vmstat (virtual memory statistics) gives a compact overview of CPU, memory, and I/O activity:

vmstat 1

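This prints a line every second until you stop it with Ctrl+C. Important columns include r (runnable processes, running or waiting for CPU time), b (processes blocked in uninterruptible sleep, usually on I/O), si and so (memory swapped in and out per second), bi and bo (blocks read from and written to block devices), and us, sy, id, and wa (CPU time spent in user code, in the kernel, idle, and waiting for I/O).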

Use vmstat when you want to see, over time, whether the system is actually swapping, waiting on disk, etc.
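One way to spot swapping without reading every line is to filter the samples. The sketch below prints only the samples where si or so is non-zero. The field positions (7 and 8) assume the default procps vmstat layout, and swap_activity is a hypothetical helper name.

```shell
# swap_activity: read `vmstat` output on stdin and print each sample where
# pages were swapped in (si, field 7) or out (so, field 8).
# Header lines are skipped because their first field is not a number.
swap_activity() {
    awk '$1 ~ /^[0-9]+$/ && ($7 + $8) > 0 { print "swapping: si=" $7 " so=" $8 }'
}

# Typical use (not executed here): five one-second samples.
#   vmstat 1 5 | swap_activity
```

Keep in mind that the first line vmstat prints reports averages since boot, so the later samples say more about what is happening now.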

Monitoring Disk Usage and I/O

The “Disk and I/O monitoring” chapter goes into detail; here’s the basic toolkit.

`df` – filesystem usage

df shows how full each mounted filesystem is:

df -h

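Columns: Filesystem (the device or source), Size, Used, Avail, Use% (percentage of space used), and Mounted on (the mount point).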

If Use% is close to 100% on / or /var, you must free space or expand storage.
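The “close to 100%” check can be automated. This is a minimal sketch that reads df -P output (the POSIX format, which keeps each filesystem on one line) and reports anything at or above a threshold. The name full_filesystems and the threshold of 90 in the usage line are assumptions, and mount points containing spaces are not handled.

```shell
# full_filesystems THRESHOLD: read `df -P` output on stdin and print the mount
# point and usage of every filesystem at or above THRESHOLD percent.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                 # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Typical use (not executed here):
#   df -P | full_filesystems 90
```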

`du` – where disk space is used

du summarizes how much space directories use.

Examples:

# Rough size of current directory
du -sh .
# Top‑level usage in current directory
du -sh *

Useful for tracking down which directory (logs, cache, user data) is filling the disk.
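When a directory has many entries, sorting the du output makes the culprit obvious. The following sketch lists the largest entries first; biggest_entries is a hypothetical helper, sizes are in KiB, and hidden files are skipped because * does not match names that start with a dot.

```shell
# biggest_entries DIR [COUNT]: list the COUNT largest entries directly inside
# DIR (default 5), sizes in KiB, largest first.
biggest_entries() {
    (
        cd "$1" || exit 1
        du -sk -- * | sort -rn | head -n "${2:-5}"
    )
}

# Typical use (not executed here):
#   biggest_entries /var 10
```

Running the pipeline in a subshell keeps the cd from changing the caller's working directory.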

`iostat` – disk I/O

iostat (from the sysstat package on many distros) shows I/O statistics for devices.

iostat -x 1

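-x shows extended statistics; 1 means update every second. Look for %util (the share of time the device was busy) and the await columns (average time in milliseconds that requests spend queued and being served; r_await and w_await in recent sysstat versions).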

When the system is slow, high %util and high await times on a device can indicate a disk bottleneck.
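As an illustration of the %util check, this sketch picks out busy devices from iostat -x output. Because the column order differs between sysstat versions, it locates %util by name in the header row rather than by position. The helper name busy_devices and the threshold in the usage line are assumptions.

```shell
# busy_devices THRESHOLD: read `iostat -x` output on stdin and print devices
# whose %util is at or above THRESHOLD.
busy_devices() {
    awk -v limit="$1" '
        NF == 0 { col = 0; next }         # a blank line ends a device table
        /^Device/ {                       # header row: find the %util column
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        col && NF >= col && $col + 0 >= limit + 0 { print $1, $col }
    '
}

# Typical use (not executed here): three one-second reports.
#   iostat -x 1 3 | busy_devices 80
```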

`iotop` – I/O by process

iotop (if installed) is like top for disk I/O.

sudo iotop

It shows which processes are reading/writing the most to disk in real time, which is handy when “something is hammering the disk” and you don’t know what.

Monitoring Network Activity (Quick View)

Full networking tools are covered elsewhere; these are simple first‑look commands.

`ss` – sockets and connections

ss shows listening and established sockets:

# Listening TCP ports
ss -lt
# All established TCP connections
ss -tan state established

Useful to confirm whether a service is actually listening on a port, or to see rough connection counts.
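In a script, “is anything listening on this port?” is usually a yes/no question. Here is a sketch that answers it from ss -ltn output, where -n keeps port numbers numeric instead of translating them to service names. The function name port_listening is hypothetical, and the local address is assumed to be the fourth field.

```shell
# port_listening PORT: read `ss -ltn` output on stdin; succeed (exit 0) if some
# socket's local address (field 4) ends in :PORT, otherwise fail (exit 1).
port_listening() {
    awk -v port="$1" '
        NR > 1 && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Typical use (not executed here):
#   ss -ltn | port_listening 22 && echo "port 22 is listening"
```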

`ip` and `ping`

ip addr shows the network interfaces and their addresses; ping checks whether a host is reachable:

ping -c 4 example.com

If users report a service is “down”, check:

  1. Is the service process running? (ps, systemctl status)
  2. Is it listening on the expected port? (ss -lt)
  3. Is the network path OK? (ping, traceroute tools from the networking chapter)

Monitoring Services

Details on systemd and daemons are in another chapter, but monitoring often starts with:

systemctl status SERVICE_NAME

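This tells you whether the unit is loaded and enabled, whether it is active (running), inactive, or failed, its main PID, and the most recent log lines.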

Common pattern during troubleshooting:

  1. Use ss -lt to see if the port is listening.
  2. If not, run systemctl status name.service.
  3. If failed, inspect logs with journalctl -u name.service.

Historical vs Real‑Time Monitoring

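Everything so far has been local and mostly real‑time. For larger environments you also need historical data (so you can see trends and compare against a baseline), alerting when something crosses a threshold, and a central view across many machines.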

These are usually provided by dedicated monitoring systems (e.g. Prometheus, Grafana, Zabbix, Nagios, etc.). Setting them up is beyond this beginner chapter, but understanding local tools prepares you to interpret the data they collect.

Basic Monitoring Workflows

When “the server is slow”

A simple step‑by‑step checklist:

  1. Check overall load and uptime
     uptime
  2. Check CPU and process usage
     top
  3. Check memory and swap
     free -h
     vmstat 1
  4. Check disk space
     df -h
  5. Check disk I/O
     iostat -x 1
  6. Check network
     ss -lt
     ping -c 4 some_important_host

This simple sequence already covers many common problems.
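The checklist can be captured in a small script so that you run the same steps every time. This is only a sketch: run_check and slow_server_snapshot are made-up names, the interactive commands are replaced by bounded variants (top in batch mode, a fixed number of vmstat and iostat samples), and the ping step is left out because the host to test depends on your environment.

```shell
# run_check CMD [ARGS...]: print a heading, then run CMD if it is installed;
# otherwise note that it was skipped.
run_check() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $*"
        "$@"
    else
        echo "== $1: not installed, skipping"
    fi
}

# slow_server_snapshot: run the checklist once, in order.
slow_server_snapshot() {
    run_check uptime
    run_check top -b -n 1        # batch mode, a single iteration
    run_check free -h
    run_check vmstat 1 5         # five one-second samples
    run_check df -h
    run_check iostat -x 1 3      # three one-second reports
    run_check ss -lt
}

# Typical use (not executed here):
#   slow_server_snapshot > snapshot.txt
```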

When a specific service has issues

  1. Is the process running?
     ps aux | grep SERVICE_NAME
  2. Is systemd reporting problems?
     systemctl status SERVICE_NAME
  3. Is the port open? (-n shows port numbers instead of service names, so the grep can match)
     ss -ltn | grep :PORT
  4. Are there log errors?
     Use journalctl -u SERVICE_NAME or check the relevant log file in /var/log (covered in the logs chapter).
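The same checks can be chained in a function that stops at the first problem it finds. Treat this as a sketch under assumptions: check_service is a hypothetical name, the unit name and port are supplied by the caller, and it presumes a systemd-based system with ss and journalctl available.

```shell
# check_service UNIT PORT: walk the checklist for one systemd service and
# report the first problem found. Returns 0 only if every check passes.
check_service() {
    unit=$1
    port=$2
    if ! systemctl is-active --quiet "$unit"; then
        echo "$unit is not active; recent log lines:"
        journalctl -u "$unit" -n 20 --no-pager
        return 1
    fi
    # -n keeps ports numeric so the pattern ":PORT " can match
    if ! ss -ltn | grep -q ":$port "; then
        echo "$unit is active but nothing is listening on port $port"
        return 1
    fi
    echo "$unit is active and listening on port $port"
}

# Typical use (not executed here):
#   check_service sshd.service 22
```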

Building Good Monitoring Habits

These fundamentals form the base for the more specialized monitoring chapters that follow.
