3.5.5 Monitoring running services

Why Monitoring Services Is Different from Monitoring Processes

System monitoring in general focuses on CPU, memory, disk, and overall process activity. Monitoring services is more specific:

A service is usually a long-running background program managed by a service manager (often systemd).
It has a name (like sshd, nginx, postgresql) and a defined lifecycle (start, stop, restart).
It might be critical to the system (e.g., sshd on a server) and must be kept available.

Monitoring services usually means answering questions like:

Is the service running?
Is it healthy and responding correctly?
Did it crash or restart recently?
Has it been enabled or disabled?
Is it using abnormal amounts of resources?

This chapter focuses on practical ways to answer these questions and set up basic checks and alerts.

Checking Service Status with systemd

Most modern distributions use systemd. For system services, systemctl is your primary tool.

Basic status checks

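To check whether a service is active (unit names vary by distribution; for example, the OpenSSH server is ssh on Debian and Ubuntu but sshd on Fedora and RHEL-based systems):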

bash

systemctl status ssh
systemctl status nginx
systemctl status cron

Key fields to pay attention to:

Loaded: whether the unit file is found and whether it’s enabled.
Active: one of active, inactive, failed, activating, deactivating.
Main PID: the primary process for the service.
Tasks, Memory, CPU: lightweight resource usage overview (on newer systemd versions).
Recent log entries for quick troubleshooting.

For a quick one-line status:

bash

systemctl is-active ssh
systemctl is-enabled ssh

is-active prints active, inactive, failed, etc., and exits with status 0 only when the unit is active.
is-enabled prints enabled, disabled, static, masked, etc.

These are useful in scripts where you only care about a simple answer.
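The two one-line commands can be combined into a short summary for several units at once. This is a minimal sketch, assuming bash; the function name unit_summary and the unit names in the usage line are illustrative and not part of systemd.

```shell
#!/bin/bash
# Print "unit: active-state / enabled-state" for each unit given as an argument.
# unit_summary is a hypothetical helper name chosen for this example.
unit_summary() {
    local unit active enabled
    for unit in "$@"; do
        # Both commands print the state on stdout and exit non-zero for
        # "not active" / "not enabled"; "|| true" keeps the function usable
        # in scripts that run with "set -e".
        active=$(systemctl is-active "$unit" 2>/dev/null) || true
        enabled=$(systemctl is-enabled "$unit" 2>/dev/null) || true
        printf '%s: %s / %s\n' "$unit" "${active:-unknown}" "${enabled:-unknown}"
    done
}
```

After sourcing the file, a call such as unit_summary ssh nginx cron prints one line per unit.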

Listing and filtering services

To see all loaded services:

bash

systemctl list-units --type=service

To see failed services only:

bash

systemctl --failed --type=service

To filter by name:

bash

systemctl list-units --type=service | grep ssh

This helps spot services that have crashed or failed to start.
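For use in scripts, the failed-unit listing can be reduced to bare unit names. The sketch below assumes bash and awk; failed_services is a made-up helper name.

```shell
#!/bin/bash
# Print the names of failed service units, one per line.
# failed_services is a hypothetical helper name.
failed_services() {
    # --plain and --no-legend drop the status bullet, header and footer, so
    # the unit name is the first whitespace-separated field on every line.
    systemctl list-units --type=service --state=failed --plain --no-legend |
        awk '{ print $1 }'
}
```

An empty result means no service unit is currently in the failed state.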

Monitoring Service Logs for Problems

Service health issues often show up first in logs. With systemd, use journalctl to view them.

Viewing logs for a specific service

bash

journalctl -u ssh.service
journalctl -u nginx.service

Useful options:

-f — follow logs in real time:

bash

  journalctl -u ssh.service -f

--since and --until — filter by time:

bash

  journalctl -u nginx.service --since "1 hour ago"

When monitoring, look for:

Repeated restarts
Authentication failures (for services like sshd)
Bind errors or port conflicts
Configuration-related errors

Spotting frequent restarts

Use systemctl for a quick overview:

bash

systemctl status nginx

Look at:

Active: line (e.g., active (running) vs failed)
A pattern like Start request repeated too quickly in the log snippets

Or use journalctl to search for restart messages:

bash

journalctl -u nginx.service | grep -i "start request"

Frequent restarts may indicate a crash loop, misconfiguration, or missing dependencies.
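Besides searching the log, systemd keeps a per-unit counter of automatic restarts in the NRestarts property. The sketch below assumes a reasonably recent systemd (the property is believed to date from version 235); restart_count and restarts_exceed are hypothetical helper names.

```shell
#!/bin/bash
# Print how many times systemd has automatically restarted a unit.
# restart_count is a hypothetical helper name.
restart_count() {
    local unit=$1 count
    count=$(systemctl show --property=NRestarts --value "$unit" 2>/dev/null) || true
    # systemd versions without the property print nothing.
    printf '%s\n' "${count:-unknown}"
}

# Succeed (exit 0) when the restart count is above the given limit.
restarts_exceed() {
    local unit=$1 limit=$2 count
    count=$(restart_count "$unit")
    [[ $count =~ ^[0-9]+$ ]] && (( count > limit ))
}
```

A check such as restarts_exceed nginx.service 3 && echo "nginx is flapping" can then feed a log or an alert.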

Simple Command-Line Health Checks

Checking that a service process is running does not guarantee it’s healthy. Basic service-level checks often involve talking to the service over the network or via its command interface.

Checking network services (HTTP, SSH, etc.)

For services that listen on TCP ports:

Use ss -tulpn (detailed in the networking chapters) to confirm they’re listening.
Use tools like curl, nc, or telnet for functional checks.

Examples:

bash

# Check simple HTTP response
curl -I http://localhost
# Check HTTPS (ignoring certificate issues)
curl -kI https://localhost
# Test if a TCP port is reachable (e.g., SSH on port 22)
nc -zv localhost 22

Interpretation:

curl returning HTTP 200/301/302 usually indicates the web service is responding.
nc -zv success means the port is open, but not necessarily that the service is fully healthy.
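The curl check can be wrapped so that a script gets a simple verdict instead of raw headers. This sketch assumes bash and curl; the helper names, the 5-second timeout, and the choice to treat every 2xx and 3xx code as healthy are assumptions made for the example, not fixed rules.

```shell
#!/bin/bash
# Classify an HTTP status code into a coarse verdict.
# classify_http_status and http_check are hypothetical helper names.
classify_http_status() {
    case "$1" in
        2??|3??) echo "ok" ;;
        4??|5??) echo "error" ;;
        *)       echo "unreachable" ;;   # curl reports 000 when it got no reply
    esac
}

# Fetch only the status code of a URL, giving up after 5 seconds.
http_check() {
    local url=$1 code
    code=$(curl --silent --output /dev/null --max-time 5 \
                --write-out '%{http_code}' "$url") || true
    classify_http_status "$code"
}
```

Calling http_check http://localhost prints ok, error, or unreachable, which is easy to test in an if statement.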

Using service-specific status commands

Some services ship their own status or health commands. Examples:

apachectl status (with mod_status)
nginx -t (validates the configuration; it does not probe the running server)
mysqladmin ping
redis-cli ping

These often provide more accurate health information than just checking the process.

Resource Usage of Services

A service might be “running” but misbehaving due to resource issues (high CPU, memory leaks, etc.). You can connect process-level monitoring tools to specific services.

Using top/htop with service names

Start top:

bash

top

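Then narrow the view to the service's processes (e.g., sshd, nginx, postgres). In the procps-ng version of top, pressing o and entering a filter such as COMMAND=nginx does this. With htop, you can:

Press / to search by process name, or F4 to filter the list.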
Add columns for COMMAND, USER, CPU%, MEM%.

This helps you watch how much CPU/memory a service is consuming over time.

Linking systemd services to their processes

To see the main PID and children of a service:

bash

systemctl status apache2

Or use:

bash

systemd-cgls

This shows a tree of control groups, letting you see which processes belong to which service.

On some distributions, systemd-cgtop gives a live view of resource usage by service:

bash

systemd-cgtop

You’ll see CPU and memory consumed per unit (service), useful for spotting resource hogs.
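For a single number rather than a live view, the memory charged to a unit's control group can be read from the MemoryCurrent property. This is a sketch assuming bash and a systemd with memory accounting enabled; bytes_to_mib and unit_memory_mib are hypothetical helper names.

```shell
#!/bin/bash
# Convert a byte count to whole mebibytes (integer division).
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Print the memory currently charged to a unit's control group, in MiB.
# unit_memory_mib is a hypothetical helper name.
unit_memory_mib() {
    local unit=$1 bytes
    bytes=$(systemctl show --property=MemoryCurrent --value "$unit" 2>/dev/null) || true
    # Without accounting, systemd prints "[not set]"; older versions print
    # the largest 64-bit value instead. Both are treated as unknown.
    if [[ $bytes =~ ^[0-9]+$ && $bytes != 18446744073709551615 ]]; then
        bytes_to_mib "$bytes"
    else
        echo "unknown"
    fi
}
```

Because the figure covers the whole control group, it includes worker and child processes, not only the main PID.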

Automatic Restarts and Watchdogs

Monitoring often goes hand-in-hand with automatic recovery. systemd can be configured to restart services and act as a basic watchdog.

systemd service restart options

Within a service’s unit file (typically in /usr/lib/systemd/system or /etc/systemd/system), you may see options like:

ini

[Unit]
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
Restart=on-failure
RestartSec=5s

Key directives:

Restart= sets the conditions under which systemd restarts the service:

no: never restart (the default)
on-success, on-failure, on-abnormal, on-watchdog, on-abort, always

RestartSec= sets the delay before a restart.
StartLimitIntervalSec= and StartLimitBurst= limit how many starts are allowed in a time window (to avoid endless crash loops). On current systemd versions these belong in the [Unit] section, not [Service].

For an existing service, you can check these settings with:

bash

systemctl cat nginx.service

While configuration details belong in service-management chapters, from a monitoring perspective, you must understand whether a failed service will be automatically restarted or not.
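Whether systemd will restart a unit after a failure can also be queried without reading the unit file, through the Restart property. The sketch below assumes bash; restarts_after_failure is a hypothetical helper name, and "failure" is interpreted loosely as any policy that restarts after at least some kinds of unclean exit.

```shell
#!/bin/bash
# Succeed (exit 0) if the unit's Restart= policy covers at least some failures.
# restarts_after_failure is a hypothetical helper name.
restarts_after_failure() {
    local unit=$1 policy
    policy=$(systemctl show --property=Restart --value "$unit" 2>/dev/null) || true
    case "$policy" in
        # on-abort and on-watchdog only cover specific failure types.
        always|on-failure|on-abnormal|on-abort|on-watchdog) return 0 ;;
        # "no", "on-success", or an empty answer: no restart after a failure.
        *) return 1 ;;
    esac
}
```

A monitoring script might alert more urgently for units where this check fails, since nothing will bring them back automatically.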

systemd watchdogs

Some services support systemd’s watchdog mechanism:

The service periodically notifies systemd that it is alive.
If it stops sending these pings, systemd treats it as hung and can restart it.

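From a monitoring perspective, a watchdog lets systemd detect hangs, not just crashes. It only works for services that implement the notifications and have WatchdogSec= set in their unit file.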

Simple Scripting for Service Monitoring

For small systems, you might build basic checks using shell scripts and cron, before moving to full monitoring suites.

Checking service status in a script

A very simple example:

#!/bin/bash
SERVICE="nginx"
if ! systemctl is-active --quiet "$SERVICE"; then
    echo "$(date): $SERVICE is not running!" >> /var/log/service-monitor.log
    # Optional: try to restart
    systemctl start "$SERVICE"
fi

Key ideas:

Use systemctl is-active --quiet to just rely on the exit status (0 = active).
Log to a file when something goes wrong.
Optionally restart the service (only if you’re sure this is safe).
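A slightly more careful variant verifies that the start attempt actually worked and records both outcomes. It is a sketch under the same assumptions as the script above; check_and_recover, log_line, and the LOG_FILE variable are names chosen for this example.

```shell
#!/bin/bash
# Check one service, log the outcome, and try a single start if it is down.
# check_and_recover, log_line and LOG_FILE are names chosen for this sketch.
LOG_FILE=${LOG_FILE:-/var/log/service-monitor.log}

log_line() {
    printf '%s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" >> "$LOG_FILE"
}

check_and_recover() {
    local service=$1
    if systemctl is-active --quiet "$service"; then
        return 0
    fi
    log_line "$service is not running, attempting start"
    systemctl start "$service"
    # Verify the result instead of assuming the start worked.
    if systemctl is-active --quiet "$service"; then
        log_line "$service recovered"
        return 0
    fi
    log_line "$service is still down after start attempt"
    return 1
}
```

The non-zero return value lets a caller escalate, for example by sending a notification, when the restart did not help. A service that starts and then crashes a few seconds later will still pass this immediate check, so the log remains worth reviewing.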

Running checks periodically with cron

You can schedule the script with cron. Because the script may call systemctl start, it needs root privileges, so add it to root's crontab:

bash

sudo crontab -e

Add:

cron

*/5 * * * * /usr/local/bin/check-nginx.sh

This runs the script every 5 minutes. For more advanced scheduling and logging, see automation and cron chapters.

Integrating with Monitoring Systems

Larger environments usually rely on dedicated monitoring tools. While setup details belong elsewhere, it’s important to understand what they typically check for each service.

Common types of service checks

Monitoring systems (Nagios, Icinga, Zabbix, Prometheus-based stacks, etc.) often perform:

Availability checks:

Is the TCP port open?
Is the systemd unit active?

Functionality checks:

Does a web response contain expected content?
Does a database accept a simple query?

Performance checks:

Response time for a request.
Number of active clients, queue sizes.

Reliability checks:

Number of restarts in a time window.
Error rates in logs.

From a service-monitoring point of view, you’ll often:

Expose service metrics (HTTP endpoints, status pages, etc.).
Configure monitoring agents to collect metrics and statuses.
Set thresholds and alerts (e.g., “nginx down for 2 minutes” or “500 errors > 5% of requests”).
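A threshold such as "500 errors > 5% of requests" comes down to a ratio comparison, which can be done with integer arithmetic alone. The sketch below assumes the error and request counts have already been collected elsewhere; error_rate_exceeds is a hypothetical helper name.

```shell
#!/bin/bash
# Succeed (exit 0) when errors make up more than max_percent of all requests.
# errors/total > p/100 is rewritten as errors*100 > p*total so that no
# fractions are needed.
error_rate_exceeds() {
    local errors=$1 total=$2 max_percent=$3
    (( total > 0 )) || return 1      # no traffic, nothing to alert on
    (( errors * 100 > max_percent * total ))
}
```

For instance, error_rate_exceeds 6 100 5 succeeds, while error_rate_exceeds 5 100 5 does not, because exactly 5% is not above the threshold.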

Using check scripts as plugins

Many monitoring tools allow you to register custom scripts that return:

Exit status:

0 = OK
1 = WARNING
2 = CRITICAL

A single line of text with details.

For example:

#!/bin/bash
if systemctl is-active --quiet nginx; then
    echo "OK - nginx is running"
    exit 0
else
    echo "CRITICAL - nginx is not running"
    exit 2
fi

This bridges simple systemctl checks with a full monitoring and alerting system.
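A plugin can also distinguish transitional and unrecognized states. The following sketch maps the text printed by systemctl is-active to plugin-style results; check_unit is a hypothetical name, the grouping of states into WARNING and CRITICAL is a judgment call, and exit code 3 (UNKNOWN) follows the Nagios plugin convention.

```shell
#!/bin/bash
# Map the state printed by "systemctl is-active" to a plugin-style result.
# check_unit is a hypothetical helper name.
check_unit() {
    local unit=$1 state
    state=$(systemctl is-active "$unit" 2>/dev/null) || true
    case "$state" in
        active)
            echo "OK - $unit is active"; return 0 ;;
        activating|reloading|deactivating)
            echo "WARNING - $unit is $state"; return 1 ;;
        failed|inactive)
            echo "CRITICAL - $unit is $state"; return 2 ;;
        *)
            echo "UNKNOWN - $unit state is '${state:-empty}'"; return 3 ;;
    esac
}
```

In a standalone plugin file, the last two lines would be check_unit "$1" followed by exit $?, so the monitoring system receives both the message and the exit status.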

Practical Service Monitoring Checklist

For each important service on a system, ensure you can answer:

Is it running?

systemctl status <service>

Does it start on boot?

systemctl is-enabled <service>

Is it healthy and responding?

Service-specific checks (curl, mysqladmin ping, etc.)

Are there frequent errors or restarts?

journalctl -u <service>
systemctl --failed --type=service

Is resource usage reasonable?

top/htop
systemd-cgtop or similar

What happens when it fails?

Does systemd restart it?
Do you have logging and alerts configured?

Focusing on these points gives you an effective, practical approach to monitoring running services, even before deploying more advanced monitoring stacks.
