Why Monitoring Services Is Different from Monitoring Processes
System monitoring in general focuses on CPU, memory, disk, and overall process activity. Monitoring services is more specific:
- A service is usually a long-running background program managed by a service manager (often systemd).
- It has a name (like sshd, nginx, postgresql) and a defined lifecycle (start, stop, restart).
- It might be critical to the system (e.g., sshd on a server) and must be kept available.
Monitoring services usually means answering questions like:
- Is the service running?
- Is it healthy and responding correctly?
- Did it crash or restart recently?
- Has it been enabled or disabled?
- Is it using abnormal amounts of resources?
This chapter focuses on practical ways to answer these questions and set up basic checks and alerts.
Checking Service Status with systemd
Most modern distributions use systemd. For system services, systemctl is your primary tool.
Basic status checks
To check if a service is active:
systemctl status ssh
systemctl status nginx
systemctl status cron
Key fields to pay attention to:
- Loaded: whether the unit file is found and whether it’s enabled.
- Active: one of active, inactive, failed, activating, deactivating.
- Main PID: the primary process for the service.
- Tasks, Memory, CPU: lightweight resource usage overview (on newer systemd versions).
- Recent log entries for quick troubleshooting.
For a quick one-line status:
systemctl is-active ssh
systemctl is-enabled ssh
- is-active returns active, inactive, failed, etc.
- is-enabled returns enabled, disabled, static, masked, among others.
These are useful in scripts where you only care about a simple answer.
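As a minimal sketch of that idea, the function below combines both one-word answers into a single summary line. The function name describe_unit is hypothetical and chosen only for illustration.

```shell
#!/bin/bash
# describe_unit UNIT: print a one-line summary built from is-active/is-enabled.
describe_unit() {
    local unit="$1" active enabled
    # Both commands print a single word; a non-zero exit only means
    # "not active" or "not enabled", so it is deliberately ignored here.
    active="$(systemctl is-active "$unit" 2>/dev/null)" || true
    enabled="$(systemctl is-enabled "$unit" 2>/dev/null)" || true
    echo "$unit: active=${active:-unknown} enabled=${enabled:-unknown}"
}

# Example: describe_unit ssh
```

Capturing the printed word rather than the exit status keeps states such as failed or masked visible in the output.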
Listing and filtering services
To see loaded services that are active, failed, or have pending jobs (add --all to include inactive ones):
systemctl list-units --type=service
To see failed services only:
systemctl --failed --type=service
To filter by name:
systemctl list-units --type=service | grep ssh
This helps spot services that have crashed or failed to start.
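For scripting, the table output is easier to handle once the header and decorations are stripped. The following sketch prints only the names of failed services; failed_services is a hypothetical helper name.

```shell
#!/bin/bash
# failed_services: print the names of failed service units, one per line.
failed_services() {
    # --plain and --no-legend drop the status markers, header, and footer,
    # so the first column of every line is the unit name.
    systemctl list-units --type=service --state=failed --plain --no-legend |
        awk '{ print $1 }'
}

# Example: failed_services
```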
Monitoring Service Logs for Problems
Service health issues often show up first in logs. With systemd, use journalctl to view them.
Viewing logs for a specific service
journalctl -u ssh.service
journalctl -u nginx.service
Useful options:
- -f to follow logs in real time:
journalctl -u ssh.service -f
- --since and --until to filter by time:
journalctl -u nginx.service --since "1 hour ago"
When monitoring, look for:
- Repeated restarts
- Authentication failures (for services like sshd)
- Bind errors or port conflicts
- Configuration-related errors
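One way to turn these patterns into a number you can track is to count matching journal lines over a time window. The sketch below assumes a hypothetical helper named count_log_matches; the search patterns in the example are illustrations, not the exact messages every service emits.

```shell
#!/bin/bash
# count_log_matches UNIT SINCE PATTERN: count journal lines matching PATTERN.
count_log_matches() {
    local unit="$1" since="$2" pattern="$3"
    # grep -c prints the count but exits 1 when the count is zero,
    # so "|| true" keeps a zero result from looking like an error.
    journalctl -u "$unit" --since "$since" --no-pager 2>/dev/null |
        grep -ciE "$pattern" || true
}

# Example:
# count_log_matches ssh.service "1 hour ago" "failed password|authentication failure"
```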
Spotting frequent restarts
Use systemctl for a quick overview:
systemctl status nginx
Look at:
- Active: line (e.g., active (running) vs failed)
- A pattern like Start request repeated too quickly in the log snippets
Or use journalctl to search for restart messages:
journalctl -u nginx.service | grep -i "start request"
Frequent restarts may indicate a crash loop, misconfiguration, or missing dependencies.
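On newer systemd versions, the unit also carries a counter of automatic restarts, which avoids grepping logs. The sketch below reads it through the NRestarts property; it assumes a systemd recent enough to expose that property, and restart_count is a hypothetical helper name.

```shell
#!/bin/bash
# restart_count UNIT: print how many automatic restarts systemd has recorded.
restart_count() {
    local unit="$1" n
    n="$(systemctl show -p NRestarts --value "$unit" 2>/dev/null)" || true
    case "$n" in
        # Empty or non-numeric output: the property is not available here.
        ''|*[!0-9]*) echo "unknown" ;;
        *) echo "$n" ;;
    esac
}

# Example: restart_count nginx.service
```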
Simple Command-Line Health Checks
Checking that a service process is running does not guarantee it’s healthy. Basic service-level checks often involve talking to the service over the network or via its command interface.
Checking network services (HTTP, SSH, etc.)
For services that listen on TCP ports:
- Use ss or ss -tulpn (detailed in networking chapters) to confirm they’re listening.
- Use tools like curl, nc, or telnet for functional checks.
Examples:
# Check simple HTTP response
curl -I http://localhost
# Check HTTPS (ignoring certificate issues)
curl -kI https://localhost
# Test if a TCP port is reachable (e.g., SSH on port 22)
nc -zv localhost 22
Interpretation:
- curl returning HTTP 200/301/302 usually indicates the web service is responding.
- nc -zv success means the port is open, but not necessarily that the service is fully healthy.
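The curl interpretation above can be sketched as a pair of small functions. Treating any 2xx or 3xx status as healthy is an assumption made here for illustration, and the helper names http_status and http_healthy are hypothetical.

```shell
#!/bin/bash
# http_status URL: print the HTTP status code (curl prints 000 on failure).
http_status() {
    curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1" || true
}

# http_healthy URL: succeed when the status code is 2xx or 3xx.
http_healthy() {
    case "$(http_status "$1")" in
        2??|3??) return 0 ;;
        *)       return 1 ;;
    esac
}

# Example: http_healthy http://localhost && echo "web server responds"
```

The --max-time limit matters in monitoring scripts, since a hung service would otherwise hang the check as well.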
Using service-specific status commands
Some services ship their own status or health commands. Examples:
- apachectl status (with mod_status)
- nginx -t (configuration test)
- mysqladmin ping
- redis-cli ping
These often provide more accurate health information than just checking the process.
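As a rough sketch, two of these commands can be wrapped so a script gets a plain yes/no answer. It assumes default connection settings for both clients; the function names are hypothetical.

```shell
#!/bin/bash
# redis_healthy: succeed only if the server answers PING with PONG.
redis_healthy() {
    [ "$(redis-cli ping 2>/dev/null)" = "PONG" ]
}

# mysql_healthy: mysqladmin ping exits 0 when the server is reachable.
mysql_healthy() {
    mysqladmin ping >/dev/null 2>&1
}

# Example: redis_healthy || echo "redis did not answer PING"
```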
Resource Usage of Services
A service might be “running” but misbehaving due to resource issues (high CPU, memory leaks, etc.). You can connect process-level monitoring tools to specific services.
Using top/htop with service names
Start top:
top
Then filter by command name (e.g., sshd, nginx, postgres); in procps-ng top, press o and enter a filter such as COMMAND=nginx. With htop, you can:
- Press / to search by process name.
- Add columns for COMMAND, USER, CPU%, MEM%.
This helps you watch how much CPU/memory a service is consuming over time.
Linking systemd services to their processes
To see the main PID and children of a service:
systemctl status apache2
Or use:
systemd-cgls
This shows a tree of control groups, letting you see which processes belong to which service.
On some distributions, systemd-cgtop gives a live view of resource usage by service:
systemd-cgtop
You’ll see CPU and memory consumed per unit (service), useful for spotting resource hogs.
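For a single non-interactive reading, the same per-unit accounting can be queried with systemctl show. This sketch assumes memory accounting is enabled for the unit; unit_memory_mib is a hypothetical helper name.

```shell
#!/bin/bash
# unit_memory_mib UNIT: print the unit's current memory use in MiB.
unit_memory_mib() {
    local unit="$1" bytes
    bytes="$(systemctl show -p MemoryCurrent --value "$unit" 2>/dev/null)" || true
    case "$bytes" in
        # "[not set]" or empty output means no accounting data is available.
        ''|*[!0-9]*) echo "n/a" ;;
        # Some versions report the maximum 64-bit value instead of "[not set]".
        18446744073709551615) echo "n/a" ;;
        *) echo $(( bytes / 1024 / 1024 )) ;;
    esac
}

# Example: unit_memory_mib nginx.service
```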
Automatic Restarts and Watchdogs
Monitoring often goes hand-in-hand with automatic recovery. systemd can be configured to restart services and act as a basic watchdog.
systemd service restart options
Within a service’s unit file (typically in /usr/lib/systemd/system or /etc/systemd/system), you may see options like:
[Unit]
StartLimitIntervalSec=60
StartLimitBurst=5
[Service]
Restart=on-failure
RestartSec=5s
Key directives:
- Restart= sets the conditions under which systemd restarts the service:
  - no: never restart
  - on-success, on-failure, on-abnormal, always, etc.
- RestartSec= sets the delay before restart.
- StartLimitIntervalSec= and StartLimitBurst= limit how many starts are allowed in a time window (to avoid endless crash loops). On current systemd versions these two belong in the [Unit] section, not [Service].
For an existing service, you can check these settings with:
systemctl cat nginx.service
While configuration details belong in service-management chapters, from a monitoring perspective, you must understand whether a failed service will be automatically restarted or not.
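To answer that question from a script rather than by reading the unit file, you can query the effective value of Restart=. This is a minimal sketch; restart_policy and auto_restarts are hypothetical helper names.

```shell
#!/bin/bash
# restart_policy UNIT: print the effective Restart= value (e.g. "on-failure").
restart_policy() {
    systemctl show -p Restart --value "$1" 2>/dev/null
}

# auto_restarts UNIT: succeed if systemd restarts the unit in at least some cases.
auto_restarts() {
    local policy
    policy="$(restart_policy "$1")" || true
    [ -n "$policy" ] && [ "$policy" != "no" ]
}

# Example: auto_restarts nginx.service || echo "nginx will stay down after a crash"
```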
systemd watchdogs
Some services support systemd’s watchdog mechanism:
- The service periodically notifies systemd that it is alive.
- If it stops sending these pings, systemd treats it as hung and can restart it.
From a monitoring perspective, a watchdog lets systemd detect hangs in a service, not just crashes.
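For a service written as a shell script, the "alive" ping could look roughly like the sketch below. It assumes the unit sets WatchdogSec= and a NotifyAccess= value that accepts messages from helper processes; send_heartbeat is a hypothetical name, and real daemons usually call the notification API directly instead.

```shell
#!/bin/bash
# send_heartbeat: tell systemd the service is still alive.
send_heartbeat() {
    # systemd exports WATCHDOG_USEC only when a watchdog is configured,
    # so without it there is nothing to notify.
    [ -n "${WATCHDOG_USEC:-}" ] || return 0
    systemd-notify WATCHDOG=1
}

# Example: call send_heartbeat once per iteration of the service's main loop,
# at an interval shorter than the configured watchdog timeout.
```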
Simple Scripting for Service Monitoring
For small systems, you might build basic checks using shell scripts and cron, before moving to full monitoring suites.
Checking service status in a script
A very simple example:
#!/bin/bash
# Log a message, and optionally restart, when the service is not active.
SERVICE="nginx"
if ! systemctl is-active --quiet "$SERVICE"; then
    echo "$(date): $SERVICE is not running!" >> /var/log/service-monitor.log
    # Optional: try to restart
    systemctl start "$SERVICE"
fi
Key ideas:
- Use systemctl is-active --quiet to just rely on the exit status (0 = active).
- Log to a file when something goes wrong.
- Optionally restart the service (only if you’re sure this is safe).
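The same ideas extend to several services at once. The sketch below logs every inactive unit and returns how many were down; check_services and the log path in the example are hypothetical, and the restart step is left out on purpose.

```shell
#!/bin/bash
# check_services LOGFILE UNIT...: log every unit that is not active.
# The return status is the number of inactive units (0 means all are fine).
check_services() {
    local logfile="$1" unit down=0
    shift
    for unit in "$@"; do
        if ! systemctl is-active --quiet "$unit"; then
            echo "$(date): $unit is not running!" >> "$logfile"
            down=$(( down + 1 ))
        fi
    done
    return "$down"
}

# Example: check_services /var/log/service-monitor.log nginx ssh cron
```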
Running checks periodically with cron
You can schedule the script using the system crontab or user crontab. For example:
sudo crontab -e
Add:
*/5 * * * * /usr/local/bin/check-nginx.sh
This runs the script every 5 minutes. For more advanced scheduling and logging, see automation and cron chapters.
Integrating with Monitoring Systems
Larger environments usually rely on dedicated monitoring tools. While setup details belong elsewhere, it’s important to understand what they typically check for each service.
Common types of service checks
Monitoring systems (Nagios, Icinga, Zabbix, Prometheus-based stacks, etc.) often perform:
- Availability checks:
  - Is the TCP port open?
  - Is the systemd unit active?
- Functionality checks:
  - Does a web response contain expected content?
  - Does a database accept a simple query?
- Performance checks:
  - Response time for a request.
  - Number of active clients, queue sizes.
- Reliability checks:
  - Number of restarts in a time window.
  - Error rates in logs.
From a service-monitoring point of view, you’ll often:
- Expose service metrics (HTTP endpoints, status pages, etc.).
- Configure monitoring agents to collect metrics and statuses.
- Set thresholds and alerts (e.g., “nginx down for 2 minutes” or “500 errors > 5% of requests”).
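A threshold such as “500 errors > 5% of requests” comes down to a simple comparison. The sketch below shows one way to evaluate it with integer arithmetic only; error_rate_exceeds is a hypothetical helper, and how the counts are collected is left to the monitoring system.

```shell
#!/bin/bash
# error_rate_exceeds ERRORS TOTAL PERCENT: succeed if ERRORS/TOTAL > PERCENT%.
error_rate_exceeds() {
    local errors="$1" total="$2" percent="$3"
    # With no traffic there is no rate to alert on.
    [ "$total" -gt 0 ] || return 1
    # Cross-multiply so that errors/total > percent/100 needs no fractions.
    [ $(( errors * 100 )) -gt $(( total * percent )) ]
}

# Example: error_rate_exceeds 12 200 5 && echo "ALERT: 5xx rate above 5%"
```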
Using check scripts as plugins
Many monitoring tools allow you to register custom scripts that return:
- Exit status:
  - 0 = OK
  - 1 = WARNING
  - 2 = CRITICAL
- A single line of text with details.
For example:
#!/bin/bash
# Report nginx status using plugin-style exit codes.
if systemctl is-active --quiet nginx; then
    echo "OK - nginx is running"
    exit 0
else
    echo "CRITICAL - nginx is not running"
    exit 2
fi
This bridges simple systemctl checks with a full monitoring and alerting system.
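A slightly more detailed variant could distinguish transitional states as well. The sketch below follows the Nagios-style convention, where exit status 3 means UNKNOWN; mapping activating, deactivating, and reloading to WARNING is a choice made here for illustration, and check_unit is a hypothetical name.

```shell
#!/bin/bash
# check_unit UNIT: print a one-line status and return a plugin-style code.
check_unit() {
    local unit="$1" state
    state="$(systemctl is-active "$unit" 2>/dev/null)" || true
    case "$state" in
        active)
            echo "OK - $unit is active"; return 0 ;;
        activating|deactivating|reloading)
            echo "WARNING - $unit is $state"; return 1 ;;
        inactive|failed)
            echo "CRITICAL - $unit is $state"; return 2 ;;
        *)
            echo "UNKNOWN - $unit state '${state:-none}'"; return 3 ;;
    esac
}

# Example (as a standalone plugin): check_unit nginx; exit $?
```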
Practical Service Monitoring Checklist
For each important service on a system, ensure you can answer:
- Is it running?
  - systemctl status <service>
- Does it start on boot?
  - systemctl is-enabled <service>
- Is it healthy and responding?
  - Service-specific checks (curl, mysqladmin ping, etc.)
- Are there frequent errors or restarts?
  - journalctl -u <service>
  - systemctl --failed --type=service
- Is resource usage reasonable?
  - top/htop
  - systemd-cgtop or similar
- What happens when it fails?
  - Does systemd restart it?
  - Do you have logging and alerts configured?
Focusing on these points gives you an effective, practical approach to monitoring running services, even before deploying more advanced monitoring stacks.