Why Logging and Auditing Matter
Linux gives you enormous power—but with power comes responsibility. Logging and auditing are how you answer questions like:
- What happened?
- When did it happen?
- Who did it?
- From where?
- Was it allowed or blocked?
In this chapter, you’ll learn how Linux records system activity, how to read those records efficiently, and how to design a logging/auditing strategy that is useful in real-world operations, troubleshooting, and security.
Systemd’s journalctl and traditional syslog files (e.g. in /var/log) are covered in other chapters; here we look at how to think about logging holistically and how to tie logs and auditing together as an advanced skill.
Types of Events You Should Log
Before focusing on tools, decide what kinds of events you care about. Typical categories:
- Authentication & authorization
  - SSH logins and failures
  - `sudo` and privilege escalations
  - Account lockouts, password changes
- System changes
  - Service restarts/reloads
  - Package installs/removals
  - Kernel messages, hardware events
- Application and web access
  - HTTP access logs, error logs
  - API calls and failures
  - Database connections and slow queries
- Security-relevant events
  - Firewall blocks
  - SELinux/AppArmor denials
  - IDS/IPS alerts
  - File integrity alerts
- Compliance/audit events
  - Changes to critical configs
  - Access to sensitive files
  - Administrative operations (user/group changes, `mount`, `chmod`, `chown`)
You won’t log everything forever; instead, decide:
- What is must-keep (for security, compliance, or forensic reasons)
- What is useful (for troubleshooting)
- What is noise (events that can be sampled or discarded)
Core Logging Architecture Concepts
Linux logging is usually built from a few core building blocks:
- Producers
  Programs that generate log messages:
  - kernel
  - daemons and services
  - applications (web servers, DB servers, etc.)
  - security components (firewalls, SELinux/AppArmor, IDS)
  - audit subsystems (e.g. `auditd`)
- Collectors / routers
  - systemd-journald
  - syslog daemons (e.g. `rsyslog`, `syslog-ng`)
  - audit daemons (`auditd`)
  - agents for external platforms (e.g. `filebeat`, `fluentd`, `vector`)
- Storage
  - Local log files under `/var/log`
  - Binary journals (systemd)
  - Remote log servers via syslog or custom protocols
  - Central log platforms (ELK/OpenSearch, Graylog, Splunk, Loki, etc.)
- Consumers
  - Humans (admins, security teams)
  - Alerting systems (email, PagerDuty, Slack, etc.)
  - Dashboards and reports
  - Automated responses (scripts, SOAR tools)
Important design principle:
Logging and auditing become powerful when you centralize and correlate logs from many sources.
Designing a Logging Strategy
Instead of simply “turning on everything”, design your logging with intent.
1. Define Goals
Common goals:
- Operations: detect failures and performance issues quickly
- Security: detect suspicious behavior, policy violations, intrusions
- Compliance: provide evidence for audits (e.g. who did what, when)
- Forensics: reconstruct timelines after an incident
Each goal implies different retention, detail level, and alerting.
2. Decide on Log Levels
Log messages typically use levels such as:
- `emerg`/`alert`/`crit` – system unusable or critical failures
- `err` – errors that require attention
- `warning` – potential issues
- `notice` – normal but significant
- `info` – routine information
- `debug` – very detailed, for troubleshooting
Guidelines:
- On production systems:
  - Default to `info` or `notice` for services.
  - Enable `debug` only for a limited time when debugging.
- On security-sensitive components:
  - Use more detailed logging (but watch out for size and privacy).
3. Retention and Rotation
Logs grow quickly. You need a policy:
- Rotation (e.g. via `logrotate`):
  - Rotate by size or time (daily/weekly).
  - Compress older logs (`gzip`, `xz`).
  - Limit the number of rotated archives.
- Retention:
  - Short (hours–days) for debug noise.
  - Medium (weeks–months) for operational logs.
  - Long (months–years) for security/compliance, but often on central servers or cold storage, not on each host.
Balance cost vs. usefulness: old logs that nobody can query are not helpful.
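The rotation bullets above map directly onto a `logrotate` policy. A minimal sketch, assuming a hypothetical application that writes to `/var/log/myapp/` (the paths, counts, and service name are all illustrative):

```
/var/log/myapp/*.log {
    weekly
    rotate 8
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        # Ask the (hypothetical) service to reopen its log files.
        systemctl reload myapp >/dev/null 2>&1 || true
    endscript
}
```

`delaycompress` keeps the most recent rotated file uncompressed, which avoids truncating messages from a daemon that briefly keeps the old file handle open.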
4. Centralization
For anything beyond a few servers, centralize logs:
- Use syslog over the network (the `@server` or `@@server` conventions) or log agents.
- Ensure reliable transport (TCP, TLS) for critical logs.
- Tag logs with:
  - hostname
  - application
  - environment (prod/test/dev)
  - service name or role
Centralization enables:
- Cross-host correlation
- Single place for searching
- Simple integration with alerting and dashboards
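With `rsyslog`, the `@server`/`@@server` conventions above look like this. A sketch, assuming a hypothetical central server `logs.example.com` (the hostname and queue file name are placeholders):

```
# /etc/rsyslog.d/50-forward.conf
# A single @ forwards over UDP; a double @@ forwards over TCP.
*.info @@logs.example.com:514

# Spool to disk so messages survive short outages of the central server.
$ActionQueueType LinkedList
$ActionQueueFileName fwdqueue
$ActionResumeRetryCount -1
```

For critical logs, layering TLS on top of the TCP transport (rsyslog's gtls driver, or a log agent with TLS support) is the usual next step.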
Auditing vs Logging
Logging and auditing overlap but are not identical:
- Logging:
  - Broad, mostly for troubleshooting and visibility.
  - May be incomplete or high-level (“service X restarted”, “HTTP 500 error”).
- Auditing:
  - Focused on events relevant to security, accountability, or compliance.
  - Aimed at answering “who did what, where, and when?”.
  - More structured and controlled.
On Linux, auditing typically means:
- Kernel-level auditing via the audit subsystem (handled by `auditd`).
- High-fidelity records about:
  - syscalls (`open`, `execve`, `chown`, `mount`, etc.)
  - access to specific files or directories
  - changes to identities (users, groups, capabilities)
  - security policy decisions (e.g. MAC denials)
You’ll usually use:
- Logs for: routine operations, debugging, and basic security monitoring.
- Audit trails for: investigations, compliance, and fine-grained traces of actions.
Common Events to Include in an Audit Policy
A typical Linux audit configuration focuses on events like:
- Identity & access
  - Logins and logouts
  - SSH key usage
  - `sudo` invocations and their results
  - Changes to users and groups
- Privilege boundaries
  - Use of `setuid` binaries
  - Raising capabilities
  - `su` and `doas` usage
- Critical file accesses
  - `/etc/passwd`, `/etc/shadow`
  - `/etc/sudoers` and `sudoers.d`
  - Security configs (e.g. SSHD configs, PAM configs, firewall configs)
  - Application secrets (keys, credentials) directories
- System configuration changes
  - Changes to network configuration
  - Mount/umount operations
  - Kernel parameter tuning (e.g. via `/proc/sys`)
- Audit system integrity
  - Modifications of the audit rules themselves
  - Attempts to disable logging or auditing
Not all systems require the same level. For a personal laptop, heavy audit rules may be overkill; for a payment processing server, they may be mandatory.
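Several of the events above can be captured with a handful of audit rules. A sketch (the `-k` key names are illustrative; rules like these typically live in a file under `/etc/audit/rules.d/`):

```
# Watch identity files for writes and attribute changes (-p wa),
# tagging matching events with a key (-k) for later searching.
-w /etc/passwd -p wa -k identity
-w /etc/shadow -p wa -k identity
-w /etc/sudoers -p wa -k sudoers
-w /etc/sudoers.d/ -p wa -k sudoers

# Record mount syscalls made by real (non-system) users.
-a always,exit -F arch=b64 -S mount -F auid>=1000 -F auid!=unset -k mounts
```

The keys pay off at search time: `ausearch -k identity` pulls back only the tagged events instead of the whole audit trail.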
Building an End‑to‑End Logging & Auditing Workflow
This is about how logs and audits flow through your environment.
1. Ingestion
- System and app logs go to:
  - journald
  - syslog daemons
  - application-specific log files
- Audit events go to:
  - `auditd` logs
2. Normalization
Different sources use different formats. Common normalization steps:
- Parse timestamps into a consistent format (e.g. UTC).
- Extract fields such as:
  - source IP
  - username
  - process name / PID
  - event type (login, file access, command execution)
- Assign categories (auth, system, network, application, audit, etc.).
Central platforms and agents often provide parsers for common log formats.
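As a tiny illustration of field extraction, here is one way to pull a source IP, path, and status code out of an Apache-style access log line with standard shell tools (the sample line is made up):

```shell
# A sample line in the common/combined access log format.
line='203.0.113.9 - - [10/Jan/2024:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 5321'

# Strip the quotes, then pick fields by position.
echo "$line" | awk '{ gsub(/"/, ""); print "ip=" $1, "path=" $7, "status=" $9 }'
# → ip=203.0.113.9 path=/index.html status=200
```

Real pipelines use proper parsers (grok patterns, structured JSON logging), but the principle is the same: turn free-text lines into named fields.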
3. Storage and Indexing
Design decisions:
- Hot storage (fast search, recent days/weeks).
- Warm storage (slower, longer retention).
- Cold/archive (cheap storage, rarely queried, maybe offline).
Ensure:
- Clocks are synchronized across servers (NTP is critical).
- Index by at least:
  - time
  - host
  - program/service
  - severity
4. Detection and Alerting
You can’t manually read all logs. Rely on:
- Static rules:
  - “Alert if more than N failed SSH logins from the same IP in 5 minutes”
  - “Alert if `sudo` fails more than N times”
  - “Alert on any modification of `/etc/sudoers`”
- Thresholds:
  - High 5xx rate in HTTP logs
  - Sudden spike in `auth` failures
  - Unusual rate of `rm` or `chmod` calls (via audit)
- Baselines & anomaly detection (in advanced systems):
  - Detect deviations from normal access patterns.
Set alerts for actionable conditions; too many alerts cause people to ignore them.
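The first static rule above (“N failed SSH logins from the same IP”) can be approximated with nothing more than standard text tools. A sketch using a fabricated auth-log excerpt (real `/var/log/auth.log` formats vary by distribution):

```shell
# Create a small sample in the style of OpenSSH auth-log entries.
cat > /tmp/auth-sample.log <<'EOF'
Jan 10 10:01:02 host1 sshd[1234]: Failed password for root from 203.0.113.5 port 22 ssh2
Jan 10 10:01:09 host1 sshd[1235]: Failed password for admin from 203.0.113.5 port 22 ssh2
Jan 10 10:02:30 host1 sshd[1236]: Failed password for bob from 198.51.100.7 port 22 ssh2
EOF

# Grab the IP that follows "from" on each failed-password line,
# then count occurrences per IP, busiest first.
awk '/Failed password/ { for (i = 1; i <= NF; i++) if ($i == "from") print $(i+1) }' \
    /tmp/auth-sample.log | sort | uniq -c | sort -rn
```

A real alerting system does the same aggregation continuously with a sliding time window; tools like fail2ban automate both the detection and the response.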
5. Response and Enrichment
Responses range from manual to automatic:
- Manual:
  - Investigate the logs, correlate events, maybe escalate.
- Semi-automatic:
  - Pre-written runbooks and scripts help respond quickly.
- Automatic:
  - Block IPs based on thresholds.
  - Disable accounts after suspicious events.
  - Revert configuration via configuration management.
Enrichment adds value:
- Map IPs to geographic locations or internal roles.
- Link logs to asset inventory (which app/team owns this server?).
- Attach context (for example, which change/commit was deployed just before an error spike).
Logging and Auditing for Different Environments
The “right” setup depends on scale and importance.
Single Server / Small Homelab
Goals:
- Basic forensic and troubleshooting ability.
- Some security visibility.
Typical approach:
- Use default system logs and rotate with `logrotate`.
- Keep authentication logs and key service logs for at least a few weeks.
- Optionally enable lightweight audit rules for critical files.
- Manually inspect logs when troubleshooting or after suspicious events.
Small–Medium Organization
Goals:
- Centralize logs from multiple servers.
- Provide basic alerting and dashboards.
Typical approach:
- Choose a central log server or log platform.
- Standardize:
  - log formats (where possible)
  - timezone (UTC)
  - host naming conventions
- Deploy minimal, standardized audit rules on all servers, with more detailed rules on critical systems.
Enterprise / Compliance-Driven
Goals:
- Detailed, reliable, and tamper-resistant audit trails.
- Strong incident response and compliance reporting.
Typical approach:
- Dedicated logging infrastructure with redundancy.
- Syslog or agents sending to SIEM or log platforms.
- Advanced audit rules on sensitive systems, carefully tuned to avoid overload.
- Immutable or write-once storage for certain logs.
- Strict access controls and monitoring for the logging system itself.
Best Practices and Common Pitfalls
Best Practices
- Time synchronization is non-negotiable
  - Always run NTP/chrony.
  - Use UTC in logs and dashboards.
- Protect log integrity
  - Restrict access to logs (`/var/log`, audit logs, the central platform).
  - Consider:
    - checksums/hashes for critical logs
    - append-only flags
    - shipping logs off-host as soon as possible
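The checksum idea above is cheap to implement with `sha256sum`. A sketch (paths are illustrative; for append-only flags you would additionally run `chattr +a` on the file as root):

```shell
# Archive a log copy and record its checksum at archive time.
mkdir -p /tmp/log-archive
printf 'event A\nevent B\n' > /tmp/log-archive/app.log
sha256sum /tmp/log-archive/app.log > /tmp/log-archive/app.log.sha256

# Later: verify integrity; a non-zero exit status means the file changed.
sha256sum -c /tmp/log-archive/app.log.sha256
```

Storing the `.sha256` files on a different host (or write-once storage) is what makes the check meaningful: an attacker who can edit the log should not also be able to edit its recorded hash.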
- Avoid logging sensitive data unnecessarily
  - Do not log:
    - plain-text passwords
    - full credit card numbers
    - secret keys and tokens
  - If necessary, mask or tokenize sensitive fields.
- Test your logging and alerting regularly
  - Simulate:
    - failed login bursts
    - service failures
    - audit events (e.g. reading a critical file)
  - Confirm:
    - events are generated
    - they reach the central system
    - alerts fire as expected
- Document your log schema and rules
  - Where are logs stored?
  - How long are they kept?
  - Which events are monitored?
  - What alerts exist, and who owns them?
Common Pitfalls
- Overlogging
  - Turning on verbose debug logging everywhere indefinitely.
  - Consequences: high disk use, unreadable noise, and cost.
- Underlogging
  - Missing key events, such as:
    - failed login attempts
    - changes to security-related configs
    - service crashes/restarts
- Not watching the watchers
  - Ignoring logs from:
    - the logging infrastructure
    - the audit subsystem
  - If those fail or are tampered with, you may lose visibility entirely.
- Treating logs as an afterthought
  - Trying to “retrofit” logging/auditing only after an incident.
  - Set up at least minimal logging & auditing when deploying new services.
Using Logs and Audit Trails During an Incident
When something goes wrong—security breach, data loss, or major outage—logs and audits are your primary evidence.
A typical workflow:
- Define the timeframe
  - When was the issue first noticed?
  - Use alerts, user reports, or monitoring data as boundaries.
- Identify relevant sources
  - Authentication logs
  - Service logs
  - System logs (kernel, hardware)
  - Audit logs (for file and process activity)
- Build a timeline
  - List events by time:
    - user logins
    - commands run
    - configuration changes
    - crashes, restarts
  - Look for cause-and-effect relationships.
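Building a timeline often starts with merging logs from several sources into one time-ordered stream. When timestamps are ISO 8601 in UTC, they sort correctly as plain strings, so a plain `sort` is enough. A sketch with two fabricated log files:

```shell
cat > /tmp/web.log <<'EOF'
2024-01-10T10:00:05Z web GET /login 200
2024-01-10T10:00:20Z web POST /admin 500
EOF
cat > /tmp/db.log <<'EOF'
2024-01-10T10:00:12Z db connection opened from app1
EOF

# Merge both files, ordered by the leading timestamp field.
sort -k1,1 /tmp/web.log /tmp/db.log
```

This is one reason the earlier sections insist on UTC and consistent timestamp formats: mixed timezones or ambiguous formats make even this simple merge unreliable.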
- Correlate across systems
  - Use hostnames, IPs, user IDs, and session IDs.
  - Tie together:
    - web requests
    - app logs
    - DB queries
    - OS-level audit events
- Preserve evidence
  - Copy relevant logs to a safe place.
  - Ensure they cannot be modified.
  - Document the steps you take.
- Feed back into improvements
  - After the incident, update:
    - logging rules (to capture what was missing)
    - alert rules (to detect earlier next time)
    - documentation and training
Integrating Logging and Auditing with Other Tools
Logging and auditing become much more powerful when combined with other systems:
- Configuration management (Ansible, Puppet, etc.)
  - Use it to deploy standardized logging/audit configurations.
- Monitoring + alerting (Prometheus, Nagios, etc.)
  - Combine metrics (CPU, errors/sec) with logs for full context.
- Security tools (IDS/IPS, vulnerability scanners)
  - Feed their alerts into the central log system.
- Orchestration / containers
  - Ensure containers send logs to the host or directly to your log platform.
  - Understand how audit rules behave in containerized environments (namespaces and cgroups can change how events are seen).
Summary
In advanced Linux administration, logging and auditing are not optional—they are how you observe, understand, and defend your systems.
Key ideas:
- Treat logs and audits as part of system design, not an afterthought.
- Decide what to log, how long to keep it, and how to centralize it.
- Use audit trails for fine-grained accountability and compliance.
- Normalize, index, and monitor logs to support rapid troubleshooting and incident response.
- Continuously refine your logging and auditing based on real-world incidents and evolving requirements.
Subsequent sections will dive into the specifics of systemd logging, traditional /var/log files, auditd, log rotation, and creating custom logs.