Understanding Evidence in Linux Incidents
In a Linux incident, collecting evidence is the act of preserving information that can tell you what happened, how it happened, and who or what was responsible. The goal is not only to fix the system, but also to keep a trustworthy record that can be analyzed later, shared with others, and potentially used in formal investigations.
This chapter focuses on what is unique about evidence collection on Linux systems. Analysis, response strategy, and long-term remediation are covered elsewhere. Here you learn what to collect, how to preserve it, and how to avoid destroying useful traces while you work.
Evidence collection must always aim to preserve original data in an unmodified state, and clearly separate raw evidence from your analysis.
Principles of Evidence Preservation
Before touching a compromised Linux system, you need a mental checklist of what you must not do. Simple actions, like logging in or running familiar commands, can overwrite key evidence. Evidence collection on Linux is constrained by how quickly system state changes, especially in memory and in log files.
The most important principle is to minimize changes to the system while you gather data. Every action you perform creates new processes, writes new logs, and modifies access times on files. This does not mean you must be completely passive, but it means you should think about the cost of each command. For example, an interactive text editor might alter configuration file timestamps, while a read-only viewing tool has less impact.
On running systems, volatile evidence in RAM, network connections, and running processes will disappear at shutdown or reboot. Non-volatile evidence in filesystems, disks, and logs can persist longer, but can still be rotated, overwritten, or deleted. Therefore you generally try to collect volatile evidence first, then less volatile data, while keeping your tools as minimally invasive as you can.
If the system is extremely critical, or suspected of hosting dangerous malware, it can make sense to isolate it from the network instead of shutting it down. Disconnecting network interfaces or applying firewall rules on surrounding devices can stop further damage but keep the live system state available for collection.
Chain of Custody and Documentation
Evidence is less useful if you cannot prove where it came from and how it was handled. Even if you are not involved in a legal case, good documentation makes your own analysis more reliable and repeatable. On Linux, this usually means simple, consistent records and clear separation between copies of data and your working files.
You should record who is performing the collection, where they are collecting from, and when each step occurs. A simple log file or text document on a trusted workstation is enough for many environments. For each collected artifact, you want to note the hostname, IP addresses, time, and the tool and command used to gather it.
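As a minimal sketch of such a record, the shell itself can append one line per action to a collection log. The filename, field layout, and the helper function name are illustrative, not a required format:

```shell
# Append one collection-log record per action taken during the incident.
# LOGFILE and the field layout are illustrative, not a required format.
LOGFILE="collection-log.txt"

log_action() {
    # $1 = free-text description of the command or artifact collected
    printf '%s | host=%s | user=%s | action=%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        "$(uname -n)" \
        "$(id -un)" \
        "$1" >> "$LOGFILE"
}

log_action "captured process list with ps auxww > ps-output.txt"
log_action "copied /var/log/auth.log to external media"
```

Because each line carries a UTC timestamp, hostname, and operator name, the log can later be correlated against system logs and the artifacts themselves.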
Often you will create cryptographic hashes of collected files or images. A hash like SHA-256 gives a fixed-length fingerprint of a file. If the hash remains the same every time you recompute it, you have strong evidence that the file was not altered.
For a file $F$, once you compute its hash $H = \text{SHA256}(F)$, you must store $H$ in a trusted location. Any later change to $F$ will produce a different $H$.
These hashes are created using Linux tools such as sha256sum and should be saved along with timestamps and filenames. This process does not alter the evidence itself but gives you a reference that can be used to confirm integrity later.
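The whole cycle of recording and later verifying a hash fits in a few commands. The artifact name below is a placeholder created on the spot so the sketch is runnable; in practice it would be a real collected file:

```shell
# Record the SHA-256 fingerprint of an evidence file, then verify it later.
printf 'sample evidence data\n' > evidence.img    # placeholder artifact

sha256sum evidence.img | tee evidence.img.sha256  # record the fingerprint

# Any later verification re-reads the file and compares fingerprints.
sha256sum -c evidence.img.sha256                  # prints "evidence.img: OK" if unchanged
```

The `.sha256` file belongs with your collection log on trusted storage, not on the compromised host, so that an attacker cannot alter the evidence and its fingerprint together.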
Volatile vs Non-Volatile Evidence on Linux
Linux systems hold different kinds of evidence that change at different speeds. It is useful to categorize them so you can decide what to grab first during an incident.
Volatile evidence resides mostly in RAM and in current runtime structures. On Linux, this includes the list of running processes, open network connections, contents of /proc, in-memory logs and buffers, and process memory. If the system loses power, or you reboot it, these pieces vanish or change dramatically. Therefore volatile data is collected as soon as possible on a live system.
Non-volatile evidence lives on persistent storage such as disks, solid state drives, and removable media. On Linux, this includes filesystem contents, log files under /var/log, configuration files under /etc, user data in /home, and the on-disk structures of the filesystem itself. Non-volatile data is more durable but still not permanent, because logs can rotate and attackers can modify or delete data.
The order of collection usually starts with volatile data on a live system, then moves to disk images, filesystem copies, and specific log files. If you do not have the ability to capture memory, you should still record as much runtime state as you can with commands that display processes, open files, and connections, while recognizing that those commands also leave a footprint.
Capturing Live System State
On Linux, the first layer of evidence from a running system is its immediate state. This includes which processes are running, how they were started, and what resources they are using. Many familiar administrative tools such as ps, top, or ss become evidence collection tools when their output is redirected into files and stored externally.
To reduce your impact, you typically collect this state using simple shell pipelines that send output to text files, then copy those files off the system. For example, commands that list all processes, running services, and kernel messages can give you a snapshot that you cannot reconstruct after the fact. When collecting such data, you should also include timestamps and hostname information in the file contents or filenames.
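A minimal snapshot script along these lines might look as follows. The output directory name is illustrative, and the `|| true` guards acknowledge that some commands may be restricted or absent on a given host:

```shell
# Snapshot volatile state into per-command files for off-host storage.
# The directory name embeds host and UTC time so snapshots sort cleanly.
OUT="state-$(uname -n)-$(date -u +%Y%m%dT%H%M%SZ)"
mkdir -p "$OUT"

date -u > "$OUT/date.txt"                                # when the snapshot was taken
uname -a > "$OUT/uname.txt"                              # kernel and architecture
ps auxww > "$OUT/ps.txt" 2>/dev/null || true             # full process list
uptime   > "$OUT/uptime.txt" 2>/dev/null || true         # load and boot context
dmesg 2>/dev/null > "$OUT/dmesg.txt" || true             # kernel ring buffer (may need root)

ls "$OUT"                                                # confirm what was captured
```

Copying `$OUT` to external media and hashing its contents immediately afterward preserves the snapshot with the rest of your evidence.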
The /proc filesystem is especially important in Linux for live state evidence. It provides pseudo files that represent current kernel and process information. Examining entries in /proc/<pid>/ can reveal command line arguments, environment variables, and network sockets of specific processes at the time you access them. However, /proc reflects the present, not the past, so any delay between detection and collection can cause important processes to exit and disappear.
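To make the /proc entries concrete, the sketch below captures the most useful pseudo files for one process. It targets the current shell's own PID purely as a safe demonstration; during an incident you would substitute the PID of the suspicious process, and reading another user's entries requires root:

```shell
# Capture per-process details from /proc for a PID of interest.
# Using the shell's own PID ($$) here only so the sketch is safe to run.
PID=$$
OUT="proc-capture"
mkdir -p "$OUT"

# cmdline and environ are NUL-separated on disk; translate for readability.
tr '\0' ' '  < "/proc/$PID/cmdline" > "$OUT/cmdline.txt"
tr '\0' '\n' < "/proc/$PID/environ" > "$OUT/environ.txt" 2>/dev/null || true

readlink "/proc/$PID/exe" > "$OUT/exe.txt" 2>/dev/null || true  # executable path
ls -l "/proc/$PID/fd"     > "$OUT/fds.txt" 2>/dev/null || true  # open file descriptors
ls -l "/proc/$PID/cwd"    > "$OUT/cwd.txt" 2>/dev/null || true  # working directory
```

Because these entries vanish when the process exits, capturing them early matters more than capturing them elegantly.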
Timing is critical. If you suspect that malicious activity is happening at that moment, you must prioritize commands that reveal process lists, listening ports, and current user sessions, even if you plan to return later for more detailed examination. Keeping your command history and terminal output can itself become part of the evidence, since it shows exactly what you observed and when.
Memory Acquisition on Linux
Capturing RAM content is one of the most powerful forms of evidence collection, but also among the most invasive and technically demanding. On Linux, memory acquisition often requires loading specialized kernel modules or using pre-installed tools that can access physical memory. Because of this, you should plan ahead and have approved tools ready in your environment before an incident.
A full memory image can contain running processes, encryption keys, passwords, in-memory logs, and remnants of data that never touches disk. At the same time, the act of capturing memory can modify the very state you are trying to preserve, since any tool that reads memory must run code on the system. In practice, you accept a controlled amount of change in order to freeze the majority of the memory content.
Different distributions and kernel versions restrict or remove direct access to memory devices such as /dev/mem and /dev/kmem, especially on 64-bit systems with modern protections. That is why memory acquisition usually relies on dedicated kernel drivers or loaders that provide this access in a supported way. You must verify that the tool versions match your kernel and architecture to avoid crashes or incomplete captures.
When you obtain a memory dump, you treat it like any other evidence file. You compute strong hashes, store them securely, and only work on copies. Because memory images are large, you should consider storage capacity on your collection system and on any external media you use. Moving these images across networks can create privacy and security concerns, so secure transfer methods and encryption are important in real deployments.
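Because memory images are large, a common handling pattern is to hash the full image, split it into chunks that fit on the available media, and hash each chunk as well. The sketch below creates a small stand-in file so it is safe to run; a real dump would come from your acquisition tool:

```shell
# Split a large capture into chunks and fingerprint everything.
# memdump.raw is a small stand-in created here so the sketch is runnable.
head -c 262144 /dev/urandom > memdump.raw           # stand-in memory image

sha256sum memdump.raw > memdump.raw.sha256          # hash the whole image first
split -b 65536 -d memdump.raw memdump.part.         # 64 KiB chunks: memdump.part.00 ...
sha256sum memdump.part.* > memdump.parts.sha256     # per-chunk fingerprints

# Reassembly must reproduce the original fingerprint exactly.
cat memdump.part.* | sha256sum
```

The per-chunk hashes let you verify each piece of media independently, while the whole-image hash proves that reassembly was faithful.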
Network Related Evidence
During the early stages of an incident, network activity often provides the clearest indicators of malicious behavior. On a Linux host, you have two main types of network evidence: snapshots of current connections on the host, and traffic captures that record packets over time.
Current connection information is available through commands and pseudo files that show open sockets and listening ports. These outputs can be redirected to files to preserve a moment in time. Combined with process listings, they allow you to tie a specific PID and executable to a connection, which is critical when you investigate outbound traffic to suspicious addresses.
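The socket tables behind those commands live in /proc, which means you can snapshot them even when ss or netstat are unavailable or untrusted. The directory name below is illustrative; the ss line is included only where the binary exists:

```shell
# Preserve a moment-in-time view of sockets straight from /proc.
OUT="net-snapshot"
mkdir -p "$OUT"
date -u > "$OUT/when.txt"                                 # timestamp the snapshot

cat /proc/net/tcp  > "$OUT/tcp.txt"                       # IPv4 TCP socket table (hex addresses)
cat /proc/net/udp  > "$OUT/udp.txt"                       # IPv4 UDP socket table
cat /proc/net/tcp6 > "$OUT/tcp6.txt" 2>/dev/null || true  # IPv6, if enabled

ss -tunap > "$OUT/ss.txt" 2>/dev/null || true             # friendlier view with PIDs, if ss exists
```

The raw /proc tables encode addresses in hexadecimal, so the ss output is easier to read when available, but the raw files have the advantage of not depending on binaries the attacker may have replaced.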
Traffic captures are more detailed, but they must be started before or during the malicious activity to be useful. On a Linux system, this usually involves capturing at the interface level and writing packets to a file in formats that analysis tools understand. Because full packet capture can create large files quickly, and may include sensitive user data, organizations often restrict how and when it can be used. For forensics, it is essential to document capture start and end times and the interfaces or filters used.
Network devices outside the Linux host, such as firewalls and routers, also hold evidence, but accessing those is outside the scope of the host-specific tasks in this chapter. From the perspective of a single Linux machine, what matters most is that you preserve how the system was communicating at the time of compromise, through connection tables, routing information, and packet traces if available.
Disk and Filesystem Imaging
Disk imaging is a core technique in Linux evidence collection. It involves creating a complete, bit-for-bit copy of a physical device or a partition, so that you can analyze it later without touching the original. Because Linux exposes devices as files under /dev, you can use standard tools to perform this copying at a low level.
There is a crucial difference between copying files from a mounted filesystem and imaging the underlying block device. A file level copy operates through the filesystem layer, and may skip deleted or hidden data. A block level image includes used and unused space, which makes it possible to recover deleted files and inspect metadata that ordinary file copies do not preserve.
Creating a disk image usually requires that the device be unmounted to avoid changes during the copy. In some scenarios, especially on live servers, it may not be feasible to unmount or shut down, and then you must weigh the value of a live image against the risk that data changes during acquisition. Forensic practice prefers images from devices that are not being written to, but real incidents sometimes force compromises.
On Linux, you should always store the image on a separate device and avoid writing back to the original. Once you create the image, you compute hashes for the complete file to verify integrity. Later analysis is performed against read-only copies or filesystems mounted with the ro option so that the image stays unchanged. If you use specialist forensic image formats that support metadata and compression, you still follow the same principle of hashing and preserving the original file or set of files.
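A basic image-and-verify cycle with dd can be sketched as follows. A regular file stands in for the block device so the example is safe to run; in a real acquisition the input would be something like /dev/sdb and the output must live on a separate device:

```shell
# Image a source device and verify the copy by hash.
# A regular file stands in for a real block device here.
head -c 1048576 /dev/urandom > suspect-device.bin   # stand-in "device"

# conv=noerror keeps going past read errors on a failing disk;
# conv=sync pads short reads so offsets stay aligned.
dd if=suspect-device.bin of=evidence.img bs=64K conv=sync,noerror status=none

sha256sum suspect-device.bin evidence.img           # the two fingerprints must match
```

On a real device you would hash the device node itself (for example `sha256sum /dev/sdb`) before and after imaging, which only works reliably when nothing is writing to it.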
Collecting Filesystem and Log Evidence
Beyond full disk images, there are specific directories and files on Linux systems that are particularly rich sources of evidence. Rather than exploring them deeply at this stage, you focus on capturing them intact so that later analysis can proceed without needing to return to the compromised host.
Log files in /var/log are central to this effort. Depending on the distribution and configuration, you may find system logs, authentication logs, service-specific logs, and rotated archives in compressed formats. If the volume of logs is large, you might choose to copy only a time window around the suspected incident, but when possible, preserving the entire log directory is safer, especially if attackers have already tried to erase traces selectively.
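One way to preserve a log directory with its attributes intact is a tar archive that keeps permissions and numeric ownership, hashed immediately after creation. A sample directory stands in for /var/log so the sketch is runnable; on a real host you would run this as root and write the archive to external media:

```shell
# Archive a log directory while preserving permissions, ownership,
# and timestamps. SRC is a sample stand-in for /var/log.
SRC="sample-logs"
mkdir -p "$SRC"
printf 'Jan  1 00:00:01 host sshd[1]: test entry\n' > "$SRC/auth.log"

# -p keeps permissions, --numeric-owner records raw UIDs/GIDs so the
# archive is not reinterpreted through the analysis host's passwd file.
tar --numeric-owner -cpzf logs-archive.tar.gz "$SRC"

sha256sum logs-archive.tar.gz > logs-archive.sha256   # fingerprint immediately
```

In practice the archive name should embed the hostname and date, following the same naming discipline as the rest of your evidence.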
Configuration files under /etc reveal how services were set up at the time of compromise. Copying these files, including permissions and ownership, helps reconstruct the environment exactly as it was. For example, cron jobs, web server configurations, and SSH settings often live there and can show how an attacker gained or maintained access.
User data directories under /home are also important, since attackers might drop scripts, backdoors, or tools inside user-owned paths. Capturing entire user home directories can be space intensive, so you may prioritize specific users associated with suspicious processes or authentication events. You should preserve not only file contents, but also metadata such as timestamps and permission bits, because these attributes can tell you when and how a file was created or modified.
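Recording that metadata before copying is cheap insurance, since a naive copy can silently change timestamps or drop ownership. The sketch below uses GNU find to write one line of attributes per file; the target directory is a sample created on the spot:

```shell
# Record mode, owner, group, size, and mtime for every file before
# copying. TARGET is a sample directory standing in for a user home.
TARGET="sample-home"
mkdir -p "$TARGET"
printf '#!/bin/sh\n' > "$TARGET/suspicious.sh"

# One line per file: mode, owner, group, size, mtime (epoch seconds), path.
find "$TARGET" -printf '%m %u %g %s %T@ %p\n' > metadata.txt

cat metadata.txt
```

This listing travels with the copied files, so that even if an analysis tool later disturbs an attribute, the original values remain on record.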
Working from a Trusted Environment
One of the biggest challenges in Linux evidence collection is the possibility that the compromised system itself cannot be trusted. Attackers may have modified binaries, libraries, or even the kernel, which makes any command you run on that system less reliable. To mitigate this, you often collect evidence either from a separate, trusted host, or by booting the machine into an alternate environment that you control.
A trusted environment can be a dedicated forensic workstation or a live Linux system on removable media. When you boot a machine from a read-only live image, you bypass the possibly compromised operating system on its disks and gain more confidence in your tools. From this environment, you can mount the suspect disks read-only and create images or copies without executing code from the compromised installation.
Remote collection from another system over the network introduces its own risks, but allows you to reduce your dependence on local binaries. You must secure the transport channel and take into account that your connections may still alter logs and process state on the remote host. In either case, you should aim to use tools that you have validated ahead of time, and avoid installing new software through untrusted package sources on the compromised host.
On modern Linux systems, it is common to use small static binaries for crucial tasks, stored on write-protected media. This approach reduces the chance that shared libraries or environment variables on the suspect system can interfere with your evidence collection. By controlling both the tools and the environment in which they run, you improve the reliability of the data you gather.
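Before relying on such a toolkit, you can verify it against a hash manifest created when the toolkit was built. The paths below are sample files standing in for real static binaries, so the sketch is runnable anywhere:

```shell
# Verify a response toolkit against a manifest made at build time.
# toolkit/ps-static is a placeholder standing in for a static binary.
mkdir -p toolkit
printf 'pretend static binary\n' > toolkit/ps-static

# At toolkit build time: record known-good fingerprints.
sha256sum toolkit/ps-static > toolkit/MANIFEST.sha256

# At incident time: refuse to proceed unless every tool still matches.
sha256sum -c toolkit/MANIFEST.sha256 || echo "toolkit integrity check FAILED"
```

Keeping the manifest on separate trusted storage, rather than on the same media as the tools, makes the check harder to defeat.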
Maintaining an Evidence Repository
Once evidence leaves the compromised Linux system, it needs a controlled home. An evidence repository is not just a file share, but a structured space where images, logs, memory captures, and documentation are stored, cataloged, and protected. This is especially important in multi-incident environments, where many cases share the same storage and tools.
Every file brought into the repository should be accompanied by its recorded hashes, timestamps, and origin. Directory names, filenames, and internal tracking documents should include case identifiers, system names, and relevant dates. This straightforward organization enables future analysts to locate the right artifacts without ambiguity and reduces the risk of mixing data from different investigations.
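One simple layout that satisfies these requirements keeps the case identifier and hostname in every path, with a per-case hash manifest at the top. The case ID, hostname, and directory names below are illustrative:

```shell
# Lay out one case's evidence with identifiers in every path,
# plus a per-case manifest. All names here are illustrative.
CASE="2024-0042"
HOST="web01"
mkdir -p "cases/$CASE/$HOST/images" "cases/$CASE/$HOST/logs" \
         "cases/$CASE/$HOST/memory" "cases/$CASE/$HOST/notes"

printf 'placeholder log evidence\n' > "cases/$CASE/$HOST/logs/auth.log"
sha256sum "cases/$CASE/$HOST/logs/auth.log" > "cases/$CASE/MANIFEST.sha256"

find "cases/$CASE" -type f | sort     # review what the case now contains
```

Re-running `sha256sum -c` against the manifest at any later date confirms that nothing in the case directory has drifted.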
Access control is also part of evidence handling. On a Linux based repository server, you manage permissions and group memberships so that only authorized people can read or modify the collected artifacts. Backups of the repository must preserve both the data and its integrity information, and you must be careful that backup tools do not alter metadata in a way that breaks the chain of custody.
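At the filesystem level, this mostly means conservative modes on case directories and their contents. The group name and paths below are illustrative, and the chgrp line is commented out because it needs root and an existing group:

```shell
# Restrict a case directory so evidence cannot be casually modified.
# The "forensics" group and all paths are illustrative.
mkdir -p repo/case-2024-0042
printf 'evidence placeholder\n' > repo/case-2024-0042/evidence.txt

chmod 0750 repo/case-2024-0042                 # owner full access, group read/traverse
chmod 0440 repo/case-2024-0042/evidence.txt    # read-only, even for the owner
# chgrp -R forensics repo/case-2024-0042       # would assign the analyst group (needs root)

ls -l repo/case-2024-0042
```

Making evidence files read-only for everyone, including the owner, turns accidental modification into a deliberate two-step act that is easy to notice and log.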
Even if you work alone, keeping a disciplined repository helps you return to old incidents and re-analyze them with new tools or knowledge. Evidence that is properly stored, verified, and documented remains valuable long after the incident is closed, and can inform detection rules, hardening measures, and response playbooks in the future.