7.2.2 Log and file recovery

Introduction

Log and file recovery is a central part of Linux forensics and incident response. When something goes wrong, you rarely have a perfect record of what happened. Logs may be rotated, deleted, or tampered with. Files may be removed, partially overwritten, or hidden. This chapter focuses on how to recover and reconstruct logs and files as evidence, without re-explaining the broader incident response process or generic evidence collection, which are covered elsewhere.

Forensic Mindset for Recovery

In log and file recovery, you must balance the need to gain information with the need to preserve evidence. Your goal is to reconstruct past activity as faithfully as possible, while avoiding actions that destroy or alter remaining traces.

You should prefer working on copies of data, not on the live system. Whenever possible, work from a disk image, a copy of log directories, or snapshots. If you must operate live, restrict yourself to read-only access or commands that touch only memory, so that you reduce the impact on the disk.

Always preserve original data and work on verified copies whenever possible. Never perform recovery experiments directly on the only copy of evidence.
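As a minimal sketch, assuming the case workspace lives under /mnt/case (an illustrative path), you might copy the log tree and record a checksum before doing anything else:

    # Copy the log directory into the case workspace (paths are illustrative)
    sudo tar -C / -cpf /mnt/case/varlog.tar var/log
    # Record a checksum so later copies can be verified against this one
    sha256sum /mnt/case/varlog.tar > /mnt/case/varlog.tar.sha256
    sha256sum -c /mnt/case/varlog.tar.sha256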

Understanding What Can Be Recovered

Before attempting recovery, you should understand what is realistic. Deletion in a filesystem usually removes references to the data, not the underlying bytes, until they are overwritten. Logs may be rotated, compressed, moved to backup, or forwarded to remote systems. Recovery often means following these traces.

On a typical Linux system, you may be able to recover:

  1. Locally stored logs that were rotated, compressed, or archived.
  2. Log fragments left in memory dumps, shell histories, or application caches.
  3. Deleted but not yet overwritten files at the filesystem or block level.
  4. Indirect evidence of missing logs or files, such as gaps in sequences or references in other logs.

Your methods will depend heavily on the filesystem, logging system, and any prior backup or snapshot infrastructure.

Reconstructing Logs from Local Sources

The primary place to look for logs is the regular logging path, most commonly /var/log. For forensic recovery you focus not just on present files, but also on older rotations, compressed archives, and any unusual locations.

On systemd-based systems, journald is a key source. The binary journal may contain past entries even when text logs in /var/log are incomplete. Reading older entries with journalctl can expose historical system and service activity.

Logs are commonly rotated with tools such as logrotate. In that case, older logs appear with incremental suffixes, dates, or both, and may be compressed with gzip or xz. You can decompress and review these older copies to fill gaps.
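For example, on a host whose SSH authentication log is rotated by logrotate, the rotations can be listed and searched together; the log name and search string below are illustrative:

    # List the current log and all of its rotations
    ls -l /var/log/auth.log*
    # zgrep transparently handles both plain and gzip-compressed rotations
    zgrep -h 'Accepted publickey' /var/log/auth.log*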

When applications manage their own logs, they may store them in application-specific directories under /var/log, /opt, or even in user home directories. Investigating configuration files in /etc can reveal custom log paths and rotation rules.

Never assume that a missing log file means the events never existed. Always search for rotated, compressed, or moved versions, and confirm log rotation policies in configuration files.

Recovering Logs from systemd Journals

Where systemd is in use, the system journal often preserves logs beyond what you see in plain text files. The journal is usually stored under /var/log/journal or /run/log/journal and holds binary data that journalctl can query by time, unit, or priority.

You can reconstruct timelines by asking for all messages around a suspected incident window. If the current on-disk journal files seem truncated, you can still look for rotated journal files in the same directory. These files may retain older entries that everyday commands no longer show.

If you have acquired an image of a compromised system, you can copy its journal directory to a safe environment and inspect it there. journalctl supports reading from specific journal files, which lets you avoid altering the original media.
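A minimal sketch, assuming the journal directory was copied from an image to /mnt/case/journal and that the time window shown is the suspected incident window:

    # Query a copied journal directory instead of the live system journal
    journalctl --directory=/mnt/case/journal \
        --since "2024-05-01 12:00" --until "2024-05-01 14:00"
    # Individual rotated journal files can also be read directly
    journalctl --file='/mnt/case/journal/system@*.journal'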

Recovering Logs from Remote and Secondary Sources

If local logs are missing or obviously tampered with, remote logging can save an investigation. Many environments forward logs from Linux hosts to a central syslog, SIEM, or log server. These systems often store multiple copies, apply retention policies, and keep integrity checks.

Recovering logs from such systems means correlating hostnames, IP addresses, and timestamps. You may have partial local logs and a complete remote set. By combining them, you can often detect exactly which entries were removed locally and when.
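For instance, assuming the local and remote copies of the same log have been exported to text files, comm can show exactly which lines exist only in the remote set:

    # Both inputs must be sorted for comm to work
    sort local-auth.log  > local.sorted
    sort remote-auth.log > remote.sorted
    # -13 suppresses lines unique to the local copy and lines common to both,
    # leaving only entries that were removed locally
    comm -13 local.sorted remote.sorted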

Some services also emit logs over network protocols or append to databases rather than text files. Web applications, intrusion detection systems, and firewalls may store records in SQL or NoSQL databases. When reconstructing activity, you should compare database-backed logs with local system logs to detect discrepancies.

Detecting Log Tampering and Gaps

Before and during recovery, you must assess whether logs were altered. Common signs include inconsistent time gaps, log files that end abruptly at a suspicious moment, or entries that appear out of chronological order.

When logs are rotated by standard tools, time gaps follow a predictable pattern and filenames match configured formats. Manual deletion or overwriting usually breaks this pattern. For instance, you might find that an older rotation file ends later than the current primary log, or that log ownership and permissions changed unexpectedly.

Other logs can expose tampering. For example, file access and modification times in the filesystem metadata may show recent changes to log files that do not align with their recorded entries. When audit frameworks are active, such as Linux Audit or MAC systems, they may record attempts to alter or delete logs.
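A quick consistency check along these lines, with the log name as an assumption, compares a file's modification time against its last recorded entry:

    # Filesystem modification times for the log and its rotations
    stat -c '%n  mtime=%y' /var/log/auth.log*
    # The final entry should roughly match the mtime of the current log
    tail -n 1 /var/log/auth.log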

Treat sudden gaps, abnormal ownership, or unexpected truncation of log files as strong indicators of tampering, and search for corroborating evidence in other logs and metadata.

Log Recovery from Backups and Snapshots

Backups and snapshots are often the most reliable log recovery sources. If your environment uses filesystem snapshots, such as Btrfs, ZFS, or LVM snapshots, you can mount an older snapshot and extract logs from a past point in time. This can reveal the state of the system before or during an incident.

Traditional backups, whether file-based or image-based, serve a similar role. If logs were removed recently, an older backup may contain complete versions. You can restore only the relevant log directories to a forensic workspace and compare them to the current state.
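As an illustrative sketch, assuming an LVM snapshot volume already exists, you can mount it read-only and compare its log directory with the current one:

    # Mount the snapshot read-only (volume and mount point are assumptions)
    sudo mount -o ro /dev/vg0/snap-before-incident /mnt/snap
    # Show which log files differ, appeared, or disappeared since the snapshot
    diff -rq /mnt/snap/var/log /var/log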

Backups also help evaluate when tampering began. By examining multiple points in time you can pinpoint the first backup that shows missing or altered logs, which helps narrow the incident timeline.

Principles of File Deletion and Recoverability

File recovery depends strongly on how deletion is implemented by the underlying filesystem. On many filesystems, a delete operation marks the file as unreferenced but does not immediately overwrite its content. The data remains on disk until the filesystem allocator reuses those blocks.

In conceptual terms, you can view a file as a mapping from metadata to a set of blocks on disk. Deletion removes the mapping but may leave the blocks untouched. Recovery aims to find those leftover blocks and rebuild the file content.

If we call $B$ the set of blocks formerly used by a file and $M$ its metadata record, then deletion discards $M$ and severs the pair $(M, B)$, but the blocks in $B$ may still exist on disk. Recovery attempts to recreate at least an approximation of this pair.

Over time, newly written files reuse blocks from $B$. Each reuse overwrites part of the deleted content. So the probability that a file can be fully recovered decreases as more data is written after deletion.

The more write activity occurs on a device after deletion, the less likely it is that you can recover deleted files. Avoid mounting suspect filesystems read-write if you plan to attempt recovery.
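One way to enforce this, sketched here for an ext4 volume on an assumed device node, is to mark the device read-only at the block layer and skip journal replay, which would otherwise write to the disk:

    # Refuse writes at the block layer (device name is an assumption)
    sudo blockdev --setro /dev/sdb1
    # ro prevents writes; noload skips ext3/ext4 journal replay at mount time
    sudo mount -o ro,noload /dev/sdb1 /mnt/evidence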

Recovering Files from the Filesystem Layer

Many filesystems provide tools or characteristics that help with file recovery. For some, undelete tools exist; others leave traces of deleted directory entries and inodes in their metadata structures. While details differ across filesystems, the general approach is similar.

You begin by working on a copy of the storage. For block devices, that means acquiring a bit-level image and protecting the original. Then you use forensic tools that understand the filesystem to examine unallocated space, directory records, and journal segments where past metadata operations are stored.
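A common acquisition sketch, with device and paths as assumptions, images the device and fixes its checksum immediately:

    # Bit-level image of the suspect device (bs and paths are illustrative)
    sudo dd if=/dev/sdb of=/mnt/case/disk.img bs=1M conv=noerror,sync status=progress
    # Checksum the image so any later modification is detectable
    sha256sum /mnt/case/disk.img > /mnt/case/disk.img.sha256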

Certain filesystems include an internal journal recording changes to metadata and sometimes data. Even if a file was deleted, references to its name, location, or content may remain in the journal until it is overwritten. By parsing that journal, you can recover file names and potentially content fragments.
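For example, with The Sleuth Kit installed, deleted directory entries and their content can often be pulled from an image; the inode number below is hypothetical:

    # Recursively list deleted entries, with their inode numbers
    fls -rd /mnt/case/disk.img
    # Extract the content still referenced by a recovered inode
    icat /mnt/case/disk.img 49162 > /mnt/case/recovered_file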

On systems with snapshot-capable filesystems, snapshots are effectively frozen views of past filesystem states. Even if a user deleted a file in the current view, an older snapshot may still contain it intact. Mounting the snapshot read-only lets you copy the file without traditional undelete operations.

Raw Data Recovery at the Block Level

When filesystem-aware methods fail or when metadata is severely damaged, you can attempt raw recovery at the block level. This technique treats the storage as a flat array of bytes and searches for recognizable patterns, often called file carving.

File carving tools scan an image looking for signatures at the start or end of common file types. For instance, JPEG files have characteristic headers and footers. Once a tool finds a header, it follows the data until it encounters an expected ending sequence or reaches inconsistent data. The recovered content is then written to a separate file.

This approach does not depend on filesystem metadata. It works even on partially damaged or unmounted filesystems, as long as the raw bytes exist. However, you lose original file names, directory paths, and exact modification times. The recovered files typically receive generic names based on offsets or sequence.

You can make carving more effective by restricting the search to relevant regions of the image or to the specific file types that matter for your investigation. For example, if you are interested in document exfiltration, you may focus on office document signatures rather than all possible formats.
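A carving sketch using foremost, assuming it is installed and that office documents and PDFs are the formats of interest:

    # Carve only selected file types from the image into an output directory
    foremost -t doc,pdf -i /mnt/case/disk.img -o /mnt/case/carved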

Recovering Partially Overwritten Files

Sometimes only part of a file has been overwritten. In these cases, you may be able to recover fragments that are still meaningful. For instance, a partially overwritten log file might still preserve entries from the beginning, especially if the truncation was implemented by rewriting from the start without zeroing the rest.

If you can recover contiguous segments, you can often use context to interpret gaps. For logs, timestamps and sequence numbers help interpolate missing ranges. For structured files, such as databases or compressed archives, you may be able to extract intact records even when the file is technically corrupt.
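As one concrete approach, assuming The Sleuth Kit is available, you can extract unallocated blocks and search them for syslog-style timestamps; the regular expression is a rough illustration:

    # Dump unallocated blocks from the image
    blkls /mnt/case/disk.img > /mnt/case/unalloc.raw
    # Keep printable fragments that look like classic syslog lines
    strings -a /mnt/case/unalloc.raw \
        | grep -E '^[A-Z][a-z]{2} +[0-9]+ [0-9]{2}:[0-9]{2}:[0-9]{2}' \
        > /mnt/case/log-fragments.txt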

In some scenarios, application-specific tools can salvage data from damaged files. For example, some database engines have built-in recovery utilities that scan for valid pages or records. Combining these utilities with low-level recovery gives you a better reconstruction.

Recovering Files from Caches and Temporary Areas

Many applications use temporary files, caches, and autosave mechanisms. Even if a user deletes a primary file, these associated copies might still exist. Typical locations include /tmp, user-specific cache directories under $HOME, browser profiles, and office suite backup files.

Temporary directories are usually cleared at boot or periodically, but not always completely. If a system has not rebooted, you may find open temporary files that still contain critical content. Even when cleaned, unallocated space on the filesystem may retain remnants of cached data for some time.
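On a live system, deleted-but-open files can be enumerated with lsof, and their content is often still reachable through the owning process; the PID and descriptor below are hypothetical:

    # Files whose link count is zero but which a process still holds open
    sudo lsof +L1
    # Copy such content out via the process's file descriptor entry
    sudo cp /proc/1234/fd/5 /mnt/case/recovered_tmpfile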

You should inspect configuration files and documentation for the applications involved in an incident. They often specify cache locations, naming conventions, and retention rules. Understanding these allows you to search systematically instead of scanning blindly.

Log and File Reconstruction from Multiple Sources

Single-source recovery rarely gives a full picture. You often need to combine partial logs, recovered file fragments, backups, and independent monitoring data. Reconstruction is an analytic process that aligns all these pieces along a common timeline.

To align events, you use timestamps, sequence IDs, and shared identifiers such as process IDs, session IDs, or IP addresses. When you have overlapping coverage, you can fill gaps in one source by referencing another. For example, a recovered proxy log might reveal external connections that no longer appear in the host firewall logs.
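A minimal merging sketch, assuming each source has already been normalized so lines start with ISO 8601 timestamps (which sort correctly as plain text):

    # Tag each line with its origin, then merge by timestamp
    sed 's/$/ [journal]/' journal.txt  > tagged.txt
    sed 's/$/ [proxy]/'   proxy.txt   >> tagged.txt
    sort tagged.txt > timeline.txt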

It is common to visualize the reconstructed timeline, even informally. Mark the origin and confidence level of each event, whether it comes from a live log, a rotated archive, a recovered fragment, or a secondary system. This helps communicate the strength of your conclusions and where uncertainty remains.

Always distinguish between directly observed events and inferred events when reconstructing timelines. Do not present interpolated activity as certain evidence.

Tracking and Documenting Chain of Custody

During recovery, you generate new artifacts such as images, extracted files, and processed copies. For forensic validity, you must track these artifacts carefully. Each time you create or transform data, record how, when, and by whom it happened.

Even for internal incident response, this discipline is valuable. Clear documentation allows later reviewers to understand which data is original, which is derived, and which tools were used. It also aids reproducibility if you or others need to rerun analyses.

While detailed chain-of-custody procedures belong to a broader incident response context, for log and file recovery you should, at minimum, keep checksums for key evidence files and note the commands used to create and process them.
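In practice this can be as lightweight as checksumming each artifact as it is produced and appending a dated note to a case log; the file names here are assumptions:

    # Checksum evidence files and keep the result with the case record
    sha256sum disk.img varlog.tar | tee -a /mnt/case/evidence.sha256
    # Note what was done, when, and by whom
    echo "$(date -u +%FT%TZ) imaged /dev/sdb with dd (analyst: JD)" >> /mnt/case/case-notes.txt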

Limitations and When Recovery Is Not Possible

Not all logs or files can be recovered. Secure deletion methods intentionally overwrite data multiple times or encrypt and discard keys so that recovery is mathematically infeasible. If a filesystem has been in heavy use for a long period after deletion, the underlying blocks are likely overwritten.

Some storage technologies, such as modern SSDs, employ wear leveling and internal garbage collection. These mechanisms can remove or remap data outside the view of the operating system, which reduces the chances of low-level recovery.

You must recognize these limits and state them clearly in your findings. Overpromising or assuming that recovery is always possible leads to unrealistic expectations and can damage the credibility of your analysis.

Integrating Recovery into Incident Response

Log and file recovery does not stand alone. Its value appears when integrated with host analysis, network forensics, and broader incident timelines. As you recover information, you should immediately feed it back into your investigative hypotheses.

Newly recovered logs might reveal previously unknown accounts, IP addresses, or tools. Recovered files may show exfiltrated data, malware, or attacker scripts. These discoveries then guide further searches across other systems and datasets.

By viewing recovery as iterative rather than a single step, you can refine your efforts, focusing on the most promising sources as new evidence emerges. This approach makes log and file recovery a powerful force multiplier for the entire incident response process.
