Understanding Backup Strategies
Designing a backup strategy means deciding what to back up, how often to do it, where to store it, and how to get it back when needed. In Linux system administration this is not only a technical question but also a planning problem that balances time, storage, and risk.
A backup strategy is a policy, not a single command. Tools such as rsync, tar, or snapshot systems belong in other chapters. Here the focus is on planning and concepts that guide how you use those tools.
A backup strategy is only valid if restores are tested regularly.
If you cannot restore, you do not have a backup.
Goals of a Backup Strategy
Before choosing any method you need to define what you are trying to protect and what kinds of failure you care about. For a Linux system this usually includes your data, your system configuration, and sometimes the entire operating system installation.
Two key ideas are often used to describe backup goals:
- Recovery Point Objective, written as $RPO$
- Recovery Time Objective, written as $RTO$
$RPO$ is about how much data loss is acceptable, measured in time. For example, with a daily backup your $RPO$ is roughly 24 hours: if the system fails just before the next backup runs, you may lose nearly a full day of new changes.
$RTO$ is about how quickly you need to be up and running again after a failure. A backup that takes several hours to restore has a larger $RTO$ than one that lets you switch over in minutes.
You choose backup frequency and style so that:
$$\text{Actual RPO} \le \text{Required RPO}$$
and
$$\text{Actual RTO} \le \text{Required RTO}$$
The more strict your $RPO$ and $RTO$, the more often you must back up and the more carefully you must plan where and how you store backups.
What to Back Up
For many Linux systems not everything needs the same level of protection.
User data is usually the most important. It typically lives in /home, in application data directories under /var, and in databases that store information critical to your use case.
System configuration is next in importance. These are the files that describe how the system is set up, usually in /etc and sometimes in application-specific locations such as /var/lib for some services. Losing configuration makes recovery slower because you must reconfigure your services.
The operating system itself is often less important, especially for servers that can be rebuilt from installation media or configuration management tools. In that case your strategy might skip some system binaries in /usr and focus instead on data and configuration. For desktops or complex machines you might decide to include the whole system so you can restore everything to an earlier state without reinstalling.
A clear backup strategy explicitly names which paths are in scope and which are out of scope. It also defines exclusions, such as caches under /var/cache or temporary files under /tmp, that are not worth backing up.
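As a concrete sketch, that scope can be expressed directly in the backup command. The tar invocation below is illustrative only; the paths in scope, the exclusions, and the destination are assumptions that would come from your own strategy document:

```bash
#!/bin/sh
# Hypothetical scope: back up user data, configuration, and service
# state, while excluding caches and temporary files. All paths are
# examples; replace them with the paths your strategy names.
tar --create --gzip \
    --file="/backup/system-$(date +%F).tar.gz" \
    --exclude='/var/cache/*' \
    --exclude='/tmp/*' \
    /home /etc /var/lib
```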
Full, Incremental, and Differential Backups
Most backup strategies are built from three basic types of backup: full, incremental, and differential. The choice among them strongly affects backup time, restore time, and storage usage.
A full backup is a complete copy of all selected data at the time the backup runs. It is simple to manage and simple to restore. To recover, you restore the most recent full backup and you are done. The downside is that full backups are large and often slow to create, which consumes storage quickly and can exceed your backup window.
An incremental backup stores only the changes that occurred since the last backup of any type. For example, you might perform a full backup on Sunday, and then each day from Monday to Saturday you run an incremental backup that saves only files that changed since the previous day.
To restore from incremental backups, you usually need the last full backup plus every incremental backup taken after that full backup. Recovery may be slower, but the backups themselves are smaller and faster to create.
A differential backup stores the changes since the last full backup only. With a full backup on Sunday, a differential on Monday backs up changes since Sunday. On Tuesday another differential still backs up all changes since Sunday, so the differential grows over the week. To restore, you need the full backup plus only the latest differential.
In practice, strategies often mix these types. For example, you might use weekly full backups with daily incrementals, or weekly full backups with daily differentials. The trade-off is between the higher storage and network usage of full or differential backups and the longer restore chains of incremental backups.
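One way to sketch the weekly-full-plus-daily-incremental pattern with a standard tool is GNU tar's --listed-incremental option, which tracks what has already been archived in a snapshot file. This is a minimal illustration, assuming /home as the scope and /backup as the destination; both are placeholders:

```bash
#!/bin/sh
# Minimal sketch: full backup on Sundays, incrementals otherwise,
# using GNU tar's snapshot file to detect changes. Paths are examples.
SNAP=/backup/home.snar

if [ "$(date +%u)" -eq 7 ]; then
    # Sunday: deleting the snapshot file forces a level-0 (full) backup.
    rm -f "$SNAP"
    tar --create --gzip --listed-incremental="$SNAP" \
        --file="/backup/home-full-$(date +%F).tar.gz" /home
else
    # Monday to Saturday: only files changed since the last run are archived.
    tar --create --gzip --listed-incremental="$SNAP" \
        --file="/backup/home-incr-$(date +%F).tar.gz" /home
fi
```

Restoring from such a chain means extracting the Sunday full archive first and then each incremental in order, matching the restore procedure described above.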
Scheduling and Frequency
Once you know what to back up and which types to use, you need a schedule. Schedules define how often each backup type runs. They are chosen to keep the actual $RPO$ and $RTO$ within acceptable limits while respecting resource usage such as disk space and network bandwidth.
A common pattern is a weekly full backup combined with daily incremental backups. In practice this could mean a full backup every Sunday night, with incrementals every night from Monday to Saturday. The benefit is smaller daily backups yet relatively short restore chains, limited to one week.
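In crontab syntax this schedule might look like the sketch below; the script paths are placeholders for whatever implements the full and incremental jobs:

```bash
# Hypothetical crontab entries: full backup Sunday at 01:30,
# incremental backups Monday through Saturday at the same time.
# m  h  dom mon dow  command
30   1  *   *   0    /usr/local/sbin/backup-full.sh
30   1  *   *   1-6  /usr/local/sbin/backup-incr.sh
```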
Another example is a monthly full backup, weekly differential backups, and daily incrementals. This pattern lowers the cost of full backups but still keeps restore times manageable, at the cost of more complex planning.
For systems that change rarely, lower frequency backups may be enough. For active databases or log servers where data is constantly changing, you may need hourly backups or continuous mechanisms such as streaming database logs, which aim to bring $RPO$ close to zero.
You should also consider when backups run. On busy servers you may schedule them at night or during low activity periods to reduce performance impact. On personal systems you might schedule backups when the machine is usually on but lightly used. Some strategies include a grace period before shutdown or reboot to trigger a quick backup when needed.
Versioning and Retention Policies
Backups often include multiple historical versions of files so that you can recover from problems that started earlier, such as accidental deletion or corruption. The strategy for keeping old backups is called a retention policy.
A simple retention policy might keep the last 7 daily backups and the last 4 weekly backups. A more advanced policy might keep daily backups for one week, weekly backups for a month, and monthly backups for a year.
Retention policies usually try to balance the need for history with the cost of storing many copies. Older backups may be pruned automatically. Some tools perform deduplication, which reduces storage usage by storing identical blocks only once, even across many backup versions.
Many retention schemes follow a pattern often called the grandfather-father-son (GFS) model. In this model, son backups are frequent and short-lived, such as daily incrementals. Father backups are medium-term, such as weekly full or differential backups. Grandfather backups are longer-term, such as monthly full backups that are kept for many months or years. Each layer provides a different timescale of recovery.
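Enforcing such a policy can be as simple as deleting archives that have aged out of their layer. The sketch below assumes the backup job sorts archives into per-layer directories; a dedicated backup tool with built-in pruning is usually safer than hand-rolled deletion:

```bash
#!/bin/sh
# Illustrative GFS pruning: each layer keeps its own retention window.
# The directory layout and the windows are assumptions, not a standard.
find /backup/daily   -name '*.tar.gz' -mtime +7   -delete  # sons: one week
find /backup/weekly  -name '*.tar.gz' -mtime +31  -delete  # fathers: one month
find /backup/monthly -name '*.tar.gz' -mtime +365 -delete  # grandfathers: one year
```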
Your retention rules should be explicit. They should answer the questions: how far back can we restore, how much storage will that require, and when are old backups deleted. In some organizations, retention is also constrained by legal or compliance requirements that dictate how long you must keep certain data.
Local, Offsite, and Offline Backups
A reliable backup strategy does not rely on a single location. You should distinguish between local backups, offsite backups, and offline backups, because each behaves differently when something goes wrong.
Local backups are stored on devices physically close to the original system, such as an external disk connected by USB or another disk inside the same machine. These backups are quick and useful for day-to-day recovery, but they are vulnerable to hardware loss, theft, or local disasters like fire or power surges.
Offsite backups are stored in a different physical location. This can mean another building, a remote server, or a cloud storage service. The key requirement is that an event that destroys the original machine and its local backups should not affect the offsite storage. Many strategies use network transfers, such as rsync over SSH or specialized backup tools, to send backups to remote locations.
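As a sketch, a nightly job might mirror the local backup directory to a remote machine over SSH; the host, user, and paths below are placeholders:

```bash
#!/bin/sh
# Hypothetical offsite copy: push local backups to a remote server
# over SSH. Without --delete, files pruned locally remain offsite,
# which may or may not match your retention policy.
rsync --archive --compress \
    /backup/ "backup@offsite.example.com:/srv/backups/$(hostname)/"
```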
Offline backups are not directly accessible to the system during normal operation. Examples include external drives that are unplugged and stored separately, removable media, or immutable storage that cannot be changed without a deliberate manual step. Offline backups are important for protecting against threats like ransomware or accidental deletion that might affect all online copies.
A resilient strategy often combines these. For example, it might keep daily local backups for quick restores, plus weekly offsite copies for disaster situations, with some of those offsite backups also being offline or immutable to resist tampering.
The 3‑2‑1 Rule Adapted for Linux Systems
A widely used guideline for backup strategies is known as the 3‑2‑1 rule. It can be expressed as a simple formula:
$$\text{Copies of data} \ge 3, \quad \text{Backup media} \ge 2, \quad \text{Offsite copies} \ge 1$$
In words, you keep at least three copies of your data, across at least two different kinds of storage media or locations, with at least one copy stored offsite.
For a Linux system this might mean the primary data on the machine, a local backup on an external disk, and a remote backup on a server in another building or cloud environment. The different media could be an internal disk and a network-attached storage device, or a mixture of spinning hard drives and solid-state drives, or cloud object storage plus on-premises disks.
Following the 3‑2‑1 rule significantly increases resilience against hardware failure, user error, and site-level disasters.
If all copies are in one place or on one type of device, a single incident can destroy them all.
While 3‑2‑1 is a guideline, not a strict law, it helps you evaluate whether a proposed strategy is robust. You can extend the idea for higher-risk environments, for example by adding more offsite copies or using multiple independent cloud providers.
Automating Backups
Manual backups are easy to forget and difficult to perform consistently. A practical backup strategy for Linux relies heavily on automation. Once you have chosen what, when, and where, you must make sure it happens without daily human intervention.
Automation usually starts with scripts or backup tools configured to run on a schedule. On Linux there are standard scheduling mechanisms, such as cron and systemd timers, covered elsewhere in the course. Your strategy should specify that backups are launched automatically by such a scheduler.
Automation also involves logging and notification. Backups should write logs, and the strategy should require regular review of these logs or automatic alerting when backups fail. Silent failures can leave gaps in your backup history that go unnoticed until a restore is needed.
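A thin wrapper script can provide both the log and the alert. The sketch below assumes a working local mail command and an admin address; both are placeholders:

```bash
#!/bin/sh
# Illustrative wrapper: append every run to a log, and alert on
# failure instead of failing silently.
LOG=/var/log/backup.log
{
    echo "=== backup started $(date -Is) ==="
    /usr/local/sbin/backup-incr.sh   # placeholder for the real backup job
    status=$?
    echo "=== backup finished $(date -Is) exit=$status ==="
} >>"$LOG" 2>&1

if [ "$status" -ne 0 ]; then
    mail -s "backup FAILED on $(hostname)" admin@example.com <"$LOG"
fi
```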
In addition, automation should include pruning according to your retention policy. Otherwise your backup storage will eventually fill up and backups will start to fail or stop, which breaks your strategy. Many backup tools support automatic pruning, but the key point is that your plan needs to define how and when older backups are removed.
Finally, automation should be documented. A good backup strategy describes not just the schedule and destination but also the scripts, tools, and configuration files that implement it. This documentation is part of your ability to recover if you lose the original system.
Protecting Backup Integrity and Security
Backups are valuable assets and also sensitive. They contain the same data as the original system, sometimes more, and they may be kept for a long time. A complete backup strategy must address integrity and security.
Integrity means that backups are not silently corrupted. Many tools can compute checksums for backup data and verify them during backups and restores. Your strategy might require periodic verification jobs that scan older backups to detect any bit rot or storage errors. If you use filesystem snapshots or advanced storage, you may also use scrubbing features in those systems, but the core idea is the same: backups must remain readable and correct.
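Checksums make this concrete. As a hedged sketch, you can write a SHA-256 manifest when archives are created and re-check it in a periodic verification job; the directory is an example:

```bash
#!/bin/sh
# Illustrative integrity check for a directory of archives.
cd /backup/daily || exit 1

# At backup time: record a checksum manifest alongside the archives.
sha256sum *.tar.gz > SHA256SUMS

# In a periodic verification job: detect bit rot or storage errors.
sha256sum --check --quiet SHA256SUMS || echo "backup integrity check FAILED" >&2
```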
Security means controlling who can read and modify backups. This often includes encrypting backups, especially when they are stored offsite or in the cloud. Encryption protects data in case the backup storage is lost or stolen. Access controls such as file permissions and network authentication protect against unauthorized access on backup servers.
However, security can conflict with convenience. A realistic strategy must decide how encryption keys or passwords are stored. If the keys live only on the original machine and that machine is destroyed, you may lose the ability to decrypt backups. Many administrators handle this by keeping keys in a secure but separate location and by documenting the recovery steps.
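As an illustrative sketch, an archive can be encrypted with GnuPG before it leaves the machine. The recipient key here is hypothetical, and in line with the advice above, its private half would be stored and documented somewhere other than the machine being backed up:

```bash
#!/bin/sh
# Hypothetical offsite archive, encrypted with a public key so the
# backup server never sees plaintext. The private key is the single
# point of failure: keep it, and its recovery steps, separately.
tar --create --gzip --file=- /home /etc \
    | gpg --encrypt --recipient backup@example.com \
    > "/backup/offsite/system-$(date +%F).tar.gz.gpg"

# Restore side (needs the private key and its passphrase):
# gpg --decrypt system-DATE.tar.gz.gpg | tar --extract --gzip --file=-
```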
You also need to think about how backups interact with system permissions. For example, if you run backups as root, they can capture everything, but the backup system itself becomes a powerful target. If you use non-root accounts, you may miss some files. The strategy should define the chosen approach and its implications.
Testing Restores and Practicing Recovery
The final and most important part of any backup strategy is restore testing. Backups are created for one purpose: restoring data and systems when necessary. A strategy that does not regularly test restores is incomplete.
Restore tests can be simple or complex. A simple test might restore a single directory from a recent backup into a temporary location and compare the contents with expectations. A more comprehensive test might involve restoring a full system into a virtual machine and confirming that services start correctly.
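Even the simple variant can be automated. The sketch below restores /etc from the newest archive produced by a job like the earlier tar example into a scratch directory and compares it with the live system; the archive naming is an assumption, and some drift since the backup ran is normal:

```bash
#!/bin/sh
# Illustrative file-level restore test: extract one directory from the
# most recent archive and diff it against the running system.
LATEST=$(ls -t /backup/system-*.tar.gz | head -n 1)
TMP=$(mktemp -d)

# tar stripped the leading "/" at backup time, so the member is "etc".
tar --extract --gzip --file="$LATEST" --directory="$TMP" etc

if diff -r "$TMP/etc" /etc >/dev/null; then
    echo "restore test OK: restored /etc matches the live system"
else
    echo "restore test: differences found (expected if /etc changed since backup)"
fi
rm -rf "$TMP"
```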
Your strategy should define how often restore tests are done, which parts of the system are tested, and how results are recorded. For example, you might test a file-level restore every month and a full system restore every quarter. In more critical environments, tests may be more frequent.
Testing also reveals gaps in documentation. During a restore you will discover which steps are missing or unclear, such as where to find encryption keys, how to reconfigure network settings on a restored system, or how to reattach a restored database to its applications. These discoveries should feed back into improving both your backup process and your written recovery procedures.
Backups without verified restore procedures provide a false sense of security.
A tested restore plan is as important as the backups themselves.
By combining clear goals, appropriate backup types, thoughtful scheduling, layered storage locations, automation, security, and regular restore practice, you create a backup strategy that can protect Linux systems from many kinds of failure and make recovery a predictable, controlled process instead of a panic.