
5.2.3 Database backup strategies

Understanding Database Backup Strategies

Database backup strategies for Linux servers must balance data safety, performance, storage cost, and recovery speed. In production you do not just “take a backup”; you design a process that guarantees you can restore the exact data you need, within a known time, with minimal data loss.

This chapter focuses on strategy and planning. Concrete commands and database specific tools belong to other chapters about particular database systems.

Backup Objectives and Constraints

Every backup strategy starts from two key objectives: how much data you can afford to lose, and how long you can afford to be down.

Recovery Point Objective, usually written as RPO, defines how much data loss is acceptable, measured as time. If your RPO is 15 minutes, your strategy must ensure that at most 15 minutes of data changes can be lost in a disaster.

Recovery Time Objective, usually written as RTO, defines how long it may take to restore service after failure. If your RTO is 1 hour, your backup plan must allow you to go from failure to a working system within 60 minutes.

RPO answers “How far back in time can we afford to go?”
RTO answers “How long can we afford to stay down?”

In practice you translate business expectations like “we cannot lose more than a few minutes of orders” or “we can tolerate half a day of outage” into explicit RPO and RTO numbers. These numbers guide how frequently you back up, which methods you use, and where you store backups.
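To make that translation concrete, here is a minimal sketch, with assumed objectives and schedule values, that checks whether a planned backup schedule is even consistent with the stated RPO and RTO:

```python
from datetime import timedelta

# Stated business objectives (assumed values for illustration).
rpo = timedelta(minutes=15)        # maximum tolerable data loss
rto = timedelta(hours=1)           # maximum tolerable downtime

# Planned schedule: how often changes are captured, plus the restore
# duration actually measured in a restore test.
log_archive_interval = timedelta(minutes=5)
measured_restore_time = timedelta(minutes=40)

# Worst-case data loss is roughly one archiving interval.
assert log_archive_interval <= rpo, "schedule cannot meet the RPO"
# The measured restore must fit inside the RTO, ideally with headroom.
assert measured_restore_time <= rto, "restore is too slow for the RTO"
print("planned schedule is consistent with the stated RPO and RTO")
```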

You must also consider constraints such as database size, write rate, hardware capacity, maintenance windows, bandwidth if you use offsite storage, and legal or compliance requirements.

Types of Database Backups

At a conceptual level, database backups usually fall into three main categories. The tools and commands differ by database engine, but the roles these types play in your strategy are consistent.

A full backup is a complete copy of the database at a point in time. You use full backups as a base for any recovery plan. They are simple to reason about but can be large and slow to create and restore for big databases.

An incremental backup contains only changes since a previous backup. The reference point is usually the last backup of any type. Incrementals are smaller and quicker to create, but you must apply a chain of backups to restore, which can increase recovery time and complexity.

A differential backup contains all changes since the last full backup. Each new differential grows larger until the next full backup. Recovery usually needs only the latest full backup and the latest differential. This reduces recovery complexity compared to long incremental chains, at the cost of larger periodic differentials.

In transactional databases you also have log based backups. The database writes all changes to a transaction log or write ahead log. By archiving these logs, you can replay changes on top of a base backup. This enables point in time recovery and supports very small RPOs, often down to seconds.

A typical strategy uses a full backup as a base, then either differential or incremental backups, optionally combined with continuous transaction log archiving for point in time recovery.
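The difference in restore complexity can be made explicit in code. Here is a sketch, using a hypothetical backup catalog, that computes which backups must be applied to reach the newest state; incrementals chain off the previous backup of any type, while a differential needs only the last full beneath it:

```python
# Hypothetical catalog, ordered oldest first: (label, type).
catalog = [
    ("Mon", "full"),
    ("Tue", "incremental"),
    ("Wed", "incremental"),
    ("Thu", "differential"),   # everything since Monday's full
    ("Fri", "incremental"),    # changes since Thursday's differential
]

def restore_chain(catalog):
    """Return the backups needed to restore the newest state."""
    chain = []
    covered_by_differential = False
    for when, kind in reversed(catalog):
        if covered_by_differential:
            if kind == "full":
                chain.append((when, kind))
                break              # the base the differential relies on
            continue               # already contained in the differential
        chain.append((when, kind))
        if kind == "full":
            break                  # a full backup is a complete base
        if kind == "differential":
            covered_by_differential = True
    return list(reversed(chain))

print(restore_chain(catalog))
# [('Mon', 'full'), ('Thu', 'differential'), ('Fri', 'incremental')]
```

With only incrementals, the chain would instead include every backup back to the last full, which is exactly the recovery complexity described above.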

With file level snapshots and filesystem level backups, you copy the database files directly using filesystem tools. This approach is only valid when the copy is taken with the database in a consistent, safe state, which often requires snapshot integration or specific database options.

Logical vs Physical Backups

Logical backups export the database contents as SQL statements or another logical format, independent of the database’s on disk layout. Tools that dump schema and data to text or portable formats are logical. These backups are often smaller for sparse or highly compressible data, and they can usually be restored on a different version of the same database engine or on a different system. They are also easy to inspect and migrate. However, they can be slow to create and even slower to restore, especially for large, write heavy databases.

Physical backups copy the actual data files, indexes, and sometimes transaction logs at the storage level. They are usually much faster to create and restore, particularly for large databases. They preserve low level details such as page layout and index structure. However, they are bound to the database version, storage architecture, and often the same or very similar environment. They are harder to use for migrations.

A robust strategy often uses both. Logical backups are ideal for long term archival, verification of schema, and migrations. Physical backups are ideal for fast recovery of large production systems.

Cold, Hot, and Warm Backups

The state of the database during backup creation is another critical design aspect.

A cold backup is taken while the database is shut down. The files are static and consistent, so you can copy them with standard file tools. This is conceptually simple and safe, but you incur downtime equal to the backup duration, which is often unacceptable for production systems.

A hot backup is taken while the database remains online and processing queries. This requires database features that ensure consistency of the snapshot despite concurrent writes. Hot backups allow continuous availability but may add load and impact performance during the backup window.

A warm backup is taken from a standby or replica server. The primary database continues to serve traffic without backup overhead, while the standby is used as the backup source. This pattern is very common in high availability architectures, because it isolates backup costs and risks from the primary system.

For modern production systems you usually avoid cold backups, except for small or non critical databases, and rely mostly on hot or warm backups integrated with replication.

Point in Time Recovery

Point in time recovery, often written as PITR, means restoring the database not only to the time of the last backup, but to an arbitrary timestamp between backups. This is essential protection against logical errors such as accidental deletes, bad schema migrations, or buggy application code, which might not be noticed immediately.

PITR usually relies on a combination of base backups and continuous transaction log archiving. The typical procedure is to restore from the latest base backup from before the incident, then replay logs up to the target time. The effective RPO in this model is roughly the log shipping or archiving interval.

You must plan where the archived logs are stored, how long they are kept, and how they are included in disaster recovery procedures. Log volumes can be large, so integrate log retention with your overall backup retention schedule.

To support point in time recovery, you must keep a consistent chain of base backups and all transaction logs from that base backup up to your desired recovery point.
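As an illustration of that chain requirement, here is a hedged sketch with a hypothetical catalog layout; real systems track log sequence numbers rather than wall-clock times, but the selection logic is the same:

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical catalog: base backup start times, and archived log
# segments as (start, end) time ranges.
base_backups = [datetime(2024, 5, 1), datetime(2024, 5, 8)]
archived_logs = [
    (datetime(2024, 5, 8, 0), datetime(2024, 5, 8, 6)),
    (datetime(2024, 5, 8, 6), datetime(2024, 5, 8, 12)),
    (datetime(2024, 5, 8, 12), datetime(2024, 5, 8, 18)),
]

def pitr_plan(target: datetime):
    """Newest base backup before target, plus the logs to replay."""
    i = bisect_right(base_backups, target) - 1
    if i < 0:
        raise ValueError("no base backup exists before the target time")
    base = base_backups[i]
    logs = [s for s in archived_logs if s[1] > base and s[0] <= target]
    return base, logs

# Recover to just before a bad deployment at 10:30 on May 8.
base, logs = pitr_plan(datetime(2024, 5, 8, 10, 30))
print("restore base backup from", base)
print("then replay", len(logs), "archived log segments up to the target")
```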

Backup Schedules and Retention Policies

A strategy is not complete until you define when backups run and how long you keep them. You want enough history to cover most scenarios, but not so much that storage costs or complexity become unmanageable.

A classic pattern is the grandfather-father-son rotation, a multi tier retention schedule. For example, you might keep daily backups for the last 7 days, weekly backups for the last 4 weeks, and monthly backups for the last 12 months. Older backups are deleted or archived to slower storage.
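As a minimal pruning sketch, with these exact tiers as assumptions (dailies for 7 days, Sunday backups as weeklies for 4 weeks, first-of-month backups as monthlies for 12 months):

```python
from datetime import date, timedelta

def keep(backup_day: date, today: date) -> bool:
    """Decide whether a nightly backup survives multi tier retention."""
    age = today - backup_day
    if age <= timedelta(days=7):
        return True                                  # daily tier
    if backup_day.weekday() == 6 and age <= timedelta(weeks=4):
        return True                                  # weekly tier (Sundays)
    if backup_day.day == 1 and age <= timedelta(days=365):
        return True                                  # monthly tier
    return False

today = date(2024, 6, 1)
survivors = [today - timedelta(days=n) for n in range(400)
             if keep(today - timedelta(days=n), today)]
print(len(survivors), "of the last 400 nightly backups are retained")
# A real job would delete or archive everything that is not retained.
```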

You can combine this pattern with full and differential or incremental backups. One common arrangement is to take a full backup weekly, differential backups daily, and combine that with transaction log backups every few minutes for PITR within the last week. As data grows, you might adjust this to full backups less often, with more frequent differential or incremental backups.

Retention policies must also account for compliance and legal requirements such as data protection regulations and sector specific laws. Some environments require that backups older than a certain age be encrypted, located within a specific region, or destroyed permanently.

Document your schedule and retention rules so that operators know exactly which backups will exist at any time, and which ones will be automatically pruned.

Local, Remote, and Offsite Backups

Where you store backups is just as important as how you take them. A copy on the same server as the database does not qualify as real protection against hardware failure, disk corruption, theft, or datacenter disasters.

A common framework is the 3-2-1 rule: maintain at least 3 copies of your data, stored on at least 2 different types of media or systems, with at least 1 copy stored offsite.

In practice this means you might keep one backup on local attached storage for fast restores, another on a separate backup server or network attached storage, and a third copy in remote object storage or another datacenter. The local copy serves fast recovery from common incidents. The remote or offsite copy protects against catastrophic events such as fire or major outages.

Bandwidth constraints can be significant. For large backups, you may need to compress data, use deduplication, and schedule transfers during low traffic periods. Incremental or differential backups are especially important when sending backups over the network.
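One common way to implement the offsite copy is rsync over SSH, which compresses data in transit and can cap bandwidth. In this sketch, the host, paths, and rate limit are placeholders for your environment:

```python
import subprocess

cmd = [
    "rsync",
    "-az",              # archive mode, compress data in transit
    "--partial",        # resume interrupted transfers of large files
    "--bwlimit=20000",  # cap at roughly 20 MB/s (value in KiB/s)
    "/var/backups/db/",                             # local backup copy
    "backup@offsite.example.com:/srv/backups/db/",  # placeholder host
]
result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode != 0:
    # Surface the failure so monitoring can alert on it.
    raise RuntimeError(f"offsite sync failed: {result.stderr.strip()}")
```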

Consistency and Quiescing Applications

A valid database backup must be internally consistent. Partial writes or concurrent modifications during backup can lead to corruption or incorrect state. Database engines usually provide mechanisms to guarantee consistency while still allowing hot backups, such as specialized backup modes, read locks, or snapshot integration.

From a strategy perspective, you must identify how your chosen database signals a consistent snapshot and ensure that every backup operation uses that mechanism. This is separate from filesystem snapshots, which only capture the state of the underlying files. For data safety, filesystem level snapshots must be aligned with database level consistency guarantees, often by coordinating snapshot creation with the database.

If other components rely on the database, such as application caches or search indexes, you may need to decide whether they are reconstructed after restore or also included in backup plans. In many designs, the database is the source of truth, and other components are rebuilt from it.

Automation of Backups on Linux

Manual backups are error prone and do not scale. On Linux servers, you normally automate backups with scripts, scheduled jobs, and system services. The details of scripting and scheduling belong to other chapters, but here the focus is on how automation fits into strategy.

You typically wrap the database specific backup tool in a script that handles configuration, naming of backup files, compression, encryption, logging, and notification. This script may also manage retention, for example by removing old backups after copying them offsite.
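A sketch of such a wrapper is shown below. The dump command and backup directory are placeholders, since the actual tool is engine specific; the important parts are the timestamped naming, compression, logging, and a meaningful exit code:

```python
import gzip
import logging
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("db-backup")

BACKUP_DIR = Path("/var/backups/db")       # assumed local staging area
DUMP_CMD = ["your-dump-tool", "--all"]     # placeholder, engine specific

def run_backup() -> int:
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = BACKUP_DIR / f"db-{stamp}.sql.gz"
    try:
        dump = subprocess.run(DUMP_CMD, capture_output=True, check=True)
        with gzip.open(target, "wb") as out:
            out.write(dump.stdout)
        log.info("backup written: %s (%d bytes)", target, target.stat().st_size)
        return 0
    except Exception as exc:
        log.error("backup failed: %s", exc)
        return 1    # non-zero exit so the scheduler and monitoring notice

if __name__ == "__main__":
    sys.exit(run_backup())
```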

Scheduling can be achieved with cron or systemd timers. For each database you define time windows when load is lower, coordinate with maintenance periods, and align with other jobs such as log rotation. The schedule should be consistent with your RPO and RTO requirements. If your RPO is 15 minutes, it is not sufficient to run backups once per day.

Automation also includes monitoring. Your scripts should exit with clear status codes and log useful information. Integration with your monitoring system is essential so that failed backups generate alerts.

Encryption and Access Control

Backups contain complete copies of your data. From a security perspective, they are as sensitive as, or sometimes more sensitive than, the live database. A secure strategy protects backups both at rest and in transit.

At rest, you can encrypt backup files or use encrypted storage volumes. For file level encryption, you might use tools that integrate with your backup process in Linux. At the database level, some engines offer built in backup encryption options.
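For file level encryption, one option is GnuPG in symmetric mode, sketched below. Reading the passphrase from a root-only file is an assumption for the example; in practice you would prefer a key management or secrets management tool, as discussed further down:

```python
import subprocess

backup = "/var/backups/db/db-20240601T020000Z.sql.gz"   # assumed file
subprocess.run(
    [
        "gpg", "--batch", "--yes",
        "--symmetric", "--cipher-algo", "AES256",
        "--passphrase-file", "/etc/backup/passphrase",  # mode 0600, root only
        "--output", backup + ".gpg",
        backup,
    ],
    check=True,   # raise if encryption fails, so the job reports failure
)
```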

In transit, you should protect data with secure transport such as SSH, TLS enabled protocols, or secure object storage endpoints. Unencrypted transfers over plain protocols are unacceptable for most production databases.

Key management is critical. If you lose encryption keys, backups become useless. If keys are stored next to the backups without protection, encryption provides little value. Ideally you integrate with a key management system or secrets management tool, restrict access, and document key rotation procedures.

Access control policies must define who can read or restore backups. On Linux servers, that means user and group permissions, sudo policies, restricted service accounts, and careful separation of duties. Logs of backup access and restore operations are important for auditing.

Testing Restores and Validating Backups

A backup that has never been restored is an assumption, not a guarantee. A complete strategy always includes regular restore tests on non production environments.

You should periodically select recent backups, restore them to a test server, and verify that the database starts, integrity checks pass, and the application can connect and read expected data. This exercise validates that your backups are complete, consistent, and compatible with the current software versions.

You should also rehearse disaster recovery scenarios. For example, simulate loss of the primary database server and walk through the full restoration process, including fetching offsite backups, setting up a replacement server, and updating application configuration. Measure the actual time required, and compare it against your RTO.

Verification can also include checksum validation during backup and restore. Many backup tools or storage systems can compute and compare checksums so that you detect corruption early.
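A minimal verification sketch, assuming each backup file has a recorded SHA-256 digest stored next to it in a .sha256 sidecar file:

```python
import hashlib
from pathlib import Path

def sha256sum(path: Path, chunk: int = 1 << 20) -> str:
    """Stream the file so large backups never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            digest.update(block)
    return digest.hexdigest()

backup = Path("/var/backups/db/db-20240601T020000Z.sql.gz")  # assumed path
recorded = Path(str(backup) + ".sha256").read_text().split()[0]
if sha256sum(backup) != recorded:
    raise RuntimeError(f"checksum mismatch: {backup} may be corrupted")
print("checksum verified:", backup)
```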

A backup strategy is only reliable if you routinely test restores and verify that you can meet your RPO and RTO in real conditions.

Integrating Backups with Replication and High Availability

When using database replication or high availability setups, backups must be designed to complement, not replace, these mechanisms.

Replication, such as streaming replicas or clustered databases, primarily protects against server failure and improves availability. It does not protect against logical errors that are replicated everywhere, such as a query that deletes important records. You still need backups for those cases.

A common pattern is to take backups from a replica server to offload overhead from the primary. However, you must ensure that the replica is healthy and not lagging too far behind. If replication delay is high, backups from the replica may lag significantly behind the primary’s state, which affects your effective RPO.
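A guard like the following sketch can enforce that. The lag query is engine specific, so it is left as a placeholder; the threshold should be derived from your RPO:

```python
from datetime import timedelta

MAX_LAG = timedelta(minutes=5)    # assumed threshold, tied to your RPO

def replication_lag() -> timedelta:
    """Placeholder: query the replica's lag here, for example from a
    lag-in-seconds metric that your database engine exposes."""
    return timedelta(seconds=42)

lag = replication_lag()
if lag > MAX_LAG:
    # Abort rather than silently take a backup that is too stale.
    raise SystemExit(f"replica lag {lag} exceeds {MAX_LAG}; backup skipped")
print(f"replica lag {lag} is acceptable; proceeding with backup")
```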

Clustered or shared storage systems have their own failure modes. A misconfigured operation might delete or corrupt data on all nodes simultaneously. Backups must exist outside the cluster to provide independent recovery options.

Your backup documentation should precisely describe how replication and backups interact, including which node performs backups, which node is restored first in a disaster, and how you resynchronize replicas after a restore.

Choosing a Strategy for Different Use Cases

Different workloads and environments need different strategies. There is no single approach that suits every database.

For a small, low traffic database for internal tools, daily full logical backups stored on a backup server may be entirely sufficient, combined with weekly offsite copies. RPO and RTO can be relatively relaxed, and simplicity is more important than optimization.

For a medium sized production application, you might choose weekly full physical backups, daily differential backups, and transaction log backups every 5 or 15 minutes, with backups written first to local storage and then synchronized to encrypted object storage in another region. Restore tests can run monthly on a staging environment.

For a high volume transactional system, you might require continuous log shipping, frequent physical base backups, backups from a read replica, and multiple offsite destinations. RPO and RTO might be in the range of a few minutes, which forces you to engineer detailed, documented procedures and automation around every step.

Across all these cases, the central idea remains the same. Start from explicit recovery objectives, classify which backup types and locations you need to meet them, protect backups with security and redundancy, and continuously validate that your plan works through testing and monitoring.
