Why Database Backups Are Different
Backing up databases is not the same as copying regular files:
- Databases are:
  - Always changing (writes in progress)
  - Structured (tables, indexes, constraints)
  - Often large and performance‑sensitive
- A “good” backup must be:
  - Consistent: represents the database at a single point in time
  - Restorable: you can actually bring it back and start the service
  - Tested: regularly verified, including recovery time
For Linux database servers, your backup strategy must align with:
- Data criticality (RPO – how much data you can lose)
- Service availability (RTO – how long you can be down)
- DB engine (MySQL/MariaDB vs PostgreSQL vs others)
- Storage and network capabilities
This chapter focuses on strategy and patterns, not engine‑specific commands.
Core Concepts: RPO, RTO, and Backup Types
RPO and RTO for Databases
- RPO (Recovery Point Objective): maximum tolerated data loss, measured in time.
  - RPO = 24h → daily backups are enough
  - RPO = 5 min → you need continuous or near‑continuous replication/log shipping
- RTO (Recovery Time Objective): maximum tolerated downtime to restore service.
  - RTO = hours → full restore from cold storage might be fine
  - RTO = minutes → need fast local copies, replicas, or pre‑provisioned standby
Every database backup strategy is a trade‑off between RPO, RTO, and cost.
Backup Types (Logical vs Physical)
- Logical backups
  - Export data at the SQL or logical object level (e.g. `mysqldump`, `pg_dump`)
  - Pros:
    - Portable between versions/instances
    - Human‑readable, easy to subset (per database/table)
  - Cons:
    - Slower for large DBs
    - Restore can be slow and CPU‑intensive
    - May require consistent snapshot logic
- Physical backups
  - Copy database files or pages at the storage level (e.g. `xtrabackup`, `pg_basebackup`, filesystem snapshots)
  - Pros:
    - Fast for large databases
    - Keeps indexes intact, reducing restore time
  - Cons:
    - Tied to the same DB engine and often the same major version
    - Usually not portable across platforms
    - Must ensure consistency with the engine’s write‑ahead logs / redo logs
Most serious strategies use both (e.g. logical for portability, physical for speed).
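As a concrete illustration of the logical side, a nightly dump can be driven from cron. This is a sketch: the database name, schedule, and target path are placeholders, and credentials should come from an option file (e.g. `~/.my.cnf`), never the command line. For PostgreSQL, `pg_dump` fills the same role.

```shell
# crontab fragment (sketch; database name, time, and paths are placeholders)
# --single-transaction takes a consistent snapshot of InnoDB tables without
# blocking writes; note that % must be escaped as \% inside a crontab.
30 2 * * *  mysqldump --single-transaction --routines appdb | gzip > /var/backups/db/appdb-$(date +\%F).sql.gz
```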
Backup Granularity
- Full backup: complete copy of DB data at a point in time
- Incremental backup: only changes since last backup of any type
- Differential backup: changes since last full backup
- Log‑based backup: capture transaction/WAL/binlog segments
Typical patterns:
- Weekly full + daily differential
- Weekly full + hourly incremental
- Periodic full + continuous log shipping
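At the file level, the full/incremental chain concept can be demonstrated with GNU tar's snapshot mechanism. This is only an illustration of the idea — database engines implement incrementals through their own tools and logs — and all paths below are temporary examples:

```shell
#!/bin/sh
# Full vs incremental, illustrated with GNU tar's --listed-incremental.
set -eu

work=$(mktemp -d)
mkdir -p "$work/data"
echo "row 1" > "$work/data/t1"

# Level-0 (full) backup: records state in the snapshot file.
tar --create --file="$work/full.tar" \
    --listed-incremental="$work/state.snar" -C "$work" data

# A change happens after the full backup.
echo "row 2" > "$work/data/t2"

# Incremental backup: captures only changes since the recorded state.
tar --create --file="$work/incr.tar" \
    --listed-incremental="$work/state.snar" -C "$work" data

# Restore = extract the full backup, then replay incrementals in order.
mkdir "$work/restore"
tar --extract --file="$work/full.tar" --listed-incremental=/dev/null -C "$work/restore"
tar --extract --file="$work/incr.tar" --listed-incremental=/dev/null -C "$work/restore"
ls "$work/restore/data"
```

Losing one link in the chain breaks every backup after it, which is why incremental chains need careful cataloging.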
Consistency and Backup Modes
Hot, Warm, and Cold Backups
- Cold backups
- Database is shut down; files copied while DB is offline
- Simple, always consistent
- Requires downtime → usually unacceptable for production
- Warm backups
- Limited writes (read‑only mode, single‑user, or quiesced)
- Lower risk of inconsistency, less downtime than cold
- Still requires some disruption
- Hot backups
- Database stays online and writable during backup
- Requires engine support (hot backup tools, WAL/binlog handling, snapshots)
- Usually necessary for 24/7 services
Your approach must match your uptime requirements.
Transaction Consistency
A backup is transaction‑consistent if it includes:
- All effects of committed transactions before a certain point
- None of the effects of transactions that committed after that point
- No partial transactions
How this is usually achieved:
- Using built‑in backup tools that:
  - Use consistent snapshots
  - Coordinate with WAL/binlogs
- Temporarily applying `FLUSH TABLES WITH READ LOCK` (MySQL family), or `pg_start_backup`/`pg_stop_backup` or similar for PostgreSQL
- Using filesystem or storage snapshots at a defined checkpoint
For mission‑critical data, always design for transaction‑consistent backups, not just “files copied without errors”.
Strategy Patterns for Different Scale Levels
Small / Simple Deployments
Typical characteristics:
- Single DB server
- Databases up to a few GBs
- Moderate traffic
Suggested strategy:
- Daily full logical backup per database
- More frequent logical backups of the most important DBs (e.g. every 4h)
- Store backups on:
- Local disk (short retention, fast restore)
- Remote storage (long retention, disaster recovery)
- Periodic test restorations into a non‑production instance
When it works well:
- RPO in hours is acceptable
- RTO of tens of minutes to a few hours is okay
Medium Deployments
Typical characteristics:
- DB size in tens–hundreds of GBs
- Continuous usage, higher write load
- Possibly master‑replica setup
Suggested strategy:
- Weekly physical full backup (e.g. engine‑specific backup or snapshot)
- Daily logical or differential backup
- Continuous or frequent log backups (WAL/binlogs/redo logs) to meet stricter RPO
- Use a replica:
- To offload backup load from primary
- As an additional recovery option
- Maintain:
- On‑server recent backups (fast restore)
- Off‑server/off‑region backups (disaster recovery)
Large / Critical Deployments
Typical characteristics:
- Hundreds of GBs to TBs+
- 24/7 high‑traffic systems
- Strict SLAs
Suggested strategy:
- Continuous log shipping or streaming replication to a standby site
- Frequent physical base backups (e.g. nightly) integrated with log shipping
- Multiple backup tiers:
- Local fast recovery copies (e.g. snapshots kept for hours/days)
- Nearline object storage (e.g. S3‑like) for weeks/months
- Cold archival (glacier‑type, tape) for long‑term compliance
- Automated verification:
- Regularly restore into validation environments
- Run health checks and consistency checks on restored DB
You should treat backup & restore as part of the application architecture, not as an afterthought.
Engine‑Aware Strategy Decisions
Without diving into command details, you must account for:
- MySQL/MariaDB
- Transactional vs non‑transactional tables
- Binary logs for point‑in‑time recovery
- Tools that support hot backup and incremental copies
- Replication topologies (e.g. backup from replica)
- PostgreSQL
- WAL (Write‑Ahead Log) for point‑in‑time recovery
- Base backups + WAL archiving
- Logical vs physical replication and how they complement backups
For each engine, decide:
- How you will take base backups (logical or physical, from primary or replica)
- How you will capture ongoing changes (logs, replication)
- How you will do recovery:
- Full restore only, or
- Restore + roll forward to a chosen point in time
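For PostgreSQL, the "capture ongoing changes" decision usually means WAL archiving. A minimal configuration sketch follows; the archive destination is a placeholder, and a production `archive_command` should write off‑server and fail loudly on errors:

```ini
# postgresql.conf fragment (sketch; /archive/wal is a placeholder path)
wal_level = replica          # enough WAL detail for archiving and PITR
archive_mode = on
archive_command = 'test ! -f /archive/wal/%f && cp %p /archive/wal/%f'
archive_timeout = 300        # force a segment switch at least every 5 minutes
```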
Storage and Location Strategy
3‑2‑1 Rule Adapted to Databases
A common guideline:
- 3 copies of your data
- 2 different storage media
- 1 off‑site
For databases:
- Copy 1: live database data
- Copy 2: backup on local or same‑site storage (fast restore)
- Copy 3: backup in another location/region (disaster recovery)
Different media can mean:
- Local SSD/HDD
- Network storage (NFS, SAN)
- Object storage (S3‑compatible, cloud bucket)
- Tape or long‑term archive
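The copy-and-verify step behind 3‑2‑1 can be sketched in shell. Here the "off‑site" destination is a local stand-in directory so the example is self-contained; in practice it would be `rsync` over SSH or an object-store upload, and all names are placeholders:

```shell
#!/bin/sh
# 3-2-1 sketch: live data, a same-site backup copy, and an off-site copy.
set -eu

work=$(mktemp -d)
live="$work/live"; onsite="$work/backups-onsite"; offsite="$work/backups-offsite"
mkdir -p "$live" "$onsite" "$offsite"

f="appdb-2024-01-01.sql.gz"          # placeholder dump name
echo "dump contents" > "$live/$f"    # stand-in for a real dump

cp "$live/$f" "$onsite/"             # copy 2: fast same-site restore
cp "$onsite/$f" "$offsite/"          # copy 3: replace with rsync/rclone in practice

# Never trust a copy you have not verified.
cmp "$live/$f" "$onsite/$f"
cmp "$live/$f" "$offsite/$f"
echo "all three copies verified"
```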
On‑Server vs Off‑Server Backups
- On‑server backups
- Pros: fast, simple, quick restore
- Cons: lost together with server (disk failure, compromise, ransomware)
- Off‑server backups
- Pros: survive local failures and many security incidents
- Cons: require network bandwidth, more complex encryption and access control
Good practice: maintain both, with different retention policies.
Encryption and Access Control
Database backups often include full datasets, so they are extremely sensitive.
Key points:
- Encrypt backups at rest:
- Disk encryption (LUKS) for local storage
- Client‑side or server‑side encryption for object storage
- Encrypt in transit:
- Use TLS when sending backups to remote locations
- Restrict access:
- Separate backup accounts and service users
- Principle of least privilege on backup repositories
- Key management:
- Store encryption keys separately from backup storage
- Consider centralized secret management
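Client-side encryption before a backup leaves the server can be done with `openssl`'s symmetric mode, shown below as one illustrative option (GPG and age are common alternatives). The passphrase handling here is deliberately simplified; in production, read keys from a secret store, never from the script itself:

```shell
#!/bin/sh
# Encrypt a backup at rest, then decrypt it to prove the key works.
set -eu

work=$(mktemp -d)
echo "sensitive dump contents" > "$work/db.sql"
pass="example-passphrase"   # placeholder; use proper key management

# Encrypt with AES-256 and a modern key-derivation function.
openssl enc -aes-256-cbc -pbkdf2 -salt \
    -pass "pass:$pass" -in "$work/db.sql" -out "$work/db.sql.enc"

# Decrypt during a restore drill to confirm the key material is usable.
openssl enc -d -aes-256-cbc -pbkdf2 \
    -pass "pass:$pass" -in "$work/db.sql.enc" -out "$work/db.restored.sql"

cmp "$work/db.sql" "$work/db.restored.sql" && echo "round trip OK"
```

Testing decryption regularly matters as much as encrypting: an encrypted backup whose key is lost is no backup at all.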
Scheduling, Automation, and Retention
Scheduling Backups
Use Linux scheduling tools (cron, systemd timers, or external orchestrators) to:
- Run:
- Full backups at low‑traffic times
- Incremental/log backups more frequently
- Throttle backup load or run with lower IO priority (`ionice`, `nice`) to reduce impact
- Stagger backup jobs across multiple databases/servers
Design for:
- Predictable windows for heavy operations (full backup, compaction)
- Avoiding overlapping maintenance tasks that can overload storage/CPU
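These scheduling and throttling ideas combine naturally in a systemd timer. A sketch with placeholder unit names, paths, and times — `Nice=` and `IOSchedulingClass=` push the backup below database traffic in CPU and IO priority:

```ini
# /etc/systemd/system/db-backup.service (sketch; paths are placeholders)
[Unit]
Description=Nightly database backup

[Service]
Type=oneshot
# Lower CPU and IO priority so the backup yields to database traffic.
Nice=10
IOSchedulingClass=best-effort
IOSchedulingPriority=7
ExecStart=/usr/local/sbin/run-db-backup.sh

# /etc/systemd/system/db-backup.timer
[Unit]
Description=Run the database backup in the low-traffic window

[Timer]
OnCalendar=*-*-* 02:30:00
# Stagger runs across a fleet; catch up if the window was missed.
RandomizedDelaySec=15min
Persistent=true

[Install]
WantedBy=timers.target
```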
Retention Policies
Define how long you keep each class of backup:
- Operational (fast access, short retention)
- e.g. daily full + hourly incrementals kept for 7–14 days
- Compliance / audit (long retention, slower access)
- e.g. monthly snapshots kept for 1–7 years
Practical considerations:
- Regulatory requirements
- Storage cost vs restore usefulness
- Legal hold vs routine clean‑up
Automate deletion of old backups so disks/buckets don’t quietly fill.
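Automated pruning can be as simple as an age-based `find`, guarded by a sanity check so a broken backup job cannot silently age you out of all restore points. The directory and the 14-day window below are examples:

```shell
#!/bin/sh
# Age-based retention pruning with a "recent backup exists" safety check.
set -eu

backup_dir=$(mktemp -d)    # stand-in for e.g. /var/backups/db
touch -d "20 days ago" "$backup_dir/db-old.sql.gz"   # GNU touch syntax
touch "$backup_dir/db-new.sql.gz"

# Refuse to prune unless a backup from the last 2 days exists.
find "$backup_dir" -name '*.sql.gz' -mtime -2 | grep -q . || {
    echo "no recent backup found; refusing to prune" >&2; exit 1; }

# Delete anything older than 14 days.
find "$backup_dir" -name '*.sql.gz' -mtime +14 -delete

ls "$backup_dir"
```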
Backup Catalogs and Metadata
Backups need metadata to be usable:
- What DB/server and version
- Date/time and time zone
- Backup type (full, incremental, log)
- Dependencies (e.g. “incremental #3 needs full backup from date X”)
You can:
- Maintain a backup catalog (database or structured files)
- Embed metadata in backup filenames and manifest files
- Use backup systems that track chains automatically
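A simple way to start is embedding metadata in filenames plus a manifest file next to each backup, so it can be interpreted years later without tribal knowledge. The field names below are illustrative, not a standard format:

```shell
#!/bin/sh
# Generate a metadata-rich backup filename and a sidecar manifest.
set -eu

work=$(mktemp -d)
host="db01"; db="appdb"; kind="full"        # placeholder identifiers
stamp=$(date -u +%Y%m%dT%H%M%SZ)            # UTC, so the timezone is unambiguous
file="$work/${host}_${db}_${kind}_${stamp}.sql.gz"
echo "dump" > "$file"                        # stand-in for a real dump

# The manifest travels next to the backup (and ideally into a catalog too).
cat > "$file.manifest" <<EOF
host=$host
database=$db
type=$kind
taken_at_utc=$stamp
engine_version=placeholder
depends_on=none
sha256=$(sha256sum "$file" | awk '{print $1}')
EOF

cat "$file.manifest"
```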
Point‑in‑Time Recovery (PITR) Strategies
PITR lets you restore to just before a corruption or destructive query.
High‑level pattern:
- Take periodic base backups
- Continuously archive write‑ahead logs / binlogs / redo logs
- When needed:
- Restore the last good base backup
- Replay logs up to a chosen timestamp or log position
- Stop replay just before the unwanted change
Strategy considerations:
- How much log history you keep (affects maximum PITR window)
- How you identify the recovery target:
- Timestamp of incident
- Transaction ID / log sequence number
- Known good application event
- Where logs are stored:
- Local and/or off‑site
- With integrity checks and immutability where possible
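For PostgreSQL (12 and later), the "restore and roll forward" step is expressed as configuration rather than commands. A sketch with placeholder path and timestamp; recovery starts when a `recovery.signal` file exists in the data directory:

```ini
# postgresql.conf fragment for PITR (sketch; path and time are placeholders)
restore_command = 'cp /archive/wal/%f %p'
recovery_target_time = '2024-05-01 13:59:00 UTC'   # just before the incident
recovery_target_action = 'pause'   # inspect the result before promoting
```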
Using Replication and Snapshots as Part of Backup Strategy
Replication ≠ Backup
Replication (e.g. primary–replica, streaming replication):
- Helps with:
- Read scaling
- High availability
- Offloading backups
- Does not protect against:
- Application bugs (bad DELETEs replicate instantly)
- Logical corruption replicated to all nodes
- Human error replicated across cluster
Replicas are complements, not replacements, for backups.
Backups from Replicas
Advantages:
- Reduces load (and operational risk) on the primary
- Still gives consistent backups if done correctly
Considerations:
- Ensure replica is:
- In good health and not lagging too much
- Configured to allow consistent backups
- Don’t assume “if replica exists, I’m safe” → still need:
- Independent backups
- Off‑site copies
Filesystem and Storage Snapshots
For databases that support hot backup modes, filesystem/storage snapshots can be powerful:
- LVM snapshots, ZFS snapshots, Btrfs snapshots
- SAN or cloud storage snapshots
Strategy cautions:
- Ensure snapshot is DB‑aware:
- Coordinate with DB to flush and quiesce where necessary
- Or integrate with engine’s backup API
- Don’t rely solely on snapshot metadata → periodically restore a snapshot to verify it boots and DB starts cleanly
- Plan snapshot frequency vs performance and space overhead
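An LVM-based cycle looks roughly like the fragment below. It is not runnable as-is: it needs root, an LVM-backed data volume, and — critically — coordination with the database engine (backup lock, backup mode, or a clean checkpoint) before `lvcreate`, or the snapshot is only crash-consistent at best. All device names and sizes are placeholders:

```shell
# Fragment only: requires root and an LVM volume; names are placeholders.
lvcreate --snapshot --size 10G --name dbsnap /dev/vg0/dbdata
mkdir -p /mnt/dbsnap
mount -o ro /dev/vg0/dbsnap /mnt/dbsnap
# ... copy /mnt/dbsnap to backup storage, then clean up ...
umount /mnt/dbsnap
lvremove -f /dev/vg0/dbsnap
```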
Testing Restores and Validating Backups
Regular Recovery Drills
A backup you’ve never restored is not truly a backup.
Integrate into strategy:
- Scheduled test restores:
- Restore to a separate test server or container
- Verify:
- DB starts
- Application can connect and perform basic queries
- Document:
- Steps executed
- Time taken (RTO measurement)
- Any issues encountered
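The drill itself can be a small script: restore into a scratch area, run a basic check, record elapsed time as a measured-RTO data point. Below, a tar archive stands in for the real restore procedure — the structure is the point, not the commands:

```shell
#!/bin/sh
# Skeleton of an automated restore drill with timing.
set -eu

work=$(mktemp -d)
mkdir "$work/data"
echo "table data" > "$work/data/t1"
tar -cf "$work/backup.tar" -C "$work" data   # stand-in backup artifact

start=$(date +%s)

mkdir "$work/restore"
tar -xf "$work/backup.tar" -C "$work/restore"   # the "restore" step

# Basic verification: expected content is present and readable.
grep -q "table data" "$work/restore/data/t1"

elapsed=$(( $(date +%s) - start ))
echo "drill OK, restore took ${elapsed}s" >> "$work/drill.log"
cat "$work/drill.log"
```

Keeping the log of elapsed times lets you see whether your actual restore time still fits your RTO as data grows.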
Integrity and Consistency Checks
Validation methods:
- Check backup files:
- Checksums (e.g. SHA‑256) on backup archives
- Storage system integrity tools
- After restore:
- Run DB‑native consistency checks
- Run a subset of application tests or health checks
- For log‑based chains:
- Verify you have a contiguous sequence of logs from base backup to target
Include validation in automation pipelines where possible.
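The file-level checks can be automated with a checksum manifest written at backup time and verified before any restore (and on a schedule). A self-contained sketch with placeholder filenames:

```shell
#!/bin/sh
# Record SHA-256 sums next to backup artifacts, then verify them.
set -eu

work=$(mktemp -d)
echo "dump contents" > "$work/db-full.sql.gz"
echo "wal segment" > "$work/wal-000000010000000000000001"

# At backup time: record checksums alongside the files.
( cd "$work" && sha256sum db-full.sql.gz wal-* > SHA256SUMS )

# Later (before restore, or from a scheduled job): verify everything.
( cd "$work" && sha256sum -c SHA256SUMS )
```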
Special Scenarios and Anti‑Patterns
Handling Schema Changes and Migrations
When planning schema migrations:
- Take a backup immediately before applying major changes
- For “dangerous” operations (massive deletes/updates), consider:
- Short‑interval snapshot or PITR marker
- Ability to roll back via PITR rather than only forward migrations
- Incorporate backup checks into deployment pipelines (DevOps/CI)
Anti‑Patterns to Avoid
Common pitfalls:
- Only keeping backups on the same server as the database
- Never testing restores
- Relying exclusively on replicas as “backups”
- Single huge monolithic backup with no:
- PITR
- Granularity (per database/tenant)
- No documentation → only one person knows how to restore
- Manual processes with no automation → error‑prone, often forgotten
Designing a Backup Strategy: A Practical Checklist
When you design or review a strategy for a Linux database server, answer:
- Requirements
- What are RPO and RTO?
- Regulatory or compliance retention requirements?
- Backup Types
- Which full, incremental/differential, and log backups do you use?
- Logical, physical, or both?
- Consistency
- How do you guarantee transaction‑consistent backups?
- Are you using hot, warm, or cold backups?
- Location and Redundancy
- Where are backups stored (local, remote, off‑site)?
- Do you meet the 3‑2‑1 principle or an equivalent?
- Security
- Are backups encrypted at rest and in transit?
- Who can access them; how are keys managed?
- Automation
- How are backups scheduled and monitored?
- Are failed backups alerted and logged?
- Retention
- What is kept for how long, and where?
- How are old backups pruned?
- Testing
- How often do you run restore drills?
- Do you measure and document actual RTO/RPO achieved?
Using these questions as a framework helps ensure your database backup strategies are deliberate, auditable, and aligned with business needs—not just a collection of ad‑hoc scripts.