Why Database Backups Are Different
Backing up databases is not the same as copying regular files:
- Databases are:
  - Always changing (writes in progress)
  - Structured (tables, indexes, constraints)
  - Often large and performance‑sensitive
- A “good” backup must be:
  - Consistent: represents the database at a single point in time
  - Restorable: you can actually bring it back and start the service
  - Tested: regularly verified, including recovery time
For Linux database servers, your backup strategy must align with:
- Data criticality (RPO – how much data you can lose)
- Service availability (RTO – how long you can be down)
- DB engine (MySQL/MariaDB vs PostgreSQL vs others)
- Storage and network capabilities
This chapter focuses on strategy and patterns, not engine‑specific commands.
Core Concepts: RPO, RTO, and Backup Types
RPO and RTO for Databases
- RPO (Recovery Point Objective): maximum tolerated data loss, measured in time.
  - RPO = 24h → daily backups are enough
  - RPO = 5 min → you need continuous or near‑continuous replication/log shipping
- RTO (Recovery Time Objective): maximum tolerated downtime to restore service.
  - RTO = hours → full restore from cold storage might be fine
  - RTO = minutes → need fast local copies, replicas, or pre‑provisioned standby
Every database backup strategy is a trade‑off between RPO, RTO, and cost.
Backup Types (Logical vs Physical)
- Logical backups
  - Export data at the SQL or logical object level (e.g. `mysqldump`, `pg_dump`)
  - Pros:
    - Portable between versions/instances
    - Human‑readable, easy to subset (per database/table)
  - Cons:
    - Slower for large DBs
    - Restore can be slow and CPU‑intensive
    - May require consistent snapshot logic
- Physical backups
  - Copy database files or pages at the storage level (e.g. `xtrabackup`, `pg_basebackup`, filesystem snapshots)
  - Pros:
    - Fast for large databases
    - Keeps indexes intact, reducing restore time
  - Cons:
    - Tied to the same DB engine and often the same major version
    - Usually not portable across platforms
    - Must ensure consistency with the engine’s write‑ahead logs / redo logs
Most serious strategies use both (e.g. logical for portability, physical for speed).
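As a concrete illustration of the logical side, a nightly dump can be driven from cron. This is a sketch: the database name, schedule, and target path are placeholders, and credentials should come from an option file (e.g. `~/.my.cnf`), never the command line. For PostgreSQL, `pg_dump` fills the same role.

```shell
# crontab fragment (sketch; database name, time, and paths are placeholders)
# --single-transaction takes a consistent snapshot of InnoDB tables without
# blocking writes; note that % must be escaped as \% inside a crontab.
30 2 * * *  mysqldump --single-transaction --routines appdb | gzip > /var/backups/db/appdb-$(date +\%F).sql.gz
```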
Backup Granularity
- Full backup: complete copy of DB data at a point in time
- Incremental backup: only changes since last backup of any type
- Differential backup: changes since last full backup
- Log‑based backup: capture transaction/WAL/binlog segments
Typical patterns:
- Weekly full + daily differential
- Weekly full + hourly incremental
- Periodic full + continuous log shipping
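At the file level, the full/incremental chain concept can be demonstrated with GNU tar's snapshot mechanism. This is only an illustration of the idea — database engines implement incrementals through their own tools and logs — and all paths below are temporary examples:

```shell
#!/bin/sh
# Full vs incremental, illustrated with GNU tar's --listed-incremental.
set -eu

work=$(mktemp -d)
mkdir -p "$work/data"
echo "row 1" > "$work/data/t1"

# Level-0 (full) backup: records state in the snapshot file.
tar --create --file="$work/full.tar" \
    --listed-incremental="$work/state.snar" -C "$work" data

# A change happens after the full backup.
echo "row 2" > "$work/data/t2"

# Incremental backup: captures only changes since the recorded state.
tar --create --file="$work/incr.tar" \
    --listed-incremental="$work/state.snar" -C "$work" data

# Restore = extract the full backup, then replay incrementals in order.
mkdir "$work/restore"
tar --extract --file="$work/full.tar" --listed-incremental=/dev/null -C "$work/restore"
tar --extract --file="$work/incr.tar" --listed-incremental=/dev/null -C "$work/restore"
ls "$work/restore/data"
```

Losing one link in the chain breaks every backup after it, which is why incremental chains need careful cataloging.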
Consistency and Backup Modes
Hot, Warm, and Cold Backups
- Cold backups
- Database is shut down; files copied while DB is offline
- Simple, always consistent
- Requires downtime → usually unacceptable for production
- Warm backups
- Limited writes (read‑only mode, single‑user, or quiesced)
- Lower risk of inconsistency, less downtime than cold
- Still requires some disruption
- Hot backups
- Database stays online and writable during backup
- Requires engine support (hot backup tools, WAL/binlog handling, snapshots)
- Usually necessary for 24/7 services
Your approach must match your uptime requirements.
Transaction Consistency
A backup is transaction‑consistent if it includes:
- All effects of committed transactions before a certain point
- None of the effects of transactions that committed after that point
- No partial transactions
How this is usually achieved:
- Using built‑in backup tools that:
  - Use consistent snapshots
  - Coordinate with WAL/binlogs
- Temporarily applying `FLUSH TABLES WITH READ LOCK` (MySQL family), or `pg_start_backup`/`pg_stop_backup` or similar for PostgreSQL
- Using filesystem or storage snapshots at a defined checkpoint
For mission‑critical data, always design for transaction‑consistent backups, not just “files copied without errors”.
Strategy Patterns for Different Scale Levels
Small / Simple Deployments
Typical characteristics:
- Single DB server
- Databases up to a few GBs
- Moderate traffic
Suggested strategy:
- Daily full logical backup per database
- More frequent logical backups of the most important DBs (e.g. every 4h)
- Store backups on:
- Local disk (short retention, fast restore)
- Remote storage (long retention, disaster recovery)
- Periodic test restorations into a non‑production instance
When it works well:
- RPO in hours is acceptable
- RTO of tens of minutes to a few hours is okay
Medium Deployments
Typical characteristics:
- DB size in tens–hundreds of GBs
- Continuous usage, higher write load
- Possibly master‑replica setup
Suggested strategy:
- Weekly physical full backup (e.g. engine‑specific backup or snapshot)
- Daily logical or differential backup
- Continuous or frequent log backups (WAL/binlogs/redo logs) to meet stricter RPO
- Use a replica:
- To offload backup load from primary
- As an additional recovery option
- Maintain:
- On‑server recent backups (fast restore)
- Off‑server/off‑region backups (disaster recovery)
Large / Critical Deployments
Typical characteristics:
- Hundreds of GBs to TBs+
- 24/7 high‑traffic systems
- Strict SLAs
Suggested strategy:
- Continuous log shipping or streaming replication to a standby site
- Frequent physical base backups (e.g. nightly) integrated with log shipping
- Multiple backup tiers:
- Local fast recovery copies (e.g. snapshots kept for hours/days)
- Nearline object storage (e.g. S3‑like) for weeks/months
- Cold archival (glacier‑type, tape) for long‑term compliance
- Automated verification:
- Regularly restore into validation environments
- Run health checks and consistency checks on restored DB
You should treat backup & restore as part of the application architecture, not as an afterthought.
Engine‑Aware Strategy Decisions
Without diving into command details, you must account for:
- MySQL/MariaDB
- Transactional vs non‑transactional tables
- Binary logs for point‑in‑time recovery
- Tools that support hot backup and incremental copies
- Replication topologies (e.g. backup from replica)
- PostgreSQL
- WAL (Write‑Ahead Log) for point‑in‑time recovery
- Base backups + WAL archiving
- Logical vs physical replication and how they complement backups
For each engine, decide:
- How you will take base backups (logical or physical, from primary or replica)
- How you will capture ongoing changes (logs, replication)
- How you will do recovery:
- Full restore only, or
- Restore + roll forward to a chosen point in time
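For PostgreSQL, the "capture ongoing changes" decision usually means WAL archiving. A minimal configuration sketch follows; the archive destination is a placeholder, and a production `archive_command` should write off‑server and fail loudly on errors:

```ini
# postgresql.conf fragment (sketch; /archive/wal is a placeholder path)
wal_level = replica          # enough WAL detail for archiving and PITR
archive_mode = on
archive_command = 'test ! -f /archive/wal/%f && cp %p /archive/wal/%f'
archive_timeout = 300        # force a segment switch at least every 5 minutes
```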
Storage and Location Strategy
3‑2‑1 Rule Adapted to Databases
A common guideline:
- 3 copies of your data
- 2 different storage media
- 1 off‑site
For databases:
- Copy 1: live database data
- Copy 2: backup on local or same‑site storage (fast restore)
- Copy 3: backup in another location/region (disaster recovery)
Different media can mean:
- Local SSD/HDD
- Network storage (NFS, SAN)
- Object storage (S3‑compatible, cloud bucket)
- Tape or long‑term archive
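The copy-and-verify step behind 3‑2‑1 can be sketched in shell. Here the "off‑site" destination is a local stand-in directory so the example is self-contained; in practice it would be `rsync` over SSH or an object-store upload, and all names are placeholders:

```shell
#!/bin/sh
# 3-2-1 sketch: live data, a same-site backup copy, and an off-site copy.
set -eu

work=$(mktemp -d)
live="$work/live"; onsite="$work/backups-onsite"; offsite="$work/backups-offsite"
mkdir -p "$live" "$onsite" "$offsite"

f="appdb-2024-01-01.sql.gz"          # placeholder dump name
echo "dump contents" > "$live/$f"    # stand-in for a real dump

cp "$live/$f" "$onsite/"             # copy 2: fast same-site restore
cp "$onsite/$f" "$offsite/"          # copy 3: replace with rsync/rclone in practice

# Never trust a copy you have not verified.
cmp "$live/$f" "$onsite/$f"
cmp "$live/$f" "$offsite/$f"
echo "all three copies verified"
```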
On‑Server vs Off‑Server Backups
- On‑server backups
- Pros: fast, simple, quick restore
- Cons: lost together with server (disk failure, compromise, ransomware)
- Off‑server backups
- Pros: survive local failures and many security incidents
- Cons: require network bandwidth, more complex encryption and access control
Good practice: maintain both, with different retention policies.
Encryption and Access Control
Database backups often include full datasets, so they are extremely sensitive.
Key points:
- Encrypt backups at rest:
- Disk encryption (LUKS) for local storage
- Client‑side or server‑side encryption for object storage
- Encrypt in transit:
- Use TLS when sending backups to remote locations
- Restrict access:
- Separate backup accounts and service users
- Principle of least privilege on backup repositories
- Key management:
- Store encryption keys separately from backup storage
- Consider centralized secret management
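Client-side encryption before a backup leaves the server can be done with `openssl`'s symmetric mode, shown below as one illustrative option (GPG and age are common alternatives). The passphrase handling here is deliberately simplified; in production, read keys from a secret store, never from the script itself:

```shell
#!/bin/sh
# Encrypt a backup at rest, then decrypt it to prove the key works.
set -eu

work=$(mktemp -d)
echo "sensitive dump contents" > "$work/db.sql"
pass="example-passphrase"   # placeholder; use proper key management

# Encrypt with AES-256 and a modern key-derivation function.
openssl enc -aes-256-cbc -pbkdf2 -salt \
    -pass "pass:$pass" -in "$work/db.sql" -out "$work/db.sql.enc"

# Decrypt during a restore drill to confirm the key material is usable.
openssl enc -d -aes-256-cbc -pbkdf2 \
    -pass "pass:$pass" -in "$work/db.sql.enc" -out "$work/db.restored.sql"

cmp "$work/db.sql" "$work/db.restored.sql" && echo "round trip OK"
```

Testing decryption regularly matters as much as encrypting: an encrypted backup whose key is lost is no backup at all.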
Scheduling, Automation, and Retention
Scheduling Backups
Use Linux scheduling tools (cron, systemd timers, or external orchestrators) to:
- Run:
- Full backups at low‑traffic times
- Incremental/log backups more frequently
- Throttle backup load or run with lower IO priority (`ionice`, `nice`) to reduce impact
- Stagger backup jobs across multiple databases/servers
Design for:
- Predictable windows for heavy operations (full backup, compaction)
- Avoiding overlapping maintenance tasks that can overload storage/CPU
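These scheduling and throttling ideas combine naturally in a systemd timer. A sketch with placeholder unit names, paths, and times — `Nice=` and `IOSchedulingClass=` push the backup below database traffic in CPU and IO priority:

```ini
# /etc/systemd/system/db-backup.service (sketch; paths are placeholders)
[Unit]
Description=Nightly database backup

[Service]
Type=oneshot
# Lower CPU and IO priority so the backup yields to database traffic.
Nice=10
IOSchedulingClass=best-effort
IOSchedulingPriority=7
ExecStart=/usr/local/sbin/run-db-backup.sh

# /etc/systemd/system/db-backup.timer
[Unit]
Description=Run the database backup in the low-traffic window

[Timer]
OnCalendar=*-*-* 02:30:00
# Stagger runs across a fleet; catch up if the window was missed.
RandomizedDelaySec=15min
Persistent=true

[Install]
WantedBy=timers.target
```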
Retention Policies
Define how long you keep each class of backup:
- Operational (fast access, short retention)
- e.g. daily full + hourly incrementals kept for 7–14 days
- Compliance / audit (long retention, slower access)
- e.g. monthly snapshots kept for 1–7 years
Practical considerations:
- Regulatory requirements
- Storage cost vs restore usefulness
- Legal hold vs routine clean‑up
Automate deletion of old backups so disks/buckets don’t quietly fill.
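Automated pruning can be as simple as an age-based `find`, guarded by a sanity check so a broken backup job cannot silently age you out of all restore points. The directory and the 14-day window below are examples:

```shell
#!/bin/sh
# Age-based retention pruning with a "recent backup exists" safety check.
set -eu

backup_dir=$(mktemp -d)    # stand-in for e.g. /var/backups/db
touch -d "20 days ago" "$backup_dir/db-old.sql.gz"   # GNU touch syntax
touch "$backup_dir/db-new.sql.gz"

# Refuse to prune unless a backup from the last 2 days exists.
find "$backup_dir" -name '*.sql.gz' -mtime -2 | grep -q . || {
    echo "no recent backup found; refusing to prune" >&2; exit 1; }

# Delete anything older than 14 days.
find "$backup_dir" -name '*.sql.gz' -mtime +14 -delete

ls "$backup_dir"
```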
Backup Catalogs and Metadata
Backups need metadata to be usable:
- What DB/server and version
- Date/time and time zone
- Backup type (full, incremental, log)
- Dependencies (e.g. “incremental #3 needs full backup from date X”)
You can:
- Maintain a backup catalog (database or structured files)
- Embed metadata in backup filenames and manifest files
- Use backup systems that track chains automatically
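A simple way to start is embedding metadata in filenames plus a manifest file next to each backup, so it can be interpreted years later without tribal knowledge. The field names below are illustrative, not a standard format:

```shell
#!/bin/sh
# Generate a metadata-rich backup filename and a sidecar manifest.
set -eu

work=$(mktemp -d)
host="db01"; db="appdb"; kind="full"        # placeholder identifiers
stamp=$(date -u +%Y%m%dT%H%M%SZ)            # UTC, so the timezone is unambiguous
file="$work/${host}_${db}_${kind}_${stamp}.sql.gz"
echo "dump" > "$file"                        # stand-in for a real dump

# The manifest travels next to the backup (and ideally into a catalog too).
cat > "$file.manifest" <<EOF
host=$host
database=$db
type=$kind
taken_at_utc=$stamp
engine_version=placeholder
depends_on=none
sha256=$(sha256sum "$file" | awk '{print $1}')
EOF

cat "$file.manifest"
```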
Point‑in‑Time Recovery (PITR) Strategies
PITR lets you restore to just before a corruption or destructive query.
High‑level pattern:
- Take periodic base backups
- Continuously archive write‑ahead logs / binlogs / redo logs
- When needed:
- Restore the last good base backup
- Replay logs up to a chosen timestamp or log position
- Stop replay just before the unwanted change
Strategy considerations:
- How much log history you keep (affects maximum PITR window)
- How you identify the recovery target:
- Timestamp of incident
- Transaction ID / log sequence number
- Known good application event
- Where logs are stored:
- Local and/or off‑site
- With integrity checks and immutability where possible
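For PostgreSQL (12 and later), the "restore and roll forward" step is expressed as configuration rather than commands. A sketch with placeholder path and timestamp; recovery starts when a `recovery.signal` file exists in the data directory:

```ini
# postgresql.conf fragment for PITR (sketch; path and time are placeholders)
restore_command = 'cp /archive/wal/%f %p'
recovery_target_time = '2024-05-01 13:59:00 UTC'   # just before the incident
recovery_target_action = 'pause'   # inspect the result before promoting
```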
Using Replication and Snapshots as Part of Backup Strategy
Replication ≠ Backup
Replication (e.g. primary–replica, streaming replication):
- Helps with:
- Read scaling
- High availability
- Offloading backups
- Does not protect against:
- Application bugs (bad DELETEs replicate instantly)
- Logical corruption replicated to all nodes
- Human error replicated across cluster
Replicas are complements, not replacements, for backups.
Backups from Replicas
Advantages:
- Reduces load (and operational risk) on the primary
- Still gives consistent backups if done correctly
Considerations:
- Ensure replica is:
- In good health and not lagging too much
- Configured to allow consistent backups
- Don’t assume “if replica exists, I’m safe” → still need:
- Independent backups
- Off‑site copies
Filesystem and Storage Snapshots
For databases that support hot backup modes, filesystem/storage snapshots can be powerful:
- LVM snapshots, ZFS snapshots, Btrfs snapshots
- SAN or cloud storage snapshots
Strategy cautions:
- Ensure snapshot is DB‑aware:
- Coordinate with DB to flush and quiesce where necessary
- Or integrate with engine’s backup API
- Don’t rely solely on snapshot metadata → periodically restore a snapshot to verify it boots and DB starts cleanly
- Plan snapshot frequency vs performance and space overhead
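An LVM-based cycle looks roughly like the fragment below. It is not runnable as-is: it needs root, an LVM-backed data volume, and — critically — coordination with the database engine (backup lock, backup mode, or a clean checkpoint) before `lvcreate`, or the snapshot is only crash-consistent at best. All device names and sizes are placeholders:

```shell
# Fragment only: requires root and an LVM volume; names are placeholders.
lvcreate --snapshot --size 10G --name dbsnap /dev/vg0/dbdata
mkdir -p /mnt/dbsnap
mount -o ro /dev/vg0/dbsnap /mnt/dbsnap
# ... copy /mnt/dbsnap to backup storage, then clean up ...
umount /mnt/dbsnap
lvremove -f /dev/vg0/dbsnap
```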
Testing Restores and Validating Backups
Regular Recovery Drills
A backup you’ve never restored is not truly a backup.
Integrate into strategy:
- Scheduled test restores:
- Restore to a separate test server or container
- Verify:
- DB starts
- Application can connect and perform basic queries
- Document:
- Steps executed
- Time taken (RTO measurement)
- Any issues encountered
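The drill itself can be a small script: restore into a scratch area, run a basic check, record elapsed time as a measured-RTO data point. Below, a tar archive stands in for the real restore procedure — the structure is the point, not the commands:

```shell
#!/bin/sh
# Skeleton of an automated restore drill with timing.
set -eu

work=$(mktemp -d)
mkdir "$work/data"
echo "table data" > "$work/data/t1"
tar -cf "$work/backup.tar" -C "$work" data   # stand-in backup artifact

start=$(date +%s)

mkdir "$work/restore"
tar -xf "$work/backup.tar" -C "$work/restore"   # the "restore" step

# Basic verification: expected content is present and readable.
grep -q "table data" "$work/restore/data/t1"

elapsed=$(( $(date +%s) - start ))
echo "drill OK, restore took ${elapsed}s" >> "$work/drill.log"
cat "$work/drill.log"
```

Keeping the log of elapsed times lets you see whether your actual restore time still fits your RTO as data grows.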
Integrity and Consistency Checks
Validation methods:
- Check backup files:
- Checksums (e.g. SHA‑256) on backup archives
- Storage system integrity tools
- After restore:
- Run DB‑native consistency checks
- Run a subset of application tests or health checks
- For log‑based chains:
- Verify you have a contiguous sequence of logs from base backup to target
Include validation in automation pipelines where possible.
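The file-level checks can be automated with a checksum manifest written at backup time and verified before any restore (and on a schedule). A self-contained sketch with placeholder filenames:

```shell
#!/bin/sh
# Record SHA-256 sums next to backup artifacts, then verify them.
set -eu

work=$(mktemp -d)
echo "dump contents" > "$work/db-full.sql.gz"
echo "wal segment" > "$work/wal-000000010000000000000001"

# At backup time: record checksums alongside the files.
( cd "$work" && sha256sum db-full.sql.gz wal-* > SHA256SUMS )

# Later (before restore, or from a scheduled job): verify everything.
( cd "$work" && sha256sum -c SHA256SUMS )
```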
Special Scenarios and Anti‑Patterns
Handling Schema Changes and Migrations
When planning schema migrations:
- Take a backup immediately before applying major changes
- For “dangerous” operations (massive deletes/updates), consider:
- Short‑interval snapshot or PITR marker
- Ability to roll back via PITR rather than only forward migrations
- Incorporate backup checks into deployment pipelines (DevOps/CI)
Anti‑Patterns to Avoid
Common pitfalls:
- Only keeping backups on the same server as the database
- Never testing restores
- Relying exclusively on replicas as “backups”
- Single huge monolithic backup with no:
- PITR
- Granularity (per database/tenant)
- No documentation → only one person knows how to restore
- Manual processes with no automation → error‑prone, often forgotten
Designing a Backup Strategy: A Practical Checklist
When you design or review a strategy for a Linux database server, answer:
- Requirements
- What are RPO and RTO?
- Regulatory or compliance retention requirements?
- Backup Types
- Which full, incremental/differential, and log backups do you use?
- Logical, physical, or both?
- Consistency
- How do you guarantee transaction‑consistent backups?
- Are you using hot, warm, or cold backups?
- Location and Redundancy
- Where are backups stored (local, remote, off‑site)?
- Do you meet the 3‑2‑1 principle or an equivalent?
- Security
- Are backups encrypted at rest and in transit?
- Who can access them; how are keys managed?
- Automation
- How are backups scheduled and monitored?
- Are failed backups alerted and logged?
- Retention
- What is kept for how long, and where?
- How are old backups pruned?
- Testing
- How often do you run restore drills?
- Do you measure and document actual RTO/RPO achieved?
Using these questions as a framework helps ensure your database backup strategies are deliberate, auditable, and aligned with business needs—not just a collection of ad‑hoc scripts.