4.4.5 Snapshots and rollbacks

Understanding Snapshots and Rollbacks

In the context of filesystems and storage, a snapshot is a point‑in‑time view of data that can be used to inspect, back up, or restore the state of a filesystem or volume. A rollback is the act of reverting live data back to a state captured in a snapshot.

This chapter focuses on how snapshots and rollbacks are implemented and used on Linux, particularly with common technologies:

LVM (Logical Volume Manager)
Btrfs
ZFS

You’ll see conceptual differences, typical workflows, and how these tools behave in real systems.

Snapshot Design Concepts

Although implementations differ, most snapshot systems are shaped by two design questions: how snapshot data is stored, and how consistent the captured state is.

Copy-on-Write (CoW) vs Full Copies

Two major strategies exist:

Copy-on-Write (CoW) snapshots

Initial snapshot creation is nearly instantaneous and tiny.
The snapshot initially “shares” blocks with the source.
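When the source changes a block, the snapshot keeps the old version. Legacy LVM snapshots copy the old data to snapshot storage first and then overwrite the block in place; Btrfs, ZFS, and LVM thin volumes instead write the new data to a fresh location and leave the old block where it is for the snapshot (often called redirect-on-write).
Efficient for many snapshots and frequent changes, especially in the redirect-on-write variants.

Btrfs, ZFS, and modern LVM thin snapshots all work this way.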

Full copy (clone) snapshots

Snapshot is a complete copy of data at snapshot time.
Takes time and space proportional to data size.
Simple to reason about, but less efficient for frequent snapshots.
Often used when migrating or backing up data offline.

Sometimes tools call full copies “clones” or “replicas” to distinguish them from CoW snapshots.

Crash-Consistent vs Application-Consistent

Snapshots capture blocks, not necessarily a fully flushed application state.

Crash-consistent: Equivalent to pulling the power plug. Filesystem will replay its journal or log on mount, but in-flight application writes may be partially completed.
Application-consistent: Databases or services are told to flush buffers, pause writes, or enter a quiesced state before the snapshot, then resume afterward.

For databases or VMs, use hooks (e.g., pre/post snapshot scripts, systemd units) to quiesce applications if you need application-consistent snapshots.
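The hook pattern above can be sketched as a small wrapper. This is a minimal illustration, not a standard tool: the function name with_quiesce is made up here, and the three commands passed to it are whatever your application and storage stack need.

```shell
# with_quiesce PRE SNAP POST
# Runs PRE (quiesce), then SNAP (take the snapshot), then always POST (resume),
# so the application is not left paused when the snapshot step fails.
with_quiesce() {
    wq_pre=$1
    wq_snap=$2
    wq_post=$3

    # If the application cannot be quiesced, do not snapshot at all.
    sh -c "$wq_pre" || return 1

    wq_status=0
    sh -c "$wq_snap" || wq_status=$?

    # Resume even when the snapshot failed.
    sh -c "$wq_post" || return 1
    return "$wq_status"
}

# Hypothetical usage (service and dataset names are placeholders):
# with_quiesce 'systemctl stop myapp' \
#              'zfs snapshot pool/data@before_upgrade' \
#              'systemctl start myapp'
```

The wrapper returns the snapshot command's exit status, so a caller can tell a failed snapshot from a successful one even though the resume step ran in both cases.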

LVM Snapshots

LVM snapshots operate on logical volumes rather than individual filesystems. You snapshot a logical volume, not the filesystem inside it (although the two are tightly related).

There are two relevant modes:

Legacy LVM snapshots (not thin provisioning): CoW in a dedicated snapshot LV.
Thin-provisioned LVM snapshots: Modern, scalable snapshots with a thin pool.

Legacy LVM Snapshots (thick volumes)

Conceptually:

You have an origin LV, e.g. vg0/root.
You create a snapshot LV, e.g. vg0/root_snap, with its own capacity.
As blocks of the origin change, old versions are copied to the snapshot LV.

Key characteristics:

The snapshot LV must be large enough to hold all original blocks that change after snapshot creation.
When the snapshot fills, it becomes invalid and is dropped.
Multiple snapshots significantly increase write I/O due to CoW overhead.
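A rough way to size a legacy snapshot LV is to estimate how much data will change during the snapshot's lifetime and add a safety margin. The sketch below assumes a steady change rate, which real workloads rarely have; the function name and the numbers are illustrative only.

```shell
# snapshot_size_mib CHANGE_MIB_PER_HOUR HOURS MARGIN_PERCENT
# Prints an estimated snapshot size in MiB.
snapshot_size_mib() {
    ss_rate=$1
    ss_hours=$2
    ss_margin=$3
    # Expected changed data plus the margin, rounded up to a whole MiB.
    echo $(( (ss_rate * ss_hours * (100 + ss_margin) + 99) / 100 ))
}

# 200 MiB/h of changes, snapshot kept for 12 h, 50% safety margin
snapshot_size_mib 200 12 50   # prints 3600
```

Because a block only costs snapshot space the first time it changes, this tends to overestimate for workloads that rewrite the same blocks repeatedly.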

Typical creation flow:

Ensure filesystem on the origin LV is in a consistent state:

For ext4 or xfs, flush with sync and possibly remount read-only if you want a very “clean” snapshot.
For live systems, this is often skipped in favor of crash-consistent snapshots.

Create a snapshot:

Example (conceptual only): lvcreate -s -L 5G -n root_snap /dev/vg0/root

Use the snapshot LV for:

Backup (mount it read-only and archive).
Testing changes.

Remove snapshot when no longer needed to free space.
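The flow above, applied to a backup, might look like the following sketch. It prints the commands rather than running them so the plan can be reviewed first. The function name, mount point, and archive path are assumptions made for illustration.

```shell
# lvm_backup_plan VG LV SNAP_SIZE MOUNTPOINT ARCHIVE
# Prints the commands for a snapshot-based backup of one logical volume.
lvm_backup_plan() {
    lb_vg=$1
    lb_lv=$2
    lb_size=$3
    lb_mnt=$4
    lb_archive=$5
    lb_snap="${lb_lv}_snap"
    # Note: an XFS snapshot needs "-o ro,nouuid" while the origin is mounted.
    cat <<EOF
lvcreate -s -L $lb_size -n $lb_snap /dev/$lb_vg/$lb_lv
mount -o ro /dev/$lb_vg/$lb_snap $lb_mnt
tar -C $lb_mnt -czf $lb_archive .
umount $lb_mnt
lvremove -y /dev/$lb_vg/$lb_snap
EOF
}

lvm_backup_plan vg0 root 5G /mnt/root_snap /backup/root.tar.gz
```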

Rollback strategy with legacy snapshots

LVM’s native “rollback” is not as direct as on Btrfs/ZFS:

Normally you:

Boot from rescue media or a different root.
Optionally make sure the origin is not mounted.
Restore the origin by merging snapshot content back into it (or by overwriting the origin from backup/clone).

Using the LVM merge feature (where supported):

Concept: “rollback” by merging snapshot LV into origin LV.
Typical flow:

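Unmount origin. If it cannot be unmounted (as with the root filesystem), the merge is deferred until the next time the volume is activated, typically the next boot.
Request merge of snapshot into origin; the snapshot LV is removed once the merge completes.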
Reboot or remount the restored volume.

This is more disruptive than filesystems designed around snapshots.
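As a sketch of the merge flow for a volume that can be unmounted, the function below prints the commands in order. The volume names and mount point are placeholders, and the function name is invented for this example.

```shell
# lvm_merge_plan VG ORIGIN SNAPSHOT MOUNTPOINT
# Prints the commands that roll ORIGIN back to the state held in SNAPSHOT.
lvm_merge_plan() {
    lm_vg=$1
    lm_origin=$2
    lm_snap=$3
    lm_mnt=$4
    cat <<EOF
umount $lm_mnt
lvconvert --merge /dev/$lm_vg/$lm_snap
mount /dev/$lm_vg/$lm_origin $lm_mnt
EOF
}

lvm_merge_plan vg0 home home_snap /home
```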

Thin-Provisioned LVM Snapshots

Thin provisioning adds a thin pool that stores data blocks for many thin volumes and snapshots.

Properties:

Snapshots are cheap to create (metadata-only changes).
Thin pool space is shared across all thin LVs and their snapshots.
More scalable than legacy snapshots.

Typical structure:

Thin pool LV: vg0/thinpool
Thin volumes: vg0/root, vg0/home
Snapshots: vg0/root_snap1, vg0/root_snap2, …

Workflow:

Create a thin pool and thin LV (covered elsewhere in the storage chapter).
Create snapshots of thin LVs inside the pool.
Rollback:

Use lvconvert --merge (thin-snapshot aware) or create a new LV from a snapshot, then switch to it.
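A thin snapshot is created without a size, because its blocks come from the pool. The sketch below prints the commands for creating one, activating it for inspection, and merging it back. The names are placeholders; the -K flag is included on the assumption that the snapshot was created with LVM's default activation-skip setting for thin snapshots.

```shell
# thin_snapshot_plan VG LV SNAPSHOT
# Prints the create / activate / roll-back commands for a thin snapshot.
thin_snapshot_plan() {
    ts_vg=$1
    ts_lv=$2
    ts_snap=$3
    cat <<EOF
lvcreate -s -n $ts_snap $ts_vg/$ts_lv
lvchange -ay -K $ts_vg/$ts_snap
lvconvert --merge $ts_vg/$ts_snap
EOF
}

thin_snapshot_plan vg0 root root_snap1
```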

You must monitor thin pool usage; if the pool fills, all thin volumes become read-only or fail, depending on configuration.
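One way to watch pool usage is to filter the output of lvs. The sketch below reads "name percent" pairs on standard input so it can be tried without a real pool; the function name and the 80% threshold are arbitrary choices for this example.

```shell
# check_thin_usage LIMIT_PERCENT
# stdin: lines of "lv_name data_percent"; lines without a percentage are ignored.
# Prints a warning per LV at or above the limit and returns non-zero if any exist.
check_thin_usage() {
    awk -v limit="$1" '
        NF >= 2 && $2 + 0 >= limit + 0 { printf "WARN %s %s%%\n", $1, $2; bad = 1 }
        END { exit bad + 0 }
    '
}

# Intended use on a real system (volume group name is a placeholder):
# LC_ALL=C lvs --noheadings -o lv_name,data_percent vg0 | check_thin_usage 80
```

The non-zero exit status makes it easy to call from cron or a systemd timer and alert only when something needs attention.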

Btrfs Snapshots and Rollbacks

Btrfs implements snapshots at the filesystem level on subvolumes. A Btrfs subvolume is a separately mountable, CoW-managed tree inside a Btrfs filesystem.

Key properties:

Snapshots are instantaneous and CoW-based.
Snapshots can be read-only or read-write.
Snapshots are space-efficient, growing only as data diverges.
Rollbacks can be done by changing which subvolume is mounted as root or by replacing data with a snapshot.

Subvolumes vs Filesystems

Btrfs typically uses one block device (e.g. /dev/sda2) with many subvolumes inside:

Example subvolumes:

@ (root filesystem)
@home
@snapshots

The same filesystem can be mounted at several mount points at once, and the subvol or subvolid mount option chooses which subvolume is exposed at each one.

Snapshots are always of subvolumes, not of individual files.
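To illustrate how subvolumes map to mount points, the sketch below generates /etc/fstab lines that select a subvolume with the subvol option. The device and subvolume names follow the example layout above; the helper name is made up.

```shell
# btrfs_fstab_line DEVICE SUBVOLUME MOUNTPOINT
# Prints one fstab entry that mounts SUBVOLUME of DEVICE at MOUNTPOINT.
btrfs_fstab_line() {
    printf '%s  %s  btrfs  subvol=%s  0  0\n' "$1" "$3" "$2"
}

btrfs_fstab_line /dev/sda2 @ /
btrfs_fstab_line /dev/sda2 @home /home
```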

Creating and Managing Snapshots

Typical operations (conceptual):

Create a read-only snapshot:

Use btrfs subvolume snapshot -r source target.

Create a read-write snapshot:

Use btrfs subvolume snapshot source target.

List subvolumes/snapshots:

Use btrfs subvolume list path, where path is any mounted location inside the filesystem.

Read-only snapshots are ideal for backup and rollback operations, because they cannot be accidentally modified.
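Snapshots are easier to manage when their names carry a timestamp. The sketch below builds the command for a read-only snapshot with such a name and prints it instead of running it. The paths and the naming scheme are assumptions for illustration.

```shell
# btrfs_ro_snapshot_cmd SOURCE SNAPSHOT_DIR [STAMP]
# Prints the command for a read-only snapshot named <subvolume>-<stamp>.
btrfs_ro_snapshot_cmd() {
    bs_src=$1
    bs_dir=$2
    # Default to the current time when no stamp is given.
    bs_stamp=${3:-$(date +%Y-%m-%d_%H%M%S)}
    bs_name=$(basename "$bs_src")
    echo "btrfs subvolume snapshot -r $bs_src $bs_dir/$bs_name-$bs_stamp"
}

btrfs_ro_snapshot_cmd /mnt/@ /mnt/@snapshots 2025-01-01_120000
```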

Rollbacks with Btrfs

You don’t “rewind in place” as with some LVM merges; instead, you pick which subvolume the system should use as the active root.

Common distro strategy:

Your root filesystem is a subvolume @.
Snapshots of root live in @/.snapshots/<id>/snapshot or under a separate @snapshots subvolume.
Bootloader (e.g., GRUB with a Btrfs-aware plugin) can boot directly into a selected snapshot.

Manual rollback workflow (conceptually):

Boot into a rescue system or another Linux environment with the Btrfs volume mounted.
Mount the Btrfs device with a generic mountpoint, e.g. /mnt, and locate your subvolumes and snapshots.
Decide which snapshot to roll back to.
Either:

Rename current @ to something like @.broken, then create a writable snapshot of the chosen snapshot named @ (or rename the snapshot to @ if it is already writable), or
Adjust /etc/fstab and bootloader entries to use the snapshot subvolume.

Reboot into the rolled-back system.
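The rename variant of the manual workflow might look like the sketch below, which prints the commands for review. It assumes the bootloader and /etc/fstab refer to the root subvolume by name (subvol=@) rather than by ID; the device and snapshot path are placeholders.

```shell
# btrfs_rollback_plan DEVICE SNAPSHOT_PATH
# SNAPSHOT_PATH is relative to the top level of the filesystem.
btrfs_rollback_plan() {
    br_dev=$1
    br_snap=$2
    # subvolid=5 mounts the top-level subvolume, where @ and the snapshots live.
    cat <<EOF
mount -o subvolid=5 $br_dev /mnt
mv /mnt/@ /mnt/@.broken
btrfs subvolume snapshot /mnt/$br_snap /mnt/@
umount /mnt
EOF
}

btrfs_rollback_plan /dev/sda2 @snapshots/pre-update
```

Taking a new writable snapshot of the chosen snapshot, instead of renaming it, keeps the original snapshot untouched for a later attempt.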

Advantages:

Rollback can be nearly instantaneous.
Multiple snapshots can be kept for incremental rollback options.
Very common in snapshot‑based update systems (openSUSE, some Debian derivatives).

Snapshots and System Updates

Btrfs snapshots pair well with transactional or snapshot-aware update tools:

Before applying a large update:

Take a snapshot of the root subvolume.

Apply updates to a new snapshot or to the current root.
If the system fails to boot or misbehaves:

Boot into the pre-update snapshot and roll back, or
Switch the default boot entry to a known-good snapshot.

This pattern dramatically reduces the risk of system updates.

ZFS Snapshots and Rollbacks

ZFS integrates volume management and filesystem functionality. Snapshots operate at the dataset level.

Key properties:

Snapshots are instant, CoW-based, and immutable.
Datasets are filesystems or volumes; each can have its own snapshots.
ZFS has powerful send/receive features to replicate snapshots.

ZFS Datasets and Snapshots

A dataset is a ZFS filesystem or volume:

Example datasets:

pool/root
pool/home
pool/var/log

Snapshot names take the form dataset@name:

pool/root@before_upgrade
pool/home@2025-01-01

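Snapshots do not appear in normal directory listings. Each filesystem dataset exposes them read-only under a .zfs/snapshot directory at its mount point, which is hidden by default (controlled by the snapdir property); otherwise they are managed via ZFS commands.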

Typical operations:

Create snapshot:

zfs snapshot pool/root@before_upgrade

List snapshots:

zfs list -t snapshot

Destroy snapshot:

zfs destroy pool/root@before_upgrade

Rollbacks with ZFS

ZFS provides a dedicated rollback mechanism:

zfs rollback dataset@snapshot replaces the dataset’s current contents with that snapshot.
Any changes made after the snapshot are discarded.

Workflow:

Identify the snapshot to roll back to.
Unmount or stop services using the dataset if needed (some operations require this).
Run rollback command.
Remount or restart services.

Caveats:

You cannot roll back past newer snapshots by default: if the dataset has snapshots newer than the target, you must destroy them first or pass -r to zfs rollback, which destroys them for you.
Rolling back is a destructive operation for changes made after the snapshot.
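Before rolling back, it helps to see which newer snapshots would be lost. The sketch below filters a list of snapshot names, oldest first, and prints everything after the target. The function name is invented; the zfs list invocation in the comment is one way to produce the input.

```shell
# newer_snapshots TARGET
# stdin: snapshot names ordered oldest first; prints those created after TARGET.
newer_snapshots() {
    awk -v target="$1" '
        found { print }
        $0 == target { found = 1 }
    '
}

# Intended use on a real system:
# zfs list -H -t snapshot -o name -s creation -d 1 pool/root \
#     | newer_snapshots pool/root@before_upgrade
```

An empty result means the target is the most recent snapshot and a plain zfs rollback is enough.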

Clone vs Snapshot

ZFS supports both:

Snapshot: read-only point-in-time reference.
Clone: a writable dataset created from a snapshot.

Workflow:

Take a snapshot of a dataset.
Create a clone from that snapshot.
Use the clone to test changes, upgrades, or experiments.
Either promote the clone to be the main dataset, or discard it.

This is useful for safe testing environments or quickly spawning development sandboxes.
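The clone workflow can be sketched as a command plan. The function below prints the commands for creating a clone and then either keeping it (promote) or discarding it. The dataset, snapshot, and clone names are placeholders.

```shell
# zfs_clone_plan DATASET SNAPSHOT CLONE keep|discard
# Prints the commands to test changes in a clone and then keep or drop it.
zfs_clone_plan() {
    zc_ds=$1
    zc_snap=$2
    zc_clone=$3
    zc_decision=$4
    echo "zfs snapshot $zc_ds@$zc_snap"
    echo "zfs clone $zc_ds@$zc_snap $zc_clone"
    if [ "$zc_decision" = "keep" ]; then
        echo "zfs promote $zc_clone"
    else
        echo "zfs destroy $zc_clone"
    fi
}

zfs_clone_plan pool/root test_base pool/root_test discard
```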

Snapshots for Replication

A major strength of ZFS is snapshot-based replication:

zfs send streams data corresponding to a snapshot (or differences between snapshots).
zfs receive applies that stream on another system or pool.

Conceptual flow:

zfs snapshot pool/data@backup1
zfs send pool/data@backup1 | ssh backuphost zfs receive backup/data
Next time:

zfs snapshot pool/data@backup2
zfs send -i pool/data@backup1 pool/data@backup2 | ssh backuphost zfs receive backup/data

This creates incremental backups based on snapshots with minimal data transfer.
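A backup script usually has to choose between a full and an incremental send. The sketch below prints the matching pipeline depending on whether a previous snapshot name is supplied. The function name is hypothetical, and the host and dataset names are the ones from the flow above.

```shell
# zfs_send_cmd DATASET NEW_SNAP HOST TARGET [PREV_SNAP]
# Prints a full send when PREV_SNAP is omitted, an incremental one otherwise.
zfs_send_cmd() {
    zs_ds=$1
    zs_new=$2
    zs_host=$3
    zs_target=$4
    zs_prev=${5:-}
    if [ -n "$zs_prev" ]; then
        echo "zfs send -i $zs_ds@$zs_prev $zs_ds@$zs_new | ssh $zs_host zfs receive $zs_target"
    else
        echo "zfs send $zs_ds@$zs_new | ssh $zs_host zfs receive $zs_target"
    fi
}

zfs_send_cmd pool/data backup1 backuphost backup/data
zfs_send_cmd pool/data backup2 backuphost backup/data backup1
```

An incremental stream only applies if the receiving side still has the previous snapshot, so the sender should keep it until the next run succeeds.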

Comparing Snapshot Approaches

Granularity and Scope

LVM:

Operates at the block device/logical volume level.
Filesystem-agnostic.
Good for whole-volume rollback or consistent backups across multiple filesystems (if coordinated).

Btrfs:

Operates at subvolume level.
Filesystem-integrated; aware of file metadata.
Very convenient for OS-level snapshot/rollback mechanisms.

ZFS:

Operates at dataset level.
Deep integration with storage, quotas, compression, and replication.

Performance Considerations

Snapshots introduce extra metadata and CoW overhead:

Writes may slow as more snapshots accumulate.
On heavily written systems, too many snapshots can degrade performance.

Cleanup is crucial:

Deleting unneeded snapshots frees space and can reduce CoW complexity.
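A simple retention policy keeps the newest few snapshots and removes the rest. The sketch below takes snapshot names ordered oldest first and prints the ones that fall outside the retention count; deleting them is left to the caller. The function name and the count are illustrative.

```shell
# prune_candidates KEEP
# stdin: snapshot names ordered oldest first; prints all but the newest KEEP.
prune_candidates() {
    awk -v keep="$1" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }
    '
}

# Example: keep the two newest of four snapshots
printf 'snap1\nsnap2\nsnap3\nsnap4\n' | prune_candidates 2
```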

For high‑churn workloads (e.g., databases):

Prefer fewer, well-timed snapshots.
Use application-level backup tools in combination with lower-level snapshots.

Space Management

Snapshots share blocks with their origin until data diverges.
Space used by snapshots grows with the size of changed data since snapshot creation.
You must monitor:

LVM thin pool usage.
Btrfs filesystem free space.
ZFS pool capacity.

If the underlying pool or filesystem runs out of space, all operations, including snapshots, can fail or destabilize the system.

Safe Rollback Practices

Rollbacks are powerful but disruptive; they rewrite history.

Guidelines:

Always have backups separate from snapshots

Snapshots protect against accidental modification or short-term issues.
They do not protect against:

Disk failure.
Pool corruption.
Catastrophic hardware loss.

Use off-host or off-site backups (e.g. rsync, zfs send, btrfs send) in addition to snapshots.

Plan for configuration and data separation

Put system and user data on separate logical units (LVM volumes, Btrfs subvolumes, ZFS datasets).
Roll back system snapshots without losing user data.
Or use different snapshot policies for system vs data volumes.

Use read-only snapshots for rollback

Take read-only snapshots (Btrfs/ZFS) as “golden” points.
For experiments, create writable snapshots/clones derived from read-only snapshots.
Roll back from known-good read-only snapshots instead of mutable ones.

Integrate with the bootloader and package manager

Some distros integrate:

snapshots with grub-btrfs or similar tools,
transactional upgrades with automatic snapshots (e.g., openSUSE).

This makes rollback from failed updates routine and safer.

Test rollback procedures

Practice in a virtual machine:

Take snapshots, break the system, then roll back.

Confirm:

Services start correctly after rollback.
Databases and applications behave as expected.

Document the procedure for your environment.

Example Use Cases

Desktop OS Protection

Before large updates or configuration changes:

Take snapshots of the root filesystem (Btrfs or ZFS).

If the system fails to boot or becomes unstable:

Select an older snapshot from the boot menu.
Validate it, then make it the new default.

Developer Sandboxes

Base image stored as a read-only snapshot/dataset/subvolume.
Each developer:

Creates a writable snapshot (clone) to experiment.
Discards and recreates snapshots when environment breaks.

Very fast compared to rebuilding environments from scratch.

Quick “Oh No” Recovery

Accidentally deleted or modified files:

Find the nearest snapshot preceding the mistake.
Copy files from snapshot back into live filesystem.

No need for full rollback; fine-grained restoration from snapshots is possible (Btrfs/ZFS, or by mounting LVM snapshots read-only and copying data).
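Finding the nearest snapshot before a mistake is straightforward when snapshot names end in an ISO date, because such dates sort correctly as plain strings. The sketch below assumes that naming scheme; the function name and the file paths in the comment are hypothetical.

```shell
# latest_before CUTOFF
# stdin: names such as pool/home@2025-01-01; prints the newest one whose date
# part sorts strictly before CUTOFF, or nothing if there is none.
latest_before() {
    awk -F'@' -v cutoff="$1" '
        ($2 "") < (cutoff "") && ($2 "") > (best "") { best = $2; line = $0 }
        END { if (line != "") print line }
    '
}

# Intended use, followed by a file-level restore from the ZFS snapshot directory:
# zfs list -H -t snapshot -o name -d 1 pool/home | latest_before 2025-01-07
# cp -a /home/.zfs/snapshot/2025-01-05/alice/notes.txt /home/alice/
```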

Backup Integration

Use snapshots to freeze a consistent view of data:

Create snapshot.
Run backup from the snapshot (not from the live filesystem).
Destroy snapshot when backup is complete.

This avoids issues where files change during backup and can reduce downtime for backup windows.

When to Use Which Snapshot Tool

Given a choice:

LVM snapshots:

You already use LVM for volume management.
Your filesystem doesn’t provide snapshots (e.g. ext4, xfs).
You need volume-level snapshots, possibly across multiple filesystems.

Btrfs snapshots:

Your distribution supports Btrfs-based root filesystems.
You want easy OS rollback; openSUSE sets this up by default, and some Fedora and Ubuntu setups can be configured for it.
You want per-subvolume snapshots, especially for configuration vs data separation.

ZFS snapshots:

You are using ZFS pools for data or root.
You need strong replication, checksumming, and advanced data integrity.
You are comfortable with ZFS licensing implications on your platform.

Each technology can implement snapshots and rollbacks; pick the one that fits your filesystem choice and operational needs.
