Understanding `tar` in the Context of Backups
tar (short for "tape archive") is the standard tool on Linux for creating and restoring archive files. In backup workflows it is most often used to:
- Bundle many files and directories into a single archive.
- Preserve ownership, permissions, and timestamps.
- Optionally compress data via other tools (gzip, bzip2, xz, zstd).
Many Linux backup tools and scripts either invoke tar directly or mimic its behavior.
Basic `tar` Syntax for Archives
The general tar structure:
- Create:
tar -cf ARCHIVE.tar [options] FILES...
- List:
tar -tf ARCHIVE.tar
- Extract:
tar -xf ARCHIVE.tar [options]
Commonly used short options (can be combined, e.g., czf):
- c: create a new archive.
- x: extract an archive.
- t: list archive contents.
- f: specify the archive file name (almost always needed).
- v: verbose output (show files processed).
Examples:
# Create an uncompressed archive of /etc
tar -cf etc-backup.tar /etc
# List contents
tar -tf etc-backup.tar
# Extract into current directory
tar -xf etc-backup.tar
Preserving Metadata for System Backups
For system or configuration backups, preserving metadata is crucial.
Key options:
- -p: preserve permissions when extracting (often used with sudo).
- --same-owner: try to restore original owners (requires root).
- --acls: preserve Access Control Lists (if supported).
- --xattrs: preserve extended attributes (SELinux labels, capabilities).
- --numeric-owner: store numeric UID/GID instead of names (safer across systems).
When creating an archive for system-level restore:
sudo tar -cpf rootfs-backup.tar \
--acls --xattrs --numeric-owner \
/
In practice you will exclude runtime directories (covered below), but this shows the pattern.
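To see what -p changes without touching system files, the following sketch round-trips a file with mode 640 through an archive inside a temporary directory. The sandbox layout (src/conf/app.conf) is hypothetical, and stat -c assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch: check that -p restores file modes regardless of the umask.
# All paths live in a throwaway sandbox; the names are illustrative only.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/src/conf" "$work/restore"
printf 'secret=1\n' > "$work/src/conf/app.conf"
chmod 640 "$work/src/conf/app.conf"

# Create the archive with relative member names (-C) and numeric owners.
tar -cpf "$work/conf.tar" --numeric-owner -C "$work/src" conf

# Extract under a restrictive umask; -p tells tar to ignore the umask
# and apply the modes recorded in the archive.
(
  umask 077
  tar -xpf "$work/conf.tar" -C "$work/restore"
)

orig_mode=$(stat -c '%a' "$work/src/conf/app.conf")
restored_mode=$(stat -c '%a' "$work/restore/conf/app.conf")
echo "original=$orig_mode restored=$restored_mode"
```

Both values should read 640. Dropping -p from the extract step as a non-root user would let the umask strip the group-read bit.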
Using Compression with `tar`
tar itself does not compress by default; it can delegate compression to other tools.
Common options:
- -z: use gzip (.tar.gz or .tgz)
- -j: use bzip2 (.tar.bz2)
- -J: use xz (.tar.xz)
- --zstd: use zstd (.tar.zst) on newer versions
Trade-offs (roughly):
- gzip: fast, moderate compression.
- bzip2: slower, better compression (less common today).
- xz: slowest, best compression (good for archives you rarely change).
- zstd: very fast, good compression (increasingly popular).
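These trade-offs depend heavily on the data, so it can help to measure them on a sample of your own files. The sketch below compares archive sizes only (not speed) for whichever compressors happen to be installed; the generated sample file is a hypothetical stand-in for real data.

```shell
#!/bin/sh
# Sketch: compare archive sizes across whichever compressors are installed.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical sample data: a highly repetitive text file.
seq 1 200000 > "$work/sample.txt"

# Uncompressed baseline.
tar -cf "$work/sample.tar" -C "$work" sample.txt
plain_size=$(wc -c < "$work/sample.tar")
echo "none: $plain_size bytes"

gzip_size=""
for tool in gzip bzip2 xz zstd; do
  if command -v "$tool" >/dev/null 2>&1; then
    # Stream the archive through the compressor at its default level.
    tar -cf - -C "$work" sample.txt | "$tool" -c > "$work/sample.tar.$tool"
    size=$(wc -c < "$work/sample.tar.$tool")
    echo "$tool: $size bytes"
    if [ "$tool" = gzip ]; then
      gzip_size=$size
    fi
  else
    echo "$tool: not installed, skipped"
  fi
done
```

Wrapping each pipeline in the shell's time keyword would add the speed side of the comparison.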
Examples:
# gzip-compressed archive
tar -czf home-backup.tar.gz /home
# xz-compressed archive
tar -cJf config-backup.tar.xz /etc
# zstd-compressed archive (if supported)
tar --zstd -cf data-backup.tar.zst /var/www
To extract, you only need -xf; tar auto-detects the compressor:
tar -xf home-backup.tar.gz
tar -xf config-backup.tar.xz
tar -xf data-backup.tar.zst
Relative vs Absolute Paths in Archives
How you specify paths when creating an archive affects where files extract later.
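- Absolute paths (start with /): you name /etc, /home/user, etc. on the command line. GNU tar strips the leading / from member names by default and prints a warning that it is doing so, so these entries still extract relative to the current directory. Only archives created with -P (--absolute-names) record absolute names, and extracting those with -P writes to the original locations.
- Relative paths: do not start with / (e.g., etc, home/user). Extraction places them under your current directory (unless you use -C).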
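Listing an archive is the quickest way to check which names it actually records. The following sketch, which assumes GNU tar and uses a hypothetical sandbox tree, archives the same directory once through an absolute path and once through -C with a relative path, then prints the first member name of each.

```shell
#!/bin/sh
# Sketch: see which member names tar records for absolute vs relative inputs.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/root/etc"
echo '127.0.0.1 localhost' > "$work/root/etc/hosts"

# Absolute input path: GNU tar warns on stderr and drops the leading "/".
tar -cf "$work/abs.tar" "$work/root/etc" 2>/dev/null
abs_first=$(tar -tf "$work/abs.tar" | head -n 1)

# Relative input path via -C: names are stored relative to $work/root.
tar -cf "$work/rel.tar" -C "$work/root" etc
rel_first=$(tar -tf "$work/rel.tar" | head -n 1)

echo "absolute input stored as: $abs_first"
echo "relative input stored as: $rel_first"
```

The -C form keeps member names short and independent of where the source tree happened to live, which is usually what you want for a portable backup.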
Safer approach for portable backups:
# From root, but store as relative paths
cd /
sudo tar -czf root-backup.tar.gz \
--acls --xattrs --numeric-owner \
etc var home
Extraction example into /restore-root:
sudo mkdir -p /restore-root
cd /restore-root
sudo tar -xzf /path/to/root-backup.tar.gz
# etc, var, home now appear under /restore-root, not overwriting your live system
Excluding Files and Directories
For practical backups, you rarely want to archive everything.
--exclude allows you to skip paths:
# Exclude a directory
tar -czf etc-backup.tar.gz \
--exclude='/etc/ssl/private' \
/etc
# Multiple excludes
tar -czf root-backup.tar.gz \
--exclude='/proc' \
--exclude='/sys' \
--exclude='/dev' \
--exclude='/run' \
/
You can also exclude by pattern:
# Exclude all .cache directories inside /home
tar -czf home-backup.tar.gz \
--exclude='*/.cache' \
/home
Exclusion file (for many patterns):
# exclude.txt
/proc
/sys
/dev
/run
/tmp
/var/tmp
/var/cache
/home/*/.cache
# Use it
sudo tar -czf root-backup.tar.gz \
--exclude-from=exclude.txt \
/
Incremental and Differential Archives with `tar`
GNU tar supports incremental-style backups using snapshot files.
Core idea:
- --listed-incremental=SNAPFILE tells tar to track file state in SNAPFILE.
- The first run using a new snapshot file is a full backup.
- Later runs with the same snapshot file only include changed files.
Example workflow:
# 1. Full backup
sudo tar -czf full-backup.tar.gz \
--listed-incremental=snapshot.snar \
/
# 2. Incremental backup next day
sudo tar -czf incr-2025-12-13.tar.gz \
--listed-incremental=snapshot.snar \
/
# 3. Another incremental backup later
sudo tar -czf incr-2025-12-14.tar.gz \
--listed-incremental=snapshot.snar \
/
To inspect what an incremental archive contains:
tar -tvf incr-2025-12-13.tar.gz
Restoring incremental backups is more complex than simple archives and must follow the sequence (full, then each incremental). For more complex strategies, many admins prefer external backup tools that manage this layering for you, but this demonstrates tar’s built-in capability.
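The restore sequence can be tried safely in a sandbox. This sketch assumes GNU tar; the data directory and file names are hypothetical. It takes a full backup, changes the tree, takes one incremental, and then restores both in order, passing /dev/null as the snapshot file so that extraction runs in incremental mode without updating any snapshot.

```shell
#!/bin/sh
# Sketch: level-0 plus one incremental, then restore in order (GNU tar).
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p data restore
echo 'first' > data/a.txt

# Level 0: a new snapshot file makes this a full backup.
tar -czf full.tar.gz --listed-incremental=snapshot.snar data

# Wait so the next changes get clearly newer timestamps, then modify the tree.
sleep 1
echo 'second' > data/b.txt
rm data/a.txt

# Level 1: only changes since the snapshot are stored.
tar -czf incr1.tar.gz --listed-incremental=snapshot.snar data

# Restore in the same order: full first, then each incremental.
tar -xzf full.tar.gz  --listed-incremental=/dev/null -C restore
tar -xzf incr1.tar.gz --listed-incremental=/dev/null -C restore

ls restore/data
```

In incremental extraction mode GNU tar also removes files that no longer existed when an incremental was taken, so the restored tree ends up with b.txt but not a.txt. That is one more reason to restore into an empty staging directory first.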
Managing Large Archives and Splitting
When archiving large datasets, you may want to split archives into manageable pieces (for example, to fit on removable media or stay within upload size limits).
Typical approach: combine tar with split:
# Create and split into 1G chunks
tar -czf - /home | split -b 1G - home-backup.tar.gz.part-
# This produces files like:
# home-backup.tar.gz.part-aa, home-backup.tar.gz.part-ab, ...
To restore:
# Reassemble and extract
cat home-backup.tar.gz.part-* | tar -xzf -
You can also use --multi-volume with tar itself, but combining with split is simpler and more common.
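Before relying on split archives, it is worth confirming that the pieces reassemble into something extractable. The sketch below does a full round trip with small 100 KiB chunks in a temporary directory; the payload file and chunk size are hypothetical choices made only to keep the test fast.

```shell
#!/bin/sh
# Sketch: split a compressed tar stream into pieces, reassemble, and compare.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/src" "$work/restore"
# Hypothetical payload: about 300 KB of random (incompressible) data.
head -c 300000 /dev/urandom > "$work/src/blob.bin"

# Stream a gzip-compressed archive into 100 KiB pieces.
tar -czf - -C "$work" src | split -b 100k - "$work/backup.tar.gz.part-"
parts=$(ls "$work"/backup.tar.gz.part-* | wc -l)
echo "created $parts parts"

# Reassemble in glob (alphabetical) order and extract.
cat "$work"/backup.tar.gz.part-* | tar -xzf - -C "$work/restore"

if cmp -s "$work/src/blob.bin" "$work/restore/src/blob.bin"; then
  result=identical
else
  result=different
fi
echo "restored copy is $result"
```

The shell expands the glob in sorted order, which matches the aa, ab, ... suffixes that split generates, so the pieces are concatenated in the right sequence.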
Using `tar` with Pipes and Remote Backups
Because tar reads/writes to standard input/output, it integrates well with other tools and remote transfers.
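The simplest form of this pattern needs no network at all: one tar writes an archive to standard output and a second tar reads it from standard input. The sketch below copies a small hypothetical tree this way, including a symbolic link, and checks the result.

```shell
#!/bin/sh
# Sketch: copy a directory tree through a tar-to-tar pipe.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/src/sub" "$work/dst"
echo 'hello'  > "$work/src/file.txt"
echo 'nested' > "$work/src/sub/inner.txt"
ln -s file.txt "$work/src/link"

# "-f -" means standard output when creating and standard input when extracting.
tar -cf - -C "$work/src" . | tar -xpf - -C "$work/dst"

if diff -r "$work/src" "$work/dst" >/dev/null; then
  copy_status=match
else
  copy_status=mismatch
fi
echo "copy result: $copy_status"
```

Replacing the right-hand side of the pipe with ssh, a compressor, or an encryption tool gives the remote and transformed variants shown in the rest of this section.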
Using `tar` over `ssh`
Back up a remote host's directory to a local file:
# From local machine, backing up /var/www on remote host to local file
ssh user@remote 'tar -czf - /var/www' > remote-www-backup.tar.gz
Or send a local backup to a remote host:
# From local machine, store/archive on remote host
tar -czf - /var/www | ssh user@backup-host 'cat > www-backup.tar.gz'
Combining with `rsync` or other tools
You can also compress or transform a stream:
# tar + xz via pipe, then encrypt with gpg (example)
tar -cf - /home | xz | gpg --symmetric -o home-backup.tar.xz.gpg
Verifying Archive Integrity
tar itself does not embed checksums of the entire archive, but it will report I/O errors and format issues. To add stronger verification, combine tar with checksum tools.
Basic verification approach:
- After creating an archive, compute a checksum:
sha256sum home-backup.tar.gz > home-backup.tar.gz.sha256
- Later, verify:
sha256sum -c home-backup.tar.gz.sha256
# OK if it prints: 'home-backup.tar.gz: OK'
You can also use tar --compare (-d) to compare archive contents with the filesystem. Because GNU tar stores member names without the leading /, run the comparison relative to /:
# Compare archive vs current filesystem
sudo tar -df etc-backup.tar -C /
Differences may appear if files have changed since the backup; this is more useful immediately after creation to confirm consistency.
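Both checks can be exercised together in a sandbox. In this sketch (hypothetical conf directory, GNU tar and sha256sum assumed), the checksum protects the archive file itself, while tar -d compares the archived members against the files on disk, first right after creation and again after an edit.

```shell
#!/bin/sh
# Sketch: checksum the archive, then compare its contents with the filesystem.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir conf
echo 'a=1' > conf/app.conf
tar -czf conf.tar.gz conf

# Checksum of the archive file itself.
sha256sum conf.tar.gz > conf.tar.gz.sha256
sha256sum -c conf.tar.gz.sha256

# Right after creation, the archive should match the filesystem.
if tar -df conf.tar.gz; then
  fresh=match
else
  fresh=differs
fi

# Change a file, then compare again; tar reports the difference
# and exits with a non-zero status.
echo 'a=2 # edited' > conf/app.conf
if tar -df conf.tar.gz >/dev/null 2>&1; then
  later=match
else
  later=differs
fi

echo "fresh=$fresh later=$later"
```

A mismatch from tar -d long after the backup was taken usually means the live files changed, not that the archive is damaged; the checksum is the better signal for archive corruption.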
Practical Backup Examples with `tar`
Example: Backing Up `/etc` Configuration
sudo tar -czf etc-$(date +%F).tar.gz \
--acls --xattrs --numeric-owner \
/etc
Example: Home Directory Backup with Exclusions
tar -czf home-$(date +%F).tar.gz \
--exclude='*/.cache' \
--exclude='*/Downloads' \
/home
Example: Root Filesystem Backup (Non-Live Restore)
Using an exclusion file:
# exclude-root.txt
/proc
/sys
/dev
/run
/tmp
/var/tmp
/var/cache
/home/*/.cache
# Create backup
sudo tar -czf root-$(date +%F).tar.gz \
--acls --xattrs --numeric-owner \
--exclude-from=exclude-root.txt \
/
This archive is suitable for restoring into a non-live environment (e.g., rescue system, chroot, or another disk) as part of a broader restore process.
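Exclusion patterns are easy to get subtly wrong, so it can be worth rehearsing them on a miniature tree before running a real backup as root. The sketch below builds a hypothetical stand-in for a root filesystem, archives it with an exclusion file, and lists what was stored. It assumes GNU tar's default pattern matching, and the patterns are written with a leading ./ because the archive is created from . via -C.

```shell
#!/bin/sh
# Sketch: rehearse an exclusion file against a miniature stand-in tree.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical miniature "root" tree.
mkdir -p "$work/tree/etc" "$work/tree/home/alice/.cache" "$work/tree/tmp"
echo 'conf'  > "$work/tree/etc/app.conf"
echo 'notes' > "$work/tree/home/alice/notes.txt"
echo 'junk'  > "$work/tree/home/alice/.cache/blob"
echo 'temp'  > "$work/tree/tmp/scratch"

# Patterns mirror the member names produced by archiving "." below.
cat > "$work/exclude.txt" <<'EOF'
./tmp
./home/*/.cache
EOF

# Keep the archive outside the tree being archived.
archive="$work/tree-$(date +%F).tar.gz"
tar -czf "$archive" --exclude-from="$work/exclude.txt" -C "$work/tree" .

listing=$(tar -tzf "$archive")
echo "$listing"
```

Reviewing the listing before trusting the pattern file catches both over-exclusion (missing data) and under-exclusion (caches and temporary files bloating the archive).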
Restoring from `tar` Archives Safely
Restoring is the other half of using tar for backups. Some practical points:
- Always verify the target directory before extraction.
- Use -t to inspect contents before -x.
- For system restores, perform operations from a rescue environment to avoid overwriting a running system.
Examples:
# Inspect archive structure
tar -tf root-2025-12-12.tar.gz | head
# Extract into a custom directory (e.g., new root)
sudo mkdir -p /mnt/restore
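sudo tar -xzf root-2025-12-12.tar.gz -C /mnt/restore
For a system-wide restore (overwriting existing paths), extract over the real root (for example with -C /) only with care, and usually from a non-booted system (e.g., using a live USB or recovery mode). If the archive was created with --acls, --xattrs, and --numeric-owner, pass the same options when extracting so that metadata is restored.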
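The inspect-then-extract habit can be scripted. The following sketch uses a hypothetical sandbox archive: it refuses to continue if any member name is absolute or contains a .. component, extracts into a staging directory, and compares the result with the source. GNU tar already guards against such names by default, so treat the explicit check as an extra safeguard, not a requirement.

```shell
#!/bin/sh
# Sketch: inspect member names, extract into a staging directory, then compare.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical stand-in for a real backup archive.
mkdir -p "$work/src/etc" "$work/staging"
echo 'setting=on' > "$work/src/etc/app.conf"
tar -czf "$work/backup.tar.gz" -C "$work/src" etc

# 1. Refuse archives whose members are absolute or climb out via "..".
if tar -tzf "$work/backup.tar.gz" | grep -Eq '^/|(^|/)\.\.(/|$)'; then
  echo 'unsafe member names found; not extracting' >&2
  exit 1
fi

# 2. Extract into the staging directory, never straight over live paths.
tar -xzf "$work/backup.tar.gz" -C "$work/staging"

# 3. Review the staged tree before copying anything into place.
if diff -r "$work/src/etc" "$work/staging/etc" >/dev/null; then
  staged=ok
else
  staged=mismatch
fi
echo "staged restore: $staged"
```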
Integrating `tar` into Backup Strategies
Within a broader backup and restore strategy, tar is typically:
- The low-level tool to create consistent, metadata-preserving archives.
- Combined with:
- Scheduling (cron/systemd timers).
- Off-site or remote storage (ssh, cloud sync tools).
- Checksums and possibly encryption.
Understanding tar’s options and behavior lets you build reliable backup routines and reason about how to reconstruct systems from those archives when needed.