Why Archiving and Compression Matter
On Linux servers and workstations, you’ll constantly need to:
- Back up directories
- Transfer groups of files efficiently
- Package logs or project directories
- Save space on disk or over the network
Two common tasks:
- Archiving: grouping many files into a single file (no size reduction by itself).
- Compression: reducing the size of data using algorithms.
On Linux, these are often combined, especially using tar for archiving and tools like gzip or xz for compression.
This chapter focuses specifically on:
- tar as an archiver
- gzip and xz as compression tools
- Practical usage patterns you’ll actually use as an admin
`tar`: The Standard Archiver
tar (tape archive) combines files/directories into a single archive file, often called a “tarball”.
Typical filename conventions:
- Archive only (no compression): backup.tar
- Compressed with gzip: backup.tar.gz or backup.tgz
- Compressed with xz: backup.tar.xz
Basic `tar` Options
Common short options (often combined):
- c: create
- x: extract
- t: list contents
- f: use archive file (almost always needed)
- v: verbose (show files being processed)
- z: filter archive through gzip
- J: filter archive through xz
Creating an Archive (No Compression)
Create backup.tar from /etc and /home/user/docs:
tar cf backup.tar /etc /home/user/docs
- c: create
- f backup.tar: use the file backup.tar
Add v for verbose output:
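tar cvf backup.tar /etc /home/user/docs
Listing Archive Contents
Without extracting:
tar tf backup.tar
Verbose listing:
tar tvf backup.tar
Extracting an Archive
Extract in the current directory:
tar xf backup.tar
Extract a specific file or directory, using the member name exactly as the listing shows it (GNU tar strips the leading / when creating an archive, so /etc/hosts is stored as etc/hosts):
tar xf backup.tar etc/hosts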
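The create, list, and extract steps above can be tried safely in a scratch directory. The following is a minimal sketch, not a backup procedure: the directory layout and file contents are made up for illustration, and it uses cd inside subshells so nothing outside the temporary directory is touched.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch area created just for this demonstration.
work=$(mktemp -d)
mkdir -p "$work/src/etc" "$work/src/docs" "$work/out"
echo "127.0.0.1 localhost" > "$work/src/etc/hosts"
echo "notes" > "$work/src/docs/readme.txt"

# Create: run tar from inside src so members are stored as etc/... and docs/...
(cd "$work/src" && tar cf ../backup.tar etc docs)

# List: prints member names exactly as they must be typed when extracting.
listing=$(tar tf "$work/backup.tar")
echo "$listing"

# Extract a single member; everything else in the archive is left alone.
(cd "$work/out" && tar xf ../backup.tar etc/hosts)
restored=$(cat "$work/out/etc/hosts")
extracted_top=$(ls "$work/out")
echo "restored: $restored"

rm -rf "$work"
```

Only etc/hosts is written to the output directory; the docs tree stays inside the archive.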
Change extraction directory with -C:
mkdir restore
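tar xf backup.tar -C restore
Preserving Permissions and Ownership
tar records permissions and ownership in the archive. On extraction, root restores both as stored; an ordinary user gets files owned by themselves, with the umask applied to the stored modes unless p is given. Commonly:
sudo tar cpf backup.tar /var/www
sudo tar xpf backup.tar -C /
p (--preserve-permissions) takes effect on extraction only and is already the default for root, so it matters most when extracting as an ordinary user. The restore targets / because the archive stores paths as var/www/...; extracting with -C /var/www would create /var/www/var/www.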
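The effect of p is easiest to see with a restrictive umask. This sketch assumes GNU tar and GNU stat (for stat -c %a); the file name app.conf and the mode 664 are arbitrary choices for the demonstration.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
mkdir -p "$work/src" "$work/plain" "$work/kept"
echo "setting=1" > "$work/src/app.conf"
chmod 664 "$work/src/app.conf"

# The mode 664 is recorded in the archive at creation time.
tar cf "$work/perm.tar" -C "$work/src" app.conf

# Extract twice under a restrictive umask, once without and once with p.
# The subshell keeps the umask change from leaking into the rest of the script.
(
  umask 077
  tar xf  "$work/perm.tar" -C "$work/plain"
  tar xpf "$work/perm.tar" -C "$work/kept"
)

plain_mode=$(stat -c %a "$work/plain/app.conf")
kept_mode=$(stat -c %a "$work/kept/app.conf")
echo "without p: $plain_mode"
echo "with p:    $kept_mode"

rm -rf "$work"
```

Run as an ordinary user, the copy extracted without p should have the umask applied to its mode, while the copy extracted with p keeps 664. Run as root, both copies keep 664 because p is the default there.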
Using `tar` with `gzip`
gzip is a widely supported, fast compressor. tar can call it directly with -z.
Creating a `.tar.gz` Archive
Standard form:
tar czf backup.tar.gz /etc /home/user/docs
Explanation:
- c: create
- z: compress with gzip
- f backup.tar.gz: archive file name
Verbose:
tar czvf backup.tar.gz /etc /home/user/docs
Extracting a `.tar.gz` Archive
-z tells tar to decompress through gzip (GNU tar also detects the compression automatically when reading an archive file, so plain tar xf works too):
tar xzf backup.tar.gz
Verbose and change target directory:
tar xzvf backup.tar.gz -C /tmp/restore
Listing Contents of `.tar.gz`
tar tzf backup.tar.gz
Using `tar` with `xz`
xz usually produces smaller files than gzip, but it compresses more slowly and uses more CPU and memory.
tar uses -J to integrate with xz.
Creating a `.tar.xz` Archive
tar cJf backup.tar.xz /etc /home/user/docs
Verbose:
tar cJvf backup.tar.xz /etc /home/user/docs
Extracting a `.tar.xz` Archive
tar xJf backup.tar.xz
With verbose and target directory:
tar xJvf backup.tar.xz -C /tmp/restore
Listing Contents of `.tar.xz`
tar tJf backup.tar.xz
Using `gzip` Directly
gzip works on a single file at a time. It compresses that file and replaces it with a .gz version.
gzip file.log
Now you have file.log.gz and the original file.log is gone (by default).
Decompress with `gunzip` or `gzip -d`
gunzip file.log.gz
# or
gzip -d file.log.gz
Keep Original File While Compressing
Use -c to write to stdout and redirect (gzip 1.6 and later also accept -k to keep the input file):
gzip -c file.log > file.log.gz
Now both file.log and file.log.gz exist.
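To confirm that the compressed copy is sound before relying on it, the same -c idea works in reverse. The sketch below generates a throwaway file.log (the contents are invented), compresses it while keeping the original, tests the .gz file with gzip -t, and compares the decompressed stream against the source byte for byte.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Generate a repetitive sample log (illustrative data only).
i=0
while [ "$i" -lt 200 ]; do
  echo "line $i: service started"
  i=$((i + 1))
done > "$work/file.log"

# Compress to a new file; the original is left in place.
gzip -c "$work/file.log" > "$work/file.log.gz"

# -t checks the integrity of the compressed file without writing any output.
gzip -t "$work/file.log.gz"

# Decompress to stdout and compare with the original.
if gzip -dc "$work/file.log.gz" | cmp -s - "$work/file.log"; then
  roundtrip=identical
else
  roundtrip=different
fi

orig_size=$(wc -c < "$work/file.log")
gz_size=$(wc -c < "$work/file.log.gz")
echo "round trip: $roundtrip"
echo "original: $orig_size bytes, compressed: $gz_size bytes"

rm -rf "$work"
```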
Compression Levels
gzip supports levels 1 to 9:
- -1: fastest, least compression
- -9: slowest, best compression (default is usually -6)
Example:
gzip -9 bigfile.img
Viewing Inside a `.gz` File
You can use zcat, zless, zmore to read compressed text files without manually uncompressing:
zcat file.log.gz
zless file.log.gz
Using `xz` Directly
xz is similar to gzip but with different defaults and typically better compression ratios at the cost of speed.
Compress:
xz file.img
This creates file.img.xz and removes file.img by default.
Decompress:
unxz file.img.xz
# or
xz -d file.img.xz
Keep Original File
xz -c file.img > file.img.xz
Compression Levels and Presets
xz uses presets -0 to -9:
- -0: fastest, least compression
- -6: default
- -9: maximum compression (slow, high memory)
Example:
xz -9 backup.tar
This produces backup.tar.xz.
Streaming and Pipelines
Both gzip and xz support streaming, which is useful in pipelines or over SSH (see below).
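As a small illustration of streaming, tar can write the archive to stdout and hand it to the compressor through a pipe, which has the same effect as tar czf. This sketch uses gzip and a made-up data directory; xz -c and xz -dc would slot into the same positions.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
mkdir -p "$work/data"
echo "alpha" > "$work/data/a.txt"
echo "beta"  > "$work/data/b.txt"

# tar writes the archive to stdout (-); gzip compresses the stream as it arrives.
tar cf - -C "$work" data | gzip -c > "$work/data.tar.gz"

# The reverse direction: gzip -dc decompresses to stdout, tar reads from stdin.
members=$(gzip -dc "$work/data.tar.gz" | tar tf - | sort)
echo "$members"

rm -rf "$work"
```

No uncompressed .tar file is ever written to disk in either direction.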
Common `tar` + Compression Workflows
Archive and Compress a Directory
Using gzip:
tar czf /backups/etc-$(date +%F).tar.gz /etc
Using xz:
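tar cJf /backups/etc-$(date +%F).tar.xz /etc
Backup With Exclusions
Exclude certain paths with --exclude. Put the option before the paths it applies to; some GNU tar versions treat --exclude as position-sensitive and ignore or reject it when it follows them:
tar czf home.tar.gz \
--exclude=/home/user/.cache \
--exclude=/home/user/Downloads \
/home
Use patterns:
tar czf logs.tar.gz --exclude='*.gz' /var/log
Incremental or Partial Backups (Simplified)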
Without full incremental features, you can still select a subset of data, e.g., configuration files only:
tar czf configs.tar.gz \
--exclude=/etc/ssl/private \
/etc \
/home/*/.config
(Full backup strategy design is covered elsewhere; here the focus is just on how tar performs selection.)
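It is worth listing an archive after creating it with exclusions to confirm that the patterns matched what was intended. The sketch below assumes GNU tar and builds a small, invented home tree in a scratch directory; the --exclude options are placed before the path they apply to.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
mkdir -p "$work/home/user/.cache" "$work/home/user/docs" "$work/home/user/logs"
echo "cache" > "$work/home/user/.cache/blob"
echo "doc"   > "$work/home/user/docs/report.txt"
echo "log"   > "$work/home/user/logs/app.log"
echo "old"   > "$work/home/user/logs/app.log.1.gz"

# Exclude one directory by path and every *.gz file by pattern.
tar czf "$work/home.tar.gz" \
  --exclude='home/user/.cache' \
  --exclude='*.gz' \
  -C "$work" home

# List the result to confirm what was actually stored.
members=$(tar tzf "$work/home.tar.gz")
echo "$members"

rm -rf "$work"
```

The listing should contain the docs and logs entries but neither the .cache directory nor the rotated .gz log.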
Archiving Over SSH (Without Temporary Files)
You can use pipelines to avoid creating large temporary archives on disk.
Sending an Archive to Another Machine
From the source machine, sending /var/www to backup@server:
Using gzip:
tar czf - /var/www | ssh backup@server "cat > /backups/www.tar.gz"
Explanation:
- - as the filename for tar means “write to stdout”
- The stream is piped through SSH to a remote cat, which saves it to a file
Using xz (slower, smaller):
tar cJf - /var/www | ssh backup@server "cat > /backups/www.tar.xz"
Remote Extraction Without Intermediate File
From the local machine, read the remote archive and extract it locally:
ssh backup@server "cat /backups/www.tar.gz" | tar xzvf -
Similarly for .tar.xz:
ssh backup@server "cat /backups/www.tar.xz" | tar xJvf -
(This approach is common in admin workflows when disk space is tight.)
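The same pipeline can be rehearsed on one machine before pointing it at a real server. In this sketch a second tar stands in for the remote side, so no SSH access is needed; the www tree and its files are invented for the demonstration.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
mkdir -p "$work/src/www/css" "$work/dst"
echo "<h1>hello</h1>" > "$work/src/www/index.html"
echo "body { margin: 0; }" > "$work/src/www/css/site.css"

# Sender | receiver. No archive file is written anywhere; the data only
# exists in the pipe. Over a network, an ssh command would sit between
# the two halves of the pipeline.
tar czf - -C "$work/src" www | tar xzf - -C "$work/dst"

# Confirm that the copy matches the source tree.
if diff -r "$work/src/www" "$work/dst/www" > /dev/null; then
  copy=match
else
  copy=mismatch
fi
echo "copy: $copy"

rm -rf "$work"
```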
Handling Permissions, Ownership, and Timestamps
When using tar for system backups, pay attention to metadata:
- -p: preserve permissions on extraction
- --same-owner: try to preserve file ownership (typically default when run as root)
- --numeric-owner: store numeric UID/GID rather than names (useful across systems with different user databases)
Example for a more “faithful” archive, as root:
sudo tar cpf backup-system.tar \
--numeric-owner \
/etc /var /home
Practical Comparison: `gzip` vs `xz`
As an admin, you choose based on trade-offs:
gzip
- Faster compression and decompression
- Lower CPU and memory usage
- Slightly larger files
- Very widely supported
- Good default for logs, quick backups, temporary archives
xz
- Better compression ratio (smaller files)
- Slower, more CPU-intensive
- Higher memory usage, especially at high levels
- Good for long-term storage, large infrequent backups
In practice:
- Use tar.gz for routine backups/log rotation.
- Use tar.xz for archives that must be as small as possible and are not constantly created/restored.
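The trade-off is easy to measure on your own data. The sketch below compresses a generated, highly repetitive sample log with both tools and prints the resulting sizes; the sample content is invented, real files will give different ratios, and the xz step is skipped if xz is not installed.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Generate sample data (illustrative only; real logs compress differently).
i=0
while [ "$i" -lt 2000 ]; do
  echo "host app[$i]: request served in ${i}ms"
  i=$((i + 1))
done > "$work/sample.log"

raw_size=$(wc -c < "$work/sample.log")
gz_size=$(gzip -9 -c "$work/sample.log" | wc -c)
echo "raw:     $raw_size bytes"
echo "gzip -9: $gz_size bytes"

# xz is not installed on every system, so check before using it.
if command -v xz > /dev/null 2>&1; then
  xz_size=$(xz -9 -c "$work/sample.log" | wc -c)
  echo "xz -9:   $xz_size bytes"
else
  xz_size=unavailable
  echo "xz not installed; skipping"
fi

rm -rf "$work"
```

Prefixing each compression command with time shows the speed side of the comparison.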
Safety and Verification
Test Archive Contents Before Extracting
List before extracting to avoid surprises:
tar tzf backup.tar.gz | head
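Check where files will go. GNU tar strips the leading / when creating archives, so member paths are normally relative (for example etc/hosts) and land under the current directory or the -C target. Treat absolute paths or .. components in a listing with suspicion.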
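The path check can be automated with a filter over the listing. This is a minimal sketch rather than a complete safety audit: it builds a small sample archive (contents invented) and flags any member name that is absolute or contains a .. component.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
mkdir -p "$work/etc"
echo "127.0.0.1 localhost" > "$work/etc/hosts"
tar czf "$work/backup.tar.gz" -C "$work" etc

# Flag members that start with / or contain a ".." path component.
# grep exits non-zero when nothing matches, hence the "|| true".
suspicious=$(tar tzf "$work/backup.tar.gz" | grep -E '^/|(^|/)\.\.(/|$)' || true)

if [ -z "$suspicious" ]; then
  verdict="no absolute or parent-directory paths"
else
  verdict="review before extracting"
  echo "$suspicious"
fi
echo "$verdict"

rm -rf "$work"
```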
Extract to a Temporary Directory
Avoid overwriting existing files accidentally:
mkdir /tmp/test-restore
tar xzvf backup.tar.gz -C /tmp/test-restore
Inspect the restored content, then move what you need.
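A trial restore can be combined with two quick checks. The sketch below assumes GNU tar, whose d (--compare) mode reports differences between an archive and the files on disk; the site directory and its content are invented for the demonstration.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
mkdir -p "$work/src/site" "$work/test-restore"
echo "v1" > "$work/src/site/index.html"
tar czf "$work/backup.tar.gz" -C "$work/src" site

# 1. Is the gzip stream itself intact?
if gzip -t "$work/backup.tar.gz"; then
  stream=ok
else
  stream=corrupt
fi

# 2. Trial restore into a scratch directory, away from the live files.
tar xzf "$work/backup.tar.gz" -C "$work/test-restore"
restored=$(cat "$work/test-restore/site/index.html")

# 3. GNU tar compare mode: does the archive still match the live files?
if tar dzf "$work/backup.tar.gz" -C "$work/src"; then
  compare=same
else
  compare=changed
fi

echo "stream: $stream, restored content: $restored, compare: $compare"

rm -rf "$work"
```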
Summary of Useful Commands
Create archives:
# tar only
tar cf backup.tar /path/to/data
# tar + gzip
tar czf backup.tar.gz /path/to/data
# tar + xz
tar cJf backup.tar.xz /path/to/data
List contents:
tar tf backup.tar
tar tzf backup.tar.gz
tar tJf backup.tar.xz
Extract:
tar xf backup.tar
tar xzf backup.tar.gz
tar xJf backup.tar.xz
Direct compression:
gzip file
gunzip file.gz
xz file
unxz file.xz
These tools form the foundation of backup, transfer, and storage workflows you’ll use repeatedly as a Linux administrator.