
Archiving and compression (tar, gzip, xz)

Why Archiving and Compression Matter

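On Linux servers and workstations, you'll constantly need to back up data, move sets of files between machines, and save disk space.

This involves two common tasks: archiving, which combines many files and directories into a single file, and compression, which reduces the size of data.

On Linux, these are often combined, especially using tar for archiving and tools like gzip or xz for compression.

This chapter focuses specifically on tar, gzip, and xz, used on their own and together.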

`tar`: The Standard Archiver

tar (tape archive) combines files/directories into a single archive file, often called a “tarball”.

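Typical filename conventions: .tar for an uncompressed archive, .tar.gz (or .tgz) for a gzip-compressed archive, and .tar.xz for an xz-compressed archive.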

Basic `tar` Options

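Common short options (often combined): c creates an archive, x extracts one, t lists its contents, f names the archive file, v prints verbose output, z compresses with gzip, J compresses with xz, p preserves permissions, and -C changes to a directory before archiving or extracting.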

Creating an Archive (No Compression)

Create backup.tar from /etc and /home/user/docs:

tar cf backup.tar /etc /home/user/docs

Add v for verbose output:

tar cvf backup.tar /etc /home/user/docs

Listing Archive Contents

Without extracting:

tar tf backup.tar

Verbose listing:

tar tvf backup.tar
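To try the create and list commands without touching real system paths, here is a minimal sketch that works in a throwaway directory. The names demo, notes.txt, and more.txt are made up for illustration.

```shell
# Build a small sample tree in a scratch directory (names are illustrative).
work=$(mktemp -d)
mkdir -p "$work/demo/sub"
echo "hello" > "$work/demo/notes.txt"
echo "world" > "$work/demo/sub/more.txt"

# -C changes into $work first, so members are stored as demo/... (relative paths).
tar cf "$work/demo.tar" -C "$work" demo

# List member names without extracting anything.
listing=$(tar tf "$work/demo.tar")
echo "$listing"
```

The order of the listed members follows the order in which tar read the directory, so it can differ between filesystems.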

Extracting an Archive

Extract in the current directory:

tar xf backup.tar

Extract a specific file or directory:

tar xf backup.tar etc/hosts

Change extraction directory with -C:

mkdir restore
tar xf backup.tar -C restore
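The two extraction forms can be combined. The sketch below extracts a single member into a separate directory, using a scratch tree that stands in for /etc (all paths here are illustrative, not real system files).

```shell
work=$(mktemp -d)
mkdir -p "$work/src/etc" "$work/restore"
echo "127.0.0.1 localhost" > "$work/src/etc/hosts"
echo "nameserver 192.0.2.1" > "$work/src/etc/resolv.conf"
tar cf "$work/backup.tar" -C "$work/src" etc

# Extract only etc/hosts into restore/; other members stay in the archive.
tar xf "$work/backup.tar" -C "$work/restore" etc/hosts

ls -R "$work/restore"
```

The member name must match what tar tf prints, which is why it is written as etc/hosts without a leading slash.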

Preserving Permissions and Ownership

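tar always records permissions and ownership when it creates an archive. On extraction, root restores both by default, while a regular user gets files owned by that user, with modes filtered through the umask. Commonly:

sudo tar cpf backup.tar /var/www
sudo tar xpf backup.tar -C /

Because tar strips the leading / from member names, the archive holds var/www/..., so extracting with -C / puts the files back in /var/www.

p (--preserve-permissions) matters most when extracting as a regular user, where it makes tar apply the recorded modes exactly instead of filtering them through the umask. For root it is already the default.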
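The effect of p on extraction can be observed directly. This is a sketch assuming GNU tar and GNU stat; it uses --no-same-permissions to force the umask-filtered behaviour so the result is the same whether or not it runs as root. The file name shared.txt is illustrative.

```shell
umask 022
work=$(mktemp -d)
mkdir -p "$work/src" "$work/with_p" "$work/without_p"
echo "data" > "$work/src/shared.txt"
chmod 666 "$work/src/shared.txt"
tar cf "$work/perm.tar" -C "$work/src" shared.txt

# p restores the recorded mode exactly.
tar xpf "$work/perm.tar" -C "$work/with_p"
# --no-same-permissions filters the recorded mode through the umask.
tar xf "$work/perm.tar" --no-same-permissions -C "$work/without_p"

mode_p=$(stat -c '%a' "$work/with_p/shared.txt")
mode_nop=$(stat -c '%a' "$work/without_p/shared.txt")
echo "with p: $mode_p, umask applied: $mode_nop"
```

With a umask of 022, the recorded mode 666 comes back as 666 with p and as 644 when the umask is applied.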


Using `tar` with `gzip`

gzip is a widely supported, fast compressor. tar can call it directly with -z.

Creating a `.tar.gz` Archive

Standard form:

tar czf backup.tar.gz /etc /home/user/docs

Explanation: c creates the archive, z compresses it with gzip, and f backup.tar.gz names the output file.

Verbose:

tar czvf backup.tar.gz /etc /home/user/docs

Extracting a `.tar.gz` Archive

-z tells tar to decompress with gzip (GNU tar also detects the compression on its own when reading an archive file):

tar xzf backup.tar.gz

Verbose, extracting into a target directory (which must already exist):

tar xzvf backup.tar.gz -C /tmp/restore

Listing Contents of `.tar.gz`

tar tzf backup.tar.gz
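As a rough illustration of what -z does, the sketch below builds the same archive twice: once with tar calling gzip itself, and once with tar piping an uncompressed stream into gzip. It then checks both files with gzip -t. The directory name docs and its contents are placeholders.

```shell
work=$(mktemp -d)
mkdir -p "$work/docs"
seq 1 5000 > "$work/docs/numbers.txt"

# One step: tar runs the gzip compression itself.
tar czf "$work/a.tar.gz" -C "$work" docs
# Two steps: tar writes to stdout (-), gzip compresses the stream.
tar cf - -C "$work" docs | gzip > "$work/b.tar.gz"

# gzip -t checks the integrity of the compressed data without extracting.
gzip -t "$work/a.tar.gz" && gzip -t "$work/b.tar.gz" && echo "both OK"

list_a=$(tar tzf "$work/a.tar.gz")
list_b=$(tar tzf "$work/b.tar.gz")
```

Both archives contain the same members; the two-step form is mainly useful when you want to pass extra options to the compressor.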

Using `tar` with `xz`

xz usually produces smaller archives than gzip, but it is slower and uses more CPU and memory.

tar uses -J to integrate with xz.

Creating a `.tar.xz` Archive

tar cJf backup.tar.xz /etc /home/user/docs

Verbose:

tar cJvf backup.tar.xz /etc /home/user/docs

Extracting a `.tar.xz` Archive

tar xJf backup.tar.xz

Verbose, extracting into a target directory (which must already exist):

tar xJvf backup.tar.xz -C /tmp/restore

Listing Contents of `.tar.xz`

tar tJf backup.tar.xz
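When reading an archive from a file, GNU tar (and bsdtar) can usually work out the compression format without z or J. The following sketch shows this on a small sample archive; the xz part only runs if the xz program is installed, and the names docs and readme.txt are illustrative.

```shell
work=$(mktemp -d)
mkdir -p "$work/docs"
echo "sample" > "$work/docs/readme.txt"
tar czf "$work/docs.tar.gz" -C "$work" docs

# No z given: tar inspects the file and picks the decompressor itself.
auto_gz=$(tar tf "$work/docs.tar.gz")
echo "$auto_gz"

# Same idea for xz, if the xz program is available.
if command -v xz >/dev/null 2>&1; then
  tar cJf "$work/docs.tar.xz" -C "$work" docs
  tar tf "$work/docs.tar.xz"
fi
```

Spelling out z or J is still worthwhile in scripts, since it documents the expected format and also works when the archive arrives on a pipe.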

Using `gzip` Directly

gzip works on a single file at a time. It compresses that file and replaces it with a .gz version.

gzip file.log

Now you have file.log.gz and the original file.log is gone (by default).

Decompress with `gunzip` or `gzip -d`

gunzip file.log.gz
# or
gzip -d file.log.gz

Keep Original File While Compressing

Use -c to write to stdout and redirect:

gzip -c file.log > file.log.gz

Now both file.log and file.log.gz exist.
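A shorter alternative, assuming gzip 1.6 or newer, is the -k (--keep) option. The sketch below uses a throwaway file to show that both versions remain afterwards.

```shell
work=$(mktemp -d)
printf 'line one\nline two\n' > "$work/file.log"

# -k compresses to file.log.gz and leaves file.log in place.
gzip -k "$work/file.log"

ls "$work"
```

The -c form remains useful when the compressed copy should go somewhere else or under a different name.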

Compression Levels

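gzip supports levels 1 to 9: -1 is the fastest with the least compression, -9 is the slowest with the best compression, and -6 is the default.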

Example:

gzip -9 bigfile.img
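To get a feel for the trade-off, you can compress the same input at two levels and compare sizes. This sketch generates its own sample data with seq; real files will give different ratios.

```shell
work=$(mktemp -d)
seq 1 200000 > "$work/sample.txt"

# -c writes to stdout, so the original stays untouched for the second run.
gzip -1 -c "$work/sample.txt" > "$work/fast.gz"
gzip -9 -c "$work/sample.txt" > "$work/best.gz"

size_orig=$(wc -c < "$work/sample.txt")
size_fast=$(wc -c < "$work/fast.gz")
size_best=$(wc -c < "$work/best.gz")
echo "original: $size_orig  -1: $size_fast  -9: $size_best"
```

Prefixing each gzip command with time shows the other side of the trade-off.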

Viewing Inside a `.gz` File

You can use zcat, zless, or zmore to read compressed text files without decompressing them first:

zcat file.log.gz
zless file.log.gz
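Because zcat writes the decompressed text to stdout, it combines with the usual text tools. A small sketch, using a made-up app.log with invented contents:

```shell
work=$(mktemp -d)
printf 'INFO start\nERROR disk full\nINFO retry\nERROR timeout\n' > "$work/app.log"
gzip "$work/app.log"

# Count matching lines straight from the compressed file.
errors=$(zcat "$work/app.log.gz" | grep -c 'ERROR')
echo "error lines: $errors"
```

The gzip package also ships zgrep, which wraps this pattern in a single command.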

Using `xz` Directly

xz is similar to gzip but with different defaults and typically better compression ratios at the cost of speed.

Compress:

xz file.img

This creates file.img.xz and removes file.img by default.

Decompress:

unxz file.img.xz
# or
xz -d file.img.xz

Keep Original File

xz -c file.img > file.img.xz

Compression Levels and Presets

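xz uses presets -0 to -9: -0 is the fastest, -9 compresses best but is the slowest and needs the most memory, and -6 is the default.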

Example:

xz -9 backup.tar

This produces backup.tar.xz.

Streaming and Pipelines

Both gzip and xz support streaming, which is useful in pipelines or over SSH (see below).
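Streaming here means the tools read from stdin and write to stdout when asked, so no intermediate file is needed. A minimal sketch with gzip, using made-up sample data:

```shell
# Compress data arriving on stdin and decompress it again, with no files involved.
original="streamed text, never written to disk"
roundtrip=$(printf '%s\n' "$original" | gzip -c | gzip -dc)
echo "$roundtrip"

# The same works for a tar stream: archive, compress, decompress, list.
work=$(mktemp -d)
mkdir -p "$work/data"
echo "x" > "$work/data/a.txt"
members=$(tar cf - -C "$work" data | gzip -c | gzip -dc | tar tf -)
echo "$members"
```

xz accepts the same -c and -dc options, so it can take the place of gzip in these pipelines.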


Common `tar` + Compression Workflows

Archive and Compress a Directory

Using gzip:

tar czf /backups/etc-$(date +%F).tar.gz /etc

Using xz:

tar cJf /backups/etc-$(date +%F).tar.xz /etc
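The date-stamped naming can be tried safely with scratch directories. In this sketch, src and dest are hypothetical stand-ins for /etc and /backups, and app.conf is a made-up file.

```shell
# Hypothetical locations; a real job would point at /etc and /backups.
src=$(mktemp -d)
dest=$(mktemp -d)
echo "setting=1" > "$src/app.conf"

stamp=$(date +%F)                      # ISO date, YYYY-MM-DD
archive="$dest/conf-$stamp.tar.gz"

# Archive the contents of $src with paths relative to it.
tar czf "$archive" -C "$src" .
echo "wrote $archive"
```

ISO dates sort correctly by name, which makes it easy to find or prune the oldest archives later.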

Backup With Exclusions

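Exclude certain paths with --exclude. Putting the --exclude options before the paths to archive is the most portable ordering:

tar czf home.tar.gz \
  --exclude=/home/user/.cache \
  --exclude=/home/user/Downloads \
  /home

Use patterns (quoted, so the shell does not expand them):

tar czf logs.tar.gz --exclude='*.gz' /var/log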
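It is worth confirming that an exclusion did what you expected by listing the result. A sketch using a scratch log directory (the file names are illustrative):

```shell
work=$(mktemp -d)
mkdir -p "$work/log"
echo "current" > "$work/log/syslog"
echo "old" > "$work/log/syslog.1"
gzip "$work/log/syslog.1"              # leaves log/syslog.1.gz

# Skip already-compressed logs; the quotes keep the shell from expanding the pattern.
tar czf "$work/logs.tar.gz" --exclude='*.gz' -C "$work" log

members=$(tar tzf "$work/logs.tar.gz")
echo "$members"
```

The listing should contain log/syslog but no entry ending in .gz.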

Incremental or Partial Backups (Simplified)

Without full incremental features, you can still select a subset of data, e.g., configuration files only:

tar czf configs.tar.gz \
  --exclude=/etc/ssl/private \
  /etc \
  /home/*/.config

(Full backup strategy design is covered elsewhere; here the focus is just on how tar performs selection.)
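When the selection rule is more complex than a few paths, one option is to let find choose the files and pass the list to tar. This sketch assumes GNU tar (for --null with -T) and uses a made-up directory tree.

```shell
work=$(mktemp -d)
mkdir -p "$work/tree/app" "$work/tree/cache"
echo "a=1" > "$work/tree/app/main.conf"
echo "b=2" > "$work/tree/app/extra.conf"
echo "tmp" > "$work/tree/cache/blob.bin"

# find prints NUL-separated names; --null -T - makes tar read that list from stdin.
( cd "$work/tree" && find . -type f -name '*.conf' -print0 \
    | tar czf "$work/configs.tar.gz" --null -T - )

members=$(tar tzf "$work/configs.tar.gz")
echo "$members"
```

The NUL separator keeps file names with spaces or newlines intact, and --null has to appear before -T for it to apply.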


Archiving Over SSH (Without Temporary Files)

You can use pipelines to avoid creating large temporary archives on disk.

Sending an Archive to Another Machine

From the source machine, send /var/www to backup@server:

Using gzip:

tar czf - /var/www | ssh backup@server "cat > /backups/www.tar.gz"

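Explanation: f - makes tar write the archive to standard output, the pipe feeds that stream to ssh, and cat on the remote side writes it to /backups/www.tar.gz.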

Using xz (slower, smaller):

tar cJf - /var/www | ssh backup@server "cat > /backups/www.tar.xz"

Remote Extraction Without Intermediate File

From the local machine, read a remote archive and extract it locally:

ssh backup@server "cat /backups/www.tar.gz" | tar xzvf -

Similarly for .tar.xz:

ssh backup@server "cat /backups/www.tar.xz" | tar xJvf -

(This approach is common in admin workflows when disk space is tight.)
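The same pipe idea also lets you copy a tree without storing any archive at all: one tar writes the stream and another unpacks it. The sketch below runs both ends locally so it can be tried without a server; the remote variant in the comment uses a hypothetical /restore directory.

```shell
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/www"
echo "<h1>hi</h1>" > "$src/www/index.html"

# Local stand-in for the SSH pipeline: one tar writes a gzip stream, another reads it.
tar czf - -C "$src" www | tar xzf - -C "$dest"

# Over the network, the receiving side would run under ssh instead, e.g.:
#   tar czf - -C /var www | ssh backup@server "tar xzf - -C /restore"
# (/restore is a hypothetical directory that must already exist on the server.)

ls -R "$dest"
```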


Handling Permissions, Ownership, and Timestamps

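When using tar for system backups, pay attention to metadata: permissions (p restores the recorded modes exactly on extraction), ownership (only root can restore it, and --numeric-owner uses numeric UIDs and GIDs instead of user and group names, which helps when the names differ between systems), and timestamps (tar records and restores modification times by default).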

Example for a more “faithful” archive, as root:

sudo tar cpf backup-system.tar \
  --numeric-owner \
  /etc /var /home
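You can check what metadata an archive carries with a verbose listing. This sketch assumes GNU tar's listing format and uses a scratch file rather than real system directories.

```shell
work=$(mktemp -d)
echo "data" > "$work/file.txt"
tar cpf "$work/meta.tar" --numeric-owner -C "$work" file.txt

# Verbose listing with numeric IDs: mode, uid/gid, size, modification time, name.
entry=$(tar tvf "$work/meta.tar" --numeric-owner)
echo "$entry"
```

The uid/gid column shows the numeric IDs of whoever created the file, which is what a restore as root would apply.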

Practical Comparison: `gzip` vs `xz`

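As an admin, you choose based on trade-offs: gzip is fast and supported almost everywhere but produces larger files, while xz produces smaller files at the cost of more time, CPU, and memory.

In practice, gzip suits frequent backups and quick transfers, and xz suits archives that are written once and then stored or downloaded many times.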
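A quick way to ground the choice is to measure both tools on your own data. The sketch below uses generated sample data, so the numbers only illustrate the method; the xz part is skipped if xz is not installed.

```shell
work=$(mktemp -d)
seq 1 200000 > "$work/sample.txt"

gzip -c "$work/sample.txt" > "$work/sample.gz"
gz_size=$(wc -c < "$work/sample.gz")
echo "gzip: $gz_size bytes"

# Only compare against xz when the program is available.
if command -v xz >/dev/null 2>&1; then
  xz -c "$work/sample.txt" > "$work/sample.xz"
  xz_size=$(wc -c < "$work/sample.xz")
  echo "xz:   $xz_size bytes"
fi
```

Running each compressor under time adds the speed side of the comparison.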

Safety and Verification

Test Archive Contents Before Extracting

List before extracting to avoid surprises:

tar tzf backup.tar.gz | head

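Check where files will go: GNU tar strips the leading / when creating an archive, so member names are normally relative and land under the current directory (or the -C directory). Be wary of archives whose members are absolute paths or contain .. components.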
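This check can be scripted. The following is a sketch of one possible test, not a complete security check: it flags member names that are absolute or contain a .. component. The archive is built from a scratch directory for the demonstration.

```shell
work=$(mktemp -d)
mkdir -p "$work/site"
echo "ok" > "$work/site/index.html"
tar czf "$work/backup.tar.gz" -C "$work" site

# Flag member names that start with / or contain a ".." path component.
if tar tzf "$work/backup.tar.gz" | grep -Eq '^/|(^|/)\.\.(/|$)'; then
  verdict="unsafe paths found"
else
  verdict="paths look relative"
fi
echo "$verdict"
```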

Extract to a Temporary Directory

Avoid overwriting existing files accidentally:

mkdir /tmp/test-restore
tar xzvf backup.tar.gz -C /tmp/test-restore

Inspect the restored content, then move what you need.
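For a test restore, diff -r gives a simple way to confirm that the restored tree matches the source. A sketch using scratch directories and a made-up report.txt:

```shell
work=$(mktemp -d)
mkdir -p "$work/data" "$work/test-restore"
echo "important" > "$work/data/report.txt"
tar czf "$work/backup.tar.gz" -C "$work" data

tar xzf "$work/backup.tar.gz" -C "$work/test-restore"

# diff -r exits 0 only when both trees hold the same files with the same contents.
if diff -r "$work/data" "$work/test-restore/data"; then
  result="restore matches source"
else
  result="restore differs"
fi
echo "$result"
```

Note that diff -r compares contents only; it does not check permissions or ownership.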


Summary of Useful Commands

Create archives:

# tar only
tar cf backup.tar /path/to/data
# tar + gzip
tar czf backup.tar.gz /path/to/data
# tar + xz
tar cJf backup.tar.xz /path/to/data

List contents:

tar tf backup.tar
tar tzf backup.tar.gz
tar tJf backup.tar.xz

Extract:

tar xf backup.tar
tar xzf backup.tar.gz
tar xJf backup.tar.xz

Direct compression:

gzip file
gunzip file.gz
xz file
unxz file.xz

These tools form the foundation of backup, transfer, and storage workflows you’ll use repeatedly as a Linux administrator.
