Introduction
Archiving and compression are two related but distinct tasks that you will perform often on a Linux system. Archiving is about collecting multiple files and directories into a single file. Compression is about reducing the size of data by encoding it more efficiently. In Linux, these tasks are commonly combined, which is why you will frequently see files that are both archived and compressed.
This chapter focuses on practical use of the traditional tar archiver together with common compression tools such as gzip and xz. It assumes you already understand basic filesystem concepts, paths, and file operations, and it does not cover backup strategies or snapshot systems in detail, since those are discussed elsewhere.
Archiving versus compression
An archive is usually a single file that contains multiple files and directories. The classic Linux archive format is the tar file, often called a tarball. A tarball preserves file names, directory structure, permissions, timestamps, and sometimes ownership, inside a single file.
Compression operates on any data stream or file and tries to represent it using fewer bytes. Compression does not inherently know about files, directories, or permissions; it only sees a sequence of bytes. Compression formats such as gzip and xz wrap that data with some metadata but do not replace what an archiver like tar does.
In practice, you first create an archive using tar, then compress it with a compression tool. Many tar versions automate this combination, so you rarely need to run a separate compression command.
An archive is not necessarily compressed. A compressed file is not necessarily an archive. Do not assume that a single file is compressed just because it is an archive, or that a compressed file contains multiple files.
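The difference is easy to see by comparing file sizes. The following is a minimal sketch: the directory and file names are made up for illustration, and it works in a scratch directory created with mktemp.

```shell
# Scratch directory with one highly repetitive (compressible) file.
work=$(mktemp -d)
mkdir "$work/demo"
yes 'the same line again and again' | head -n 20000 > "$work/demo/data.txt"

# Archive only: tar bundles the directory but does not shrink it.
tar cf "$work/demo.tar" -C "$work" demo

# Compression only: gzip shrinks one file but knows nothing about directories.
gzip -c "$work/demo/data.txt" > "$work/data.txt.gz"

orig_size=$(wc -c < "$work/demo/data.txt")
tar_size=$(wc -c < "$work/demo.tar")
gz_size=$(wc -c < "$work/data.txt.gz")
echo "original: $orig_size  tar: $tar_size  gzip: $gz_size"
```

The plain tar file comes out slightly larger than the data it holds, because tar adds headers and padding, while the gzip output is much smaller but contains only the one file.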
The `tar` archiver
The tar program, originally "tape archiver", is the standard tool on Linux to pack and unpack collections of files. Its power comes from being able to operate on complex directory trees while preserving metadata.
Basic `tar` syntax
tar options can be written with a leading dash or, in the traditional style used in most examples here, without one. Single-letter options are often combined into one group. The most important options are:
c create an archive.
x extract an archive.
t list the contents without extracting.
v verbose, show processed files.
f specify the archive file name.
A typical tar command follows the structure:
tar OPTIONS ARCHIVE FILES...
For example, to create an archive called backup.tar from the directory mydir:
tar cvf backup.tar mydir
This command creates backup.tar in the current directory, containing mydir and its contents. The v flag lists each file as it is added.
To list files in an archive without extracting:
tar tvf backup.tar
To extract an archive into the current directory:
tar xvf backup.tar
By default, extraction recreates the directory structure stored in the archive relative to the current working directory.
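A complete round trip might look like the sketch below. The file contents and the restore directory are hypothetical, and everything happens in a scratch directory so that nothing in your own files is touched.

```shell
work=$(mktemp -d)
mkdir -p "$work/mydir/sub" "$work/restore"
echo "hello" > "$work/mydir/a.txt"
echo "world" > "$work/mydir/sub/b.txt"

# Create and list the archive from the directory that contains mydir.
(cd "$work" && tar cf backup.tar mydir && tar tf backup.tar)

# Extract from a different working directory: mydir is recreated there.
(cd "$work/restore" && tar xf ../backup.tar)
cat "$work/restore/mydir/sub/b.txt"
```

The parentheses run each step in a subshell, so the cd does not change the directory of the shell you are working in.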
Absolute and relative paths in `tar` archives
When creating archives, it is generally safer to store relative paths instead of absolute paths. If you run:
tar cvf etc-backup.tar /etc
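GNU tar strips the leading / from member names by default and prints a notice, but some tar implementations, and GNU tar when run with -P, store paths beginning with /. If such an archive is later extracted as root, it can overwrite files under /etc directly. To avoid this, you can change to a parent directory and archive using relative paths: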
cd /
tar cvf /root/etc-backup.tar etc
Now the archive stores etc as a relative directory name and extraction from any working directory will create an etc directory relative to that location, unless you explicitly choose a different extraction directory with options described next.
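You can check which names an archive stores by listing it. The sketch below uses a scratch directory as a stand-in for / so that no real system files are involved, and it uses -C at creation time, which has the same effect as the cd shown above. The file app.conf is a hypothetical example.

```shell
root=$(mktemp -d)              # stand-in for / in this sketch
out=$(mktemp -d)
mkdir -p "$root/etc"
echo "key=value" > "$root/etc/app.conf"

# -C switches to $root before adding "etc", so member names stay relative.
tar cf "$out/etc-backup.tar" -C "$root" etc

names=$(tar tf "$out/etc-backup.tar")
echo "$names"
```

If any listed name starts with /, the archive was created with absolute paths and deserves extra care when extracting.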
Extracting into a specific directory
You can control where tar writes extracted files with the -C option. For example:
mkdir /tmp/test-extract
tar xvf backup.tar -C /tmp/test-extract
The -C option changes to the specified directory before extraction. This is useful to avoid polluting your current directory or to test an archive safely.
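As a small illustration, the sketch below builds a throwaway archive and extracts it into a separate directory. All paths are hypothetical and live under a scratch directory.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/mydir" "$work/test-extract"
echo "data" > "$work/src/mydir/file.txt"
tar cf "$work/backup.tar" -C "$work/src" mydir

# Extract into test-extract; the current directory is left untouched.
tar xf "$work/backup.tar" -C "$work/test-extract"
ls -R "$work/test-extract"
```

The target directory must exist before extraction, which is why the sketch creates it first.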
Excluding files from an archive
For selective archiving, tar can exclude certain files or directories. The --exclude option operates on path patterns relative to the paths you specify.
For example, to archive a project directory without its .git folder:
tar cvf project.tar --exclude='.git' myproject
You can repeat --exclude multiple times with different patterns. For complex exclusion rules, tar supports exclude files, but detailed backup strategies are covered elsewhere.
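The following sketch shows two --exclude patterns at once and confirms the result by listing the archive. The project layout is invented for the example, and the pattern matching described in the comments assumes GNU tar defaults.

```shell
work=$(mktemp -d)
mkdir -p "$work/myproject/.git" "$work/myproject/src"
echo "ref"  > "$work/myproject/.git/HEAD"
echo "code" > "$work/myproject/src/main.c"
echo "obj"  > "$work/myproject/src/main.o"

# Skip the .git directory and every file ending in .o.
tar cf "$work/project.tar" --exclude='.git' --exclude='*.o' -C "$work" myproject

members=$(tar tf "$work/project.tar")
echo "$members"
```

Quoting the patterns keeps the shell from expanding them before tar sees them.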
Always verify your archive before relying on it. Use tar tvf to list contents, and perform test extractions into a temporary directory to confirm that all required files are present and paths look as expected.
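One way to carry out such a check is sketched below: list the archive, extract it into a scratch directory, and compare the result with the source using diff -r. The directory names are hypothetical. Note that diff -r compares file contents only, not permissions or ownership.

```shell
work=$(mktemp -d)
mkdir -p "$work/data/conf" "$work/verify"
echo "a=1" > "$work/data/conf/app.ini"
tar cf "$work/data.tar" -C "$work" data

# Step 1: the listing must succeed; tar exits non-zero on an unreadable archive.
tar tf "$work/data.tar" > /dev/null && echo "listing ok"

# Step 2: extract into a scratch directory and compare with the source tree.
tar xf "$work/data.tar" -C "$work/verify"
if diff -r "$work/data" "$work/verify/data"; then
    verify_status=ok
else
    verify_status=mismatch
fi
echo "verification: $verify_status"
```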
`tar` with compression
Although tar by itself does not compress data, it integrates tightly with common compression tools. When you use specific options, tar pipes the archive through a compressor automatically. This results in a single file that is both an archive and compressed, such as .tar.gz or .tar.xz.
`tar` with `gzip`
gzip is a widely used compression tool that balances compression ratio against speed. tar can call it directly with the -z option.
To create a gzip compressed tarball:
tar czf archive.tar.gz directory
To list contents of a gzipped tarball:
tar tzf archive.tar.gz
To extract a gzipped tarball:
tar xzf archive.tar.gz
You can still add v for verbose output, for example tar xvzf. The file extension .tar.gz or .tgz is conventional and helps users understand how to handle the file.
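To see what -z does behind the scenes, you can build the same kind of file by hand with a pipe. This is a sketch with made-up file names. It relies on the convention that - as the archive name means standard output.

```shell
work=$(mktemp -d)
mkdir "$work/directory"
seq 1 50000 > "$work/directory/numbers.txt"

# One step: tar runs gzip itself.
tar czf "$work/one-step.tar.gz" -C "$work" directory

# Two steps by hand: write the archive to standard output and pipe it to gzip.
tar cf - -C "$work" directory | gzip > "$work/two-step.tar.gz"

# Both files are gzip streams wrapping a tar archive, so both list the same way.
list_one=$(tar tzf "$work/one-step.tar.gz")
list_two=$(tar tzf "$work/two-step.tar.gz")
[ "$list_one" = "$list_two" ] && echo "same member list"
```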
`tar` with `xz`
xz is a modern compression tool that usually achieves better compression than gzip, but it can be slower, especially for compression. tar uses xz with the -J option.
To create an xz compressed tarball:
tar cJf archive.tar.xz directory
To list its contents:
tar tJf archive.tar.xz
To extract it:
tar xJf archive.tar.xz
Again, you may include v for verbose operation, such as tar xJvf.
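The sketch below creates an xz compressed tarball next to an uncompressed one, prints both sizes, and checks that extraction gives back the original data. The file names are placeholders, and it assumes the xz program is installed.

```shell
work=$(mktemp -d)
mkdir "$work/directory" "$work/out"
seq 1 50000 > "$work/directory/numbers.txt"

tar cf  "$work/archive.tar"    -C "$work" directory   # uncompressed, for comparison
tar cJf "$work/archive.tar.xz" -C "$work" directory   # same content through xz

tar_size=$(wc -c < "$work/archive.tar")
xz_size=$(wc -c < "$work/archive.tar.xz")
echo "tar: $tar_size bytes, tar.xz: $xz_size bytes"

tar xJf "$work/archive.tar.xz" -C "$work/out"
cmp "$work/directory/numbers.txt" "$work/out/directory/numbers.txt" && echo "contents match"
```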
Using `tar` with arbitrary compressors
Beyond built-in options such as -z and -J, GNU tar can work with any compressor that reads from standard input and writes to standard output, by using the -I (capital i) option.
For example, if you wanted to compress using xz with a specific compression level:
tar cI 'xz -9' -f archive.tar.xz directory
Here -I 'xz -9' tells tar which command to run for compression. The same option works for extraction: tar runs the given program with -d added, so the program must accept -d for decompression, which xz and gzip do.
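A round trip with -I could look like the following sketch. It assumes GNU tar, since other tar implementations may give -I a different meaning, and it uses gzip -9 rather than xz -9 only to keep memory use small. The file names are hypothetical.

```shell
work=$(mktemp -d)
mkdir "$work/directory" "$work/out"
seq 1 50000 > "$work/directory/numbers.txt"

# Create: tar starts "gzip -9" and feeds the archive to its standard input.
tar -c -I 'gzip -9' -f "$work/archive.tar.gz" -C "$work" directory

# Extract: tar starts the named program with -d added.
tar -x -I gzip -f "$work/archive.tar.gz" -C "$work/out"
cmp "$work/directory/numbers.txt" "$work/out/directory/numbers.txt" && echo "contents match"
```

The long form of -I in GNU tar is --use-compress-program, which can be easier to read in scripts.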
Working with `gzip`
gzip works on single files, and its default behavior is to compress a file in place, replacing it with a new file with a .gz suffix.
Basic `gzip` operations
To compress a file:
gzip file.txt
After this, file.txt is removed and file.txt.gz appears. To keep the original file, use -k:
gzip -k file.txt
To decompress a .gz file:
gzip -d file.txt.gz
or equivalently:
gunzip file.txt.gz
To view information about a gzipped file, such as original name and compression ratio:
gzip -l file.txt.gz
Unlike tar, gzip does not handle multiple files directly. If you run gzip on several files, it will compress each file into its own .gz file. To compress a directory as a single unit, you must combine tar and gzip.
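The sketch below runs gzip on two files at once to show that each one gets its own .gz file, then checks a round trip. The file names are made up, and -k assumes a reasonably recent gzip.

```shell
work=$(mktemp -d)
seq 1 20000 > "$work/file.txt"
seq 1 100   > "$work/other.txt"

# -k keeps the originals; each input still gets its own .gz file.
gzip -k "$work/file.txt" "$work/other.txt"
after_gzip=$(ls "$work")
echo "$after_gzip"

gzip -l "$work/file.txt.gz"    # compressed size, uncompressed size, ratio

# Decompress to standard output and compare with the original.
gzip -dc "$work/file.txt.gz" | cmp - "$work/file.txt" && echo "round trip ok"
```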
Compression levels in `gzip`
gzip supports compression levels from -1 to -9. Lower numbers compress faster with less size reduction. Higher numbers compress more but take longer.
For example:
gzip -1 largefile
gzip -9 anotherfile
If not specified, gzip uses a reasonable default, usually -6. You will often accept the default unless you have particular performance or size constraints.
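To judge whether a non-default level is worth it for your data, compare the output sizes directly. The sketch below does this for a generated sample file. The numbers you get depend entirely on the input, so treat it as a measuring template rather than a benchmark.

```shell
work=$(mktemp -d)
seq 1 200000 > "$work/largefile"

# -c writes to standard output, so the input file is left in place.
fast_size=$(gzip -1 -c "$work/largefile" | wc -c)
default_size=$(gzip -c "$work/largefile" | wc -c)
best_size=$(gzip -9 -c "$work/largefile" | wc -c)

echo "gzip -1:      $fast_size bytes"
echo "gzip default: $default_size bytes"
echo "gzip -9:      $best_size bytes"
```

Prefixing each gzip command with time shows the other side of the trade-off.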
Do not expect gzip to preserve permissions, ownership, or directory structures for arbitrary collections of files. It can only compress one file at a time. Use tar when working with directories or when metadata must be preserved.
Working with `xz`
xz offers strong compression, which is useful for software distribution, source code archives, and long-term storage where saving space matters more than compression speed.
Basic `xz` operations
To compress a file:
xz file.img
The original file is removed and replaced by file.img.xz. To keep the original file:
xz -k file.img
To decompress:
xz -d file.img.xz
or:
unxz file.img.xz
To get information about compressed size, uncompressed size, and ratio:
xz -l file.img.xz
xz also works on a single file or stream at a time and does not manage multiple files as an archive. For that, you must use tar or another archiver.
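Both modes, file and stream, are shown in the sketch below. The file name file.img is only a placeholder for any single file, and the content is generated sample data.

```shell
work=$(mktemp -d)
seq 1 20000 > "$work/file.img"

xz -k "$work/file.img"         # keeps file.img and adds file.img.xz
xz -l "$work/file.img.xz"      # compressed size, uncompressed size, ratio

# Stream use: compress standard input, decompress again, compare with the file.
xz -c < "$work/file.img" | xz -dc | cmp - "$work/file.img" && echo "stream round trip ok"
```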
Compression levels in `xz`
xz selects its compression settings through presets numbered -0 to -9. As with gzip, higher numbers give smaller output and take longer, but xz presets also raise memory usage, and the -e modifier can be added to a preset to spend more time searching for a smaller result.
A high compression example:
xz -9e largearchive.tar
Here -9 requests maximum compression and -e selects a slower, more thorough variant. This may be very slow on large files and can require significant memory.
A faster compression example:
xz -1 fastfile
If you work on systems with limited memory, very high xz levels can cause memory pressure. Consider this when compressing large backups.
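Before committing to a high preset, it can help to measure a few presets on a sample. The sketch below compares a fast preset with the default preset plus -e. It deliberately stays away from -9 so that it runs comfortably on small machines, and the sample file is generated data, not a realistic backup.

```shell
work=$(mktemp -d)
seq 1 200000 > "$work/sample"

# -c writes to standard output, so the sample file is left in place.
orig_size=$(wc -c < "$work/sample")
fast_size=$(xz -1 -c "$work/sample" | wc -c)
thorough_size=$(xz -6e -c "$work/sample" | wc -c)

echo "original: $orig_size bytes"
echo "xz -1:    $fast_size bytes"
echo "xz -6e:   $thorough_size bytes"
```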
Choosing between `gzip` and `xz`
The choice between gzip and xz involves trade-offs between speed, compatibility, and compression ratio.
gzip is faster, uses less memory, and is universally available and supported. It is suitable for logs, temporary archives, and situations where you frequently compress and decompress.
xz typically provides better compression at the cost of higher CPU and memory usage. It is common in software distribution archives and long-term storage where file size is more important than speed.
If you need rapid handling of many archives, prefer gzip. If you distribute large read-only data sets or software tarballs, xz can save bandwidth and storage.
Test your compression choices on representative data. Files that are already compressed, such as media files or existing archives, often gain very little from further compression and can waste CPU time without meaningful space savings.
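A quick comparison along these lines is sketched below. It uses two generated samples: a text file that compresses well, and random bytes standing in for data that is already compressed. Replace them with files that resemble your real workload. The sample names are hypothetical.

```shell
work=$(mktemp -d)
seq 1 100000 > "$work/text.sample"                    # repetitive, compressible
head -c 500000 /dev/urandom > "$work/random.sample"   # behaves like compressed data

for sample in text.sample random.sample; do
    orig_size=$(wc -c < "$work/$sample")
    gz_size=$(gzip -c "$work/$sample" | wc -c)
    xz_size=$(xz -c "$work/$sample" | wc -c)
    echo "$sample: original=$orig_size gzip=$gz_size xz=$xz_size"
done
```

On the random sample neither tool should save a meaningful amount of space, which is the situation described above for media files and existing archives.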
Combining `tar`, `gzip`, and `xz` in practice
In everyday administration, you will often create archives of configuration directories, application data, or source code trees. The typical steps follow a simple pattern.
To create and compress in one step, choose your compressor and an appropriate extension:
For gzip:
tar czf etc-backup-$(date +%F).tar.gz /etc
For xz:
tar cJf home-backup-$(date +%F).tar.xz /home/user
To restore or inspect these archives, use tar with the matching option (-z or -J), or omit the compression option if your tar detects the format automatically, as current GNU tar does when reading an archive.
When working with scripts, be explicit in your options so that the intent is clear and less dependent on default behavior. For example, always include -f before the archive name, and use well known file extensions so other tools and administrators can understand what they are dealing with.
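As one possible shape for such a script, the sketch below defines a hypothetical helper named make_backup. The function name, its arguments, and the naming scheme are assumptions for illustration, not a standard tool. It spells out each option separately and refuses to report success unless the finished archive can be listed.

```shell
# Hypothetical helper: archive one directory into a dated .tar.gz file.
# Usage: make_backup SOURCE_DIR DEST_DIR
make_backup() {
    src=$1
    dest=$2
    name=$(basename "$src")
    archive="$dest/$name-backup-$(date +%F).tar.gz"

    # Explicit options: -c create, -z gzip, -f archive name.
    # -C keeps member names relative to the parent of the source directory.
    tar -c -z -f "$archive" -C "$(dirname "$src")" "$name" || return 1

    # Fail if the finished archive cannot be listed.
    tar -t -z -f "$archive" > /dev/null || return 1
    echo "$archive"
}

# Example run against a scratch tree instead of a real system directory.
work=$(mktemp -d)
mkdir -p "$work/etc" "$work/backups"
echo "key=value" > "$work/etc/app.conf"
created=$(make_backup "$work/etc" "$work/backups")
echo "created: $created"
```

The variables inside the function are global because plain POSIX sh has no local keyword. In a bash script you may prefer to declare them with local.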
With these tools and patterns, you can reliably create compact archives that preserve file metadata and save disk space, and you can integrate them into larger storage, deployment, or backup workflows that are covered in other chapters.