Introduction
In Linux, tar is the classic tool for bundling many files and directories into a single archive file. The name comes from “tape archive,” but today it is used for backups, packaging, and transfers. In this chapter you will focus on creating and handling tar archives, including compressed variants, as part of a simple backup strategy.
What a tar Archive Is
A tar archive is a single file that contains many files and directories, along with their paths, permissions, timestamps, and basic metadata. It does not compress by itself. Compression is usually added by combining tar with tools such as gzip, bzip2, or xz.
A plain tar file typically has the extension .tar. When compressed, you often see .tar.gz, .tgz, .tar.bz2, or .tar.xz.
Important: A tar archive is a container. It preserves file structure and metadata inside one file. Compression is optional and is added on top of tar, not instead of it.
Basic tar Command Structure
The tar command has a consistent pattern when creating archives. The general structure for creating an archive is:
tar [options] -f archive-name file-or-directory...
You typically use:
- c to create an archive
- f to specify the archive file name
- Other options to control compression, verbosity, and paths
A simple example to create an archive from a directory called project:
tar cf project.tar project
Here c means create, f means use project.tar as the archive file, and project is what you are archiving.
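As a minimal sketch of this pattern, the following builds a throwaway project directory in a temporary location, archives it, and lists the result. The file names README and src/main.c are made up for illustration.

```shell
# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"

# Hypothetical sample content standing in for a real project.
mkdir -p project/src
echo 'sample readme' > project/README
echo 'int main(void) { return 0; }' > project/src/main.c

# c = create, f = write the archive to the file named next.
tar cf project.tar project

# List what was stored, without extracting anything.
tar tf project.tar
```

The listing should show the project directory and the files below it; the order of entries can vary between systems.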
Creating a Simple tar Archive
To create a plain, uncompressed archive of a directory:
tar cf backup.tar /home/alex/documents
This command will create backup.tar in the current working directory, containing the /home/alex/documents directory and everything inside it.
If you want to see which files are being added during creation, add the v option for verbose output:
tar cvf backup.tar /home/alex/documents
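Inside the archive, member names are derived from the paths you pass. If you use relative paths, such as documents from within /home/alex, the archive stores them as given. If you use absolute paths like /home/alex/documents, GNU tar by default strips the leading / and prints a notice saying so, so the members are stored as home/alex/documents/... and still carry the full directory chain.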
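To see what ends up inside the archive for each style of path, you can build both variants and list them. This is a sketch in a temporary directory with made-up file names. The behavior noted for the absolute case is what I would expect from GNU tar with default settings, and the listing shows what your own tar stored.

```shell
work=$(mktemp -d)
mkdir -p "$work/documents"
echo 'draft' > "$work/documents/notes.txt"

# Relative name: run tar from the parent directory.
(cd "$work" && tar cf relative.tar documents)

# Absolute name: tar may print a notice about removing the leading '/'.
tar cf "$work/absolute.tar" "$work/documents"

echo "--- relative ---"
tar tf "$work/relative.tar"
echo "--- absolute ---"
tar tf "$work/absolute.tar"
```

The relative archive lists names starting at documents/, while the absolute one lists the whole directory chain leading down to documents.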
Rule: Use v only for human inspection. It is not required for correct archives. A common combination for creation is cvf but the only required options are c and f.
Adding Multiple Files and Directories
You can add several paths in one command. For example:
tar cf work_backup.tar reports spreadsheets notes.txt
Here tar includes the reports directory, the spreadsheets directory, and the single file notes.txt. You can mix directories and individual files freely.
If you are in /home/alex and want selected items from there:
cd /home/alex
tar cf selected.tar documents pictures todo.txt
The archive will contain documents, pictures, and todo.txt as relative paths starting from the directory where you ran tar.
Compression with gzip
Although tar itself does not compress, it is commonly used together with compression. The most common combination is tar with gzip. With GNU tar, you can request gzip compression with the z option.
To create a gzip compressed archive:
tar czf backup.tar.gz /home/alex/documents
Or, with verbose output:
tar czvf backup.tar.gz /home/alex/documents
The z option tells tar to run the data through gzip during archive creation. The resulting .tar.gz file is sometimes shortened to .tgz, which is simply a different filename extension for the same format.
If you prefer the shorter extension:
tar czf backup.tgz /home/alex/documents
This behaves identically. Only the name differs.
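Rule: c means create, f gives the archive filename, and z enables gzip compression. In the dash-less form used here, the order of the letters in czf does not matter, because the archive filename is taken from the next argument. If you write the options with a leading dash, keep f last (for example -czf), since the filename must then follow f directly.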
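A quick way to see the effect of z is to archive the same directory with and without it and compare sizes. The sketch below generates a repetitive sample file (a hypothetical log.txt) so the difference is easy to see, then checks the gzip stream with gzip -t.

```shell
work=$(mktemp -d)
cd "$work"
mkdir documents

# Repetitive text compresses well, which makes the size difference obvious.
awk 'BEGIN { for (i = 0; i < 5000; i++) print "the same line repeated" }' \
    > documents/log.txt

tar cf  backup.tar    documents   # plain archive
tar czf backup.tar.gz documents   # same content, passed through gzip

ls -l backup.tar backup.tar.gz

# gzip -t tests the integrity of the compressed stream without extracting.
gzip -t backup.tar.gz && echo "gzip stream OK"
```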
Compression with bzip2 and xz
For better compression ratios, you can request bzip2 or xz compression. These are slower than gzip, especially xz, but often produce smaller archives.
To use bzip2 compression, use the j option and usually the .tar.bz2 extension:
tar cjf backup.tar.bz2 /home/alex/documents
To use xz compression, use the J option and usually the .tar.xz extension:
tar cJf backup.tar.xz /home/alex/documents
Both commands parallel the gzip example, with only the compression option and file extensions changed.
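If you want to compare the three compressors on your own data, a sketch like the following can help. It assumes bzip2 and xz may or may not be installed, so it skips whichever is missing. The sample file is generated and hypothetical, and the sizes you see will depend on your data.

```shell
work=$(mktemp -d)
cd "$work"
mkdir documents
awk 'BEGIN { for (i = 0; i < 20000; i++) print "entry", i, "status ok" }' \
    > documents/log.txt

tar czf backup.tar.gz documents

# j and J rely on the bzip2 and xz programs; skip them if unavailable.
if command -v bzip2 > /dev/null; then
    tar cjf backup.tar.bz2 documents
fi
if command -v xz > /dev/null; then
    tar cJf backup.tar.xz documents
fi

# Compare the resulting sizes.
ls -l backup.tar.*
```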
Archiving from the Correct Directory
Where you run tar from affects the paths inside the archive. For clean, portable archives, it is often useful to change into a directory first, then archive subdirectories or specific files.
Suppose you want an archive that contains only documents without the entire absolute path /home/alex/documents. You can do:
cd /home/alex
tar czf documents.tar.gz documents
Inside documents.tar.gz, paths start at documents/ rather than carrying the full home/alex/documents/ directory chain.
You can achieve the same result using -C to change directory only for tar:
tar czf documents.tar.gz -C /home/alex documents
Here -C /home/alex tells tar to internally change to /home/alex before adding documents. This is particularly useful in scripts where you want explicit control of paths without manually changing directories.
Rule: Use -C to control the root of paths inside the archive. This makes archives more portable and avoids saving long absolute paths when you do not need them.
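The two approaches can be checked against each other. This sketch recreates a stand-in for /home/alex under a temporary directory (the layout and file name are assumptions for illustration), builds one archive with cd and one with -C, and compares the stored names.

```shell
work=$(mktemp -d)
mkdir -p "$work/home/alex/documents" "$work/out"
echo 'report' > "$work/home/alex/documents/report.txt"

# Variant 1: change directory in a subshell, then archive.
(cd "$work/home/alex" && tar czf "$work/out/with_cd.tar.gz" documents)

# Variant 2: let tar change directory itself with -C.
tar czf "$work/out/with_C.tar.gz" -C "$work/home/alex" documents

# Compare the member names stored by each variant.
tar tzf "$work/out/with_cd.tar.gz" | sort > "$work/out/with_cd.list"
tar tzf "$work/out/with_C.tar.gz"  | sort > "$work/out/with_C.list"
diff "$work/out/with_cd.list" "$work/out/with_C.list" && echo "same member names"
```

Using absolute paths for the archive file itself, as here, avoids any doubt about where the output lands when -C is involved.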
Excluding Files and Directories
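Often you want to exclude certain files or directories from the archive, such as cache directories or large temporary files. The --exclude option lets you omit paths that match a pattern. Put --exclude before the paths it should apply to, because some GNU tar versions treat it as position-sensitive and may warn about or ignore an exclude given after the paths.
For example, to exclude everything under .cache when archiving your home directory:
tar czf home_backup.tar.gz --exclude='/home/alex/.cache' /home/alex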
Patterns can use wildcards. To exclude all .tmp files within the directory you are archiving:
tar czf project.tar.gz --exclude='*.tmp' project
Patterns are matched against the path names as tar reads them, so they depend on how you named the paths on the command line. If you are using relative paths, adjust your patterns accordingly. For more complex exclusions, you can also provide a file that lists patterns (with --exclude-from), but for many backups a small number of --exclude options is enough.
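Because exclude patterns are easy to get subtly wrong, it is worth listing the archive afterwards to confirm what was left out. The sketch below uses a made-up project tree with one temporary file and a cache directory, and places the --exclude options before the path being archived.

```shell
work=$(mktemp -d)
cd "$work"

# Hypothetical tree: one real file, one .tmp file, one cache directory.
mkdir -p project/.cache project/src
echo 'code'    > project/src/app.py
echo 'scratch' > project/src/build.tmp
echo 'cached'  > project/.cache/blob

# Excludes come first, then the path they apply to.
tar czf project.tar.gz --exclude='*.tmp' --exclude='project/.cache' project

# Confirm that the excluded entries are absent.
tar tzf project.tar.gz | sort
```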
Verifying and Listing tar Archives
After creating an archive, you might want to check which files it contains, without extracting it. The t option lists the contents of a tar archive.
To list a plain archive:
tar tf backup.tar
To list a gzip compressed archive:
tar tzf backup.tar.gz
For bzip2 and xz compressed archives, use j and J respectively:
tar tjf backup.tar.bz2
tar tJf backup.tar.xz
Adding v will show more details, such as permissions and timestamps, similar to ls -l:
tar tvf backup.tar
Listing is a simple way to verify that the files and directories you expect are present in the archive and that the paths look correct.
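In a script, the listing can be combined with grep to check for a file you consider essential. This is a sketch with a hypothetical file, documents/taxes/return.txt, standing in for whatever must be present in your own backups.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p documents/taxes
echo '2023' > documents/taxes/return.txt
tar czf backup.tar.gz documents

# Detailed listing, similar to ls -l.
tar tvzf backup.tar.gz

# grep -x matches the whole line, so only the exact member name counts.
if tar tzf backup.tar.gz | grep -qx 'documents/taxes/return.txt'; then
    echo "expected file is present"
else
    echo "expected file is MISSING" >&2
fi
```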
Using tar for Simple Backups
As part of a basic backup strategy, tar is often used to create periodic snapshots of important directories. For example, a daily backup of a user’s home directory might look like this:
tar czf /backups/alex-$(date +%F).tar.gz --exclude='/home/alex/.cache' /home/alex
Here $(date +%F) produces a date string in the form YYYY-MM-DD, so each backup file has a unique, date-based name.
You can also archive specific application data directories, configuration directories, or project trees. For example, to back up a web application directory:
tar cJf /backups/webapp-$(date +%F).tar.xz -C /var/www webapp
This uses xz compression to minimize size and -C to store paths relative to /var/www.
Rule: When using tar for backups, always test archives by listing or extracting them on a noncritical system or test location. Do not assume a backup is valid until you have confirmed it.
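A dated backup and an immediate readability check can be combined in a few lines. The sketch below is not a complete backup script. It uses temporary directories as stand-ins for /home and /backups, and the variable names src_parent, backup_dir, and archive are hypothetical.

```shell
src_parent=$(mktemp -d)   # stands in for /home
backup_dir=$(mktemp -d)   # stands in for /backups

mkdir -p "$src_parent/alex/.cache" "$src_parent/alex/documents"
echo 'keep' > "$src_parent/alex/documents/plan.txt"
echo 'skip' > "$src_parent/alex/.cache/tmpfile"

archive="$backup_dir/alex-$(date +%F).tar.gz"

# Create the archive, then read it back in full; report failure of either step.
if tar czf "$archive" --exclude='alex/.cache' -C "$src_parent" alex \
   && tar tzf "$archive" > /dev/null; then
    echo "backup written and readable: $archive"
else
    echo "backup failed: $archive" >&2
fi
```

Listing only proves the archive can be read end to end; a trial extraction to a scratch location is still the stronger test.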
Testing Restoration without Overwriting
Even though extraction is covered elsewhere, it is helpful to understand a safe way to test an archive you created. You can extract an archive to a temporary directory so it does not overwrite existing files.
For example:
mkdir /tmp/test_restore
tar xzf backup.tar.gz -C /tmp/test_restore
This lets you confirm that the archive contains what you need and that paths and permissions make sense, without changing your real data.
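To go one step further than inspecting by eye, you can compare the restored tree with the original using diff -r. This sketch uses mktemp -d for the restore location instead of a fixed name, and the sample file is made up for illustration.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p documents/letters
echo 'dear reader' > documents/letters/one.txt
tar czf backup.tar.gz documents

# Extract into a fresh scratch directory, away from the real data.
restore=$(mktemp -d)
tar xzf backup.tar.gz -C "$restore"

# diff -r compares the two trees recursively and is silent when they match.
diff -r documents "$restore/documents" && echo "restored copy matches the original"
```

Note that diff -r compares file contents only, so ownership and permissions still need a look with ls -l or tar tvf.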
Using tar in Pipelines
A common advanced usage is piping a tar archive through other tools. Even if you are a beginner, you will sometimes see or need commands that use tar with ssh or other tools.
To create a tar archive and simultaneously compress it with a different tool, you can use standard input and output. For example, if you want to use gzip explicitly:
tar cf - /home/alex/documents | gzip > documents.tar.gz
Here - as the filename tells tar to write the archive to standard output. The pipe | sends it to gzip, and the redirection > writes the compressed data into documents.tar.gz.
Similarly, you can stream an archive over the network, but that belongs in a broader networking context. The key idea is that tar can read from and write to standard streams, which makes it flexible for scripting and remote backups.
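The same stream idea works with tar on both ends of a pipe. As a local sketch with made-up file names, the first pipeline below repeats the explicit gzip example and the second copies a directory tree by piping one tar into another that extracts into a different directory.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p documents copy
echo 'memo' > documents/memo.txt

# Write the archive to standard output and compress it with gzip explicitly.
tar cf - documents | gzip > documents.tar.gz
tar tzf documents.tar.gz

# Pipe one tar into another: the second reads the archive from standard
# input (f -) and extracts it under the copy directory.
tar cf - documents | tar xf - -C copy
ls copy/documents
```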
Summary
In this chapter you focused on creating tar archives for backup and archival purposes. You saw how to create plain and compressed archives, how to control the paths inside them, how to exclude unnecessary data, and how to verify your archives. These tools form the core of file-based backup using tar, and they integrate naturally with the broader backup strategies and tools covered elsewhere.