Kahibaro

3.6.3 Creating tar archives

Introduction

In Linux, tar is the classic tool for bundling many files and directories into a single archive file. The name comes from “tape archive,” but today it is used for backups, packaging, and transfers. In this chapter you will focus on creating and handling tar archives, including compressed variants, as part of a simple backup strategy.

What a tar Archive Is

A tar archive is a single file that contains many files and directories, along with their paths, permissions, timestamps, and basic metadata. It does not compress by itself. Compression is usually added by combining tar with tools such as gzip, bzip2, or xz.

A plain tar file typically has the extension .tar. When compressed, you often see .tar.gz, .tgz, .tar.bz2, or .tar.xz.

Important: A tar archive is a container. It preserves file structure and metadata inside one file. Compression is optional and is added on top of tar, not instead of it.
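The following small sketch shows the layering in practice. It assumes GNU tar and gzip on a Linux system and uses a throwaway directory with made-up file names. It builds a plain archive, compresses it in a separate step, and then checks that decompressing the result gives back the identical tar file.

```bash
set -eu

# Scratch area; everything below is deleted when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Made-up sample files.
mkdir -p demo/sub
echo "hello" > demo/a.txt
echo "world" > demo/sub/b.txt

# Step 1: the container. No compression, just files plus metadata.
tar cf demo.tar demo

# Step 2: compression layered on top, done by a separate program.
gzip -c demo.tar > demo.tar.gz

# Removing only the compression layer yields the same bytes as demo.tar.
gzip -dc demo.tar.gz | cmp - demo.tar && echo "identical tar stream"

# tar can read that decompressed stream from standard input.
gzip -dc demo.tar.gz | tar tf -
```

In everyday use, tar's compression options perform step 2 for you. This sketch only makes the two layers visible.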

Basic tar Command Structure

The tar command has a consistent pattern when creating archives. The general structure for creating an archive is:

$$
\texttt{tar}\ [\textit{options}]\ \texttt{-f}\ \textit{archive-name}\ \textit{file-or-directory}\ \ldots
$$

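You typically use c to create a new archive and f followed by the archive filename. You can optionally add v for verbose output and a compression option such as z, j, or J.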

A simple example to create an archive from a directory called project:

```bash
tar cf project.tar project
```

Here c means create, f means use project.tar as the archive file, and project is what you are archiving.

Creating a Simple tar Archive

To create a plain, uncompressed archive of a directory:

```bash
tar cf backup.tar /home/alex/documents
```

This command will create backup.tar in the current working directory, containing the /home/alex/documents directory and everything inside it.

If you want to see which files are being added during creation, add the v option for verbose output:

```bash
tar cvf backup.tar /home/alex/documents
```

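Inside the archive, paths are stored as you pass them, with one exception. By default, GNU tar removes the leading / from absolute paths and prints the notice "Removing leading `/' from member names". Archiving /home/alex/documents therefore stores names such as home/alex/documents/..., which later extract relative to the current directory instead of overwriting the original location. The -P option keeps the leading slash. If you use relative paths, such as documents from within /home/alex, the archive stores documents/... instead.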

Rule: Use v only for human inspection; it is not required for a correct archive. A common combination for creation is cvf, but the only required options are c and f.
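One way to see how names are stored is to archive the same directory twice, once by absolute path and once by relative path, and then list both archives. This sketch assumes GNU tar, which by default strips the leading / from absolute names and reports this on standard error. The directory layout is made up and lives in a temporary directory.

```bash
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/documents"
echo "notes" > "$workdir/documents/notes.txt"

# Absolute argument: names are stored without the leading "/".
tar cf "$workdir/abs.tar" "$workdir/documents"
echo "absolute argument:"
tar tf "$workdir/abs.tar"

# Relative argument: run tar from the parent so names start at documents/.
cd "$workdir"
tar cf rel.tar documents
echo "relative argument:"
tar tf rel.tar
```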

Adding Multiple Files and Directories

You can add several paths in one command. For example:

```bash
tar cf work_backup.tar reports spreadsheets notes.txt
```

Here tar includes the reports directory, the spreadsheets directory, and the single file notes.txt. You can mix directories and individual files freely.

If you are in /home/alex and want selected items from there:

```bash
cd /home/alex
tar cf selected.tar documents pictures todo.txt
```

The archive will contain documents, pictures, and todo.txt as relative paths starting from the directory where you ran tar.

Compression with gzip

Although tar itself does not compress, it is commonly used together with compression. The most common combination is tar with gzip. With GNU tar, you can request gzip compression with the z option.

To create a gzip compressed archive:

```bash
tar czf backup.tar.gz /home/alex/documents
```

or, with verbose output:

```bash
tar czvf backup.tar.gz /home/alex/documents
```

The z option tells tar to run the data through gzip during archive creation. The resulting .tar.gz file is sometimes shortened to .tgz, which is simply a different filename extension for the same format.

If you prefer the shorter extension:

```bash
tar czf backup.tgz /home/alex/documents
```

This behaves identically. Only the name differs.

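Rule: c means create, f gives the archive filename, and z enables gzip compression. In the dash-less form used here, the letters can appear in any order (czf and zcf are equivalent) because the archive filename is simply the next word. With a leading dash, keep f last, as in tar -czf backup.tar.gz ..., since -fcz would read cz as the archive filename.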
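The sketch below assumes GNU tar, which accepts old-style, dash-style and long options. It makes the same gzip archive of a made-up project directory in each style and leaves the dash-style pitfall as a comment.

```bash
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir project
echo "data" > project/file.txt

# Old (dash-less) style: the letters may come in any order.
tar czf one.tar.gz project
tar zcf two.tar.gz project

# Dash style: f goes last, because the next word is its argument.
tar -czf three.tar.gz project

# GNU long options make the same request explicit.
tar --create --gzip --file=four.tar.gz project

# Pitfall, not run here: in "tar -fcz backup.tar.gz project",
# -f takes "cz" as the archive name.

for f in one.tar.gz two.tar.gz three.tar.gz four.tar.gz; do
  tar tzf "$f"
done
```

The long-option form is more verbose, but it can make scripts easier to read later.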

Compression with bzip2 and xz

For better compression ratios, you can request bzip2 or xz compression. These are slower than gzip, especially xz, but often produce smaller archives.

To use bzip2 compression, use the j option and usually the .tar.bz2 extension:

```bash
tar cjf backup.tar.bz2 /home/alex/documents
```

To use xz compression, use the J option and usually the .tar.xz extension:

```bash
tar cJf backup.tar.xz /home/alex/documents
```

Both commands parallel the gzip example, with only the compression option and file extensions changed.
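To compare the methods on your own data, you could archive the same directory with each one and look at the sizes. This sketch assumes GNU tar and uses made-up, highly repetitive sample data. It tries bzip2 and xz only if those programs are installed. Exact sizes depend on the data and on tool versions, so the script prints them rather than predicting them.

```bash
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Made-up sample data that compresses well.
mkdir documents
seq 1 200000 > documents/numbers.txt

tar cf  backup.tar    documents   # no compression, for reference
tar czf backup.tar.gz documents   # gzip (z)

# bzip2 and xz are not installed everywhere, so use them only if present.
if command -v bzip2 > /dev/null; then tar cjf backup.tar.bz2 documents; fi
if command -v xz > /dev/null; then tar cJf backup.tar.xz documents; fi

# Print each archive's size in bytes.
for f in backup.tar*; do
  printf '%-16s %s bytes\n' "$f" "$(wc -c < "$f")"
done
```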

Archiving from the Correct Directory

Where you run tar from affects the paths inside the archive. For clean, portable archives, it is often useful to change into a directory first, then archive subdirectories or specific files.

Suppose you want an archive that contains only documents without the entire absolute path /home/alex/documents. You can do:

bash
cd /home/alex
tar czf documents.tar.gz documents

Inside documents.tar.gz, paths start at documents/ rather than /home/alex/documents/.

You can achieve the same result using -C to change directory only for tar:

bash
tar czf documents.tar.gz -C /home/alex documents

Here -C /home/alex tells tar to internally change to /home/alex before adding documents. This is particularly useful in scripts where you want explicit control of paths without manually changing directories.

Rule: Use -C to control the root of paths inside the archive. This makes archives more portable and avoids saving long absolute paths when you do not need them.
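The following sketch assumes GNU tar and uses a hypothetical stand-in for /home/alex inside a temporary directory. It shows that -C changes the names stored in the archive while the script's own working directory stays where it was.

```bash
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical stand-in for /home/alex inside the scratch directory.
user_home="$workdir/home/alex"
mkdir -p "$user_home/documents"
echo "draft" > "$user_home/documents/letter.txt"

start_dir=$PWD

# -C changes directory for tar only, before it reads "documents".
tar czf "$workdir/documents.tar.gz" -C "$user_home" documents

tar tzf "$workdir/documents.tar.gz"    # names begin with documents/
echo "shell is still in: $PWD"
```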

Excluding Files and Directories

Often you want to exclude certain files or directories from the archive, such as cache directories or large temporary files. The --exclude option lets you omit paths that match a pattern.

For example, to exclude everything under .cache when archiving your home directory:

```bash
tar czf home_backup.tar.gz --exclude='/home/alex/.cache' /home/alex
```

Patterns can use wildcards. To exclude all .tmp files within the directory you are archiving:

```bash
tar czf project.tar.gz --exclude='*.tmp' project
```

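Patterns are compared with path names as tar walks them, starting from the paths you gave on the command line. When a pattern includes a directory part, write it in the same form as your arguments: absolute for absolute arguments, relative for relative ones. With current GNU tar, place --exclude options before the files and directories they should apply to, because exclusions given after them may be ignored with a warning. For more complex exclusions, you can list patterns in a file and pass it with --exclude-from (short form -X). For many backups, though, a small number of --exclude options is enough.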
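The sketch below assumes GNU tar and a made-up project layout. It excludes temporary files with a single pattern, then uses a pattern file to drop both the temporary files and a build directory.

```bash
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Made-up project with a temp file and a build directory.
mkdir -p project/build
echo "source" > project/main.c
echo "scratch" > project/notes.tmp
echo "object" > project/build/main.o

# Exclusion options come before the paths they should affect.
tar czf project.tar.gz --exclude='*.tmp' project

# Several patterns can live in a file, one per line, read with --exclude-from (-X).
printf '%s\n' '*.tmp' 'project/build' > exclude.txt
tar czf project-small.tar.gz --exclude-from=exclude.txt project

tar tzf project-small.tar.gz
```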

Verifying and Listing tar Archives

After creating an archive, you might want to check which files it contains, without extracting it. The t option lists the contents of a tar archive.

To list a plain archive:

```bash
tar tf backup.tar
```

To list a gzip compressed archive:

```bash
tar tzf backup.tar.gz
```

For bzip2 and xz compressed archives, use j and J respectively:

```bash
tar tjf backup.tar.bz2
tar tJf backup.tar.xz
```

Adding v will show more details, such as permissions and timestamps, similar to ls -l:

```bash
tar tvf backup.tar
```

Listing is a simple way to verify that the files and directories you expect are present in the archive and that the paths look correct.
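Listing can also serve as a simple automated check. The sketch below assumes GNU tar and gzip. It first asks gzip to test the compressed stream (gzip -t exits with a non-zero status if the data is damaged). It then looks for one expected member name. The archive and the file it checks for are made up for the example.

```bash
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Made-up archive to check.
mkdir documents
echo "report" > documents/report.txt
tar czf backup.tar.gz documents

# 1. Is the gzip layer intact? A non-zero status stops the script (set -e).
gzip -t backup.tar.gz

# 2. Is an expected file present? -x matches whole lines, -F disables regex.
if tar tzf backup.tar.gz | grep -xF 'documents/report.txt' > /dev/null; then
  echo "OK: documents/report.txt is in the archive"
else
  echo "MISSING: documents/report.txt" >&2
  exit 1
fi
```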

Using tar for Simple Backups

As part of a basic backup strategy, tar is often used to create periodic snapshots of important directories. For example, a daily backup of a user’s home directory might look like this:

```bash
tar czf /backups/alex-$(date +%F).tar.gz --exclude='/home/alex/.cache' /home/alex
```

Here $(date +%F) produces a date string in the form YYYY-MM-DD, so each day's backup gets its own date-based name. Running the command twice on the same day overwrites that day's file.

You can also archive specific application data directories, configuration directories, or project trees. For example, to back up a web application directory:

```bash
tar cJf /backups/webapp-$(date +%F).tar.xz -C /var/www webapp
```

This uses xz compression to minimize size and -C to store paths relative to /var/www.

Rule: When using tar for backups, always test archives by listing or extracting them on a noncritical system or test location. Do not assume a backup is valid until you have confirmed it.
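Putting these pieces together, a backup step can check its own result before anyone relies on it. Here is a minimal sketch, assuming GNU tar and gzip. SOURCE_PARENT, SOURCE_NAME and BACKUP_DIR are hypothetical variable names, and their defaults point into a scratch directory so the sketch is safe to run as-is. A real setup might use /home, alex and /backups instead.

```bash
set -eu

# Scratch area so the sketch is safe to run as-is.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical settings; in a real setup these might be /home, alex and /backups.
SOURCE_PARENT="$workdir/home"
SOURCE_NAME="alex"
BACKUP_DIR="$workdir/backups"

# Made-up data, including a cache directory we do not want to keep.
mkdir -p "$SOURCE_PARENT/$SOURCE_NAME/.cache" "$BACKUP_DIR"
echo "keep me" > "$SOURCE_PARENT/$SOURCE_NAME/notes.txt"
echo "skip me" > "$SOURCE_PARENT/$SOURCE_NAME/.cache/blob"

archive="$BACKUP_DIR/$SOURCE_NAME-$(date +%F).tar.gz"

# Create: exclusion first, then -C so member names start at "alex/".
tar czf "$archive" --exclude="$SOURCE_NAME/.cache" -C "$SOURCE_PARENT" "$SOURCE_NAME"

# Verify before trusting the backup: intact gzip layer, readable listing.
gzip -t "$archive"
tar tzf "$archive" > /dev/null
echo "backup written and checked: $archive"
```

These checks confirm only that the archive is readable. A trial extraction, as in the next section, gives stronger assurance.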

Testing Restoration without Overwriting

Even though extraction is covered elsewhere, it is helpful to understand a safe way to test an archive you created. You can extract an archive to a temporary directory so it does not overwrite existing files.

For example:

```bash
mkdir -p /tmp/test_restore
tar xzf backup.tar.gz -C /tmp/test_restore
```

This lets you confirm that the archive contains what you need and that paths and permissions make sense, without changing your real data.
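You can go one step further and compare the restored tree with the original. This sketch assumes GNU tar, GNU mktemp and diff from diffutils, and uses made-up data. Note that diff -r compares file contents and directory structure only, not permissions or ownership. For those, inspect tar tvzf output or ls -l.

```bash
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Made-up original data and its backup.
mkdir -p documents/sub
echo "one" > documents/a.txt
echo "two" > documents/sub/b.txt
tar czf backup.tar.gz documents

# Extract into a new, empty directory so nothing real can be overwritten.
restore_dir=$(mktemp -d "$workdir/restore.XXXXXX")
tar xzf backup.tar.gz -C "$restore_dir"

# diff -r compares the two directory trees file by file (contents only).
if diff -r documents "$restore_dir/documents"; then
  echo "restore matches the original"
else
  echo "restore differs from the original" >&2
  exit 1
fi
```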

Using tar in Pipelines

A common advanced usage is piping a tar archive through other tools. Even if you are a beginner, you will sometimes see or need commands that use tar with ssh or other tools.

To compress a tar archive with a separately run program, you can write the archive to standard output and pipe it onward. For example, to run gzip explicitly:

```bash
tar cf - /home/alex/documents | gzip > documents.tar.gz
```

Here - as the filename tells tar to write the archive to standard output. The pipe | sends it to gzip, and the redirection > writes the compressed data into documents.tar.gz.

Similarly, you can stream an archive over the network, but that belongs in a broader networking context. The key idea is that tar can read from and write to standard streams, which makes it flexible for scripting and remote backups.
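The local sketch below assumes GNU tar and gzip and uses made-up data. It shows both directions of streaming: tar writing to a pipe and tar reading from one. It also connects two tar processes to copy a directory tree.

```bash
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir documents
echo "text" > documents/file.txt

# tar writes to stdout (-f -); gzip runs separately with its own option (-9 = best compression).
tar cf - documents | gzip -9 > documents.tar.gz

# The reverse direction: gzip decompresses to stdout, tar reads stdin (-f -).
gzip -dc documents.tar.gz | tar tf -

# Streams also connect two tar processes: copy a tree into another directory.
mkdir copy
tar cf - documents | tar xf - -C copy
```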

Summary

In this chapter you focused on creating tar archives for backup and archival purposes. You saw how to create plain and compressed archives, how to control the paths inside them, how to exclude unnecessary data, and how to verify your archives. These tools form the core of file-based backup using tar, and they integrate naturally with the broader backup strategies and tools covered elsewhere.
