
3.2.4 Disk Usage Tools

Introduction

Disk usage tools help you understand how storage is being used on your Linux system. They answer questions such as which directories are largest, how full a filesystem is, and where space is being consumed. This chapter focuses on the most common command line tools that report disk and directory usage and how to interpret their output.

The `df` Command: Filesystem Level Usage

The df command reports disk usage at the filesystem level. It tells you how much space is used and available on each mounted filesystem, not in individual directories.

Run it in its simplest form with:

df

The default output is usually in 1-kilobyte blocks, which can be hard to read. Use the -h option to show “human readable” sizes, that is, with units like K, M, and G:

df -h

Important columns typically include the filesystem device, its total size, used space, available space, and the mount point where it is accessible in the directory tree.
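For example, the output might look like this (the devices and values are illustrative):

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        50G   32G   16G  67% /
tmpfs           3.9G  1.2M  3.9G   1% /run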

If you want to see how full a particular filesystem is, you can specify the mount point or a path inside it:

df -h /
df -h /home/user

Both commands report usage information for the filesystem that contains the specified path.

To focus on actual disk filesystems and skip pseudo filesystems such as /proc and /sys, use:

df -h -x tmpfs -x devtmpfs

This excludes specific filesystem types that do not represent regular on-disk storage.
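Conversely, GNU df can restrict the listing to specific filesystem types with -t, which may be given more than once:

df -h -t ext4 -t xfs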

Always check filesystem usage with df -h before performing operations that may use a lot of space, such as large file copies, backups, or database imports.
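As a minimal sketch, a script could refuse to start such an operation when free space is low, using GNU df's --output option; the /var path and the 5 GiB threshold here are arbitrary examples:

#!/bin/sh
# Refuse to continue if the filesystem holding /var has under 5 GiB available.
avail_kb=$(df --output=avail -k /var | tail -n 1 | tr -d ' ')
if [ "$avail_kb" -lt $((5 * 1024 * 1024)) ]; then
    echo "Not enough free space on /var" >&2
    exit 1
fi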

The `du` Command: Directory and File Level Usage

The du command shows disk usage for directories and files. Unlike df, which works at the filesystem level, du walks the directory tree and reports the space used by each subdirectory and file it encounters.

Running du without options in the current directory prints many lines, one per subdirectory:

du

This is rarely what you want. To see a human readable summary of the current directory, use:

du -sh .

Here -s produces a summary for the given path, and -h prints sizes with units. You can list several directories at once:

du -sh /var /home /tmp

To see the size of each immediate subdirectory, use:

du -sh ./*

This is useful for quickly identifying which subdirectory under the current directory uses the most space. Note that the ./* glob does not match hidden entries, that is, names beginning with a dot.

Sometimes you need more detail. If you want a tree of directory sizes from a path downward, try:

du -h /var

The output lists each subdirectory and its cumulative size. This can be very long, so it is common to combine it with sort to find the largest entries, which will be discussed later.

du reports the actual space occupied on disk, which can differ from the apparent file size when files are sparse (contain holes) or the filesystem compresses data. The exact behavior varies between filesystems.
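If you want the apparent size instead, GNU du supports --apparent-size (the filename here is hypothetical):

du -h --apparent-size myfile.img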

Comparing `df` and `du`

A frequent point of confusion is that df and du often report different numbers.

df shows total used and available space at the filesystem level. It counts everything the filesystem considers allocated, including reserved blocks and sometimes deleted files that are still open by running processes.

du shows the aggregate sizes of files and directories you can see, based on walking the directory tree. It only counts what it can access through the filesystem hierarchy.

Because of this difference, it is common for df -h / to show more used space than the sum of du -sh /* appears to account for. Reserved administrative space (ext filesystems reserve about 5 percent for root by default), filesystem metadata, and recently deleted but still open files can all contribute to the difference.
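If lsof is installed, you can list deleted files that are still held open by processes; the +L1 option selects open files with a link count of less than one:

lsof +L1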

Finding Large Directories with `du` and `sort`

When a filesystem starts to fill, you often want to find which directories are using the most space. A standard pattern combines du and sort.

To list the largest directories at the top level of the root filesystem:

du -xh / | sort -h | tail -n 20

Here -x tells du to stay on one filesystem so it does not cross mount points, and -h keeps sizes human readable. sort -h sorts by size with units, and tail -n 20 shows the 20 largest entries.

For a faster, higher level overview, you can restrict du to one directory depth. Many du implementations support --max-depth:

du -h --max-depth=1 / | sort -h

This shows sizes of immediate subdirectories under /. You can then drill down into a specific directory by repeating the same pattern:

du -h --max-depth=1 /var | sort -h

This step by step approach helps you locate large areas systematically without being overwhelmed by detail.

Use du and sort together to locate large directories before deleting anything. Always examine contents and confirm that data is safe to remove. Do not delete directories such as /var/log or /usr blindly because they appear large.

Human Readable Sizes and Units

Most disk usage tools have an option to present sizes in human readable form. For df and du this is usually -h, which uses powers of 1024. df also supports -H (equivalent to --si), which uses powers of 1000 instead; GNU du offers the same behavior via --si.
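The difference becomes visible on larger filesystems; for the same filesystem, the two options might report:

df -h /   # e.g. 50G, meaning 50 GiB
df -H /   # e.g. 54G, meaning 54 GB (powers of 1000)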

In binary units, the following relationships hold:

$$1 \text{ KiB} = 1024 \text{ bytes}$$
$$1 \text{ MiB} = 1024 \text{ KiB} = 1024^2 \text{ bytes}$$
$$1 \text{ GiB} = 1024 \text{ MiB} = 1024^3 \text{ bytes}$$

Understanding these units helps you interpret output precisely, especially when comparing disk usage against sizes reported by hardware vendors or other tools.
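For example, a drive marketed as 500 GB in decimal units corresponds to a smaller number of binary units:

$$500 \text{ GB} = 500 \times 10^9 \text{ bytes} \approx 465.7 \text{ GiB}$$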

Inspecting Individual Files with `stat`

While du focuses on directories and aggregates, you can check detailed information for a specific file with the stat command.

For example:

stat largefile.img

stat displays the apparent size in bytes and the number of blocks used on disk. Very large files that use fewer blocks than expected may be sparse files, where not all ranges contain actual data. In such cases, ls -lh might show a large size, while du -h reports a smaller disk usage.
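You can observe this with a sparse file created by truncate (sparse.img is a hypothetical filename):

truncate -s 1G sparse.img   # creates a 1 GiB sparse file without writing data
ls -lh sparse.img           # shows the apparent size, 1.0G
du -h sparse.img            # shows actual disk usage, typically 0
stat sparse.img             # Size is 1073741824 bytes, but Blocks is 0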

This distinction is important when you are trying to understand why df reports a disk as full but individual files appear smaller than expected.

Monitoring Space Usage with `ncdu` (If Available)

On many systems you can install a text-based interactive disk usage viewer called ncdu. It is not installed by default on every distribution, but when available it provides a more user-friendly way to explore disk usage.

After installation, you can launch it on a directory such as / or /home:

ncdu /

ncdu scans the directory tree, then presents a list of directories and files sorted by size. You can navigate with the arrow keys to drill down and see which paths use the most space. It is particularly helpful on servers or in deeply nested directory trees, because it avoids long command chains.

If you use ncdu on a large filesystem, be patient while it performs the initial scan. The result is a concise, navigable overview of where space is going.
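Like du with -x, ncdu accepts -x to stay on the filesystem of the starting directory; press q to quit when you are done:

ncdu -x /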

Inodes and `df -i`

Disk usage is not only about bytes. Filesystems also have a limit on the number of inodes, which are structures that represent files and directories. If you run out of inodes, you cannot create new files even if there is free space in terms of bytes.

To check inode usage, use:

df -i

The output resembles that of df but shows total inodes, used inodes, free inodes, and the usage percentage. Very high inode usage often occurs on filesystems that contain many tiny files. This is common in directories that store caches or temporary data.
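The output might look like this (values are illustrative):

Filesystem      Inodes  IUsed   IFree IUse% Mounted on
/dev/sda2      3276800 412688 2864112   13% /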

Deleting unnecessary files reduces inode usage, but you must ensure that you are not removing important data or directories that belong to the system or applications.

If df -i shows 100% inode usage, do not immediately reformat the filesystem. First track down directories with large numbers of small files and remove only files that are safe to delete, such as known cache or temporary files.
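As a rough sketch, the following shell loop counts files under each immediate subdirectory of /var and prints the five with the most files (adjust the path for your situation):

for d in /var/*/; do
    printf '%s %s\n' "$(find "$d" -xdev -type f | wc -l)" "$d"
done | sort -n | tail -n 5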

Putting It Together in Practical Workflows

When you notice that a filesystem is nearing capacity, a practical routine combines the tools described in this chapter.

You might start by checking overall filesystem usage:

df -h

If a particular filesystem looks nearly full, you can then examine its directory structure. Suppose / is at 90 percent usage. First inspect top level directories:

du -h --max-depth=1 / | sort -h

If /var is one of the largest, look inside it:

du -h --max-depth=1 /var | sort -h

Continue descending into the most significant directories until you locate the main consumers of space. At each level, decide whether data can be archived, rotated, compressed, or removed, using the appropriate tools from elsewhere in the course. For more interactive exploration, you can switch to ncdu / (if installed) and navigate to the same directories.

If df -h shows free space but you still cannot create files, check inode usage:

df -i

If inodes are exhausted, identify directories that contain many small files with repeated du or with ncdu, and clean them appropriately.

By combining filesystem level and directory level tools, you can diagnose and respond to most space related issues in a controlled way.
