3.2 Storage and Filesystems

Overview

Storage and filesystems in Linux describe how data is physically stored on disks and how it is organized so that the operating system can find and manage it. As an administrator, you need to understand the basic path from a physical device, such as a hard drive or SSD, through partitions, to filesystems, and finally to the directory tree where users see files and directories.

This chapter gives the conceptual background that connects devices, partitions, filesystems, and mounts. Specific details such as how to create partitions, how to mount and unmount, and how to use disk usage tools will be covered in their own sections later.

From Hardware to Files

On a typical Linux system, storage starts from physical hardware. This can be a spinning hard drive, a solid state drive, a USB stick, or a virtual disk provided by a hypervisor in a virtual machine. The Linux kernel detects these devices and exposes them as special device files in the filesystem, usually under /dev. You do not interact with the electronics of the disk directly. Instead, you deal with these device files, which represent the whole disk or parts of it.

On top of the raw device, you usually create partitions. A partition is a logical subdivision of the disk that acts like an independent storage unit. Each partition can hold its own filesystem and can be mounted separately into the Linux directory tree. Once you have a partition, you create a filesystem on it, then you mount that filesystem somewhere, such as / or /home. Only after mounting does the data become visible as normal files and directories.

This layered view can be summarized as:

$$\text{Physical device} \rightarrow \text{Partition} \rightarrow \text{Filesystem} \rightarrow \text{Mounted directory}$$

Linux can also work with more advanced layers such as software RAID or Logical Volume Management, but these advanced abstractions still end up presenting block devices that receive filesystems and mount points in the same basic way.

Block Devices and Device Names

Linux represents disks and similar storage devices as block devices. A block device is a device that reads and writes data in fixed-size blocks instead of as an endless stream of bytes. This matches how disks physically operate, where data is stored in sectors with sizes such as 512 bytes or 4 KiB.
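
As a small illustration of block addressing, the sketch below maps a byte offset to a block number and an offset inside that block. The function name `locate` and the 4 KiB default are assumptions chosen for this example; real devices and filesystems may use other sizes.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes (4 KiB)


def locate(byte_offset, block_size=BLOCK_SIZE):
    """Return (block number, offset inside that block) for a byte offset."""
    if byte_offset < 0:
        raise ValueError("byte offset must be non-negative")
    return divmod(byte_offset, block_size)


print(locate(0))
print(locate(5000))
print(locate(1024, block_size=512))
```

Reading a single byte therefore still involves at least one whole block, which is one reason the kernel caches blocks in memory.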

These block devices appear in /dev with names that follow certain patterns. SATA, SCSI, and USB disks are typically called /dev/sda, /dev/sdb, and so on, with each letter identifying a separate disk. Older IDE disks historically appeared as /dev/hda, /dev/hdb, and so on, although current kernels usually present them with sd names as well. Partitions on a disk are numbered, such as /dev/sda1 and /dev/sda2. NVMe SSDs use names like /dev/nvme0n1, with partitions like /dev/nvme0n1p1.

The important idea is that the kernel gives you a block device file for each disk and partition. Commands and tools that work with storage take these device files as arguments, not the raw hardware address.
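
To make the naming patterns concrete, here is a minimal sketch that splits a device path into its disk name and partition number. The helper `split_device_name` is hypothetical, and it handles only the sd and NVMe patterns mentioned above; other naming schemes exist and are not covered.

```python
import re

# Patterns for two common naming schemes (other schemes are not handled here).
_SD = re.compile(r"^/dev/(sd[a-z]+)(\d+)?$")
_NVME = re.compile(r"^/dev/(nvme\d+n\d+)(?:p(\d+))?$")


def split_device_name(path):
    """Return (disk, partition_number); partition_number is None for a whole disk."""
    for pattern in (_SD, _NVME):
        match = pattern.match(path)
        if match:
            disk, part = match.groups()
            return disk, int(part) if part else None
    raise ValueError(f"unrecognized device name: {path}")


for name in ("/dev/sda", "/dev/sda2", "/dev/nvme0n1", "/dev/nvme0n1p1"):
    print(name, split_device_name(name))
```

Note that NVMe names insert a `p` before the partition number because the disk name itself ends in a digit.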

Linux can also create more abstract block devices, for example, by combining multiple disks into software RAID or creating logical volumes. From the point of view of filesystems, such logical devices behave just like physical block devices. They still get a device file in /dev and can have filesystems created on them.

Partitions and Partition Tables

A partition table is data stored on a disk that describes how the disk is divided into partitions. The partition table format determines how many partitions are possible and how they are described. Linux supports common partition table formats such as MBR and GPT. MBR is limited to four primary partitions and, with 512-byte sectors, to disks of about 2 TiB. Many modern systems use GPT because it supports much larger disks and, by default, up to 128 partitions.

Each entry in the partition table records where the partition starts on the disk, where it ends, and what type it is. The kernel reads this table and provides a separate block device for each partition, such as /dev/sda1 or /dev/sda2. Partitions allow you to separate data logically, for example by keeping system files and user files separate, or by dedicating a partition to swap space.
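
The following sketch shows what reading such entries can look like for the classic MBR format, where four 16-byte entries start at byte 446 of the first sector and the sector ends with the signature bytes 0x55 0xAA. The function `parse_mbr` and the synthetic sector are illustrative assumptions; the example does not read a real disk, and GPT uses a different layout.

```python
import struct

SECTOR_SIZE = 512            # classic sector size assumed by MBR
TABLE_OFFSET = 446           # partition table starts here in sector 0
ENTRY_SIZE = 16              # each of the 4 entries is 16 bytes
# Fields: status, CHS start, type, CHS end, first LBA, sector count.
ENTRY_FORMAT = "<B3sB3sII"


def parse_mbr(sector):
    """Return a list of dicts for the non-empty partition entries in an MBR sector."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    partitions = []
    for index in range(4):
        start = TABLE_OFFSET + index * ENTRY_SIZE
        status, _, ptype, _, first_lba, count = struct.unpack(
            ENTRY_FORMAT, sector[start:start + ENTRY_SIZE]
        )
        if ptype == 0:  # type 0x00 marks an unused entry
            continue
        partitions.append({
            "number": index + 1,
            "type": ptype,
            "first_sector": first_lba,
            "last_sector": first_lba + count - 1,
            "bootable": status == 0x80,
        })
    return partitions


# Build a synthetic sector with one Linux partition (type 0x83) for demonstration.
entry = struct.pack(ENTRY_FORMAT, 0x80, b"\x00" * 3, 0x83, b"\x00" * 3, 2048, 204800)
demo = bytearray(SECTOR_SIZE)
demo[TABLE_OFFSET:TABLE_OFFSET + ENTRY_SIZE] = entry
demo[510:512] = b"\x55\xaa"
print(parse_mbr(bytes(demo)))
```

Each dictionary corresponds to one partition device the kernel would expose, such as /dev/sda1 for entry number 1.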

Some installations avoid partitioning altogether, for example by placing an LVM physical volume or a filesystem directly on a whole disk. For most systems that follow a straightforward layout, however, partitions form the basic units on which filesystems and other layers are placed.

Filesystems as Data Structures

A filesystem is a way to organize data and metadata on a block device so that the operating system can store and retrieve files and directories. It is not just a layout of raw bytes. Instead, it is a complex data structure that defines where directory entries live, where file contents are stored, and how free space is tracked.

Different filesystem types have different internal designs. Common choices on Linux include EXT4, XFS, and Btrfs. Each one solves the same basic problem, laying out files and directories, but with different strategies for performance, reliability, and features. For example, some filesystems focus on simplicity and wide compatibility, while others support advanced features like checksums or snapshots.

When you "create" a filesystem on a partition or other block device, you are writing the initial set of data structures that define that filesystem type. After that, files and directories created within it will follow those rules. The actual commands for creating and checking filesystems depend on the type and are discussed in dedicated sections.

A filesystem sits on top of a block device. Creating a filesystem on a device will destroy any existing data and previous filesystem on that device.

Internally, a filesystem manages two broad categories. File data represents the contents of regular files, such as text, images, or executables. Metadata represents information about these files, including their names, permissions, ownership, sizes, and timestamps. When the system lists a directory or checks file permissions, it is reading metadata. When it reads a document or program, it is reading file data.
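
The distinction can be observed from a program. In this minimal sketch, `os.stat` retrieves metadata without reading the file's contents, while `read` retrieves the file data. The temporary file and its contents are made up for the example.

```python
import os
import stat
import tempfile

# Create a small temporary file so the example does not depend on existing paths.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as handle:
    handle.write(b"hello storage\n")
    path = handle.name

try:
    info = os.stat(path)                 # metadata lookup; contents are not read
    size = info.st_size                  # size in bytes
    mode = stat.filemode(info.st_mode)   # permission string in ls -l style
    with open(path, "rb") as handle:     # reading file data
        content = handle.read()
    print(size, mode, info.st_uid, info.st_mtime)
    print(content)
finally:
    os.remove(path)
```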

Mount Points and the Unified Tree

One of the defining characteristics of Unix-like systems, including Linux, is that all files live in a single directory tree that starts at the root directory /. Instead of giving each disk its own separate letter or visible root, Linux grafts each filesystem into a specific directory. This directory is called a mount point.

Mounting is the act of attaching a filesystem to a directory. Before you mount, the filesystem is not visible. After you mount, the contents of that filesystem appear as files and subdirectories under the chosen mount point. For example, if a filesystem is mounted at /home, then /home/user and other directories under /home live on that filesystem. If another filesystem is mounted at /var, then /var/log and other data under /var live on that second filesystem.
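
The rule that a path belongs to the filesystem mounted at its longest matching mount point can be sketched as follows. The sample text imitates the format of /proc/mounts, but the devices and layout in it are invented for illustration, and the helper names are hypothetical. The sketch also ignores details such as escaped spaces in mount points.

```python
import posixpath

# Sample text in the style of /proc/mounts: device, mount point, type, options, ...
SAMPLE_MOUNTS = """\
/dev/sda2 / ext4 rw,relatime 0 0
/dev/sda3 /home ext4 rw,relatime 0 0
/dev/sdb1 /var xfs rw,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec 0 0
"""


def parse_mounts(text):
    """Return a list of (device, mount_point, fs_type) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[0], fields[1], fields[2]))
    return entries


def filesystem_for(path, entries):
    """Return the entry whose mount point is the longest prefix of path."""
    path = posixpath.normpath(path)
    best = None
    for entry in entries:
        mount_point = entry[1]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[1]):
                best = entry
    return best


entries = parse_mounts(SAMPLE_MOUNTS)
for sample in ("/home/user/notes.txt", "/var/log/syslog", "/etc/fstab"):
    print(sample, "->", filesystem_for(sample, entries))
```

Matching on a trailing slash keeps a path such as /homework from being attributed to the filesystem mounted at /home.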

The root filesystem, mounted on /, is special. It holds the top-level directories such as /etc, /bin, /lib, and so on. Additional filesystems are then mounted under it at specific locations. Linux uses a configuration file, /etc/fstab, to describe which filesystems are mounted automatically at boot, and with which options. That configuration and the process of mounting and unmounting manually are explored separately in the sections that focus on mounting.

Because mounting hides the previous contents of the mount point directory while the filesystem is attached, administrators must choose mount points carefully and avoid mounting on directories that already contain important data.

Types of Filesystems in Linux

Linux works with a variety of filesystem types. Some are intended for internal storage on Linux systems. Others are used mainly for compatibility with other operating systems or removable media.

Native Linux filesystems, such as EXT4, XFS, and Btrfs, integrate with Linux features like permissions, ownership, and symbolic links. They are usually chosen for root partitions and other important system data. They differ in how they handle journaling, scalability, quotas, snapshots, and error detection. These differences become important when you design storage for specific workloads, such as databases or large media archives.

Non-native filesystems, such as FAT32, exFAT, or NTFS, are often used on removable drives and partitions shared with Windows or other operating systems. Linux can usually read and write these filesystems, but they may not support all Unix-style features. For instance, some of them have limited permission models or filename restrictions.

Linux also supports special pseudo filesystems, such as procfs and sysfs, which are mounted at /proc and /sys. These do not store data on a physical disk. Instead, they provide a way for the kernel to expose information and configuration through a filesystem-like interface. Although they behave differently internally, they are still mounted into the directory tree and accessed with normal file operations.

Journaling and Reliability

Many modern filesystems support journaling. Journaling is a technique that records changes in a log before applying them to the main data structures. The purpose is to make recovery from crashes and power failures faster and more reliable.

In a journaling filesystem, modifying a file does not immediately rewrite every affected structure in the main filesystem area. Instead, the change is first described in the journal. Once the change is safely recorded in the journal, the filesystem proceeds to apply it to the main structures. If the system crashes partway through, the filesystem can replay the journal when it is mounted again and complete any pending changes. This reduces the risk of corruption and shortens the time needed for checks.
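
The sequence described above (record in the journal, apply to the main structures, replay after a crash) can be imitated with a toy model. The class `ToyJournalFS` is purely illustrative: it keeps everything in memory and does not reflect how any real filesystem lays out its journal.

```python
class ToyJournalFS:
    """A toy model: journal and main structures stand in for on-disk state."""

    def __init__(self):
        self.journal = []   # committed but possibly unapplied changes
        self.main = {}      # main structures: name -> content

    def write(self, name, content, crash_before_apply=False):
        # Step 1: describe the change in the journal.
        self.journal.append((name, content))
        if crash_before_apply:
            return  # simulated power loss: main structures were never updated
        # Step 2: apply the change to the main structures.
        self._apply_all()

    def _apply_all(self):
        for name, content in self.journal:
            self.main[name] = content
        self.journal.clear()  # applied entries are no longer needed

    def mount(self):
        """Replay journal entries left over from a crash; return how many."""
        replayed = len(self.journal)
        self._apply_all()
        return replayed


fs = ToyJournalFS()
fs.write("a.txt", "one")
fs.write("b.txt", "two", crash_before_apply=True)
print("before replay:", fs.main)
print("entries replayed:", fs.mount())
print("after replay:", fs.main)
```

Recovery only has to look at the journal, not scan the whole filesystem, which is why checks after a crash are short.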

Some filesystems journal only metadata, while others can journal both data and metadata. Journaling all data can increase reliability at the cost of more write operations. Journaling only metadata is a balance between performance and safety. Some native Linux filesystems, such as ext4, let you choose a journaling mode that controls this behavior.

Journaling does not replace backups or checksums. It focuses on internal consistency of the filesystem structure. It does not protect against hardware failures, silent data corruption, or accidental deletion. These risks are addressed by other layers of storage strategy.

Performance Factors in Filesystems

The performance of storage on Linux depends on several interacting factors. The physical characteristics of the device, such as seek time on spinning disks or write amplification on SSDs, impose basic limits. On top of that, the filesystem design, mount options, and workload patterns shape how efficiently data is written and read.

Filesystems may optimize for different things. For example, some are tuned for handling many small files, while others favor very large files. Allocation strategies, directory indexing methods, and caching behavior all influence how quickly the system can find and access data.

The kernel aggressively caches filesystem data and metadata in memory to reduce the number of physical disk operations. When you read a file repeatedly, it is often served from the page cache after the first read. When you write data, it may be buffered in memory and flushed to disk later. This buffering can improve performance but also means that not all writes reach the disk immediately.
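
When a program needs to know that its data has been handed to the storage device, it can ask for that explicitly. The sketch below uses `flush` followed by `os.fsync`; the helper name `durable_write` and the temporary path are assumptions for the example.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes and ask the kernel to push them to the storage device."""
    with open(path, "wb") as handle:
        handle.write(data)          # data lands in Python's user-space buffer
        handle.flush()              # hand the data to the kernel (page cache)
        os.fsync(handle.fileno())   # request that the kernel write it to the device


demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.bin")
durable_write(demo_path, b"important record\n")
with open(demo_path, "rb") as handle:
    print(handle.read())
```

Without the `fsync` call the write would normally still reach the disk, only later, at a time chosen by the kernel.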

An administrator can influence performance by selecting a filesystem appropriate for the workload, choosing mount options that change caching or access-time updates (for example, noatime), and by tuning kernel parameters related to the virtual memory and I/O subsystems. These topics become especially important on busy servers and will connect with system monitoring and performance tuning concepts covered later in the course.

Data Integrity and Checksums

Some modern filesystems include checksums for data and metadata. A checksum is a computed value that summarizes the content of a block. If the block is later read and the checksum does not match, the filesystem can detect that corruption has occurred. In some designs, the filesystem can automatically repair the corrupted block if redundant copies are available, for example via mirroring or parity schemes.

This idea corresponds to a function that maps a block of data to a shorter value:

$$\text{checksum} = f(\text{block data})$$

If the checksum is stored and later recomputed, a mismatch signals that the data has changed in an unexpected way. Filesystems that integrate checksums can detect silent data corruption, which normal journaling filesystems cannot reliably notice because they assume the hardware faithfully stores and returns the bits they wrote.
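
A minimal sketch of this store-and-verify cycle follows. It uses CRC-32 from Python's `zlib` module as a stand-in for $f$; which checksum function a given filesystem actually uses is not specified here, and the helper names are hypothetical.

```python
import zlib


def store_block(data):
    """Return the block together with its checksum, as it might be recorded."""
    return data, zlib.crc32(data)


def verify_block(data, stored_checksum):
    """Recompute the checksum and report whether the block still matches."""
    return zlib.crc32(data) == stored_checksum


block, checksum = store_block(b"A" * 4096)
corrupted = bytearray(block)
corrupted[100] ^= 0x01   # flip a single bit to imitate silent corruption

print("intact block ok:", verify_block(block, checksum))
print("corrupted block ok:", verify_block(bytes(corrupted), checksum))
```

The check only detects the problem. Repairing the block requires a redundant copy from which the correct data can be read.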

There is a cost to checksumming, since computing and verifying these values uses CPU time and storage. The trade-off is improved confidence in data integrity, which can be important for critical systems. Such features are part of a broader strategy that also involves redundancy, monitoring, and good backup practices.

Snapshots and Copy-on-Write Concepts

Some Linux filesystems support copy-on-write behavior and snapshots. Copy-on-write means that when data is modified, the filesystem writes new copies of the affected blocks instead of overwriting them in place. Because the old blocks remain intact, the filesystem can record a snapshot, which is a consistent view of the filesystem at a specific point in time.

A snapshot does not duplicate all data immediately. Instead, the snapshot references existing blocks. Only when blocks are modified later do new copies need to be written. This makes snapshots efficient in terms of space and allows you to keep multiple points in time with relatively low overhead, especially if most data does not change often.
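
The idea that a snapshot shares blocks until they are modified can be sketched with a toy model. `ToyCowVolume` is a hypothetical in-memory structure for illustration only; it omits reference counting, space reclamation, and everything else a real copy-on-write filesystem needs.

```python
class ToyCowVolume:
    """Toy copy-on-write volume: a block map points at immutable stored blocks."""

    def __init__(self):
        self.store = {}       # block id -> bytes; never overwritten in place
        self.block_map = {}   # logical block number -> block id
        self._next_id = 0

    def write(self, logical, data):
        # Always allocate a new stored block instead of overwriting the old one.
        self.store[self._next_id] = data
        self.block_map[logical] = self._next_id
        self._next_id += 1

    def read(self, logical, block_map=None):
        mapping = self.block_map if block_map is None else block_map
        return self.store[mapping[logical]]

    def snapshot(self):
        # A snapshot copies only the map (references), not the block contents.
        return dict(self.block_map)


vol = ToyCowVolume()
vol.write(0, b"old")
vol.write(1, b"same")
snap = vol.snapshot()
vol.write(0, b"new")

print("live view:", vol.read(0), vol.read(1))
print("snapshot view:", vol.read(0, snap), vol.read(1, snap))
print("stored blocks:", len(vol.store))
```

After the second write to block 0, the live view and the snapshot differ in that one block only, and the unchanged block 1 is stored once and shared by both.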

Snapshots are useful for tasks such as backups, testing upgrades, or recovering from accidental changes. You can take a snapshot just before applying a risky update. If something goes wrong, you can roll back to the snapshot state. Support for snapshots depends on the filesystem type and the tools associated with it. The details of creating and managing snapshots will appear in sections that focus on snapshot systems and advanced filesystems.

Logical Organization and Layout

How you divide storage into partitions, filesystems, and mount points is part of system design. Different layouts suit different roles. A desktop system might place everything under a single filesystem for simplicity. A server might separate system files, user home directories, application data, and logs across multiple filesystems with different options, such as enabling quotas or using specific mount flags.

There is no single correct layout, but there are typical conventions. For instance, some administrators keep /var on its own filesystem so that logs or temporary data cannot fill the root filesystem. Others isolate /home so user data can be moved, backed up, or resized independently. Understanding the relationship between physical disks, logical divisions, filesystem types, and mount points lets you make informed decisions instead of relying entirely on installers.
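
One way to check how an existing layout is divided is to compare the device IDs that two paths report. The sketch below is a minimal example under that assumption; the helper `same_filesystem` is hypothetical, and the demonstration uses a temporary directory rather than real system paths.

```python
import os
import tempfile


def same_filesystem(path_a, path_b):
    """Return True when both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


# Demonstration with a temporary directory and a file created inside it.
demo_root = tempfile.mkdtemp()
demo_file = os.path.join(demo_root, "example.log")
with open(demo_file, "w") as handle:
    handle.write("log line\n")

print(same_filesystem(demo_root, demo_file))
```

On a real system, calling `same_filesystem("/", "/var")` would indicate whether /var has been given its own filesystem.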

Over time, you may add new disks or move data to different filesystems. Linux lets you do this without changing the overall appearance of the directory tree. You can attach new storage at any suitable mount point, migrate data, and then unmount or repurpose old storage. This flexibility is one of the strengths of the Unix-style approach.

Storage in Virtual and Networked Environments

In modern environments, storage is often abstracted beyond local physical disks. Virtual machines receive virtual disks from a hypervisor. Containers may use filesystems layered on top of host storage. Network-attached storage provides remote filesystems that are mounted locally as if they were on a local disk.

From the perspective of Linux, many of these advanced arrangements still end with a block device or a network filesystem that is mounted into the directory tree. Whether the underlying storage is a local SSD, a SAN volume, or a shared network filesystem, you usually interact with it through the same mount and filesystem abstractions.

This layering allows administrators to apply the same basic tools and concepts. You still need to understand how devices map to filesystems, how mount points are organized, and how choices of filesystem type and layout affect performance and reliability. Subsequent chapters on network services and virtualization will build on this understanding.

Summary

Storage and filesystems in Linux form a stack of layers that convert raw physical devices into structured, accessible files and directories. Block devices represent disks. Partition tables divide them into partitions. Filesystems impose order on these partitions. Mount points integrate each filesystem into a single unified directory tree starting at /.

Filesystem types differ in features like journaling, performance characteristics, checksums, and snapshot support. The choices you make about how to partition disks, which filesystems to use, and how to arrange mount points have long-term effects on maintainability, safety, and efficiency. The remaining sections in this part of the course will focus in detail on working with devices and partitions, specific filesystem types, mounting and unmounting, disk usage analysis, and common tools for archiving and compression.

3.2.1 Devices and partitions

3.2.2 Filesystems (EXT4, XFS, Btrfs)

3.2.3 Mounting and unmounting

3.2.4 Disk usage tools

3.2.5 Archiving and compression (tar, gzip, xz)