3.2 Storage and Filesystems

Why Storage Matters in Linux Administration

In day‑to‑day Linux administration, storage is one of the most critical areas you’ll manage. Everything from user data and databases to logs and backups ultimately lives on some kind of storage. Mismanaging it can lead to:

Full disks that crash services
Data loss or corruption
Slow systems and I/O bottlenecks
Failed backups and restores

This chapter gives you the big picture of storage and filesystems on a Linux system, so the later chapters in this section (devices/partitions, mounting, disk tools, archiving) make sense in context.

You won’t learn specific commands in depth here—that’s for the child chapters—but you’ll understand the concepts they operate on.

Basic Storage Building Blocks

Think of Linux storage as a stack of layers:

Physical storage devices
Partitions and volume managers
Filesystems
Mount points and directory layout
Data and applications

Physical Storage Devices

At the lowest level are devices you plug into servers or virtual machines:

HDDs (Hard Disk Drives)

Spinning platters, mechanical head
Cheaper, larger capacities
Higher latency, slower random access
Good for archives, logs, backups

SSDs (Solid State Drives)

Flash memory, no moving parts
Much faster, especially random I/O
Limited write endurance (but sufficient for most workloads)
Ideal for OS, databases, VMs

NVMe drives

SSDs that attach over PCIe using the NVMe protocol, instead of SATA
Much higher bandwidth and lower latency
Common in newer servers and laptops

Removable media

USB sticks, external drives, SD cards
Used for installation, offline backups, or transferring data

In Linux, these appear as device files, typically under /dev (details are covered in “Devices and partitions”).
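
As a rough illustration of that mapping, the Python sketch below lists the kernel's block devices by reading /sys/block, which on typical Linux systems has one entry per whole disk. The function name list_block_devices is hypothetical, and the idea that every entry has a matching node at /dev/<name> is a simplifying assumption, not a guarantee.

```python
import os


def list_block_devices(sys_block="/sys/block"):
    """Return sorted (name, assumed_device_path) pairs for entries under sys_block.

    Assumption: each entry <name> has a device node at /dev/<name>.
    """
    if not os.path.isdir(sys_block):
        return []  # not Linux, or sysfs is not mounted
    return [(name, "/dev/" + name) for name in sorted(os.listdir(sys_block))]


# Prints nothing on systems without /sys/block.
for name, path in list_block_devices():
    print(name, path)
```

The directory is a parameter so the function can be pointed at a test directory instead of the live system.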

Partitions, LVM, and RAID (Conceptual Overview)

Between the raw device and your filesystem, you often have some abstraction:

Partitions

Divide a physical disk into logical sections
Each partition can hold a filesystem or be part of a volume/RAID
Useful to separate system data from user data or logs

LVM (Logical Volume Manager) (covered in depth later)

Lets you group physical storage into flexible logical volumes
You can resize volumes, create snapshots, and move data between disks

RAID

Combines multiple physical disks for redundancy and/or performance
Levels like RAID 1, 5, 10 have different trade‑offs
Implemented in hardware (RAID controllers) or software (e.g. mdadm)

You don’t need to be an expert yet—just recognize that a filesystem might sit on top of a plain partition, an LVM logical volume, or a RAID array.
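
To make the RAID trade-offs a little more concrete, here is a small sketch of how usable capacity differs between the levels mentioned above. It assumes equally sized disks, two-way mirrors for RAID 10, and no metadata overhead; the function name usable_capacity is illustrative only.

```python
def usable_capacity(level, disk_count, disk_size_tb):
    """Approximate usable capacity in TB for equally sized disks.

    Assumptions: no metadata overhead; RAID 10 uses two-way mirrors.
    """
    if level == 1:
        if disk_count < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_size_tb  # every disk holds a full copy
    if level == 5:
        if disk_count < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disk_count - 1) * disk_size_tb  # one disk's worth of parity
    if level == 10:
        if disk_count < 4 or disk_count % 2:
            raise ValueError("this sketch expects an even count of 4 or more disks")
        return (disk_count // 2) * disk_size_tb  # half the disks hold mirror copies
    raise ValueError("unsupported RAID level in this sketch")


for level in (1, 5, 10):
    print("RAID", level, "with four 4 TB disks:", usable_capacity(level, 4, 4), "TB")
```

With four 4 TB disks this gives 4 TB for RAID 1, 12 TB for RAID 5, and 8 TB for RAID 10, which shows how redundancy is paid for in capacity.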

What Is a Filesystem (in Practice)?

A filesystem is the structure that organizes how data is stored and retrieved on a storage device or volume.

Conceptually, a filesystem provides:

A directory tree: hierarchical structure (/, /home, /var/log, …)
Metadata: ownership, permissions, timestamps, file size, etc.
Data storage: where the actual content of files lives
Allocation strategies: how to place data on disk for performance and reliability
Consistency mechanisms: journaling, checksums, copy‑on‑write, etc.

Different filesystems make different trade‑offs in performance, reliability, features, and complexity. Later in this section you’ll see specific Linux filesystems (like EXT4, XFS, Btrfs), but here we’ll focus on shared concepts.

Key Filesystem Concepts

You’ll see these terms often when dealing with filesystems:

Blocks

Basic unit of storage managed by a filesystem
Typical size: 4 KiB (but can vary)
Files bigger than one block are stored in multiple blocks
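
The block arithmetic can be sketched in a few lines of Python. This assumes a 4 KiB block size and ignores filesystem-specific optimizations such as storing very small files inline or leaving holes in sparse files; the function names are illustrative.

```python
BLOCK_SIZE = 4096  # 4 KiB, the typical size mentioned above


def blocks_needed(file_size, block_size=BLOCK_SIZE):
    """Number of whole blocks required to hold file_size bytes."""
    return (file_size + block_size - 1) // block_size  # integer ceiling


def slack_bytes(file_size, block_size=BLOCK_SIZE):
    """Bytes allocated but unused in the last block."""
    return blocks_needed(file_size, block_size) * block_size - file_size


print(blocks_needed(10_000))  # 3
print(slack_bytes(10_000))    # 2288
```

A 10,000-byte file therefore occupies three blocks (12,288 bytes), which is one reason many tiny files can use more disk space than their combined size suggests.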

Inodes

Data structures storing metadata:

Owner, group
Permissions (rwx)
Timestamps (last access, last modification, last inode change; some filesystems also record creation time)
File size
Pointers to data blocks

Directories map filenames to inode numbers
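
Most of this inode metadata is visible from user space. The sketch below uses Python's os.stat to read it, and creates a hard link to show two directory entries pointing at the same inode. The file names and the helper names describe and hard_link_demo are made up for the example.

```python
import os
import stat
import tempfile


def describe(path):
    """Return selected inode metadata for path."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,
        "mode": stat.filemode(st.st_mode),  # e.g. '-rw-r--r--'
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "size": st.st_size,
        "modified": st.st_mtime,
        "links": st.st_nlink,  # directory entries that refer to this inode
    }


def hard_link_demo():
    """Create a file plus a hard link and return the metadata of both names."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "report.txt")
        alias = os.path.join(d, "report-link.txt")
        with open(original, "w") as f:
            f.write("hello\n")
        os.link(original, alias)  # second filename, same inode
        return describe(original), describe(alias)


info_a, info_b = hard_link_demo()
print(info_a["inode"] == info_b["inode"])  # True
print(info_a["links"])                     # 2
```

Both names report the same inode number and a link count of 2, because the directory only stores the name-to-inode mapping while the metadata lives in the inode itself.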

Journaling

A log of filesystem changes, used to recover after crashes
Reduces risk of corruption and long “fsck” times
Many modern Linux filesystems are journaling filesystems (e.g. EXT4, XFS)
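
The idea behind journaling can be shown with a toy, in-memory model. This is only a conceptual sketch of write-ahead logging, not how EXT4 or XFS implement their journals; the class JournaledStore and its crash flags are invented for illustration.

```python
class JournaledStore:
    """Toy key-value store with a write-ahead journal (in-memory only)."""

    def __init__(self):
        self.data = {}     # stands in for the main on-disk structures
        self.journal = []  # stands in for the on-disk journal

    def write(self, key, value, crash_before_commit=False, crash_before_apply=False):
        entry = {"key": key, "value": value, "committed": False}
        self.journal.append(entry)  # 1. record the intended change
        if crash_before_commit:
            return                  # simulated crash: entry never committed
        entry["committed"] = True   # 2. mark the journal entry as complete
        if crash_before_apply:
            return                  # simulated crash: main data not yet updated
        self.data[key] = value      # 3. apply the change to the main data
        self.journal.pop()          # 4. the entry is no longer needed

    def recover(self):
        """Replay committed entries and discard incomplete ones."""
        for entry in self.journal:
            if entry["committed"]:
                self.data[entry["key"]] = entry["value"]
        self.journal.clear()


store = JournaledStore()
store.write("a", 1)
store.write("b", 2, crash_before_apply=True)
store.write("c", 3, crash_before_commit=True)
store.recover()
print(store.data)  # {'a': 1, 'b': 2}
```

After recovery the committed change to "b" is replayed and the incomplete change to "c" is dropped, so the store ends up in a consistent state without scanning everything.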

Mounting

Attaching a filesystem to a directory (mount point) in the single unified tree
For example, /dev/sdb1 mounted on /data

The chapter “Mounting and unmounting” will handle the practical side; here, you just need to know that a filesystem is “usable” only when it is mounted somewhere.
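
To see how mount points map paths to filesystems, the following sketch parses text in the format used by /proc/mounts (device, mount point, filesystem type, options, and two numeric fields) and finds which mounted filesystem holds a given path. The sample entries, including /dev/sda2 as the root device, are hypothetical, and the parser ignores details such as escaped spaces in mount points.

```python
SAMPLE_MOUNTS = """\
/dev/sda2 / ext4 rw,relatime 0 0
/dev/sdb1 /data xfs rw,noatime 0 0
"""


def parse_mounts(text):
    """Parse /proc/mounts-style text into (device, mount_point, fstype) tuples."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((fields[0], fields[1], fields[2]))
    return mounts


def mount_for_path(path, mounts):
    """Return the entry whose mount point is the longest prefix of path."""
    best = None
    for entry in mounts:
        mount_point = entry[1]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[1]):
                best = entry
    return best


mounts = parse_mounts(SAMPLE_MOUNTS)
print(mount_for_path("/data/projects/a.txt", mounts))  # ('/dev/sdb1', '/data', 'xfs')
print(mount_for_path("/etc/hosts", mounts))            # ('/dev/sda2', '/', 'ext4')
```

On a live Linux system the same functions could be fed the contents of /proc/mounts instead of the sample text.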

The Storage Stack in Linux

It helps to visualize the full path from hardware to files:

$$
\text{Application} \rightarrow \text{File} \rightarrow \text{VFS} \rightarrow \text{Filesystem} \rightarrow \text{Block Layer} \rightarrow \text{Device}
$$

Breaking this down:

Applications

Use system calls like open(), read(), write() via libraries and shells
They work with paths like /var/log/syslog, not devices

Virtual Filesystem (VFS)

Kernel layer that provides a common interface to all filesystems
Makes different filesystem types and devices appear uniform

Filesystem driver

Code in the kernel that knows how to read/write a specific filesystem type (e.g. EXT4 driver, XFS driver)

Block layer

Generic layer for reading/writing fixed‑size blocks from/to block devices

Block device

Physical disk, partition, LVM volume, RAID device, etc.

Understanding this hierarchy is important when troubleshooting: a problem could be at any layer—application, filesystem, device, or hardware.
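
From the application's side, the whole stack is hidden behind a path and a handful of system calls. The sketch below uses Python's thin wrappers around open(), write(), fsync(), and read() to show this; the file name example.log and the function name write_and_read_back are placeholders.

```python
import os
import tempfile


def write_and_read_back(directory, name="example.log", payload=b"service started\n"):
    """Write payload to a file by path, flush it, and read it back."""
    path = os.path.join(directory, name)

    # The application names a path; the kernel decides which filesystem
    # and device are involved.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # ask the kernel to push this file's data down to the device
    finally:
        os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, len(payload))
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as d:
    print(write_and_read_back(d))  # b'service started\n'
```

Nothing in this code mentions EXT4, XFS, partitions, or disks, which is exactly the uniformity the VFS layer is meant to provide.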

Common Storage Use Cases and Layouts

On a real system, you rarely have “one big disk with one filesystem”. Instead, you design a layout that fits your needs.

Typical Server Layout (Conceptual)

A common simple scheme might look like:

/ on one filesystem (OS and tools)
/home on another filesystem (user data)
/var or /var/log on its own filesystem (logs, variable data)
/srv or /data on a dedicated data filesystem

Why separate?

Prevent logs filling up the whole disk
Allow different mount options (e.g. noexec on some directories)
Place heavy‑I/O data on faster or dedicated storage
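
One quick way to check whether two directories actually live on separate filesystems is to compare the device ID reported for each path. The sketch below does this with os.stat; the helper names are illustrative, and the comparison is an approximation (for example, some filesystems report different device IDs for subvolumes).

```python
import os


def same_filesystem(path_a, path_b):
    """True if both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def group_by_filesystem(paths):
    """Group existing paths by the device ID of the filesystem holding them."""
    groups = {}
    for path in paths:
        if os.path.exists(path):
            groups.setdefault(os.stat(path).st_dev, []).append(path)
    return groups


# Paths from the layout above; missing ones are skipped.
for device_id, members in group_by_filesystem(["/", "/home", "/var", "/srv"]).items():
    print(device_id, members)
```

If /var and / end up in the same group, a runaway log file in /var can fill the root filesystem, which is the situation a separate filesystem is meant to prevent.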

Storage Types by Use Case

Different data has different requirements:

OS and system binaries

Need reliability and read speed
Usually on SSD/NVMe
Often standard filesystem like EXT4 or XFS

Databases

Very sensitive to latency and I/O patterns
Often use fast SSD/NVMe
Filesystem tuned for small random writes and fsync performance

Backups and archives

Capacity more important than speed
Often HDDs, possibly large RAID arrays
Sometimes deduplication or compression (via filesystem or tools)

Logs

Constantly appended to
Can fill disks quickly
Usually placed on partitions that won’t impact system stability if full

As an administrator, you design both physical placement (which disk or RAID) and logical layout (which mount points, filesystem options) based on these needs.

Performance, Reliability, and Trade‑offs

Storage decisions always involve trade‑offs between:

Performance

Throughput (MB/s)
IOPS (I/O operations per second)
Latency (time per operation)

Capacity

How much data you can store

Reliability

How well data is preserved despite crashes or failures
Consistency guarantees after unexpected power loss

Complexity and manageability

How hard it is to configure, monitor, and troubleshoot

Performance Factors (High‑Level)

Performance depends on:

Type of media (HDD vs SSD vs NVMe)
Filesystem design and options
Access pattern:

Sequential vs random
Large files vs many small files

Kernel tunables and I/O scheduler
Underlying abstraction:

RAID level
LVM layering
Network storage

Later chapters on disk usage tools and monitoring will help you measure these in practice.
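
Throughput, IOPS, and latency are related: throughput is roughly IOPS multiplied by the size of each operation, and with only one request in flight at a time, IOPS is roughly the inverse of latency. The sketch below works through that arithmetic. The latency figures are illustrative round numbers, not measurements of any particular device.

```python
def throughput_mb_s(iops, io_size_kib):
    """Throughput in MB/s (1 MB = 10**6 bytes) for a given IOPS and I/O size."""
    return iops * io_size_kib * 1024 / 1_000_000


def iops_single_request(latency_ms):
    """IOPS when only one request is in flight at a time."""
    return 1000 / latency_ms


# Illustrative latencies only: a slow seek-bound device vs a fast flash device.
for label, latency_ms in (("10 ms per operation", 10.0), ("0.1 ms per operation", 0.1)):
    iops = iops_single_request(latency_ms)
    print(label, "->", round(iops), "IOPS,",
          round(throughput_mb_s(iops, 4), 2), "MB/s at 4 KiB")
```

At 4 KiB per operation, 100 IOPS is only about 0.41 MB/s, while 10,000 IOPS is about 41 MB/s. This is why small random I/O is dominated by latency, and why sequential access with large operations can reach high throughput even on slower media.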

Reliability and Data Integrity

Filesystems and storage stacks use several mechanisms to protect data:

Journaling

Logs metadata (and sometimes data) changes before committing them
Helps recover after crashes with minimal corruption

Checksumming and copy‑on‑write (COW)

Some filesystems verify data with checksums
Copy‑on‑write writes changes to a new location instead of overwriting in place, so the old data stays intact until the new data is safely written

RAID and redundancy

Protect against disk failure (but not user errors, bugs, or malware)
Still need backups

Snapshots

Point‑in‑time views of data
Useful for quick rollbacks and backups

Long‑term safety requires backups and sometimes offsite copies, which are addressed in the “Backup and Restore” section.
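
The checksumming idea mentioned above can be sketched as follows: store a checksum next to each block when it is written, and verify it when the block is read. This toy uses CRC-32 from Python's zlib module purely for illustration; real filesystems use their own algorithms and on-disk formats, and the function names here are hypothetical.

```python
import zlib


def store_block(data):
    """Return a stored 'block': the data plus a checksum computed at write time."""
    return {"data": data, "crc": zlib.crc32(data)}


def read_block(block):
    """Return the block's data, or raise if it no longer matches its checksum."""
    if zlib.crc32(block["data"]) != block["crc"]:
        raise IOError("checksum mismatch: block is corrupt")
    return block["data"]


block = store_block(b"important data")
print(read_block(block))  # b'important data'

block["data"] = b"important dat4"  # simulate silent corruption on the device
try:
    read_block(block)
except IOError as err:
    print(err)  # checksum mismatch: block is corrupt
```

Without the stored checksum, the corrupted bytes would have been returned to the application as if they were valid.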

Local vs Network Storage

Not all storage is physically attached to your machine.

Local Storage

Directly connected devices:

SATA/NVMe disks
Local RAID controllers

Lowest latency
Generally simplest to manage
Used for OS, local data, and high‑performance workloads

Network Storage (Overview Only)

Linux can use network‑hosted storage as if it were local:

File‑level protocols

Export directories over the network (NFS, or SMB/CIFS via Samba)
Mounted on clients as normal directories
Permissions and performance depend on server and network

Block‑level over network

iSCSI, Fibre Channel, etc.
Appear as block devices to the OS; you create filesystems on them as usual

These are covered in much more detail in later “Network Services” chapters (e.g., NFS, Samba). At this level, you just need to know that a “disk” might be a network device, not a local one.

Managing Storage Over Time

Storage management is not a one‑time activity during installation; it’s an ongoing responsibility.

Key recurring tasks include:

Monitoring free space

Directories like /var, /tmp, and user home directories can grow unexpectedly
Logs and databases are common culprits

Extending capacity

Adding new disks or enlarging LVM volumes
Creating new filesystems and mount points when needed

Resizing filesystems

Many filesystems can grow while mounted; shrinking is less widely supported (for example, XFS cannot be shrunk, and EXT4 must be unmounted first)
Often involves adjusting underlying partitions or logical volumes first

Cleaning up

Rotating logs
Removing old backups or temporary files
Archiving old data to cheaper storage

Checking and repairing filesystems

Periodic integrity checks
Running filesystem repair tools after failures or improper shutdowns

Those practical tools and workflows will be covered in the subsequent chapters in this section.
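
As a small preview of the free-space monitoring task listed above, the sketch below uses Python's shutil.disk_usage to flag paths whose filesystem is nearly full. The 90% threshold and the function names are arbitrary choices for the example, and the percentage can differ slightly from what command-line tools report.

```python
import shutil


def percent_used(path):
    """Percentage of the filesystem holding path that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # some pseudo-filesystems report no capacity
    return 100 * usage.used / usage.total


def over_threshold(paths, threshold=90.0):
    """Return the paths whose filesystem usage is at or above threshold percent."""
    return [path for path in paths if percent_used(path) >= threshold]


print(round(percent_used("."), 1), "% used on the current filesystem")
print("Needs attention:", over_threshold(["."]))
```

A check like this would normally run on a schedule and cover the directories that tend to grow, such as /var, /tmp, and /home.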

How This Chapter Fits with the Rest

This chapter gave you the “map” of Linux storage:

How physical devices, partitions, LVM, RAID, and filesystems stack together
What a filesystem does conceptually
Why layouts and mount points matter
The main trade‑offs around performance and reliability

Next, the child chapters in “Storage and Filesystems” will drill into specific topics:

Devices and partitions: how Linux represents disks, and how to slice them
Filesystems (EXT4, XFS, Btrfs): concrete types and their strengths
Mounting and unmounting: making storage visible in the directory tree
Disk usage tools: inspecting where your space goes and how disks perform
Archiving and compression: how to store and move data efficiently

Keep this mental model of the storage stack in mind as you learn the individual components—it will make the commands and tools much easier to understand.

3.2.1 Devices and partitions

3.2.2 Filesystems (EXT4, XFS, Btrfs)

3.2.3 Mounting and unmounting

3.2.4 Disk usage tools

3.2.5 Archiving and compression (tar, gzip, xz)