Understanding RAID Levels
RAID (Redundant Array of Independent Disks) combines multiple physical disks into logical units to improve performance, reliability, or both. In this chapter we focus specifically on how the common RAID levels work, their trade‑offs, and how to choose between them. Implementation tools (like mdadm, hardware RAID controllers, and filesystems with built‑in RAID such as Btrfs/ZFS) are discussed in other chapters; here we stick to concepts that apply across implementations.
We will assume all disks in an array are the same size $S$. When we say “usable capacity” we mean the space available for data, not counting redundancy.
RAID 0 — Striping (Performance, No Redundancy)
RAID 0 splits data into blocks and distributes (stripes) them across all disks.
- Minimum disks: 2
- Redundancy: None
- Usable capacity: $N \times S$ where $N$ is the number of disks
- Tolerated disk failures: 0 — one disk fails, the array is lost
Characteristics:
- Performance:
- Very high sequential read and write throughput (I/O can be parallelized across disks).
- Random I/O improves as well, especially with many disks.
- Reliability:
- Worse than a single disk: the array fails if any one disk fails, so the probability of failure grows with the number of disks.
- Use cases (with backups and acceptable risk):
- Scratch space (e.g., video rendering, temporary data).
- High‑speed local caches.
- Non‑critical data that can be regenerated.
Key concept:
RAID 0 trades all redundancy for maximum performance and capacity. It is not a “safe” RAID level; backups become even more important.
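To make the striping idea concrete, here is a minimal sketch of how consecutive logical chunks map onto disks in round-robin order. The disk count, chunk size, and function name are illustrative assumptions, not any particular implementation:

```python
# Minimal sketch of RAID 0 block placement (round-robin striping).
NUM_DISKS = 4
CHUNK_SIZE = 64 * 1024  # 64 KiB chunks; chunk size varies by implementation

def locate_chunk(logical_chunk: int) -> tuple[int, int]:
    """Map a logical chunk number to (disk index, chunk offset on that disk)."""
    disk = logical_chunk % NUM_DISKS            # stripes rotate across all disks
    offset_on_disk = logical_chunk // NUM_DISKS
    return disk, offset_on_disk

# Consecutive chunks land on different disks, so large sequential I/O
# can be served by all disks in parallel.
for chunk in range(8):
    print(chunk, locate_chunk(chunk))
```

Note that a single chunk still lives on exactly one disk; the speed-up comes from spreading many chunks across many disks.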
RAID 1 — Mirroring (Redundancy, Simpler)
RAID 1 duplicates data across two or more disks. Each disk holds a full copy (mirror) of the data.
- Minimum disks: 2
- Redundancy: Full mirroring
- Usable capacity: $S$ (capacity of one disk, regardless of $N$)
- Tolerated disk failures: up to $N - 1$ (as long as at least one disk remains)
Characteristics:
- Performance:
- Reads: Can be faster than a single disk, since requests can be balanced across the mirrors.
- Writes: Similar to a single disk (each write goes to all disks).
- Reliability:
- High — you can lose one disk (or more, if more than two are used) without data loss.
- Rebuild is conceptually simple: copy from a good mirror to a new disk.
- Use cases:
- System/boot disks.
- Small databases or services where simplicity and fast recovery matter.
- Environments where read performance and high availability are more important than capacity efficiency.
Capacity formula:
- With $N$ disks of size $S$:
- Usable: $S$
- Redundancy: $(N - 1) \times S$ (storage spent on extra copies)
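As a conceptual sketch of the mirroring behavior described above (not any real tool's API; the in-memory "disks" are just Python lists), every write goes to all healthy members and a read can be served by any surviving one:

```python
# Conceptual RAID 1 mirror: each write is duplicated, any surviving copy serves reads.
class Mirror:
    def __init__(self, num_disks: int, num_blocks: int):
        self.disks = [[None] * num_blocks for _ in range(num_disks)]
        self.failed = set()

    def write(self, block: int, data: bytes) -> None:
        for i, disk in enumerate(self.disks):
            if i not in self.failed:          # every healthy disk gets a full copy
                disk[block] = data

    def read(self, block: int) -> bytes:
        for i, disk in enumerate(self.disks):
            if i not in self.failed:          # any surviving mirror can answer
                return disk[block]
        raise IOError("all mirrors have failed")

m = Mirror(num_disks=3, num_blocks=8)
m.write(0, b"hello")
m.failed = {0, 1}                             # lose two of the three disks
print(m.read(0))                              # data is still readable from the last mirror
```

This is also why writes cost roughly the same as a single disk (every copy must be written) while reads can be spread across the mirrors.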
RAID 5 — Striping with Distributed Parity (Balanced)
RAID 5 stripes data across disks, like RAID 0, but also stores parity information distributed across all disks. This parity allows reconstruction of data if a single disk fails.
- Minimum disks: 3
- Redundancy: Single‑disk parity
- Usable capacity: $(N - 1) \times S$
- Tolerated disk failures: 1
Parity concept (high level):
Parity is calculated over the data blocks of each stripe. For a stripe with data blocks $D_1, D_2, \dots, D_{N-1}$, parity $P$ is computed with XOR:
$$
P = D_1 \oplus D_2 \oplus \dots \oplus D_{N-1}
$$
If one block is lost (e.g., disk failure), it can be reconstructed:
$$
D_1 = P \oplus D_2 \oplus \dots \oplus D_{N-1}
$$
This is why RAID 5 can tolerate 1 disk failure.
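The XOR relationships above are easy to verify. A minimal sketch, using single bytes to stand in for whole disk blocks:

```python
# Verify RAID 5 parity and reconstruction on toy "blocks" (single bytes).
data_blocks = [0b10110010, 0b01101100, 0b11100001]   # D1, D2, D3

# Parity is the XOR of all data blocks in the stripe.
parity = 0
for d in data_blocks:
    parity ^= d

# Simulate losing D1 (its disk failed) and rebuild it from parity plus the rest.
lost = data_blocks[0]
rebuilt = parity
for d in data_blocks[1:]:
    rebuilt ^= d

assert rebuilt == lost   # P XORed with the surviving blocks restores the lost block
print(bin(rebuilt))
```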
Characteristics:
- Performance:
- Reads: Good, similar to RAID 0 across $N - 1$ data disks.
- Writes: Slower for small writes due to the read‑modify‑write cycle (the parity‑update formula behind this is shown after this list):
- Read old data block and old parity block.
- Compute new parity.
- Write new data and parity.
- Reliability:
- Can survive one disk failure.
- Vulnerable to a second disk failure during rebuild.
- Rebuild time on large disks can be long, increasing risk.
- Use cases (with caution):
- Read‑heavy workloads with moderate write intensity.
- Archival or bulk storage where capacity efficiency matters.
- Environments where some downtime during rebuild is acceptable.
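The small‑write penalty mentioned under “Writes” above follows from how parity is updated in place. Because XOR is its own inverse, a single data block can be changed without rereading the whole stripe:
$$
P_{\text{new}} = P_{\text{old}} \oplus D_{\text{old}} \oplus D_{\text{new}}
$$
Each small write therefore costs two reads (old data, old parity) and two writes (new data, new parity), which is exactly the read‑modify‑write cycle listed above.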
Limitations and modern concerns:
- With large disks (e.g., multi‑TB), rebuilds take many hours or days.
- During rebuild, performance is degraded and risk is higher.
- Risk of an unrecoverable read error (URE) on another disk during rebuild increases with array size and disk size — this can lead to total array failure.
Because of this, RAID 5 is considered less suitable for very large arrays and high‑capacity disks.
RAID 6 — Striping with Double Distributed Parity
RAID 6 extends RAID 5 by storing two independent parity blocks per stripe, allowing survival of two simultaneous disk failures.
- Minimum disks: 4
- Redundancy: Double‑disk parity
- Usable capacity: $(N - 2) \times S$
- Tolerated disk failures: 2
Characteristics:
- Performance:
- Reads: Similar to RAID 5; often good.
- Writes: Slower than RAID 5; parity calculations and writes are more complex (two parities).
- Reliability:
- Can survive two disk failures, or one failure plus a URE on another disk during rebuild.
- Significantly safer than RAID 5 for large arrays and large disks.
- Use cases:
- Large arrays (many disks).
- High‑capacity disks (multi‑TB).
- Critical data storage where downtime and data loss are unacceptable, but capacity efficiency still matters.
Trade‑off:
You trade more usable capacity (losing 2 disks worth) and more write overhead for improved fault tolerance — often a good trade‑off for modern storage sizes.
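The two parity blocks must be computed by different functions so that they provide independent information. One common construction (used, for example, by the Linux md RAID 6 driver) keeps P as plain XOR and computes Q as a Reed‑Solomon syndrome over the finite field GF(2^8). The sketch below is purely illustrative: the byte values are arbitrary and the field arithmetic is written out naively rather than with the lookup tables real implementations use.

```python
# Toy RAID 6 stripe: P is XOR parity, Q is a Reed-Solomon syndrome over GF(2^8).
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D
        b >>= 1
    return p

def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8); brute force is fine for a demo."""
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

data = [0x12, 0xA7, 0x3C]                     # one byte per data disk: D_0, D_1, D_2

# P = D_0 xor D_1 xor ...; Q = sum of g^i * D_i with generator g = {02}.
P, Q, coeff, coeffs = 0, 0, 1, []
for d in data:
    P ^= d
    Q ^= gf_mul(coeff, d)
    coeffs.append(coeff)
    coeff = gf_mul(coeff, 2)

# A failure XOR alone cannot handle: data disk 1 AND the P disk are both lost.
partial_q = 0
for i, d in enumerate(data):
    if i != 1:                                # only surviving data disks contribute
        partial_q ^= gf_mul(coeffs[i], d)

recovered = gf_mul(gf_inv(coeffs[1]), Q ^ partial_q)
assert recovered == data[1]                   # Q lets us recover D_1 even without P
print(hex(recovered))
```

Because P and Q are independent, any two failed members of a stripe (data or parity) can be rebuilt from the survivors; the two-data-disk case just requires solving a small system of equations instead of the single step shown here.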
RAID 10 — Mirrored Stripes (RAID 1+0)
RAID 10 (often written RAID 1+0) combines mirroring and striping:
- Disks are grouped into mirrored pairs (RAID 1).
- These mirrors are then striped (RAID 0) for performance.
- Minimum disks: 4 (2 mirrored pairs)
- Redundancy: Mirroring per pair
- Usable capacity: $(N / 2) \times S$ (assuming an even number of disks)
- Tolerated disk failures: Depends on which disks fail:
- At least 1 disk failure.
- Potentially more, as long as no mirrored pair loses both of its members.
Failure patterns:
- If both disks in the same mirror fail → array fails.
- If one disk fails in each of multiple pairs → array can continue operating.
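These failure rules are simple enough to check programmatically. In the small sketch below, the pairing layout is an assumption for illustration: disks are numbered 0 to N-1 and adjacent disks (0,1), (2,3), ... form the mirrored pairs.

```python
# Does a RAID 10 array survive a given set of failed disks?
# Assumed layout: adjacent disks (0,1), (2,3), ... are the mirrored pairs.
def raid10_survives(num_disks: int, failed: set[int]) -> bool:
    for pair_start in range(0, num_disks, 2):
        pair = {pair_start, pair_start + 1}
        if pair <= failed:                    # both members of one mirror lost
            return False
    return True

print(raid10_survives(6, {0, 3, 5}))   # True: one failure in each of three different pairs
print(raid10_survives(6, {2, 3}))      # False: a whole mirrored pair is gone
```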
Characteristics:
- Performance:
- Very good random I/O performance (multiple mirrors can serve reads).
- Writes are faster than RAID 5/6 because there is no parity computation, only mirrored writes.
- Reliability:
- More resilient to disk failures than RAID 5, especially for random failure patterns.
- Rebuild is faster: the replacement disk is simply copied from its surviving mirror partner, rather than reconstructed from parity across all disks.
- Use cases:
- Databases and high‑I/O transactional systems.
- Virtualization hosts.
- Workloads needing both high performance and strong redundancy, without parity overhead.
RAID 10 vs RAID 5/6:
- RAID 10:
- Better write performance and rebuild characteristics.
- More predictable under high load.
- Lower capacity efficiency (50% usable).
- RAID 5/6:
- Better capacity efficiency.
- More complex failure modes and slower rebuilds.
Less Common / Special RAID Levels
These are less common or more vendor‑specific, but worth knowing conceptually.
RAID 2, 3, 4 (Rarely Used)
- RAID 2: Uses bit‑level striping and Hamming code error correction; essentially obsolete.
- RAID 3: Byte‑level striping with dedicated parity disk.
- RAID 4: Block‑level striping with a single dedicated parity disk.
- Bottleneck on the parity disk for writes.
- Conceptually simpler than RAID 5 but rarely used because RAID 5 distributes parity and removes that bottleneck.
In practice, you will mostly encounter RAID 0, 1, 5, 6, and 10.
Nested RAID Levels (RAID 0+1, 50, 60)
Nested (or hybrid) RAID levels layer one RAID type on top of another:
- RAID 0+1: The reverse nesting of RAID 10:
- First create stripes, then mirror the stripes.
- Less resilient to certain failure patterns than RAID 10: once one disk fails, its entire stripe set is degraded, so any further failure on the other stripe set loses the array.
- RAID 50 (5+0):
- Multiple RAID 5 groups striped together.
- Improved performance and some resilience; can survive one disk loss in each RAID 5 group.
- RAID 60 (6+0):
- Multiple RAID 6 groups striped together.
- High tolerance for multiple failures, suitable for very large arrays.
These are more common in large storage appliances or hardware RAID setups than in simple Linux servers, but the concepts mirror the basic levels already discussed.
Comparing RAID Levels
Capacity Efficiency
With $N$ disks of size $S$:
- RAID 0: Usable $= N \times S$ (100% efficient, no redundancy)
- RAID 1: Usable $= S$ (for $N \ge 2$), efficiency $= \frac{1}{N}$
- RAID 5: Usable $= (N - 1) \times S$
- RAID 6: Usable $= (N - 2) \times S$
- RAID 10: Usable $= \frac{N}{2} \times S$ (for even $N$)
You can think of the redundancy overhead as:
- RAID 1: $(N - 1)$ disks worth of redundancy.
- RAID 5: 1 disk worth.
- RAID 6: 2 disks worth.
- RAID 10: $N/2$ disks worth.
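The capacity rules above can be collected into one small helper. This is just a sketch: it covers only the levels discussed here, assumes equal-size disks, and uses hypothetical names:

```python
# Usable capacity for N equal disks of size S (any unit; the result is in the same unit).
def usable_capacity(level: str, n: int, s: float) -> float:
    if level == "raid0":
        return n * s                  # no redundancy at all
    if level == "raid1":
        return s                      # one disk's worth, however many mirrors
    if level == "raid5":
        return (n - 1) * s            # one disk's worth spent on parity
    if level == "raid6":
        return (n - 2) * s            # two disks' worth spent on parity
    if level == "raid10":
        return (n // 2) * s           # half the disks hold mirror copies
    raise ValueError(f"unknown level: {level}")

# Example: six 4 TB disks.
for level in ("raid0", "raid1", "raid5", "raid6", "raid10"):
    cap = usable_capacity(level, 6, 4.0)
    print(f"{level}: {cap:.0f} TB usable, efficiency {cap / (6 * 4.0):.0%}")
```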
Fault Tolerance Summary
- RAID 0: 0 disk failures.
- RAID 1: Up to $N - 1$ disks, as long as 1 survives.
- RAID 5: 1 disk.
- RAID 6: 2 disks.
- RAID 10: Depends on failure pattern; at least 1, often more.
Performance Characteristics (High‑Level)
Very simplified guide (actual numbers depend on implementation, workload, and hardware):
- RAID 0:
- Reads: Excellent
- Writes: Excellent
- RAID 1:
- Reads: Good to excellent (can load‑balance)
- Writes: Similar to a single disk
- RAID 5:
- Reads: Good
- Writes: Moderate to poor for small writes (parity overhead)
- RAID 6:
- Reads: Good
- Writes: Slower than RAID 5 (double parity overhead)
- RAID 10:
- Reads: Excellent
- Writes: Good (no parity, just mirroring)
Choosing a RAID Level
Choosing the right level is a balance of performance, capacity, and fault tolerance:
- If performance is everything and data is non‑critical:
- RAID 0 (with good backups or ephemeral data).
- If you want simple redundancy, small number of disks:
- RAID 1.
- If you want capacity efficiency and can accept parity overhead:
- RAID 5 (small arrays, smaller disks, lower criticality).
- RAID 6 (large arrays, large disks, higher criticality).
- If you need high performance and strong redundancy:
- RAID 10 (especially for databases, virtualization, transactional workloads).
Remember:
- RAID is not a backup. It protects against disk failure, not accidental deletion, corruption, or disasters.
- Higher RAID levels do not replace a proper backup strategy.
Software, Hardware, and Filesystem RAID
On a Linux system, RAID can be provided at several layers:
- Software RAID (e.g., mdadm) at the block device layer.
- Hardware RAID controllers (presenting a logical volume to the OS).
- Filesystems with integrated RAID features (e.g., Btrfs, ZFS) that implement mirror/RAID‑Z‑like schemes at the filesystem level.
The fundamental RAID levels and trade‑offs described here apply conceptually to all of these, even if the implementation details differ. Subsequent chapters on Linux storage management will cover how to configure specific RAID types in practice.