Why Persistent Storage Matters in OpenShift
By default, containers and pods in OpenShift use ephemeral storage: data is lost when the pod is deleted or rescheduled to another node, or when the node fails. Persistent storage provides a way to keep data beyond the lifecycle of pods and nodes, which is essential for:
- Databases (PostgreSQL, MySQL, MongoDB)
- Message queues (Kafka, RabbitMQ)
- File stores (shared volumes, content repositories)
- Any application that must not lose state between restarts
Persistent storage in OpenShift is always about decoupling application lifecycle from data lifecycle.
Key idea:
- Pods are ephemeral.
- Persistent data must live outside pods, on storage that can be re-attached to new pods.
Ephemeral vs Persistent Storage
Ephemeral Storage
Characteristics:
- Lives only as long as the pod or container.
- Examples: emptyDir, the container's writable filesystem, and configMap and secret volumes (used for configuration, not long-term data); a minimal emptyDir sketch follows this subsection.
- Tied to a single node: if the pod moves to another node, the data is lost.
- Good for:
- Caches
- Temporary workspaces
- Scratch space during processing jobs
Limitations:
- Not suitable for stateful applications.
- No guarantees about data survival after pod deletion or node restart.
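To make the ephemeral case concrete, here is a minimal sketch of a pod using an emptyDir volume as scratch space. The pod name, image, and mount path are illustrative assumptions, not prescribed values.

```yaml
# Minimal sketch: a pod using emptyDir as scratch space.
# The volume is created when the pod starts on a node and is deleted
# permanently when the pod is removed from that node.
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo            # illustrative name
spec:
  containers:
  - name: worker
    image: registry.access.redhat.com/ubi9/ubi-minimal   # any small image works
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: scratch
      mountPath: /tmp/work      # scratch data only; gone when the pod goes away
  volumes:
  - name: scratch
    emptyDir: {}
```

Anything written under /tmp/work disappears with the pod, which is exactly why this pattern is limited to caches and temporary work.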
Persistent Storage
Characteristics:
- Data survives pod restarts and rescheduling.
- Bound to a Persistent Volume in the cluster (conceptually) and backed by an underlying storage system.
- Can often be reused by different pods over time.
- Access modes define how many pods can mount the volume and from where (covered later in this section).
Use cases:
- Application state (sessions, user-uploaded files)
- Database data directories
- Log retention (if logs must be kept beyond pod lifetime)
Core Concepts in Persistent Storage
OpenShift follows Kubernetes’ persistent storage model, built around a few key abstractions.
Storage as a Separate Concern
Applications in OpenShift do not talk directly to NFS, SAN, or cloud storage APIs. Instead, applications:
- Request storage using a Persistent Volume Claim (PVC).
- Mount the PVC into pods.
- Read and write to the filesystem as usual.
The cluster administrator or storage platform:
- Provides Persistent Volumes (PVs) and StorageClasses.
- Integrates with external storage systems (NFS, iSCSI, block storage, cloud volumes, etc.).
This separation:
- Keeps application manifests portable.
- Allows the same app to run in different environments with different underlying storage, as illustrated in the sketch below.
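As a sketch of the application-side view, the manifests below request storage through a PVC and mount it into a pod by claim name only. The names, size, and mount path are assumptions for illustration; the storage class is omitted so the cluster default (if any) applies.

```yaml
# Sketch: the application requests storage with a claim...
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                # illustrative claim name
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi             # desired capacity; the size is an assumption
---
# ...and mounts it by claim name; the backend details stay hidden from the pod.
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: registry.access.redhat.com/ubi9/ubi-minimal   # placeholder image
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /var/lib/app   # the application writes here as a normal filesystem
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: app-data       # the only storage detail the pod needs
```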
Persistent Volume (PV) and Persistent Volume Claim (PVC) Model
Conceptually:
- A Persistent Volume (PV) represents a piece of storage in the cluster:
- It points to a specific storage backend location (e.g., an NFS export, a cloud disk).
- It has a fixed capacity, access modes, and other attributes.
- A Persistent Volume Claim (PVC) is a request for storage by a user or application:
- It specifies the desired capacity, access mode, and sometimes a StorageClass.
- When a suitable PV is found (or created), the PVC is bound to that PV.
From the application’s perspective:
- You only care about the PVC name.
- You mount the PVC into the pod spec.
- The actual underlying storage details are abstracted away.
This claim-based model is central to how persistent storage is requested and consumed by workloads; a sample PV definition is sketched below.
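On the other side of the claim, a cluster administrator (or a provisioner) supplies PVs. The sketch below shows a statically defined PV backed by an NFS export; the server, path, and capacity are placeholders, and in many clusters PVs are instead created dynamically through a StorageClass.

```yaml
# Sketch: an administrator-provided PV pointing at an NFS export.
# A PVC requesting compatible capacity and access modes can bind to it.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs-example          # illustrative name
spec:
  capacity:
    storage: 10Gi               # fixed capacity of this volume
  accessModes:
  - ReadWriteMany               # what this backend can support
  persistentVolumeReclaimPolicy: Retain   # keep the data when the claim is deleted
  nfs:
    server: nfs.example.com     # placeholder NFS server
    path: /exports/app-data     # placeholder export path
```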
Access Modes and How They Shape Workloads
Persistent storage in OpenShift supports different access modes, which determine how pods can mount a given volume.
Common access modes you will encounter:
- ReadWriteOnce (RWO): the volume can be mounted as read-write by a single node. Typical for block storage (e.g., cloud disks, SAN LUNs) and common for single-instance databases.
- ReadOnlyMany (ROX): the volume can be mounted as read-only by many nodes. Useful for shared, immutable data (e.g., binaries, reference datasets).
- ReadWriteMany (RWX): the volume can be mounted as read-write by many nodes simultaneously. Common with shared file systems (e.g., NFS, GlusterFS, CephFS) and useful for horizontally scaled applications where all replicas need shared access (an example RWX claim is sketched below).
Why this matters:
- The access mode influences your application design:
- If your PVC is RWO, scaling to multiple replicas may require state to be partitioned or handled differently.
- RWX volumes can simplify some multi-replica architectures but bring concurrency and locking considerations.
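For example, a claim that needs shared read-write access declares it through accessModes. This is a sketch only; the storageClassName is an assumed name for a class whose backend actually supports RWX.

```yaml
# Sketch: an RWX claim for content shared by several replicas.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-content
spec:
  accessModes:
  - ReadWriteMany               # many nodes may mount read-write at once
  storageClassName: shared-fs   # assumed class; must be backed by RWX-capable storage
  resources:
    requests:
      storage: 50Gi
```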
Performance, Durability, and Consistency
When thinking about persistent storage, three dimensions are especially important.
Performance Characteristics
Different backends offer different performance profiles:
- Throughput (MB/s): How fast large sequential reads/writes can be done.
- IOPS (I/O operations per second): Critical for transactional databases.
- Latency: Round-trip time for a single I/O; low latency is important for many OLTP workloads.
Design implications:
- Latency-sensitive applications (e.g., relational databases) often favor local or block storage with high IOPS (an IOPS-tuned StorageClass is sketched after this list).
- Large analytics workloads may favor higher throughput.
- Shared filesystems often trade some performance for convenience and shared access.
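Performance characteristics are usually surfaced to users as StorageClasses. The sketch below assumes the AWS EBS CSI driver and gp3 volumes purely as an example; the provisioner name and parameters differ per storage provider.

```yaml
# Sketch: a StorageClass tuned for IOPS-sensitive workloads.
# Provisioner and parameters are provider-specific (AWS EBS CSI assumed here).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-db                 # illustrative "high-IOPS" tier
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "6000"                  # provisioned IOPS for transactional workloads
  throughput: "250"             # MiB/s, relevant for sequential-heavy workloads
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
```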
Durability and Availability
Durability levels depend on the underlying storage:
- Local disks on a single node can be fast but may lose data if the node is lost.
- Network-attached storage (NAS/SAN, distributed storage systems) usually offers data replication and higher durability.
- Cloud providers may offer configurable replication and snapshot features.
Persistent storage concepts in OpenShift assume:
- Your data must survive pod and node failures.
- For higher levels of durability, you rely on:
- Storage system replication.
- Application-level replication (e.g., database clustering).
- Backup and restore solutions (covered elsewhere).
Consistency and Concurrency
When multiple clients access the same data:
- File-based shared storage (RWX) must handle:
- File locking.
- Concurrent writes.
- Potential contention and performance degradation.
- Block-based storage (RWO) typically expects a single writer:
- Using it in multi-writer scenarios without proper coordination can corrupt data.
At the application level, you must:
- Choose storage types that align with your concurrency model.
- Use proper locking, transactions, or application protocols when multiple pods interact with the same data.
Pod Scheduling and Data Locality
Persistent storage affects where pods can run:
- A pod that uses a PVC bound to a specific PV may only be able to run on nodes that can access that PV.
- For some storage types (like local persistent volumes), the PV is physically tied to one node.
Conceptually:
- The scheduler must respect storage constraints in addition to CPU, memory, and other resource considerations.
- This can affect:
- Availability: if a node becomes unavailable and the data is local to it.
- Scaling: if only a subset of nodes can access the storage system.
As a result:
- Storage decisions can indirectly constrain application placement and scalability.
- For high availability, it is often better to use storage that is accessible from multiple nodes (the local PV sketch below illustrates the node-locality constraint).
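A local persistent volume makes the constraint explicit: the PV carries node affinity, so any pod using it can only be scheduled to that node. The node name, path, and class name below are placeholders.

```yaml
# Sketch: a local PV tied to a single node's disk.
# Pods bound to this PV can only run on worker-1.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-worker-1
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage        # assumed class for local volumes
  local:
    path: /mnt/disks/ssd1                # placeholder device path on the node
  nodeAffinity:                          # required for local volumes
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker-1                     # placeholder node name
```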
Persistent Storage and Application Design Patterns
Persistent storage can change how you design applications:
- Stateless services:
- Do not rely on persistent storage (or use it minimally).
- Easier to scale horizontally; pods are interchangeable.
- Stateful services:
- Rely on persistent storage for correctness.
- Often use:
- One PVC per instance (e.g., each database replica has its own volume).
- Stateful constructs, such as StatefulSets, that track which pod owns which volume (see the sketch at the end of this section).
Key patterns:
- One-to-one mapping between a pod and a PVC for dedicated storage.
- Shared RWX storage for:
- Shared assets (web content, shared uploads).
- Results aggregation from multiple workers.
- Using persistent storage mainly for state, while keeping config and secrets separate and immutable.
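The "one PVC per instance" pattern is what StatefulSets provide through volumeClaimTemplates: each replica gets its own claim (data-db-0, data-db-1, and so on). The sketch below is illustrative; the image, mount path, and sizes are assumptions.

```yaml
# Sketch: a StatefulSet where every replica gets its own PVC.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: registry.redhat.io/rhel9/postgresql-15   # assumed database image
        volumeMounts:
        - name: data
          mountPath: /var/lib/pgsql/data                # assumed data directory
  volumeClaimTemplates:          # one claim per replica: data-db-0, data-db-1, data-db-2
  - metadata:
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
```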
Lifecycle of Persistent Data vs. Application
A central concept is that data often has a longer lifecycle than the application instances:
- Pods can come and go as part of:
- Scaling up/down.
- Rolling updates.
- Node maintenance.
- Persistent volumes are expected to:
- Outlive individual pods.
- Be reused by new instances of the same application.
Typical lifecycle pattern:
- An application (Deployment, StatefulSet, etc.) declares a PVC.
- The PVC is bound to a PV (existing or dynamically provisioned).
- The pod mounts the PVC and writes data.
- The pod is deleted or rescheduled, but the PVC/PV (and data) remain.
- A new pod mounts the same PVC, continuing from the existing data.
This separation of the compute lifecycle from the data lifecycle is a foundational concept for running stateful workloads in OpenShift; the Deployment sketch below illustrates the pattern.
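A minimal sketch of this pattern, reusing the hypothetical app-data claim from earlier: pods created by updates or rescheduling mount the same claim and continue from the existing data. The Recreate strategy is used because an RWO volume cannot be attached to the old and new pod at the same time.

```yaml
# Sketch: a Deployment whose pods come and go, while the claim (and data) persist.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 1
  strategy:
    type: Recreate               # avoid old and new pods both holding an RWO volume
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: registry.access.redhat.com/ubi9/ubi-minimal   # placeholder image
        command: ["sleep", "infinity"]
        volumeMounts:
        - name: data
          mountPath: /var/lib/app
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: app-data    # the claim outlives every pod created here
```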
Security and Multi-Tenancy Considerations (Conceptual)
At a conceptual level, persistent storage interacts with OpenShift’s security and multi-tenancy model:
- Volumes are namespaced through PVCs:
- A PVC belongs to a namespace (project).
- This provides isolation at the claim level.
- Underlying storage should:
- Enforce access control (no cross-tenant data leakage).
- Support encryption at rest and in transit where appropriate.
- Storage policies can be used to:
- Restrict which types of storage different projects can use.
- Map storage offerings to different performance or compliance requirements (e.g., “gold” vs “silver” classes); a quota sketch follows at the end of this section.
Understanding this helps you choose the right storage type and configuration for different workload classes (development vs production, test vs regulated data, etc.).
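One conceptual way such policies are expressed is a per-project ResourceQuota that caps overall storage usage and restricts how much of a premium class a project may consume. The project and class names below are assumed for illustration.

```yaml
# Sketch: per-project limits on storage consumption and on a "gold" class.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: team-a              # assumed project name
spec:
  hard:
    requests.storage: 200Gi                                   # total requested storage
    persistentvolumeclaims: "10"                              # total number of claims
    gold.storageclass.storage.k8s.io/requests.storage: 50Gi   # cap on the "gold" class
    gold.storageclass.storage.k8s.io/persistentvolumeclaims: "2"
```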
Summary of Key Persistent Storage Concepts
- Pods are ephemeral; storage is not: persistent storage preserves data beyond pod and node lifecycles.
- PVC/PV abstraction decouples applications from underlying storage infrastructure.
- Access modes (RWO, ROX, RWX) shape how many pods can mount a volume and from where.
- Performance, durability, and consistency vary by backend and are critical for workload design.
- Storage influences scheduling and availability: pods must be placed where volumes are accessible.
- Persistent storage is central to designing and operating stateful applications on OpenShift, while still leveraging cloud-native patterns like orchestration, scaling, and self-healing.