Why Storage Matters in OpenShift
OpenShift is designed to run both stateless and stateful applications. Stateless workloads can be restarted anywhere without needing access to previously written data. Stateful workloads—databases, message queues, analytics tools, file-processing apps—need their data to survive pod restarts, rescheduling, and even node failures.
Storage in OpenShift provides:
- Durability: Data outlives pods and sometimes even clusters.
- Availability: Data can be accessed by pods regardless of which node they run on (within defined constraints).
- Consistency: The right read/write semantics for the application (single-writer vs multi-writer, block vs file, etc.).
- Abstraction: Kubernetes-style APIs that hide vendor-specific details behind a common model.
Subsequent subsections (Persistent Volumes, Persistent Volume Claims, StorageClasses, etc.) explain the individual objects and mechanisms; this overview focuses on how those concepts are used together in OpenShift and on the key patterns and trade-offs.
Storage Types and Access Modes
OpenShift builds on Kubernetes storage primitives but integrates them with Red Hat’s platform tooling, Operators, and ecosystem. At a high level, storage is categorized by how it is accessed and what it’s good for.
Ephemeral vs Persistent Storage
- Ephemeral storage
- Tied to the pod or the node.
- Examples: emptyDir, the container filesystem (/), tmpfs.
- Lost when the pod is deleted or rescheduled to another node, or when the node is rebuilt.
- Suitable for caches, scratch space, temporary processing.
- Persistent storage
- Represented by Persistent Volumes (PV) and Persistent Volume Claims (PVC).
- Lives independently of the pod lifecycle.
- Can move between pods according to rules defined by the underlying storage backend and access modes.
- Suitable for databases, shared file repositories, long-lived logs, etc.
OpenShift applications that need durability in production almost always use persistent storage, not ephemeral, for critical data.
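As a minimal sketch of that difference, the pod below mounts both an emptyDir scratch volume and a PVC-backed volume. The image name and the report-results claim are illustrative placeholders, not objects defined elsewhere in this chapter.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: report-worker
spec:
  containers:
  - name: worker
    image: registry.example.com/report-worker:latest   # illustrative image
    volumeMounts:
    - name: scratch                  # ephemeral: wiped when the pod goes away
      mountPath: /tmp/work
    - name: results                  # persistent: survives restarts and rescheduling
      mountPath: /data
  volumes:
  - name: scratch
    emptyDir: {}
  - name: results
    persistentVolumeClaim:
      claimName: report-results      # assumes this PVC already exists in the namespace
```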
Access Modes
Each persistent volume in OpenShift advertises one or more access modes, which dictate how pods can attach to that volume:
- ReadWriteOnce (RWO)
- Mounted as read/write by a single node at any given time.
- Common for block or simple network-attached storage (e.g., most cloud disks).
- Typical for single-instance databases or applications that don’t need to share their data disk simultaneously.
- ReadOnlyMany (ROX)
- Mounted as read-only by many nodes.
- Useful for distributing static content (e.g., binaries, reference datasets) across multiple pods without allowing writes.
- ReadWriteMany (RWX)
- Mounted as read/write by many nodes.
- Requires a shared filesystem (e.g., NFS, CephFS, some cloud-managed shared file systems).
- Useful for workloads that truly share files (CMS, shared repos, user home directories in workbench environments).
Not every storage backend supports all modes. When designing storage in OpenShift, the chosen backend must match the access patterns of the application.
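To make such a request concrete, here is a sketch of a PVC asking for an RWX volume. The class name ocs-storagecluster-cephfs is only an example of an RWX-capable StorageClass and will differ per cluster.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-assets
spec:
  accessModes:
  - ReadWriteMany                    # requires a backend/StorageClass that supports RWX
  resources:
    requests:
      storage: 20Gi
  storageClassName: ocs-storagecluster-cephfs   # illustrative RWX-capable class; name varies per cluster
```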
Storage Backends in OpenShift
OpenShift does not provide “raw disks” by itself; instead, it integrates with storage systems through:
- CSI (Container Storage Interface) drivers – modern standard for external storage systems.
- In-tree (legacy) plugins – older mechanism, being phased out in favor of CSI.
- Platform-native solutions – such as OpenShift Data Foundation (Red Hat Ceph-based), cloud-provider disks and file shares, or on-premises SAN/NAS.
Typical categories:
- Block storage
- Presents as a raw block device.
- Mounted as a filesystem inside the pod.
- Good for databases and latency-sensitive workloads.
- Usually RWO only.
- File storage
- Provides a shared filesystem (NFS, CephFS, SMB-like services).
- Can support RWX and ROX.
- Good for shared documents, code, user home dirs, or apps that rely on POSIX semantics.
- Object storage
- Accessed via HTTP/HTTPS (e.g., S3-compatible).
- Not mounted as a traditional volume; usually accessed via application libraries.
- Used for backups, large datasets, logs, or artifacts, but not represented as PV/PVC in the same way.
In practice, OpenShift clusters often use a mixture of these, selecting the right one per workload.
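For illustration, a block-oriented StorageClass backed by a CSI driver might look like the sketch below; the provisioner string is a placeholder that depends entirely on the driver installed in your cluster.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: block-rwo
provisioner: block.csi.example.com   # placeholder; use the provisioner name of your CSI driver
parameters:
  csi.storage.k8s.io/fstype: ext4    # filesystem created on the block device when mounted
reclaimPolicy: Delete                # underlying volume is removed when the PVC is deleted
allowVolumeExpansion: true
```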
Storage Integration and Operators
OpenShift’s “Operator” model (covered in a separate chapter) is heavily used in storage:
- Storage Operators (for example, OpenShift Data Foundation Operator, cloud vendor storage Operators) automate:
- Installation and configuration of storage backends.
- Exposure of CSI drivers and StorageClasses.
- Health checks, scaling, and failure recovery of the storage layer.
- Application Operators (e.g., database Operators) often:
- Create PVCs automatically with appropriate size and access modes.
- Handle backup/restore to object storage.
- Manage data layout (e.g., separate PVCs for data, logs, WAL files).
This means that in OpenShift, storage is rarely just a manual PV/PVC pairing; it’s usually part of a larger managed pattern controlled via Operators.
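As a purely hypothetical sketch (the API group, kind, and field names below are invented for illustration and do not belong to any specific Operator), an application Operator's custom resource often exposes only high-level storage knobs, and the Operator derives the PVCs from them:

```yaml
# Hypothetical custom resource; API group, kind, and fields are illustrative only.
apiVersion: example.databases.io/v1
kind: DatabaseCluster
metadata:
  name: orders-db
spec:
  replicas: 3
  storage:
    storageClassName: fast-ssd       # tier the Operator would use when creating data PVCs
    size: 100Gi
  backups:
    objectStorageSecret: orders-db-s3-credentials   # backups shipped to object storage
```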
Storage and Pod Scheduling
Storage in OpenShift interacts with scheduling and placement decisions:
- Node affinity via storage
- A PVC bound to a PV that exists in a given zone or on specific nodes influences where the pod can run.
- For RWO volumes, the pod must be scheduled onto a node that can attach that volume (e.g., same availability zone).
- Topology-aware storage
- Many CSI drivers and StorageClasses support topology constraints, ensuring volumes are provisioned in specific zones/regions.
- Prevents cross-zone latency and attachment issues.
- Pod rescheduling and volume attachment
- If a node fails, the control plane may reschedule pods using RWO volumes to another node.
- The storage backend must support detaching the volume from the old node and attaching to the new one.
- Some storage backends have a delay or manual step here; others do this automatically.
When designing storage, it’s crucial to understand these scheduling implications, especially in multi-zone clusters.
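A topology-aware StorageClass typically combines WaitForFirstConsumer binding with allowedTopologies, as in this sketch; the provisioner and zone names are assumptions to be replaced with your environment's values.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd-zonal
provisioner: block.csi.example.com        # placeholder CSI driver
volumeBindingMode: WaitForFirstConsumer   # delay provisioning until a pod is scheduled,
                                          # so the volume lands in the pod's zone
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone      # standard topology key; some drivers use their own
    values:
    - us-east-1a
    - us-east-1b
```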
Security and Isolation for Storage
OpenShift’s security model affects how storage is used:
- Security Context Constraints (SCCs) and SELinux
- Control what a pod can do with volumes (e.g., run as root, mount hostPath).
- SELinux labels ensure that data written by one pod is not accessible from another unless explicitly allowed.
- Many StorageClasses or CSI drivers support applying SELinux labels suitable for OpenShift’s default security posture.
- Encryption
- Encryption at rest can be provided at:
- Storage backend level (e.g., encrypted disks).
- Filesystem level.
- Encryption in transit is often provided by the storage protocol (e.g., TLS on object store, encrypted backends for Ceph).
- OpenShift itself does not encrypt application data automatically, but integrates with storage solutions that do.
- Multi-tenancy
- Namespaces and quotas (covered elsewhere) are used with PVCs to constrain storage usage per team/project.
- StorageClasses can be restricted to certain namespaces or bound with RBAC to limit who can request specific storage types.
Building secure multi-tenant storage in OpenShift is about combining these platform features with backend configuration.
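One concrete building block is a per-namespace ResourceQuota on storage. The sketch below caps total requested capacity and additionally limits consumption of an assumed fast-ssd class; the namespace and class names are illustrative.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: team-a                  # illustrative project/namespace
spec:
  hard:
    persistentvolumeclaims: "10"     # maximum number of PVCs in the namespace
    requests.storage: 500Gi          # total requested capacity across all PVCs
    fast-ssd.storageclass.storage.k8s.io/requests.storage: 100Gi   # cap for one class
```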
Data Protection: Backup, Snapshots, and Disaster Recovery
Data protection in OpenShift relies on features of the storage backend plus orchestrated workflows:
- Snapshots
- CSI snapshots let you capture point-in-time copies of a PVC.
- Useful for quick restore, testing, or cloning environments (e.g., spinning up dev from prod data).
- Availability depends on the CSI driver and storage backend.
- Backups
- Backups usually copy data to separate storage (often object storage).
- Application-aware backups may quiesce databases, flush caches, and then snapshot or copy data.
- Platform-level tools (including Operators or external backup solutions) can coordinate PVC backups along with Kubernetes objects.
- Disaster Recovery (DR)
- Cross-cluster or cross-region replication is often implemented through storage-system replication (e.g., Ceph replication, cloud block storage replication).
- Higher-level tools can manage failover, ensuring that PVCs and application objects are recreated with their data in the target cluster.
While OpenShift itself doesn’t perform all these tasks, it exposes objects (PVCs, StorageClasses) that are orchestrated by backup/DR systems.
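Assuming the installed CSI driver supports snapshots and a matching VolumeSnapshotClass exists, a point-in-time copy of a PVC is requested with a VolumeSnapshot object like this sketch (names are illustrative):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mydb-data-snap
spec:
  volumeSnapshotClassName: csi-snapclass     # must match a class for the PVC's CSI driver
  source:
    persistentVolumeClaimName: mydb-data     # the PVC to capture at this point in time
```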
Performance and Capacity Considerations
Storage can be the main bottleneck for many OpenShift applications. Key considerations include:
- IOPS and throughput
- Different StorageClasses map to different performance tiers (e.g., “fast-ssd”, “throughput-optimized”, “archive”).
- Applications with high IO demands (databases, analytics) may need dedicated or higher-performance classes.
- Latency
- Latency-sensitive apps usually prefer local or low-latency network storage.
- Cross-zone or cross-region storage access can significantly slow down applications.
- Capacity planning
- PVCs specify requested size; cluster admins must ensure the backing storage pool has enough capacity.
- Quotas can control how much total storage a namespace can use.
- Contention and noisy neighbors
- Multiple high-IO workloads on the same shared backend can impact each other.
- StorageClasses and backend configuration can isolate workloads into different pools or QoS tiers.
Monitoring tools (covered in another chapter) are essential for tracking storage performance metrics and adapting capacity or tiers as workloads change.
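Selecting a tier is usually just a matter of the PVC naming a class that the administrators expose; fast-ssd below is an assumed tier name, not a default that exists in every cluster.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: analytics-scratch
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd         # assumed high-IOPS tier; must match a class in the cluster
  resources:
    requests:
      storage: 200Gi
```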
Common Storage Patterns in OpenShift
Several common patterns appear repeatedly in OpenShift deployments:
- Single-instance database with RWO volume
- Database pod uses one PVC with RWO.
- Simple and reliable set-up for MySQL/PostgreSQL when HA is handled outside the DB (e.g., app-level retries).
- StatefulSets with per-pod volumes
- Each replica gets its own PVC.
- Useful for sharded or partitioned services, message queues, or replicated databases.
- Names and identities are stable (e.g., mydb-0, mydb-1), making stateful operations easier.
- Shared content via RWX file storage
- Multiple pods share the same RWX volume for common files (assets, configuration, user home directories).
- Often backed by NFS- or CephFS-based StorageClasses.
- Hybrid object + PV design
- Application uses PVCs for hot, frequently accessed data.
- Archived or less-frequently accessed data is stored in object storage.
- Backups and large artifacts flow to object storage, even though the app runs on PV-backed pods.
Understanding these patterns helps in selecting the right primitives when you design or review an OpenShift architecture.
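The StatefulSet pattern above can be sketched as follows. The image and mount path are assumptions, but the volumeClaimTemplates mechanism is what gives each replica (mydb-0, mydb-1, ...) its own PVC.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mydb
spec:
  serviceName: mydb                  # headless service providing stable pod identities
  replicas: 3
  selector:
    matchLabels:
      app: mydb
  template:
    metadata:
      labels:
        app: mydb
    spec:
      containers:
      - name: db
        image: registry.example.com/mydb:latest   # illustrative image
        volumeMounts:
        - name: data
          mountPath: /var/lib/db
  volumeClaimTemplates:              # one RWO PVC per replica: data-mydb-0, data-mydb-1, ...
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 50Gi
```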
Operational Responsibilities and Roles
Storage in OpenShift crosses traditional boundaries between teams:
- Cluster/platform administrators
- Install and maintain storage backends and their Operators.
- Define and manage StorageClasses (tiers, performance, reclaim policies).
- Monitor capacity and performance.
- Application teams
- Decide what kind of storage their application actually needs (size, access mode, performance).
- Request PVCs using the appropriate StorageClass.
- Implement application-level backups and restore tests when necessary.
- Security/compliance teams
- Ensure encryption, retention, and access controls meet policy requirements.
- Audit storage usage and data flows (e.g., where backups go).
OpenShift’s storage model is powerful because it clearly separates concerns: application teams talk mainly to PVCs and StorageClasses, while platform teams manage the underlying complexity.
Summary
Storage in OpenShift ties together Kubernetes primitives (PVs, PVCs, StorageClasses) with:
- Multiple storage backends (block, file, object).
- Operators that automate provisioning and lifecycle.
- Security, scheduling, and performance considerations.
- Data protection and DR capabilities.
Subsequent subsections dive into the concrete objects—Persistent Volumes, Claims, StorageClasses, dynamic provisioning, and workloads that make use of them—so you can move from conceptual understanding to hands-on usage in real OpenShift clusters.