Concept and Role of Persistent Volumes
In OpenShift, a PersistentVolume (PV) is a cluster-scoped resource that represents a piece of storage made available to the cluster. It is:
- Created and managed by cluster administrators (or by a storage provisioner/Operator), not by application developers.
- Independent from any particular Pod, Deployment, or Namespace.
- A bridge between underlying storage systems (NFS, iSCSI, cloud disks, CSI drivers, etc.) and Kubernetes/OpenShift workloads.
Conceptually, a PV is to storage what a Node is to compute: a resource that can be consumed but is not owned by a specific application.
Key properties of PVs:
- Cluster-wide object (not namespaced).
- Lifecycle separate from individual Pods.
- Backed by some physical or virtual storage.
- Bound later to a PersistentVolumeClaim (PVC) created by users.
PersistentVolume Object Structure
A PersistentVolume is defined as a Kubernetes resource with apiVersion, kind, metadata, spec, and status. The most important fields in spec are:
- capacity
- accessModes
- volumeMode
- storageClassName
- persistentVolumeReclaimPolicy
- mountOptions
- Backend-specific configuration (e.g., nfs, hostPath, awsElasticBlockStore, csi, etc.)
Example (static NFS-backed PV):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs-example
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-manual
  nfs:
    server: nfs-server.example.com
    path: /exports/data

This illustrates that:
- The PV advertises 20Gi of writable storage.
- Multiple Pods can mount it (ReadWriteMany).
- It uses an NFS server as the underlying storage.
- It is associated with a specific storageClassName so that PVCs can target it.
Access Modes
Access modes describe how a PV can be mounted by Pods. Common modes you’ll see in OpenShift:
- ReadWriteOnce (RWO): mounted read-write by a single node at a time (typical for block volumes like cloud disks).
- ReadOnlyMany (ROX): mounted read-only by many nodes (useful for shared reference data).
- ReadWriteMany (RWX): mounted read-write by many nodes simultaneously (common with NFS, some CSI drivers, or shared file systems).
Which modes are available depends on the storage backend and driver. For example, many cloud block volumes only support ReadWriteOnce, whereas shared file systems or file-based CSI drivers may support ReadWriteMany.
Volume Modes: Filesystem vs Block
PVs can expose storage to Pods in two main modes:
- volumeMode: Filesystem (default)
  - The volume is formatted with a filesystem and mounted into the Pod.
  - Pods see a normal directory tree; most typical workloads use this.
- volumeMode: Block
  - The volume is presented as a raw block device.
  - Advanced workloads (databases, some HPC or storage engines) can manage their own filesystem or data layout on top.
Example snippet:
volumeMode: Block
accessModes:
  - ReadWriteOnce
capacity:
  storage: 100Gi
In OpenShift, not every storage backend supports Block mode. You must ensure that the underlying storage and CSI driver support it.
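To illustrate how a Pod consumes such a volume, here is a minimal sketch; the Pod, image, and claim names are hypothetical, and the claim is assumed to have volumeMode: Block. Note that the container uses volumeDevices with a devicePath instead of a volumeMounts entry:

apiVersion: v1
kind: Pod
metadata:
  name: block-consumer                        # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/db:latest   # hypothetical image
      volumeDevices:                          # raw device, not a filesystem mount
        - name: data
          devicePath: /dev/xvda               # device node exposed inside the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: block-pvc                  # a PVC requesting volumeMode: Block

The application is then responsible for formatting or otherwise managing the raw device at /dev/xvda.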
Static vs Dynamic PV Provisioning
OpenShift clusters can expose PVs using two patterns:
Static Provisioning
With static provisioning, administrators pre-create PersistentVolumes that refer to existing storage.
Characteristics:
- Admin manually allocates and configures storage on the backend (e.g., creates an NFS export, a LUN, etc.).
- Admin creates a PV object pointing to that storage.
- Users create PVCs that match those PVs (by size, access mode, storageClassName, etc.).
- PVs are finite and explicitly managed; you see the specific PV names and their backing details.
Static provisioning is useful when:
- Storage comes from legacy systems or non-automated infrastructure.
- You need very specific or curated allocations (e.g., shared project storage, pre-populated datasets).
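For example, a claim intended to match the NFS-backed PV defined earlier might look like the following sketch (the claim name and namespace are illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data               # illustrative name
  namespace: my-project           # illustrative namespace
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  storageClassName: nfs-manual    # matches the PV's storageClassName
  resources:
    requests:
      storage: 20Gi               # must not exceed the PV's 20Gi capacity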
Dynamic Provisioning (via StorageClasses)
Dynamic provisioning is usually preferred in OpenShift:
- Users create PVCs requesting a size and a storageClassName.
- A provisioner (often a CSI driver or built-in plugin) automatically creates the underlying storage and the PV.
- Users do not need to know backend details like NFS paths or device names.
In dynamic provisioning:
- PVs are created on demand.
- PVs usually have opaque auto-generated names.
- The lifecycle of the storage is tightly coupled to the PVC and its reclaim policy.
From the PV perspective, dynamic provisioning means:
- The PV is created automatically after a PVC appears.
- The PV’s spec is generated based on the StorageClass parameters.
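A minimal sketch of the dynamic flow follows. The StorageClass and PVC names are illustrative, and the provisioner shown (ebs.csi.aws.com, the AWS EBS CSI driver) is just one example; the provisioners available depend on your cluster:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-block               # illustrative name
provisioner: ebs.csi.aws.com         # example CSI driver; varies by cluster
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                     # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard-block
  resources:
    requests:
      storage: 50Gi                  # the provisioner creates a PV of this size

With WaitForFirstConsumer, provisioning is deferred until a Pod using app-data is scheduled; the driver then creates the volume and a matching PV appears automatically.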
Binding: Matching PVs and PVCs
PersistentVolumes are not used directly by Pods. Instead, Pods use PVCs, which the cluster matches to PVs. The binding is based on:
- storageClassName: must match between PV and PVC (or both left empty).
- accessModes: the PV must support all requested modes.
- capacity: the PV capacity must be >= the PVC's requested size.
- volumeMode: must match.
- Any additional selector constraints on the PVC (labels, etc.).
Binding behavior:
- A PV can be bound to exactly one PVC at a time.
- Once bound, the PV's claimRef points to the PVC.
- The binding is typically one-way: a PV is not re-bound to another PVC unless it is first released and becomes Available again.
Administrators should ensure there is a reasonable pool of pre-created PVs or a dynamic provisioning mechanism so that legitimate PVCs find matching volumes.
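When a PVC must bind to one specific PV rather than any matching one, the claim can name the volume explicitly via spec.volumeName. A sketch, reusing the earlier example PV:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pinned-claim              # illustrative name
spec:
  volumeName: pv-nfs-example      # bind to this specific PV only
  storageClassName: nfs-manual
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi

The other matching rules (class, modes, capacity) still apply; volumeName simply restricts the candidate set to one PV.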
Reclaim Policies
The persistentVolumeReclaimPolicy determines what happens to the underlying storage when the PVC is deleted. Common policies:
Delete
- When the PVC is deleted, the PV and the underlying storage asset are removed.
- Typical for dynamically provisioned cloud volumes or CSI-managed disks.
- Simpler lifecycle: storage is cleaned up automatically.

Retain
- When the PVC is deleted, the PV moves into the Released state.
- The underlying storage is not automatically deleted.
- Administrators must manually handle the reuse or deletion of the actual storage (and often the PV object).
- Useful to prevent accidental data loss or for data archival.

Recycle (legacy)
- Previously supported; volumes would be scrubbed and made available again.
- Deprecated and generally not used in modern OpenShift setups.
Choosing a policy:
- For user-managed or critical data where accidental deletion is risky, Retain is safer.
- For ephemeral or non-critical data, Delete provides better automation and fewer "orphaned" resources.
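The policy of an existing PV can be changed with a JSON patch; a sketch of the usual approach, reusing the PV name from the earlier example:

oc patch pv pv-nfs-example \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

This is a common step before deleting a PVC whose dynamically provisioned data must survive.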
Lifecycle States of a PersistentVolume
PVs move through several phases:
- Available: the PV is ready and has not yet been bound to a claim.
- Bound: the PV is attached to a specific PVC; only that PVC can use it.
- Released: the PVC has been deleted, but the PV's underlying storage may still contain data.
  - With Retain, this is a signal to administrators to clean up or reassign.
  - With Delete, the storage is often already removed, and the PV may disappear shortly afterward.
- Failed: the PV has encountered an error and is not usable in its current state; administrator action is required.
Understanding these phases helps operators track storage usage and cleanup needs in an OpenShift cluster.
Common PersistentVolume Backends in OpenShift
While the exact options depend on your environment and installed Operators/CSI drivers, typical PV backends include:
- NFS:
  - Simple and widely used shared filesystem.
  - Often supports ReadWriteMany.
  - Defined in PVs via an nfs: configuration stanza.
- Cloud provider volumes (e.g., AWS EBS, Azure Disk, GCE PD):
  - Usually block storage; often ReadWriteOnce.
  - Typically provisioned dynamically via StorageClasses and CSI drivers.
- Shared filesystems and appliances (e.g., CephFS, legacy GlusterFS, vendor NAS systems):
  - Often provide ReadWriteMany and large capacity.
  - Common in enterprise OpenShift deployments via CSI Operators.
- Local volumes:
  - Disk or directory on a specific node (see the example after this list).
  - Can offer high performance but are node-bound; failure or eviction of that node can impact availability.
- Specialized CSI drivers:
  - For backup appliances, object gateways (via translation layers), or HPC file systems.
From the PV perspective, each backend is configured in a backend-specific stanza (nfs, csi, etc.), but all appear as PersistentVolume objects to users and workloads.
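As an illustration of the node-bound nature of local volumes mentioned above, a minimal local PV sketch follows; the path, capacity, and node name are hypothetical. Unlike other backends, a local PV requires a nodeAffinity stanza pinning it to the node that holds the disk:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-local-example            # illustrative name
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage   # illustrative class name
  local:
    path: /mnt/disks/ssd1           # device or directory on the node
  nodeAffinity:                     # required for local volumes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-1          # hypothetical node name

Pods bound to this PV can only be scheduled onto worker-1, which is exactly the availability trade-off noted above.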
Security and Access Control Considerations
While detailed security topics are covered elsewhere, a few PV-specific concerns matter in OpenShift:
- File permissions and ownership:
  - PVs often preserve permissions and ownership across Pods.
  - Some storage providers integrate with OpenShift's Security Context Constraints (SCCs) to adjust ownership automatically (see the sketch after this section).
  - Misconfigured permissions can prevent containers from writing to the volume.
- Shared volumes:
  - RWX volumes mean multiple Pods (often from different Deployments) can write simultaneously.
  - Without additional coordination at the application layer, this can lead to data corruption or accidental overwrites.
- Network-attached storage:
  - When using NFS or similar, network and firewall configuration affects availability and performance.
  - Export-level permissions (e.g., allowed clients, root squash) must align with cluster needs.
Administrators should verify that PV configuration matches both security policies and workload requirements.
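As one concrete permissions mechanism, Kubernetes can adjust group ownership of supported volumes via the Pod's securityContext.fsGroup. A minimal sketch follows; the GID and names are illustrative, and under OpenShift's restricted SCC the value is typically assigned from the project's UID/GID range rather than set by hand:

apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-example                         # illustrative name
spec:
  securityContext:
    fsGroup: 1000770000                         # illustrative GID; volume files become group-owned by it
  containers:
    - name: app
      image: registry.example.com/app:latest    # illustrative image
      volumeMounts:
        - name: data
          mountPath: /var/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: shared-data                  # the claim from the earlier static example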
Operational Practices for Persistent Volumes
From an OpenShift operations perspective, some recurring tasks around PVs include:
- Capacity planning
  - Monitoring how many PVs are allocated and how much space they represent on backend systems.
  - Ensuring StorageClasses and provisioners have enough capacity to serve new PVCs.
- Cleaning up Released volumes
  - For the Retain policy, periodically identifying Released PVs and confirming whether the backing storage should be:
    - Reclaimed (securely wiped and reused).
    - Archived.
    - Permanently deleted.
- Standardizing with StorageClasses
  - Using StorageClasses with clear naming (e.g., fast, standard, archive) so PVs and their backing capabilities (performance, redundancy) are predictable.
  - Ensuring dynamically provisioned PVs inherit appropriate reclaim policies and parameters.
- Troubleshooting binding issues
  - When PVCs remain Pending, verifying:
    - That PVs with a matching storageClassName and sufficient capacity exist or can be dynamically created.
    - That access modes and volume modes align.
    - That any label selectors are correct.
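The commands below are a typical starting point for that verification (namespace and claim names are placeholders):

oc get pvc -n <namespace>                     # find claims stuck in Pending
oc describe pvc <claim-name> -n <namespace>   # events usually state why binding failed
oc get pv                                     # check phases, classes, capacities, access modes
oc get storageclass                           # confirm the requested class exists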
By treating PVs as managed infrastructure resources rather than ad hoc directories or disks, OpenShift provides a predictable and auditable storage layer for stateful applications.
Summary
PersistentVolumes in OpenShift:
- Represent cluster-level storage resources, independent of Pods and Namespaces.
- Provide a unified abstraction over many different storage backends.
- Are matched to user-facing PersistentVolumeClaims through access modes, capacity, volume mode, and storage classes.
- Have reclaim policies that control the fate of the underlying storage after PVC deletion.
- Require thoughtful operational management for capacity, security, and lifecycle.
Subsequent chapters will focus on how applications request these PVs using PersistentVolumeClaims and how StorageClasses and dynamic provisioning simplify day-to-day usage.