Understanding Stateful Applications in OpenShift
Stateful applications are those where the data and identity of the application instances matter over time. In OpenShift, this means paying special attention to how pods are named, how storage is attached, and how data survives failures, reschedules, and upgrades.
This chapter focuses on how OpenShift supports stateful workloads on top of the storage primitives already introduced elsewhere (PersistentVolumes, PersistentVolumeClaims, StorageClasses, and dynamic provisioning).
What Makes an Application Stateful?
A workload is typically considered stateful if one or more of these apply:
- It stores data that must persist beyond the life of a pod (databases, message queues, file stores).
- Individual instances (replicas) are not interchangeable; each has an identity (e.g., db-0, db-1, db-2) and may own specific data.
- The application maintains in-memory or on-disk state that is important for the correctness of the system (leader elections, consensus logs, checkpoints).
- It needs stable network identities or well-known endpoints for peer-to-peer communication.
Unlike stateless apps, stateful apps cannot be safely scaled or restarted without considering the impact on data consistency and availability.
Challenges of Running Stateful Workloads on OpenShift
Containers and Kubernetes/OpenShift were originally focused on stateless microservices. Stateful applications introduce additional challenges:
- Pod lifecycle vs. data lifecycle: Pods are ephemeral; data must not be.
- Scheduling constraints: Data may live on specific nodes or in specific zones; pods might need to follow that data.
- Ordering and identity: Some distributed systems require that certain replicas start first, or that nodes keep a stable name over time.
- Consistency and durability: Storage performance characteristics (latency, IOPS, durability guarantees) directly affect application behavior.
- Backups and restore: You must treat persistent volumes as part of your backup strategy, not just application images.
OpenShift provides patterns and APIs to address these challenges while keeping deployment as declarative as possible.
StatefulSets: Core Pattern for Stateful Workloads
For stateful workloads, the primary OpenShift/Kubernetes workload resource is the StatefulSet. While Deployments/DeploymentConfigs are suitable for stateless applications, StatefulSets add features specifically for stateful scenarios.
Key properties of a StatefulSet:
- Stable, ordinal pod identity
  Pods are named predictably, e.g., my-db-0, my-db-1, my-db-2. This ordinal index is preserved across restarts and reschedules.
- Persistent volume association per replica
  Each replica can have its own persistent volume claim, typically using volumeClaimTemplates. The my-db-0 pod always mounts the same PVC, even if it is rescheduled to another node.
- Ordered operations
  By default, StatefulSets support ordered pod creation, updates, and termination. For example:
  - Create my-db-0 first, then my-db-1, then my-db-2.
  - Terminate my-db-2 first, then my-db-1, then my-db-0.
  Some stateful systems rely on this for safe cluster bootstrapping and rolling updates.
- Stable network identity
  Combined with a headless Service (a Service without a cluster IP), each pod gets a stable DNS name such as my-db-0.my-db-headless.my-namespace.svc.
In OpenShift, you create StatefulSets using the same YAML model as Kubernetes, and they integrate with the storage model (PVs, PVCs, StorageClasses) you already learned.
Typical StatefulSet Layout
A minimal pattern for a stateful application looks like this (conceptually):
- A StatefulSet with:
  - A spec.serviceName referencing a headless service for stable DNS.
  - A spec.volumeClaimTemplates section so that each replica gets its own PVC.
  - A spec.replicas field defining the number of instances.
- A Service with clusterIP: None (headless) to provide per-pod DNS.
OpenShift then:
- Creates and attaches PVCs from the template, usually using dynamic provisioning.
- Ensures that pod-0 always uses pvc-0, pod-1 uses pvc-1, and so forth.
- Preserves these relationships across pod restarts and node changes.
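As a concrete sketch of that layout, the manifest below wires these pieces together. All names, the container image, the port, and the fast-ssd StorageClass are illustrative assumptions, not values from a specific product.

```yaml
# Hypothetical example: image, names, port, and StorageClass are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: my-db-headless
spec:
  clusterIP: None                  # headless: per-pod DNS instead of a virtual IP
  selector:
    app: my-db
  ports:
    - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-db
spec:
  serviceName: my-db-headless      # spec.serviceName references the headless Service
  replicas: 3
  selector:
    matchLabels:
      app: my-db
  template:
    metadata:
      labels:
        app: my-db
    spec:
      containers:
        - name: db
          image: registry.example.com/my-db:1.0   # placeholder image
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/data
  volumeClaimTemplates:            # one PVC per replica: data-my-db-0, data-my-db-1, ...
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd # assumed StorageClass; adjust to your cluster
        resources:
          requests:
            storage: 10Gi
```

With this in place, each replica resolves at a stable name such as my-db-0.my-db-headless.&lt;namespace&gt;.svc and keeps its own PVC across reschedules.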
Storage Design for Stateful Applications
Because stateful apps rely heavily on storage, your choice of storage backing and configuration is critical.
Access Modes and Their Impact
Different applications have different expectations about how their volumes can be used:
- RWO (ReadWriteOnce): Only one node can mount the volume for read-write at a time.
  - Common for single-node databases like postgres, mysql.
  - Works well when each replica has its own volume.
- RWX (ReadWriteMany): Multiple nodes can mount the volume read-write simultaneously.
  - Useful for shared file stores, content repositories, or apps requiring shared storage.
  - Often backed by NFS, CephFS, or similar.
- ROX (ReadOnlyMany): Multiple nodes can mount read-only.
  - Less common for typical application data, more for shared reference data.
When defining volumeClaimTemplates in StatefulSets, ensure the requested accessModes match the application design and the capabilities of the underlying StorageClass.
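For example, a shared RWX claim might look like the following sketch; the cephfs StorageClass name is an assumption and must map to a backend that actually supports ReadWriteMany.

```yaml
# Hypothetical RWX claim for shared file data; the StorageClass must support RWX.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-content
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: cephfs         # assumed RWX-capable class (e.g., CephFS, NFS)
  resources:
    requests:
      storage: 50Gi
```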
Performance and Consistency Considerations
For stateful workloads, you should match the storage to the workload characteristics:
- Databases: Usually need low-latency, high-IOPS, durable storage. Use storage classes designed for database workloads.
- Message queues: Benefit from fast writes; evaluate throughput and durability trade-offs.
- Analytics and batch: Often use large volumes; throughput may matter more than latency.
On OpenShift, this often means:
- Choosing specialized StorageClasses (e.g., “fast-ssd”, “gp2/gp3”, “replicated”).
- Tuning resources.requests and resources.limits on pods so the storage backend is not overloaded.
- Understanding the replication and failure behavior of the underlying storage (e.g., Ceph, cloud block storage).
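As an illustration, a “fast-ssd” class for database workloads might be defined roughly like this; the provisioner and parameters shown assume the AWS EBS CSI driver and will differ for other backends.

```yaml
# Hypothetical StorageClass tuned for low-latency database volumes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com        # assumption: AWS EBS CSI; use your backend's provisioner
parameters:
  type: gp3
  iops: "6000"                      # parameter names are driver-specific
volumeBindingMode: WaitForFirstConsumer   # delay binding until the pod is scheduled
reclaimPolicy: Retain                     # keep the volume if the PVC is deleted
```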
Pod Placement and Data Locality
For some stateful systems, where data is stored locally on nodes or where network topology matters, pod scheduling is important.
Common techniques:
- Node affinity / anti-affinity
  Ensure certain replicas run on different nodes or zones for availability, or close to other services for performance.
- Pod anti-affinity
  Spread replicas across nodes to avoid a single node failure impacting all instances.
- Topology-aware provisioning
  Use storage classes that are aware of zones/regions, so volumes and pods are co-located to avoid cross-zone latency penalties.
In OpenShift, these are configured via the standard affinity and topologySpreadConstraints fields on StatefulSet pods, combined with storage classes that support topology.
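A sketch of such constraints inside the StatefulSet's pod template might look like this, reusing the hypothetical my-db labels from the earlier example; the topology keys are the standard well-known node labels.

```yaml
# Sketch: scheduling constraints in the StatefulSet's pod template.
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: my-db
              topologyKey: kubernetes.io/hostname      # never co-locate two replicas on one node
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone     # spread replicas evenly across zones
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: my-db
```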
Patterns for Common Stateful Applications
Relational Databases (MySQL/PostgreSQL)
Common pattern:
- StatefulSet with 1 or a small number of replicas (master/primary and read replicas).
- One PVC per replica backed by RWO block storage.
- Application-specific replication (e.g., streaming replication) configured via environment variables and init scripts.
- A Service pointing to the primary instance, possibly supplemented by Services for read replicas.
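One common way to expose just the primary is a Service that selects a single pod via the pod-name label the StatefulSet controller applies. The sketch below assumes my-db-0 is the primary, which only holds if your replication setup never promotes another pod; otherwise use an application- or operator-managed label instead.

```yaml
# Sketch: a Service that targets only the my-db-0 pod.
apiVersion: v1
kind: Service
metadata:
  name: my-db-primary
spec:
  selector:
    statefulset.kubernetes.io/pod-name: my-db-0   # label set by the StatefulSet controller
  ports:
    - port: 5432
      targetPort: 5432
```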
NoSQL Stores (Cassandra, MongoDB, etc.)
Typical needs:
- Multiple replicas for data distribution.
- Each replica with its own PVC and stable identity (e.g., cassandra-0, cassandra-1).
- Peer discovery via headless services and the stable DNS names that StatefulSets provide.
- Explicit handling of scaling up/down to avoid data loss or cluster imbalance.
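Peer discovery usually boils down to handing each pod the stable DNS names of its peers. A minimal sketch for a Cassandra-style image follows; the CASSANDRA_SEEDS variable follows the convention of common community images, and your image may expect something different.

```yaml
# Sketch: container environment pointing at stable per-pod DNS names.
env:
  - name: CASSANDRA_SEEDS
    value: "cassandra-0.cassandra-headless.my-namespace.svc.cluster.local,cassandra-1.cassandra-headless.my-namespace.svc.cluster.local"
```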
File Services and Shared Storage
For workloads requiring shared file access:
- Use RWX volumes (via appropriate StorageClasses).
- Sometimes a Deployment or DeploymentConfig is sufficient if the app does not need per-instance identity.
- For applications combining shared and per-instance data, mix RWX and RWO volumes as needed.
Backup, Restore, and Data Protection
Stateful workloads require a strategy beyond just backing up container images and manifests.
Key aspects:
- Volume snapshots
  Some storage backends support PVC snapshots via VolumeSnapshot resources. Use them to capture point-in-time copies of data volumes (see the sketch after this list).
- Application-consistent backups
  For databases and similar systems, coordinate snapshots with the application:
  - Flush or pause writes before the snapshot.
  - Use native backup tools (e.g., pg_dump, mysqldump, logical backups) when consistency guarantees are required.
- Disaster recovery strategies
  Common patterns on OpenShift:
  - Periodic export of application data to object storage (e.g., S3, internal object store).
  - Replication between clusters or regions, if the storage solution supports it.
- Testing restores
  For stateful apps, regularly test restore procedures in a separate namespace or test cluster to ensure that both data and configuration restore correctly.
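A snapshot of a single replica's PVC might look like the sketch below; it assumes the CSI driver behind the PVC supports snapshots and that a VolumeSnapshotClass named csi-snapclass exists in the cluster.

```yaml
# Sketch: point-in-time snapshot of the PVC owned by my-db-0.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-db-0-snap
spec:
  volumeSnapshotClassName: csi-snapclass      # assumed snapshot class
  source:
    persistentVolumeClaimName: data-my-db-0   # PVC created from volumeClaimTemplates
```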
Scaling and Upgrades for Stateful Workloads
Scaling stateful apps is more complex than scaling stateless ones.
Horizontal Scaling
When increasing replicas:
- New pods will get new PVCs (for StatefulSets with volumeClaimTemplates).
- The application may need explicit rebalancing (e.g., data redistribution in Cassandra, Kafka).
- Some systems require configuration changes or cluster membership updates.
When decreasing replicas:
- Removing replicas may leave behind PVCs, which is by design for data safety.
- You must decide whether and when to delete these PVCs; automatic deletion can cause irreversible data loss.
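On Kubernetes versions where the StatefulSetAutoDeletePVC feature is available, you can make this decision explicit with the persistentVolumeClaimRetentionPolicy field; a sketch follows (the default is Retain for both fields, matching the behavior described above).

```yaml
# Sketch: keep PVCs when the StatefulSet is deleted, but delete the PVC of a
# replica that is scaled away. Verify that your cluster version supports this field.
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Delete
```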
Rolling Updates
For stateful workloads:
- Ensure the update strategy is compatible with the application’s high-availability model.
- Many StatefulSets use the RollingUpdate strategy with ordered updates (see the partition sketch after this list). This can:
  - Update the highest-ordinal pod first (e.g., pod-2), verify health, then continue down toward pod-0.
  - Minimize the risk to the entire cluster.
- Respect application-specific steps:
  - Some databases require failover or leader changes before upgrading.
  - Quorum-based systems must maintain a majority during updates.
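For finer control over the rollout, the RollingUpdate strategy's partition field can act as a canary gate: only ordinals at or above the partition receive the new revision until you lower it. A sketch for a three-replica StatefulSet:

```yaml
# Sketch: with partition: 2, only my-db-2 receives the new revision;
# lower the partition step by step to continue the rollout.
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2
```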
Operating Stateful Apps on OpenShift
Running stateful applications successfully is as much about operational practices as it is about YAML definitions.
Recommended practices:
- Isolate stateful workloads
  Use dedicated namespaces and potentially dedicated nodes or node pools, especially for I/O-intensive workloads.
- Monitor storage-specific metrics
  Pay attention to:
  - Volume latency and throughput.
  - Node disk pressure and filesystem health.
  - Storage backend capacity and performance.
- Define clear ownership
  Clarify who manages:
  - Application configuration and replication.
  - Storage class selection and capacity planning.
  - Backup and restore procedures.
- Use Operators where available
  For complex stateful systems (databases, message queues, data platforms), Operators can:
  - Automate cluster creation, scaling, and upgrades.
  - Encode safe operational procedures.
  - Integrate backup/restore workflows with OpenShift primitives.
When Not to Use StatefulSets
Not all applications that touch storage are truly stateful in the cluster sense. You might not need a StatefulSet if:
- All replicas are identical and interchangeable, and shared storage (e.g., RWX volume) is sufficient.
- The application’s state is entirely external (e.g., uses a managed database service outside the cluster).
- You only need ephemeral storage for caching or temporary data; emptyDir or ephemeral volumes are enough.
In these cases, a Deployment or DeploymentConfig plus appropriate volumes may be simpler and more appropriate.
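As a sketch of that simpler alternative, the Deployment below mounts the hypothetical shared-content RWX claim from earlier plus an emptyDir for scratch data; the name, image, and mount paths are placeholders.

```yaml
# Sketch: interchangeable replicas sharing one RWX volume, no StatefulSet needed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: content-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: content-app
  template:
    metadata:
      labels:
        app: content-app
    spec:
      containers:
        - name: app
          image: registry.example.com/content-app:1.0   # placeholder image
          volumeMounts:
            - name: shared-content
              mountPath: /srv/content
            - name: cache
              mountPath: /tmp/cache
      volumes:
        - name: shared-content
          persistentVolumeClaim:
            claimName: shared-content    # the RWX PVC sketched earlier
        - name: cache
          emptyDir: {}                   # ephemeral scratch space
```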
Summary
Stateful applications on OpenShift rely on:
- StatefulSets to provide stable identities and ordered management.
- PersistentVolumes and PVCs to persist data across pod and node lifecycles.
- Appropriate StorageClasses and access modes to meet performance and semantics needs.
- Operational practices—backup, restore, scaling, and upgrades—tailored to the specifics of each stateful system.
Understanding these patterns allows you to confidently design and run databases, queues, and other data-intensive workloads on OpenShift in a reliable and maintainable way.