Key Concepts and Goals
Scaling and high availability (HA) in OpenShift are about answering three questions:
- How many instances of my application do I need right now? (scaling)
- What happens when something fails? (resilience and self‑healing)
- How do I keep my applications available across failures and load spikes? (high availability)
This chapter gives the conceptual and practical foundation for how OpenShift helps you:
- Scale applications up and down.
- Recover automatically from many types of failures.
- Design clusters and applications that tolerate node, zone, or even region outages.
The later subsections in this part of the course will go deeper into individual mechanisms (like horizontal pod autoscaling and multi‑zone clusters). Here, you’ll learn how all the pieces fit together.
Types of Scaling in OpenShift
Scaling in OpenShift generally happens at two levels:
- Application (pod) scaling: how many instances (pods) of a workload run.
- Cluster (node) scaling: how many worker nodes exist to host those pods.
Within applications, you will see three main patterns (each covered in its own subsection later):
- Horizontal scaling (scale out / in)
- Change the number of pod replicas.
- Typical for stateless and many cloud‑native applications.
- Implemented via Deployment, DeploymentConfig, StatefulSet (for stateful apps), and autoscalers.
- Vertical scaling (scale up / down)
- Change the resources per pod (CPU and memory requests/limits).
- Helpful when a single instance needs more capacity (e.g., more RAM for a JVM).
- Cluster scaling
- Adding or removing worker nodes (manually or automatically via machine APIs and cloud integrations).
- Must be coordinated with pod scaling so there is capacity to run the desired number of pods.
In practice, effective scaling usually combines:
- Horizontal pod scaling for elasticity and resilience.
- Reasonable vertical sizing to avoid wasted resources or resource starvation.
- Node scaling so the cluster has enough capacity to host all scaled workloads.
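As a concrete illustration, the following minimal Deployment sketch combines horizontal sizing (`replicas`) with vertical sizing (`resources`); the name, image, and values are placeholders, not prescriptions:

```yaml
# Minimal sketch: a Deployment sized both horizontally (replicas)
# and vertically (per-pod requests/limits). Names and values are examples.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 3                  # horizontal scaling: number of pod instances
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
      - name: web
        image: registry.example.com/web-frontend:1.0
        resources:
          requests:            # vertical sizing: what the scheduler reserves
            cpu: 250m
            memory: 256Mi
          limits:              # vertical sizing: hard cap per pod
            cpu: 500m
            memory: 512Mi
```

Scaling out is then simply a matter of raising `replicas`, either manually with the `oc scale` command or automatically via the autoscalers covered later in this chapter.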
Key Building Blocks for Scaling and HA
Several OpenShift/Kubernetes primitives work together to provide scaling and HA. At this point in the course you’ve seen the basics; here we’ll focus on how they contribute to resilience and elastic capacity.
Controllers: Ensuring Desired State
Controllers continuously reconcile actual cluster state to the desired state specified in manifests. For scaling and HA, the most important are:
- Deployment / DeploymentConfig controllers
- Maintain the requested number of replicas (spec.replicas) for stateless applications.
- Handle rollout and rollback while preserving availability.
- StatefulSet controller
- Used for ordered, uniquely identified pods (e.g., databases, queues).
- Balances some HA features with state and identity requirements.
- DaemonSet controller
- Ensures one (or more) pod per node; important for cluster‑level services like logging agents.
- Helps keep cluster‑level functionality available as nodes come and go.
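For instance, a node-level agent such as a logging collector is typically deployed as a DaemonSet; a minimal sketch (name and image are placeholders) looks like this, and the controller keeps one such pod on every eligible node as nodes join and leave the cluster:

```yaml
# Minimal sketch: a DaemonSet that runs one logging-agent pod per node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      tolerations:
      - operator: Exists       # example: also run on tainted (e.g., control-plane) nodes
      containers:
      - name: agent
        image: registry.example.com/log-agent:1.0
```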
These controllers, by design, give you self‑healing behavior: if a pod dies, the controller creates a replacement automatically.
Services and Load Balancing
A Service in Kubernetes/OpenShift provides a stable virtual IP and DNS name for a set of pods. From a scaling and HA perspective, Services:
- Distribute traffic across all ready pod endpoints (basic load balancing).
- Hide individual pod failures or rescheduling events from clients.
- Provide the abstraction that makes horizontal scaling useful: as you add more pods, they are automatically added behind the Service.
For external access, Routes or Ingress extend this with HTTP(S) load balancing and features like TLS termination and path‑based routing.
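A minimal sketch of a Service and an OpenShift Route fronting the Deployment shown earlier (names, ports, and the TLS mode are example values):

```yaml
# Minimal sketch: a Service load-balancing across ready pods,
# plus an OpenShift Route exposing it externally with TLS termination.
apiVersion: v1
kind: Service
metadata:
  name: web-frontend
spec:
  selector:
    app: web-frontend          # all ready pods with this label become endpoints
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: web-frontend
spec:
  to:
    kind: Service
    name: web-frontend
  tls:
    termination: edge          # terminate TLS at the router
```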
Readiness, Liveness, and Startup Probes
Probes are small but critical to both scaling and HA:
- Readiness probes
- Indicate when a pod is ready to serve traffic.
- The Service only routes traffic to pods that pass readiness.
- During scaling or rollouts, this avoids sending traffic to uninitialized pods.
- Liveness probes
- Detect when a pod is stuck or unhealthy.
- The kubelet can restart the container automatically.
- Startup probes
- Help for slow‑starting applications, preventing premature liveness failures during long boot times.
Probes directly influence effective availability: you can have many replicas, but if their probes do not reflect real health, traffic will still reach broken pods and users will still see failures.
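As a sketch, the three probe types might be configured like this on a container in a pod template (paths, ports, and timings are example values):

```yaml
# Minimal sketch: readiness, liveness, and startup probes on a container
# (fragment of a pod template spec).
containers:
- name: web
  image: registry.example.com/web-frontend:1.0
  ports:
  - containerPort: 8080
  readinessProbe:              # gates traffic from the Service
    httpGet:
      path: /healthz/ready
      port: 8080
    periodSeconds: 5
  livenessProbe:               # restarts the container when it hangs
    httpGet:
      path: /healthz/live
      port: 8080
    periodSeconds: 10
    failureThreshold: 3
  startupProbe:                # delays liveness checks for slow boots
    httpGet:
      path: /healthz/live
      port: 8080
    periodSeconds: 10
    failureThreshold: 30       # up to 30 * 10s = 5 minutes to start
```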
Resource Requests, Limits, and Scheduling
CPU and memory requests and limits affect both scaling and HA:
- Requests are used by the scheduler to place pods on nodes with enough capacity.
- Limits cap the maximum usage; exceeding them (especially memory) can cause OOM (Out Of Memory) kills.
Poorly configured resources can lead to:
- Pods being unschedulable (under‑provisioned cluster).
- Resource contention on nodes, causing instability.
- Reduced density (over‑conservative requests) and wasted capacity.
Getting resource sizing roughly correct is a prerequisite for effective scaling and predictable availability.
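One common guard against missing or wildly wrong values is a namespace-level LimitRange that supplies defaults and bounds; the following is a minimal sketch with example values, not recommended sizes:

```yaml
# Minimal sketch: namespace defaults and bounds for container resources.
apiVersion: v1
kind: LimitRange
metadata:
  name: resource-defaults
spec:
  limits:
  - type: Container
    defaultRequest:            # applied when a container omits requests
      cpu: 100m
      memory: 128Mi
    default:                   # applied when a container omits limits
      cpu: 500m
      memory: 512Mi
    max:                       # upper bound any single container may request
      cpu: "2"
      memory: 2Gi
```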
High Availability at Different Layers
HA is not a single feature; it’s achieved by combining patterns at multiple layers. In OpenShift, you can think in terms of:
- Application HA: Your app design and deployment configuration.
- Platform HA: How the OpenShift cluster is built and operated.
- Data HA: How stateful data is protected and replicated.
Application-Layer HA
At the application level, HA comes from:
- Multiple replicas
- Run more than one pod across different nodes.
- Avoid single points of failure (e.g., replicas: 1 for critical workloads is a red flag).
- Spread across failure domains (see the sketch after this list)
- Use topology spreading and anti‑affinity rules to distribute pods across nodes and zones.
- Prevent all replicas from ending up on the same node or in the same zone.
- Graceful shutdown and startup
- Handle termination signals so pods can finish in‑flight work.
- Combined with rolling updates to avoid downtime during deployments.
- Backoff and retry in clients
- Clients should be resilient to transient failures, restarts, and rescheduling.
- Make use of DNS and Service abstraction rather than hard‑coding pod IPs.
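A minimal sketch of spreading replicas across zones and nodes with topology spread constraints, placed in the pod template of the Deployment shown earlier (labels and skew values are examples):

```yaml
# Minimal sketch: spread replicas of one app evenly across zones and nodes.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone   # spread across availability zones
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: web-frontend
- maxSkew: 1
  topologyKey: kubernetes.io/hostname        # also spread across individual nodes
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: web-frontend
```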
In this chapter’s later subsections, you’ll see explicit mechanisms like self‑healing controllers and autoscaling, but their effect depends heavily on these app‑level design choices.
Platform HA: Cluster and Control Plane
OpenShift clusters themselves can be highly available. Conceptually, this includes:
- Redundant control plane nodes
- Multiple API server and etcd instances (deployed across nodes and, ideally, zones).
- Load balanced API endpoint so clients don’t rely on a single control plane node.
- Multiple worker nodes
- Workloads spread across at least two or three worker nodes.
- Node eviction and rescheduling to survive worker failures.
- Node health and remediation
- Node status and taints reflect health; unhealthy nodes are avoided by the scheduler.
- Integrations with cloud providers or hardware to replace or repair failed nodes.
Although cluster installation and topology are covered elsewhere, it’s important to understand that application‑level HA assumes the platform itself is not a single point of failure.
Data-Layer HA
For stateful applications, HA also depends on how data is stored:
- Distributed or replicated storage backends
- Storage systems that replicate data across nodes or zones.
- Avoids data loss if a node or disk fails.
- Backup and restore
- Disaster‑recovery strategies for critical data.
- Application‑consistent backups where necessary.
- StatefulSet patterns
- Pods with stable identities and persistent volumes, allowing recovery without data loss.
Storage and stateful applications have their own dedicated section later; here the key message is: scaling and HA for stateful apps are constrained and shaped by how data is managed.
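A minimal StatefulSet sketch with stable identities and per-pod persistent volumes (name, image, and storage size are placeholders):

```yaml
# Minimal sketch: a StatefulSet whose pods keep a stable identity (db-0, db-1, ...)
# and a dedicated PersistentVolumeClaim that survives pod rescheduling.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db              # headless Service providing per-pod DNS names
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: registry.example.com/db:1.0
        volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```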
Failure Modes and How OpenShift Responds
To understand the value of scaling and HA features, it’s useful to think through typical failure scenarios and how OpenShift behaves.
Pod-Level Failures
Examples:
- Application process crashes.
- Container image bug causing frequent exits.
- Liveness probe failure.
OpenShift/Kubernetes actions:
- The controller (Deployment, etc.) creates a new pod to maintain the desired replica count.
- The kubelet may restart the container if configured (restartPolicy: Always).
- Readiness probes ensure only healthy pods receive traffic.
Result: Short‑lived impact if your application is resilient to restarts and has multiple replicas.
Node-Level Failures
Examples:
- Worker node crashes or becomes unreachable.
- Node is drained for maintenance.
OpenShift/Kubernetes actions:
- Pods on the failed node are marked as not ready; traffic stops flowing to them.
- After a timeout, pods are rescheduled onto other healthy nodes, assuming capacity exists.
- If autoscaling is configured at the node level, additional nodes may be added.
Result: If the cluster has spare capacity and you run multiple replicas, the impact can be limited or invisible to users.
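One way to limit the impact of planned node drains (as opposed to unplanned crashes) is a PodDisruptionBudget; a minimal sketch, assuming the web-frontend example from earlier:

```yaml
# Minimal sketch: keep at least two web-frontend pods running during
# voluntary disruptions such as node drains for maintenance.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-frontend-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-frontend
```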
Capacity and Load Spikes
Examples:
- Sudden increase in user traffic.
- Batch jobs starting simultaneously.
OpenShift/Kubernetes actions (when autoscaling is enabled):
- Horizontal pod autoscaler increases the number of pod replicas based on metrics (e.g., CPU, custom metrics).
- Cluster autoscaler (in supported environments) adds worker nodes to host the additional pods.
Result: Applications can handle peak load more gracefully, assuming resource requests/limits and autoscaler policies are reasonably tuned.
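A minimal HorizontalPodAutoscaler sketch targeting the Deployment shown earlier (the replica bounds and CPU threshold are example values to be tuned per workload):

```yaml
# Minimal sketch: scale web-frontend between 3 and 10 replicas,
# aiming for ~70% average utilization of the requested CPU.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-frontend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-frontend
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```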
Rolling Updates and Configuration Changes
Even without “failures,” deployments and config changes can cause downtime if not managed carefully:
- Rolling updates allow a controlled transition from old to new versions.
- Readiness probes and surge/unavailable settings determine how many old vs. new pods run during the rollout.
- Rollbacks revert to a previous version if issues are detected.
This is part of operational high availability: minimizing or eliminating downtime during normal lifecycle events.
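In a Deployment, this rollout behavior is controlled by the update strategy; a minimal sketch of the relevant fragment of the Deployment spec (values are examples):

```yaml
# Minimal sketch: rolling-update settings on the Deployment spec.
# At most one extra pod is created and none of the desired replicas
# may be unavailable, so serving capacity never drops during a rollout.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1            # how many pods above the desired count may exist
    maxUnavailable: 0      # how many desired pods may be unavailable at once
```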
Design Principles for Highly Available, Scalable Workloads
Putting the pieces together, a few guiding principles help you design workloads that benefit from OpenShift’s scaling and HA capabilities:
- Prefer stateless and horizontally scalable designs where possible
- Makes use of replicas, Services, and autoscaling.
- Simplifies node failure handling and rollouts.
- Use multiple replicas for critical services
- At least 2–3 replicas for front‑end and API layers.
- Combined with spread across nodes/zones to avoid correlated failures.
- Define good health probes
- A cheap, reliable readiness check that actually reflects ability to serve requests.
- Liveness checks that catch real deadlocks or hangs, not just slow operations.
- Right‑size resource requests and limits
- They should reflect typical and peak usage, not arbitrary defaults.
- Review metrics to refine them over time.
- Separate concerns across tiers
- Stateless front ends, middle‑tier services, and stateful backends each have different HA and scaling strategies.
- Use StatefulSets and appropriate storage only where state is required.
- Plan for failure domains
- Understand which nodes belong to which zones/racks.
- Use scheduling constraints and topology spread to distribute replicas.
- Combine autoscaling with quotas and limits
- Autoscalers need room to work, but quotas prevent noisy neighbors from consuming an entire cluster.
- Balance flexibility with protection.
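A minimal ResourceQuota sketch that bounds what one namespace can consume, leaving autoscalers room to work without letting a single team exhaust the cluster (values are examples):

```yaml
# Minimal sketch: cap total requests, limits, and pod count for one namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"
```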
How This Chapter Connects to Subsequent Sections
The remaining sections under “Scaling and High Availability” zoom into specific mechanisms:
- Horizontal pod autoscaling: automatic adjustment of pod counts based on metrics.
- Vertical scaling: tuning CPU/memory per pod and when it makes sense.
- Self‑healing and pod restarts: deeper look at how controllers and probes work together for resilience.
- High availability concepts: formal patterns and terminology (RPO, RTO, redundancy models).
- Multi‑zone and multi‑region clusters: extending HA across larger failure domains.
Keep in mind the layered view introduced here—application, platform, and data—while you learn each mechanism; effective HA and scalable systems emerge from how these pieces are combined, not from any single feature on its own.