
Scaling and High Availability

Key Concepts and Goals

Scaling and high availability (HA) in OpenShift are about answering three questions:

  1. How many instances of my application do I need right now? (scaling)
  2. What happens when something fails? (resilience and self‑healing)
  3. How do I keep my applications available across failures and load spikes? (high availability)

This chapter gives the conceptual and practical foundation for how OpenShift helps you run the right number of application instances, recover automatically from failures, and keep services available through load spikes and outages.

The later subsections in this part of the course will go deeper into individual mechanisms (like horizontal pod autoscaling and multi‑zone clusters). Here, you’ll learn how all the pieces fit together.

Types of Scaling in OpenShift

Scaling in OpenShift generally happens at two levels: the application level (pods and replicas) and the cluster level (worker nodes and infrastructure).

Across these levels, you will see three main scaling patterns (each covered in its own subsection later):

  1. Horizontal scaling (scale out / in)
    • Change the number of pod replicas.
    • Typical for stateless and many cloud‑native applications.
    • Implemented via Deployment, DeploymentConfig, StatefulSets (for stateful apps), and autoscalers.
  2. Vertical scaling (scale up / down)
    • Change the resources per pod (CPU and memory requests/limits).
    • Helpful when a single instance needs more capacity (e.g., more RAM for a JVM).
  3. Cluster scaling
    • Adding or removing worker nodes (manually or automatically via machine APIs and cloud integrations).
    • Must be coordinated with pod scaling so there is capacity to run the desired number of pods.
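Cluster scaling in OpenShift is typically driven through the machine API. As a sketch (the MachineSet name and namespace follow common OpenShift 4 conventions, but your cluster's names will differ), a MachineAutoscaler bounds how far the cluster autoscaler may grow or shrink one worker MachineSet:

```yaml
# Sketch: allow the cluster autoscaler to resize one worker MachineSet.
# The MachineSet name "worker-us-east-1a" is illustrative.
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: worker-us-east-1a
  namespace: openshift-machine-api
spec:
  minReplicas: 1
  maxReplicas: 6
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: worker-us-east-1a
```

This complements pod-level scaling: new nodes are added only when pending pods cannot be scheduled on the existing capacity.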

In practice, effective scaling usually combines all three: horizontal scaling for elasticity, vertical scaling to right-size individual pods, and cluster scaling to guarantee the underlying capacity.

Key Building Blocks for Scaling and HA

Several OpenShift/Kubernetes primitives work together to provide scaling and HA. At this point in the course you’ve seen the basics; here we’ll focus on how they contribute to resilience and elastic capacity.

Controllers: Ensuring Desired State

Controllers continuously reconcile actual cluster state to the desired state specified in manifests. For scaling and HA, the most important are:

  • Deployments (and the ReplicaSets they manage), which keep the desired number of pod replicas running.
  • StatefulSets, which do the same for stateful applications that need stable identities and storage.
  • DaemonSets, which ensure a copy of a pod runs on every (or every selected) node.

These controllers, by design, give you self‑healing behavior: if a pod dies, the controller creates a replacement automatically.
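For example, a Deployment declares a desired replica count and its controller continuously reconciles toward it; delete one of the pods and a replacement appears. A minimal sketch (the name and image are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3              # desired state: three pods at all times
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: registry.example.com/web:1.0   # hypothetical image
        ports:
        - containerPort: 8080
```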

Services and Load Balancing

A Service in Kubernetes/OpenShift provides a stable virtual IP and DNS name for a set of pods. From a scaling and HA perspective, Services:

  • give clients a single, stable endpoint while pods come and go,
  • load-balance traffic across all ready pod replicas, and
  • update their endpoints automatically as replicas are added, removed, or fail readiness checks.

For external access, Routes or Ingress extend this with HTTP(S) load balancing and features like TLS termination and path‑based routing.
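A Service fronting a set of labeled pods, plus a Route exposing it externally, might look like this sketch (names and labels are hypothetical; `edge` termination is one of several TLS options):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web            # matches the pods' labels
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: web
spec:
  to:
    kind: Service
    name: web
  tls:
    termination: edge   # TLS terminated at the router
```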

Readiness, Liveness, and Startup Probes

Probes are small but critical to both scaling and HA:

  • Readiness probes control whether a pod receives traffic from Services and Routes.
  • Liveness probes detect stuck or deadlocked containers and trigger restarts.
  • Startup probes give slow-starting applications time to initialize before liveness checks apply.

Probes directly influence effective availability: you can have many replicas, but if they are not correctly probed, users will still see failures.
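A container spec fragment with all three probe types might look like this sketch (paths, port, and thresholds are illustrative and should match your application):

```yaml
containers:
- name: web
  image: registry.example.com/web:1.0   # hypothetical image
  startupProbe:                 # allow up to 30 x 10s for startup
    httpGet:
      path: /healthz
      port: 8080
    failureThreshold: 30
    periodSeconds: 10
  readinessProbe:               # gate traffic on ability to serve
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 5
  livenessProbe:                # restart the container if it hangs
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10
    failureThreshold: 3
```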

Resource Requests, Limits, and Scheduling

CPU and memory requests and limits affect both scaling and HA:

  • Requests determine where the scheduler can place pods and how much cluster capacity is reserved.
  • Limits cap what a pod may consume, protecting other workloads on the same node.
  • The horizontal pod autoscaler computes utilization relative to requests, so inaccurate requests skew autoscaling decisions.

Poorly configured resources can lead to:

  • pods that cannot be scheduled because their requests exceed available node capacity,
  • OOM kills and CPU throttling when limits are set too low, and
  • wasted capacity when requests are set far higher than actual usage.

Getting resource sizing roughly correct is a prerequisite for effective scaling and predictable availability.
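A typical container resources stanza looks like this sketch (the numbers are placeholders to be refined from observed metrics):

```yaml
resources:
  requests:            # reserved at scheduling time; HPA utilization baseline
    cpu: 250m
    memory: 256Mi
  limits:              # hard ceiling; exceeding memory triggers an OOM kill
    cpu: "1"
    memory: 512Mi
```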

High Availability at Different Layers

HA is not a single feature; it’s achieved by combining patterns at multiple layers. In OpenShift, you can think in terms of:

  1. Application HA: Your app design and deployment configuration.
  2. Platform HA: How the OpenShift cluster is built and operated.
  3. Data HA: How stateful data is protected and replicated.

Application-Layer HA

At the application level, HA comes from:

  • running multiple replicas behind a Service,
  • keeping instances stateless (or externalizing state) so any replica can serve any request,
  • shutting down gracefully so in-flight requests are not dropped during restarts and rollouts, and
  • using timeouts and retries for calls to dependencies.

In this chapter’s later subsections, you’ll see explicit mechanisms like self‑healing controllers and autoscaling, but their effect depends heavily on these app‑level design choices.
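One concrete app-level safeguard is a PodDisruptionBudget, which limits how many replicas voluntary disruptions (such as node drains during maintenance) may take down at once. A sketch, assuming pods labeled `app: web`:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2        # never drain below two ready replicas
  selector:
    matchLabels:
      app: web
```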

Platform HA: Cluster and Control Plane

OpenShift clusters themselves can be highly available. Conceptually, this includes:

  • multiple control-plane nodes with an etcd quorum (typically three members),
  • redundant ingress/router instances so external traffic has no single entry point, and
  • worker nodes spread across failure domains such as availability zones.

Although cluster installation and topology are covered elsewhere, it’s important to understand that application‑level HA assumes the platform itself is not a single point of failure.

Data-Layer HA

For stateful applications, HA also depends on how data is stored:

  • persistent volumes that survive pod restarts and rescheduling,
  • storage backends that replicate data across nodes or zones,
  • application-level replication (for example, database primaries with replicas), and
  • backups for disaster recovery.

Storage and stateful applications have their own dedicated section later; here the key message is: scaling and HA for stateful apps are constrained and shaped by how data is managed.
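As a sketch of how state shapes deployment, a StatefulSet gives each replica a stable identity and its own persistent volume via volumeClaimTemplates (names, image, and sizes are hypothetical):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db        # headless Service providing stable per-pod DNS
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: registry.example.com/db:1.0   # hypothetical image
        volumeMounts:
        - name: data
          mountPath: /var/lib/db
  volumeClaimTemplates:  # one PersistentVolumeClaim per replica
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```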

Failure Modes and How OpenShift Responds

To understand the value of scaling and HA features, it’s useful to think through typical failure scenarios and how OpenShift behaves.

Pod-Level Failures

Examples:

  • a container crashes or exits unexpectedly,
  • a process is OOM-killed after exceeding its memory limit, or
  • a liveness probe fails repeatedly.

OpenShift/Kubernetes actions:

  • the kubelet restarts failed containers according to the pod’s restart policy,
  • controllers create replacement pods for any that are deleted or evicted, and
  • Services stop routing traffic to pods that fail readiness checks.

Result: Short‑lived impact if your application is resilient to restarts and has multiple replicas.

Node-Level Failures

Examples:

  • a node crashes or loses power,
  • a node becomes unreachable due to a network partition, or
  • a node is drained for maintenance.

OpenShift/Kubernetes actions:

  • the node is marked NotReady when it stops reporting to the control plane,
  • after an eviction timeout, pods on the node are rescheduled onto healthy nodes, and
  • Services and Routes shift traffic to the surviving replicas.

Result: If the cluster has spare capacity and you run multiple replicas, the impact can be limited or invisible to users.
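How long Kubernetes waits before evicting pods from a failed node is governed by tolerations that are injected into pods automatically (300 seconds by default). A pod-spec sketch that shortens the wait for a latency-sensitive service (the 60-second value is illustrative):

```yaml
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 60   # evict after 60s instead of the 300s default
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 60
```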

Capacity and Load Spikes

Examples:

  • a traffic spike from a product launch or seasonal peak, or
  • a batch workload that temporarily needs far more compute.

OpenShift/Kubernetes actions (when autoscaling is enabled):

  • the horizontal pod autoscaler adds replicas when observed metrics (such as CPU utilization) exceed their targets, and
  • the cluster autoscaler adds worker nodes when pods cannot be scheduled for lack of capacity.

Result: Applications can handle peak load more gracefully, assuming resource requests/limits and autoscaler policies are reasonably tuned.
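A HorizontalPodAutoscaler targeting a Deployment might look like this sketch (the target name and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web            # hypothetical Deployment to scale
  minReplicas: 2         # keep an HA floor even at low load
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # percent of the pods' CPU requests
```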

Rolling Updates and Configuration Changes

Even without “failures,” deployments and config changes can cause downtime if not managed carefully:

  • stopping all replicas of the old version before the new version is ready,
  • missing readiness probes that let traffic reach pods which cannot yet serve it, and
  • rollout settings that take too much capacity offline at once.

This is part of operational high availability: minimizing or eliminating downtime during normal lifecycle events.
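Zero-downtime rollouts are largely a matter of the Deployment's update strategy plus working readiness probes. A strategy sketch that never reduces capacity during a rollout:

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never take an old pod away early
      maxSurge: 1         # add one new pod at a time instead
```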

Design Principles for Highly Available, Scalable Workloads

Putting the pieces together, a few guiding principles help you design workloads that benefit from OpenShift’s scaling and HA capabilities:

  1. Prefer stateless and horizontally scalable designs where possible
    • Makes use of replicas, Services, and autoscaling.
    • Simplifies node failure handling and rollouts.
  2. Use multiple replicas for critical services
    • At least 2–3 replicas for front‑end and API layers.
    • Combined with spread across nodes/zones to avoid correlated failures.
  3. Define good health probes
    • A cheap, reliable readiness check that actually reflects ability to serve requests.
    • Liveness checks that catch real deadlocks or hangs, not just slow operations.
  4. Right‑size resource requests and limits
    • They should reflect typical and peak usage, not arbitrary defaults.
    • Review metrics to refine them over time.
  5. Separate concerns across tiers
    • Stateless front ends, middle‑tier services, and stateful backends each have different HA and scaling strategies.
    • Use StatefulSets and appropriate storage only where state is required.
  6. Plan for failure domains
    • Understand which nodes belong to which zones/racks.
    • Use scheduling constraints and topology spread to distribute replicas.
  7. Combine autoscaling with quotas and limits
    • Autoscalers need room to work, but quotas prevent noisy neighbors from consuming an entire cluster.
    • Balance flexibility with protection.
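Principle 6 can be expressed directly in a pod spec. This topologySpreadConstraints sketch (the `app: web` label is hypothetical) asks the scheduler to spread replicas evenly across availability zones:

```yaml
topologySpreadConstraints:
- maxSkew: 1                           # zones may differ by at most one replica
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway    # prefer spreading, don't hard-require it
  labelSelector:
    matchLabels:
      app: web
```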

How This Chapter Connects to Subsequent Sections

The remaining sections under “Scaling and High Availability” zoom into specific mechanisms, such as self‑healing controllers, horizontal pod autoscaling, cluster scaling, and multi‑zone clusters.

Keep in mind the layered view introduced here—application, platform, and data—while you learn each mechanism; effective HA and scalable systems emerge from how these pieces are combined, not from any single feature on its own.
