Types of Load Balancing in OpenShift
OpenShift builds on Kubernetes primitives to provide several layers of load balancing. Each layer solves a different problem: distributing traffic between nodes, between pods, and sometimes across clusters or external systems. Understanding which layer to use in which situation is key to designing reliable applications.
At a high level, you’ll encounter:
- Cluster-internal load balancing: between pods (via Service objects).
- Node-level load balancing: through node ports and external load balancers.
- Edge/load balancing for external clients: via Routes/Ingress and infrastructure load balancers.
- Specialized load balancing: for TCP, UDP, WebSockets, gRPC, and sticky sessions.
This chapter focuses on what is specific to load balancing in OpenShift, not on the basic definitions of Services, Routes, or Ingress.
Cluster-Internal Load Balancing (Service-Level)
Within the cluster, OpenShift uses Kubernetes Service objects to distribute traffic to pods that match a label selector. The most common types and their load-balancing behavior:
ClusterIP Services
ClusterIP is the default Service type and provides:
- A virtual IP (VIP) reachable only inside the cluster.
- Built-in load balancing to backend pods via kube-proxy and iptables/IPVS rules.
When clients inside the cluster send traffic to the Service IP:
- Connections are distributed across all Ready pods that match the Service selector.
- Load balancing happens per connection rather than per request: the iptables backend picks a backend pod more or less at random, while the IPVS backend defaults to round-robin and supports other scheduling algorithms.
Use ClusterIP when:
- Only internal workloads consume the Service.
- You don’t need per-client session affinity (or you configure it explicitly, see below).
Example (simplified):
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
This Service will load-balance connections to all my-app pods exposing port 8080.
Session Affinity (Client IP Stickiness)
Some applications require that all requests from a client go to the same pod (session stickiness). OpenShift/Kubernetes services can provide session affinity based on client IP.
Key points:
- Set sessionAffinity: ClientIP on the Service.
- Optionally, configure how long a session should be considered active with sessionAffinityConfig.
Example:
apiVersion: v1
kind: Service
metadata:
  name: cart-service
spec:
  selector:
    app: web-shop
  ports:
  - port: 80
    targetPort: 8080
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800 # 3 hours

Trade-offs:
- Works well when clients keep a stable IP.
- Can cause uneven load if some client IPs send much more traffic than others.
- Not suitable if clients are behind large NATs with many users sharing a single IP.
Use this when application-level session handling (e.g., sticky cookies, stateless design) is not available or would be too complex to add.
Node-Level Load Balancing (NodePort and External Load Balancers)
For traffic entering the cluster from outside, OpenShift clusters usually sit behind external load balancers or cloud provider load-balancer services. The low-level Kubernetes concepts still apply, but OpenShift often abstracts or automates parts of the setup.
NodePort Behavior
A NodePort Service:
- Exposes a static port (e.g., 30080) on every node.
- Any traffic sent to nodeIP:nodePort is forwarded into the cluster and load-balanced across the Service's pods.
This is mainly used when:
- You are integrating with an external hardware or software load balancer that sends traffic to specific node ports.
- You are in an environment without integrated cloud load balancers and need a stable external entry point.
Example:
apiVersion: v1
kind: Service
metadata:
  name: nodeport-web
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080

Considerations:
- Clients talk to node IPs, not directly to pod IPs.
- If a node is down, the external load balancer must stop sending traffic to it.
- NodePort by itself doesn’t provide global health checks; that’s the responsibility of whatever is sending traffic to the nodes.
External Load Balancers
In cloud or virtualized environments, OpenShift clusters often use infrastructure load balancers:
- Cloud load balancers (e.g., AWS NLB/ALB, Azure Load Balancer, GCP Load Balancer).
- On-premises load balancers (e.g., F5, HAProxy, NGINX, hardware appliances).
Typical pattern:
- External load balancer distributes traffic across worker node IPs (or across dedicated ingress nodes).
- On each node, kube-proxy forwards incoming traffic to the appropriate Service.
- That Service then load-balances to its pods.
Important aspects:
- External load balancers often use health checks to detect node-level failures.
- They may apply their own load-balancing algorithms (round-robin, least connections, etc.), in addition to the internal Service load balancing.
OpenShift clusters installed with Installer-Provisioned Infrastructure (IPI) often set up these external load balancers automatically, especially for the API and Ingress endpoints.
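In cloud environments, the usual way to request such a load balancer is a Service of type LoadBalancer, which the platform turns into an external load balancer automatically. A minimal sketch (the name and ports are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: my-app-lb
spec:
  type: LoadBalancer   # the cloud provider provisions an external load balancer
  selector:
    app: my-app
  ports:
  - port: 443          # port exposed by the external load balancer
    targetPort: 8443   # port on the backend pods

Under the hood, the provisioned load balancer typically forwards to automatically allocated node ports, so the NodePort considerations above still apply.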
Application Ingress and HTTP(S) Load Balancing
At the HTTP(S) layer, OpenShift uses Ingress Controllers (usually HAProxy-based) to provide routing and load balancing for web applications that use Routes or Ingress resources.
Ingress Controller Role
The OpenShift Ingress Controller:
- Runs as pods on one or more nodes.
- Listens on standard ports (80/443) for external traffic (usually via a cloud or external load balancer).
- Terminates or passes through TLS according to Route/Ingress configuration.
- Load-balances traffic to backend pod endpoints.
While Services handle L3/L4 distribution, Ingress Controllers add L7 (HTTP/HTTPS) awareness:
- URL-based routing (host, path).
- Request header inspection.
- More advanced load-balancing options (e.g., sticky cookies, backend weights).
Load Balancing via Routes
OpenShift Routes are the primary way to expose HTTP/S applications. Each Route references a Service, and the Ingress Controller load-balances traffic from clients to that Service’s pods.
Key behaviors:
- Round-robin load balancing by default across all available pod endpoints.
- Ability to configure sticky sessions using HAProxy cookies for HTTP(S).
- Support for multiple endpoints with weights (for A/B testing, blue-green patterns, canary-like rollouts).
Example Route with simple round-robin:
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: my-app
spec:
  host: my-app.apps.example.com
  to:
    kind: Service
    name: my-app
    weight: 100
  port:
    targetPort: 8080
  tls:
    termination: edge

In this case:
- Incoming HTTPS traffic to my-app.apps.example.com terminates at the Ingress Controller (edge TLS termination).
- The controller balances requests across pods behind the my-app Service.
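Routes can also split traffic between multiple Services using weights, which is the basis for blue-green and canary-style rollouts. A minimal sketch, assuming two Services named my-app-v1 and my-app-v2 exist:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: my-app-canary
spec:
  host: my-app.apps.example.com
  to:
    kind: Service
    name: my-app-v1
    weight: 90          # roughly 90% of requests
  alternateBackends:
  - kind: Service
    name: my-app-v2
    weight: 10          # roughly 10% of requests go to the canary
  port:
    targetPort: 8080

Adjusting the weights shifts traffic gradually without touching DNS or client configuration.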
Sticky Sessions with Routes (Cookie-Based Affinity)
For web sessions, you often want stickiness at HTTP level, not IP level. OpenShift Routes support cookie-based session affinity.
You configure this using annotations or route-specific options (version-dependent). Typical concept:
- The Ingress Controller injects a cookie into the client’s response.
- Future requests with that cookie are routed to the same backend pod (as long as it’s healthy).
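As a sketch of the annotation-based approach (annotation names and defaults depend on the OpenShift version, so check the documentation for your release), a Route can name the affinity cookie explicitly:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: web-shop
  annotations:
    router.openshift.io/cookie_name: "shop_session"   # name of the cookie used for affinity
spec:
  host: shop.apps.example.com
  to:
    kind: Service
    name: web-shop
  tls:
    termination: edge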
Behavior and drawbacks:
- More precise than IP-based affinity (works behind large NATs).
- If a pod goes away, its sticky clients are re-routed to other pods, and any in-memory session state on the old pod is lost unless the application shares it externally.
- Can complicate scaling and failure scenarios if the application isn’t prepared for session migration.
Load Balancing for Different Protocols
While HTTP(S) dominates most workloads, OpenShift supports load balancing for various protocols.
HTTP, HTTPS, and WebSockets
- Web traffic (HTTP/1.1, HTTP/2, gRPC over HTTP/2, WebSockets) can be balanced via Routes or Ingress.
- The Ingress Controller typically handles connection upgrades for WebSockets.
- You can tune timeouts and keepalive settings depending on the type of connection.
For gRPC:
- It runs over HTTP/2; ensure your Route/Ingress and Ingress Controller version support HTTP/2 handling.
- Some advanced gRPC load-balancing patterns (like client-side load balancing) might not use traditional L7 load balancers as heavily.
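Timeout tuning for long-lived connections is usually done per Route. As a sketch (the annotation and its default value are version-dependent, so verify against your documentation), the HAProxy-based Ingress Controller accepts a route timeout annotation:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: ws-backend
  annotations:
    haproxy.router.openshift.io/timeout: "300s"   # allow long-lived connections; the default is much shorter
spec:
  host: ws.apps.example.com
  to:
    kind: Service
    name: ws-backend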
TCP and UDP Load Balancing
For raw TCP or UDP protocols:
- Use Services (ClusterIP, NodePort) for basic L4 load balancing inside the cluster or for node-exposed services.
- For edge-level TCP/UDP load balancing:
- Some Ingress Controllers support TCP/UDP passthrough configuration.
- Otherwise rely on external load balancers that balance TCP/UDP to node or pod IPs.
Scenarios:
- Databases using TCP only.
- Custom or legacy protocols over TCP/UDP.
- Message queues or streaming systems.
Considerations:
- No HTTP-specific features (no URL-based routing, cookies, etc.).
- Rely on connection-based load balancing (new connections generally distributed among backends).
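Because Services operate at L4, exposing a non-HTTP protocol mostly comes down to setting the protocol on the port. A minimal sketch of a UDP workload exposed on a node port (names and port numbers are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: syslog-udp
spec:
  type: NodePort
  selector:
    app: syslog
  ports:
  - protocol: UDP      # the default is TCP; set explicitly for UDP traffic
    port: 514
    targetPort: 5514
    nodePort: 30514

An external load balancer can then distribute UDP datagrams across the node IPs on port 30514.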
Load Balancer Health Checks and Failure Handling
Effective load balancing depends on accurate detection of failing nodes or pods.
Node and Pod Health in Load Balancing
On the pod level:
- Readiness probes determine if a pod should receive traffic.
- Failing readiness means:
- The pod is removed from Service endpoint lists.
- The Ingress Controller stops forwarding requests to it.
On the node level:
- OpenShift/Kubernetes marks a node as NotReady if it fails checks.
- kube-proxy and the control plane remove that node's pod IPs from endpoint lists.
- External load balancers can health-check nodes (often via TCP checks on the Ingress ports or HTTP /health endpoints).
This layering helps avoid sending traffic:
- To pods that are starting, failing, or shutting down.
- To nodes that are offline or partitioned.
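Readiness is declared per container in the pod template. A minimal sketch, assuming the application exposes a /healthz endpoint on port 8080 (the path and timings are illustrative):

# Excerpt from a Deployment's pod template
containers:
- name: my-app
  image: registry.example.com/my-app:latest
  ports:
  - containerPort: 8080
  readinessProbe:
    httpGet:
      path: /healthz       # hypothetical health endpoint
      port: 8080
    periodSeconds: 5       # probe every 5 seconds
    failureThreshold: 3    # remove the pod from endpoints after 3 consecutive failures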
Graceful Draining and Rolling Updates
During rolling updates or node maintenance:
- Pods should be drained gracefully:
- Mark pods as not ready (e.g., preStop hooks, readiness probe failure).
- Allow in-flight connections to finish, within a configured grace period.
- Ingress Controller and Services stop routing new connections to draining pods.
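In the pod spec this usually combines a preStop hook, which delays shutdown until load balancers have stopped sending new traffic, with a sufficiently long termination grace period. A sketch with illustrative values:

# Excerpt from a Deployment's pod template
spec:
  terminationGracePeriodSeconds: 60   # total time allowed for shutdown
  containers:
  - name: my-app
    image: registry.example.com/my-app:latest
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "10"]    # give endpoints and load balancers time to stop sending new traffic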
Implications:
- With HTTP workloads, this usually works smoothly, especially with short-lived requests.
- With long-lived connections (WebSockets, streaming), you may need:
- Longer termination grace periods.
- Application logic to re-establish connections.
Properly configured draining is critical to achieve zero-downtime deployments and minimize user-visible errors.
Load Balancing in Multi-Zone and Hybrid Setups
OpenShift clusters can span multiple availability zones or be part of hybrid (on-prem + cloud) environments. Load balancing has additional considerations here.
Zone-Aware Load Balancing
External load balancers and Ingress Controllers can be configured to:
- Prefer sending traffic to pods in the same zone as the client, reducing latency.
- Spread traffic across zones for high availability.
Depending on your infrastructure:
- Cloud providers may offer automatic zone-aware load balancers.
- You can influence where Ingress Controllers run via node labels and affinity rules.
- You might run separate Ingress Controllers per zone and front them with a global load balancer or DNS-based load balancing (e.g., weighted or latency-based DNS records).
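One way to realize a per-zone setup is the nodePlacement field of the IngressController resource, which constrains router pods to zone-labeled nodes. A sketch, assuming the standard topology.kubernetes.io/zone node label (the controller name and domain are illustrative):

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: ingress-zone-a                       # illustrative name for a per-zone controller
  namespace: openshift-ingress-operator
spec:
  domain: zone-a.apps.example.com            # illustrative wildcard domain served by this controller
  replicas: 2
  nodePlacement:
    nodeSelector:
      matchLabels:
        topology.kubernetes.io/zone: eu-west-1a   # run router pods only on nodes in this zone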
Global and Cross-Cluster Load Balancing (High-Level)
For multi-cluster or hybrid strategies (e.g., DR, blue/green across clusters):
- DNS-based methods are common:
- Weighted DNS to send more traffic to one cluster.
- Geo or latency-based DNS to route users to the nearest cluster.
- External traffic managers or global load balancers:
- Can distribute incoming requests across multiple OpenShift clusters’ Ingress endpoints.
Within each cluster, the mechanisms described above still apply; the new complexity is how traffic is split between clusters.
Performance and Tuning Considerations
While exact tuning depends on environment and scale, a few general principles are specific to load balancing on OpenShift:
- Ingress Controller scaling:
- Increase the number of Ingress Controller replicas to handle more concurrent connections (see the sketch after this list).
- Use pod anti-affinity and node selectors to spread them across nodes.
- Resource requests/limits:
- Ingress Controllers need enough CPU and memory to handle TLS, compression, and routing.
- Keepalive and timeout settings:
- Short timeouts free resources faster but can break long-running requests.
- Long timeouts support streaming and WebSockets but use more memory and connections.
- Connection reuse:
- HTTP/2 and keepalive can reduce connection overhead but may create uneven load if long-lived connections are pinned to specific pods.
- Logging and observability:
- Enable access logging and metrics from Ingress Controllers and external load balancers.
- Monitor request rates, error rates, and latency per backend to detect imbalances.
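For the replica-count point above, scaling the default Ingress Controller is a one-line change on its IngressController resource (a minimal sketch; the right count depends on traffic and node capacity):

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  replicas: 3   # scale out the HAProxy router pods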
Tuning these parameters is usually part of capacity planning and operations, but understanding their impact on load balancing behavior helps you design applications that behave well under varying load.
Designing Applications for OpenShift Load Balancing
To make best use of OpenShift’s load-balancing features:
- Prefer stateless services when possible:
- Any pod can handle any request, making load balancing straightforward.
- For stateful needs:
- Use shared storage, external state stores, or distributed caches so that load balancing doesn’t depend on single pods.
- If you must use stickiness, choose the right level:
- IP-based session affinity at the Service level.
- Cookie-based stickiness at the Route/Ingress level.
- Handle pod and node failures gracefully:
- Implement retries with backoff in clients.
- Ensure idempotent or compensating operations where possible.
- Consider horizontal scaling and understand how increased pod counts interact with:
- Service load balancing.
- Ingress Controller capacity.
- External load balancers and their connection limits.
By combining these design practices with OpenShift’s layering of load-balancing mechanisms, you can build applications that are resilient, scalable, and efficient in real-world traffic conditions.