Why Load Balancing Matters
In server environments, load balancing lets you:
- Scale horizontally by adding more servers instead of upgrading one big one.
- Increase availability by avoiding a single point of failure (when combined with redundancy).
- Improve performance by distributing client requests intelligently.
- Hide internal topology so you can maintain/replace backends transparently.
At a high level, a load balancer sits between clients and your backend servers:
$$
\text{Client} \rightarrow \text{Load Balancer} \rightarrow \text{Backend Servers}
$$
The rest of this chapter focuses on concepts and Linux-oriented implementation details that are common across different load balancer technologies (HAProxy, Nginx, etc.), without going into configuration specifics that belong in their dedicated chapters.
Load Balancing Architectures
Layer 4 vs Layer 7 Load Balancing
Load balancers typically operate at:
- Layer 4 (Transport layer):
- Makes decisions based on IP addresses and ports (TCP/UDP).
- Does not inspect HTTP headers, cookies, URLs, etc.
- Often faster and more efficient; less CPU and memory usage.
- Example use: TCP load balancing for databases, generic TCP services.
- Layer 7 (Application layer):
- Understands application protocols (most commonly HTTP/HTTPS).
- Can route based on:
- HTTP headers (e.g., Host, User-Agent)
- URL path (e.g., /api/ vs /static/)
- Cookies
- Request method (GET/POST/PUT…)
- Enables advanced logic: A/B testing, canary releases, API vs UI routing.
On Linux, both types are implemented using user-space software like HAProxy or Nginx, sometimes combined with kernel features like IPVS (IP Virtual Server) for high-performance L4 balancing.
Single vs Multi-Tier Load Balancing
You will often see more than one load balancer in a production architecture:
- Edge / front load balancer:
- Terminates TLS.
- Handles WAF (Web Application Firewall), rate limiting, caching.
- Routes by domain / path to different internal services.
- Internal load balancers:
- Distribute traffic among multiple instances of a specific service (e.g., app servers, microservices).
- May operate at L4 for lower overhead.
Example pattern:
- Internet → Edge LB → App LB(s) → App servers → DB LB → DB servers.
Linux boxes can serve in any of these roles using the same core tools but with different configuration.
Traffic Distribution Algorithms
Algorithms define how a load balancer assigns new connections/requests to backends. Common strategies:
Round Robin
- Requests are distributed sequentially across backends: S1 → S2 → S3 → S1 → …
- Simple and stateless.
- Works best when:
- Backends are similarly sized.
- Requests are roughly similar in cost.
Weighted Round Robin
- Each backend is assigned a weight (capacity indicator).
- Backends with higher weight receive more requests.
Example:
- Server A: weight 4
- Server B: weight 2
- Server C: weight 1
Over time, ratio of requests:
$$
A : B : C = 4 : 2 : 1
$$
Use this to accommodate different server sizes or temporarily reduce load on a weaker node.
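As a sketch, one well-known way to implement this is "smooth" weighted round robin (the variant popularized by Nginx's upstream module), which interleaves picks instead of sending bursts to the heaviest server. The server names and weights below mirror the example above:

```python
from collections import Counter

def smooth_wrr(weights):
    """Generator implementing smooth weighted round robin."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    while True:
        # Every server gains its weight each turn; the leader is picked
        # and "pays back" the total, which spreads picks out evenly.
        for name, w in weights.items():
            current[name] += w
        best = max(current, key=current.get)
        current[best] -= total
        yield best

picks = smooth_wrr({"A": 4, "B": 2, "C": 1})
first_seven = [next(picks) for _ in range(7)]
print(first_seven)           # ['A', 'B', 'A', 'C', 'A', 'B', 'A'] — one full cycle
print(Counter(first_seven))  # counts in the 4 : 2 : 1 ratio
```

Note how A's four picks are spread across the cycle rather than sent back to back, which avoids hammering the largest server in bursts.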
Least Connections / Least Load
- New connection goes to the backend with the fewest active connections.
- Better when:
- Requests vary a lot in cost.
- Some connections are long-lived (e.g., WebSocket, SSH).
Variants:
- Least response time
- Least bandwidth
These require more metrics from backends.
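The core of least-connections selection is small; a minimal sketch (tie-breaking randomly is an assumption, implementations vary):

```python
import random

def least_connections(backends):
    """Pick the backend with the fewest active connections;
    break ties randomly so equally loaded backends share new traffic."""
    fewest = min(backends.values())
    candidates = [b for b, n in backends.items() if n == fewest]
    return random.choice(candidates)

# Hypothetical per-backend active-connection counts tracked by the LB.
active = {"app1": 12, "app2": 3, "app3": 7}
chosen = least_connections(active)
active[chosen] += 1   # the LB increments on connect, decrements on close
print(chosen)         # app2 — fewest active connections
```

The bookkeeping (increment on connect, decrement on close) is what makes this algorithm stateful, unlike round robin.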
Source / Hash-Based
- Backend is chosen based on a hash of:
- Client IP
- HTTP header
- Cookie
- URL path
- Primary use: session stickiness (send same client consistently to the same backend without formal session persistence).
- Also used for consistent hashing so that when backends are added/removed, only a small subset of clients get remapped.
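A minimal consistent-hash ring illustrates the remapping property. The class name, virtual-node count, and the use of MD5 (purely for key distribution, not security) are illustrative choices, not a reference implementation:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent hash ring with virtual nodes per backend."""
    def __init__(self, backends, vnodes=100):
        self.ring = []            # sorted list of (hash, backend)
        for b in backends:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{b}#{i}"), b))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get(self, key):
        # First ring position clockwise from the key's hash.
        idx = bisect.bisect(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["s1", "s2", "s3"])
# The same client key always maps to the same backend...
assert ring.get("203.0.113.7") == ring.get("203.0.113.7")
# ...and removing a backend remaps only the keys that hashed to it.
smaller = HashRing(["s1", "s2"])
moved = sum(ring.get(f"client{i}") != smaller.get(f"client{i}")
            for i in range(1000))
print(f"{moved} of 1000 keys remapped")   # roughly a third, not all
```

With a plain `hash(key) % N` scheme, removing one of three backends would remap about two thirds of all keys; here only the removed backend's share moves.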
Random (With or Without Weights)
- Randomly assign to a backend, optionally factoring in weights.
- Surprisingly effective in many scenarios with enough traffic volume.
- Simpler internal state than round robin in some implementations.
Health Checks and Failover
The usefulness of a load balancer depends heavily on detecting when backends are unhealthy and avoiding them.
Types of Health Checks
- TCP checks:
- Attempts to establish a TCP connection (e.g., to TCP port 80).
- Simple, but can’t detect application-level errors (e.g., HTTP 500).
- HTTP/HTTPS checks:
- Sends a specific HTTP request (e.g., GET /health).
- Expects certain status codes (often 200) or even specific content.
- Common in web and API stacks.
- Scripted / Agent-based checks:
- External script or local agent reports status (e.g., via exit codes or metrics).
- Used for non-HTTP services like databases, or to test deeper functionality.
Health Check Behavior
Typical parameters you’ll tune:
- Interval: how often to check (e.g., every 2–10 seconds).
- Timeout: how long to wait before considering the check failed.
- Failure threshold: how many consecutive failures before marking a server DOWN.
- Recovery threshold: how many consecutive successful checks before marking it UP again (avoids flapping).
Operational patterns:
- When all backends in a pool are down:
- Some load balancers return a generic error (e.g., 503).
- Others can fall back to another pool or an error page server.
- When a backend comes back:
- You may want a warm-up phase (reduced traffic, smaller weight) so caches can fill and JIT compilers warm up.
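The threshold logic above can be sketched as a tiny state machine. The default thresholds (3 failures to go DOWN, 2 successes to come back UP) are illustrative, not prescribed values:

```python
class HealthTracker:
    """Track one backend's health with failure/recovery thresholds
    so a single blip doesn't cause flapping."""
    def __init__(self, fall=3, rise=2):
        self.fall, self.rise = fall, rise
        self.up = True
        self.streak = 0   # consecutive checks contradicting the current state

    def record(self, check_passed):
        if check_passed == self.up:
            self.streak = 0          # current state confirmed; reset counter
            return self.up
        self.streak += 1
        threshold = self.fall if self.up else self.rise
        if self.streak >= threshold:
            self.up = not self.up    # flip only after N consecutive results
            self.streak = 0
        return self.up

t = HealthTracker()
results = [t.record(ok) for ok in (False, False, True, False, False, False)]
print(results)  # [True, True, True, True, True, False]
```

Two failures followed by a success leave the backend UP; only three consecutive failures mark it DOWN.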
Passive vs Active Checks
- Active checks: load balancer initiates the probes.
- Passive checks: detect errors from normal client traffic:
- Too many 5xx responses.
- TCP timeouts / resets.
- Mark backend degraded or down based on error rate.
Usually you’ll combine both.
Session Persistence (Stickiness)
Stateful applications often need the same client to keep hitting the same backend (e.g., in-memory sessions, shopping carts without shared storage).
Common persistence methods:
IP-Based Persistence
- Maps client IP (or subnet) → backend.
- Easy but fragile:
- Many clients share an IP behind NAT.
- Mobile users may change IPs frequently.
Cookie-Based Persistence
- Load balancer injects a cookie (e.g., LBID) that encodes the backend choice.
- Subsequent requests presenting that cookie are routed to the same backend.
- More reliable for HTTP use cases than IP.
Variations:
- Application-managed cookies (the app writes Set-Cookie; the load balancer uses it).
- Load-balancer-managed cookies (transparent to the app).
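A sketch of cookie-based routing with a signed cookie value, so clients can't forge a mapping to an arbitrary backend. The cookie format, the helper names, and the signing key are all hypothetical:

```python
import hashlib
import hmac

SECRET = b"change-me"   # hypothetical signing key held by the LB

def make_sticky_cookie(backend):
    """Encode the chosen backend in a tamper-evident cookie value."""
    sig = hmac.new(SECRET, backend.encode(), hashlib.sha256).hexdigest()[:16]
    return f"{backend}.{sig}"

def route(cookie, backends, pick_fresh):
    """Honor a valid sticky cookie; otherwise pick a backend fresh."""
    if cookie:
        backend, _, sig = cookie.partition(".")
        expected = make_sticky_cookie(backend).split(".")[1] if backend in backends else ""
        if expected and hmac.compare_digest(sig, expected):
            return backend
    return pick_fresh(backends)

pool = ["app1", "app2"]
c = make_sticky_cookie("app2")
print(route(c, pool, lambda bs: bs[0]))              # app2 — sticky
print(route("app2.forged", pool, lambda bs: bs[0]))  # app1 — bad signature, pick fresh
```

Signing matters because an unsigned backend name in a cookie lets a client deliberately target (or overload) a specific backend.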
URL / Header-Based Persistence
- Use a specific header (e.g., X-User-ID) or part of the URL as a hash key.
- Useful for APIs or multi-tenant scenarios.
Trade-Offs
- Stickiness can hurt load distribution:
- Some backends may receive disproportional traffic.
- When a backend fails:
- Sticky mapping must be broken, and clients should be migrated gracefully.
- Long-lived sticky sessions make rolling deployments more complex; often combined with:
- Shared state (session stores, shared caches).
- Short session lifetimes.
SSL/TLS Termination and Offloading
Load balancers are often the first point where TLS is handled.
TLS Termination
- Client ↔ LB: HTTPS
- LB ↔ Backend: HTTP (or HTTPS)
Benefits:
- Backends can handle plain HTTP, simplifying configuration.
- Centralized TLS configuration:
- Certificates
- Protocol versions (TLS 1.2/1.3)
- Cipher suites
- Offload CPU work of encryption from backends.
Risks / considerations:
- Traffic within your internal network is unencrypted; mitigate with:
- Network segmentation.
- Mutual TLS between tiers if needed.
- Firewalls controlling east–west traffic.
TLS Passthrough
- LB forwards encrypted traffic without decrypting.
- Backends terminate TLS directly.
- LB usually operates at Layer 4 in this mode.
Use when:
- You need true end-to-end TLS.
- Backend needs to see the client certificate.
- Regulatory or compliance requirements demand it.
Trade-offs:
- Less visibility for the load balancer (no HTTP headers or URL paths).
- Harder to do advanced L7 routing, WAF, or content-based policies.
Re-Encryption
- TLS is terminated at LB, processed for inspection/routing, then re-encrypted to backend.
Pattern:
- Client → TLS → LB → TLS → Backend
Used when you want both:
- Centralized policies and inspection.
- Encrypted traffic on internal links.
High Availability for Load Balancers
A single load balancer is itself a single point of failure. High-availability (HA) load balancing ensures the LB tier is redundant.
Active-Passive vs Active-Active
- Active-Passive:
- One LB node handles all traffic.
- A second node stands by with equivalent config and takes over VIP (Virtual IP) if primary fails.
- Easier to reason about, but wastes some capacity.
- Active-Active:
- Multiple LBs share the load simultaneously.
- Typically fronted by:
- Anycast routing.
- DNS load balancing (multiple A/AAAA records).
- Hardware or virtual routers.
- Better scalability, but requires careful session and state handling.
Techniques on Linux
Common components:
- Virtual IPs (VIPs):
- Shared IP address that can move between nodes.
- Managed by tools like keepalived (using VRRP) or pacemaker/corosync.
- VRRP (Virtual Router Redundancy Protocol):
- Redundancy mechanism where routers (or servers) share a virtual router IP.
- One acts as MASTER, others as BACKUP.
- On Linux, commonly implemented via keepalived.
- ARP announcements / gratuitous ARP:
- When a VIP moves, the new active LB advertises its MAC to update switches and ARP caches.
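As a sketch, a minimal keepalived VRRP instance holding a VIP might look like the following; the interface name, router ID, priority, and addresses are placeholders to adapt to your environment:

```
vrrp_instance VI_1 {
    state MASTER            # this node starts as MASTER; peer uses BACKUP
    interface eth0          # interface carrying the VIP
    virtual_router_id 51    # must match on all nodes in this VRRP group
    priority 100            # highest priority wins the MASTER election
    advert_int 1            # VRRP advertisement interval in seconds
    authentication {
        auth_type PASS
        auth_pass secret
    }
    virtual_ipaddress {
        192.0.2.100/24      # the shared VIP that fails over
    }
}
```

The BACKUP node runs the same block with `state BACKUP` and a lower `priority`; when it stops seeing advertisements from the MASTER, it claims the VIP and sends gratuitous ARP to update the network.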
Config Synchronization
For consistent behavior:
- Synchronize config files across LB nodes (e.g., rsync, configuration management tools).
- Optionally synchronize runtime state (connection tables, stickiness maps):
- Some load balancers can share stick tables or session maps.
- Without this, failover may drop existing connections and break stickiness.
Load Balancing and DNS
DNS is not itself a full load balancer, but it’s often used in conjunction:
Simple DNS Round Robin
- Multiple A (IPv4) or AAAA (IPv6) records for the same name: app.example.com → IP1, IP2, IP3.
- Clients pick one of the returned IPs (often pseudo-randomly).
Characteristics:
- No health checks by default; dead IPs still get handed out.
- Client-side caching: entries may stick for TTL seconds.
- Uneven load depending on resolver behavior.
GeoDNS / Latency-Based DNS
- DNS provider returns nearest or lowest-latency IP based on client location or real-time measurements.
- Useful for global traffic distribution across regions or data centers.
Integration with Traditional Load Balancers
Typical patterns:
- DNS balances between multiple load balancers (each of which balances to backend servers).
- Global DNS-based distribution between regions, each with its own LB tiers.
Key limits:
- DNS changes are not instantaneous (cached).
- DNS can’t see actual HTTP headers or paths, so decisions are coarse-grained.
Load Balancing Special Cases
TCP and UDP Services
Not all services are HTTP:
- TCP load balancing:
- Databases (e.g., PostgreSQL, MySQL).
- SSH jump hosts.
- Message queues.
- Requires careful failover strategy to avoid mid-transaction breakage.
- UDP load balancing:
- DNS servers.
- RADIUS.
- Some streaming protocols.
- Typically stateless from the LB's point of view; may use IP hashing so packets from the same client consistently reach the same backend.
Long-Lived Connections
WebSockets, gRPC streams, or streaming protocols:
- Consume connection slots for longer periods.
- Can skew least-connections algorithms if not tuned.
- Stickiness may be inherent (one TCP connection = one backend).
Strategies:
- Limit max concurrent connections per backend.
- Use health checks that gracefully drain long-lived connections before shutdown.
- Plan for connection draining during deployments and restarts.
Observability and Tuning
To operate load balancing in production, you need visibility and tuning capabilities.
Key Metrics
Useful metrics per backend and per frontend/listener:
- Request/connection rate (RPS/CPS).
- Active connections.
- Response times (p50, p95, p99).
- Error rates:
- HTTP 4xx/5xx breakdown.
- TCP resets, timeouts.
- Queue length / pending connections.
- Resource use on LB node:
- CPU, memory, network (bandwidth, packet drops).
Logs
LB logs can be very detailed:
- Access logs (similar to web server logs):
- Client IP, timestamp, method, URL, status code, bytes sent.
- Backend server chosen.
- Request and response timings.
- Error logs:
- Health check failures.
- Backend downtime/up events.
- Internal LB issues.
Centralize and analyze these logs to:
- Detect imbalances.
- Identify slow backends.
- Correlate deploys with error spikes.
Performance Considerations
On Linux:
- Kernel networking tunables:
- Connection tracking, backlog sizes, ephemeral port ranges.
- TCP options like tcp_tw_reuse and tcp_fin_timeout (careful with side effects).
- File descriptors:
- Load balancers can consume many fds; increase nofile limits.
- NIC offloads and multi-queue:
- RSS (Receive Side Scaling) to distribute interrupts across CPU cores.
- Correctly size and pin worker processes/threads.
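As a starting point only (not drop-in values — every workload differs, and some of these have subtle side effects), the kernel tunables above are usually set via a sysctl fragment:

```
# /etc/sysctl.d/99-lb.conf — illustrative starting points for an LB node
net.core.somaxconn = 4096                  # larger accept backlog for listen sockets
net.ipv4.ip_local_port_range = 1024 65535  # more ephemeral ports for LB-to-backend connections
net.ipv4.tcp_fin_timeout = 30              # reclaim FIN_WAIT sockets sooner
net.ipv4.tcp_tw_reuse = 1                  # reuse TIME_WAIT for outgoing connections (see caveats)
net.netfilter.nf_conntrack_max = 262144    # only relevant if conntrack is in the path
```

Apply with `sysctl --system` and verify individual values with `sysctl <name>` before and after load testing.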
Design Patterns and Practical Scenarios
Blue-Green and Canary Deployments
Load balancers are central in deployment strategies:
- Blue-Green:
- Two identical environments: blue and green.
- LB routes 100% of traffic to one, then switches to the other after deployment.
- On Linux you might:
- Maintain two backend pools.
- Flip which pool is active.
- Canary:
- Route a small percentage of traffic to a new version.
- Gradually increase if metrics look good.
- Requires:
- Weighted backends or pools.
- Good monitoring to compare canary vs baseline.
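The traffic-splitting decision at the heart of a canary rollout can be sketched in a few lines. Pool names and the 5% fraction are illustrative; real load balancers implement this with weights, and per-user consistency usually means hashing a user ID instead of rolling a die per request:

```python
import random

def pick_pool(canary_fraction=0.05):
    """Send roughly canary_fraction of requests to the canary pool."""
    return "canary" if random.random() < canary_fraction else "baseline"

counts = {"canary": 0, "baseline": 0}
for _ in range(10_000):
    counts[pick_pool()] += 1
print(counts)   # roughly 5% of requests land on the canary pool
```

Ramping the canary is then just raising `canary_fraction` in steps (5% → 25% → 100%) while comparing error rates and latency percentiles between the two pools.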
Multi-Tenancy
- Different tenants/apps behind one LB:
- Route by domain (SNI/Host header).
- Route by path prefix.
- Important for:
- SSL certificate management (per domain).
- Isolation of failure domains (rate limiting, resource limits).
Graceful Maintenance
When you need to take a backend out:
- Mark it as “draining”:
- Stop sending new connections.
- Allow existing connections to finish (up to a timeout).
- After all connections close:
- Stop the service or reboot the server safely.
This is often implemented directly in LB configuration or via an API/CLI.
Summary
Load balancing on Linux is a combination of:
- Traffic distribution algorithms (round robin, least connections, hashing).
- Health checking and failover.
- Session persistence strategies.
- TLS termination/offloading.
- High availability of the load balancer tier itself.
- DNS and global distribution integration.
- Observability and performance tuning.
The concrete implementations (HAProxy, Nginx, etc.) build on these concepts; mastering them conceptually makes it easier to read, design, and debug any specific configuration in later chapters.