Why Load Balancing Matters
In server environments, load balancing lets you:
- Scale horizontally by adding more servers instead of upgrading one big one.
- Increase availability by avoiding a single point of failure (when combined with redundancy).
- Improve performance by distributing client requests intelligently.
- Hide internal topology so you can maintain/replace backends transparently.
At a high level, a load balancer sits between clients and your backend servers:
$$
\text{Client} \rightarrow \text{Load Balancer} \rightarrow \text{Backend Servers}
$$
The rest of this chapter focuses on concepts and Linux-oriented implementation details that are common across different load balancer technologies (HAProxy, Nginx, etc.), without going into configuration specifics that belong in their dedicated chapters.
Load Balancing Architectures
Layer 4 vs Layer 7 Load Balancing
Load balancers typically operate at:
- Layer 4 (Transport layer):
- Makes decisions based on IP addresses and ports (TCP/UDP).
- Does not inspect HTTP headers, cookies, URLs, etc.
- Often faster and more efficient; less CPU and memory usage.
- Example use: TCP load balancing for databases, generic TCP services.
- Layer 7 (Application layer):
- Understands application protocols (most commonly HTTP/HTTPS).
- Can route based on:
- HTTP headers (e.g., Host, User-Agent)
- URL path (e.g., /api/ vs /static/)
- Cookies
- Request method (GET/POST/PUT…)
- Enables advanced logic: A/B testing, canary releases, API vs UI routing.
On Linux, both types are implemented using user-space software like HAProxy or Nginx, sometimes combined with kernel features like IPVS (IP Virtual Server) for high-performance L4 balancing.
Single vs Multi-Tier Load Balancing
You will often see more than one load balancer in a production architecture:
- Edge / front load balancer:
- Terminates TLS.
- Handles WAF (Web Application Firewall), rate limiting, caching.
- Routes by domain / path to different internal services.
- Internal load balancers:
- Distribute traffic among multiple instances of a specific service (e.g., app servers, microservices).
- May operate at L4 for lower overhead.
Example pattern:
- Internet → Edge LB → App LB(s) → App servers → DB LB → DB servers.
Linux boxes can serve in any of these roles using the same core tools but with different configuration.
Traffic Distribution Algorithms
Algorithms define how a load balancer assigns new connections/requests to backends. Common strategies:
Round Robin
- Requests are distributed sequentially across backends: S1 → S2 → S3 → S1 → …
- Simple and stateless.
- Works best when:
- Backends are similarly sized.
- Requests are roughly similar in cost.
Weighted Round Robin
- Each backend is assigned a weight (capacity indicator).
- Backends with higher weight receive more requests.
Example:
- Server A: weight 4
- Server B: weight 2
- Server C: weight 1
Over time, ratio of requests:
$$
A : B : C = 4 : 2 : 1
$$
Use this to accommodate different server sizes or temporarily reduce load on a weaker node.
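As a sketch, one well-known way to implement this is "smooth" weighted round robin (the variant popularized by Nginx's upstream module), which interleaves picks instead of sending bursts to the heaviest server. The server names and weights below mirror the example above:

```python
from collections import Counter

def smooth_wrr(weights):
    """Generator implementing smooth weighted round robin."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    while True:
        # Every server gains its weight each turn; the leader is picked
        # and "pays back" the total, which spreads picks out evenly.
        for name, w in weights.items():
            current[name] += w
        best = max(current, key=current.get)
        current[best] -= total
        yield best

picks = smooth_wrr({"A": 4, "B": 2, "C": 1})
first_seven = [next(picks) for _ in range(7)]
print(first_seven)           # ['A', 'B', 'A', 'C', 'A', 'B', 'A'] — one full cycle
print(Counter(first_seven))  # counts in the 4 : 2 : 1 ratio
```

Note how A's four picks are spread across the cycle rather than sent back to back, which avoids hammering the largest server in bursts.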
Least Connections / Least Load
- New connection goes to the backend with the fewest active connections.
- Better when:
- Requests vary a lot in cost.
- Some connections are long-lived (e.g., WebSocket, SSH).
Variants:
- Least response time
- Least bandwidth
These require more metrics from backends.
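The core of least-connections selection is small; a minimal sketch (tie-breaking randomly is an assumption, implementations vary):

```python
import random

def least_connections(backends):
    """Pick the backend with the fewest active connections;
    break ties randomly so equally loaded backends share new traffic."""
    fewest = min(backends.values())
    candidates = [b for b, n in backends.items() if n == fewest]
    return random.choice(candidates)

# Hypothetical per-backend active-connection counts tracked by the LB.
active = {"app1": 12, "app2": 3, "app3": 7}
chosen = least_connections(active)
active[chosen] += 1   # the LB increments on connect, decrements on close
print(chosen)         # app2 — fewest active connections
```

The bookkeeping (increment on connect, decrement on close) is what makes this algorithm stateful, unlike round robin.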
Source / Hash-Based
- Backend is chosen based on a hash of:
- Client IP
- HTTP header
- Cookie
- URL path
- Primary use: session stickiness (send same client consistently to the same backend without formal session persistence).
- Also used for consistent hashing so that when backends are added/removed, only a small subset of clients get remapped.
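A minimal consistent-hash ring illustrates the remapping property. The class name, virtual-node count, and the use of MD5 (purely for key distribution, not security) are illustrative choices, not a reference implementation:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent hash ring with virtual nodes per backend."""
    def __init__(self, backends, vnodes=100):
        self.ring = []            # sorted list of (hash, backend)
        for b in backends:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{b}#{i}"), b))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get(self, key):
        # First ring position clockwise from the key's hash.
        idx = bisect.bisect(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["s1", "s2", "s3"])
# The same client key always maps to the same backend...
assert ring.get("203.0.113.7") == ring.get("203.0.113.7")
# ...and removing a backend remaps only the keys that hashed to it.
smaller = HashRing(["s1", "s2"])
moved = sum(ring.get(f"client{i}") != smaller.get(f"client{i}")
            for i in range(1000))
print(f"{moved} of 1000 keys remapped")   # roughly a third, not all
```

With a plain `hash(key) % N` scheme, removing one of three backends would remap about two thirds of all keys; here only the removed backend's share moves.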
Random (With or Without Weights)
- Randomly assign to a backend, optionally factoring in weights.
- Surprisingly effective in many scenarios with enough traffic volume.
- Simpler internal state than round robin in some implementations.
Health Checks and Failover
The usefulness of a load balancer depends heavily on detecting when backends are unhealthy and avoiding them.
Types of Health Checks
- TCP checks:
- Attempts to establish a TCP connection (e.g., to TCP port 80).
- Simple, but can’t detect application-level errors (e.g., HTTP 500).
- HTTP/HTTPS checks:
- Sends a specific HTTP request (e.g., GET /health).
- Expects certain status codes (often 200) or even specific content.
- Common in web and API stacks.
- Scripted / Agent-based checks:
- External script or local agent reports status (e.g., via exit codes or metrics).
- Used for non-HTTP services like databases, or to test deeper functionality.
Health Check Behavior
Typical parameters you’ll tune:
- Interval: how often to check (e.g., every 2–10 seconds).
- Timeout: how long to wait before considering the check failed.
- Failure threshold: how many consecutive failures before marking a server DOWN.
- Recovery threshold: how many consecutive successful checks before marking it UP again (avoids flapping).
Operational patterns:
- When all backends in a pool are down:
- Some load balancers return a generic error (e.g., 503).
- Others can fall back to another pool or an error page server.
- When a backend comes back:
- You may want a warm-up phase (reduced traffic, smaller weight) so caches can fill and JIT compilers warm up.
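The threshold logic above can be sketched as a tiny state machine. The default thresholds (3 failures to go DOWN, 2 successes to come back UP) are illustrative, not prescribed values:

```python
class HealthTracker:
    """Track one backend's health with failure/recovery thresholds
    so a single blip doesn't cause flapping."""
    def __init__(self, fall=3, rise=2):
        self.fall, self.rise = fall, rise
        self.up = True
        self.streak = 0   # consecutive checks contradicting the current state

    def record(self, check_passed):
        if check_passed == self.up:
            self.streak = 0          # current state confirmed; reset counter
            return self.up
        self.streak += 1
        threshold = self.fall if self.up else self.rise
        if self.streak >= threshold:
            self.up = not self.up    # flip only after N consecutive results
            self.streak = 0
        return self.up

t = HealthTracker()
results = [t.record(ok) for ok in (False, False, True, False, False, False)]
print(results)  # [True, True, True, True, True, False]
```

Two failures followed by a success leave the backend UP; only three consecutive failures mark it DOWN.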
Passive vs Active Checks
- Active checks: load balancer initiates the probes.
- Passive checks: detect errors from normal client traffic:
- Too many 5xx responses.
- TCP timeouts / resets.
- Mark backend degraded or down based on error rate.
Usually you’ll combine both.
Session Persistence (Stickiness)
Stateful applications often need the same client to keep hitting the same backend (e.g., in-memory sessions, shopping carts without shared storage).
Common persistence methods:
IP-Based Persistence
- Maps client IP (or subnet) → backend.
- Easy but fragile:
- Many clients share an IP behind NAT.
- Mobile users may change IPs frequently.
Cookie-Based Persistence
- Load balancer injects a cookie (e.g., LBID) that encodes the backend choice.
- Subsequent requests presenting that cookie are routed to the same backend.
- More reliable for HTTP use cases than IP.
Variations:
- Application-managed cookies (the app writes Set-Cookie; the load balancer uses it).
- Load-balancer-managed cookies (transparent to the app).
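A sketch of cookie-based routing with a signed cookie value, so clients can't forge a mapping to an arbitrary backend. The cookie format, the helper names, and the signing key are all hypothetical:

```python
import hashlib
import hmac

SECRET = b"change-me"   # hypothetical signing key held by the LB

def make_sticky_cookie(backend):
    """Encode the chosen backend in a tamper-evident cookie value."""
    sig = hmac.new(SECRET, backend.encode(), hashlib.sha256).hexdigest()[:16]
    return f"{backend}.{sig}"

def route(cookie, backends, pick_fresh):
    """Honor a valid sticky cookie; otherwise pick a backend fresh."""
    if cookie:
        backend, _, sig = cookie.partition(".")
        expected = make_sticky_cookie(backend).split(".")[1] if backend in backends else ""
        if expected and hmac.compare_digest(sig, expected):
            return backend
    return pick_fresh(backends)

pool = ["app1", "app2"]
c = make_sticky_cookie("app2")
print(route(c, pool, lambda bs: bs[0]))              # app2 — sticky
print(route("app2.forged", pool, lambda bs: bs[0]))  # app1 — bad signature, pick fresh
```

Signing matters because an unsigned backend name in a cookie lets a client deliberately target (or overload) a specific backend.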
URL / Header-Based Persistence
- Use a specific header (e.g., X-User-ID) or part of the URL as a hash key.
- Useful for APIs or multi-tenant scenarios.
Trade-Offs
- Stickiness can hurt load distribution:
- Some backends may receive disproportional traffic.
- When a backend fails:
- Sticky mapping must be broken, and clients should be migrated gracefully.
- Long-lived sticky sessions make rolling deployments more complex; often combined with:
- Shared state (session stores, shared caches).
- Short session lifetimes.
SSL/TLS Termination and Offloading
Load balancers are often the first point where TLS is handled.
TLS Termination
- Client ↔ LB: HTTPS
- LB ↔ Backend: HTTP (or HTTPS)
Benefits:
- Backends can handle plain HTTP, simplifying configuration.
- Centralized TLS configuration:
- Certificates
- Protocol versions (TLS 1.2/1.3)
- Cipher suites
- Offload CPU work of encryption from backends.
Risks / considerations:
- Traffic within your internal network is unencrypted; mitigate with:
- Network segmentation.
- Mutual TLS between tiers if needed.
- Firewalls controlling east–west traffic.
TLS Passthrough
- LB forwards encrypted traffic without decrypting.
- Backends terminate TLS directly.
- LB usually operates at Layer 4 in this mode.
Use when:
- You need true end-to-end TLS.
- Backend needs to see the client certificate.
- Regulatory or compliance requirements demand it.
Trade-offs:
- Less visibility for the load balancer (no HTTP headers or URL paths).
- Harder to do advanced L7 routing, WAF, or content-based policies.
Re-Encryption
- TLS is terminated at LB, processed for inspection/routing, then re-encrypted to backend.
Pattern:
- Client → TLS → LB → TLS → Backend
Used when you want both:
- Centralized policies and inspection.
- Encrypted traffic on internal links.
High Availability for Load Balancers
A single load balancer is itself a single point of failure. High-availability (HA) load balancing ensures the LB tier is redundant.
Active-Passive vs Active-Active
- Active-Passive:
- One LB node handles all traffic.
- A second node stands by with equivalent config and takes over VIP (Virtual IP) if primary fails.
- Easier to reason about, but wastes some capacity.
- Active-Active:
- Multiple LBs share the load simultaneously.
- Typically fronted by:
- Anycast routing.
- DNS load balancing (multiple A/AAAA records).
- Hardware or virtual routers.
- Better scalability, but requires careful session and state handling.
Techniques on Linux
Common components:
- Virtual IPs (VIPs):
- Shared IP address that can move between nodes.
- Managed by tools like keepalived (using VRRP) or pacemaker/corosync.
- VRRP (Virtual Router Redundancy Protocol):
- Redundancy mechanism where routers (or servers) share a virtual router IP.
- One acts as MASTER, others as BACKUP.
- On Linux, commonly implemented via keepalived.
- ARP announcements / gratuitous ARP:
- When a VIP moves, the new active LB advertises its MAC to update switches and ARP caches.
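As a sketch, a minimal keepalived VRRP instance holding a VIP might look like the following; the interface name, router ID, priority, and addresses are placeholders to adapt to your environment:

```
vrrp_instance VI_1 {
    state MASTER            # this node starts as MASTER; peer uses BACKUP
    interface eth0          # interface carrying the VIP
    virtual_router_id 51    # must match on all nodes in this VRRP group
    priority 100            # highest priority wins the MASTER election
    advert_int 1            # VRRP advertisement interval in seconds
    authentication {
        auth_type PASS
        auth_pass secret
    }
    virtual_ipaddress {
        192.0.2.100/24      # the shared VIP that fails over
    }
}
```

The BACKUP node runs the same block with `state BACKUP` and a lower `priority`; when it stops seeing advertisements from the MASTER, it claims the VIP and sends gratuitous ARP to update the network.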
Config Synchronization
For consistent behavior:
- Synchronize config files across LB nodes (e.g., rsync, configuration management tools).
- Optionally synchronize runtime state (connection tables, stickiness maps):
- Some load balancers can share stick tables or session maps.
- Without this, failover may drop existing connections and break stickiness.
Load Balancing and DNS
DNS is not itself a full load balancer, but it’s often used in conjunction:
Simple DNS Round Robin
- Multiple A (IPv4) or AAAA (IPv6) records for the same name: app.example.com → IP1, IP2, IP3.
- Clients pick one of the returned IPs (often pseudo-randomly).
Characteristics:
- No health checks by default; dead IPs still get handed out.
- Client-side caching: entries may stick for TTL seconds.
- Uneven load depending on resolver behavior.
GeoDNS / Latency-Based DNS
- DNS provider returns nearest or lowest-latency IP based on client location or real-time measurements.
- Useful for global traffic distribution across regions or data centers.
Integration with Traditional Load Balancers
Typical patterns:
- DNS balances between multiple load balancers (each of which balances to backend servers).
- Global DNS-based distribution between regions, each with its own LB tiers.
Key limits:
- DNS changes are not instantaneous (cached).
- DNS can’t see actual HTTP headers or paths, so decisions are coarse-grained.
Load Balancing Special Cases
TCP and UDP Services
Not all services are HTTP:
- TCP load balancing:
- Databases (e.g., PostgreSQL, MySQL).
- SSH jump hosts.
- Message queues.
- Requires careful failover strategy to avoid mid-transaction breakage.
- UDP load balancing:
- DNS servers.
- RADIUS.
- Some streaming protocols.
- Typically stateless from the LB's point of view; may use IP hashing so packets from the same client consistently reach the same backend.
Long-Lived Connections
WebSockets, gRPC streams, or streaming protocols:
- Consume connection slots for longer periods.
- Can skew least-connections algorithms if not tuned.
- Stickiness may be inherent (one TCP connection = one backend).
Strategies:
- Limit max concurrent connections per backend.
- Use health checks that gracefully drain long-lived connections before shutdown.
- Plan for connection draining during deployments and restarts.
Observability and Tuning
To operate load balancing in production, you need visibility and tuning capabilities.
Key Metrics
Useful metrics per backend and per frontend/listener:
- Request/connection rate (RPS/CPS).
- Active connections.
- Response times (p50, p95, p99).
- Error rates:
- HTTP 4xx/5xx breakdown.
- TCP resets, timeouts.
- Queue length / pending connections.
- Resource use on LB node:
- CPU, memory, network (bandwidth, packet drops).
Logs
LB logs can be very detailed:
- Access logs (similar to web server logs):
- Client IP, timestamp, method, URL, status code, bytes sent.
- Backend server chosen.
- Request and response timings.
- Error logs:
- Health check failures.
- Backend downtime/up events.
- Internal LB issues.
Centralize and analyze these logs to:
- Detect imbalances.
- Identify slow backends.
- Correlate deploys with error spikes.
Performance Considerations
On Linux:
- Kernel networking tunables:
- Connection tracking, backlog sizes, ephemeral port ranges.
- TCP options like tcp_tw_reuse and tcp_fin_timeout (careful with side effects).
- File descriptors:
- Load balancers can consume many fds; increase nofile limits.
- NIC offloads and multi-queue:
- RSS (Receive Side Scaling) to distribute interrupts across CPU cores.
- Correctly size and pin worker processes/threads.
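As a starting point only (not drop-in values — every workload differs, and some of these have subtle side effects), the kernel tunables above are usually set via a sysctl fragment:

```
# /etc/sysctl.d/99-lb.conf — illustrative starting points for an LB node
net.core.somaxconn = 4096                  # larger accept backlog for listen sockets
net.ipv4.ip_local_port_range = 1024 65535  # more ephemeral ports for LB-to-backend connections
net.ipv4.tcp_fin_timeout = 30              # reclaim FIN_WAIT sockets sooner
net.ipv4.tcp_tw_reuse = 1                  # reuse TIME_WAIT for outgoing connections (see caveats)
net.netfilter.nf_conntrack_max = 262144    # only relevant if conntrack is in the path
```

Apply with `sysctl --system` and verify individual values with `sysctl <name>` before and after load testing.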
Design Patterns and Practical Scenarios
Blue-Green and Canary Deployments
Load balancers are central in deployment strategies:
- Blue-Green:
- Two identical environments: blue and green.
- LB routes 100% of traffic to one, then switches to the other after deployment.
- On Linux you might:
- Maintain two backend pools.
- Flip which pool is active.
- Canary:
- Route a small percentage of traffic to a new version.
- Gradually increase if metrics look good.
- Requires:
- Weighted backends or pools.
- Good monitoring to compare canary vs baseline.
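The traffic-splitting decision at the heart of a canary rollout can be sketched in a few lines. Pool names and the 5% fraction are illustrative; real load balancers implement this with weights, and per-user consistency usually means hashing a user ID instead of rolling a die per request:

```python
import random

def pick_pool(canary_fraction=0.05):
    """Send roughly canary_fraction of requests to the canary pool."""
    return "canary" if random.random() < canary_fraction else "baseline"

counts = {"canary": 0, "baseline": 0}
for _ in range(10_000):
    counts[pick_pool()] += 1
print(counts)   # roughly 5% of requests land on the canary pool
```

Ramping the canary is then just raising `canary_fraction` in steps (5% → 25% → 100%) while comparing error rates and latency percentiles between the two pools.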
Multi-Tenancy
- Different tenants/apps behind one LB:
- Route by domain (SNI/Host header).
- Route by path prefix.
- Important for:
- SSL certificate management (per domain).
- Isolation of failure domains (rate limiting, resource limits).
Graceful Maintenance
When you need to take a backend out:
- Mark it as “draining”:
- Stop sending new connections.
- Allow existing connections to finish (up to a timeout).
- After all connections close:
- Stop the service or reboot the server safely.
This is often implemented directly in LB configuration or via an API/CLI.
Summary
Load balancing on Linux is a combination of:
- Traffic distribution algorithms (round robin, least connections, hashing).
- Health checking and failover.
- Session persistence strategies.
- TLS termination/offloading.
- High availability of the load balancer tier itself.
- DNS and global distribution integration.
- Observability and performance tuning.
The concrete implementations (HAProxy, Nginx, etc.) build on these concepts; mastering them conceptually makes it easier to read, design, and debug any specific configuration in later chapters.