Introduction
Load balancing is a technique to distribute incoming network traffic across multiple backend systems in order to improve performance, reliability, and scalability. In the context of Linux servers, load balancing is usually implemented with specialized software that accepts client connections, then forwards these connections to one of several backend servers according to specific rules.
This chapter introduces the core ideas that are common to all later topics in this part, such as HAProxy fundamentals, Nginx as a load balancer, session persistence, and high availability concepts. You will see what problems load balancing solves, how traffic typically flows through a load balanced system, the main algorithms involved, and the basic terminology that you will need to understand the following chapters.
Why Load Balancing Is Used
A single server that handles all client requests has several limitations. It has finite CPU, memory, disk, and network capacity. Once that capacity is reached, the server becomes slow or stops responding. It is also a single point of failure. If that one machine fails, every service it provides becomes unavailable.
Load balancing addresses these problems by placing a component in front of multiple backend servers. This component receives all incoming requests and distributes them among the available backends. If one backend fails, the load balancer can detect it and stop sending traffic to that machine. If you need more capacity, you can add more backend servers behind the load balancer.
From a client point of view, there is still a single hostname or IP address. Internally, that single address hides a pool of servers. This separation allows you to scale the backend side independently of what clients see.
Basic Load Balancing Architecture
A typical load balanced architecture has three main parts: clients, a load balancer, and backend servers. Clients connect to the load balancer using a virtual IP address or a hostname that resolves to that IP. The load balancer then chooses a backend server and forwards the request. The backend processes the request and returns the response to the load balancer, which sends it back to the client.
In most software based setups on Linux, this traffic flow occurs at the application level, for example at HTTP level for web traffic or at TCP level for generic TCP services. The load balancer usually listens on a well known port such as port 80 for HTTP or port 443 for HTTPS, and each backend runs its own instance of the application server software.
The load balancer can run on a separate machine, or in smaller setups it can run on the same machine as one of the backends. In more advanced high availability topologies, there may be multiple load balancers with some mechanism that provides a shared virtual IP address, so that one load balancer can take over if another one fails. Those high availability structures are discussed later in this part of the course.
Layer 4 vs Layer 7 Load Balancing
Load balancing is often described using OSI model terminology. Two levels matter most for server administration: layer 4 and layer 7.
Layer 4 load balancing works at the transport layer, usually with TCP or UDP. The load balancer looks at information such as source IP, destination IP, source port, destination port, and protocol. It does not need to understand the application data. It forwards packets or connections according to configured rules. This style is generally more lightweight and can be very efficient. It is suitable when you only need connection based distribution, for example balancing generic TCP services or simple protocols.
Layer 7 load balancing works at the application layer, such as HTTP, HTTPS, or SMTP. The load balancer inspects the application data, for example HTTP headers, request paths, cookies, or hostnames. It uses this information to make routing decisions. For instance, an HTTP load balancer can send requests for one domain to a certain backend pool and requests for another domain to a different pool. It can also route based on URL paths, user agents, or other application fields.
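Host-based routing of this kind can be sketched in a few lines. The hostnames and pool names below are hypothetical, and a real proxy would parse the Host header out of the HTTP request itself; this is only an illustration of the routing decision.

```python
# Hypothetical hostname -> backend pool mapping for layer 7 routing.
pools = {
    "shop.example.com": ["shop-backend-1", "shop-backend-2"],
    "blog.example.com": ["blog-backend-1"],
}
default_pool = ["web-backend-1", "web-backend-2"]

def select_pool(host_header: str) -> list[str]:
    # The decision is based on application data (the Host header),
    # which is exactly what distinguishes layer 7 from layer 4.
    return pools.get(host_header.strip().lower(), default_pool)

print(select_pool("shop.example.com"))     # shop pool
print(select_pool("unknown.example.org"))  # falls back to the default pool
```

The same pattern extends to routing on URL paths or cookies: inspect a field of the parsed request, then map it to a pool.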
Layer 7 load balancing is more flexible but also more resource intensive, because the load balancer must parse and understand the higher level protocol. Modern tools such as HAProxy and Nginx can operate in both ways, either as simple TCP load balancers or as full application aware proxies.
Core Load Balancing Algorithms
The way a load balancer chooses which backend server to use for each request is defined by its load balancing algorithm. Several algorithms are common in Linux based tools.
The simplest is round robin: the load balancer cycles through the list of backend servers in order. The first request goes to the first server, the second request goes to the second server, and so on, then the cycle repeats. Round robin assumes that all servers have roughly equal capacity and that all requests consume similar resources.
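The cycling behavior can be sketched directly (the backend addresses are hypothetical):

```python
from itertools import cycle

# Hypothetical backend addresses; any identifiers would do.
backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
chooser = cycle(backends)

# Six consecutive requests walk the list in order, then wrap around.
assignments = [next(chooser) for _ in range(6)]
print(assignments)
# ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1', '10.0.0.2', '10.0.0.3']
```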
Weighted round robin extends this. Each backend server is assigned a weight value. Servers with higher weights receive proportionally more requests. For example, if one server has weight 2 and another has weight 1, the first will receive roughly two thirds of the traffic while the second receives one third. In formula form, if server $i$ has weight $w_i$, then its expected share of requests is
$$ \text{share}_i = \frac{w_i}{\sum_j w_j}. $$
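A naive way to realize these shares is to repeat each server in the schedule according to its weight and cycle through the expanded list. This sketch uses the weight 2 versus weight 1 example from above; production implementations typically use a smoother interleaving, but the resulting traffic shares are the same.

```python
from itertools import cycle

# Weights as in the example above: server "a" has weight 2, "b" weight 1.
weights = {"a": 2, "b": 1}

# Naive weighted round robin: repeat each server according to its weight
# and cycle through the expanded schedule.
schedule = [name for name, w in weights.items() for _ in range(w)]
chooser = cycle(schedule)

picks = [next(chooser) for _ in range(300)]
print(picks.count("a") / len(picks))  # share_a = 2 / (2 + 1) ≈ 0.667
```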
Another common algorithm is least connections. With this method, the load balancer tracks how many active connections each backend currently has. New connections are sent to the server with the fewest connections. This approach works well when requests have varied processing times, because it tends to equalize the load by preferring less busy servers.
Weighted least connections combines weights with the least connections approach. A more powerful server can be given a higher weight, so it is allowed to have more active connections than a weaker one, while still using the least connections logic.
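Both variants reduce to a minimum over the backends; the weighted form compares connection counts relative to capacity. The connection counts below are example values, since in practice the load balancer tracks them as connections open and close.

```python
# Example state: current active connections and configured weights.
active = {"big": 8, "small": 5}
weights = {"big": 4, "small": 1}  # "big" can handle four times the load

def least_connections() -> str:
    # Plain least connections: fewest active connections wins.
    return min(active, key=lambda s: active[s])

def weighted_least_connections() -> str:
    # Weighted variant: 8/4 = 2.0 for "big" beats 5/1 = 5.0 for "small".
    return min(active, key=lambda s: active[s] / weights[s])

print(least_connections())           # "small" (fewer raw connections)
print(weighted_least_connections())  # "big" (less loaded relative to weight)
```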
There are also algorithms that consider response times. A load balancer can record how long backends take to respond and prefer faster ones. Some tools support hash based methods, where the choice of backend is made using a hash of client IP, session identifier, or other data. These hash methods are closely related to session persistence ideas discussed later.
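A hash-based choice can be sketched as follows: hash some stable client attribute, here the source IP, and reduce it modulo the number of backends. The backend names are hypothetical.

```python
import hashlib

backends = ["app1", "app2", "app3"]  # hypothetical backend names

def pick_by_client_ip(client_ip: str) -> str:
    # A stable hash of the client IP: the same client always maps to
    # the same backend as long as the backend list does not change.
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]

# Repeated requests from one IP land on the same backend.
assert pick_by_client_ip("203.0.113.7") == pick_by_client_ip("203.0.113.7")
```

Note that if the backend list changes, the modulo mapping shifts for many clients at once, which is why some systems use consistent hashing instead.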
Health Checks and Failover Behavior
A reliable load balancer needs to know whether each backend server is healthy. Health checks are periodic tests that the load balancer performs. For a simple TCP service, the health check might be an attempt to open a TCP connection to a specific port. For an HTTP service, the load balancer might request a specific URL and check for a certain status code.
If a backend fails a health check several times in a row, the load balancer marks it as down and stops sending new requests to it. When the health checks start succeeding again, the backend can be marked as up and traffic can be resumed. This mechanism provides automatic failover from unhealthy backends to healthy ones.
Health check intervals and thresholds must be tuned carefully. Checks that are too infrequent might delay failover when a backend fails. Checks that are too frequent might add unnecessary load. Many tools also support different health check types, from simple TCP tests to complex HTTP checks that verify database connectivity or application state indirectly.
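The two building blocks described here, a connection-level probe and consecutive-failure thresholds, can be sketched as follows. The fall/rise terminology mirrors the thresholds found in common load balancers, but this is a simplified model, not any tool's actual implementation.

```python
import socket

def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Layer 4 health check: succeed if a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

class BackendHealth:
    """Mark a backend down after `fall` consecutive failed checks and
    up again after `rise` consecutive successful checks."""
    def __init__(self, fall: int = 3, rise: int = 2):
        self.fall, self.rise = fall, rise
        self.failures = self.successes = 0
        self.up = True

    def record(self, ok: bool) -> None:
        if ok:
            self.successes += 1
            self.failures = 0
            if not self.up and self.successes >= self.rise:
                self.up = True
        else:
            self.failures += 1
            self.successes = 0
            if self.up and self.failures >= self.fall:
                self.up = False
```

A scheduler would call `tcp_check` for each backend on a fixed interval and feed the result into `record`; only backends whose `up` flag is set receive new traffic.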
Basic Concepts of Session Persistence
Some applications maintain state on a particular backend server. For example, a web application might store user session data in the memory of the web server process rather than in a shared database. In such scenarios, it is often important to keep all requests from the same client going to the same backend. Otherwise, sessions may appear to be lost when traffic moves to a different server.
Session persistence, sometimes called sticky sessions, refers to techniques that make the load balancer send requests from the same client to the same backend over time. A simple method uses client source IP. The load balancer can map each client IP to one backend using a hash function. As long as the hash mapping does not change, the same IP will reach the same backend. This approach fails when many clients share a single IP, such as users behind a corporate proxy.
A more robust method involves cookies. For HTTP traffic, the load balancer or the application can set a cookie that encodes which backend the client is assigned to. On later requests, the load balancer reads that cookie and routes the request to the appropriate server. The detailed configuration of these mechanisms depends on the specific load balancer and is covered when session persistence is discussed explicitly.
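The cookie mechanism can be sketched as a small routing function. The cookie name "SERVERID" and the backend names are illustrative; real load balancers let you configure both, and a new client would normally get a backend from the balancer's regular algorithm rather than always the first one.

```python
backends = ["web1", "web2"]  # hypothetical backends

def pick_backend(cookies: dict[str, str]) -> tuple[str, bool]:
    assigned = cookies.get("SERVERID")
    if assigned in backends:
        # Returning client: honor the existing assignment.
        return assigned, False
    # New client: assign a backend (simply the first one here, for
    # brevity) and signal that a Set-Cookie header must be sent.
    return backends[0], True

print(pick_backend({}))                    # ('web1', True)  -> set cookie
print(pick_backend({"SERVERID": "web2"}))  # ('web2', False) -> sticky hit
```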
Load Balancing and Security Considerations
Placing a load balancer between clients and backends introduces security aspects. The load balancer is often the only part of the system directly exposed to the internet, so it must be hardened, patched, and monitored carefully. It can terminate TLS connections, which means it handles encryption and decryption of HTTPS traffic. This centralizes certificate management but also concentrates sensitive cryptographic material on the load balancer.
Since the load balancer sees all inbound requests, it is a natural point to apply access control and filtering. Basic rate limiting, connection limits, or IP based restrictions are often implemented here. Because the load balancer forwards traffic to backends, there is a trust boundary between the public side and the internal network segment that hosts the backend servers.
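Per-client rate limiting at this point is often modeled as a token bucket: each client may burst up to a fixed number of requests, after which requests are admitted only at a steady refill rate. The limits below (5 requests per second, burst of 10) are arbitrary example values.

```python
import time
from collections import defaultdict

class TokenBucket:
    """Sketch of per-client rate limiting at the load balancer.
    Each client starts with `burst` tokens; tokens refill at `rate`
    per second, and every admitted request consumes one token."""
    def __init__(self, rate: float, burst: int):
        self.rate, self.burst = rate, burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# One bucket per client IP (hypothetical limits: 5 req/s, burst of 10).
buckets = defaultdict(lambda: TokenBucket(5.0, 10))

def allow_request(client_ip: str) -> bool:
    return buckets[client_ip].allow()
```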
Logging is also important. A load balancer can log incoming requests, chosen backends, response times, and error conditions. These logs are valuable for troubleshooting, performance analysis, and security investigations. The specific methods for handling and rotating such logs are treated elsewhere in the course, but as an administrator you should be aware that the load balancer becomes a central observability component.
Relation to High Availability
Although a single load balancer can improve reliability of backend servers, the load balancer itself may become a point of failure. High availability for load balancing usually involves having more than one load balancer instance and mechanisms to share or fail over a virtual IP address among them.
At a conceptual level, this means there is a top tier of load balancers, which are themselves protected by a failover mechanism, and behind them a pool of backend servers. Client traffic always targets the virtual IP address. If one load balancer becomes unavailable, another takes over the virtual IP and continues serving connections. The details of these high availability mechanisms and the specific tools used belong to later chapters in this part of the course.
Summary
Load balancing is a fundamental technique for scaling services horizontally, improving resilience, and separating client facing addresses from internal infrastructure. It works by distributing connections or requests across multiple backend servers according to a chosen algorithm, while monitoring backend health and sometimes preserving client sessions. Whether using HAProxy, Nginx, or other tools, these architectural ideas and core algorithms underlie nearly every practical load balanced setup you will configure on Linux servers.