
5.5.1 HAProxy fundamentals

Introduction

HAProxy is a popular open-source TCP and HTTP load balancer and reverse proxy that is widely used in production environments. It is valued for its high performance, flexibility, and extensive configuration options. This chapter focuses on the essential concepts and configuration elements that are specific to HAProxy itself, rather than on load balancing as a general topic.

HAProxy’s Role in a Server Architecture

HAProxy typically sits between clients and backend servers. Clients connect to HAProxy, and HAProxy then forwards requests to one of several backend servers, according to rules you define. From the client’s perspective, HAProxy looks like a single server. From the backend’s perspective, HAProxy behaves as a client that distributes incoming traffic.

For HTTP traffic, HAProxy understands headers, methods, and status codes. For TCP traffic, HAProxy relays bytes without parsing the application protocol. This dual role makes it suitable for web applications, APIs, database connections, and other TCP-based network services.

HAProxy Operating Modes

HAProxy has two primary modes of operation, selected with the mode directive: TCP mode and HTTP mode. If no mode is set, HAProxy uses TCP mode. The chosen mode determines what HAProxy can inspect and manipulate in each connection.

In TCP mode, configured with mode tcp, HAProxy handles raw connections. It does not parse application data and therefore cannot apply HTTP-specific rules. This mode is commonly used for TLS passthrough, where the encrypted connection is relayed untouched and terminates at the backend server, and for protocols such as SMTP or database connections.

In HTTP mode, configured with mode http, HAProxy understands the HTTP protocol and can read headers, methods, paths, and status codes. HTTP mode allows features such as header rewriting, routing based on URL paths or hostnames, and advanced health checks that evaluate HTTP responses. When you want HAProxy to terminate TLS and then inspect HTTP, you typically configure HTTPS on the frontend and plain HTTP on the backend while using HTTP mode.

Choose mode http only when you need HTTP-level features. For non-HTTP protocols or pure TLS passthrough, use mode tcp.
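
As a rough sketch of how the two modes might sit side by side, the fragment below declares one TCP passthrough proxy and one HTTP proxy. The proxy names, server names, and addresses are placeholders chosen for illustration, and timeouts are omitted for brevity:

# TLS passthrough: HAProxy relays encrypted bytes and never sees the HTTP inside
frontend tls_passthrough
    mode tcp
    bind *:443
    default_backend tls_servers
backend tls_servers
    mode tcp
    server app1 192.168.1.30:443 check
# HTTP proxy: HAProxy parses each request and can apply HTTP rules
frontend http_in
    mode http
    bind *:80
    default_backend web_servers
backend web_servers
    mode http
    server web1 192.168.1.10:80 check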

Core Configuration Structure

HAProxy is configured through a text file, most commonly /etc/haproxy/haproxy.cfg. This file is divided into sections. Each section has a specific purpose and scope, and some settings are inherited.

The global section defines process-wide settings such as the log target, the maximum number of connections, the user and group under which HAProxy runs, and process-level TLS options. These parameters affect how HAProxy itself behaves, not the details of traffic handling.

The defaults section provides default settings for the frontends and backends declared after it. Typical items include mode (HTTP or TCP), timeouts, logging options, and general connection handling behavior. Any option not overridden in a specific frontend or backend uses the value from defaults.

The frontend section defines how HAProxy accepts incoming connections. It binds to IP addresses and ports, selects the operating mode, and usually defines rules that decide which backend should handle a request. You can have multiple frontends for different ports or protocols.

The backend section defines one or more servers that will actually process the traffic, along with balancing algorithms, health check options, and per server settings. A single frontend may route requests to different backends depending on conditions.

There is also a listen section type that combines frontend and backend behavior in one place. It is conceptually a shorthand, but in more complex setups administrators often prefer explicit separate frontend and backend sections for clarity.
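
For illustration, a listen section for a small TCP service might look like the sketch below. The proxy name smtp_relay, the server names, and the addresses are hypothetical:

# One section both accepts connections and lists the servers
listen smtp_relay
    mode tcp
    bind *:25
    balance roundrobin
    server mail1 192.168.1.40:25 check
    server mail2 192.168.1.41:25 check

The same behavior could be written as a frontend with a bind line plus a backend with the two server lines, which tends to scale better once routing rules are added.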

Basic Example Configuration

A minimal HTTP example ties these concepts together. Consider the following simple configuration:

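global
    # Process-wide connection limit
    maxconn 2048
    # Send logs to the local syslog socket, facility local0
    log /dev/log local0
defaults
    mode http
    # Use the log target declared in global; without this, proxies do not log
    log global
    option httplog
    timeout connect 5s
    timeout client  30s
    timeout server  30s
frontend http_in
    bind *:80
    default_backend web_servers
backend web_servers
    balance roundrobin
    server web1 192.168.1.10:80 check
    server web2 192.168.1.11:80 check

Here, the global section sets a process-wide connection limit and the log target. The defaults section selects HTTP mode, sets timeouts, and uses log global so that each proxy sends its logs to the target declared in global. The frontend http_in listens on port 80 on all interfaces and sends traffic to web_servers, and the backend web_servers section declares two health-checked servers and a load balancing policy.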

Binding and Frontend Settings

The key directive in a frontend is bind. It determines where and how HAProxy accepts connections. A simple bind *:80 listens on all local addresses on TCP port 80. You can also bind to specific IPs, multiple ports, or use TLS configuration in the bind line.

TLS termination is configured by adding options to bind, such as:

frontend https_in
    bind 203.0.113.10:443 ssl crt /etc/haproxy/certs/site.pem
    default_backend web_servers

Here, HAProxy terminates TLS itself by using the certificate specified with crt. After decryption, it forwards plain HTTP to the backend. In HTTP mode you can then manipulate headers, redirect HTTP to HTTPS, or route based on hostnames.
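
One way to redirect HTTP to HTTPS, sketched under the assumption that a single frontend listens on both ports, is shown below. The frontend name web_in is a placeholder, and the certificate path and backend name are reused from the example above:

frontend web_in
    mode http
    bind *:80
    bind 203.0.113.10:443 ssl crt /etc/haproxy/certs/site.pem
    # ssl_fc is true when the client connection arrived over TLS
    http-request redirect scheme https code 301 unless { ssl_fc }
    default_backend web_servers

Requests that arrive on port 80 receive a 301 redirect to the same host and path over HTTPS, while requests that already use TLS pass through to the backend.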

Frontends can also use access control lists, or ACLs, to match conditions. For example, you might match a specific hostname or path and send those requests to a dedicated backend. ACL logic itself will be explored elsewhere, but in HAProxy configuration it is tightly integrated with frontends.
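
A minimal sketch of ACL-based routing could look like this. The ACL names, the hostname static.example.com, and the backends api_servers and static_servers are assumptions made for the example:

frontend http_in
    mode http
    bind *:80
    # Match requests whose path starts with /api
    acl is_api path_beg /api
    # Match requests for one specific hostname, ignoring case
    acl is_static hdr(host) -i static.example.com
    use_backend api_servers if is_api
    use_backend static_servers if is_static
    # Everything else goes here
    default_backend web_servers

The use_backend rules are evaluated in order, and default_backend applies only when none of them match.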

Backends and Server Definitions

A backend describes how requests should be distributed to servers. Each server is declared with the server directive, followed by a logical name, a host and port, and additional parameters.

For example:

backend api_servers
    balance roundrobin
    server api1 192.168.1.20:8080 check
    server api2 192.168.1.21:8080 check

Here, balance roundrobin indicates a simple rotation strategy. Each server line names the server and its address. The check keyword enables health checking, which lets HAProxy automatically mark servers as down when they stop responding. Additional options on server lines can control connection limits, weights, and SSL settings for backend connections.
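
As an illustration of such per-server options, the following sketch adds weights, connection limits, and TLS toward one server. The third server, the weight and maxconn values, and the CA file path are hypothetical:

backend api_servers
    balance roundrobin
    # Weights are relative: api1 receives about twice as many requests as api2
    # maxconn caps concurrent connections per server; excess requests wait in a queue
    server api1 192.168.1.20:8080 check weight 2 maxconn 100
    server api2 192.168.1.21:8080 check weight 1 maxconn 100
    # Encrypt the connection to api3 and verify its certificate against a CA file
    server api3 192.168.1.22:8443 check weight 1 ssl verify required ca-file /etc/haproxy/certs/backend-ca.pem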

Backends can also contain directives that modify traffic before it reaches servers. In HTTP mode this may include header modification or cookie handling. In TCP mode, backends are limited to connection-level handling.
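
A short sketch of header handling in an HTTP backend might look like the following. Which headers to add or remove depends on the application, so treat these particular choices as examples only:

backend web_servers
    mode http
    balance roundrobin
    # Add an X-Forwarded-For header carrying the client address
    option forwardfor
    # Tell the application that the client used HTTPS, when it did
    http-request set-header X-Forwarded-Proto https if { ssl_fc }
    # Hide the backend's Server header from clients
    http-response del-header Server
    server web1 192.168.1.10:80 check
    server web2 192.168.1.11:80 check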

Balancing Algorithms

HAProxy supports several algorithms to choose which backend server receives each new request. You select the algorithm with the balance directive inside a backend.

roundrobin is the default and the most common option in simple setups. It uses each server in turn, in proportion to its weight, which provides an even distribution when servers are similar.

leastconn sends new connections to the server with the fewest active connections. This is often chosen for applications where connection durations vary, since it can better equalize load.

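source uses the client IP address to compute a hash, which then selects the backend server. This tends to send the same client IP to the same server consistently, which can provide a basic form of persistence without cookies for TCP or HTTP. Because the hash is spread over the servers that are currently available, a server going down or coming back can move many clients to a different server.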

There are other more advanced options, such as uri or hdr, which hash on request properties, but the three above cover common fundamentals. The choice of algorithm depends on the nature of your application, how stateful it is, and the relative capacity of your backend servers.
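
To make the differences concrete, the sketch below shows one backend per algorithm. The backend names, server names, and addresses are hypothetical:

# Long-lived or uneven connections: prefer the least busy server
backend db_pool
    mode tcp
    balance leastconn
    server db1 192.168.1.50:5432 check
    server db2 192.168.1.51:5432 check
# Same client IP tends to reach the same server
backend by_client_ip
    balance source
    # Consistent hashing limits how many clients move when a server is added or removed
    hash-type consistent
    server app1 192.168.1.60:8080 check
    server app2 192.168.1.61:8080 check
# Same URI tends to reach the same server, which suits caches
backend cache_nodes
    mode http
    balance uri
    server cache1 192.168.1.70:80 check
    server cache2 192.168.1.71:80 check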

If your application stores user session state only in memory on individual servers, use a consistent algorithm such as source or configure explicit stickiness. Otherwise users may lose session data when their requests move between servers.
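
Explicit stickiness in HTTP mode is often done with a cookie that HAProxy inserts itself. The following is a minimal sketch of that approach; the cookie name SERVERID and the per-server cookie values are arbitrary choices:

backend web_servers
    mode http
    balance roundrobin
    # HAProxy sets a SERVERID cookie and strips it before the request reaches the server
    cookie SERVERID insert indirect nocache
    # Each server is identified by its own cookie value
    server web1 192.168.1.10:80 check cookie web1
    server web2 192.168.1.11:80 check cookie web2

The first request is balanced normally. Later requests carrying the cookie return to the same server for as long as it remains healthy.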

Health Checks and Server States

Health checking is central to HAProxy’s reliability. The check keyword on a server line instructs HAProxy to probe the server regularly. If the server fails several checks in a row, HAProxy marks it as down and stops sending it traffic until it recovers.

In TCP mode, health checks are usually simple connection attempts on the configured port. In HTTP mode, you can perform HTTP-specific checks that examine status codes, response headers, or response bodies. For example, you can send a GET request to a dedicated health endpoint and treat a 200 status as healthy.

Health check intervals, timeouts, and thresholds can be tuned with additional options on server lines or in backend settings. Through these, you control how quickly HAProxy reacts to failures and how cautious it is before returning a server to service.
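
An HTTP health check with tuned thresholds could be sketched as follows. The /health path is a hypothetical endpoint that the application would need to provide, and the interval and threshold values are only examples:

backend web_servers
    mode http
    balance roundrobin
    # Probe each server with an HTTP request instead of a bare TCP connect
    option httpchk GET /health
    # Only a 200 response counts as healthy
    http-check expect status 200
    # Check every 2 seconds; 3 failures mark a server down, 2 successes bring it back
    default-server inter 2s fall 3 rise 2
    server web1 192.168.1.10:80 check
    server web2 192.168.1.11:80 check

A shorter interval and a lower fall value detect failures faster, at the cost of more probe traffic and a higher chance of removing a server during a brief slowdown.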

Timeouts and Connection Handling

Timeouts in HAProxy control how long it waits for certain events before giving up on a connection. They are usually configured in the defaults section, although they can be overridden where needed.

Common timeout settings include timeout connect, which limits how long HAProxy waits to establish a connection to a backend server; timeout client, which sets a maximum inactivity period on the client side; and timeout server, which does the same for the server side. There are additional timeouts for other stages, such as timeout http-request, which limits how long HAProxy waits for a complete HTTP request.

These values must balance responsiveness and tolerance. Very short timeouts can cause HAProxy to break connections unnecessarily during transient slowdowns. Very long timeouts can make failures or hung connections linger and consume resources. For beginners, the default examples often provide a reasonable starting point, but in production environments timeouts are often tuned to match the application behavior.
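
As one possible starting point, a defaults section with the common timeouts and a few HTTP-related ones might look like this. The values are illustrative and should be adjusted to the application:

defaults
    mode http
    # Time allowed to open a TCP connection to a backend server
    timeout connect 5s
    # Maximum inactivity on the client side
    timeout client 30s
    # Maximum inactivity on the server side
    timeout server 30s
    # Time allowed for the client to send a complete HTTP request
    timeout http-request 10s
    # Idle time allowed between requests on a kept-alive connection
    timeout http-keep-alive 5s
    # Time a request may wait in the queue for a free server slot
    timeout queue 30s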

Logging and Observability

HAProxy emits detailed logs that help you understand client behavior, errors, and performance. Logging is typically configured in the global and defaults sections and sent to the system logger, then stored in files such as /var/log/haproxy.log depending on your system.

HTTP mode logs usually contain the client IP, request line, HTTP status code, bytes sent, and timing fields that show where delays occurred. This granularity is useful for troubleshooting, especially when combined with health checks and backend statistics. TCP mode logs contain lower level connection information.
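
A typical logging setup, sketched here under the assumption that the local system logger listens on /dev/log, looks like this:

global
    # Send log messages to the local syslog socket with facility local0
    log /dev/log local0
defaults
    mode http
    # Use the log target declared in global for every proxy
    log global
    # Detailed HTTP log format with request line, status, and timing fields
    option httplog
    # Skip log entries for connections that carried no data, such as port probes
    option dontlognull

In TCP mode proxies, option tcplog plays the role that option httplog plays here.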

HAProxy also exposes a statistics interface if you enable it. This can be an HTTP page that displays information about each frontend, backend, and server, including current sessions, errors, and health check results. Enabling this interface is helpful for administration and is often restricted to trusted networks.
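
A minimal sketch of enabling the statistics page is shown below. The section name, port 8404, and the /stats path are arbitrary choices, and binding to 127.0.0.1 is one way to keep the page off public networks:

listen stats
    mode http
    # Reachable only from the HAProxy host itself
    bind 127.0.0.1:8404
    stats enable
    stats uri /stats
    # Ask the browser to refresh the page every 10 seconds
    stats refresh 10s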

Running and Managing HAProxy

On most Linux distributions HAProxy is installed through the package manager and integrated with the system’s service manager. The configuration file is usually at /etc/haproxy/haproxy.cfg. Before restarting the service, you can verify the configuration with the built-in check mode:

haproxy -c -f /etc/haproxy/haproxy.cfg

The command exits with status zero when the configuration is valid and with a non-zero status when it finds an error. If the configuration is valid, you can then start, restart, or reload HAProxy using the system’s service commands. A reload, when supported by the build and configuration, allows HAProxy to pick up changes with minimal interruption by starting a new process with the new configuration and gracefully shutting down the old one.

Always test configuration changes with haproxy -c before reloading or restarting in production to avoid downtime caused by syntax errors.

Typical Beginner Pitfalls

New HAProxy users often make a few predictable mistakes. One is forgetting that a frontend and the backends it routes to must use the same mode. If a backend is declared in mode http but receives non-HTTP traffic, unexpected behavior will result. Another is using defaults from example configurations without adjusting timeouts or health checks for the specific application, which can lead to healthy servers being marked down or to slow failure detection.

Incorrect or missing bind directives can also cause HAProxy to listen on the wrong interfaces or ports. Similarly, neglecting to open the corresponding ports in the system firewall prevents clients from connecting, even though HAProxy appears to be running.

Finally, it is easy to misconfigure backends by pointing to the wrong IP addresses or ports, which only becomes obvious when examining logs and health check statuses. For this reason, it is helpful to test backends individually from the HAProxy host using command-line tools before referencing them in the configuration.

Summary

HAProxy provides a flexible, high performance way to distribute connections across multiple servers by defining frontends that accept traffic and backends that contain pools of servers. Its operating modes, configuration sections, balancing algorithms, health checks, and timeout settings work together to deliver reliable and controlled load balancing behavior. Understanding these fundamentals prepares you to build more advanced configurations and integrate HAProxy into complex architectures that require high availability and scalability.
