Introduction
Session persistence is the set of techniques that make sure a client continues to talk to the same backend server across multiple requests, even though a load balancer sits in front of a pool of servers. For many stateful applications this is critical, because application state, in‑memory cache entries, or file handles may live only on one backend. This chapter focuses on what session persistence is, why it matters, and the concrete mechanisms typically used with Linux based load balancers such as HAProxy and Nginx. It assumes that the general ideas of load balancing have already been introduced elsewhere.
Why Session Persistence Matters
In a simple stateless HTTP API, any request from a client can be handled by any backend server. In that case the load balancer can freely distribute requests to maximize throughput and availability. Many real applications are not fully stateless. A shopping cart may be stored in server memory. A login session may be represented by a structure in a local cache. A file upload might be stored in a temporary directory while it is being processed.
Without persistence, a user might authenticate against one backend, then their next request might reach another backend that has no knowledge of that login. The user appears to be randomly logged out or to lose data. Session persistence solves this by creating a stable mapping between a client session and a backend, at least for some duration.
There is always a trade‑off. Stronger persistence usually improves user experience for stateful applications, but can lead to uneven load distribution and can make failover more complex. Designing persistence involves balancing user experience, performance, and resilience.
Core Concepts and Terminology
Session persistence is often called “sticky sessions.” The basic elements are a session identifier, a mapping from that identifier to a backend server, and a policy that defines how long the mapping remains valid.
A session identifier can take several forms. It might be the client IP address, a value in a cookie, some data in the HTTP headers, or even a field inside a TLS session. Once the load balancer sees a particular identifier, it uses an internal table or a deterministic algorithm to route all matching requests to the same backend.
The validity of this mapping is defined by a timeout or lifetime. If a session is idle for too long, the mapping is removed. If a backend fails, mappings to that backend must be cleared or migrated depending on what the application can tolerate.
The more entropy or variety there is in the session identifier, the more evenly clients can be distributed across backends. The less entropy there is, the more risk of “hot spots” where one backend receives much more traffic than others.
IP Based Persistence
One of the simplest forms of session persistence is based on the client’s IP address. The load balancer uses the source IP as the session key, and always sends traffic from that IP to the same backend, at least while the backend is healthy.
This can be implemented as a direct table mapping IP addresses to backends, or with a hash function. In a hash approach, the load balancer calculates something like
$$
\text{backend\_index} = \text{hash}(\text{client\_ip}) \bmod N
$$
where $N$ is the number of backend servers. As long as the set of backends does not change, the same IP will hash to the same backend.
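The formula can be sketched in a few lines of Python. This is only an illustration, not how any particular load balancer implements it: the backend names and the choice of SHA-256 are assumptions. A stable hash is used rather than Python's built-in `hash()`, which is randomized per process for strings and bytes and would break persistence across restarts.

```python
import hashlib
import ipaddress
from typing import Sequence

# Hypothetical backend names, used only for illustration.
BACKENDS = ["app1", "app2", "app3"]


def pick_backend(client_ip: str, backends: Sequence[str]) -> str:
    """Return backends[hash(client_ip) mod N] using a stable hash."""
    # Normalize the address to bytes so equivalent spellings hash the same.
    packed = ipaddress.ip_address(client_ip).packed
    digest = hashlib.sha256(packed).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

Calling `pick_backend("203.0.113.7", BACKENDS)` repeatedly returns the same backend as long as `BACKENDS` is unchanged. Changing the length of the list changes $N$ and can move the same IP to a different backend.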
IP based persistence works best when each real user has a stable, distinct IP. In practice many users sit behind network address translation (NAT) devices or proxies. Hundreds or thousands of users can appear to share one public IP. In that case, IP persistence collapses many users onto a single backend, causing uneven load. In mobile networks, client IPs can also change, which breaks persistence.
Because of these limitations, IP based persistence is often used only where you have control over the client networks, for example between services in a data center, or as a fallback when more precise identifiers are not available.
Cookie Based Persistence
Cookies are the most common mechanism for HTTP session persistence. The idea is straightforward. When the load balancer first sees a client, it selects a backend, forwards the request, and also injects or rewrites an HTTP cookie in the response that identifies that backend. On subsequent requests, the client’s browser sends the cookie. The load balancer reads the cookie and routes the request to the correct backend.
There are two main strategies. In insert mode, the load balancer creates its own cookie, such as LB_ROUTE=backend2, and adds it to the response. The application is not aware of this cookie. The load balancer maintains a mapping between cookie values and backends. In rewrite mode, the load balancer modifies or tracks an existing application cookie, such as a session ID, and maps that to a backend.
A cookie based approach works with HTTPS traffic as long as TLS is terminated at the load balancer, because the load balancer can then read the cookie from the decrypted HTTP request. If TLS is passed through to the backends without being terminated, the cookie is not visible and a different key, such as the client IP, must be used. Cookie based persistence also survives client IP changes, because the identifier is independent of the network address.
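The insert strategy can be sketched as a small routing function. This is a simplified model under stated assumptions: the backend names, the `route` function, and the round robin choice for new clients are hypothetical, and the cookie value is left unsigned for brevity.

```python
import itertools
from typing import Dict, Optional, Tuple

# Hypothetical pool and cookie name, mirroring the LB_ROUTE example in the text.
BACKENDS = ["backend1", "backend2", "backend3"]
COOKIE_NAME = "LB_ROUTE"

_round_robin = itertools.cycle(BACKENDS)


def route(request_cookies: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Return (backend, Set-Cookie header value or None)."""
    pinned = request_cookies.get(COOKIE_NAME)
    if pinned in BACKENDS:
        # Known client: honor the existing cookie, nothing to set.
        return pinned, None
    # New client or unknown cookie value: pick a backend and pin it.
    backend = next(_round_robin)
    set_cookie = f"{COOKIE_NAME}={backend}; Path=/; Secure; HttpOnly"
    return backend, set_cookie
```

A first request with no cookie returns a backend together with a `Set-Cookie` value. A later request that carries that cookie returns the same backend and `None`.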
Persistence lifetime depends on cookie configuration. A session cookie without an explicit expiration disappears when the browser is closed, which is appropriate for many interactive sessions. A persistent cookie with an expiration time allows the user to resume a session after closing and reopening the browser, but raises security considerations and needs careful timeout settings.
You can secure cookies with attributes such as Secure and HttpOnly so they are only sent over HTTPS and not accessible to client side scripts. That reduces the risk that the persistence identifier is stolen or misused.
Cookie based session persistence must be designed with security in mind. Never store sensitive information directly in the cookie without encryption, and always validate session data on the backend.
Header and URL Based Persistence
In some environments, especially with APIs or microservices, persistence is based on custom headers rather than cookies. A client or upstream gateway might send a header such as X-Session-ID or X-User-ID. The load balancer uses the header value as the session key.
This method is useful when you have non browser clients that do not support cookies, such as mobile apps or command line tools. It also gives application developers more explicit control over how sessions are identified. The load balancer configuration simply names the header that should be hashed or mapped.
URL based persistence is similar, but uses part of the request path or query string. For example, a URL might contain /tenant/42/ and the load balancer can route all traffic for tenant 42 to a particular backend. This is handy in multi tenant applications, but it exposes the identifier in the URL, which might be logged or cached by proxies and browsers.
Both header and URL based persistence make the identifier visible anywhere the request is logged. That can be helpful for tracing, but you must avoid including confidential data. They are best used with opaque, random identifiers that have no direct meaning outside the application.
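As a rough sketch of header based persistence, the function below derives a session key from a named header and falls back to the client IP when the header is missing. The function names, the `hdr:` and `ip:` prefixes, and the use of SHA-256 are assumptions made for this example.

```python
import hashlib
from typing import Mapping, Sequence


def session_key(headers: Mapping[str, str], client_ip: str,
                header_name: str = "X-Session-ID") -> str:
    """Prefer the named header as the persistence key, else the client IP."""
    wanted = header_name.lower()
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == wanted and value:
            return f"hdr:{value}"
    return f"ip:{client_ip}"


def pick_backend(key: str, backends: Sequence[str]) -> str:
    """Hash the key with a stable hash and select a backend."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```

Prefixing the key with its source keeps a header value from colliding with an IP address that happens to have the same text.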
Hash Based Persistence
Hash based persistence uses a deterministic function over one or more request attributes to choose a backend, without maintaining a large session table. Instead of remembering that client A is mapped to backend B, the system computes a function like
$$
\text{backend\_index} = \text{hash}(\text{key}) \bmod N
$$
on every request. The key can be the client IP, a cookie, a header, or a combination of fields. As long as the key and the backend list remain the same, the same client flows to the same backend.
To improve stability when backends are added or removed, some load balancers support consistent hashing. With consistent hashing, only a subset of mappings change when you add or remove a server, which reduces the scale of disruption across active sessions.
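A minimal consistent hashing ring might look like the following. It is a sketch, not the algorithm of any specific load balancer: the `HashRing` class, the 100 virtual nodes per backend, and the SHA-256 based hash are all assumptions.

```python
import bisect
import hashlib
from typing import Sequence


def _h(value: str) -> int:
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, backends: Sequence[str], vnodes: int = 100) -> None:
        # Each backend is placed on the ring at `vnodes` pseudo-random points.
        self._points = sorted(
            (_h(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(vnodes)
        )
        self._hashes = [point for point, _ in self._points]

    def lookup(self, key: str) -> str:
        """Return the backend owning the first ring point after hash(key)."""
        idx = bisect.bisect_right(self._hashes, _h(key)) % len(self._points)
        return self._points[idx][1]
```

When a backend is added, its points are inserted between existing ones, so the only keys that change owner are those that now land on the new backend. Every other key keeps its previous mapping.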
Hash based persistence avoids the memory overhead of explicit session tables and can be more robust in very high traffic environments. It also integrates well with caching layers where shard boundaries are defined by hash ranges.
The main limitation is that you cannot “forget” a session entry selectively without changing the key or the cluster membership. Session expiry is not explicit. If you need strict session lifecycle control, table based persistence with timeouts is often preferable.
Stateful vs Stateless Application Sessions
The way you design persistence depends heavily on whether the application can be treated as stateless or not. If an application keeps all user state in a shared database or cache that is reachable from every backend, then the load balancer can be fully stateless. You can often remove persistence entirely and rely on pure load distribution algorithms.
If an application keeps significant state in memory on one server, then you must use persistence or refactor the application. Sticky sessions are a practical tool, but they hide underlying coupling. Over time many architectures move from strict session persistence to more stateless designs to improve scalability and resiliency.
One common intermediate approach is to combine a small amount of persistence with a shared session store. For example, you might keep short lived in progress operations on a single backend while the core authentication state lives in a database or key value store. The persistence timeout can then be shorter, reducing long term imbalance.
Session Timeouts and Lifetime Management
Session persistence must always include rules for expiry. Without timeouts, mappings would accumulate indefinitely and could hold onto resources or enforce outdated routing decisions long after the user has gone away.
There are usually two relevant timers. One is an idle timeout, which specifies how long a session can be inactive before its mapping can be removed. The other is a hard maximum lifetime, which limits how long a session can exist regardless of activity. For sensitive services, this might be set to a few hours. For less sensitive, long lived sessions, it might be longer.
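The two timers can be illustrated with a small table based sketch. The `StickyTable` class and its method names are hypothetical. The clock is injectable so that expiry can be exercised without waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Entry:
    backend: str
    created: float
    last_seen: float


class StickyTable:
    def __init__(self, idle_timeout: float, max_lifetime: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_timeout = idle_timeout
        self.max_lifetime = max_lifetime
        self._clock = clock
        self._entries: Dict[str, Entry] = {}

    def bind(self, key: str, backend: str) -> None:
        """Create or replace the mapping for a session key."""
        now = self._clock()
        self._entries[key] = Entry(backend, created=now, last_seen=now)

    def lookup(self, key: str) -> Optional[str]:
        """Return the pinned backend, or None if absent or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        now = self._clock()
        idle_expired = now - entry.last_seen > self.idle_timeout
        too_old = now - entry.created > self.max_lifetime
        if idle_expired or too_old:
            del self._entries[key]
            return None
        entry.last_seen = now  # activity resets only the idle timer
        return entry.backend
```

Note that activity refreshes `last_seen` but never `created`, so a continuously active session still expires once it exceeds the hard maximum lifetime.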
Timeouts must be coordinated between the load balancer and the application. If the application’s authentication token expires earlier than the load balancer’s mapping, the user might remain sticky to a given backend but still need to re-authenticate. The persistence layer should not conflict with the security model.
Always define explicit session timeouts on both the load balancer and the application. Unbounded session lifetimes increase security risk and can lead to resource leaks and uneven load.
Handling Failover with Sticky Sessions
Session persistence interacts closely with failover. When a backend goes down, all sessions pinned to that backend must be handled. There are several strategies, each with different implications.
You can simply drop the mappings that pointed to the failed backend. New requests are then assigned to healthy servers, and users may need to log in again or repeat in progress operations. This is the simplest approach, and is often acceptable if downtime is rare and sessions are short.
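The drop-and-reassign strategy can be sketched as follows. The `make_router` helper, the in-memory table, and the way the healthy set is passed in are assumptions for illustration; a real load balancer would take health from its own checks.

```python
import itertools
from typing import Callable, Dict, Sequence, Set


def make_router(backends: Sequence[str]) -> Callable[[str, Set[str]], str]:
    """Build a sticky router that drops mappings to unhealthy backends."""
    table: Dict[str, str] = {}
    round_robin = itertools.cycle(backends)

    def route(key: str, healthy: Set[str]) -> str:
        pinned = table.get(key)
        if pinned in healthy:
            return pinned
        # The pinned backend is missing or unhealthy: forget the mapping.
        table.pop(key, None)
        for _ in range(len(backends)):
            candidate = next(round_robin)
            if candidate in healthy:
                table[key] = candidate
                return candidate
        raise RuntimeError("no healthy backend available")

    return route
```

In this sketch a session that was moved during an outage stays on its new backend after the failed one recovers, which avoids a second disruption for that user.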
You can also design the application so that after failover, another backend can rebuild user state from a shared store, such as a database or distributed cache. In that case, persistence can be relaxed after a failure. Most load balancers will remove the failed backend from the pool, and sessions will be reconstructed on first contact with a healthy node.
In more advanced setups, you can migrate session mappings, especially when the persistence key is a cookie or token independent of the backend identity. Consistent hashing can also minimize remapping, but cannot guarantee that each specific former mapping will survive a failure.
The critical point is that load balancer persistence does not magically replicate state. It only routes traffic. High availability in stateful systems still requires application level planning and data replication.
Impact on Load Distribution and Scaling
Sticky sessions influence how evenly traffic is spread across backend servers. In a simple round robin distribution without persistence, each new request is assigned in turn, and the distribution tends to be even. With persistence, the distribution depends on the characteristics of the session identifiers.
If some users are much more active than others, the backends that end up hosting those users can become hotspots. If persistence is based on IP, and some IPs represent entire corporate networks or large NAT pools, those backends can be overwhelmed compared to others.
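A quick simulation makes the NAT effect visible. The traffic mix below is invented for illustration (500 requests from one shared address and 100 requests from distinct addresses), and the hash function is the same kind of stable hash assumed elsewhere in this chapter's sketches.

```python
import hashlib
from collections import Counter
from typing import Iterable


def pick_backend(key: str, n: int) -> int:
    """Return a backend index in range(n) from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def load_by_backend(source_ips: Iterable[str], n: int) -> Counter:
    """Count how many requests each backend index receives."""
    return Counter(pick_backend(ip, n) for ip in source_ips)


# Hypothetical traffic: one corporate NAT address and 100 individual clients.
nat_requests = ["198.51.100.1"] * 500
home_requests = [f"203.0.113.{i}" for i in range(1, 101)]
counts = load_by_backend(nat_requests + home_requests, 4)
```

Whichever backend the NAT address hashes to receives at least 500 of the 600 requests, while the remaining 100 are spread across all four backends.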
One way to mitigate this is to combine persistence with load based algorithms. Some load balancers support “least connections” or similar strategies but still respect stickiness when possible. Another approach is to apply persistence only to specific paths such as login or checkout flows, and remain stateless for static content and read only endpoints.
Autoscaling can also be complicated. If you dynamically add or remove backends, hash based persistence reshuffles mappings: a plain modulo hash remaps most keys whenever the backend count changes, while consistent hashing moves only a fraction of them. Table based persistence adapts slowly as sessions expire, but new servers can sit mostly idle until more sessions attach. Monitoring is essential so that you know when persistence is harming balance.
Security Considerations
Session persistence introduces additional security surfaces. The persistence identifier, whether cookie, header, or token, becomes a new target. Attackers may try to guess or steal identifiers to hijack sessions or to cause unbalanced load.
Identifiers should be random or derived from cryptographic material, not from predictable patterns like user IDs. For example, a cookie that encodes the backend ID should be signed or encrypted by the load balancer so it cannot be forged. The application should never trust the persistence cookie as proof of identity; it only indicates routing information.
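One way to make a routing cookie unforgeable is to append an HMAC. The sketch below uses Python's standard `hmac` module; the `backend_id.signature` cookie format, the function names, and the placeholder secret are assumptions, not the format of any particular product.

```python
import hashlib
import hmac
from typing import Optional

# Placeholder only: a real deployment would load a random secret from
# protected configuration and rotate it.
SECRET = b"replace-with-a-random-key"


def _mac(backend_id: str) -> str:
    return hmac.new(SECRET, backend_id.encode("utf-8"), hashlib.sha256).hexdigest()


def sign(backend_id: str) -> str:
    """Return a cookie value of the form '<backend_id>.<hex signature>'."""
    return f"{backend_id}.{_mac(backend_id)}"


def verify(cookie_value: str) -> Optional[str]:
    """Return the backend ID if the signature is valid, else None."""
    backend_id, sep, signature = cookie_value.rpartition(".")
    if not sep:
        return None
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(signature.encode("utf-8"), _mac(backend_id).encode("utf-8")):
        return backend_id
    return None
```

A client that edits the backend ID without knowing the secret produces a value that `verify` rejects, so the request can be treated as a new session. The signature proves only that the load balancer issued the routing value; it says nothing about who the user is.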
Logging configuration must also be reviewed. If identifiers allow correlation of user behavior, they may fall under privacy regulations. You may need to anonymize or minimize log content, and define retention policies accordingly.
Rate limiting and connection caps should still apply even when persistence is active. A “sticky” attacker should not be able to overload one backend indefinitely. Many load balancers allow per IP or per session rate thresholds that coexist with session persistence.
Testing and Observability
To use session persistence effectively, you must be able to test and observe how it behaves. Basic checks involve verifying that after a login, successive requests from the same client go to the same backend. You can expose backend identity in a debug header or a small diagnostic endpoint, then watch it as you refresh the page or perform actions.
From the load balancer itself, you should monitor the number of active sessions per backend, session table size, and distribution of keys. If you use cookies, you can inspect them in the browser’s developer tools. For APIs, you can script requests with tools such as curl or HTTP clients and confirm that headers or cookies persist across calls.
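A stickiness check can be scripted around any HTTP client. In the sketch below the transport is abstracted as a `send` callable, which is an assumption of this example: it takes the current cookies and returns the backend identity (for instance, read from a debug header) together with any cookies the response set.

```python
from typing import Callable, Dict, Set, Tuple

# (backend identity reported by the response, cookies set by the response)
Response = Tuple[str, Dict[str, str]]


def backends_seen(send: Callable[[Dict[str, str]], Response],
                  requests: int = 10) -> Set[str]:
    """Replay requests with a cookie jar and collect the backends that answered."""
    jar: Dict[str, str] = {}
    seen: Set[str] = set()
    for _ in range(requests):
        backend, new_cookies = send(dict(jar))
        jar.update(new_cookies)  # behave like a browser: keep cookies
        seen.add(backend)
    return seen
```

If persistence works, the returned set contains exactly one backend. More than one entry means that some request was routed elsewhere despite carrying the persistence cookie.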
In production, imbalance across backends can indicate problems with the persistence configuration or with the user population. Correlate backend CPU, memory, and connection counts with session mappings. Many load balancers expose statistics endpoints or logs containing sticky session events.
When changing persistence strategy, perform controlled rollouts. For example, switch a small percentage of traffic from IP based to cookie based persistence, then validate session continuity, error rates, and backend utilization before completing the migration.
Design Guidelines
When deciding how to implement session persistence, consider the following questions. How stateful is the application, and can you reduce statefulness over time? What identifiers are available from clients, and are they stable enough? What are the security requirements for session handling, and do they dictate particular token structures or cookie settings? How much skew in backend load can you tolerate, and can you compensate with additional instances or different algorithms?
For browser facing web applications, cookie based persistence at the load balancer is usually the most flexible choice. For API and service to service traffic, header or token based persistence is often more explicit and controllable. IP based persistence should be reserved for special cases where network topology is known and stable.
::::::::::::::::::::::::::::::danger
Session persistence is a routing tool, not a substitute for proper session management in the application. Design application sessions, storage, and security first, then choose a persistence method that supports that design.
::::::::::::::::::::::::::::::