Why Session Persistence Matters in Load Balancing
Session persistence (also called “sticky sessions”) is about making sure that a user keeps hitting the same backend server across multiple requests, instead of being freely bounced between nodes by the load balancer.
At the HTTP level, most applications maintain “session state” in memory or on local disk (for example, login status, cart contents, wizard steps). If the next request is routed to a different backend that doesn’t know that state, you see:
- Users “randomly” logged out
- Lost items in shopping carts
- Broken multi‑step forms or wizards
- Inconsistent application behavior
Session persistence is a workaround to keep a stateful app working behind a load balancer, by tying a client to a specific backend server for some period of time.
It’s not the same as:
- Session replication: sharing session data between backends
- Stateless app design: storing all state externally (DB, cache, token)
Those are longer‑term, more scalable solutions. Session persistence is a simpler, infrastructure‑side solution, but it has trade‑offs.
Key Concepts and Trade‑offs
Stickiness duration
Persistence doesn’t usually last forever. Common timeframes:
- Short‑lived sessions: 5–30 minutes (e.g., simple web apps)
- Longer sessions: hours (user portals, dashboards)
Important points:
- Too short: users may be re‑assigned mid‑workflow.
- Too long: load gets “stuck” on particular backends, and draining nodes takes longer.
- Timeouts should be aligned with application session lifetimes (e.g., if app session expires at 30 minutes, cookie stickiness of ~30–45 min is typical).
Granularity of affinity
What is “stuck” to what:
- Client ↔ backend: the most common model; a client IP or cookie maps to a specific backend.
- Client ↔ backend subset (pool): Sometimes you only need to keep a user in a subset (e.g., same shard or AZ), not on a single node.
- Session ↔ backend: In some systems, individual logical sessions (IDs) choose a backend.
The more tightly you bind, the more you risk uneven workloads if clients are not evenly distributed.
Impact on load distribution
Session persistence restricts the load balancer’s freedom to pick the best server on each request:
- Pros
- Simpler for stateful apps
- Fewer cross‑node session data issues
- Cons
- Uneven load: if a few heavy users stick to a server, it gets hot
- Scaling out is less effective until new stickiness assignments are made
- Draining or replacing nodes is slower and trickier
In multi‑LB setups (e.g., anycast, multiple HAProxy instances), you must ensure that all LBs share enough information (or use a scheme that doesn’t require sharing) for persistence to work correctly.
Common Strategies for Session Persistence
1. Source IP–based persistence
The simplest technique: the load balancer chooses a backend based on the client IP address and keeps using that mapping.
How it typically works
- The LB uses a hash function of the source IP (optionally plus port) to select a backend.
- The mapping is cached in memory with a timeout.
- Subsequent connections from the same IP during that timeout go to the same backend.
Conceptually:
$$
\text{backend} = \text{hash}(\text{client\_ip}) \mod N
$$
where $N$ is the number of backends.
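The mapping above can be sketched in a few lines of Python. This is an illustration only: real load balancers use their own hash functions and cache the result with a timeout, and the backend names here are placeholders.

```python
import hashlib

def pick_backend(client_ip: str, backends: list[str]) -> str:
    """Deterministically map a client IP to one of N backends
    (hash(client_ip) mod N), as in the formula above."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]

backends = ["backend1", "backend2", "backend3"]
# The same source IP always maps to the same backend:
assert pick_backend("203.0.113.7", backends) == pick_backend("203.0.113.7", backends)
```

For IPv6, you would typically hash a prefix (for example the /64) rather than the full address, for the reasons discussed below.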
Advantages
- No cookies or app changes needed.
- Works for non‑HTTP protocols (TCP, some UDP).
- Simple to deploy and reason about.
Limitations
- NAT and proxies: Many users appear as a single source IP (e.g., corporate proxy, mobile carrier CGNAT), causing extremely uneven load.
- Dynamic IPs: Users with changing IPs (mobile users, some ISPs) may lose stickiness.
- IPv6 considerations: Entire /64 may represent a single client; you may want to hash on a prefix.
When to use
- Internal environments where each client has a distinct IP.
- Lightweight internal apps without cookie control.
- Non‑HTTP protocols relying on TCP‑level load balancing.
2. Cookie‑based persistence
For HTTP/HTTPS, cookies are the most common and flexible form of stickiness.
Two main patterns:
- LB‑generated cookie: The load balancer sets its own cookie and uses it to select a backend.
- App‑generated cookie: The LB reads a cookie set by the application to decide which backend to use.
LB‑generated cookie
Flow:
- First request: LB picks a backend (via its normal algorithm, e.g., round‑robin).
- LB adds a cookie to the HTTP response, e.g.
`Set-Cookie: SRV_ID=backend3; Path=/; HttpOnly`
- Client sends that cookie on future requests.
- LB reads `SRV_ID` and routes to the corresponding backend.
Properties:
- The application doesn’t need to know or care about this cookie.
- The LB controls the cookie name, value, lifetime, and attributes.
You typically configure:
- Cookie name (e.g., `SRV_ID`, `ROUTEID`)
- Cookie lifetime (session vs persistent cookie)
- Cookie scope (path, domain, secure/httponly attributes)
App‑generated cookie (routing based on app session ID)
Sometimes the app already has a cookie that uniquely identifies the session, such as JSESSIONID, PHPSESSID, or a custom ID. The LB can:
- Extract the session ID from the cookie.
- Hash the ID to select a backend.
- Optionally cache the mapping in a stick table.
This lets the app own the cookie semantics while the LB provides consistent routing.
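The hash-plus-stick-table pattern can be sketched as follows. This is a simplified model, not any particular load balancer's implementation; the backend names and cookie semantics are assumptions.

```python
import hashlib

BACKENDS = ["app-a", "app-b", "app-c"]
stick_table: dict[str, str] = {}  # session ID -> backend, an in-memory cache

def route_by_session(session_id: str) -> str:
    """Route on an app-generated session ID (e.g. a JSESSIONID value):
    hash the ID to pick a backend, then cache the mapping."""
    if session_id in stick_table:
        return stick_table[session_id]
    h = int.from_bytes(hashlib.sha256(session_id.encode()).digest()[:8], "big")
    backend = BACKENDS[h % len(BACKENDS)]
    stick_table[session_id] = backend
    return backend
```

Because the mapping is derived purely from the session ID, any load balancer running the same logic computes the same answer, which is what makes this work across multiple LBs without shared state.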
Advantages:
- Works across multiple load balancers without sharing internal LB cookies if the mapping is purely hash‑based.
- No extra “mystery” cookie visible to the application.
Caveats:
- You must avoid leaking backend identity in the cookie if you don’t want users to infer topology.
- Regenerating session IDs mid‑session can cause re‑routing.
Failure scenarios and behavior
You must decide what happens if:
- A backend goes down but cookies still point to it.
- Backends are scaled up/down and cookie mappings are outdated.
Typical options:
- Failover to new backend and issue new cookie (most common).
- Optionally try to recover the session via shared storage or replication at app level.
From the user’s perspective, failing over may:
- Log them out
- Lose cart contents
- Break in‑flight workflows
You should test failure behavior explicitly.
3. URL parameter–based persistence
In some legacy or special cases, the session ID lives in the URL (e.g., ;jsessionid=... or ?session_id=...). The LB can:
- Parse the URL path or query string.
- Extract the session identifier.
- Use it to select a backend (hash or lookup).
This is less common today due to:
- Security concerns (session IDs in logs, referrers, bookmarks).
- Practical issues (users sharing URLs that include session IDs).
Use only if cookies are not an option and you fully understand the risks.
4. SSL/TLS session and connection‑based stickiness
For HTTPS, if the LB terminates TLS, you’re back to HTTP cookies/IP/etc. If it doesn’t terminate (TCP pass‑through), you have fewer tools:
- Connection persistence: Keep a single TCP connection pinned to a backend. Works for a stream of requests (HTTP/1.1 with keep‑alive, websockets).
- TLS session ID / session tickets: Some advanced proxies can use SSL session properties for affinity, but this is less common and more complex.
In long‑lived connections (websockets, HTTP/2 multiplexed over a single TCP connection), the connection itself is the unit of stickiness.
Implementation Patterns in Practice
This section focuses on design choices rather than configs for specific software (those are covered in the Apache/Nginx/HAProxy chapters).
Combining persistence with load‑balancing algorithms
Persistence sits “on top of” your base algorithm:
- For a request without an existing stick key:
- Use the base algorithm (round‑robin, least‑connections, etc.).
- Assign a stick key (cookie, IP, session hash).
- For subsequent requests with a stick key:
- Route directly to the associated backend, bypassing algorithm choice.
As a result:
- Base algorithm mostly affects initial distribution.
- Persistence dominates long‑lived sessions.
For highly stateful apps, “least connections” matters less once stickiness is in place. For very short sessions or many anonymous requests (static content), persistence may not be necessary and can be disabled.
Persistence and high availability
Multi‑node LB setups introduce questions:
- If you have two or more LBs in front of the same backends:
- Do they share any persistence table/state?
- Or can each LB independently compute routing from shared information (e.g., hashing a session ID)?
Options:
- Shared state (e.g., stick tables replicated between HAProxy nodes)
- More complex to configure.
- Supports true failover mid‑session if the first LB crashes.
- Stateless deterministic mapping (hashing a session ID or IP)
- No shared state needed.
- Works well if you keep the backend set stable or use consistent hashing.
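A minimal consistent-hash ring illustrates the stateless deterministic option. Unlike plain `mod N`, adding or removing a backend only remaps the keys that landed on that backend. The virtual-node count and backend names are assumptions of this sketch.

```python
import bisect
import hashlib

def _h(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class HashRing:
    """Minimal consistent-hash ring with virtual nodes per backend.
    Two LB instances built from the same backend list compute identical
    routing with no shared state."""
    def __init__(self, backends: list[str], vnodes: int = 64):
        self.ring = sorted((_h(f"{b}#{i}"), b) for b in backends for i in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def lookup(self, stick_key: str) -> str:
        # Route to the first virtual node clockwise from the key's hash.
        i = bisect.bisect(self.keys, _h(stick_key)) % len(self.ring)
        return self.ring[i][1]
```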
For active‑passive LB pairs (VRRP, keepalived, etc.), a simple approach is:
- Primary LB holds persistence table in memory.
- If it fails, sessions may be lost and users may be re‑routed as if new.
- In many apps that’s acceptable.
For active‑active setups (BGP, anycast), you typically need either:
- Deterministic hash‑based routing independent of in‑memory tables, or
- Some form of shared/replicated persistence.
Session persistence and scaling
Persistence interacts with scaling in several ways:
- Scaling out (adding servers):
- Existing sticky clients often remain pinned to old servers.
- New servers mostly receive new sessions only.
- The immediate effect can be minimal if most load is from long‑lived sessions.
- Scaling in (removing servers):
- Stickiness tables and cookies may still reference removed servers.
- You need a drain phase: stop sending new sessions to a node but keep old ones until they expire.
- After a timeout, remove the node.
Operational patterns:
- Use connection draining or maintenance mode features on the LB to gracefully remove backends.
- Align drain timeouts with sticky session lifetimes.
- Monitor “sessions still attached to node X” before actually shutting it down.
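The drain pattern above can be expressed as a simple loop. The two callbacks are placeholders for whatever your load balancer's API or admin socket actually provides; this is a sketch of the sequencing, not a real integration.

```python
import time

def drain_backend(node: str, get_attached_sessions, stop_new_sessions,
                  max_wait_s: float = 1800.0, poll_s: float = 5.0) -> bool:
    """Stop sending new sessions to a node, then wait until sessions
    still attached to it expire (or a deadline passes)."""
    stop_new_sessions(node)            # maintenance mode: no new stickiness
    deadline = time.monotonic() + max_wait_s
    while time.monotonic() < deadline:
        if get_attached_sessions(node) == 0:
            return True                # safe to remove the node
        time.sleep(poll_s)
    return False                       # deadline hit; sessions still attached
```

Note how `max_wait_s` is the place where the drain timeout must be aligned with the sticky session lifetime.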
Session persistence and microservices / APIs
Many APIs and microservices aim to be stateless. In that case, you often:
- Avoid persistence deliberately to get the best load distribution.
- Are especially wary of IP‑based persistence, because it harms fairness without benefit.
Cases where persistence is still useful in service architectures:
- Services that maintain in‑memory caches keyed by user/session (for performance).
- Long‑running operations that store temporary context on a specific node.
- WebSocket or streaming services where connection locality matters.
Evaluate:
- Is your service truly stateless?
- Is state stored in external systems (DB, Redis, etc.)?
- Can you redesign stateful parts to avoid persistence needs?
When you can, dropping persistence simplifies ops and scaling enormously.
Security and Privacy Considerations
Cookie attributes
For LB‑managed cookies:
- Use `Secure` for HTTPS sites to prevent sending cookies over HTTP.
- Use `HttpOnly` if the cookie doesn’t need to be accessed by client‑side scripts (it usually doesn’t).
- Set a reasonable expiration aligned with your policy.
- Avoid embedding sensitive information (user IDs, internal hostnames).
Example attributes:
SRV_ID=abc123; Path=/; HttpOnly; Secure; SameSite=Lax
(Exact syntax varies; see your load balancer docs.)
Information leakage
Don’t expose internal topology or server names in:
- Cookie values
- URL parameters
- Custom headers visible to the client
Instead of SRV_ID=backend1.example.internal, prefer an opaque ID or hash that only the LB can interpret.
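One way to get an opaque value is to derive it with a keyed hash, so only the LB can map it back to a backend. This scheme, the secret, and the names are illustrative assumptions; real load balancers often have their own cookie-encoding options.

```python
import hashlib
import hmac

LB_SECRET = b"replace-with-a-real-secret"  # known only to the LB (assumption)

def opaque_id(backend_name: str) -> str:
    """Derive an opaque cookie value from a backend name, so clients
    can't read topology out of SRV_ID."""
    return hmac.new(LB_SECRET, backend_name.encode(), hashlib.sha256).hexdigest()[:16]

# The LB keeps a reverse map from opaque value back to backend:
backends = ["backend1.example.internal", "backend2.example.internal"]
reverse = {opaque_id(b): b for b in backends}
```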
Predictable stickiness and abuse
Attackers may exploit persistence for:
- Targeted DoS: Flooding a single backend by crafting stick keys that all map to it (if hash function and mapping are predictable).
- Enumeration of infrastructure: Inferring the number of backends from cookie patterns.
Mitigations:
- Use robust, non‑trivial hashing or mapping.
- Avoid direct `mod N` on obvious identifiers if you fear adversarial traffic.
- Rate‑limit per client or IP regardless of persistence.
Observability and Troubleshooting
What to log
To debug and monitor session persistence, capture:
- Stick key (cookie value, IP, session ID hash)
- Backend chosen
- Reason (persistent vs new routing)
- Session start and end times, if supported
For HTTP logs, you might add fields like:
- Chosen backend name
- LB cookie value
- Application session ID (hashed or truncated for privacy)
Common issues and how to reason about them
Users randomly logged out or losing carts
- Is persistence enabled at all?
- Are you terminating TLS at the LB (can it see the cookie)?
- Are there multiple LBs not sharing persistence data and not using deterministic hashing?
- Are cookies being stripped or modified by some intermediary?
Uneven load despite round‑robin configuration
- Are sticky sessions enabled and dominating routing?
- Are many users behind a small set of NAT IPs?
- Are there a few very heavy sessions stuck on a subset of servers?
You might:
- Move from IP to cookie‑based persistence.
- Shorten persistence timeouts.
- Move more state into shared external storage so you can remove or reduce stickiness.
Issues during deployments
- Users hit errors when a backend is drained or restarted:
- Perhaps the LB does not detect the backend as down quickly enough and keeps honoring stickiness until TCP connections start failing.
- You might need better health checks, draining configuration, or app‑level graceful shutdown.
You should integrate:
- Health checks that fail before killing the process.
- Draining so existing sessions finish, but no new sessions are assigned.
Designing a Strategy for a Real Application
When deciding on a session persistence strategy, answer these questions:
- How does the application manage state?
- In‑process memory, local disk, external DB/cache, tokens, etc.
- Can the app be made stateless (or “less stateful”)?
- Sometimes moving just login state out of memory avoids the need for persistence for most endpoints.
- What is the typical session lifetime and workflow?
- Short, bursty interactions vs long‑running sessions.
- What are the client patterns?
- Many NAT users, mobile networks, internal LAN, mix?
- How critical is perfect session continuity?
- Is it acceptable that on rare failure events users must re‑login?
Then choose:
- No persistence for truly stateless APIs and content delivery.
- Cookie‑based persistence for web apps where login and cart continuity matter.
- IP‑based or other methods for non‑HTTP or constrained environments.
Finally:
- Explicitly document and test failure scenarios: backend crash, LB failover, scaling events.
- Align all timeouts (LB persistence, app session timeout, backend drain timeouts).
- Monitor stickiness behavior in production and revisit as the app evolves.
This chapter’s goal is to help you recognize when and how to apply session persistence in load‑balanced environments, and what operational and architectural consequences that choice brings.