5.5.4 High availability concepts

Goals of High Availability

High availability (HA) aims to keep services accessible despite failures and planned maintenance. For load balancers, HA means:

Typical goals are expressed as an “availability percentage”:

The higher the target, the more redundancy, complexity, and cost you accept.
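To make the availability percentage concrete, it helps to convert it into a downtime budget. The snippet below is a minimal sketch of that arithmetic; the function name `allowed_downtime_minutes` is illustrative, and the calculation assumes a 365-day year and counts planned and unplanned downtime together.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Maximum downtime per year (in minutes) permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Print the yearly downtime budget for a few commonly quoted targets.
for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/year")
```

For example, a 99.9% target leaves about 525.6 minutes (just under nine hours) per year, while 99.99% leaves under an hour.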

Redundancy and Single Points of Failure

An HA design removes or mitigates single points of failure (SPOFs). In the load balancing context, look at:

Common strategies:

For load balancers, running two or more instances is common: if one fails, its virtual IP (VIP) is moved to a healthy peer or traffic is re-routed.
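A rough way to see why redundancy helps is to estimate the availability of several redundant instances when any single one is enough to serve traffic. The sketch below assumes that failures are independent and that failover is instantaneous; real deployments violate both (shared power, shared configuration, detection delays), so treat the result as an upper bound. The function name is illustrative.

```python
def combined_availability(single: float, nodes: int) -> float:
    """Availability of `nodes` redundant instances where any one suffices.

    `single` is the availability of one instance as a fraction (e.g. 0.99).
    Assumes independent failures and zero failover time (both optimistic).
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a fraction between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # The service is down only when every instance is down at the same time.
    return 1 - (1 - single) ** nodes


print(combined_availability(0.99, 1))
print(combined_availability(0.99, 2))
```

Under these assumptions, two instances at 99% each give roughly 99.99% combined. Correlated failures are the reason the real figure is usually lower.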

Failure Models and Failure Domains

You design HA based on which failures you expect and want to survive:

A failure domain is the scope of impact from a given failure:

HA design tries to contain failures inside small domains and, where possible, places components that back each other up (e.g., active and standby load balancers) in different failure domains, so that a single failure cannot take out both.
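As a minimal sketch of that placement rule, the check below reports which failure domains two nodes share. The `Placement` type, the node names, and the two domain levels (`rack`, `zone`) are assumptions made for illustration; a real inventory would likely track more levels (host, power feed, site).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    node: str
    rack: str
    zone: str


def shared_domains(a: Placement, b: Placement) -> list[str]:
    """Return the names of the failure domains that two nodes have in common."""
    return [name for name in ("rack", "zone") if getattr(a, name) == getattr(b, name)]


active = Placement("lb1", rack="r1", zone="az1")
standby = Placement("lb2", rack="r7", zone="az1")

# Different racks, same zone: a zone-wide failure would still take out both.
print(shared_domains(active, standby))  # ['zone']
```

An empty result means the pair shares none of the modelled domains; anything else names the domain whose failure would take down both nodes.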

Active/Passive vs Active/Active Architectures

Active/Passive

Pros:

Cons:

Used often with:

Active/Active

Pros:

Cons:

Often combined with:

Failover and Fencing

Failover Mechanisms

Failover is the process of moving service responsibility from a failed node to a healthy node. For load balancers, tasks include:

Detection mechanisms:

Fencing (STONITH)

Fencing ensures that a failed or partitioned node can no longer serve traffic or write data once the cluster has declared it dead. This avoids split-brain, where two nodes both believe they are active.

Common fencing actions:

STONITH (“Shoot The Other Node In The Head”) is a classic term for forcibly removing a node to protect data and consistency. Even for load balancing, fencing is useful because it prevents two nodes from answering for the same VIP at once.
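The ordering matters: the standby should claim the VIP only after the old active node is confirmed fenced. The sketch below shows that ordering from the standby's point of view. The three callables are hypothetical hooks (a heartbeat check, a fencing agent, and a VIP takeover step); they do not correspond to any specific cluster manager's API.

```python
from typing import Callable


def fail_over(peer_alive: Callable[[], bool],
              fence_peer: Callable[[], bool],
              claim_vip: Callable[[], None]) -> str:
    """Standby-side takeover: claim the VIP only once the peer is confirmed fenced."""
    if peer_alive():
        return "standby"   # peer is healthy, nothing to do
    if not fence_peer():
        return "blocked"   # fencing unconfirmed: taking over could cause split-brain
    claim_vip()
    return "active"


# Usage with stub hooks: the peer is unreachable and fencing succeeds.
events: list[str] = []
role = fail_over(peer_alive=lambda: False,
                 fence_peer=lambda: True,
                 claim_vip=lambda: events.append("vip claimed"))
print(role, events)
```

Returning "blocked" instead of taking over trades availability for safety: the service stays down until fencing succeeds or an operator intervenes, but the VIP is never answered by two nodes.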

Quorum and Split-Brain

In multi-node clusters, quorum is how the cluster decides who is allowed to operate when communication is partially lost.
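A common quorum rule is a strict majority of configured votes. The sketch below illustrates that rule only; the function name is illustrative, and real cluster managers add refinements such as weighted votes or external tiebreakers.

```python
def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """True if this partition holds a strict majority of all configured votes."""
    if total_votes < 1 or not 0 <= reachable_votes <= total_votes:
        raise ValueError("votes out of range")
    return 2 * reachable_votes > total_votes


print(has_quorum(2, 3))  # True: two of three nodes can keep operating
print(has_quorum(1, 2))  # False: a 1/1 split leaves neither side with a majority
```

The second case shows why a plain two-node cluster cannot resolve a partition by majority alone, and why an odd number of voters or an additional tiebreaker is often used.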

Common strategies:

In HA load balancer clusters, you might:

Health Checking and Failure Detection

The quality of your health checks directly influences HA behavior:

Considerations:

For load balancers:
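One widely used way to keep health checks from flapping is to require several consecutive results before changing state. The class below is a minimal sketch of that idea; the parameter names `rise` and `fall` mirror the server options of the same name in HAProxy, but the class itself is hypothetical and the default thresholds are arbitrary.

```python
class HealthState:
    """Track a backend's health with consecutive-result thresholds."""

    def __init__(self, rise: int = 2, fall: int = 3, healthy: bool = True):
        self.rise = rise        # consecutive passes needed to mark a node healthy
        self.fall = fall        # consecutive failures needed to mark it unhealthy
        self.healthy = healthy
        self._streak = 0        # consecutive results contradicting the current state

    def record(self, check_passed: bool) -> bool:
        """Feed one check result and return the (possibly updated) health state."""
        if check_passed == self.healthy:
            self._streak = 0    # result agrees with the current state
        else:
            self._streak += 1
            needed = self.rise if check_passed else self.fall
            if self._streak >= needed:
                self.healthy = check_passed
                self._streak = 0
        return self.healthy


state = HealthState(rise=2, fall=3)
history = [state.record(ok) for ok in (False, False, True, False, False, False, True, True)]
print(history)
```

In this run the two isolated failures do not remove the node; only the third consecutive failure does, and two consecutive passes bring it back. Higher thresholds reduce false positives but lengthen detection time.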

Session State and HA

Load balancers often deal with stateful clients (e.g., web sessions). HA must consider what happens on failover or when an LB node disappears:

High availability options:

You must define acceptable client impact on failover:

High Availability vs Fault Tolerance vs Disaster Recovery

These terms are related but distinct:

In a load balancing context:

Availability Metrics: RTO and RPO

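When designing HA, especially across multiple sites, you must define a Recovery Time Objective (RTO), the maximum acceptable time to restore service after a failure, and a Recovery Point Objective (RPO), the maximum acceptable data loss, expressed as a window of time.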

For pure load balancing layers (stateless), RPO often doesn’t apply directly, but:

Maintenance, Upgrades, and Testing

High availability is not just about unexpected failures; it also enables planned work without downtime.

Key practices:

Testing is critical:

Design Trade-Offs and Practical Guidelines

When applying HA concepts to load balancers:

These concepts form the foundation for building reliable, production-grade load balancing setups, whether you use HAProxy, Nginx, cloud-native load balancers, or a combination.
