
InfiniBand

What Makes InfiniBand Different in HPC

InfiniBand is a high‑performance network technology widely used to connect nodes in HPC clusters. Within the broader topic of interconnects, InfiniBand is specifically designed for very low latency, high bandwidth, and hardware features such as Remote Direct Memory Access (RDMA) that parallel applications and parallel filesystems can exploit.

For HPC users, you’ll mostly encounter InfiniBand indirectly—through job scripts, MPI runs, and filesystem access—rather than configuring it yourself. Understanding its basic properties helps explain why some options and performance behaviors look the way they do.

Core Components and Terminology

Host Channel Adapter (HCA)

On each compute node, InfiniBand connectivity is provided by a Host Channel Adapter: a network card, typically attached via PCIe, that implements the InfiniBand protocol in hardware and can move data directly between the network and application memory.

As a user, you might see references to InfiniBand devices in commands like ibstat or in MPI settings that refer to “HCAs” or specific device names (e.g., mlx5_0).

Switches and Fabric

InfiniBand switches connect nodes to form a “fabric”: switches forward traffic between HCAs, larger systems combine many switches into topologies such as fat trees, and a subnet manager discovers the fabric and computes the routes that traffic follows.

From a user’s viewpoint, this matters because where your job’s nodes sit in the fabric can influence communication performance: ranks that must cross several switch levels may see slightly higher latency and shared uplink bandwidth compared with ranks attached to the same switch.

Queue Pairs and Verbs (Conceptual)

Communication in InfiniBand is based on queue pairs (QPs) and a low‑level API often called “verbs”: a queue pair is a send queue plus a receive queue owned by the HCA, applications post work requests to these queues, and the hardware executes them and reports results through completion queues.

You typically do not program with verbs directly as a beginner; MPI and other libraries use them under the hood. But this model underlies InfiniBand’s efficiency and the advanced communication modes described next.
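For the curious, here is a minimal, illustrative C sketch (not something you need as a beginner; error handling is omitted) that uses the libibverbs API to open the first HCA and create a Reliable Connection queue pair. Real applications and MPI libraries go on to exchange QP information and connect the endpoints, which is omitted here.

```c
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void) {
    int n = 0;
    struct ibv_device **devs = ibv_get_device_list(&n);
    if (!devs || n == 0) {
        fprintf(stderr, "no InfiniBand devices found\n");
        return 1;
    }

    /* Open the first HCA and allocate the basic verbs objects that an
       MPI library would normally manage on your behalf. */
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);                      /* protection domain */
    struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);  /* completion queue */

    struct ibv_qp_init_attr attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .qp_type = IBV_QPT_RC,   /* Reliable Connection transport */
        .cap = { .max_send_wr = 16, .max_recv_wr = 16,
                 .max_send_sge = 1, .max_recv_sge = 1 },
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &attr);               /* the queue pair */

    printf("created RC queue pair %u on device %s\n",
           qp->qp_num, ibv_get_device_name(devs[0]));

    ibv_destroy_qp(qp);
    ibv_destroy_cq(cq);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

Compiling requires the rdma-core development headers and linking with -libverbs; on a node without an HCA the program simply reports that no devices were found.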

Communication Modes Relevant to HPC

Reliable Connection (RC) vs Unreliable Datagram (UD)

InfiniBand can support several communication modes. HPC MPI implementations most commonly use Reliable Connection (RC), which provides in‑order, error‑checked delivery between a fixed pair of endpoints, and in some configurations Unreliable Datagram (UD), which is connectionless and scales to many peers at the cost of weaker delivery guarantees and smaller messages.

For you as a user, this is mostly invisible, but it helps explain that InfiniBand is not just “faster Ethernet”: it provides richer communication semantics that MPI can exploit.

RDMA: Remote Direct Memory Access

Remote Direct Memory Access (RDMA) is one of InfiniBand’s key features: the HCA can read from or write to the memory of a remote node directly, without involving the remote CPU or the operating system kernel in the data path.

In HPC, MPI libraries use RDMA for large‑message transfers and one‑sided communication, and parallel filesystems use it to move data between clients and storage servers with low CPU overhead.

You’re unlikely to enable RDMA explicitly as a beginner; instead, you benefit when using MPI libraries configured to exploit InfiniBand RDMA.
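To give a feel for how this surfaces at the MPI level, here is a small, hedged C sketch using MPI one‑sided communication (MPI_Put). On an InfiniBand cluster an MPI library will typically implement such windows with RDMA, although that mapping is up to the implementation.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    /* Each rank exposes one double; on InfiniBand a remote MPI_Put into
       this window can be carried out by the HCA without the target CPU. */
    double local = 0.0;
    MPI_Win win;
    MPI_Win_create(&local, sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0) {
        double val = 42.0;
        /* Write directly into rank 1's window. */
        MPI_Put(&val, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);

    if (rank == 1)
        printf("rank 1 received %.1f without posting a receive\n", local);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Run it with something like mpirun -np 2 ./a.out; whether the transfer actually uses RDMA depends on the MPI build and the window type.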

Connection vs Connectionless Use

At scale, maintaining a separate connection (RC) between every pair of processes can be expensive. MPI libraries might establish connections lazily, only when two ranks first communicate, fall back to connectionless transports such as UD for very large jobs, or mix both approaches.

Users might see tuning options in MPI documentation about “dynamic connections,” “connectionless InfiniBand,” or “on‑demand connection setup,” which are strategies to manage these trade‑offs.

Performance Characteristics

Bandwidth and Latency

InfiniBand is engineered for high bandwidth and low latency: current generations provide on the order of 100–400 Gb/s per port, and end‑to‑end MPI latencies are typically in the low microsecond range, well below what most Ethernet/TCP setups achieve.

Implications for HPC applications: codes that exchange many small messages benefit from the low latency, codes that move large volumes of data benefit from the high bandwidth, and communication‑heavy applications can scale to more nodes before the network becomes the bottleneck.

Message Size and Performance Regimes

InfiniBand performance typically shows two regimes: for small messages the cost is dominated by per‑message latency and software overhead, while for large messages the cost is dominated by the time the data spends on the wire, so the achieved bandwidth approaches the link’s peak.
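A simple first‑order model (an approximation, often called the alpha‑beta or Hockney model) captures this. The time to send a message of m bytes is roughly

    T(m) ≈ α + m/β

where α is the per‑message latency and β the peak bandwidth. The effective bandwidth m/T(m) only approaches β once m is much larger than α·β, which is why small messages never reach the headline link speed.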

When you benchmark your code, you might observe a “knee” in performance where the effective bandwidth increases significantly once messages become large enough.
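You can observe this yourself with a simple ping‑pong benchmark. The following is a minimal C/MPI sketch (run it with two ranks placed on different nodes) that sweeps the message size and prints the one‑way time and effective bandwidth.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_BYTES (1 << 22)   /* sweep message sizes up to 4 MiB */
#define REPS 100

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    char *buf = malloc(MAX_BYTES);

    for (int bytes = 1; bytes <= MAX_BYTES; bytes *= 2) {
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int r = 0; r < REPS; r++) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        /* Round-trip time divided by two gives the one-way time. */
        double t = (MPI_Wtime() - t0) / (2.0 * REPS);
        if (rank == 0)
            printf("%8d bytes  %10.2f us  %8.2f MB/s\n",
                   bytes, t * 1e6, bytes / t / 1e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```

Plotting the last column against message size should reveal the “knee” described above; established suites such as the OSU micro‑benchmarks measure the same thing more carefully.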

InfiniBand and MPI/OpenMP/Hybrid Codes

MPI over InfiniBand

On InfiniBand clusters, MPI is typically built to use InfiniBand transport layers: modern Open MPI and MPICH‑based builds usually go through communication frameworks such as UCX or libfabric, which in turn use the verbs interface and RDMA rather than TCP/IP.

Practical pointers: use the MPI modules provided by your site, since they are built against the local InfiniBand stack, and check site documentation before overriding transport‑related environment variables or mpirun options.

Node‑Local vs Network Communication

Modern InfiniBand stacks and MPI implementations can route messages between ranks on the same node through shared memory and use the InfiniBand fabric only for ranks on different nodes, choosing the faster path automatically.

From a user perspective, this means process placement matters: keeping tightly coupled ranks on the same node reduces network traffic, which is one reason pinning and mapping options can have a visible performance effect. The sketch below shows how an MPI program can tell which ranks share a node.
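As an illustration, this small C/MPI sketch splits MPI_COMM_WORLD into per‑node communicators using MPI_Comm_split_type; libraries use the same mechanism to decide which peers are reachable via shared memory and which require the network.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int world_rank, node_rank, node_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Group ranks that share a node: communication inside node_comm can
       use shared memory, while traffic between node_comm groups goes
       over the interconnect (InfiniBand on such clusters). */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_size(node_comm, &node_size);

    printf("world rank %d is local rank %d of %d on its node\n",
           world_rank, node_rank, node_size);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```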

Practical User Interactions with InfiniBand

Recognizing InfiniBand on a Cluster

Common signs that a cluster uses InfiniBand include documentation or module names mentioning InfiniBand, EDR/HDR/NDR, Mellanox/NVIDIA adapters, or UCX; diagnostic tools such as ibstat or ibv_devinfo being available on the nodes; device names like mlx5_0 appearing in MPI output; and a network interface named ib0 (IP over InfiniBand).

You generally don’t configure these; system administrators handle it. But knowing that InfiniBand exists helps interpret performance expectations and tuning advice.

IP over InfiniBand (IPoIB)

Clusters may provide IP over InfiniBand: an ib0‑style network interface with an ordinary IP address, so that standard TCP/IP tools and services (ssh, NFS, licence servers, and so on) can run over the InfiniBand hardware.

However, for HPC code, IPoIB does not deliver native InfiniBand performance; MPI traffic should use the native verbs/RDMA path rather than TCP over ib0, which is how site‑provided MPI builds are normally configured.

Job Scripts and Resource Selection

Some schedulers and environments expose InfiniBand‑related options, for example node features or constraints that identify the interconnect or fabric island, and topology‑aware placement options that try to keep a job’s nodes close together in the fabric.

When scaling up jobs, it’s good practice to measure how communication time grows as you add nodes, to read your site’s guidance on placement and topology, and to rerun small communication benchmarks when moving to a system with a different interconnect.

Reliability, Congestion, and QoS

Reliability and Flow Control

InfiniBand implements hardware flow control and error detection: the fabric uses credit‑based, essentially lossless flow control, links detect corrupted packets, and reliable transports retransmit so that data arrives complete and in order.

As a user, this contributes to stable, predictable communication behavior, with far fewer surprises from packet loss and retransmission than on typical lossy Ethernet/TCP networks.

Congestion and Oversubscription

Despite high bandwidth, InfiniBand fabrics can still experience congestion: many jobs share the same links, uplinks between switch levels may be oversubscribed, and heavy filesystem traffic competes with MPI messages, all of which can make communication performance vary from run to run.

Site documentation might describe the fabric topology and its oversubscription ratio, and offer guidance or scheduler options for placing communication‑heavy jobs.

Partitions and Quality of Service (QoS)

InfiniBand supports partitions, which isolate groups of nodes on the fabric much as VLANs do on Ethernet, and quality‑of‑service mechanisms (service levels and virtual lanes) that can prioritize some classes of traffic over others.

While typically configured by administrators, this can affect which parts of the fabric your jobs can reach and how competing traffic, for example filesystem I/O versus MPI messages, is prioritized.

InfiniBand Generations and Compatibility

Evolution of InfiniBand Speeds

Major InfiniBand generations relevant to HPC include QDR (40 Gb/s), FDR (56 Gb/s), EDR (100 Gb/s), HDR (200 Gb/s), and NDR (400 Gb/s) per standard 4x port, with each generation also bringing improvements in features and efficiency.

For users, the generation mainly determines the per‑link bandwidth you can expect; latency differences between recent generations are comparatively small, so the biggest practical impact is on large‑message and I/O‑heavy workloads.

Backward Compatibility

InfiniBand is designed with some degree of backward compatibility: newer HCAs and switches generally interoperate with older equipment, with each link negotiating down to the speed of its slower end.

When comparing performance between systems, ensure you’re aware of which InfiniBand generation each system runs, how its fabric is laid out and oversubscribed, and how the MPI library was built and configured.

InfiniBand and Parallel Filesystems

Parallel filesystems such as Lustre or GPFS may be deployed over InfiniBand: clients on the compute nodes and the storage servers exchange data across the fabric, often using RDMA so that file data moves with low CPU overhead.

For users, practical points include that filesystem and MPI traffic may share the same fabric, so heavy I/O phases can affect communication performance (and vice versa), and that large parallel I/O benefits from libraries such as MPI‑IO that coordinate accesses across ranks, as in the sketch below.
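Here is a small, hedged C sketch of a collective MPI‑IO write; the path /scratch/example/output.dat is a placeholder for wherever your site’s parallel filesystem is mounted.

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 1 << 20;                        /* 1 Mi doubles per rank */
    double *data = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) data[i] = rank;

    MPI_File fh;
    /* Placeholder path: use a directory on your parallel filesystem. */
    MPI_File_open(MPI_COMM_WORLD, "/scratch/example/output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes its own contiguous block; the collective call lets
       the MPI-IO layer aggregate requests before they cross the fabric. */
    MPI_Offset offset = (MPI_Offset)rank * N * sizeof(double);
    MPI_File_write_at_all(fh, offset, data, N, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(data);
    MPI_Finalize();
    return 0;
}
```

Whether this traffic actually travels over InfiniBand RDMA depends on how the filesystem clients are configured, which is again the administrators’ domain.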

Summary: What You Should Remember as a Beginner

The key points: InfiniBand is a low‑latency, high‑bandwidth interconnect with hardware RDMA support; you use it indirectly through MPI, the scheduler, and the parallel filesystem; and its properties explain much of the placement, scaling, and tuning advice you will encounter. As you move on to topics like MPI and parallel filesystems, you’ll see how these software layers make use of InfiniBand’s features to deliver scalable HPC performance.
