Compute nodes

Role of Compute Nodes in an HPC Cluster

Compute nodes are the workhorses of an HPC cluster. Unlike login or head nodes, they are not meant for interactive work, compiling large software stacks, or running services. Their primary role is to execute user jobs scheduled by the batch system, typically in a non-interactive, highly controlled way.

Key characteristics:

You will rarely log in to compute nodes directly; instead, you request them via the job scheduler.
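
For illustration, a minimal batch script under Slurm might look like the sketch below (assuming Slurm is the scheduler; the program name and resource values are placeholders to adapt to your site):

  #!/bin/bash
  #SBATCH --job-name=example      # name shown in the queue
  #SBATCH --nodes=1               # one compute node
  #SBATCH --ntasks=1              # one task (process)
  #SBATCH --cpus-per-task=4       # four cores for that task
  #SBATCH --time=00:30:00         # wall-clock limit

  srun ./my_program               # hypothetical program, launched on the allocated compute node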

Typical Hardware Layout of a Compute Node

Compute nodes are designed around a balance of:

A common CPU-only compute node combines two processor sockets with many cores each, a large pool of main memory split between the sockets, a high-speed network adapter, and often a small amount of local disk.
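
If the cluster runs Slurm, you can ask the scheduler how it describes its compute nodes; the format string and node name below are examples you would adapt:

  sinfo -o "%P %D %c %m %G"     # partitions: node count, cores, memory (MB), generic resources
  scontrol show node node0001   # one node in detail: sockets, cores, real memory, GRES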

Cores, Sockets, and NUMA Domains

Inside a compute node, you will encounter:

Even without going deep into microarchitecture, you need to be aware of:

NUMA (Non-Uniform Memory Access) is common on multi-socket nodes:

Tools you might encounter for examining this layout:
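
For example (which of these tools is installed depends on the site):

  lscpu               # sockets, cores per socket, threads per core, NUMA nodes
  numactl --hardware  # NUMA domains and the memory attached to each
  lstopo              # hwloc's picture of the full node topology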

Accelerated Compute Nodes (GPU and Other Accelerators)

Many clusters have specialized compute nodes with GPUs or other accelerators.

Common features of GPU nodes:

You typically request these nodes via scheduler options like:
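
A sketch in Slurm syntax (GPU counts, GRES names, and partition names are site-specific assumptions):

  #SBATCH --gres=gpu:2        # request two GPUs on the node
  #SBATCH --partition=gpu     # hypothetical GPU partition name

  srun nvidia-smi             # check which GPUs the job actually received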

Important differences from CPU-only nodes:

Memory Characteristics on Compute Nodes

Compute nodes are configured with enough memory for large simulations and data processing workloads, but memory is still a finite, shared resource.

Total Memory and Per-Node Capacity

Typical scheduler options:
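
In Slurm, for example, memory is usually requested either per node or per allocated core (values here are illustrative):

  #SBATCH --mem=64G           # total memory for the node
  # or, alternatively:
  #SBATCH --mem-per-cpu=4G    # memory per allocated core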

If you request too little memory:

If you request too much:

Large-Memory / “Fat” Nodes

Some clusters provide special large-memory compute nodes:

These nodes might be scarce and heavily contended; use them only when justified.
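
Access is often granted through a dedicated partition or node feature; the names and sizes below are purely illustrative:

  #SBATCH --partition=bigmem  # hypothetical large-memory partition
  #SBATCH --mem=1500G         # request close to the node's full memory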

Local Storage on Compute Nodes

Compute nodes might have:

Typical uses for local storage:

Things to watch:
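
For example, a common pattern is to stage input to node-local scratch, compute there, and copy results back before the job ends and the scratch area is cleaned. The variables used below (TMPDIR, SLURM_SUBMIT_DIR) are assumptions about a Slurm site; check what yours provides:

  cp input.dat "$TMPDIR/"               # stage input onto fast local storage
  cd "$TMPDIR"
  srun ./my_program input.dat           # hypothetical program working on local copies
  cp results.dat "$SLURM_SUBMIT_DIR/"   # copy results back before local scratch is purged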

Network Connectivity of Compute Nodes

Compute nodes are connected to:

Common interconnect types include Ethernet and InfiniBand. For you as a user on compute nodes:
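
If you want to see which interfaces a compute node actually has, you can inspect them from inside a job; the InfiniBand tool shown is only present where the corresponding drivers and diagnostics are installed:

  srun ip -br link   # brief list of network interfaces on the allocated node
  srun ibstat        # InfiniBand port state and rate, if applicable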

Some clusters have:

Software Environment on Compute Nodes

Compute nodes typically share the same software environment as login nodes, but with some important differences:

Common patterns:
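
A typical pattern is to load the same environment modules inside the job script that you tested on the login node; the module names and versions below are placeholders:

  module purge                       # start from a clean environment
  module load gcc/12.2 openmpi/4.1   # placeholder modules; match your site's catalogue
  srun ./my_mpi_program              # hypothetical MPI program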

Node Allocation and Usage Models

You rarely think about a single compute node in isolation; instead, you think about:

How you request resources affects:
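
For example, the same 32 cores can be requested in quite different shapes, with different consequences for placement and communication (Slurm syntax, values illustrative):

  # 32 single-threaded MPI ranks packed onto one node
  #SBATCH --nodes=1 --ntasks=32 --cpus-per-task=1

  # 8 hybrid MPI ranks with 4 threads each, spread over two nodes
  #SBATCH --nodes=2 --ntasks-per-node=4 --cpus-per-task=4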

CPU, Memory, and GPU Binding

When you run code on compute nodes, especially hybrid or multi-threaded codes, placement matters:

The job scheduler and MPI libraries often provide options to control this, such as:
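
For example, for a hybrid MPI+OpenMP job under Slurm (exact flags differ between scheduler versions and MPI implementations):

  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # one OpenMP thread per allocated core
  export OMP_PROC_BIND=close                    # keep threads near their parent rank
  export OMP_PLACES=cores                       # bind threads to physical cores
  srun --cpu-bind=cores ./my_hybrid_program     # hypothetical program; srun pins each rank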

Efficient use of compute nodes requires aligning your job’s parallel structure with the node’s hardware layout.

Accessing and Inspecting Compute Nodes

While you normally do not log in directly to compute nodes, you might:

  srun hostname   # which node(s) the job was given
  srun lscpu      # CPU layout of the allocated node
  srun free -h    # memory available on the allocated node

Typical constraints:

Best Practices When Using Compute Nodes

To use compute nodes effectively and fairly:

Understanding what compute nodes provide—and how they differ from login and management nodes—helps you craft job scripts and workflows that run efficiently at scale.
