
2.2.3 Main memory (RAM)

Role of Main Memory in the Hierarchy

Main memory (RAM) sits between the fast but tiny levels (registers and caches) and the large but slow storage (SSDs/HDDs). In the hierarchy, RAM is the largest level the CPU can address directly, and the last stop before data must come from storage that is orders of magnitude slower.

For HPC, RAM size and bandwidth often limit how large a problem fits on a single node and how fast memory-bound kernels can run.

Basic Characteristics of RAM

Volatile storage

RAM is volatile: its contents are lost when power is removed. This is why data must be saved to non-volatile storage (e.g. a filesystem backed by disks or SSDs) for persistence.

Addressable memory

RAM is organized as a large array of bytes, each with a unique address.

Conceptually, you can think of it as:

$$
\text{RAM} \approx [\text{byte}_0, \text{byte}_1, \dots, \text{byte}_{N-1}]
$$

where $N$ is the total size in bytes.
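
For illustration, here is a minimal C sketch of byte addressability (the addresses printed are the virtual addresses the process sees; see the virtual-memory section below):

```c
#include <stdio.h>

int main(void) {
    unsigned char bytes[4] = {10, 20, 30, 40};

    /* Each byte has its own address; consecutive array elements live at
       consecutive (virtual) addresses, mirroring the byte-array picture. */
    for (int i = 0; i < 4; i++)
        printf("address %p holds value %u\n",
               (void *)&bytes[i], (unsigned)bytes[i]);

    return 0;
}
```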

Capacity vs. bandwidth vs. latency

Three key properties matter: capacity (how many bytes fit, typically tens to hundreds of GB per node), bandwidth (how many bytes per second can be streamed to and from the CPU), and latency (how long an individual access takes, on the order of tens to hundreds of nanoseconds).

In HPC, it’s common for performance to be limited by bandwidth or latency rather than pure CPU speed.
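
As a rough worked example with purely illustrative numbers: streaming an 8 GB array at a sustained bandwidth of 100 GB/s takes about

$$
t \approx \frac{8\,\text{GB}}{100\,\text{GB/s}} = 0.08\,\text{s},
$$

whereas performing $10^9$ dependent accesses that each pay roughly 100 ns of latency takes on the order of 100 s. The access pattern can therefore matter as much as raw capacity.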

Types of RAM Relevant to HPC

DRAM vs. SRAM (at a high level)

Main memory in modern systems is almost always DRAM (dynamic RAM): dense and comparatively cheap, but slower than SRAM and in need of periodic refresh. SRAM (static RAM) is much faster but far less dense and more expensive, which is why it is used for on-chip caches rather than for main memory.

DDR generations

Most HPC nodes use some generation of DDR (Double Data Rate) DRAM, such as DDR4 or DDR5.

Each new DDR generation generally increases transfer rates (and therefore bandwidth per channel) and the capacity available per DIMM.

You don’t usually program DDR directly, but the generation affects the node’s memory bandwidth and therefore performance.

High-Bandwidth Memory (HBM)

On some accelerators and newer CPUs, you may encounter HBM: stacks of DRAM dies placed very close to the processor and connected through a very wide interface, which gives much higher bandwidth than DDR at a smaller capacity.

While HBM is covered more deeply in GPU/accelerator discussions, for this chapter it’s important to recognize it as an additional kind of main memory with different characteristics.

Organization of Main Memory in Nodes

Channels, DIMMs, and sockets

Main memory is physically organized into DIMMs (memory modules) plugged into memory channels, which are attached to a CPU socket.

Bandwidth scales roughly with the number of active channels; for HPC nodes it therefore matters that all channels of a socket are populated, as in the estimate below.
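
As a hedged back-of-the-envelope estimate (assuming the standard 64-bit, i.e. 8-byte, data bus per channel): one DDR4-3200 channel can transfer at most about

$$
3200 \times 10^{6}\,\tfrac{\text{transfers}}{\text{s}} \times 8\,\tfrac{\text{B}}{\text{transfer}} \approx 25.6\,\tfrac{\text{GB}}{\text{s}},
$$

so a socket with eight populated channels has a theoretical peak of roughly 200 GB/s; sustained bandwidth in practice is noticeably lower.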

Memory controllers

Each CPU socket contains one or more memory controllers, which drive the memory channels attached to that socket.

From a programmer’s perspective, you don’t directly control the controller, but its effects are visible indirectly in the bandwidth you can achieve and in the NUMA behavior described next.

NUMA: Non-Uniform Memory Access

On modern multi-socket servers (and even some single-socket CPUs with chiplets), main memory is typically NUMA-organized: each socket (or chiplet group) has its own local memory, and all of it remains accessible from every core.

This leads to non-uniform access costs: touching memory attached to your own socket is cheaper than touching memory attached to another socket.

Typical effects are higher latency and lower bandwidth for remote accesses, which is why data placement and thread/process pinning matter on NUMA nodes; a common technique is shown in the sketch below.
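
One common way to cooperate with NUMA from code is first-touch placement: on Linux, a page is typically placed on the NUMA node of the thread that first writes to it. A minimal OpenMP sketch under that assumption (and assuming threads are pinned, e.g. with OMP_PROC_BIND=close):

```c
#include <stdlib.h>
#include <omp.h>

#define N (1 << 26)   /* ~67 million doubles, about 512 MB */

int main(void) {
    double *a = malloc(N * sizeof *a);
    if (!a) return 1;

    /* First touch: each thread initializes the part of the array it will
       later work on, so those pages land in that thread's local memory. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = 0.0;

    /* Later compute loops use the same static distribution, so each thread
       mostly accesses pages that are local to its NUMA node. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = 2.0 * a[i] + 1.0;

    free(a);
    return 0;
}
```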

Memory Pages and Virtual Memory (HPC-Relevant Aspects)

Virtual vs. physical memory

The OS uses virtual memory: each process sees its own contiguous address space, which the OS maps onto physical RAM in units of pages.

For HPC users, two notable consequences are that allocated memory is often only backed by physical pages once it is first touched, and that the real limit is the physical RAM of the node (plus any swap), not the size of the virtual address space. The small sketch below makes the first point visible.
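
A Linux-specific sketch (assuming /proc/self/statm is available; the exact behavior depends on the allocator and kernel settings): the allocation enlarges the virtual size immediately, but the resident (physical) size only grows once the pages are touched.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Read virtual size and resident set size (in pages) from /proc/self/statm. */
static void report(const char *label) {
    long pages_virt = 0, pages_res = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f && fscanf(f, "%ld %ld", &pages_virt, &pages_res) == 2) {
        long page = sysconf(_SC_PAGESIZE);
        printf("%-12s virtual %6ld MB, resident %6ld MB\n", label,
               pages_virt * page >> 20, pages_res * page >> 20);
    }
    if (f) fclose(f);
}

int main(void) {
    size_t bytes = (size_t)1 << 30;   /* 1 GiB */
    report("start");

    char *buf = malloc(bytes);        /* reserves virtual address space */
    if (!buf) return 1;
    report("after malloc");

    memset(buf, 1, bytes);            /* first touch: physical pages appear */
    report("after touch");

    free(buf);
    return 0;
}
```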

Page size and TLB effects

Memory is managed in units called pages (commonly 4 KB, but “huge pages” like 2 MB or 1 GB can also be used). The CPU caches virtual-to-physical address translations in the TLB (Translation Lookaside Buffer), which holds only a limited number of entries.

Implications: working sets that span many 4 KB pages can exceed the TLB, and the resulting TLB misses add latency to memory accesses; huge pages cover far more memory per TLB entry and can reduce this overhead for large arrays.

Configuration of huge pages is typically a system-level concern, but many HPC applications and libraries provide options to use them.
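
As one hedged, Linux-specific illustration (assuming the system has explicit huge pages reserved, e.g. via vm.nr_hugepages; many applications instead rely on transparent huge pages or on allocator/library options):

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t bytes = 64UL << 20;   /* 64 MiB, a multiple of the 2 MB huge-page size */

    /* Try to back the mapping with explicit huge pages. */
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

    if (p == MAP_FAILED) {
        /* No huge pages reserved (or not permitted): fall back to normal pages. */
        perror("mmap(MAP_HUGETLB)");
        p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return 1;
        printf("using normal pages\n");
    } else {
        printf("using huge pages\n");
    }

    munmap(p, bytes);
    return 0;
}
```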

Memory Bandwidth and Access Patterns

Sequential vs. random access

Performance of RAM depends strongly on how you access it: sequential (streaming) access lets the hardware prefetch data and use every byte of each cache line, while random access pays close to the full main-memory latency again and again.

Many HPC codes are designed to use data structures and loops that favor sequential access in the innermost loops.
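
A classic illustration in C: with row-major layout, the first loop order walks through memory contiguously, while the second jumps by a large stride and is typically much slower.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 8192   /* 8192 x 8192 doubles, about 512 MB */

static double sum_row_major(const double *a) {   /* contiguous accesses */
    double s = 0.0;
    for (long i = 0; i < N; i++)
        for (long j = 0; j < N; j++)
            s += a[i * N + j];
    return s;
}

static double sum_col_major(const double *a) {   /* stride of N doubles */
    double s = 0.0;
    for (long j = 0; j < N; j++)
        for (long i = 0; i < N; i++)
            s += a[i * N + j];
    return s;
}

int main(void) {
    double *a = malloc((size_t)N * N * sizeof *a);
    if (!a) return 1;
    for (long i = 0; i < (long)N * N; i++) a[i] = 1.0;

    clock_t t0 = clock();
    double s1 = sum_row_major(a);
    clock_t t1 = clock();
    double s2 = sum_col_major(a);
    clock_t t2 = clock();

    printf("row-major: %.2f s (sum %.0f)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, s1);
    printf("col-major: %.2f s (sum %.0f)\n", (double)(t2 - t1) / CLOCKS_PER_SEC, s2);
    free(a);
    return 0;
}
```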

Strided and scattered access

Common patterns with performance impact are strided access (stepping through every k-th element, or traversing a matrix along the “wrong” dimension) and scattered/indirect access through index arrays; both waste cache-line bandwidth compared to contiguous access. Both are sketched below.

In performance analysis, these patterns show up clearly in memory bandwidth metrics.
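
A brief sketch of the two patterns (illustrative only):

```c
#include <stddef.h>

/* Strided access: touches one element per 'stride'; once stride * 8 bytes
   exceeds a cache line, most of every loaded line is wasted. */
double sum_strided(const double *a, size_t n, size_t stride) {
    double s = 0.0;
    for (size_t i = 0; i < n; i += stride)
        s += a[i];
    return s;
}

/* Scattered (indirect) access through an index array, as in sparse or
   unstructured-mesh codes: each access may land on a different cache line. */
double sum_gather(const double *a, const size_t *idx, size_t m) {
    double s = 0.0;
    for (size_t i = 0; i < m; i++)
        s += a[idx[i]];
    return s;
}
```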

Capacity Constraints in HPC

Memory per core and per node

HPC nodes are often described by their total memory per node and the resulting memory per core (for example, a 256 GB node with 128 cores offers 2 GB per core).

Why this matters: a job whose processes each need more memory than the per-core share must run fewer processes per node, or use nodes with more memory.

When designing jobs, estimate the memory footprint per process and compare it with what the node (and your scheduler request) actually provides.

Memory footprint of applications

For HPC codes, typical major contributors to memory usage are the main simulation arrays (grids, matrices, particle data), halo/ghost regions and communication buffers, and any data that is replicated in every process.

Memory reduction strategies belong to other chapters, but at this level it is enough to be able to estimate the footprint of the dominant arrays, as in the example below.
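
As a hedged back-of-the-envelope example of such an estimate: a single double-precision field on a $1024^3$ grid needs

$$
1024^3 \times 8\,\text{B} \approx 8.6\,\text{GB},
$$

so a solver that keeps, say, five such fields already needs roughly 43 GB before any buffers or halos are counted.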

Main Memory and Parallelism on a Node

Shared main memory for threads

On a single node, all threads of a process (e.g. OpenMP threads) share the same main memory and can read and write the same data structures directly.

Considerations: shared data avoids duplicating large arrays in every thread, but concurrent access needs synchronization, and all threads draw on the same finite memory bandwidth.

Memory bandwidth as a scaling limit

When you increase the number of threads or processes per node, compute capacity grows, but the node’s memory bandwidth stays fixed; memory-bound codes therefore often stop scaling once that bandwidth is saturated.

This is why node-level performance studies often measure achieved memory bandwidth (for example with STREAM-like benchmarks) alongside pure compute throughput; a minimal version is sketched below.
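
A minimal STREAM-triad-like sketch (not the official STREAM benchmark) that can be run with different OMP_NUM_THREADS settings to see where bandwidth saturates:

```c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 100000000L   /* 100 million doubles per array, ~800 MB each */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    /* First-touch initialization in parallel (see the NUMA section). */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; a[i] = 0.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];          /* triad: two loads and one store */
    double t = omp_get_wtime() - t0;

    /* Three 8-byte values move through memory per iteration: 24 bytes. */
    double gbytes = 24.0 * N / 1e9;
    printf("%d threads: %.2f GB/s\n", omp_get_max_threads(), gbytes / t);

    free(a); free(b); free(c);
    return 0;
}
```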

Practical Considerations for HPC Users

Checking node memory and usage

On HPC systems, you’ll frequently need to know how much memory a node provides and how much your job actually uses.

Typical tools/approaches (names will vary by system) include the system documentation, commands such as free or top on a node, the scheduler’s reporting of per-job memory usage, and on Linux the contents of /proc/meminfo.
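
For example, on Linux the node totals are visible in /proc/meminfo (the same numbers that commands like free report); a tiny sketch:

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f) { perror("/proc/meminfo"); return 1; }

    char line[256];
    while (fgets(line, sizeof line, f)) {
        /* MemTotal = installed RAM; MemAvailable = what new allocations can
           realistically use without pushing the node toward swapping. */
        if (strncmp(line, "MemTotal:", 9) == 0 ||
            strncmp(line, "MemAvailable:", 13) == 0)
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}
```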

Requesting memory in job schedulers

Schedulers typically allow you to request memory per node, per task/process, or per core.

Good practice: request somewhat more than your measured peak usage as a safety margin, but avoid grossly over-requesting, since that wastes resources and can delay scheduling.

Details of the syntax and policies are handled in job scheduling chapters, but they are fundamentally about allocating access to main memory.

Avoiding swapping

When a node runs out of RAM, the operating system either starts swapping pages out to much slower storage, which can slow a job down by orders of magnitude, or kills processes outright (e.g. via an out-of-memory killer).

For HPC runs, swapping is effectively a failure mode: jobs should be sized and configured so that their working set fits comfortably in the node’s RAM.

Summary

Main memory (RAM) in HPC is the volatile, byte-addressable level of the hierarchy between the caches and persistent storage. Its capacity determines how large a problem fits on a node, its bandwidth and latency often determine how fast memory-bound code runs, and its organization (channels, NUMA domains, pages) shapes how data should be placed and accessed.

A basic understanding of these aspects is essential for reasoning about performance and for making sensible choices about job configuration and code structure on HPC systems.
