
Fundamentals of Computer Architecture

Big picture: why architecture matters in HPC

High-performance computing is fundamentally about how fast we can turn data and algorithms into results. Computer architecture is the “shape” and organization of the hardware that does this work. For HPC users, you don’t need to be a hardware engineer, but you do need enough architectural understanding to reason about where your code spends its time, choose the right kind of parallelism for a problem, and write code that works with the hardware rather than against it.

This chapter gives a conceptual map of modern processor-based systems as they relate to HPC. Later chapters will go deeper into topics such as memory hierarchies, GPUs, and vectorization.

Basic components of a compute node

An HPC compute node (or any modern server) is typically built from a small number of recurring components: one or more CPUs (each with multiple cores), main memory (RAM), optional accelerators such as GPUs, a network interface to the cluster interconnect, and often local storage.

Logically, you can think of a node as a hierarchy of:

  1. Processing units
  2. Memory and storage levels
  3. Communication paths

HPC performance is about using all three effectively.
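
As a small first look at a node from software, the sketch below queries core and memory counts on a Linux system via sysconf (the processor and physical-memory queries are common glibc extensions rather than strict POSIX):

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        long cores = sysconf(_SC_NPROCESSORS_ONLN);  /* online cores (glibc extension) */
        long pages = sysconf(_SC_PHYS_PAGES);        /* physical memory pages          */
        long psize = sysconf(_SC_PAGESIZE);          /* page size in bytes             */

        printf("cores:  %ld\n", cores);
        printf("memory: %.1f GiB\n",
               (double)pages * psize / (1024.0 * 1024.0 * 1024.0));
        return 0;
    }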

From instructions to execution

At the lowest level, your compiled program is a sequence of instructions that the CPU understands. A modern processor goes through several conceptual stages to turn your code into operations on data:

  1. Fetch: Get the next instruction from memory (usually from the instruction cache).
  2. Decode: Interpret the bits to determine the operation and its operands.
  3. Execute: Perform the computation (e.g., addition, multiplication, comparison).
  4. Memory access: Load data from or store results to memory.
  5. Write-back: Save the result into a register.

In reality, processors overlap these stages and execute many instructions concurrently, using techniques such as:

  - Pipelining: the stages of successive instructions are overlapped, like an assembly line.
  - Superscalar execution: several instructions are issued in the same cycle.
  - Out-of-order execution: independent instructions are reordered to keep execution units busy.
  - Branch prediction and speculation: the processor guesses branch outcomes so the pipeline does not stall.

For HPC developers, you don’t control these mechanisms directly, but they set the context for why data dependencies, instruction mix, and branch behavior affect performance, as the sketch below illustrates.
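
To make the effect of dependencies concrete, here is a minimal C sketch (the function names are illustrative): with a single accumulator every addition waits on the previous one, while two accumulators give the hardware independent chains it can overlap.

    #include <stddef.h>

    /* One accumulator: every addition depends on the previous one,
     * so the additions form a serial chain through the pipeline. */
    double sum_single(const double *x, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += x[i];
        return s;
    }

    /* Two accumulators: the two chains are independent, so the
     * hardware can overlap their additions on separate units. */
    double sum_dual(const double *x, size_t n) {
        double s0 = 0.0, s1 = 0.0;
        for (size_t i = 0; i + 1 < n; i += 2) {
            s0 += x[i];
            s1 += x[i + 1];
        }
        if (n % 2)
            s0 += x[n - 1];   /* handle an odd final element */
        return s0 + s1;
    }

Note that compilers operating under strict floating-point rules will not apply this transformation on their own, because it changes the order in which roundings occur.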

Latency, bandwidth, and throughput

Three fundamental performance notions recur across architecture levels:

  - Latency: the time to complete a single operation, such as one memory access or one message.
  - Bandwidth: the volume of data that can be moved per unit of time.
  - Throughput: the number of operations that can be completed per unit of time.

These appear at many layers: arithmetic units, caches, main memory, storage, and the network each have their own latency, bandwidth, and throughput characteristics.

HPC codes are often limited less by raw arithmetic capability than by memory latency, memory bandwidth, or the cost of communication between nodes.

Architectural features like caches, wide memory buses, vector units, and fast networks are all attempts to reduce effective latency and/or increase bandwidth and throughput.
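
A useful first-order model across all of these layers is time = latency + bytes / bandwidth. The sketch below evaluates it for a few message sizes; the latency and bandwidth figures are assumptions chosen for illustration, not measurements of any particular system.

    #include <stdio.h>

    int main(void) {
        /* Assumed, illustrative figures -- not any particular machine. */
        double latency   = 1e-6;   /* 1 microsecond per operation */
        double bandwidth = 10e9;   /* 10 GB/s link                */

        double sizes[] = { 8.0, 8e3, 8e6 };  /* bytes per transfer */
        for (int i = 0; i < 3; i++) {
            double t = latency + sizes[i] / bandwidth;
            printf("%10.0f bytes -> %.3g s\n", sizes[i], t);
        }
        return 0;
    }

For small transfers the latency term dominates; for large transfers the bandwidth term does, which is why batching many small transfers into fewer large ones is such a common optimization.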

Parallelism in modern architectures

Modern processors are deeply parallel at multiple levels. HPC performance comes from exploiting these levels together:

  1. Instruction-level parallelism (ILP)
    The CPU overlaps the execution of multiple independent instructions. This is largely automatic, handled by the hardware and the compiler.
  2. Data-level parallelism (DLP)
    The same operation applied to many data elements (e.g. vector instructions, GPUs). This is the focus of SIMD/vectorization and GPU programming.
  3. Thread-level parallelism (TLP)
    Multiple threads running on different cores of the same CPU, sharing memory. Exploited by, for example, OpenMP.
  4. Process-level parallelism
    Many processes, possibly spread across many nodes, communicating via messages. Exploited by MPI.

The architecture of an HPC system directly determines which types of parallelism are available and how efficiently they can be exploited; the sketch below combines the thread and data levels in a single loop.
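
As a minimal sketch, assuming a C compiler with OpenMP support (e.g. gcc -fopenmp -O3), the loop below distributes iterations across threads while inviting the compiler to vectorize each thread’s chunk; the function name axpy is just illustrative.

    #include <stddef.h>

    /* y = a*x + y: OpenMP spreads iterations over cores (TLP),
     * and the simd clause invites the compiler to use vector
     * instructions within each thread's chunk (DLP). */
    void axpy(double a, const double *x, double *y, size_t n) {
        #pragma omp parallel for simd
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

The two levels compose: threads partition the iteration space, and SIMD works within each partition.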

Shared vs. distributed memory at the node scale

Architecturally, we often categorize systems by how processors access memory:

  - Shared memory: all cores see a single address space and can read and write the same data directly.
  - Distributed memory: each processor or node has its own private memory, and data moves between them only through explicit messages.

Modern HPC clusters are usually both at once: shared memory within each node, where many cores access the same RAM, and distributed memory across nodes, which communicate over the network.

This architectural split is what drives the need for different programming approaches (threading vs. message passing vs. hybrid).
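
A minimal hybrid sketch, assuming MPI and OpenMP are available (compiled with, e.g., mpicc -fopenmp): each MPI process has its own private memory and communicates by messages, while the OpenMP threads inside a process share that process’s memory directly.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs); /* total processes   */

        #pragma omp parallel
        {
            /* Threads share this process's memory; other
             * processes can only be reached via messages. */
            printf("process %d of %d, thread %d of %d\n",
                   rank, nprocs,
                   omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }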

The memory hierarchy as an architectural principle

Although details are covered separately, one key architectural idea is that memory is organized hierarchically: registers, one or more levels of cache (L1, L2, L3), main memory (DRAM), and finally storage.

As you move away from the core, latency increases and bandwidth typically decreases, but capacity grows. Architectures are designed so that frequently and recently used data stays in the small, fast levels close to the core; programs whose access patterns exhibit this locality run fast, while programs that do not are dominated by the slower levels.

Much of HPC optimization is about aligning code with this hierarchy.
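
A classic C illustration (the array size is an arbitrary choice): C stores two-dimensional arrays row-major, so the row-wise loop walks memory with stride 1 and reuses each cache line, while the column-wise loop strides across rows and defeats the caches.

    #define N 1024

    double a[N][N];

    /* Stride-1 access: consecutive elements share cache lines. */
    double sum_rowwise(void) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }

    /* Stride-N access: each load touches a different cache line. */
    double sum_colwise(void) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }

Both functions compute the same sum, but once the array exceeds the cache, the row-wise version can be several times faster on typical hardware.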

Interconnects and system-level architecture

Beyond a single CPU, architecture includes how components are wired together: cores to shared caches, sockets to their local memory, CPUs to accelerators and network cards (typically over PCIe), and nodes to each other through the cluster network.

The topology (how things are connected) and properties (latency, bandwidth, contention) of these interconnects affect how well communication-heavy applications scale, how quickly processes can synchronize, and how much congestion arises when many nodes communicate at once.

Architecturally, clusters can be organized around different network topologies, such as fat trees, multi-dimensional tori, or dragonfly networks, each trading hardware cost against latency, bandwidth, and contention.

For an HPC user, the main takeaway is that communication cost is highly non-uniform: talking to your own core’s registers is vastly cheaper than talking to a remote node.
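
One way to see this non-uniformity directly is a ping-pong test between two MPI ranks; running it with both ranks on one node and then on two different nodes typically shows very different round-trip times. This is a minimal sketch, with the repetition count chosen arbitrarily.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        enum { REPS = 1000 };
        char byte = 0;
        double t0 = MPI_Wtime();

        /* Bounce a one-byte message between ranks 0 and 1. */
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }

        if (rank == 0)
            printf("avg round trip: %.3g s\n", (MPI_Wtime() - t0) / REPS);

        MPI_Finalize();
        return 0;
    }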

Architectural trends relevant to HPC

Several long-term trends shape the design of modern HPC architectures: clock speeds have largely stagnated, so performance growth comes from more cores rather than faster ones; memory performance has improved more slowly than compute, widening the “memory wall”; accelerators such as GPUs deliver an increasing share of total compute; and energy consumption has become a first-class design constraint.

Understanding these trends helps explain why parallel programming is no longer optional, why data movement dominates optimization effort, and why most large systems now pair CPUs with accelerators.

Architectural abstractions for the programmer

To work effectively with architecture, it is helpful to adopt a few mental models: see a node as a hierarchy of parallel processing units; see memory as a hierarchy of levels whose latency rises and bandwidth falls as capacity grows; and treat every movement of data, from a cache line to a network message, as having a cost you can estimate.

These abstractions will underpin later discussions on parallel programming models, performance optimization, and the specific hardware components covered in subsequent chapters.
