
Shared-Memory Parallel Programming

Goals of this Chapter

In this chapter you learn:

  - what shared-memory parallel programming is and which hardware it targets,
  - a mental model of threads, the shared address space, parallel regions, and synchronization,
  - the main advantages, limitations, and common use cases of shared-memory parallelism in HPC,
  - recurring design patterns and high-level performance considerations.

Details of specific programming models (like OpenMP syntax, thread directives, etc.) are covered in later subsections.

What Is Shared-Memory Parallel Programming?

In shared-memory parallel programming, multiple threads (or lightweight execution units) run on the same physical node and all access a single, shared address space. In practice:

  - all threads can read and write the same variables and data structures directly,
  - communication between threads happens implicitly through memory rather than through explicit messages,
  - access to shared data must be coordinated whenever several threads modify it.

Conceptually, you have one program, one address space, and a team of threads working within it.

This is in contrast to distributed-memory models, where each process has its own memory and data must be exchanged explicitly via messages.
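As a preview (OpenMP syntax is covered in later subsections), here is a minimal sketch of this idea: several threads read one shared variable directly, with no data transfer. It assumes a C compiler with OpenMP support (e.g., gcc -fopenmp).

#include <omp.h>
#include <stdio.h>

int main(void) {
    int shared_value = 42;    /* lives in the single shared address space */

    #pragma omp parallel      /* create a team of threads */
    {
        /* every thread reads the same variable directly, without messages */
        printf("thread %d sees shared_value = %d\n",
               omp_get_thread_num(), shared_value);
    }
    return 0;
}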

Typical Hardware for Shared Memory

Shared-memory parallel programming typically targets:

  - multi-core processors, where all cores of a chip share the same memory,
  - multi-socket nodes, where several processors share the node's memory (often with NUMA effects),
  - simultaneous multithreading (SMT), where each core runs more than one hardware thread.

As core counts increase on modern processors, shared-memory parallelism is the natural way to exploit all cores within a single node.

Shared vs. Distributed Memory: Conceptual Contrast

You will see both shared-memory and distributed-memory models in HPC. At a high level:

  - shared memory: threads within one node communicate implicitly through a common address space,
  - distributed memory: processes, possibly spread over many nodes, each own their memory and communicate through explicit messages (e.g., MPI).

In practice, many real HPC applications use both models together (hybrid programming), but this chapter focuses only on the intra-node shared-memory aspect.

Basic Concepts and Mental Model

Before working with specific APIs, it is useful to adopt a mental model for shared-memory parallel programs.

Threads as Workers

Think of a shared-memory program as a team of workers (threads) operating on one common workspace (the shared memory): every worker can see and modify everything in the workspace.

Unlike separate processes, threads:

  - share one address space instead of owning private memory,
  - are cheap to create and to switch between,
  - exchange data simply by reading and writing shared variables.

Shared Address Space

In the shared-memory model:

  - global data and heap allocations are visible to all threads,
  - each thread has its own stack, so local variables are private by default,
  - programming models let you state explicitly whether a variable is shared or private.

A core part of programming in this model is deciding which data is shared and which is private, to ensure correctness and performance.
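For illustration, a minimal OpenMP sketch of this distinction (the directives themselves are explained later): shared_total is visible to all threads, while private_id exists once per thread.

#include <omp.h>
#include <stdio.h>

int main(void) {
    int shared_total = 0;        /* one copy, visible to all threads */

    #pragma omp parallel
    {
        /* declared inside the region: each thread gets its own copy */
        int private_id = omp_get_thread_num();

        /* updates to shared data must be coordinated */
        #pragma omp atomic
        shared_total += private_id;
    }

    printf("sum of thread ids = %d\n", shared_total);
    return 0;
}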

Parallel Regions and Work Sharing

Most shared-memory models follow a pattern along these lines:

  1. Start as a single thread (serial code).
  2. Encounter a parallel region declaration.
  3. Create multiple threads to execute code within that region, possibly splitting the work among them.
  4. Join back into a single thread at the end of the region.

Pseudo-structure:

// serial code here
// begin parallel region
create a team of threads
  each thread executes code in the region
join threads
// back to serial code

How exactly work is split (“work sharing”) is defined by constructs in the programming model (e.g., for/parallel for in OpenMP) and is covered later.
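To make the fork-join picture concrete, here is a sketch with a deliberately manual split of the work among threads; the array a and its size are placeholders, and OpenMP's dedicated work-sharing constructs (shown later) would normally replace the manual indexing.

#include <omp.h>

#define N 1000000
double a[N];

int main(void) {
    /* serial code: a single thread runs up to this point */

    #pragma omp parallel    /* fork: create a team of threads */
    {
        int id  = omp_get_thread_num();
        int num = omp_get_num_threads();

        /* manual work sharing: thread id takes every num-th iteration */
        for (int i = id; i < N; i += num)
            a[i] = 2.0 * i;
    }   /* join: threads synchronize and a single thread continues */

    return 0;
}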

Synchronization and Ordering

Because threads can access shared data concurrently, you must:

  - coordinate updates to shared variables so they do not interfere with each other,
  - enforce an ordering between operations wherever the algorithm requires one.

Conceptually, this is done using:

  - barriers, where all threads wait until everyone has arrived,
  - critical sections and locks, which admit only one thread at a time,
  - atomic operations, which make small updates indivisible.

Synchronization details, race conditions, and correctness aspects are covered in the later subsections, but recognizing their necessity is essential before writing any shared-memory code.
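To see why this matters, consider a small sketch of the classic shared-counter race and one way to repair it with an atomic update (again using OpenMP syntax as an example):

#include <omp.h>
#include <stdio.h>

int main(void) {
    long counter = 0;

    #pragma omp parallel
    {
        for (int i = 0; i < 100000; i++) {
            /* without the directive below, counter++ is a data race:
               threads read, modify, and write the shared variable
               concurrently, and updates get lost */
            #pragma omp atomic
            counter++;
        }
    }

    printf("counter = %ld\n", counter);  /* correct only thanks to the atomic */
    return 0;
}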

Advantages of Shared-Memory Parallel Programming

Shared-memory models are popular in HPC for several reasons:

  - no explicit data movement: threads simply read and write shared data,
  - incremental parallelization: existing serial code can often be parallelized region by region,
  - low overhead within a node compared to message passing,
  - a natural match for today's multi-core hardware.

Within a single node, shared-memory programming is usually the most natural way to exploit all the cores.

Limitations and Challenges

Shared-memory parallelism also has important limitations that influence how and when you use it.

Scalability Limits

The total parallelism you can exploit is bounded by:

  - the number of cores (and hardware threads) in the node,
  - the memory bandwidth those cores share,
  - the fraction of the code that actually runs in parallel.

Beyond a certain point:

  - adding threads no longer reduces runtime, because memory bandwidth and synchronization costs dominate,
  - a single node simply offers no more hardware parallelism.

These issues motivate using distributed-memory or hybrid approaches for very large systems.

Contention and False Sharing

Shared-memory codes can suffer from:

  - contention, when many threads compete for the same lock or the same shared data structure,
  - false sharing, when threads update different variables that happen to sit on the same cache line, forcing the hardware to shuttle that line between cores.

Such problems can significantly degrade performance even when algorithms are otherwise well designed.
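One common remedy for false sharing is to pad per-thread data so that each thread's counter sits on its own cache line. The sketch below assumes a 64-byte cache line and at most 64 threads; both are assumptions, not portable constants.

#include <omp.h>

#define MAX_THREADS 64
#define LINE 64                     /* assumed cache-line size in bytes */

struct padded {
    long value;
    char pad[LINE - sizeof(long)];  /* keeps neighbors on separate lines */
};

struct padded counter[MAX_THREADS];

int main(void) {
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        for (long i = 0; i < 10000000; i++)
            counter[id].value++;    /* without the padding, adjacent counters
                                       would share cache lines and threads
                                       would constantly invalidate each
                                       other's caches */
    }
    return 0;
}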

Correctness Issues

Because threads share data and run concurrently:

  - race conditions arise when the result depends on the timing of accesses to shared data,
  - deadlocks can occur when threads wait on each other's locks,
  - bugs may be nondeterministic and hard to reproduce.

Later subsections on synchronization, race conditions, and debugging will address this in more detail.

Common Use Cases in HPC

Shared-memory programming is widely used in:

  - node-level parallelization of scientific simulation codes,
  - multithreaded numerical libraries (e.g., BLAS/LAPACK implementations),
  - analysis and pre-/post-processing tools that run on a single large node.

A typical pattern is to keep the overall program structure unchanged and parallelize the compute-intensive loops and regions across the cores of a node.

Programming Models for Shared-Memory

There are several ways to program shared memory; in this course we focus on the models most relevant for HPC.

Common approaches include:

  - OpenMP: compiler directives plus a runtime library; the dominant model in HPC,
  - explicit threading APIs such as POSIX threads (pthreads) or C++ std::thread,
  - threaded libraries, where the parallelism is hidden inside library calls.

This chapter introduces the concepts common to all these models; later sections concentrate on OpenMP, which is the standard shared-memory model in HPC.

Basic Design Patterns

Several conceptual patterns recur in shared-memory parallel code:

Parallel Loops

Split the iterations of a loop among threads so that each thread processes a subset of them. This is the workhorse pattern in numerical codes, where most of the time is spent in loops over arrays or grid points.
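A minimal OpenMP sketch (the parallel for construct is covered in detail later); it assumes the iterations are independent of each other:

#include <omp.h>

#define N 1000000
double x[N], y[N];

int main(void) {
    /* the runtime splits the N iterations among the threads */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[i] = 2.0 * x[i];

    return 0;
}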

Parallel Sections / Tasks

Different threads run different, independent tasks concurrently, for example one thread reading input while another sets up data structures.
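A sketch using OpenMP sections; read_input and build_mesh are hypothetical placeholders for independent pieces of work:

#include <omp.h>
#include <stdio.h>

void read_input(void) { printf("reading input\n"); }   /* placeholder */
void build_mesh(void) { printf("building mesh\n"); }   /* placeholder */

int main(void) {
    #pragma omp parallel sections   /* each section may run on its own thread */
    {
        #pragma omp section
        read_input();

        #pragma omp section
        build_mesh();
    }
    return 0;
}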

Master-Worker Pattern

One thread acts as a coordinator: it generates units of work, while the remaining threads act as workers that pick up and execute those units.

This is a conceptual pattern; specific programming models provide different ways to implement it (e.g., task constructs, work queues).
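One possible realization uses OpenMP tasks: a single coordinator thread creates work units, and the other threads in the team pick them up. The process function is a hypothetical placeholder for real work.

#include <omp.h>
#include <stdio.h>

void process(int item) {            /* placeholder for real work */
    printf("thread %d handled item %d\n", omp_get_thread_num(), item);
}

int main(void) {
    #pragma omp parallel
    {
        #pragma omp single          /* one thread acts as the coordinator */
        for (int item = 0; item < 16; item++) {
            #pragma omp task firstprivate(item)
            process(item);          /* idle threads execute queued tasks */
        }
    }   /* implicit barrier: the region ends once all tasks are done */
    return 0;
}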

Performance Considerations at a High Level

Even before diving into the details, it is important to keep some performance principles in mind:

  - minimize synchronization; every barrier or lock costs time,
  - balance the load so that no thread sits idle while others work,
  - keep data close to the threads that use it (locality, NUMA),
  - avoid contention and false sharing on frequently updated data.

These themes recur throughout the subsections on parallel regions, work-sharing, and performance in shared-memory codes.

How This Fits into the Overall Course

Within the larger HPC context:

  - shared-memory programming covers parallelism within a node,
  - distributed-memory programming covers parallelism across nodes,
  - combining the two yields the hybrid codes used on large clusters.

In the following subsections, you will apply the conceptual model from this chapter to:

  - concrete OpenMP syntax for parallel regions and work sharing,
  - synchronization, race conditions, and correctness,
  - performance analysis and tuning of shared-memory codes.
