Threads and thread management

What is a Thread in Shared-Memory Programming

A thread is a lightweight execution unit within a process. In the shared-memory setting, all threads of a process share the same address space (global variables, heap, file descriptors), while each thread has its own stack, program counter, and registers.

For OpenMP in particular, you typically have a single initial (master) thread that runs the serial parts of the program and a team of additional threads that becomes active whenever execution reaches a parallel region.

Thread-based shared-memory parallelism is therefore about controlling how many threads exist, which work each thread performs, where the threads run on the hardware, and how they coordinate access to shared data.

Thread Lifecycle

A typical thread in a shared-memory program goes through these stages:

  1. Creation
    • The runtime (e.g., OpenMP library or pthreads) allocates resources for a new thread and starts its execution at a given function or code region.
  2. Running
    • The thread executes code, may enter and exit parallel regions, may synchronize with other threads, and may be scheduled onto different cores by the OS.
  3. Blocking / Waiting
    • The thread might wait at a barrier, lock, condition variable, or busy-wait loop, not doing useful work while it waits.
  4. Completion
    • The thread finishes its assigned work and exits the parallel region or its start routine.
  5. Joining / Reaping
    • Another thread (often the master) waits for the finishing thread and cleans up its resources.

In OpenMP, most of this lifecycle is hidden. You mark regions to run in parallel and the runtime handles creation, scheduling, and joining of threads.
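
To make the hidden lifecycle concrete, here is a minimal sketch that creates and joins threads explicitly with POSIX threads (pthreads); the function name worker and the thread count are illustrative choices, not part of any particular library's required interface.

#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

/* Start routine: each thread begins execution here (the "Running" stage). */
void *worker(void *arg) {
    long id = (long)arg;
    printf("Thread %ld doing its work\n", id);
    return NULL;   /* Completion: the thread exits its start routine. */
}

int main(void) {
    pthread_t threads[NUM_THREADS];

    /* Creation: the runtime allocates a stack and OS structures per thread. */
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);

    /* Joining/reaping: the main thread waits for each worker and releases its resources. */
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    return 0;
}

In OpenMP the same stages still happen, but the compiler and runtime generate the equivalent of this code for you when a parallel region is entered and left.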

Threads in OpenMP: The Basic Model

In OpenMP, threads are created and managed via parallel regions (covered elsewhere). Here we only focus on how the threads themselves are structured and controlled.

Inside a parallel region, every thread in the team executes the same code block; each thread can query its own ID with omp_get_thread_num() and the size of the team with omp_get_num_threads(), and it can use those values to decide which part of the work to do.

Example (C) to see thread identities:

#include <omp.h>
#include <stdio.h>
int main() {
    #pragma omp parallel
    {
        int tid  = omp_get_thread_num();
        int nt   = omp_get_num_threads();
        printf("Hello from thread %d of %d\n", tid, nt);
    }
    return 0;
}

Key ideas: thread IDs run from 0 to the team size minus one, the master thread always has ID 0, and the order in which the lines are printed is nondeterministic because the threads run concurrently.

Controlling the Number of Threads

Choosing and controlling thread counts is a central part of thread management.

Global Thread Count Settings

You can set a default number of threads in several ways: the OMP_NUM_THREADS environment variable, a call to omp_set_num_threads() before the parallel region, or the implementation's default (commonly one thread per available core or hardware thread).

The runtime treats this setting as a request rather than a guarantee. The number of threads actually delivered can differ depending on whether dynamic adjustment of thread counts is enabled, on implementation-defined limits such as the overall thread limit, and on the nesting level at which the parallel region is encountered.
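
A minimal sketch of the runtime calls involved (omp_set_num_threads() and omp_get_max_threads() are standard OpenMP API routines; the count of 8 is just an example):

#include <omp.h>
#include <stdio.h>

int main() {
    /* Equivalent to launching the program with OMP_NUM_THREADS=8. */
    omp_set_num_threads(8);

    /* Upper bound on the team size the next parallel region may get. */
    printf("Requested up to %d threads\n", omp_get_max_threads());

    #pragma omp parallel
    {
        #pragma omp master
        printf("Actually got %d threads\n", omp_get_num_threads());
    }
    return 0;
}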

Per-Region Thread Control

You can also specify the team size for a particular parallel region using a clause:

#pragma omp parallel num_threads(4)
{
    // parallel work with exactly 4 threads (if possible)
}

Common patterns: use the global setting for the bulk of the program, and reserve num_threads clauses for regions whose parallelism is known to be limited (for example, a small fixed number of independent tasks).

Master and Worker Threads

In a typical OpenMP program, execution starts with a single master thread; when a parallel region is reached, the master becomes part of a team that also contains worker threads, and after the region ends only the master continues.

Roles: the master thread (thread 0) usually handles serial setup, I/O, and coordination, while the workers exist only to share the work inside parallel regions.

You can restrict work to the master thread within a parallel region:

#pragma omp parallel
{
    // code executed by all threads
    #pragma omp master
    {
        // executed only by the master thread (no implicit barrier)
    }
    // code executed by all threads again
}

Note: master does not imply an implicit barrier at the end, unlike single (details of constructs are covered elsewhere).

Thread Affinity and Core Binding (Conceptual)

Thread affinity is about where threads run: which cores, hardware threads, or sockets the operating system may schedule each thread on, and whether a thread stays pinned there or is free to migrate.

Common mechanisms (exact details are system and compiler dependent) include the OMP_PROC_BIND and OMP_PLACES environment variables, vendor-specific variables such as GOMP_CPU_AFFINITY or KMP_AFFINITY, and the binding options of the job launcher (for example srun or mpirun).

Basic idea: binding threads to specific cores keeps them close to the caches and memory they have already touched, avoids costly migrations, and makes performance more reproducible.

You usually set affinity through environment variables or the job script at launch time rather than in the source code, following the recommendations of your HPC site.
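
As a small illustration (assuming an OpenMP 4.5 or newer runtime, and run with settings such as OMP_PLACES=cores and OMP_PROC_BIND=close), each thread can report the place it is bound to:

#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        /* omp_get_place_num() returns the place (e.g., core) the thread is bound to,
           or -1 if the thread is not bound to any place. */
        printf("Thread %d runs on place %d of %d\n",
               omp_get_thread_num(), omp_get_place_num(), omp_get_num_places());
    }
    return 0;
}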

Nested Parallelism and Thread Teams

Nested parallelism means having parallel regions inside parallel regions.

Key terms: the threads executing a given parallel region form a team; the thread that encounters the region becomes the master of the new team; and the nesting level counts how many parallel regions currently enclose the executing code.

For example:

#pragma omp parallel num_threads(2)
{
    int outer_tid = omp_get_thread_num();
    // Outer team: 2 threads
    #pragma omp parallel num_threads(3)
    {
        int inner_tid = omp_get_thread_num();
        // Inner team: potentially 3 threads *per outer thread*
    }
}

This can quickly multiply the total number of threads (up to 2×3 = 6 here, but often much more in real programs).

Control: nesting is governed by omp_set_max_active_levels() and the OMP_MAX_ACTIVE_LEVELS environment variable (older code uses omp_set_nested() and OMP_NESTED, which are now deprecated); by default most runtimes execute inner regions with a single thread.

Typical HPC practice: leave nested parallelism disabled and use a single level of OpenMP threading (often combined with MPI across nodes), enabling nesting only when the algorithm clearly benefits and the total thread count stays within the hardware.
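
A minimal sketch of enabling and inspecting nesting (the level limit of 2 is an arbitrary choice for illustration):

#include <omp.h>
#include <stdio.h>

int main() {
    /* Allow at most two active (parallel) nesting levels. */
    omp_set_max_active_levels(2);

    #pragma omp parallel num_threads(2)
    {
        #pragma omp parallel num_threads(3)
        {
            /* omp_get_level() reports the nesting depth of the current region;
               omp_get_ancestor_thread_num(1) gives the ID of the enclosing outer thread. */
            printf("Level %d: outer thread %d, inner thread %d\n",
                   omp_get_level(),
                   omp_get_ancestor_thread_num(1),
                   omp_get_thread_num());
        }
    }
    return 0;
}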

Dynamic vs Static Number of Threads

Some runtimes allow dynamically adjusting the number of threads during execution.

OpenMP controls: the OMP_DYNAMIC environment variable and the omp_set_dynamic() routine; when dynamic adjustment is enabled, the runtime may give a parallel region fewer threads than requested, for example when the system is heavily loaded.

In HPC, dynamic adjustment is usually disabled so that runs are reproducible and each thread can be pinned to a known core; batch schedulers already give the job a fixed set of resources, so there is little to gain from letting the runtime shrink the team.
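
A brief sketch of disabling dynamic adjustment explicitly (many runtimes already default to this, so the call is shown purely for illustration):

#include <omp.h>
#include <stdio.h>

int main() {
    /* Disable dynamic adjustment: ask the runtime to honor the requested team size. */
    omp_set_dynamic(0);
    omp_set_num_threads(8);

    printf("Dynamic adjustment is %s\n", omp_get_dynamic() ? "on" : "off");

    #pragma omp parallel
    {
        #pragma omp master
        printf("Team size: %d\n", omp_get_num_threads());
    }
    return 0;
}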

Thread Management Overheads

Managing threads is not free. Important cost components:

  1. Creation and Destruction
    • Allocating stacks, setting up OS structures.
    • High cost if done repeatedly.
  2. Context Switching
    • When the OS switches the CPU from one thread to another.
    • Too many threads per core cause high context-switch overhead.
  3. Synchronization
    • Locks, barriers, atomics all have overhead.
    • Improper use can lead to more overhead than parallel speedup.
  4. Scheduling Within Regions
    • OpenMP must distribute loop iterations and tasks among threads.
    • Different scheduling strategies trade off overhead and load balance.

Practical implications for beginners: prefer a few large parallel regions over many tiny ones, keep synchronization to the minimum needed for correctness, and do not create more threads than the hardware can actually run. The sketch below contrasts a costly pattern with a cheaper one.
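
A minimal sketch of the difference (the array size, step count, and the step() function are illustrative placeholders):

#include <omp.h>
#include <stdio.h>

#define N 1000000
#define NSTEPS 100

static double a[N];

/* Hypothetical per-element update, used only for illustration. */
static double step(double x) { return 0.5 * x + 1.0; }

int main() {
    /* Costly pattern: the parallel region (thread wake-up plus implicit barrier)
       is entered and exited NSTEPS times. */
    for (int t = 0; t < NSTEPS; t++) {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = step(a[i]);
    }

    /* Cheaper pattern: one parallel region; only the work-sharing loop repeats.
       The implicit barrier at the end of each omp for keeps the time steps ordered. */
    #pragma omp parallel
    for (int t = 0; t < NSTEPS; t++) {
        #pragma omp for
        for (int i = 0; i < N; i++)
            a[i] = step(a[i]);
    }

    printf("a[0] = %f\n", a[0]);
    return 0;
}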

Oversubscription and Resource Limits

Oversubscription occurs when you run more runnable threads than hardware execution contexts (e.g., cores or hardware threads).

Consequences: the operating system must time-slice threads onto cores, causing frequent context switches, cache thrashing, and usually lower throughput than running one thread per core.

Common oversubscription causes: combining MPI ranks with OpenMP threads without adjusting either count, calling threaded libraries (such as a threaded BLAS) from inside an already parallel region, and forgetting that OMP_NUM_THREADS applies to every rank on a node. For example, on a 64-core node, 8 MPI ranks each running 16 OpenMP threads create 128 threads, twice as many as the hardware can execute at once.

Management strategies: keep ranks × threads per rank at or below the number of cores (or hardware threads) per node, set the thread counts of external libraries explicitly, and use the batch system's binding options to keep each rank's threads on its own set of cores.

Thread Safety and Library Use

Not all libraries are thread-safe. Thread safety issues include routines that keep hidden internal state between calls, shared handles or buffers that several threads modify at once, and non-reentrant random number generators or I/O functions.

Thread management implications: call non-thread-safe routines from only one thread at a time, for example inside a master, single, or critical construct, or give each thread its own instance of the library's state.

You should also be aware that many numerical libraries create threads of their own, so their internal threading has to be coordinated with your OpenMP settings to avoid oversubscription.
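
A minimal sketch of serializing access to a routine that is not thread-safe (unsafe_log() is a hypothetical placeholder for such a routine):

#include <omp.h>
#include <stdio.h>

/* Hypothetical non-thread-safe routine: it updates shared internal state. */
static long calls = 0;
static void unsafe_log(const char *msg) {
    calls++;                         /* unsynchronized shared counter */
    printf("[%ld] %s\n", calls, msg);
}

int main() {
    #pragma omp parallel
    {
        /* Thread-safe part: each thread works on its own data. */
        int tid = omp_get_thread_num();

        /* Only one thread at a time may execute the critical section,
           so the non-thread-safe routine is never called concurrently. */
        #pragma omp critical
        unsafe_log(tid == 0 ? "master checking in" : "worker checking in");
    }
    return 0;
}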

Practical Thread Management Tips for Beginners

  1. Start Simple
    • Use a single outer parallel region to cover major work.
    • Set OMP_NUM_THREADS equal to the number of physical cores per node (unless advised otherwise).
  2. Avoid Nesting Until Needed
    • Keep nested parallelism off at first.
    • Only enable it when you have a clear design and understand the resource implications.
  3. Watch Affinity on HPC Systems
    • Use system or site documentation for recommended OMP_PROC_BIND and OMP_PLACES settings.
    • Test performance with and without binding.
  4. Minimize Parallel Region Overheads
    • Avoid repeatedly entering and exiting very small parallel regions in tight loops.
    • Group work into fewer, more substantial parallel regions.
  5. Coordinate with MPI and Other Libraries
    • Ensure the product of MPI ranks and threads per rank stays within hardware limits.
    • Check documentation of numerical libraries for their threading behavior.
  6. Measure, Don’t Guess
    • Use timing and profiling to see whether thread counts and management choices improve or degrade performance.
    • Adjust thread numbers, affinity, and scheduling policies based on evidence from runs on your target HPC system.
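
A small sketch of the kind of measurement meant in the last tip, using the standard omp_get_wtime() timer (the workload is a placeholder):

#include <omp.h>
#include <stdio.h>

#define N 10000000

int main() {
    static double a[N];

    double start = omp_get_wtime();     /* wall-clock time in seconds */

    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;

    double elapsed = omp_get_wtime() - start;
    printf("%d threads: %.4f s\n", omp_get_max_threads(), elapsed);
    return 0;
}

Running this with different OMP_NUM_THREADS values (and binding settings) on the target system shows directly whether a given configuration helps or hurts.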

Summary

Threads and thread management in shared-memory programming involve understanding the thread lifecycle, choosing and controlling thread counts, placing threads sensibly on the hardware, keeping creation, synchronization, and scheduling overheads low, avoiding oversubscription, and respecting the thread-safety limits of the libraries you call.

These concepts provide the foundation needed to use OpenMP constructs effectively and to reason about performance and correctness in shared-memory parallel programs.
