What “work sharing” means in OpenMP
In shared-memory programming with OpenMP, a work-sharing construct tells the runtime how to divide the work of a region among the threads in a team. Unlike a parallel region (which creates threads), work-sharing constructs:
- Do not create or destroy threads by themselves.
- Simply distribute iterations or tasks to the threads that already exist in the current team.
- Are usually used inside a `parallel` region or combined with it in a single directive.
The main OpenMP work-sharing constructs are:
- `for` / `do` (loop work sharing)
- `sections`
- `single`
- `task` (more advanced, sometimes treated separately, but conceptually similar)
- `workshare` (Fortran-specific, for array syntax and similar constructs)
This chapter focuses on how these constructs divide work and the typical usage patterns, not on general OpenMP setup or threading basics.
The `for` / `do` construct (loop work sharing)
Loop work sharing is the most common pattern in OpenMP. Its purpose is to split loop iterations among threads.
Basic C/C++ form:
```c
#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < N; ++i) {
        // each iteration i is executed by exactly one thread
    }
}
```
Basic Fortran form:
```fortran
!$omp parallel
!$omp do
do i = 1, N
   ! each iteration i is executed by exactly one thread
end do
!$omp end do
!$omp end parallel
```
Key properties:
- Each iteration of the loop body is executed exactly once.
- Iterations are partitioned among the threads.
- There is an implicit barrier at the end of the `for`/`do` construct unless you add `nowait`.
Scheduling policies
`for` / `do` supports several `schedule` options that control how iterations are split:
- `schedule(static[, chunk])`
- `schedule(dynamic[, chunk])`
- `schedule(guided[, chunk])`
- `schedule(runtime)`
- `schedule(auto)`
Only the work-sharing behavior is covered here; detailed performance implications belong to the performance chapter.
Static scheduling
```c
#pragma omp for schedule(static)
for (int i = 0; i < N; ++i) { ... }
```
- Iterations are divided before execution into contiguous blocks.
- Each thread gets a fixed set of iterations.
- With `schedule(static, chunk)`, iterations are split into chunks of size `chunk` and assigned to threads in round-robin order.
Useful when:
- Each iteration does similar work.
- You want predictable mapping from iteration to thread.
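To make the mapping concrete, here is a minimal sketch (the value of `N`, the chunk size of 2, and the printed format are illustrative) that shows which thread runs which iteration under `schedule(static, 2)`:

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    const int N = 12;

    /* With schedule(static, 2), iterations are grouped into chunks of 2
       and handed out to threads in round-robin order, so the mapping is
       fixed before the loop starts. */
    #pragma omp parallel for schedule(static, 2)
    for (int i = 0; i < N; ++i) {
        printf("iteration %2d -> thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}
```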
Dynamic scheduling
```c
#pragma omp for schedule(dynamic, chunk)
for (int i = 0; i < N; ++i) { ... }
```
- Threads request chunks of iterations on demand at runtime.
- When a thread finishes its current chunk, it takes the next available chunk.
Useful when:
- Iteration work is irregular or unpredictable.
- You want simple load balancing between threads.
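As a sketch of the irregular-work case (the `work()` function and the chunk size of 1 are purely illustrative), dynamic scheduling lets threads that finish cheap iterations come back for more:

```c
#include <stdio.h>

enum { N = 64 };

/* Illustrative function whose cost grows with i, so iterations are uneven. */
static double work(int i) {
    double s = 0.0;
    for (int k = 0; k < i * 1000; ++k)
        s += (double)k;
    return s;
}

int main(void) {
    double result[N];

    /* Chunks of one iteration are handed out on demand: a thread that
       draws cheap iterations simply requests the next chunk sooner. */
    #pragma omp parallel for schedule(dynamic, 1)
    for (int i = 0; i < N; ++i)
        result[i] = work(i);

    printf("result[N-1] = %f\n", result[N - 1]);
    return 0;
}
```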
Guided scheduling
```c
#pragma omp for schedule(guided, chunk)
for (int i = 0; i < N; ++i) { ... }
```
- Like dynamic, but chunks start large and get smaller over time.
- Reduces scheduling overhead compared to pure dynamic while still balancing load.
Useful when:
- Iteration times vary significantly.
- `N` is large, and dynamic scheduling overhead would be noticeable.
Runtime and auto
```c
#pragma omp for schedule(runtime)
for (int i = 0; i < N; ++i) { ... }
```
- `runtime`: the schedule is taken from the `OMP_SCHEDULE` environment variable or runtime settings.
- `auto`: lets the implementation choose the schedule; behavior is implementation-dependent.
These are mostly useful for tuning the schedule without recompiling (`runtime`) or for deferring the choice to compiler/runtime heuristics (`auto`).
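A small sketch of deferring the choice to run time: the same loop can be driven either by the `OMP_SCHEDULE` environment variable (for example `OMP_SCHEDULE="dynamic,4"`) or, as shown here, by the standard `omp_set_schedule()` routine. The loop body and iteration count are illustrative.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    /* Pick the schedule at run time; alternatively, leave this call out
       and set the OMP_SCHEDULE environment variable before launching. */
    omp_set_schedule(omp_sched_guided, 4);

    #pragma omp parallel for schedule(runtime)
    for (int i = 0; i < 16; ++i)
        printf("iteration %2d on thread %d\n", i, omp_get_thread_num());

    return 0;
}
```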
Using `nowait` to omit the barrier
By default, `for` / `do` has an implicit barrier at the end: all threads wait until every thread finishes its iterations.
You can skip this barrier with `nowait`:
```c
#pragma omp parallel
{
    #pragma omp for nowait
    for (int i = 0; i < N; ++i) {
        // work on part 1
    }
    // No barrier here; threads may continue immediately
    #pragma omp for
    for (int i = 0; i < N; ++i) {
        // work on part 2
    }
}
```
You should only use nowait when:
- Later code does not depend on all loop iterations being finished.
- It is safe if some threads progress further while others are still in the loop.
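As a sketch of a case where `nowait` is safe (the arrays `a` and `b` are illustrative), the two loops write to disjoint data, so nothing depends on the first loop having finished before the second starts:

```c
#include <stdio.h>

#define N 1000

int main(void) {
    static double a[N], b[N];

    #pragma omp parallel
    {
        /* Loop 1 writes only to a; no code before the end of the region
           reads a, so the barrier after this loop can be dropped. */
        #pragma omp for nowait
        for (int i = 0; i < N; ++i)
            a[i] = 2.0 * i;

        /* Loop 2 writes only to b and never touches a. */
        #pragma omp for
        for (int i = 0; i < N; ++i)
            b[i] = 3.0 * i;
    }

    printf("a[10] = %f, b[10] = %f\n", a[10], b[10]);
    return 0;
}
```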
Combined `parallel for` / `parallel do`
A shorthand combines creating the parallel region and distributing loop iterations:
```c
#pragma omp parallel for schedule(static)
for (int i = 0; i < N; ++i) { ... }
```
Fortran:
```fortran
!$omp parallel do schedule(static)
do i = 1, N
   ...
end do
!$omp end parallel do
```
This behaves (roughly) like:
```c
#pragma omp parallel
{
    #pragma omp for schedule(static)
    for (int i = 0; i < N; ++i) { ... }
}
```
Use the combined forms for simple, loop-only parallel regions where you do not need any other code inside the parallel region besides the loop.
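A complete, minimal sketch of the combined form (the array sizes and the update are illustrative); with GCC, for example, such a file is typically compiled with `-fopenmp`:

```c
#include <stdio.h>

#define N 1000000

int main(void) {
    static double x[N], y[N];

    /* Serial setup. */
    for (int i = 0; i < N; ++i) {
        x[i] = 1.0;
        y[i] = 2.0;
    }

    /* Create a thread team and split the iterations in one directive. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; ++i)
        y[i] = y[i] + 3.0 * x[i];

    printf("y[0] = %f\n", y[0]);
    return 0;
}
```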
The `sections` construct
sections is a work-sharing construct for dividing different code blocks (not loop iterations) among threads.
C/C++:
```c
#pragma omp parallel
{
    #pragma omp sections
    {
        #pragma omp section
        {
            // Work A
        }
        #pragma omp section
        {
            // Work B
        }
        #pragma omp section
        {
            // Work C
        }
    }
}
```
Fortran:
```fortran
!$omp parallel
!$omp sections
!$omp section
! Work A
!$omp section
! Work B
!$omp section
! Work C
!$omp end sections
!$omp end parallel
```
Key properties:
- Each `section` block is executed by exactly one thread in the team.
- Every `section` is executed once.
- Threads that do not get a `section` simply skip to the end (subject to the barrier).
- There is an implicit barrier at the end of `sections` unless you add `nowait`.
Typical uses:
- Running a small number of independent tasks in parallel:
- E.g., reading different input files, precomputing tables, or performing independent setup steps.
- Parallelizing code where each part is logically different, unlike a loop where all iterations are similar.
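As a sketch of the "independent setup steps" case listed above (the two lookup tables and their sizes are illustrative), each `section` is handled by one thread, and the implicit barrier guarantees both tables are ready afterwards:

```c
#include <stdio.h>
#include <math.h>

#define TABLE_SIZE 1024

static double sin_table[TABLE_SIZE];
static double cos_table[TABLE_SIZE];

int main(void) {
    #pragma omp parallel
    {
        #pragma omp sections
        {
            /* One thread precomputes the sine table... */
            #pragma omp section
            for (int i = 0; i < TABLE_SIZE; ++i)
                sin_table[i] = sin(i * 0.001);

            /* ...while another thread, if available, precomputes the cosine table. */
            #pragma omp section
            for (int i = 0; i < TABLE_SIZE; ++i)
                cos_table[i] = cos(i * 0.001);
        }
        /* Implicit barrier: both tables are complete past this point. */
    }

    printf("sin_table[1] = %f, cos_table[1] = %f\n", sin_table[1], cos_table[1]);
    return 0;
}
```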
`sections` with `nowait`
```c
#pragma omp parallel
{
    #pragma omp sections nowait
    {
        #pragma omp section
        { /* Work A */ }
        #pragma omp section
        { /* Work B */ }
    }
    // No barrier here; threads may proceed before all sections finish
}
```
Use nowait only when later code does not require the completion of all sections.
The `single` construct
single is a work-sharing construct that specifies code that should be executed by exactly one thread in a team, while the other threads skip it.
C/C++:
```c
#pragma omp parallel
{
    // ... some parallel work ...
    #pragma omp single
    {
        // Only one thread executes this block
        // e.g., input, initialization, or logging
    }
    // implicit barrier by default
}
```
Fortran:
```fortran
!$omp parallel
! ... some parallel work ...
!$omp single
! Only one thread executes this block
!$omp end single
!$omp end parallel
```
Key properties:
- Exactly one thread executes the `single` block; which thread is implementation-dependent.
- By default, there is an implicit barrier at the end of `single`: all threads wait until the `single` block finishes.
Typical uses:
- Code that must be executed only once, but is logically within a parallel region:
- Allocating shared data structures.
- Reading input needed by all threads.
- Writing output that must not be duplicated.
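A sketch of the one-time-allocation pattern (the buffer and its size are illustrative); the implicit barrier after `single` guarantees that every thread sees the allocated buffer before using it:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    double *buffer = NULL;   /* shared by all threads in the region */
    const int n = 1024;

    #pragma omp parallel
    {
        /* Exactly one thread allocates the shared buffer. */
        #pragma omp single
        {
            buffer = malloc(n * sizeof(double));
        }
        /* Implicit barrier: the allocation is visible to every thread here. */

        #pragma omp for
        for (int i = 0; i < n; ++i)
            buffer[i] = (double)i;
    }

    printf("buffer[10] = %f\n", buffer[10]);
    free(buffer);
    return 0;
}
```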
`single` with `nowait`
You can add nowait to avoid the barrier at the end:
```c
#pragma omp parallel
{
    #pragma omp single nowait
    {
        // One thread executes this; others do not wait
    }
    // No barrier here
}
```
Use this only if other threads do not depend on the result of the single block immediately after it.
The `task` construct as a flexible work-sharing tool
While tasks are often treated separately in OpenMP documentation, they function as a more dynamic work-sharing construct, where work units (tasks) are created at runtime and scheduled onto threads.
Basic C/C++ form:
```c
#pragma omp parallel
{
    #pragma omp single
    {
        for (int i = 0; i < N; ++i) {
            #pragma omp task
            {
                // work for element i
            }
        }
    }
}
```
Key work-sharing aspects:
- Each `task` represents a piece of work that can be executed by any thread in the team.
- Tasks may be created by one thread (often inside `single`) and executed by any thread.
- Tasks allow irregular and nested parallelism (e.g., recursive algorithms, tree traversals).
Task-specific features (dependencies, taskloops, etc.) are covered elsewhere; here, the main point is that tasks provide dynamic work distribution, beyond simple loop or section division.
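As a sketch of the recursive pattern mentioned above (the tree type, the `tree_sum` helper, and the depth are illustrative), one thread starts the recursion inside `single`, and the tasks it spawns are executed by the whole team. `taskwait` is a task-specific synchronization covered elsewhere; it is used here only so each call can combine its children's results.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct node {
    int value;
    struct node *left, *right;
} node;

/* Recursively sum a binary tree; each subtree becomes a task that any
   thread in the team may pick up. */
static long tree_sum(node *t) {
    if (t == NULL)
        return 0;

    long left_sum = 0, right_sum = 0;

    #pragma omp task shared(left_sum)
    left_sum = tree_sum(t->left);

    #pragma omp task shared(right_sum)
    right_sum = tree_sum(t->right);

    #pragma omp taskwait   /* wait for the two child tasks to finish */
    return t->value + left_sum + right_sum;
}

static node *make_tree(int depth) {
    if (depth == 0)
        return NULL;
    node *t = malloc(sizeof(node));
    t->value = depth;
    t->left = make_tree(depth - 1);
    t->right = make_tree(depth - 1);
    return t;
}

int main(void) {
    node *root = make_tree(10);
    long sum = 0;

    #pragma omp parallel
    {
        /* One thread starts the recursion; the generated tasks are
           executed by any thread in the team. */
        #pragma omp single
        sum = tree_sum(root);
    }

    printf("tree sum = %ld\n", sum);
    return 0;
}
```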
Fortran `workshare` (language-specific construct)
In Fortran, workshare is a construct that can automatically distribute certain array operations and constructs over threads. It is primarily relevant to Fortran array syntax and similar features.
Example pattern (conceptual):
```fortran
!$omp parallel
!$omp workshare
A = B + C   ! array operation; iterations over elements are shared
!$omp end workshare
!$omp end parallel
```
Key idea:
- The compiler identifies element-wise work in the enclosed region and distributes it across threads.
- This is mostly used in Fortran-centric codes that rely on array syntax/operations.
Detailed behavior and best practices around workshare are typically Fortran-specific and depend on compiler support.
Choosing between work-sharing constructs
Some common decision guidelines, focusing purely on how work is shared:
- Use `for`/`do` when:
  - You have a loop with many similar, independent iterations.
  - You want explicit control over iteration scheduling (`static`, `dynamic`, etc.).
- Use `sections` when:
  - You have a small number of different code blocks, each doing a different task.
  - You want each block to run in parallel with the others exactly once.
- Use `single` when:
  - You need code that runs once inside a parallel region, and other threads should skip it.
  - Examples: one-time initialization, I/O, or task creation.
- Use `task` when:
  - Work units are irregular, created dynamically, or structured recursively.
  - You want the runtime to dynamically balance these work units across threads.
- Use `workshare` (Fortran) when:
  - You have Fortran array syntax or related constructs that you want automatically parallelized over threads.
Interactions with data scoping and synchronization
Work-sharing constructs interact strongly with:
- Data scoping (`shared`, `private`, `firstprivate`, `reduction`, etc.).
- Synchronization (implicit barriers, `nowait`, and explicit constructs like `critical`, `atomic`, and locks).
Only the work-sharing aspect is emphasized here:
- Every work-sharing construct may have an implicit barrier at its end; `nowait` can be used to remove it where safe.
- Data-sharing attributes on variables can significantly affect correctness and performance when work is divided among threads.
The detailed rules and best practices for data scoping and synchronization are discussed in other chapters; when using work-sharing constructs, always ensure:
- Each piece of work has the necessary private or shared variables correctly specified.
- You understand whether the implicit barrier should remain or be removed (`nowait`), based on dependencies between threads.
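For example, a minimal sketch (the variables are illustrative) of how data-sharing clauses combine with a work-sharing loop: the per-iteration temporary is made `private` so threads do not overwrite each other, and the accumulator is combined with `reduction`:

```c
#include <stdio.h>

#define N 1000

int main(void) {
    static double a[N];
    double scale = 0.5;   /* shared: read-only in the loop */
    double sum = 0.0;

    for (int i = 0; i < N; ++i)
        a[i] = (double)i;

    /* tmp is private (each thread gets its own copy); sum is combined
       across threads by the reduction at the end of the loop. */
    double tmp;
    #pragma omp parallel for private(tmp) reduction(+:sum)
    for (int i = 0; i < N; ++i) {
        tmp = scale * a[i];
        sum += tmp;
    }

    printf("sum = %f\n", sum);
    return 0;
}
```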