What “work sharing” means in OpenMP
In shared-memory programming with OpenMP, a work-sharing construct tells the runtime how to divide the work of a region among the threads in a team. Unlike a parallel region (which creates threads), work-sharing constructs:
- Do not create or destroy threads by themselves.
- Simply distribute iterations or tasks to the threads that already exist in the current team.
- Are usually used inside a `parallel` region or combined with it in a single directive.
The main OpenMP work-sharing constructs are:
- `for` / `do` (loop work sharing)
- `sections`
- `single`
- `task` (more advanced, sometimes treated separately, but conceptually similar)
- `workshare` (Fortran-specific, for array syntax and similar constructs)
This chapter focuses on how these constructs divide work and the typical usage patterns, not on general OpenMP setup or threading basics.
The `for` / `do` construct (loop work sharing)
Loop work sharing is the most common pattern in OpenMP. Its purpose is to split loop iterations among threads.
Basic C/C++ form:
```c
#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < N; ++i) {
        // each iteration i is executed by exactly one thread
    }
}
```
Basic Fortran form:
```fortran
!$omp parallel
!$omp do
do i = 1, N
   ! each iteration i is executed by exactly one thread
end do
!$omp end do
!$omp end parallel
```
Key properties:
- Each iteration of the loop body is executed exactly once.
- Iterations are partitioned among the threads.
- There is an implicit barrier at the end of the `for`/`do` construct unless you add `nowait`.
Scheduling policies
`for` / `do` supports several `schedule` options that control how iterations are split:
- `schedule(static[, chunk])`
- `schedule(dynamic[, chunk])`
- `schedule(guided[, chunk])`
- `schedule(runtime)`
- `schedule(auto)`
Only the work-sharing behavior is covered here; detailed performance implications belong to the performance chapter.
Static scheduling
```c
#pragma omp for schedule(static)
for (int i = 0; i < N; ++i) { ... }
```
- Iterations are divided before execution into contiguous blocks.
- Each thread gets a fixed set of iterations.
- With `schedule(static, chunk)`, iterations are split into chunks of size `chunk` and assigned to threads in round-robin order.
Useful when:
- Each iteration does similar work.
- You want predictable mapping from iteration to thread.
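To make the mapping concrete, here is a minimal sketch (the value of `N`, the chunk size of 2, and the printed format are illustrative) that shows which thread runs which iteration under `schedule(static, 2)`:

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    const int N = 12;

    /* With schedule(static, 2), iterations are grouped into chunks of 2
       and handed out to threads in round-robin order, so the mapping is
       fixed before the loop starts. */
    #pragma omp parallel for schedule(static, 2)
    for (int i = 0; i < N; ++i) {
        printf("iteration %2d -> thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}
```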
Dynamic scheduling
```c
#pragma omp for schedule(dynamic, chunk)
for (int i = 0; i < N; ++i) { ... }
```
- Threads request chunks of iterations on demand at runtime.
- When a thread finishes its current chunk, it takes the next available chunk.
Useful when:
- Iteration work is irregular or unpredictable.
- You want simple load balancing between threads.
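As a sketch of the irregular-work case (the `work()` function and the chunk size of 1 are purely illustrative), dynamic scheduling lets threads that finish cheap iterations come back for more:

```c
#include <stdio.h>

enum { N = 64 };

/* Illustrative function whose cost grows with i, so iterations are uneven. */
static double work(int i) {
    double s = 0.0;
    for (int k = 0; k < i * 1000; ++k)
        s += (double)k;
    return s;
}

int main(void) {
    double result[N];

    /* Chunks of one iteration are handed out on demand: a thread that
       draws cheap iterations simply requests the next chunk sooner. */
    #pragma omp parallel for schedule(dynamic, 1)
    for (int i = 0; i < N; ++i)
        result[i] = work(i);

    printf("result[N-1] = %f\n", result[N - 1]);
    return 0;
}
```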
Guided scheduling
```c
#pragma omp for schedule(guided, chunk)
for (int i = 0; i < N; ++i) { ... }
```
- Like dynamic, but chunks start large and get smaller over time.
- Reduces scheduling overhead compared to pure dynamic while still balancing load.
Useful when:
- Iteration times vary significantly.
- `N` is large, and dynamic scheduling overhead would be noticeable.
Runtime and auto
```c
#pragma omp for schedule(runtime)
for (int i = 0; i < N; ++i) { ... }
```
- `runtime`: the schedule is taken from the `OMP_SCHEDULE` environment variable or runtime settings.
- `auto`: lets the implementation choose the schedule; behavior is implementation-dependent.
These are mostly useful for tuning the schedule without recompiling (`runtime`) or for deferring the choice to compiler/runtime heuristics (`auto`).
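A small sketch of deferring the choice to run time: the same loop can be driven either by the `OMP_SCHEDULE` environment variable (for example `OMP_SCHEDULE="dynamic,4"`) or, as shown here, by the standard `omp_set_schedule()` routine. The loop body and iteration count are illustrative.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    /* Pick the schedule at run time; alternatively, leave this call out
       and set the OMP_SCHEDULE environment variable before launching. */
    omp_set_schedule(omp_sched_guided, 4);

    #pragma omp parallel for schedule(runtime)
    for (int i = 0; i < 16; ++i)
        printf("iteration %2d on thread %d\n", i, omp_get_thread_num());

    return 0;
}
```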
Using `nowait` to omit the barrier
By default, `for` / `do` has an implicit barrier at the end: all threads wait until every thread finishes its iterations.
You can skip this barrier with `nowait`:
```c
#pragma omp parallel
{
    #pragma omp for nowait
    for (int i = 0; i < N; ++i) {
        // work on part 1
    }
    // No barrier here; threads may continue immediately
    #pragma omp for
    for (int i = 0; i < N; ++i) {
        // work on part 2
    }
}
```
You should only use nowait when:
- Later code does not depend on all loop iterations being finished.
- It is safe if some threads progress further while others are still in the loop.
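As a sketch of a case where `nowait` is safe (the arrays `a` and `b` are illustrative), the two loops write to disjoint data, so nothing depends on the first loop having finished before the second starts:

```c
#include <stdio.h>

#define N 1000

int main(void) {
    static double a[N], b[N];

    #pragma omp parallel
    {
        /* Loop 1 writes only to a; no code before the end of the region
           reads a, so the barrier after this loop can be dropped. */
        #pragma omp for nowait
        for (int i = 0; i < N; ++i)
            a[i] = 2.0 * i;

        /* Loop 2 writes only to b and never touches a. */
        #pragma omp for
        for (int i = 0; i < N; ++i)
            b[i] = 3.0 * i;
    }

    printf("a[10] = %f, b[10] = %f\n", a[10], b[10]);
    return 0;
}
```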
Combined `parallel for` / `parallel do`
A shorthand combines creating the parallel region and distributing loop iterations:
```c
#pragma omp parallel for schedule(static)
for (int i = 0; i < N; ++i) { ... }
```
Fortran:
```fortran
!$omp parallel do schedule(static)
do i = 1, N
   ...
end do
!$omp end parallel do
```
This behaves (roughly) like:
```c
#pragma omp parallel
{
    #pragma omp for schedule(static)
    for (int i = 0; i < N; ++i) { ... }
}
```
Use the combined forms for simple, loop-only parallel regions where you do not need any other code inside the parallel region besides the loop.
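A complete, minimal sketch of the combined form (the array sizes and the update are illustrative); with GCC, for example, such a file is typically compiled with `-fopenmp`:

```c
#include <stdio.h>

#define N 1000000

int main(void) {
    static double x[N], y[N];

    /* Serial setup. */
    for (int i = 0; i < N; ++i) {
        x[i] = 1.0;
        y[i] = 2.0;
    }

    /* Create a thread team and split the iterations in one directive. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; ++i)
        y[i] = y[i] + 3.0 * x[i];

    printf("y[0] = %f\n", y[0]);
    return 0;
}
```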
The `sections` construct
sections is a work-sharing construct for dividing different code blocks (not loop iterations) among threads.
C/C++:
```c
#pragma omp parallel
{
    #pragma omp sections
    {
        #pragma omp section
        {
            // Work A
        }
        #pragma omp section
        {
            // Work B
        }
        #pragma omp section
        {
            // Work C
        }
    }
}
```
Fortran:
```fortran
!$omp parallel
!$omp sections
!$omp section
! Work A
!$omp section
! Work B
!$omp section
! Work C
!$omp end sections
!$omp end parallel
```
Key properties:
- Each `section` block is executed by exactly one thread in the team.
- Every `section` is executed once.
- Threads that do not get a `section` simply skip to the end (subject to the barrier).
- There is an implicit barrier at the end of `sections` unless you add `nowait`.
Typical uses:
- Running a small number of independent tasks in parallel:
- E.g., reading different input files, precomputing tables, or performing independent setup steps.
- Parallelizing code where each part is logically different, unlike a loop where all iterations are similar.
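As a sketch of the "independent setup steps" case listed above (the two lookup tables and their sizes are illustrative), each `section` is handled by one thread, and the implicit barrier guarantees both tables are ready afterwards:

```c
#include <stdio.h>
#include <math.h>

#define TABLE_SIZE 1024

static double sin_table[TABLE_SIZE];
static double cos_table[TABLE_SIZE];

int main(void) {
    #pragma omp parallel
    {
        #pragma omp sections
        {
            /* One thread precomputes the sine table... */
            #pragma omp section
            for (int i = 0; i < TABLE_SIZE; ++i)
                sin_table[i] = sin(i * 0.001);

            /* ...while another thread, if available, precomputes the cosine table. */
            #pragma omp section
            for (int i = 0; i < TABLE_SIZE; ++i)
                cos_table[i] = cos(i * 0.001);
        }
        /* Implicit barrier: both tables are complete past this point. */
    }

    printf("sin_table[1] = %f, cos_table[1] = %f\n", sin_table[1], cos_table[1]);
    return 0;
}
```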
`sections` with `nowait`
```c
#pragma omp parallel
{
    #pragma omp sections nowait
    {
        #pragma omp section
        { /* Work A */ }
        #pragma omp section
        { /* Work B */ }
    }
    // No barrier here; threads may proceed before all sections finish
}
```
Use nowait only when later code does not require the completion of all sections.
The `single` construct
single is a work-sharing construct that specifies code that should be executed by exactly one thread in a team, while the other threads skip it.
C/C++:
```c
#pragma omp parallel
{
    // ... some parallel work ...
    #pragma omp single
    {
        // Only one thread executes this block
        // e.g., input, initialization, or logging
    }
    // implicit barrier by default
}
```
Fortran:
```fortran
!$omp parallel
! ... some parallel work ...
!$omp single
! Only one thread executes this block
!$omp end single
!$omp end parallel
```
Key properties:
- Exactly one thread executes the `single` block; which thread is implementation-dependent.
- By default, there is an implicit barrier at the end of `single`: all threads wait until the `single` block finishes.
Typical uses:
- Code that must be executed only once, but is logically within a parallel region:
- Allocating shared data structures.
- Reading input needed by all threads.
- Writing output that must not be duplicated.
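A sketch of the one-time-allocation pattern (the buffer and its size are illustrative); the implicit barrier after `single` guarantees that every thread sees the allocated buffer before using it:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    double *buffer = NULL;   /* shared by all threads in the region */
    const int n = 1024;

    #pragma omp parallel
    {
        /* Exactly one thread allocates the shared buffer. */
        #pragma omp single
        {
            buffer = malloc(n * sizeof(double));
        }
        /* Implicit barrier: the allocation is visible to every thread here. */

        #pragma omp for
        for (int i = 0; i < n; ++i)
            buffer[i] = (double)i;
    }

    printf("buffer[10] = %f\n", buffer[10]);
    free(buffer);
    return 0;
}
```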
`single` with `nowait`
You can add nowait to avoid the barrier at the end:
```c
#pragma omp parallel
{
    #pragma omp single nowait
    {
        // One thread executes this; others do not wait
    }
    // No barrier here
}
```
Use this only if other threads do not depend on the result of the single block immediately after it.
The `task` construct as a flexible work-sharing tool
While tasks are often treated separately in OpenMP documentation, they function as a more dynamic work-sharing construct, where work units (tasks) are created at runtime and scheduled onto threads.
Basic C/C++ form:
```c
#pragma omp parallel
{
    #pragma omp single
    {
        for (int i = 0; i < N; ++i) {
            #pragma omp task
            {
                // work for element i
            }
        }
    }
}
```
Key work-sharing aspects:
- Each `task` represents a piece of work that can be executed by any thread in the team.
- Tasks may be created by one thread (often inside `single`) and executed by any thread.
- Tasks allow irregular and nested parallelism (e.g., recursive algorithms, tree traversals).
Task-specific features (dependencies, taskloops, etc.) are covered elsewhere; here, the main point is that tasks provide dynamic work distribution, beyond simple loop or section division.
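As a sketch of the recursive pattern mentioned above (the tree type, the `tree_sum` helper, and the depth are illustrative), one thread starts the recursion inside `single`, and the tasks it spawns are executed by the whole team. `taskwait` is a task-specific synchronization covered elsewhere; it is used here only so each call can combine its children's results.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct node {
    int value;
    struct node *left, *right;
} node;

/* Recursively sum a binary tree; each subtree becomes a task that any
   thread in the team may pick up. */
static long tree_sum(node *t) {
    if (t == NULL)
        return 0;

    long left_sum = 0, right_sum = 0;

    #pragma omp task shared(left_sum)
    left_sum = tree_sum(t->left);

    #pragma omp task shared(right_sum)
    right_sum = tree_sum(t->right);

    #pragma omp taskwait   /* wait for the two child tasks to finish */
    return t->value + left_sum + right_sum;
}

static node *make_tree(int depth) {
    if (depth == 0)
        return NULL;
    node *t = malloc(sizeof(node));
    t->value = depth;
    t->left = make_tree(depth - 1);
    t->right = make_tree(depth - 1);
    return t;
}

int main(void) {
    node *root = make_tree(10);
    long sum = 0;

    #pragma omp parallel
    {
        /* One thread starts the recursion; the generated tasks are
           executed by any thread in the team. */
        #pragma omp single
        sum = tree_sum(root);
    }

    printf("tree sum = %ld\n", sum);
    return 0;
}
```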
Fortran `workshare` (language-specific construct)
In Fortran, workshare is a construct that can automatically distribute certain array operations and constructs over threads. It is primarily relevant to Fortran array syntax and similar features.
Example pattern (conceptual):
```fortran
!$omp parallel
!$omp workshare
A = B + C   ! array operation; iterations over elements are shared
!$omp end workshare
!$omp end parallel
```
Key idea:
- The compiler identifies element-wise work in the enclosed region and distributes it across threads.
- This is mostly used in Fortran-centric codes that rely on array syntax/operations.
Detailed behavior and best practices around workshare are typically Fortran-specific and depend on compiler support.
Choosing between work-sharing constructs
Some common decision guidelines, focusing purely on how work is shared:
- Use `for`/`do` when:
  - You have a loop with many similar, independent iterations.
  - You want explicit control over iteration scheduling (`static`, `dynamic`, etc.).
- Use `sections` when:
  - You have a small number of different code blocks, each doing a different task.
  - You want each block to run in parallel with the others exactly once.
- Use `single` when:
  - You need code that runs once inside a parallel region, and other threads should skip it.
  - Examples: one-time initialization, I/O, or task creation.
- Use `task` when:
  - Work units are irregular, created dynamically, or structured recursively.
  - You want the runtime to dynamically balance these work units across threads.
- Use `workshare` (Fortran) when:
  - You have Fortran array syntax or related constructs that you want automatically parallelized over threads.
Interactions with data scoping and synchronization
Work-sharing constructs interact strongly with:
- Data scoping (`shared`, `private`, `firstprivate`, `reduction`, etc.).
- Synchronization (implicit barriers, `nowait`, and explicit constructs like `critical`, `atomic`, and locks).
Only the work-sharing aspect is emphasized here:
- Every work-sharing construct may have an implicit barrier at its end; `nowait` can be used to remove it where safe.
- Data-sharing attributes on variables can significantly affect correctness and performance when work is divided among threads.
The detailed rules and best practices for data scoping and synchronization are discussed in other chapters; when using work-sharing constructs, always ensure:
- Each piece of work has the necessary private or shared variables correctly specified.
- You understand whether the implicit barrier should remain or be removed (`nowait`), based on dependencies between threads.
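For example, a minimal sketch (the variables are illustrative) of how data-sharing clauses combine with a work-sharing loop: the per-iteration temporary is made `private` so threads do not overwrite each other, and the accumulator is combined with `reduction`:

```c
#include <stdio.h>

#define N 1000

int main(void) {
    static double a[N];
    double scale = 0.5;   /* shared: read-only in the loop */
    double sum = 0.0;

    for (int i = 0; i < N; ++i)
        a[i] = (double)i;

    /* tmp is private (each thread gets its own copy); sum is combined
       across threads by the reduction at the end of the loop. */
    double tmp;
    #pragma omp parallel for private(tmp) reduction(+:sum)
    for (int i = 0; i < N; ++i) {
        tmp = scale * a[i];
        sum += tmp;
    }

    printf("sum = %f\n", sum);
    return 0;
}
```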