
Introduction to OpenMP

What OpenMP Is (and Isn’t)

OpenMP is a specification for shared-memory parallel programming, mainly for C, C++, and Fortran. It lets you express parallelism using:

  * compiler directives (#pragma omp ... in C/C++, !$omp ... in Fortran),
  * runtime library routines (for example, omp_get_thread_num()),
  * environment variables that control behavior at run time (for example, OMP_NUM_THREADS).

Key characteristics:

  * it targets shared-memory (multicore) systems, where all threads see one address space,
  * it uses a fork-join execution model: teams of threads are created for parallel regions and joined afterwards,
  * it can be added incrementally to existing serial code, which still builds as a serial program when the OpenMP flag is omitted,
  * it is portable across compilers and platforms that implement the specification.

OpenMP is not:

  * a distributed-memory programming model: it does not handle communication between nodes (that is the role of MPI),
  * automatic parallelization: you must mark the parallel regions and ensure they are correct,
  * a standalone library you link by hand: support is built into the compiler and enabled with a flag.

Basic Compilation and Enabling OpenMP

To use OpenMP, you need:

  * a compiler that supports OpenMP (GCC, Clang, Intel, and most vendor compilers do),
  * the compiler flag that enables OpenMP,
  * the header omp.h (in C/C++) if you call OpenMP runtime functions.

Typical examples:

  * GCC and Clang: -fopenmp
  * Intel compilers: -qopenmp
  * NVIDIA HPC compilers: -mp

Example (C):

gcc -O2 -fopenmp -o myprog myprog.c

If you omit the OpenMP flag, most compilers will ignore the directives and produce a normal serial executable.
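
If you want to verify that OpenMP was actually enabled, a minimal check (a sketch, not something OpenMP requires) is to test the standard _OPENMP preprocessor macro, which the compiler defines only when OpenMP support is switched on:

#include <stdio.h>

int main(void) {
#ifdef _OPENMP
    /* _OPENMP is defined when the OpenMP flag is given; its value
       encodes the date of the supported specification (e.g. 201511). */
    printf("OpenMP enabled (_OPENMP = %d)\n", _OPENMP);
#else
    printf("OpenMP not enabled; directives are ignored.\n");
#endif
    return 0;
}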

Basic OpenMP Syntax and Structure

Directives in C/C++

OpenMP directives in C/C++ use #pragma:

Examples:

  * #pragma omp parallel: start a parallel region,
  * #pragma omp for: distribute the following loop among the threads of the current team,
  * #pragma omp parallel for: combined form that does both at once.

Each directive is written on a line of its own (long directives can be continued with a backslash) and applies to the succeeding structured block (for example, the next statement, compound block, or loop).

Directives in Fortran (brief orientation)

In Fortran, directives are special comments that begin with !$omp, for example !$omp parallel paired with !$omp end parallel.

The syntax details and style are Fortran-specific, but the conceptual constructs mirror those of C/C++.

Your First OpenMP Program

Below is a minimal C example showing a “Hello, world” using OpenMP:

#include <stdio.h>
#include <omp.h>
int main(void) {
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        printf("Hello from thread %d\n", tid);
    }
    return 0;
}

Key points:

  * #pragma omp parallel creates a team of threads; the block that follows runs on every thread,
  * omp_get_thread_num() returns the calling thread's ID, numbered from 0,
  * the header omp.h declares the OpenMP runtime functions,
  * the order of the printed lines is not deterministic.

Compile with:

gcc -O2 -fopenmp hello_omp.c -o hello_omp

Run:

./hello_omp

You should see multiple lines, one per thread. The number of threads is controlled by runtime settings (see below).
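
With four threads, the output could look like the following; the exact order varies from run to run because the threads print independently:

Hello from thread 2
Hello from thread 0
Hello from thread 3
Hello from thread 1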

Controlling the Number of Threads

OpenMP lets you specify the team size at runtime or via code. Common mechanisms:

  * the OMP_NUM_THREADS environment variable,
  * the num_threads clause on a parallel directive,
  * runtime calls such as omp_set_num_threads().

Each is described below.

Environment variable: `OMP_NUM_THREADS`

Set before running your program:

export OMP_NUM_THREADS=4
./hello_omp

On typical HPC systems, you will often choose this to match the number of physical cores you want to use on a node.

In-code control: `num_threads` clause

You can override the environment variable in specific parallel regions:

#pragma omp parallel num_threads(8)
{
    // this region uses 8 threads
}

Runtime function calls

You can query and set the thread count programmatically:

#include <omp.h>
int main(void) {
    omp_set_num_threads(4);       // request 4 threads for subsequent regions
    #pragma omp parallel
    {
        int tid  = omp_get_thread_num();
        int nthr = omp_get_num_threads();
        // ...
    }
}

On HPC clusters, the number of threads is often coordinated with the job scheduler and the number of cores you requested (discussed elsewhere).

OpenMP Execution Model: Teams and Threads

Basic model:

  * a program starts with a single initial (master) thread,
  * when it reaches a parallel region, the runtime forks a team of threads,
  * the team executes the region, then joins back into a single thread at its end.

Conceptually:

  1. Fork: Enter #pragma omp parallel → threads are created.
  2. Join: Exit #pragma omp parallel → threads synchronize and join.

Inside a parallel region:

  * every thread executes the code of the region,
  * each thread can query its own ID with omp_get_thread_num() and the team size with omp_get_num_threads(),
  * an implicit barrier at the end of the region ensures all threads finish before serial execution resumes.
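
The sketch below makes the fork-join behavior visible: the statements outside the parallel region execute once, on the initial thread, while the block inside executes on every thread of the team:

#include <stdio.h>
#include <omp.h>

int main(void) {
    printf("Before the region: one thread\n");        /* runs once */

    #pragma omp parallel                               /* fork: team created */
    {
        printf("Inside the region: thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }                                                  /* join: implicit barrier */

    printf("After the region: one thread again\n");   /* runs once */
    return 0;
}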

Basic Data-Sharing Concepts (OpenMP View)

OpenMP builds on the shared-memory model: all threads see the same address space, but individual variables can be:

  * shared: one copy, visible to and modifiable by all threads, or
  * private: each thread works on its own copy.

OpenMP lets you control this via clauses:

  * shared(list): the listed variables are shared by the whole team,
  * private(list): each thread gets its own, uninitialized copy,
  * firstprivate(list): like private, but each copy is initialized from the value before the region,
  * default(shared|none): sets the default scoping; default(none) forces you to scope every variable explicitly.

Example skeleton (details of clauses are used more extensively in later sections):

#pragma omp parallel shared(a, b) private(i, tmp)
{
    // 'a' and 'b' are visible and shared by all threads
    // each thread has its own 'i' and 'tmp'
}

For simple “hello world” or basic loops, default behavior often works, but correct data scoping becomes crucial as programs become more complex.
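
As a small illustration of scoping, the sketch below uses a per-iteration temporary; marking it private gives each thread its own copy, whereas leaving it shared would let threads overwrite each other's value (the function and array names are placeholders):

void scale_and_shift(double *a, const double *b, int n) {
    double tmp;                          /* shared by default */
    #pragma omp parallel for private(tmp)
    for (int i = 0; i < n; i++) {        /* the loop index i is private automatically */
        tmp = 2.0 * b[i];                /* each thread writes only its own tmp */
        a[i] = tmp + 1.0;
    }
    /* without private(tmp), all threads would race on the single shared tmp */
}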

Parallelizing Loops with OpenMP

One of OpenMP’s main strengths is easy parallelization of regular loops over independent iterations.

The `parallel for` construct (C/C++)

A common pattern:

#pragma omp parallel for
for (int i = 0; i < N; i++) {
    a[i] = b[i] + c[i];
}

Meaning:

  * parallel creates a team of threads,
  * for distributes the loop iterations among the threads of that team,
  * each iteration is executed exactly once, by exactly one thread,
  * an implicit barrier at the end of the loop synchronizes the team.

Requirements for correctness:

  * iterations must be independent: no iteration may read data that another iteration writes (a counter-example is sketched below),
  * shared variables must not be updated by multiple iterations without a reduction or other synchronization,
  * the loop must be in the canonical form OpenMP expects, with an iteration count that is known when the loop starts.
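
As a counter-example, the loop sketched below is not safe to put under parallel for, because iteration i reads the value written by iteration i-1 (a loop-carried dependence); the function name is a placeholder:

#include <stddef.h>

/* NOT safe for "#pragma omp parallel for": each iteration depends
   on the result of the previous one. */
void running_sum(double *a, const double *b, size_t n) {
    for (size_t i = 1; i < n; i++) {
        a[i] = a[i-1] + b[i];
    }
}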

Equivalent two-step form:

#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < N; i++) {
        a[i] = b[i] + c[i];
    }
}

This makes it clear that parallel creates threads and for distributes iterations. In many cases, parallel for is a convenient shorthand.
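
One practical reason to use the two-step form is to reuse a single team for several consecutive loops instead of forking a new team for each one; a minimal sketch (array names are placeholders):

#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < N; i++) {
        a[i] = b[i] + c[i];
    }
    /* implicit barrier here: the first loop is complete before
       any thread starts the second loop */

    #pragma omp for
    for (int i = 0; i < N; i++) {
        d[i] = 2.0 * a[i];
    }
}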

Loop scheduling (introductory view)

OpenMP supports different schedules for assigning loop iterations to threads via the schedule clause:

  * static: iterations are divided into contiguous chunks that are assigned to threads in advance,
  * dynamic: chunks are handed out to threads as they finish their previous chunk,
  * guided: like dynamic, but the chunk size decreases over time,
  * auto: the choice is left to the compiler and runtime,
  * runtime: the choice is deferred to the OMP_SCHEDULE environment variable.

Example:

#pragma omp parallel for schedule(static)
for (int i = 0; i < N; i++) {
    // ...
}

The choice of schedule affects load balancing and performance but does not change the loop’s final results if the loop is correctly parallelizable.
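
For example, when the work per iteration varies strongly, a dynamic schedule with a small chunk size often balances the load better than the default; a sketch, where process_item is a placeholder for some cost-varying computation:

#pragma omp parallel for schedule(dynamic, 4)
for (int i = 0; i < N; i++) {
    process_item(i);   /* placeholder: iterations with very different runtimes */
}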

Reductions: Parallel Accumulations

Many loops perform accumulations (sum, min, max, etc.). OpenMP provides the reduction clause so that each thread safely contributes to a shared result.

Example: summation

double sum = 0.0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < N; i++) {
    sum += a[i];
}

What happens:

  * each thread gets a private copy of sum, initialized to the identity value of the operator (0 for +),
  * each thread accumulates its share of the iterations into its own copy,
  * at the end of the loop, the private copies are combined into the original shared variable.

Commonly supported operators include +, *, -, &, |, ^, &&, and ||; min and max are also available, depending on the type.

This is the preferred way to handle patterns like sums and dot products in OpenMP.
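
For instance, a dot product follows the same pattern; a minimal sketch:

double dot(const double *x, const double *y, int n) {
    double result = 0.0;
    #pragma omp parallel for reduction(+:result)
    for (int i = 0; i < n; i++) {
        result += x[i] * y[i];   /* each thread accumulates into its own copy */
    }
    return result;               /* private copies are combined at the end */
}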

Basic OpenMP Runtime Library Routines

OpenMP provides a small, portable runtime API. The most commonly used routines for beginners are:

  * omp_get_thread_num(): the ID of the calling thread within the current team,
  * omp_get_num_threads(): the number of threads in the current team,
  * omp_get_max_threads(): the number of threads a subsequent parallel region would use,
  * omp_set_num_threads(n): request n threads for subsequent parallel regions,
  * omp_get_wtime(): wall-clock time in seconds, useful for timing.

Example timing code:

double t0 = omp_get_wtime();
#pragma omp parallel for
for (int i = 0; i < N; i++) {
    // work
}
double t1 = omp_get_wtime();
printf("Elapsed time: %f seconds\n", t1 - t0);

Note: omp_get_wtime() measures wall-clock (elapsed) time, not CPU time, and is portable across platforms.

Environment Variables for Runtime Control

OpenMP behavior can be tuned without recompiling, using environment variables:

  * OMP_NUM_THREADS: default number of threads per team,
  * OMP_SCHEDULE: schedule kind and chunk size for loops declared with schedule(runtime),
  * OMP_PROC_BIND and OMP_PLACES: control whether and where threads are bound to cores,
  * OMP_DYNAMIC: allow the runtime to adjust the number of threads.

Examples:

export OMP_NUM_THREADS=8
export OMP_SCHEDULE="dynamic,4"
./myprog

On HPC systems, these variables are often combined with scheduler settings (e.g., number of cores per task) to achieve good performance.
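
Note that OMP_SCHEDULE only takes effect for loops whose schedule is deferred to run time with schedule(runtime); a minimal sketch:

/* The schedule kind and chunk size come from OMP_SCHEDULE (e.g. "dynamic,4"),
   so they can be changed without recompiling. */
#pragma omp parallel for schedule(runtime)
for (int i = 0; i < N; i++) {
    // work
}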

Typical Usage Patterns in HPC Codes

Within HPC applications, OpenMP is frequently used for:

  * parallelizing compute-intensive loops over arrays, grids, or particles,
  * using all cores of a node from within a single process,
  * providing the intra-node layer in hybrid MPI + OpenMP codes.

Conceptually common patterns:

  1. MPI rank per node (or per socket) + OpenMP threads per rank to use all cores.
  2. OpenMP around compute kernels, leaving I/O and communication mostly serial.

The details of hybrid strategies are covered elsewhere; here the key idea is that OpenMP targets intra-node, shared-memory parallelism.

Minimal Debugging and Safety Tips for Beginners

When starting with OpenMP:

  * make sure the serial version is correct before adding directives,
  * parallelize one loop or region at a time and re-check the results after each step,
  * compare parallel results against a serial run (for example, with OMP_NUM_THREADS=1 or the OpenMP flag omitted).

Some early safeguards:

  * use default(none) so that every variable's scoping must be stated explicitly,
  * mark loop temporaries private (or declare them inside the loop body),
  * keep I/O inside parallel regions to a minimum while debugging.

For initial experiments, you can add temporary checks inside parallel regions:

#pragma omp parallel
{
    int tid = omp_get_thread_num();
    printf("Thread %d is running here\n", tid);
}

Be aware that printf from multiple threads can interleave output, but this helps verify that threads are created and running.
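
Another quick check is to print the team size exactly once, using the single construct so that only one thread of the team executes the statement:

#pragma omp parallel
{
    #pragma omp single
    printf("Team size: %d threads\n", omp_get_num_threads());
}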

Summary

In this chapter you learned:

  * what OpenMP is and how to enable it at compile time,
  * how parallel regions create teams of threads and how to control the team size,
  * how to parallelize loops with parallel for and influence iteration assignment with schedule,
  * how to accumulate results safely with the reduction clause,
  * the most common runtime routines and environment variables used to control OpenMP programs.
