Understanding Race Conditions in Shared-Memory Programs
In shared-memory parallel programming, a race condition occurs when the final outcome of a program depends on the timing or interleaving of operations performed by multiple threads accessing shared data. This usually leads to results that are:
- Non-deterministic (different runs give different answers)
- Hard to reproduce and debug
- Often wrong, but not always obviously so
Race conditions are not syntax errors; your program compiles and may even appear to work most of the time. They are logic errors due to uncontrolled concurrent access to shared state.
Essential Ingredients of a Race Condition
A race condition typically requires all of the following:
- Shared state: at least one memory location (variable, array element, data structure) that is accessible by multiple threads.
- Concurrent access: two or more threads can access the shared state at the same time (overlapping in time).
- At least one write: at least one of those accesses modifies the shared data. Multiple writers, or one writer plus one or more readers, are both problematic.
- Lack of proper synchronization: there is no mechanism (locks, atomics, barriers, ordered constructs, etc.) to enforce a safe ordering of these accesses.
When these conditions are all present, the order in which threads access and update the shared data is not controlled, and the state can be corrupted or inconsistent.
A Classic Example: Lost Updates
Consider incrementing a shared counter in parallel:
int counter = 0;
#pragma omp parallel for
for (int i = 0; i < 1000000; i++) {
    counter++;
}
You might expect counter == 1000000 at the end, but on many runs you will see a smaller value. Why?
The operation counter++ is not atomic; it is typically translated into something like:
- Load counter from memory into a register
- Add 1 to the register
- Store the register back to memory
With two threads, the following interleaving can happen:
- Thread A: load counter (say it sees 5)
- Thread B: load counter (also sees 5)
- Thread A: add 1, store 6
- Thread B: add 1, store 6
The update from one thread is lost. With more threads and more increments, the final result can deviate significantly from the expected value.
This is a read-modify-write race on counter.
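As a preview of the kind of fix discussed in later sections, here is a minimal sketch that makes the increment itself indivisible with an atomic construct; with this change, every run produces 1000000:
int counter = 0;
#pragma omp parallel for
for (int i = 0; i < 1000000; i++) {
    // the atomic construct makes the load-add-store one indivisible update,
    // so no increment can be lost
    #pragma omp atomic
    counter++;
}
Note that an atomic update on every iteration serializes the increments and is far slower than a reduction; it is shown here only to make the contrast with the racy version explicit.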
Non-Determinism and Reproducibility
Race conditions often produce:
- Different numerical results between runs
- Rare, “once in a while” failures
- Bugs that appear only on some machines or core counts
Because scheduling is influenced by many factors (OS decisions, other jobs, hardware details), the exact timing of operations varies run-to-run. Thus:
- A code with a race condition might pass tests 99 times and fail on the 100th.
- Adding printf or extra logging can hide or expose the bug by changing timing.
- Switching compilers or optimization levels can change whether the bug shows up.
For HPC, where reproducibility and numerical reliability are important, this is a serious issue.
Common Patterns That Lead to Race Conditions
1. Shared Accumulators Without Synchronization
Shared sums, counters, and statistics are typical:
double sum = 0.0;
#pragma omp parallel for
for (int i = 0; i < N; i++) {
    sum += a[i]; // race
}
Multiple threads perform read-modify-write on sum concurrently. This leads to lost updates and non-deterministic results.
(Using mechanisms like reduction is the standard way to avoid this, but the mechanics belong to other chapters.)
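For orientation, a minimal sketch of the reduction form: each thread accumulates into its own private copy of sum, and the copies are combined once at the end, so no two threads ever write the same location concurrently.
double sum = 0.0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < N; i++) {
    sum += a[i];   // no race: each thread updates its private copy of sum
}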
2. Writing to Shared Arrays With Overlapping Indices
If threads update overlapping portions of an array, the order of updates matters:
#pragma omp parallel for
for (int i = 1; i < N; i++) {
    x[i] = x[i] + x[i-1]; // potential race on x[i-1]
}
Here, each iteration depends on x[i-1], which may be updated by another iteration (and thus another thread) at the same time. The program now depends on the scheduling order of iterations.
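When the intended computation allows it, the standard remedy is to separate inputs from outputs so that no iteration reads a location another iteration writes. A minimal sketch, assuming a distinct output array y (a name introduced for this example); note that it computes pairwise sums of the original values, which is not the same result as the sequential running-sum loop above:
#pragma omp parallel for
for (int i = 1; i < N; i++) {
    y[i] = x[i] + x[i-1];   // x is only read; y[i] is written by exactly one iteration
}
If the sequential semantics (each x[i] accumulating all previous updates) must be preserved, the loop carries a true dependency and needs a different algorithm, such as a parallel prefix sum.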
3. Shared Loop Induction Variables
Incorrect manual parallelization can cause races on indices:
int i;
#pragma omp parallel
{
    for (i = 0; i < N; i++) {
        do_work(i); // race on i
    }
}
All threads share the same i. Different threads write to it concurrently, corrupting the loop logic.
With proper for constructs in OpenMP, this is handled automatically, but hand-written patterns like this are common pitfalls.
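For contrast, a minimal sketch of the safe form: the parallel for construct makes the loop index private to each thread and distributes the iterations, so no thread ever writes another thread's index.
#pragma omp parallel for
for (int i = 0; i < N; i++) {   // i declared in the loop header: private to each thread
    do_work(i);
}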
4. Races on Flags and State Variables
Using shared flags for coordination without proper synchronization is error-prone:
int ready = 0;
double result;
#pragma omp parallel sections
{
    #pragma omp section
    {
        result = compute();
        ready = 1; // write
    }
    #pragma omp section
    {
        while (!ready) {
            // spin
        }
        use(result); // read
    }
}
This has two problems:
- Race on ready: no guarantee that changes to ready are seen promptly, or at all, by the other thread (reordering and caching effects).
- Race on result: the consumer may see a stale or partially updated value.
Without explicit synchronization, the compiler and hardware are allowed to reorder and cache reads/writes in ways that break your intended logic.
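For reference, a minimal sketch of one correct way to express this handshake, assuming an OpenMP version that supports the seq_cst clause on atomic (OpenMP 4.0 and later): the flag is written and read atomically with sequentially consistent ordering, which (together with the implied flushes) also makes the earlier write to result visible to the consumer. It still assumes both sections actually run concurrently, i.e., at least two threads.
int ready = 0;
double result;
#pragma omp parallel sections
{
    #pragma omp section
    {
        result = compute();
        // publish the flag with ordering guarantees
        #pragma omp atomic write seq_cst
        ready = 1;
    }
    #pragma omp section
    {
        int r;
        do {
            // re-read the flag from memory with ordering guarantees
            #pragma omp atomic read seq_cst
            r = ready;
        } while (!r);
        use(result);   // result is now guaranteed to be visible
    }
}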
5. Races in Data Structures
Concurrent updates to shared data structures without protection lead to subtle bugs:
- Pushing/popping from a shared queue or stack
- Inserting/removing from a linked list or tree
- Updating hash tables
For example, two threads inserting into the same linked list node field at the same time can easily corrupt pointers, causing crashes or silent data corruption.
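As a concrete illustration, a minimal sketch of a shared stack push guarded by a named critical section (the node type and head pointer are names invented for this example). Without the critical section, two threads can both read the same old head, and one of the two insertions is silently lost:
#include <stdlib.h>
typedef struct node { int value; struct node *next; } node_t;
node_t *head = NULL;           // shared head of the stack
void push(int value) {
    node_t *n = malloc(sizeof *n);
    n->value = value;
    #pragma omp critical(stack_push)
    {
        n->next = head;        // read-modify-write of head must not interleave
        head = n;
    }
}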
Why the Compiler and Hardware Make This Worse
Modern compilers and hardware aggressively optimize code. They assume that, unless synchronization is used:
- Memory operations can be reordered as long as single-thread semantics are preserved.
- Values loaded from memory can be cached in registers and not reloaded each time.
- Stores can be buffered and not immediately visible to other cores.
From a single-thread perspective, reordering and caching are safe. In a multi-threaded program without explicit synchronization, these optimizations can break assumptions such as:
- “If I write to x and then to ready, the other thread will see these writes in the same order.”
- “If I change a flag, the other thread will see that change immediately.”
- “If I write a structure field, the other thread will never see a half-updated value.”
Race-free programs rely on well-defined synchronization primitives to constrain these optimizations and establish happens-before relationships; programs with race conditions do not.
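To make the register-caching point concrete, here is a sketch of a transformation the compiler is allowed to perform on the consumer's spin loop when ready is an ordinary shared variable with no synchronization:
// What the programmer wrote:
while (!ready) {
    // spin
}
// What the compiler may legally generate: the load of ready is hoisted
// out of the loop and cached in a register, so the loop never re-reads
// memory and can spin forever even after the other thread sets ready.
int tmp = ready;
while (!tmp) {
    // spins forever if tmp was 0
}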
Data Races vs. Higher-Level Race Conditions
In shared-memory programming, the term “race condition” is often used broadly, but it is useful to distinguish:
- Data race: two or more unsynchronized accesses to the same memory location, where at least one is a write. These are always problematic in standard shared-memory models.
- Higher-level race / logic race: no low-level data race exists, but the logical ordering of events is still wrong. Example: two threads correctly use locks to update a data structure but acquire the locks in different orders, leading to deadlock or incorrect semantics (see the sketch below).
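A minimal sketch of that lock-ordering problem using OpenMP locks (the two lock variables are names invented for this example, assumed to be initialized with omp_init_lock): every access to the shared data is protected, so there is no data race, yet if each thread acquires its first lock before the other releases it, both wait forever.
#include <omp.h>
omp_lock_t lock_a, lock_b;     // assumed initialized elsewhere with omp_init_lock
void thread_one(void) {
    omp_set_lock(&lock_a);
    omp_set_lock(&lock_b);     // blocks if the other thread already holds lock_b
    // ... update shared data ...
    omp_unset_lock(&lock_b);
    omp_unset_lock(&lock_a);
}
void thread_two(void) {
    omp_set_lock(&lock_b);
    omp_set_lock(&lock_a);     // blocks if the other thread already holds lock_a: possible deadlock
    // ... update shared data ...
    omp_unset_lock(&lock_a);
    omp_unset_lock(&lock_b);
}
Acquiring the locks in the same global order in both threads removes the deadlock without changing what is protected.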
In this chapter, “race condition” mostly refers to data races on shared memory.
Detecting Race Conditions
Race conditions are difficult to spot by eye in large codes. Several techniques help:
1. Static Reasoning and Code Review
- Check which variables are shared vs. private across threads.
- Look for shared writes outside of critical sections, atomics, or safe parallel constructs.
- Examine any shared global variables, static local variables, and heap-allocated structures.
Structured patterns (like reductions or explicit per-thread data) are typically safer; ad-hoc shared updates are suspicious.
2. Dynamic Tools (Race Detectors)
Many tools can detect data races at runtime by instrumenting code:
- Thread sanitizers (e.g., -fsanitize=thread in compatible compilers)
- Vendor-specific analysis tools
- OpenMP-aware debuggers and analyzers
These tools monitor memory accesses and report conflicting operations. The trade-offs:
- They slow down execution (often significantly).
- They may miss races that are not triggered during the tested run.
- They can produce many warnings; interpreting them requires care.
Still, using such tools on smaller test cases is one of the most effective ways to uncover race conditions.
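As an illustration of how such a tool is typically driven, here is a minimal racy program and a possible compile line; the -fopenmp and -fsanitize=thread flags are standard in GCC and Clang, but whether the OpenMP runtime itself is sanitizer-aware depends on the toolchain, and an uninstrumented runtime can produce false positives.
// race_demo.c: a deliberately racy sum, small enough for a race detector.
// Possible compile line:  cc -fopenmp -fsanitize=thread -g race_demo.c -o race_demo
#include <stdio.h>
int main(void) {
    double a[1000];
    double sum = 0.0;
    for (int i = 0; i < 1000; i++) a[i] = 1.0;
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++) {
        sum += a[i];   // unsynchronized read-modify-write: should be reported as a data race
    }
    printf("sum = %f\n", sum);
    return 0;
}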
3. Stress Testing and Perturbation
Because races depend on timing:
- Run the same test many times.
- Vary thread counts, problem sizes, and scheduling strategies.
- Introduce deliberate delays (sleep calls, heavy logging) in different parts of the code to alter timing.
If results become inconsistent or occasional errors appear, that is a strong indicator of a race.
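A minimal stress-test sketch along these lines: repeat the racy summation from earlier with different thread counts and report the spread of results. Any spread at all is evidence of a race (though a zero spread proves nothing).
#include <stdio.h>
#include <omp.h>
// Deliberately racy parallel sum, as in the accumulator example above.
static double racy_sum(const double *a, int n) {
    double sum = 0.0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
int main(void) {
    enum { N = 1000000, REPS = 20 };
    static double a[N];
    for (int i = 0; i < N; i++) a[i] = 1.0;
    for (int threads = 2; threads <= 8; threads *= 2) {
        omp_set_num_threads(threads);
        double lo = racy_sum(a, N), hi = lo;
        for (int rep = 1; rep < REPS; rep++) {
            double s = racy_sum(a, N);
            if (s < lo) lo = s;
            if (s > hi) hi = s;
        }
        printf("%d threads: min = %.0f, max = %.0f (expected %d)\n",
               threads, lo, hi, N);
    }
    return 0;
}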
Typical Symptoms in HPC Codes
In HPC applications, race conditions often manifest as:
- Non-deterministic numerical results:
- Different answers for the same input with different thread counts.
- Minor discrepancies that grow over time in iterative methods.
- Occasional crashes or NaNs:
- Segmentation faults that disappear when recompiled or re-run.
- Uninitialized or corrupted values leading to overflow or NaNs.
- Debug vs. optimized build differences:
- Code appears correct in debug builds (less reordering, more checks).
- Bugs only appear with high optimization flags and large core counts.
These issues are often misattributed to “stability” or “convergence” problems rather than concurrency bugs.
Strategies to Avoid Race Conditions (Conceptual)
The detailed mechanisms are covered in other sections (synchronization mechanisms, OpenMP constructs, etc.). Conceptually, you avoid races by:
- Eliminating sharing where possible
- Give each thread its own data (private variables, thread-local storage).
- Use per-thread arrays or structures and combine results at the end.
- Structuring work to avoid overlapping writes
- Partition data so each thread writes to distinct regions.
- Ensure dependencies between iterations are respected (no iteration reads a value before the iteration that produces it has finished).
- Using explicit synchronization where sharing is needed
- Protect shared state updates with appropriate synchronization primitives.
- Use designed patterns (e.g., reductions) instead of ad-hoc shared updates.
The key mental model is: every shared write must be justified and safely ordered.
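As an illustration of the first strategy, a minimal sketch of per-thread partial results combined at the end; each thread accumulates into its own local variable, and only the single combine step touches shared state (this is essentially what a reduction clause does for you):
double sum_with_private_copies(const double *a, int n) {
    double total = 0.0;
    #pragma omp parallel
    {
        double local = 0.0;        // private to each thread: no race
        #pragma omp for
        for (int i = 0; i < n; i++) {
            local += a[i];
        }
        #pragma omp atomic         // one synchronized update of the shared total per thread
        total += local;
    }
    return total;
}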
Trade-Offs: Correctness vs. Performance
In HPC, performance is a priority, but correctness is non-negotiable:
- Adding synchronization (locks, atomics, barriers) can reduce scalability if overused or misplaced.
- Removing or weakening synchronization to “go faster” almost always introduces race conditions.
- Good design minimizes the amount of shared mutable state and the frequency of synchronization; it does not eliminate synchronization at the expense of safety.
Early in development, emphasize correctness first:
- Design your parallel structure.
- Make it race-free with clear synchronization.
- Then optimize by reducing sharing or restructuring work, while maintaining correctness.
Summary
- Race conditions arise when multiple threads access and modify shared data without proper synchronization, and the program’s behavior depends on timing.
- They cause non-determinism, lost updates, corrupted data, and subtle, hard-to-reproduce bugs.
- Common sources include unsynchronized shared counters, overlapping array writes, shared loop indices, flag-based coordination without proper memory ordering, and concurrent updates to complex data structures.
- Compiler and hardware optimizations, like reordering and caching, make unsynchronized sharing unsafe, even if the code appears logically ordered.
- Detection involves code review, race-detection tools, and stress testing; elimination relies on careful control of sharing and use of appropriate synchronization constructs.
Understanding and recognizing race conditions is essential for building correct, reliable shared-memory parallel programs in HPC.