
Debugging tools for HPC

Goals of This Chapter

In this chapter you will:

The aim is not to master every tool, but to recognize what to use when, and how to get started with each.

Types of Debugging Tools in HPC

HPC debugging tools can be grouped into:

Many full-featured HPC tools bundle several of these capabilities.

Using Debuggers on Clusters: Practical Considerations

On HPC systems, debugging has some constraints compared to a laptop:

Typical workflow with SLURM (schematic):

salloc -N 1 -n 4 --time=01:00:00  # interactive allocation
srun --pty gdb ./my_mpi_program   # run under the debugger (practical for a single rank or very small runs)

The exact options depend on your site; cluster documentation usually has a “debugging” section.
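Another common pattern is attaching a debugger to a rank that is already running, for example when a job seems to hang. A rough sketch, assuming your site allows ssh to compute nodes (the node name and PID are placeholders):

squeue -u $USER                   # find the job and the nodes it runs on
ssh <node>                        # log in to one of the compute nodes
pgrep -u $USER my_mpi_program     # PIDs of the ranks on that node
gdb -p <pid>                      # attach; then use bt, info threads, continue, ...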

Symbolic Debuggers for HPC

GDB and Variants

GDB (GNU Debugger) is the default debugger on many Linux/HPC systems. Key capabilities:

For MPI programs, you typically:

For OpenMP programs, you normally:

Many sites also offer:

LLDB

LLDB (from the LLVM project) is another modern debugger, especially used with Clang/LLVM:
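In practice, an LLDB session closely mirrors the GDB workflow. A minimal sketch (the source file, line number, and variable name below are only placeholders):

lldb ./my_program
(lldb) breakpoint set --file solver.c --line 128
(lldb) run input.dat
(lldb) bt
(lldb) frame variable local_sum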

Memory and Correctness Checkers

Valgrind

Valgrind is widely available on clusters and provides several tools; the most important one for HPC beginners is Memcheck, which detects invalid memory accesses, use of uninitialised values, and leaks.

Example:

valgrind --tool=memcheck ./my_program input.dat

Considerations in HPC:

  srun -n 2 valgrind --tool=memcheck ./my_mpi_program

but performance overhead and log volume grow quickly with rank count.
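To keep the output of different ranks apart, Valgrind can write one log file per process; a sketch using an environment variable that SLURM sets for each task:

srun -n 2 valgrind --tool=memcheck \
     --log-file=memcheck_rank_%q{SLURM_PROCID}.log ./my_mpi_program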

Address Sanitizer and Other Sanitizers

Compiler-based sanitizers (part of GCC and Clang) are extremely helpful:

You enable them at compile time, for example:

# GCC/Clang example
mpicc -g -O1 -fsanitize=address -fno-omit-frame-pointer -o myprog myprog.c

Then run normally (on reduced input sizes). Pros:

Check your site documentation; some clusters provide preconfigured sanitizer builds.
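At run time, AddressSanitizer can be tuned through the ASAN_OPTIONS environment variable; a sketch (the chosen options are only illustrative):

export ASAN_OPTIONS=abort_on_error=1:detect_leaks=1   # stop at the first error, report leaks at exit
srun -n 4 ./myprog small_input.dat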

MPI-Aware Debugging Tools

For MPI-specific correctness (mismatched sends/receives, deadlocks, incorrect collectives), MPI-aware tools are very valuable.

MPI Checkers (MUST, Intel MPI correctness, etc.)

Examples (availability varies by site):

Typical usage pattern (schematic):

mustrun -np 8 ./my_mpi_program

Here mustrun replaces the usual MPI launcher (mpirun/srun); other checkers provide their own wrapper command or launch options.

Features:

These tools are particularly useful for bugs that only appear in parallel and may not crash (e.g., silent hangs).

Parallel-Aware GUI Debuggers (DDT/Arm Forge, TotalView)

Many HPC centers provide commercial parallel debuggers, most commonly DDT (part of Arm Forge, now Linaro Forge) and TotalView.

Key capabilities:

Typical workflow (high level):

  1. Start an interactive job (salloc).
  2. Load the module, e.g. module load forge or module load totalview.
  3. Launch via the tool’s front-end or a wrapper, e.g.:
   ddt srun -n 8 ./my_mpi_program
  4. Use the GUI (X11 forwarding or remote desktop) to insert breakpoints, inspect variables, etc.

These tools are often the most practical way to debug non-trivial MPI applications on clusters.
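When an interactive GUI session is impractical (long queues, remote sites), DDT also offers a non-interactive mode that writes a report instead. A sketch, assuming the module name forge and a Forge version that supports offline mode (flag names may differ between versions):

module load forge
ddt --offline -o ddt_report.html srun -n 8 ./my_mpi_program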

Tools for Threading and Race Conditions

Race conditions and synchronization bugs in OpenMP (and other threading models) are notoriously tricky. Dedicated tools can help.

OpenMP Debugging with Traditional Debuggers

Using gdb/lldb + environment variables:
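For example, a minimal sketch (the program name is a placeholder, and a small thread count keeps the session manageable):

OMP_NUM_THREADS=2 gdb ./my_openmp_program
(gdb) run small_input.dat
(gdb) info threads      # list all OpenMP threads
(gdb) thread 2          # switch to thread 2
(gdb) bt                # backtrace of the selected thread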

However, stepping through multi-threaded execution can be confusing; race detectors are often more practical.

Thread/Memory Race Detectors

Common tools:

For example, with Clang:

clang -g -O1 -fopenmp -fsanitize=thread -o myprog myprog.c

Run with reduced problem size and inspect the sanitizer’s error output, which usually:

Vendor suites (e.g., Intel Inspector in Intel oneAPI) also provide GUI-based thread/race analysis.
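At run time, ThreadSanitizer can be tuned through the TSAN_OPTIONS environment variable; a sketch (the values are only illustrative):

export OMP_NUM_THREADS=4
export TSAN_OPTIONS=halt_on_error=1   # stop at the first reported race
./myprog small_input.dat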

GPU Debugging Tools

For GPU-accelerated applications (CUDA, OpenACC, OpenMP offload), specialized tools are required.

CUDA Debugging

If you use CUDA:

    cuda-gdb ./my_cuda_program

Important considerations:

  nvcc -G -g -O0 mykernel.cu -o myprog
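Once built with device debug information, a minimal interactive session might look like this (the kernel name is a placeholder, and the available commands depend on your cuda-gdb version):

cuda-gdb ./my_cuda_program
(cuda-gdb) break my_kernel        # break when the kernel is launched
(cuda-gdb) run
(cuda-gdb) info cuda kernels      # kernels currently running on the GPU
(cuda-gdb) info cuda threads      # GPU thread coordinates and locations
(cuda-gdb) bt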

OpenACC / OpenMP Offload Debugging

For directive-based GPU programming (OpenACC, OpenMP target offload):

For correctness of GPU memory usage (out-of-bounds and illegal accesses), sanitizer-like CUDA tools are critical: cuda-memcheck on older CUDA versions, or its newer replacement, compute-sanitizer:

cuda-memcheck ./my_cuda_program

They report illegal memory accesses, race conditions, and API misuse in GPU code.
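On recent CUDA versions, compute-sanitizer bundles several checks selected with --tool; a sketch:

compute-sanitizer --tool memcheck  ./my_cuda_program   # illegal memory accesses (default tool)
compute-sanitizer --tool racecheck ./my_cuda_program   # shared-memory data races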

Record-and-Replay and Deterministic Debugging

Some bugs appear only rarely or at scale. Record-and-replay tools try to:

Examples (availability and practicality vary):

In practice, on many clusters you will instead:

Logging, Tracing, and Lightweight Instrumentation

For bugs that depend on scale or particular timing:

Instead, you can use:

Examples (names may vary by site): Score-P, Vampir, Paraver, Intel VTune/Trace Analyzer. While mainly performance tools, they double as powerful “what actually happened?” debuggers.
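Even without a dedicated tracing tool, the scheduler can already separate output per rank, which makes post-mortem reading of logs much easier; a sketch using SLURM's filename patterns:

# one log file per job and per MPI rank (%j = job ID, %t = task/rank ID)
srun -n 64 --output=run_%j_rank_%t.log ./my_mpi_program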

Integrating Debugging Tools with the Batch Scheduler

Common patterns when using tools with schedulers like SLURM:

    #!/bin/bash
    #SBATCH -N 2
    #SBATCH -n 64
    #SBATCH -t 00:30:00
    module load must
    # mustrun replaces the usual launcher; how it integrates with srun is site-specific
    mustrun -np 64 ./my_mpi_program > must_output.log 2>&1

Always check site documentation for:

Choosing the Right Tool

A practical mapping from symptom to tool:

Practical Tips and Best Practices

By understanding the strengths and limitations of each debugging tool and how to run them on a cluster, you can systematically approach even complex parallel bugs rather than relying on trial-and-error.
