Kahibaro

Profiling tools

What Profiling Tools Do in HPC

Profiling tools measure where time is spent and how resources are used while your program runs. In the context of HPC, they help answer questions like:

  • Where does the program spend its time: computation, communication, or I/O?
  • Which functions, loops, or kernels dominate the runtime?
  • Is the work balanced across processes and threads?
  • Why does performance stop improving when more nodes are added?

In this chapter, the focus is on types of profiling tools, typical workflows, and how to interpret and act on profiling data in an HPC setting. The concepts of what to measure and why are covered in the parent chapter; here we concentrate on the tools themselves and their use.

Kinds of Profiling Tools

Most HPC performance tools fall into a few broad categories:

Instrumentation-based profilers

Instrumentation means adding extra code (manually or automatically) to record events such as function entry/exit or MPI calls.

Characteristics:

  • Precise counts and timings for every instrumented event
  • Overhead can be significant, especially for small, frequently called functions
  • Often requires recompiling or relinking the application

Typical uses:

  • Obtaining exact call counts and per-function timings
  • Profiling MPI communication via interposed wrapper libraries

Examples (you don’t need to know them all, but you should recognize the types):

  • gprof (compiler-inserted instrumentation)
  • Score-P and TAU (source- and compiler-level instrumentation for HPC codes)
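As a minimal sketch of compiler-inserted instrumentation, the classic gprof workflow looks like this (app.c and ./app are placeholders for your own code and binary):

```shell
# Build with -pg so the compiler inserts entry/exit instrumentation
gcc -pg -O2 -o app app.c
# Running the instrumented binary writes gmon.out in the working directory
./app
# Summarize the recorded data: flat profile plus call graph
gprof ./app gmon.out | head -40
```

Note that the instrumented binary pays the recording overhead on every function call, which is exactly the characteristic trade-off described above.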

Sampling-based profilers

Sampling profilers interrupt the program at regular intervals and record the current call stack and, optionally, hardware counter values.

Characteristics:

  • Low, roughly uniform overhead, independent of how often functions are called
  • Statistical results: rarely executed code may be missed
  • Usually no recompilation needed

Typical uses:

  • Finding hot spots in large or unfamiliar codes
  • First-pass profiling at scale, where overhead must stay low

Examples:

  • Linux perf
  • HPCToolkit
  • Intel VTune (in sampling mode)
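A minimal sampling session with Linux perf might look like this (./app is a placeholder; compile with -g and frame pointers for readable call stacks):

```shell
# Sample the call stack ~99 times per second while the program runs
perf record -g -F 99 -o perf.data ./app
# Interactive report: functions ranked by percentage of samples
perf report -i perf.data
```

Because the program is only interrupted periodically, the overhead stays low even for codes that make millions of small function calls.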

Tracing tools

Traces record a timeline of events (function calls, MPI messages, I/O, etc.) for each process or thread. This is often based on instrumentation, but the key feature is the time-ordered log.

Characteristics:

  • Complete, time-ordered picture of program behavior
  • Trace files can grow very large at scale
  • Overhead and storage must be managed (e.g., by tracing only selected intervals)

Typical uses:

  • Diagnosing load imbalance and communication patterns over time
  • Understanding when and why processes wait on each other

Examples:

  • Score-P with Vampir (OTF2 traces)
  • Extrae with Paraver
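A hedged sketch of a tracing run with Score-P (exact wrapper and variable names depend on your installation; app.c, ./app, and the process count are placeholders):

```shell
# Build through the Score-P compiler wrapper to instrument the code
scorep mpicc -O2 -o app app.c
# Request a full event trace rather than only an aggregate profile
export SCOREP_ENABLE_TRACING=true
# Run as usual; traces land in a scorep-* directory next to the job
srun -n 16 ./app
# Open the resulting OTF2 trace in a timeline viewer such as Vampir
```

The timeline view is what distinguishes tracing from profiling: you see not just how much time MPI calls took, but when each rank was computing, sending, or waiting.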

Hardware counter tools

These tools focus on low-level hardware events:

  • Cache hits and misses at the various levels
  • Branch mispredictions
  • Floating-point operations and vector instruction usage
  • Memory bandwidth

They typically read these events from the performance monitoring units (PMUs) built into modern CPUs.

Typical uses:

  • Diagnosing memory-bound versus compute-bound behavior
  • Checking vectorization and cache efficiency of hot loops

Examples:

  • PAPI (a portable counter interface used by many higher-level tools)
  • perf stat
  • LIKWID
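A quick way to read a few generic counters is perf stat (./app is a placeholder; event names vary by CPU, and `perf list` shows what your PMU supports):

```shell
# Count a handful of generic hardware events for the whole run
perf stat -e cycles,instructions,cache-references,cache-misses ./app
```

The instructions-per-cycle and cache-miss ratios printed at the end are often enough to tell whether a hot loop is memory-bound or compute-bound.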

Specialized tools (MPI, OpenMP, GPU)

There are also tools focused on specific programming models:

  • MPI profilers and tracers (e.g., mpiP, Vampir)
  • OpenMP-aware tools that attribute time to parallel regions and idle threads
  • GPU profilers such as NVIDIA Nsight Systems/Compute or AMD's ROCm tools

These tools report metrics that are particularly meaningful for that model (e.g., MPI wait time, OpenMP idle time, GPU occupancy).
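For example, a GPU-side investigation with NVIDIA's tools might look like this (./app is a placeholder CUDA binary; the output names are arbitrary):

```shell
# Whole-run timeline: kernel launches, memory copies, and API calls
nsys profile -o timeline ./app
# Detailed per-kernel metrics, including occupancy and memory traffic
ncu --set basic -o kernels ./app
```

Nsight Systems answers "where does the time go across CPU and GPU?", while Nsight Compute answers "why is this particular kernel slow?".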

Typical HPC Profiling Workflow

Profiling is not a one-shot activity. A practical workflow on HPC systems usually looks like this:

  1. Start with a small-to-medium test case
    • Use a representative input that runs quickly enough to experiment.
    • Start with modest core counts or a single node to simplify data.
  2. Run a sampling profiler to find hot spots
    • Use a low-overhead tool (perf, VTune sampling, HPCToolkit).
    • Identify:
      • Top functions by percentage of total time
      • Whether time is spent in your code or in libraries
    • Decide: Is the issue algorithmic, computational, memory, or communication-related?
  3. Drill down with more detailed tools if needed
    • If communication seems expensive:
      • Use an MPI-aware tracing tool or MPI summary profiler.
    • If threads or OpenMP are the issue:
      • Use thread-aware profiling/tracing to look at imbalance or oversubscription.
    • If GPU kernels are involved:
      • Use GPU profilers (Nsight Compute/Systems) to look at kernel performance and memory traffic.
  4. Change one thing at a time
    • Apply a targeted optimization (e.g., change loop order, modify MPI domain decomposition).
    • Re-profile with the same tool and settings for comparison.
    • Look at relative changes, not just absolute timings.
  5. Scale up gradually
    • Once single-node performance looks reasonable, profile at a few larger scales.
    • Use tools that can handle more processes (often sampling-based).
    • Focus on metrics like:
      • Parallel efficiency
      • Communication/computation ratio
      • Load balance
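Steps 2 and 4 above can be sketched with perf (./app and small.in are placeholders; the point is to re-profile with identical settings and compare):

```shell
# Step 2: baseline profile of the small test case
perf record -g -o before.data ./app small.in
perf report -i before.data          # identify the top hot spots

# ...apply ONE targeted change and rebuild...

# Step 4: re-profile with the same tool and settings
perf record -g -o after.data ./app small.in
perf diff before.data after.data    # per-symbol change in overhead
```

Comparing the two profiles symbol by symbol shows relative changes directly, instead of relying on absolute timings that can drift between runs.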

Using Profiling Tools on HPC Systems

Most HPC clusters provide a curated set of tools via environment modules. A typical usage pattern is:

  1. Load the tool module
  2. Rebuild or relink the application if the tool requires instrumentation
  3. Run the application under the tool inside a batch job
  4. Collect the results and inspect them, often with a GUI on a login node or your own machine

Example (exact names vary by system):
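A hedged sketch (module and tool names here are placeholders; check your site's documentation and `module avail`):

```shell
# Discover which profiling tools the site provides (module avail prints to stderr)
module avail 2>&1 | grep -i -e perf -e vtune -e scorep
# Load one of them (name is hypothetical)
module load perf
# Profile inside a batch allocation, not on the login node
srun -n 1 perf record -g ./app
```

Running the profiler under the batch scheduler matters: login nodes are shared and their performance characteristics do not reflect the compute nodes.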
