
Benchmarking applications

What Benchmarking Is (and Is Not)

Benchmarking in HPC is the systematic, repeatable measurement of application performance under controlled conditions.

You are not just running your code once and looking at how long it took. Proper benchmarking aims to:

Benchmarking answers questions like:

It does not replace correctness testing or debugging.

Types of Benchmarks in HPC

Microbenchmarks

Microbenchmarks focus on a very narrow aspect of performance:

These do not tell you directly how your whole application will perform, but they give insight into potential bottlenecks.
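As a minimal illustration, the sketch below times one narrow operation, a large array sum, to estimate sustained memory bandwidth. The array size and repetition count are arbitrary assumptions; dedicated suites such as STREAM or the OSU micro-benchmarks are far more careful, but the overall structure is the same.

    import time
    import numpy as np

    N = 50_000_000              # array length (assumed large enough to exceed the caches)
    REPS = 10                   # number of timed repetitions

    a = np.ones(N, dtype=np.float64)
    a.sum()                     # warm-up pass: page faults and lazy allocation settle here

    times = []
    for _ in range(REPS):
        t0 = time.perf_counter()
        a.sum()                 # the narrow operation under test: streaming reads
        times.append(time.perf_counter() - t0)

    best = min(times)
    bytes_read = N * 8          # float64 = 8 bytes per element
    print(f"best: {best:.4f} s, ~{bytes_read / best / 1e9:.2f} GB/s sustained read bandwidth")

Reporting the best of several repetitions is a common convention for microbenchmarks, because it approximates the hardware limit with the least interference from system noise.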

Kernel / Mini-Application Benchmarks

Here you benchmark a representative kernel or mini-app that captures the dominant computation and communication pattern of your full application, but in a simplified form.

Use cases:

Full Application Benchmarks

This is your actual production application (or a close equivalent), run with realistic input data and configuration.

Characteristics:

Use cases:

Synthetic vs Realistic Workloads

Both have value:

Designing a Meaningful Benchmark

A useful benchmark must be:

Define the Benchmark Scenario Clearly

Specify:

All of this should be documented so that someone else (or you in 6 months) can reproduce the benchmark.
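One lightweight way to make that documentation automatic is to store a description of the environment next to every result. The sketch below is only one possible layout; the field names and the output file are assumptions, not a standard format.

    import json
    import platform
    import subprocess
    from datetime import datetime, timezone

    def benchmark_metadata():
        """Collect a minimal description of the benchmark environment."""
        meta = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "hostname": platform.node(),
            "machine": platform.machine(),
            "python": platform.python_version(),
        }
        # Record the code version if this runs inside a git checkout (assumed).
        try:
            meta["git_commit"] = subprocess.check_output(
                ["git", "rev-parse", "HEAD"], text=True).strip()
        except (subprocess.CalledProcessError, FileNotFoundError):
            meta["git_commit"] = "unknown"
        return meta

    # Store the metadata together with the measured results (placeholder value).
    record = {"metadata": benchmark_metadata(), "results": {"elapsed_s": 12.3}}
    with open("benchmark_record.json", "w") as f:
        json.dump(record, f, indent=2)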

Choose Metrics That Matter

Common metrics include:

Pick metrics aligned with your goals:

Control Variables vs Tunables

Clearly separate:

Change one category of tunables at a time to isolate their impact.

Strong and Weak Scaling Benchmarks

Scaling concepts themselves are covered elsewhere, but here is how they shape benchmarking setups.

Strong Scaling Benchmarks

Goal: measure how time decreases as you increase resources for a fixed problem size.

Setup:

Use cases:
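The post-processing of a strong scaling benchmark usually reduces to a speedup and parallel efficiency table. A small sketch, assuming the wall times per core count have already been measured (the numbers below are placeholders):

    def strong_scaling_table(times_by_cores):
        """Speedup and parallel efficiency for a fixed problem size.

        times_by_cores maps core count -> measured wall time in seconds.
        """
        base = min(times_by_cores)
        t_base = times_by_cores[base]
        for p in sorted(times_by_cores):
            speedup = t_base / times_by_cores[p]
            efficiency = speedup * base / p
            print(f"{p:5d} cores  {times_by_cores[p]:8.2f} s  "
                  f"speedup {speedup:5.2f}  efficiency {efficiency:6.1%}")

    # Placeholder measurements, illustrative only.
    strong_scaling_table({1: 100.0, 2: 52.0, 4: 27.5, 8: 15.0})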

Weak Scaling Benchmarks

Goal: measure how the run time behaves as you increase resources and the problem size together, so that the work per resource stays constant.

Setup:

Use cases:
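For weak scaling the natural summary is the weak scaling efficiency, the baseline time divided by the time at larger core counts, since the ideal curve is flat. A sketch with placeholder numbers:

    def weak_scaling_efficiency(times_by_cores):
        """Weak scaling efficiency: ideally the run time stays flat as cores grow.

        times_by_cores maps core count -> wall time with the work per core held fixed.
        """
        t_base = times_by_cores[min(times_by_cores)]
        for p in sorted(times_by_cores):
            print(f"{p:5d} cores  {times_by_cores[p]:8.2f} s  "
                  f"efficiency {t_base / times_by_cores[p]:6.1%}")

    # Placeholder measurements, illustrative only.
    weak_scaling_efficiency({1: 100.0, 4: 104.0, 16: 112.0, 64: 131.0})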

Controlling the Benchmark Environment

To make benchmarks comparable and interpretable, the environment must be as stable as possible.

System Load and Interference

Cluster conditions vary:

Mitigation strategies:

Process and Thread Affinity

To reduce variability:

Affinity impacts:
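Pinning itself is usually configured through the batch system, the MPI launcher, or OpenMP environment variables, but it is worth verifying. On Linux, a quick sanity check is to have each process report which cores it is actually allowed to run on:

    import os
    import socket

    # Linux-only: the set of CPU ids this process may be scheduled on.
    allowed = sorted(os.sched_getaffinity(0))
    print(f"{socket.gethostname()} pid {os.getpid()}: "
          f"{len(allowed)} allowed CPUs -> {allowed}")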

Repeating Runs and Statistical Treatment

Due to noise, single measurements are often misleading.

Basic practice:

Interpretation:
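A minimal way to treat the repeated measurements, assuming the per-run wall times of one configuration have already been collected:

    import statistics

    def summarize(times):
        """Summary statistics for repeated runs of the same configuration."""
        return {
            "runs": len(times),
            "min": min(times),
            "median": statistics.median(times),
            "mean": statistics.mean(times),
            "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
        }

    # Placeholder wall times from five identical runs, illustrative only.
    print(summarize([12.4, 12.1, 13.0, 12.2, 12.3]))

Reporting the median or minimum together with the spread is usually more robust against outliers caused by system noise than reporting a single mean.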

Instrumentation and Timing Techniques

Benchmarking requires reliable timing for the portions of the code that matter.

What to Time

Avoid timing everything from process start to exit if:

Common practice:

Timing Tools and APIs

Depending on your application and environment, you might use:

Consistency is more important than the specific API chosen. Always:
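In a Python driver the same idea can look like the sketch below: a monotonic, high-resolution timer wrapped tightly around the region of interest, with setup and I/O left outside. In compiled MPI or OpenMP codes, MPI_Wtime() or omp_get_wtime() typically play the role that time.perf_counter() plays here.

    import time
    from contextlib import contextmanager

    @contextmanager
    def timed_region(label, results):
        """Time one code region with a monotonic, high-resolution clock."""
        t0 = time.perf_counter()
        try:
            yield
        finally:
            results[label] = time.perf_counter() - t0

    timings = {}

    data = list(range(1_000_000))           # setup and input handling: deliberately not timed

    with timed_region("compute", timings):  # only the region we care about is timed
        total = sum(x * x for x in data)

    print(timings)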

Benchmark Input Design and Warm-Up

Representative Inputs

The performance of many codes depends strongly on input characteristics, such as:

Guidelines:

Warm-Up Runs

First runs often include:

Methods:

This helps the benchmark reflect steady-state performance.
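A common pattern is to execute the workload a few times untimed, then time only the remaining repetitions; the warm-up and repetition counts below are arbitrary assumptions.

    import time

    def run_benchmark(workload, warmup=2, repetitions=5):
        """Discard warm-up iterations, then time the steady-state repetitions."""
        for _ in range(warmup):
            workload()                      # untimed: caches, allocators and JITs settle

        times = []
        for _ in range(repetitions):
            t0 = time.perf_counter()
            workload()
            times.append(time.perf_counter() - t0)
        return times

    # Illustrative workload only.
    print(run_benchmark(lambda: sum(i * i for i in range(200_000))))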

Benchmarking Different Configurations

Benchmarking is often used to compare:

Hardware Comparisons

When comparing different systems:

Interpretation tips:

Software and Compiler Comparisons

When comparing builds:

Document:

Algorithmic Variants

When benchmarking algorithmic changes:

This allows fair comparisons when different algorithms have different convergence rates.
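One way to keep such comparisons fair is to time each variant until it reaches the same convergence criterion, rather than for a fixed number of iterations. The sketch below assumes a hypothetical solver interface where one call performs an iteration and returns the current residual:

    import time

    def time_to_solution(solver_step, tolerance=1e-8, max_iters=100_000):
        """Time a solver until its residual drops below a fixed tolerance."""
        t0 = time.perf_counter()
        for it in range(1, max_iters + 1):
            if solver_step() < tolerance:
                return time.perf_counter() - t0, it
        raise RuntimeError("did not converge within max_iters")

    # Hypothetical solver whose residual shrinks by 10% per iteration, for illustration.
    state = {"residual": 1.0}
    def fake_step():
        state["residual"] *= 0.9
        return state["residual"]

    print(time_to_solution(fake_step, tolerance=1e-6))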

Interpreting and Presenting Benchmark Results

Basic Data Handling

For each configuration (e.g., problem size, core count, compiler), you should at least have:

Avoid:

Visualization

Common useful plots:

Guidelines:
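As one example, a strong scaling plot typically shows run time against core count on logarithmic axes, together with an ideal-scaling reference line. The sketch below assumes matplotlib is available and uses placeholder data.

    import matplotlib.pyplot as plt

    # Placeholder strong scaling measurements, illustrative only.
    cores = [1, 2, 4, 8, 16, 32]
    times = [100.0, 52.0, 27.5, 15.0, 8.9, 6.1]
    ideal = [times[0] / p for p in cores]

    plt.figure()
    plt.plot(cores, times, "o-", label="measured")
    plt.plot(cores, ideal, "--", label="ideal scaling")
    plt.xscale("log", base=2)
    plt.yscale("log")
    plt.xlabel("Cores")
    plt.ylabel("Wall time [s]")
    plt.legend()
    plt.savefig("strong_scaling.png", dpi=150)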

Identifying Performance Regimes and Limits

From benchmarking data, you can often identify regimes:

Recognizing which regime you are in informs which optimizations are likely to pay off.

Benchmarking Best Practices

To make your benchmarks robust and useful over time:

Used systematically, benchmarking becomes a powerful tool to:
