
Weak scaling

Understanding Weak Scaling

Weak scaling describes how the performance of a parallel program changes when you increase the problem size in proportion to the number of processing elements (cores, nodes, GPUs, etc.), while attempting to keep the amount of work per processing element constant.

In other words: if each core always does roughly the same amount of work, can we keep the time-to-solution about the same as we add more cores and make the overall problem bigger?

This is fundamentally different from strong scaling, where the total problem size is fixed and the number of processing elements increases.


The Core Idea

Assume:

• $T(1)$ is the runtime for a baseline problem on one processing element (PE).
• $T(P)$ is the runtime on $P$ PEs.

For weak scaling, you:

• grow the total problem size in proportion to $P$, so that each PE keeps the same local workload,
• measure the runtime for each value of $P$.

The ideal weak scaling behavior is a runtime that does not change as $P$ grows.

So, in an ideal world you would have:
$$
T(P) \approx T(1) \quad \text{for all } P
$$

Of course, in practice, communication, synchronization, and other overheads make $T(P)$ increase with $P$. Weak scaling analysis helps you understand how fast it increases and why.
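As a concrete (hypothetical) illustration: if one core solves a $128^3$ grid in 100 s, then under perfect weak scaling 8 cores would solve a $256^3$ grid, eight times the cells but still $128^3$ per core, in roughly the same 100 s.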


Weak Scaling Efficiency

To quantify weak scaling behavior, we typically define a weak scaling efficiency.

Let:

• $T(1)$ = runtime with 1 PE on the baseline per-PE workload.
• $T(P)$ = runtime with $P$ PEs, each holding that same per-PE workload.

A common definition is:
$$
E_\text{weak}(P) = \frac{T(1)}{T(P)}
$$

If $T(P) = T(1)$ for all $P$, then:
$$
E_\text{weak}(P) = 1 \quad \text{(or } 100\% \text{)}
$$

In practice, overheads make $T(P)$ grow with $P$, so $E_\text{weak}(P)$ falls below 1 (100%); how quickly it falls is the key diagnostic of weak scaling quality.

Some practitioners rescale or define efficiency slightly differently, but the basic idea is: compare runtime at 1 PE with runtime at $P$ PEs, holding work per PE constant.
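As a minimal sketch of this bookkeeping (the timing values below are hypothetical, purely for illustration):

```python
def weak_scaling_efficiency(timings):
    """Compute E_weak(P) = T(1) / T(P) from measured wall-clock times.

    timings: dict mapping P (number of PEs) -> runtime in seconds,
             measured with the per-PE workload held constant.
    """
    t1 = timings[1]  # baseline: one PE, one unit of work
    return {p: t1 / t for p, t in sorted(timings.items())}

# Hypothetical measurements, for illustration only.
timings = {1: 100.0, 2: 101.5, 4: 104.0, 8: 109.0, 16: 118.0}
for p, eff in weak_scaling_efficiency(timings).items():
    print(f"P = {p:3d}  T(P) = {timings[p]:6.1f} s  E_weak = {eff:6.1%}")
```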


Why Weak Scaling Matters in HPC

Many HPC applications are not about solving a fixed-size problem faster. Instead, they aim to:

• simulate larger physical domains,
• use finer resolution (more grid cells, more particles),
• include more detail as bigger machines become available.

In these cases, when larger machines become available, users increase the problem size instead of just reducing the time-to-solution. Weak scaling is the metric that answers:

If I use a bigger machine to solve a bigger problem (with the same work per core), does the time-to-solution stay roughly the same?

Good weak scaling means you can grow your scientific or engineering ambition (bigger, finer, more detailed simulations) without paying a large time penalty.


Typical Weak Scaling Scenarios

Some common patterns where weak scaling is the natural measure:

• Grid-based PDE solvers, where each rank owns a fixed block of cells and the global grid grows with the machine.
• Particle simulations, where the total particle count grows with the number of ranks.
• Data-parallel processing, where each process handles a fixed number of data items and the dataset grows with the job.

In all of these, the main question is: how much extra overhead appears when the system grows?


How to Design a Weak Scaling Experiment

To evaluate weak scaling for your own application, you usually:

  1. Choose a unit workload per process
    • Example: a grid of $128 \times 128 \times 128$ cells per rank.
    • Or: 1 million particles per rank.
    • Or: a fixed number of data items per process.
  2. Set up runs with increasing $P$
    • For $P = 1, 2, 4, 8, 16, \dots$
    • For each $P$, build a global problem such that each rank still gets the same local size (or as close as possible).

Example (3D grid): with $128^3$ cells per rank, $P = 1$ uses a $128^3$ global grid, $P = 8$ uses $256^3$, and $P = 64$ uses $512^3$, so every rank always holds the same $128^3$ block (a planning sketch for this appears after the list below).

  3. Measure the same metric for each run
    • Total wall-clock time.
    • Or time for the main compute loop (excluding initialization/I/O if needed, but be consistent).
  4. Compute weak scaling efficiency
    • Using $E_\text{weak}(P) = T(1) / T(P)$ or your preferred variant.
    • Plot $T(P)$ and/or $E_\text{weak}(P)$ versus $P$ on a graph.
  5. Analyze deviations
    • If $T(P)$ stays flat, weak scaling is excellent.
    • If $T(P)$ grows slowly (e.g., logarithmically), weak scaling may still be acceptable.
    • If $T(P)$ grows steeply, there are scaling bottlenecks to investigate.
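A small planning sketch in Python (the $128^3$ local block and the cubic rank counts are assumptions for illustration) that builds the run matrix for step 2:

```python
# Plan a weak scaling run matrix for a 3D grid with a fixed local block.
LOCAL = 128  # cells per rank along each dimension (assumed unit workload)

def global_edge(p, local=LOCAL):
    """Global grid edge length for p ranks, assuming p is a cube number
    so the domain splits into p equal cubic blocks."""
    edge = round(p ** (1 / 3))
    if edge ** 3 != p:
        raise ValueError(f"{p} ranks do not form a cubic decomposition")
    return edge * local

for p in (1, 8, 64, 512):
    n = global_edge(p)
    print(f"P = {p:4d}: global grid {n}^3, local block {LOCAL}^3 per rank")
```

The efficiency computation sketched in the previous section can then be applied to the timings collected from these runs.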

Interpreting Weak Scaling Behavior

In an idealized case with only perfectly parallel work and no overhead, $T(P)$ stays exactly at $T(1)$, and the efficiency curve sits flat at 100%.

Real-world programs deviate because of:

• communication whose cost grows with $P$ (especially global operations),
• synchronization and load imbalance, where the slowest rank sets the pace,
• contention for shared resources such as the network and the file system.

In weak scaling plots:

• a flat runtime curve indicates excellent weak scaling,
• a slowly rising curve (e.g., logarithmic growth from global reductions) is often still acceptable,
• a steeply rising curve points to a scaling bottleneck worth investigating.

Simple Cost Models for Weak Scaling

A very basic conceptual model for runtime under weak scaling:

$$
T(P) = T_\text{comp} + T_\text{comm}(P) + T_\text{sync}(P) + T_\text{other}(P)
$$

Where:

• $T_\text{comp}$ is the per-rank computation time, constant by construction under weak scaling,
• $T_\text{comm}(P)$ is the communication time,
• $T_\text{sync}(P)$ is time spent in synchronization (barriers, global reductions),
• $T_\text{other}(P)$ collects remaining overheads such as I/O.

For many stencil-like PDE solvers with simple nearest-neighbor communication, $T_\text{comm}$ is roughly constant (the halo exchanged per rank does not change), while $T_\text{sync}(P)$ often grows slowly, e.g., like $\log P$ for tree-based global reductions.

The goal of weak scaling optimization is to keep $T_\text{comm}(P)$ and $T_\text{sync}(P)$ as small as possible so that total runtime $T(P)$ grows slowly.
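A toy instance of this model (all coefficients are invented for illustration, not measured) shows how a $\log P$ synchronization term bends an otherwise flat weak scaling curve:

```python
import math

# Illustrative model coefficients (assumptions, not measured values).
T_COMP = 100.0   # per-rank compute time, constant under weak scaling (s)
T_COMM = 2.0     # nearest-neighbor halo exchange, roughly constant (s)
C_SYNC = 0.5     # cost per doubling for tree-based global reductions (s)

def model_runtime(p):
    """T(P) = T_comp + T_comm + T_sync(P), with T_sync ~ log2(P)."""
    return T_COMP + T_COMM + C_SYNC * math.log2(p)

t1 = model_runtime(1)
for p in (1, 4, 16, 64, 256, 1024):
    t = model_runtime(p)
    print(f"P = {p:5d}  T(P) = {t:6.1f} s  E_weak = {t1 / t:6.1%}")
```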


Weak vs Strong Scaling: When to Use Which

Even though the full comparison is covered in the broader parallel concepts, it’s important to understand when weak scaling is the right tool:

Use weak scaling when:

• the problem size naturally grows with the machine (finer grids, more particles, more data),
• you care about sustaining throughput per PE as the system grows.

Use strong scaling when:

• the problem size is fixed and the goal is a shorter time-to-solution for that particular problem.

Both analyses are complementary. A code might have:

• good weak scaling but poor strong scaling: it keeps up when the work grows with the machine, yet cannot accelerate a fixed problem once the per-rank work becomes too small,
• or the reverse, depending on where its overheads come from.

Understanding this helps you set realistic expectations and choose appropriate job sizes on HPC systems.
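To make the contrast concrete, here is a small sketch under an assumed overhead model (a fixed serial portion for the strong case, a mild $\log P$ overhead for the weak case; all numbers invented):

```python
import math

SERIAL = 5.0      # assumed non-parallelizable time per run (s)
PARALLEL = 95.0   # assumed parallelizable time for the baseline problem (s)

def strong_t(p):
    """Strong scaling: fixed total problem, parallel part divided by P."""
    return SERIAL + PARALLEL / p

def weak_t(p):
    """Weak scaling: problem grows with P, parallel work per PE constant,
    plus a small log(P) synchronization overhead."""
    return SERIAL + PARALLEL + 0.5 * math.log2(p)

for p in (1, 8, 64, 512):
    print(f"P = {p:3d}  strong speedup = {strong_t(1) / strong_t(p):6.2f}x"
          f"  weak efficiency = {weak_t(1) / weak_t(p):6.1%}")
```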


Practical Considerations for Weak Scaling on Clusters

When performing weak scaling studies on real clusters, several practical aspects matter:

Domain Decomposition Strategy

How you split the problem across processes affects weak scaling:

• the amount of halo (ghost) data exchanged per rank depends on the shape of each rank's subdomain,
• compact subdomains minimize the surface-to-volume ratio, and hence the communication per unit of compute.

Example: For a 3D grid, prefer a 3D (cubic) decomposition over 1D slabs: a cubic block exchanges six small, fixed-size faces, while a slab exchanges two full cross-sections of the global grid, and those cross-sections keep growing as the global problem grows (compare the sketch below).
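A quick sketch (with an assumed $128^3$ local block) comparing the halo traffic per rank for the two decompositions:

```python
def slab_halo(global_n):
    """1D slab decomposition: each interior rank exchanges two full
    global_n x global_n faces, which grow with the global problem."""
    return 2 * global_n * global_n

def cube_halo(local_n):
    """3D cubic decomposition: each interior rank exchanges six
    local_n x local_n faces, constant under weak scaling."""
    return 6 * local_n * local_n

LOCAL = 128  # cells per rank per dimension (assumption for illustration)
for p in (8, 64, 512):
    global_n = round(p ** (1 / 3)) * LOCAL
    print(f"P = {p:4d}: slab halo {slab_halo(global_n):>10,} cells, "
          f"cube halo {cube_halo(LOCAL):>8,} cells per rank")
```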

Network Topology and Placement

Weak scaling curves can change behavior as a job outgrows a node, a switch, or a rack: latencies increase, and traffic starts to share links with other parts of the job or with other jobs.

On many systems, process placement and pinning therefore influence measured weak scaling; keeping communicating ranks close together (same node, same switch) usually gives smoother curves.

I/O Effects

For large-scale weak scaling runs, total output typically grows with the problem size, so a shared parallel file system can become the dominant bottleneck even when compute and communication scale well.

Often, it is best to time I/O separately from the main compute loop, or to exclude it from the scaling metric entirely, so that file system behavior does not mask the scalability of the solver itself.

Common Pitfalls in Weak Scaling Experiments

When designing or interpreting weak scaling results, avoid:

  1. Changing more than one factor at a time
    • Keep software version, compiler flags, algorithmic parameters, etc. constant.
    • Otherwise, you cannot attribute changes in performance solely to scaling.
  2. Comparing different workloads per process
    • If the local problem size is not actually constant, you are no longer measuring pure weak scaling.
    • Ensure the workload is as consistent as possible across different $P$.
  3. Including startup and initialization artifacts
    • For small $P$, fixed overheads (startup, reading input) might dominate.
    • For large $P$, these may become negligible relative to compute.
    • Decide up front whether to include these in your metric, and be consistent.
  4. Ignoring load imbalance
    • Weak scaling assumes constant work per process.
    • If your decomposition yields uneven work, measured scaling reflects both algorithmic scalability and load imbalance.
  5. Misinterpreting “flat” runtime
    • Flat runtime is ideal in theory, but slight increases are expected.
    • You need to understand which contributions (communication, synchronization, I/O) are responsible, particularly at large scales.

Simple Example Scenario (Conceptual)

Consider a 2D heat equation solver on a structured grid, parallelized with domain decomposition:

• each rank owns a fixed-size block of the grid (constant work per rank),
• each time step, ranks exchange halo rows/columns of boundary values with their neighbors,
• occasionally, a global reduction checks convergence.

Expected behavior: runtime stays nearly flat as $P$ grows, because each rank's halo exchange involves a constant amount of data; the main slow growth comes from the global reductions, which typically cost on the order of $\log P$.

Such an example is often used in practice to characterize the scalability limits of both the application and the underlying HPC system.
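A minimal mpi4py sketch of this structure, assuming a 1D slab decomposition of the 2D grid for simplicity (a skeleton of the communication pattern, not a production solver):

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

NX, NY = 128, 512  # fixed local block per rank: the weak scaling unit
u = np.zeros((NX + 2, NY))           # local rows plus one halo row per side
u[1:-1, :] = np.random.rand(NX, NY)  # arbitrary initial data

up = rank - 1 if rank > 0 else MPI.PROC_NULL  # neighbor above (or none)
down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(100):
    # Halo exchange: constant data volume per rank, regardless of P.
    comm.Sendrecv(u[1, :], dest=up, sendtag=0,
                  recvbuf=u[-1, :], source=down, recvtag=0)
    comm.Sendrecv(u[-2, :], dest=down, sendtag=1,
                  recvbuf=u[0, :], source=up, recvtag=1)

    # Jacobi update on the interior: constant compute per rank.
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                              + u[1:-1, :-2] + u[1:-1, 2:])
    u = new

    # Occasional global reduction: the log(P)-like synchronization term.
    if step % 10 == 0:
        global_max = comm.allreduce(float(np.abs(u).max()), op=MPI.MAX)
```

Measured across increasing $P$ with this fixed local block, the runtime of the main loop would stay nearly flat, with the allreduce as the primary term that grows with $P$.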


When Weak Scaling Results Guide Application Design

Weak scaling studies often expose:

• communication patterns whose cost grows with scale (especially global or all-to-all operations),
• synchronization hot spots such as frequent barriers or reductions,
• I/O stages that fail to keep up as the problem grows.

Developers then respond by, for example:

• overlapping communication with computation,
• reducing the frequency or scope of global synchronization,
• restructuring I/O (aggregation, asynchronous or parallel I/O).

For users (non-developers), understanding weak scaling helps in choosing sensible job sizes, estimating runtimes for larger problems, and judging whether a bigger allocation will actually pay off.

Summary

Weak scaling asks whether the time-to-solution stays constant when the problem size grows in proportion to the number of processing elements, with the work per PE held fixed. It is quantified by $E_\text{weak}(P) = T(1)/T(P)$; the ideal is a flat runtime curve, and deviations are driven by communication, synchronization, load imbalance, and I/O. Careful experiment design, with a constant per-PE workload, consistent metrics, and one variable changed at a time, is essential for meaningful results.