
Strong scaling

Concept of Strong Scaling

Strong scaling describes how the time to solve a fixed-size problem changes as you increase the number of processing elements (cores, nodes, GPUs, etc.). The total amount of work (problem size) stays constant; only the parallel resources change.

Formally, strong scaling efficiency $E_s$ for $p$ processing elements is often defined as
$$
E_s(p) = \frac{T(1)}{p \, T(p)}
$$
where:

  • $T(1)$ is the runtime to solve the problem on a single processing element,
  • $T(p)$ is the runtime to solve the same problem on $p$ processing elements.

If doubling the number of cores halves the runtime (with fixed problem size), the program is said to strong-scale perfectly over that range.
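As a quick numerical illustration (using the same example timings as the table further below): if $T(1) = 100$ s and $T(4) = 28$ s, then
$$
E_s(4) = \frac{T(1)}{4 \, T(4)} = \frac{100}{4 \times 28} \approx 0.89,
$$
i.e., the four cores deliver about 89% of the ideal speedup.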

Speedup and Efficiency in Strong Scaling

The two core quantities used in strong scaling are:

  • Speedup: $S(p) = \frac{T(1)}{T(p)}$, how many times faster the fixed problem runs on $p$ processing elements.
  • Efficiency: $E_s(p) = \frac{S(p)}{p} = \frac{T(1)}{p \, T(p)}$, the fraction of ideal speedup actually achieved.

Interpretation:

  • Ideal (linear) scaling corresponds to $S(p) = p$ and $E_s(p) = 1$.
  • Efficiency below 1 means that part of the added resources is spent on parallel overhead rather than useful work.

In practice, $T(1)$ may be replaced by $T(p_{\text{min}})$, the runtime at the smallest core count for which the problem fits in memory.
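A minimal sketch of how these quantities might be computed from measured runtimes, using $T(p_{\text{min}})$ as the baseline when no single-core run is available (one common convention; the timings below are placeholders taken from the example table further down):

```python
# Minimal sketch: compute speedup and efficiency from measured runtimes.
# The timings are placeholders; replace them with your own measurements.

times = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0}  # cores -> runtime T(p) in seconds

p_min = min(times)       # smallest measured core count (1 here)
t_ref = times[p_min]     # baseline runtime T(p_min)

for p in sorted(times):
    speedup = t_ref / times[p]                     # S(p) = T(p_min) / T(p)
    efficiency = (p_min * t_ref) / (p * times[p])  # E_s(p) = p_min * T(p_min) / (p * T(p))
    print(f"{p:3d} cores: S = {speedup:5.2f}, E = {efficiency:5.2f}")
```

With a single-core baseline ($p_{\text{min}} = 1$) this reduces to the formulas above.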

How to Perform a Strong Scaling Study

A strong scaling study is an experiment:

  1. Choose a fixed problem size
    • Same input size, same grid size, same number of particles, same dataset, etc.
    • Ensure that the problem fits into memory for all tested core counts.
  2. Select a range of core counts
    • E.g., $p = 1, 2, 4, 8, 16, 32, \dots$
    • Stay within practical limits (e.g., avoid using the entire machine for a tiny problem).
  3. Measure runtime for each $p$
    • Use consistent measurement methods (e.g., application timer, job runtime).
    • Run each configuration multiple times to average out noise.
  4. Compute speedup and efficiency
    • Use the formulas above.
    • Often $T(1)$ is replaced with $T(p_{\text{min}})$ if $T(1)$ is too slow or infeasible.
  5. Plot results
    • Speedup vs. cores: compare to ideal line $S_{\text{ideal}}(p) = p$.
    • Efficiency vs. cores: see how quickly it drops as you add resources.
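
The plotting step might look roughly like the following sketch (matplotlib is assumed to be available; the timing data are placeholders):

```python
# Sketch: plot measured speedup against the ideal line S_ideal(p) = p.
# Assumes matplotlib is installed; the timing data are placeholders.
import matplotlib.pyplot as plt

cores = [1, 2, 4, 8]
times = [100.0, 52.0, 28.0, 17.0]        # T(p) in seconds, fixed problem size

speedup = [times[0] / t for t in times]  # S(p) = T(1) / T(p)

plt.plot(cores, cores, "k--", label="ideal: S(p) = p")
plt.plot(cores, speedup, "o-", label="measured")
plt.xlabel("cores p")
plt.ylabel("speedup S(p)")
plt.legend()
plt.savefig("strong_scaling_speedup.png")
```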

Typical Strong Scaling Behavior

As you increase $p$ for a fixed-size problem:

  • The runtime $T(p)$ decreases, at first roughly in proportion to $p$.
  • The speedup $S(p)$ grows, but falls increasingly below the ideal line $S_{\text{ideal}}(p) = p$.
  • The efficiency $E_s(p)$ drops, because overheads grow while the work per core shrinks.
  • Beyond some core count, adding more resources yields little or no further speedup; the range of core counts over which efficiency remains acceptably high is the useful strong scaling range for that problem size.

This “useful strong scaling range” is important in choosing how many cores to request on an HPC cluster: using more cores than this range wastes resources without significant speedup.
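
One way to make this range concrete is to pick an efficiency threshold and find the largest measured core count that still meets it; a sketch (the 0.70 threshold is an arbitrary example, not a standard value):

```python
# Sketch: find the largest core count whose efficiency stays above a threshold.
# The 0.70 threshold is an arbitrary example; pick one that fits your needs.

times = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0}  # placeholder measurements (seconds)
threshold = 0.70

t1 = times[1]
useful = [p for p, t in times.items() if t1 / (p * t) >= threshold]
print("largest core count above threshold:", max(useful))
```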

Factors Limiting Strong Scaling

For a fixed problem size, the following effects become more pronounced as you add cores:

  • Serial fraction: parts of the code that do not parallelize take up a growing share of the runtime (the essence of Amdahl's law).
  • Communication and synchronization overhead: more processes exchange smaller messages and wait at barriers more often.
  • Load imbalance: the fixed amount of work becomes harder to divide evenly across many cores.
  • Shrinking work per core: per-core computation decreases until overheads dominate.
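
The first effect can be quantified with Amdahl's law: if a fraction $s$ of the runtime is inherently serial, the achievable speedup is bounded by
$$
S_{\text{Amdahl}}(p) = \frac{1}{s + \frac{1 - s}{p}}, \qquad \lim_{p \to \infty} S_{\text{Amdahl}}(p) = \frac{1}{s}.
$$
For example, with an (illustrative) serial fraction of $s = 0.05$, the speedup can never exceed 20, regardless of how many cores are used.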

Strong Scaling vs. Practical Resource Usage

Strong scaling analysis helps answer questions like:

  • How many cores should I request for this particular problem size?
  • At what point does adding more cores stop paying off?
  • How much wall-clock time does a given core count actually save?

Practical observations:

  • Beyond the useful scaling range, the total core-hours consumed ($p \cdot T(p)$) keep growing even though the wall-clock time barely improves.
  • The "best" core count depends on whether you optimize for the shortest time to solution or for efficient use of your allocation.

On shared systems with fair-share policies, using more cores than needed for modest speed gains can be considered poor resource usage.
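
A rough way to see this trade-off is to tabulate wall-clock time against total core-hours consumed; a sketch using the placeholder timings from above:

```python
# Sketch: compare wall-clock time saved against the core-hours spent.
# Timings are placeholders; the cost here is simply p * T(p) converted to hours.

times = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0}  # cores -> seconds

for p, t in sorted(times.items()):
    core_hours = p * t / 3600.0
    print(f"{p:3d} cores: wall-clock {t:6.1f} s, cost {core_hours:.3f} core-hours")
```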

Designing Applications for Strong Scaling

To improve strong scaling for a fixed problem size, common strategies include:

  • Reducing the serial fraction of the code.
  • Minimizing and aggregating communication, and overlapping it with computation where possible.
  • Reducing synchronization points and other parallel overheads.
  • Improving load balance so that all cores finish their share of the work at roughly the same time.

Example Strong Scaling Experiment Template

Below is a simple template to organize a strong scaling experiment (independent of specific parallel programming models):

  1. Setup
    • Fix problem size (e.g., grid $1000 \times 1000$, or $10^7$ particles).
  2. Run commands (conceptually):
   # 1 core
   srun -n 1 ./my_app --size 1000 > log_p1.txt
   # 2 cores
   srun -n 2 ./my_app --size 1000 > log_p2.txt
   # 4 cores
   srun -n 4 ./my_app --size 1000 > log_p4.txt
   # 8 cores
   srun -n 8 ./my_app --size 1000 > log_p8.txt
  3. Extract runtimes (from logs or job info) into a table (a parsing sketch is shown after this template):

| Cores $p$ | Time $T(p)$ [s] | Speedup $S(p)$ | Efficiency $E_s(p)$ |
|-----------|-----------------|----------------|----------------------|
| 1 | 100 | 1.00 | 1.00 |
| 2 | 52 | 1.92 | 0.96 |
| 4 | 28 | 3.57 | 0.89 |
| 8 | 17 | 5.88 | 0.74 |

  4. Interpret
    • Up to 8 cores, efficiency is still reasonably high.
    • If at 16 cores efficiency dropped to 0.4, you might choose 8 cores for production runs of this specific problem size.
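
The extraction and tabulation step (step 3 above) might be automated along these lines; this is a sketch that assumes each log file contains a line such as `Elapsed time: 17.0`, so adapt the pattern to whatever your application actually prints:

```python
# Sketch: collect runtimes from the log files written by the srun commands above
# and tabulate speedup and efficiency. Assumes each log contains a line like
# "Elapsed time: 17.0" -- adjust the regular expression to your application's output.
import re

core_counts = [1, 2, 4, 8]
times = {}

for p in core_counts:
    with open(f"log_p{p}.txt") as f:
        match = re.search(r"Elapsed time:\s*([\d.]+)", f.read())
        times[p] = float(match.group(1))

t1 = times[core_counts[0]]
print(f"{'p':>4} {'T(p) [s]':>10} {'S(p)':>8} {'E_s(p)':>8}")
for p in core_counts:
    s = t1 / times[p]
    print(f"{p:>4} {times[p]:>10.1f} {s:>8.2f} {s / p:>8.2f}")
```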

When Strong Scaling Is (and Is Not) Appropriate

Strong scaling is particularly useful when:

  • The problem size is fixed by the science (a given dataset, grid, or simulation length) and you mainly want a shorter time to solution.
  • You need to choose a sensible core count for production runs of that fixed problem.

It is less appropriate when:

  • You intend to grow the problem size along with the resources; weak scaling is then the more relevant measure.
  • The fixed problem is so small that parallel overheads dominate even at modest core counts.

Understanding strong scaling gives you a key tool for making efficient, informed decisions about how to run your codes on HPC systems.
