
Amdahl’s Law

Intuition Behind Amdahl’s Law

Amdahl’s Law is a simple way to reason about the limits of speedup when you parallelize a program. It answers a concrete question: if you speed up the parallel part of a program by using more processors, how much faster can the whole program become?

The key idea is that every real program has some parts that you can parallelize and some parts that must remain serial. No matter how many cores or nodes you add, the serial part still takes the same time. This serial part becomes a bottleneck and limits the maximum achievable speedup.

In the context of HPC, this is central to planning how far it is worthwhile to scale an application. Amdahl’s Law tells you when adding more processors stops paying off because the serial fraction dominates.

Decomposing a Program: Serial and Parallel Fractions

To apply Amdahl’s Law, you conceptually split your program’s execution time into two parts.

Suppose that, running on a single processor, your program spends a fraction $f$ of its time in code that is inherently serial. This might be things like:

Input and output that must happen in sequence.

Initialization and finalization logic.

Certain data dependencies that cannot be broken.

The remaining fraction, $1 - f$, is the part that you can ideally parallelize. In the best case, this parallelizable part can be divided evenly among $P$ processors.

If $T_1$ is the total execution time on a single processor, then:

The serial part takes time $f T_1$.

The parallel part takes time $(1 - f) T_1$.

On $P$ processors, the serial part does not speed up and still takes $f T_1$. The parallel part ideally speeds up by a factor of $P$, so it takes $\dfrac{(1 - f) T_1}{P}$.
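To make this decomposition concrete, here is a minimal Python sketch (the function and variable names are purely illustrative) that splits a single-processor run time into its serial and parallel components and computes the ideal time on $P$ processors:

```python
def parallel_time(t1, f, p):
    """Ideal execution time on p processors, given the single-processor
    time t1 and the serial fraction f."""
    serial_part = f * t1               # this part never speeds up
    parallel_part = (1 - f) * t1 / p   # ideally divided evenly among p processors
    return serial_part + parallel_part

# Example: a 100-second run with a 10% serial fraction on 8 processors.
print(parallel_time(100.0, 0.10, 8))   # 10.0 + 11.25 = 21.25 seconds
```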

Deriving Amdahl’s Law

We define speedup as the ratio of the time taken on one processor to the time taken on $P$ processors. Using the decomposition above, the parallel execution time on $P$ processors is

$$
T_P = f T_1 + \frac{(1 - f) T_1}{P}.
$$

The speedup $S(P)$ is

$$
S(P) = \frac{T_1}{T_P}.
$$

Substitute $T_P$:

$$
S(P) = \frac{T_1}{f T_1 + \frac{(1 - f) T_1}{P}}.
$$

We can factor out $T_1$ in the denominator and cancel it:

$$
S(P) = \frac{1}{f + \frac{1 - f}{P}}.
$$

This is the classic form of Amdahl’s Law.

Amdahl’s Law (speedup formula)
If a fraction $f$ of a program is serial and a fraction $1 - f$ is perfectly parallel, then the speedup on $P$ processors is
$$
S(P) = \frac{1}{f + \frac{1 - f}{P}}.
$$

This formula assumes ideal conditions for the parallel part. It does not account for communication, synchronization, or load imbalance. Those effects will make real-world speedup smaller than this ideal.
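The formula translates directly into a one-line helper. The following Python sketch (the name amdahl_speedup is illustrative, not taken from any library) computes the ideal speedup under exactly these assumptions:

```python
def amdahl_speedup(f, p):
    """Ideal speedup on p processors with serial fraction f, ignoring
    communication, synchronization, and load-imbalance overheads."""
    return 1.0 / (f + (1.0 - f) / p)

print(amdahl_speedup(0.10, 8))    # ~4.71
print(amdahl_speedup(0.05, 16))   # ~9.14
```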

Maximum Speedup and the Serial Bottleneck

One of the most important consequences of Amdahl’s Law comes from asking what happens as $P$ becomes very large. In the formula

$$
S(P) = \frac{1}{f + \frac{1 - f}{P}},
$$

the term $\frac{1 - f}{P}$ goes to zero as $P \to \infty$. Hence, the maximum theoretical speedup is

$$
\lim_{P \to \infty} S(P) = \frac{1}{f}.
$$

Maximum theoretical speedup
Even with infinitely many processors, the maximum speedup is
$$
S_{\max} = \frac{1}{f},
$$
where $f$ is the serial fraction of the original program.

This means that if 10% of your program is serial ($f = 0.1$), then no matter how many processors you use, you cannot exceed a speedup of

$$
S_{\max} = \frac{1}{0.1} = 10.
$$

If you want a speedup larger than 10, you must further reduce the serial fraction, not just add processors.

This is why optimization of serial bottlenecks can be as important as adding more parallel resources in HPC codes. If you leave even a modest serial section unimproved, it sets a hard ceiling on what parallelization can achieve.
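A short loop makes this ceiling visible. The sketch below repeats the hypothetical amdahl_speedup helper so it runs on its own; with a 10% serial fraction the speedup creeps toward, but never reaches, $1/f = 10$:

```python
def amdahl_speedup(f, p):
    """Ideal Amdahl speedup on p processors with serial fraction f."""
    return 1.0 / (f + (1.0 - f) / p)

for p in (2, 8, 32, 128, 1024, 10**6):
    print(f"P = {p:>7}: S = {amdahl_speedup(0.10, p):.2f}")
```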

Numerical Examples

It is helpful to plug in some numbers to see how sensitive Amdahl’s Law is to the serial fraction.

Suppose $f = 0.05$, so 5% of the program is serial and 95% is parallelizable.

On $P = 4$ processors:

$$
S(4) = \frac{1}{0.05 + \frac{0.95}{4}}
= \frac{1}{0.05 + 0.2375}
= \frac{1}{0.2875} \approx 3.48.
$$

On $P = 16$ processors:

$$
S(16) = \frac{1}{0.05 + \frac{0.95}{16}}
= \frac{1}{0.05 + 0.059375}
= \frac{1}{0.109375} \approx 9.14.
$$

On $P = 128$ processors:

$$
S(128) = \frac{1}{0.05 + \frac{0.95}{128}}
= \frac{1}{0.05 + 0.007421875}
= \frac{1}{0.057421875} \approx 17.4.
$$

The maximum speedup, even with infinitely many processors, would be

$$
S_{\max} = \frac{1}{0.05} = 20.
$$

By the time you reach 128 processors, you have already achieved most of the possible speedup. Adding more processors beyond that brings only small additional gains.

If we decrease the serial fraction to $f = 0.01$, then

$$
S_{\max} = \frac{1}{0.01} = 100.
$$

At the same processor counts:

For $P = 16$:

$$
S(16) = \frac{1}{0.01 + \frac{0.99}{16}}
= \frac{1}{0.01 + 0.061875}
= \frac{1}{0.071875} \approx 13.9.
$$

For $P = 128$:

$$
S(128) = \frac{1}{0.01 + \frac{0.99}{128}}
= \frac{1}{0.01 + 0.007734375}
= \frac{1}{0.017734375} \approx 56.4.
$$

With only a 1% serial fraction, the system can benefit from many more processors before hitting the practical ceiling. This is why HPC practitioners spend significant effort on reducing the serial portion of critical codes.
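The worked numbers above are easy to reproduce programmatically, which is handy when you want to tabulate other serial fractions or processor counts. A minimal, self-contained sketch:

```python
def amdahl_speedup(f, p):
    """Ideal Amdahl speedup on p processors with serial fraction f."""
    return 1.0 / (f + (1.0 - f) / p)

for f in (0.05, 0.01):
    print(f"serial fraction f = {f}, ceiling 1/f = {1.0 / f:.0f}")
    for p in (4, 16, 128):
        print(f"  P = {p:>3}: S = {amdahl_speedup(f, p):.2f}")
```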

Parallel Efficiency and Amdahl’s Law

Parallel efficiency measures how well you are using your processors. It is defined as the speedup divided by the number of processors:

$$
E(P) = \frac{S(P)}{P}.
$$

Using Amdahl’s formula for $S(P)$:

$$
E(P) = \frac{1}{P \left( f + \frac{1 - f}{P} \right )}
= \frac{1}{Pf + (1 - f)}.
$$

This tells you how the serial fraction and the processor count together affect efficiency. For fixed $f$, as $P$ increases, the term $Pf$ grows and efficiency decreases.

For example, with $f = 0.02$ and $P = 64$:

$$
E(64) = \frac{1}{64 \cdot 0.02 + (1 - 0.02)}
= \frac{1}{1.28 + 0.98}
= \frac{1}{2.26} \approx 0.442.
$$

So the efficiency at 64 processors is about 44%. Roughly speaking, less than half of the potential computing power is converted into useful speedup because the serial fraction limits the scaling.

In HPC usage, a drop in efficiency as you increase the processor count is expected. Amdahl’s Law lets you estimate how quickly that efficiency decline will happen given your serial fraction.
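The efficiency formula is just as easy to evaluate. This sketch (again with illustrative names) reproduces the $f = 0.02$, $P = 64$ example and shows how efficiency keeps falling as $P$ grows:

```python
def amdahl_efficiency(f, p):
    """Ideal parallel efficiency E(P) = S(P) / P with serial fraction f."""
    return 1.0 / (p * f + (1.0 - f))

print(amdahl_efficiency(0.02, 64))    # ~0.442, i.e. about 44% efficiency
print(amdahl_efficiency(0.02, 256))   # ~0.164, efficiency keeps dropping
```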

Using Amdahl’s Law in Practice

In real HPC projects, Amdahl’s Law is used mainly as a guiding principle and a back-of-the-envelope estimator.

If you can measure or estimate the serial fraction of your code, you can predict whether scaling to more nodes will pay off. A typical workflow is:

Measure or profile an application on a small number of processors to identify how much time is spent in serial sections.

Use Amdahl’s formula to project how the speedup will behave as you increase $P$.

Decide up to which processor count it makes sense to scale, given your desired efficiency and queue policies.

This can prevent wasteful use of a cluster, where jobs might request thousands of cores even though the speedup has already saturated at a few hundred because of the serial bottleneck.
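One way to turn measured timings from the first step into a projection is to invert Amdahl’s formula: from a measured speedup $S$ on $P$ processors, the implied serial fraction is $f = \frac{1/S - 1/P}{1 - 1/P}$. Applied to real timings this gives an effective serial fraction that also absorbs overheads such as communication. Below is a sketch with made-up timing numbers (600 s on one processor, 60 s on 16), purely for illustration:

```python
def estimated_serial_fraction(t1, tp, p):
    """Effective serial fraction implied by measured times t1 (1 processor)
    and tp (p processors), obtained by inverting Amdahl's formula."""
    s = t1 / tp                                  # measured speedup
    return (1.0 / s - 1.0 / p) / (1.0 - 1.0 / p)

def projected_speedup(f, p):
    """Ideal Amdahl speedup for serial fraction f on p processors."""
    return 1.0 / (f + (1.0 - f) / p)

f_eff = estimated_serial_fraction(600.0, 60.0, 16)
print(f"effective serial fraction: {f_eff:.3f}")                                    # 0.040
print(f"projected speedup on 256 processors: {projected_speedup(f_eff, 256):.1f}")  # ~22.9
print(f"ceiling 1/f: {1.0 / f_eff:.1f}")                                             # 25.0
```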

Amdahl’s Law also suggests where optimization effort is most valuable. Reducing $f$ by improving or restructuring the serial parts of the program can significantly raise the maximum possible speedup. For example, if you can change $f$ from 0.05 to 0.02, you raise $S_{\max}$ from 20 to 50. In many cases, improving the serial phase of I/O, initialization, or serial communication patterns can unlock much better scaling than hardware changes alone.

Idealization and Practical Limitations

The original form of Amdahl’s Law assumes that the parallel part of the program scales perfectly. In reality, several factors break this idealization:

Parallel work may not partition evenly across processors.

Communication and synchronization costs grow as you add more processors, which effectively increases the serial fraction or adds extra overhead not included in the basic model.

Memory and cache effects can change performance characteristics at different scales.

Operating system noise and contention on shared resources introduce additional delays.

All of these effects reduce actual speedup compared to the ideal curve predicted by the law. In practice, you can think of the measured speedup as matching Amdahl’s curve but with a larger effective serial fraction that incorporates these overheads.

Therefore, Amdahl’s Law is best seen as an upper bound and a conceptual framework. It sets expectations and highlights that parallelism alone cannot overcome serial limitations. For more realistic modeling when the problem size grows with the machine, a separate chapter covers Gustafson’s Law, which uses a different assumption about how problems scale with processor counts.

Summary of Key Takeaways

Amdahl’s Law expresses how the serial fraction of a program limits its parallel speedup. The central formula

$$
S(P) = \frac{1}{f + \frac{1 - f}{P}}
$$

shows that even a small serial portion can sharply restrict the benefits of increasing processor count. As $P$ grows large, the maximum speedup converges to $1 / f$, so improving or reducing the serial part is essential.

In HPC environments, Amdahl’s Law helps decide when it is worthwhile to use more resources, estimate how efficient a given scaling will be, and prioritize code improvements that lower the serial fraction. It reminds us that high performance on large parallel systems depends not only on adding more cores but also on careful algorithm and software design that minimizes unavoidable serial work.
