Intuition and Motivation
Amdahl’s Law answers a practical question:
If I speed up part of my program (e.g., by using more cores), how much faster can the entire program become?
Even if most of your code is parallelized, any remaining serial (non-parallel) portion limits the total speedup. This law is fundamental when deciding:
- Whether further parallelization is worth the effort
- How many cores/nodes it makes sense to use
- Where to focus optimization work
It is primarily a strong scaling concept (fixed total problem size, increasing resources).
Basic Formulation
Assume:
- A fraction $f$ of your program is serial: it cannot be parallelized.
- A fraction $(1 - f)$ is perfectly parallelizable: can be spread across many processors with no extra overhead.
- You run on $N$ processors.
Normalize the original execution time (on 1 processor) to $T_1 = 1$. Then:
- Time spent in serial part: $f$ (cannot be reduced with more processors)
- Time spent in parallel part on $N$ processors: $\dfrac{1-f}{N}$
Total time on $N$ processors:
$$
T_N = f + \frac{1-f}{N}
$$
The speedup $S(N)$ is defined as:
$$
S(N) = \frac{T_1}{T_N} = \frac{1}{f + \frac{1-f}{N}}
$$
As $N \to \infty$:
$$
\lim_{N \to \infty} S(N) = \frac{1}{f}
$$
So the maximum possible speedup is bounded by the inverse of the serial fraction.
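This bound is easy to explore numerically. A minimal sketch of the formula (the helper name `amdahl_speedup` is ours, not standard):

```python
def amdahl_speedup(f, n):
    """Speedup S(N) = 1 / (f + (1 - f) / N) for serial fraction f on n processors."""
    return 1.0 / (f + (1.0 - f) / n)

# With a 10% serial fraction, 8 processors give well under 8x:
print(amdahl_speedup(0.10, 8))      # ~4.71
# Even 1024 processors stay below the 1/f = 10x ceiling:
print(amdahl_speedup(0.10, 1024))   # ~9.91
```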
Interpreting the Serial Fraction
The serial fraction $f$ includes:
- Truly sequential operations (e.g., some I/O, initialization, certain algorithms)
- Code paths not parallelized (either by design or oversight)
- Overheads that cannot be parallelized away (some global coordination, setup/teardown)
Example interpretations:
- $f = 0.1$ (10% serial) ⇒ maximum speedup is $1/0.1 = 10\times$ no matter how many processors are used.
- $f = 0.01$ (1% serial) ⇒ maximum speedup is $100\times$.
This illustrates how even a small serial portion can drastically cap scalability.
Numerical Examples
Assume $f = 0.05$ (5% serial, 95% parallel):
- On $N = 4$ processors:
$$
T_4 = 0.05 + \frac{0.95}{4} = 0.05 + 0.2375 = 0.2875
$$
$$
S(4) = \frac{1}{0.2875} \approx 3.48
$$
- On $N = 16$ processors:
$$
T_{16} = 0.05 + \frac{0.95}{16} = 0.05 + 0.059375 = 0.109375
$$
$$
S(16) = \frac{1}{0.109375} \approx 9.14
$$
- As $N \to \infty$, maximum speedup:
$$
S_{\max} = \frac{1}{0.05} = 20
$$
Notice how going from 4 to 16 processors gives a noticeable improvement, but you are already far from the theoretical max (20×), and gains diminish as you increase $N$.
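The worked numbers above can be reproduced with a few lines of Python (the helper `total_time` is our naming):

```python
def total_time(f, n):
    # Normalized runtime T_N = f + (1 - f) / N, with T_1 = 1
    return f + (1.0 - f) / n

f = 0.05
for n in (4, 16):
    t = total_time(f, n)
    print(f"N={n:2d}: T={t:.6f}, S={1.0 / t:.2f}")
# N= 4: T=0.287500, S=3.48
# N=16: T=0.109375, S=9.14
print("S_max =", 1.0 / f)  # 20.0
```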
Diminishing Returns
Amdahl’s Law explains why speedup flattens as you add processors. Rewriting the speedup formula makes this clear:
$$
S(N) = \frac{1}{f + \frac{1-f}{N}} =
\frac{N}{Nf + (1-f)}
$$
Two key observations:
- For small $N$, the term $\frac{1-f}{N}$ is still large, so adding processors significantly reduces runtime.
- For large $N$, the serial term $f$ dominates, and the $\frac{1-f}{N}$ term becomes negligible. Extra processors mostly sit idle during the serial sections.
In practice, this means:
- There is a point beyond which adding more cores is wasteful for a fixed problem size.
- Efficient HPC usage requires choosing a reasonable number of cores per job, not always “as many as possible.”
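The flattening is easy to see by tabulating the speedup for successive doublings of the core count (this sketch assumes $f = 0.05$ as in the earlier example):

```python
def S(f, n):
    # Amdahl speedup S(N) = 1 / (f + (1 - f) / N)
    return 1.0 / (f + (1.0 - f) / n)

f = 0.05
prev = S(f, 1)
for n in (2, 4, 8, 16, 32, 64, 128, 256):
    s = S(f, n)
    print(f"N={n:3d}: S={s:6.2f}  ({s / prev:.2f}x over N={n // 2})")
    prev = s
```

Early doublings nearly double the speedup (about 1.9x from 1 to 2 cores), but by the time you go from 128 to 256 cores the gain has shrunk to under 10%.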
Practical Use in HPC Planning
Amdahl’s Law is often used to:
1. Estimate Realistic Speedup
Given an estimated serial fraction $f$ and planned processor count $N$, you can estimate your expected speedup:
- Measure or guess $f$ from profiling
- Plug into $S(N) = \dfrac{1}{f + \dfrac{1-f}{N}}$
This helps manage expectations and justify resource requests.
2. Decide Where to Optimize
For maximizing speedup, reducing $f$ is often more impactful than further parallelizing already parallel code.
For example:
- Halving the serial fraction from $f = 0.1$ to $f = 0.05$ raises $S_{\max}$ from $10\times$ to $20\times$.
- Meanwhile, simply doubling $N$ past a certain point might yield only a tiny improvement.
Conclusion: Parallelizing or refactoring the serial bottlenecks can be more beneficial than throwing more hardware at the problem.
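This trade-off can be checked directly. The sketch below (with an assumed starting point of $N = 64$ cores and $f = 0.10$) compares the speedup gained by doubling the core count against the gain from halving the serial fraction:

```python
def S(f, n):
    # Amdahl speedup S(N) = 1 / (f + (1 - f) / N)
    return 1.0 / (f + (1.0 - f) / n)

n, f = 64, 0.10
gain_double_n = S(f, 2 * n) - S(f, n)   # option A: add hardware
gain_halve_f  = S(f / 2, n) - S(f, n)   # option B: optimize the serial part
print(f"doubling N: +{gain_double_n:.2f}")  # ~ +0.58
print(f"halving f:  +{gain_halve_f:.2f}")   # ~ +6.65
```

At this scale, halving the serial fraction buys roughly ten times more speedup than doubling the hardware.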
3. Choosing a Core Count
Using the law, you can find a rough “sweet spot” where additional cores give only marginal speedup.
For instance, you might stop scaling when each doubling of cores yields less than, say, 10% additional speedup. You can compute:
- $S(N)$ and $S(2N)$
- Compare the gain: $\dfrac{S(2N) - S(N)}{S(N)}$
If the relative gain is small, asking for more cores may not be worth the queue time and resource usage.
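This stopping rule can be sketched as a short loop (the 10% threshold is an illustrative choice, not a standard value):

```python
def S(f, n):
    # Amdahl speedup S(N) = 1 / (f + (1 - f) / N)
    return 1.0 / (f + (1.0 - f) / n)

f = 0.05
threshold = 0.10   # stop when doubling cores adds less than 10% speedup
n = 1
while (S(f, 2 * n) - S(f, n)) / S(f, n) >= threshold:
    n *= 2
print("suggested core count:", n)   # 128 for f = 0.05
```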
Extensions and Limitations
Amdahl’s Law provides a best-case picture under simplifying assumptions:
- The parallel part scales perfectly linearly with $N$.
- Parallelization introduces no overhead.
- The problem size is fixed (strong scaling).
In reality:
- Communication, synchronization, and I/O introduce extra overheads.
- Load imbalance and memory contention can increase the effective serial fraction.
- Actual speedup is usually worse than Amdahl’s prediction.
Nevertheless, the law is still extremely useful as:
- A conceptual model of the scaling limit
- A quick back-of-the-envelope tool for reasoning about performance
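One way to see why real speedups fall short is to add an overhead term to the model. The sketch below assumes a coordination cost growing as $c \log_2 N$; both the functional form and the constant $c$ are illustrative assumptions, not part of Amdahl's Law itself:

```python
import math

def speedup(f, n, c=0.0):
    # c * log2(N) is an assumed coordination overhead; c = 0 recovers pure Amdahl.
    overhead = c * math.log2(n)
    return 1.0 / (f + (1.0 - f) / n + overhead)

f, c = 0.05, 0.002
for n in (16, 64, 256, 1024):
    print(f"N={n:4d}: pure={speedup(f, n):5.2f}  with overhead={speedup(f, n, c):5.2f}")
```

With the overhead term, speedup peaks at a finite core count and then declines, whereas the pure model only flattens; this matches the observation that actual speedup is usually worse than Amdahl's prediction.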
(Weak scaling and more optimistic scaling under growing problem sizes are treated using Gustafson’s Law in a separate chapter.)
Key Takeaways
- Amdahl’s Law describes the maximum speedup possible for a fixed-sized problem when only a fraction of the code can be parallelized.
- The serial fraction $f$ limits speedup: maximum speedup is $1/f$ regardless of processor count.
- Even small serial portions (e.g., 5–10%) can severely limit scalability on large core counts.
- The law highlights diminishing returns with increasing processor count and underscores the importance of reducing serial bottlenecks.
- In HPC practice, Amdahl’s Law guides decisions about parallelization effort, core counts, and realistic performance expectations.