Limits of Serial Computing
A single CPU core executing one instruction stream (serial computing) has fundamental limits:
- Physical limits to clock speed
You cannot keep making processors arbitrarily faster:
- Higher frequencies mean more power and heat.
- Power roughly scales with frequency and voltage; both have practical limits (a standard relation is sketched just below this list).
- Modern CPU clock speeds have plateaued compared to the 1990s–2000s trend.
- Diminishing returns from micro-optimizations
Even with clever algorithms and compiler optimizations, at some point:
- Each additional optimization yields smaller speedups.
- The remaining runtime is dominated by parts that are already “near optimal” on a single core.
- Problem sizes keep growing
Data volumes and model resolutions grow faster than single-core performance:
- Genomics: more sequencing data per experiment.
- Climate and CFD: finer grids, more variables, longer simulations.
- ML/AI: larger models, bigger datasets.
At some scale, a single core cannot finish the computation in acceptable time, regardless of how optimized the serial code is.
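To make the power point above concrete: a standard first-order model (not specific to any particular CPU) for the dynamic power of a chip is

$$ P_{\text{dyn}} \approx C \, V^2 \, f $$

where $C$ is the switched capacitance, $V$ the supply voltage, and $f$ the clock frequency. Because higher frequencies generally also require higher voltages, power and heat grow much faster than linearly with clock speed, which is one reason clock rates have plateaued.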
Time-to-Solution and Practical Deadlines
In HPC, the key question is often: how quickly must we get the result?
Examples where serial runtimes are unacceptable:
- Weather forecasting
- A 7-day forecast must be ready in hours, not weeks.
- Higher resolution, more physics → more computation.
Parallelism is essential to meet real-time deadlines.
- Engineering design cycles
- Many candidate designs must be evaluated (e.g., hundreds of CFD runs).
- Product timelines require results in days, not months.
- Parallel computing makes it possible to:
- Run one simulation faster, and/or
- Run many simulations in parallel (a minimal sketch of this approach appears at the end of this section).
- Clinical or emergency decision-making
- Medical imaging reconstruction, treatment planning, or epidemic modeling often have strict time constraints.
- Waiting days for a result is not acceptable; parallel computing reduces time-to-solution to something clinically useful.
If your serial runtime is, say, 6 months, no amount of “patience” will make
that useful for most real-world applications. You must use more hardware in parallel.
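As a minimal sketch of the "run many simulations in parallel" approach, the example below evaluates independent candidate designs concurrently using only Python's standard library; `evaluate_design` and its parameters are hypothetical stand-ins for a real solver such as a CFD run:

```python
from concurrent.futures import ProcessPoolExecutor

def evaluate_design(params):
    """Hypothetical stand-in for one expensive simulation (e.g., a CFD run)."""
    length, angle = params
    # Placeholder "physics": in practice this would call a real solver.
    return sum((length * i + angle) ** 0.5 for i in range(1_000_000))

if __name__ == "__main__":
    # A small parameter sweep; a real study might have hundreds of candidates.
    candidates = [(l, a) for l in (1.0, 1.5, 2.0) for a in (5.0, 10.0, 15.0)]

    # Each design is independent, so all of them can run at the same time,
    # spread across the available CPU cores.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(evaluate_design, candidates))

    best = min(zip(results, candidates))
    print("best score/design:", best)
```

Because the runs are independent and do not communicate, turnaround time drops roughly in proportion to the number of cores (or nodes) devoted to the sweep.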
Enabling Higher Resolution and Model Complexity
Parallel computing is not just about getting the same answer faster; it often enables:
- Better, more detailed answers that are completely infeasible in serial.
Examples:
- Spatial/temporal resolution
- Climate, weather, seismic, or fluid dynamics simulations:
- Doubling the resolution in each spatial dimension can increase work by $2^3 = 8$ times (3D); a worked example follows this list.
- Longer simulated times or smaller time steps add further multipliers.
- Parallelism allows:
- Much finer grids,
- More detailed physics,
- Longer simulation windows.
- More complex physics and couplings
- Multi-physics models (e.g., fluid–structure interaction, climate–biogeochemistry) are far more expensive than single-physics approximations.
- To include these extra processes while keeping runtimes manageable, you need parallel execution across many CPU cores and/or GPUs.
- Uncertainty quantification and parameter sweeps
- Exploring parameter space, running ensembles, or computing statistics over many runs may require 100s–1000s of simulations.
- Serially, this may be impossible in reasonable time.
- Parallelism allows:
- Many simulations running simultaneously,
- Systematic exploration of options rather than a few ad-hoc tests.
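As a worked example of the resolution point above: halving the grid spacing in all three spatial dimensions multiplies the number of cells by $2^3 = 8$, and an explicit time-stepping scheme then typically also needs about twice as many steps, so the total work grows by roughly

$$ 2^3 \times 2 = 16. $$

A run that took one day at the coarser resolution would then need on the order of two weeks serially; each further refinement compounds the factor, which is why finer grids are usually only reachable with parallel hardware.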
Without parallel computing, many modern scientific and engineering questions
would require either:
- Drastically simplified models, or
- Runtimes so long that the study would be impractical.
Working with Massive Data
Data sizes in HPC contexts can easily reach:
- Gigabytes to terabytes per dataset (e.g., 3D imaging, simulation outputs).
- Petabytes across many experiments or long-running simulations.
Parallel computing is needed for:
- Parallel data processing
- Dividing data into chunks processed by different cores or nodes:
- Image tiles, subvolumes, time segments, file partitions.
- Processing each part in parallel to meet realistic time goals (a minimal sketch follows below).
- Parallel I/O (concept covered in detail elsewhere)
- Reading/writing large datasets from/to storage in parallel so that I/O does not dominate total runtime.
- Parallel filesystems and libraries make this feasible, but they rely on parallel applications to take advantage of them.
- Real-time or near-real-time analysis
- Online analysis during experiments (e.g., at synchrotrons or telescopes) must keep up with data rates.
- Parallel computing on clusters or accelerators is often the only way to keep up with these rates.
Serial approaches quickly become I/O-bound and CPU-bound when data volumes grow, making parallelism necessary to handle the scale.
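As a minimal sketch of the chunking idea (the directory name, file pattern, and `process_chunk` function are hypothetical placeholders; real pipelines would work on image tiles, subvolumes, or time segments):

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def process_chunk(path):
    """Hypothetical per-chunk analysis: here, just a checksum of one partition."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return path.name, digest

if __name__ == "__main__":
    # One chunk per file partition of a large dataset (hypothetical layout).
    chunks = sorted(Path("dataset_partitions").glob("part-*.dat"))

    # Independent chunks are processed on different cores; the same pattern
    # extends to many nodes of a cluster for larger datasets.
    with ProcessPoolExecutor() as pool:
        for name, digest in pool.map(process_chunk, chunks):
            print(f"{name}: {digest[:16]}")
```

Because every chunk is handled independently, adding cores or nodes reduces processing time directly, provided the storage system can deliver the data fast enough; that is exactly where parallel I/O becomes important.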
Hardware Trends: Parallelism is the Default
Modern hardware is inherently parallel:
- Multi-core CPUs
- Even laptops commonly have multiple cores.
- Servers and HPC nodes can have dozens to hundreds of cores per node.
- SIMD/vector units and GPUs
- Hardware supports operating on multiple data elements at once.
- GPUs and accelerators are massively parallel by design.
- Cluster architectures
- HPC systems are collections of many nodes connected by high-speed networks.
- Each node has many cores and often accelerators.
To effectively use the performance that hardware vendors provide:
- Software must exploit:
- Thread-level parallelism,
- Process-level parallelism,
- Vector parallelism,
- Accelerator parallelism.
- Serial-only programs use just a tiny fraction of what the machine can do.
In other words, hardware evolution has shifted from “faster single cores” to “more parallelism”. To benefit from this evolution, applications must become parallel.
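As a small illustration of that gap (assuming NumPy is installed; the timings are indicative only): a pure Python loop performs one scalar operation at a time on one core, while handing the same matrix product to NumPy lets the underlying BLAS library use vector (SIMD) units and, typically, multiple threads:

```python
import time
import numpy as np

n = 200
a = np.random.rand(n, n)
b = np.random.rand(n, n)

def matmul_loops(x, y):
    """Pure-Python triple loop: one scalar multiply-add at a time, on one core."""
    size = x.shape[0]
    result = np.zeros((size, size))
    for i in range(size):
        for j in range(size):
            s = 0.0
            for k in range(size):
                s += x[i, k] * y[k, j]
            result[i, j] = s
    return result

t0 = time.perf_counter()
c_slow = matmul_loops(a, b)
t_slow = time.perf_counter() - t0

t0 = time.perf_counter()
c_fast = a @ b  # dispatches to an optimized BLAS: vectorized and usually multi-threaded
t_fast = time.perf_counter() - t0

assert np.allclose(c_slow, c_fast)
print(f"loops: {t_slow:.2f} s, library: {t_fast:.4f} s, speedup ~{t_slow / t_fast:.0f}x")
```

The point is not this particular library, but that serial-only loops leave the vector units, extra cores, and any accelerators completely idle.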
Economic and Energy Considerations
Parallel computing matters not only for speed but also for cost and energy:
- Throughput and resource utilization
- HPC centers aim to complete as many useful jobs as possible per day.
- Parallelizing workloads:
- Increases throughput,
- Reduces idle hardware,
- Makes better use of expensive infrastructure.
- Energy and power budgets
- Large systems have strict energy constraints.
- Getting results faster can sometimes:
- Reduce total energy by finishing quickly and allowing components to idle,
- Move work to more energy-efficient parallel units (e.g., GPUs).
- Cost of waiting versus cost of hardware
- In industry, researcher/engineer time is expensive.
- It can be cheaper to use more hardware in parallel and reduce turnaround time than to have people wait for long serial jobs.
Parallel computing allows a better balance between time, cost, and energy use, especially at large scales.
Making New Types of Workflows Possible
Some workflows fundamentally rely on parallelism; they are not just “faster serial”:
- Interactive exploration of complex models
- Steering simulations while they run,
- Live visualization and analysis.
These require results quickly enough to support user interaction.
- Real-time or streaming processing
- Continual data streams (from sensors, instruments, or simulations) processed as they arrive.
- Parallelism is needed to keep up with incoming rates.
- Large-scale AI/ML in scientific contexts
- Training large models, combining simulations with learning, and
processing scientific data with deep learning often require parallel
training across many devices.
Such workflows would simply not exist in useful form on purely serial systems.
Summary: Why Parallel Computing is Needed
Parallel computing is needed because:
- Physical and architectural limits prevent single-core performance from scaling indefinitely.
- Many real-world problems have time constraints that serial codes cannot meet.
- Higher resolution, more realistic, and more complex models require vast computation.
- Data volumes are too large for serial processing to be practical.
- Modern hardware is inherently parallel; ignoring parallelism wastes available performance.
- Parallelism improves throughput and can help manage costs and energy.
- Entire classes of modern workflows (real-time, interactive, large-scale AI) depend on parallel execution.
The rest of this section of the course focuses on how to organize and use this parallelism effectively (types of parallelism, scaling behavior, and related concepts), building on the motivation outlined here.