Understanding High-Performance Computing
High-Performance Computing (HPC) refers to using large amounts of computing power to solve problems that are too big, too slow, or too complex for ordinary computers. In practice, this usually means:
- Many processors (CPUs and/or GPUs) working together
- Large memory and storage systems
- High-speed networks connecting components
- Specialized software that can exploit this hardware
HPC is not one specific technology; it is a way of combining hardware, software, and algorithms to push performance beyond what a typical desktop or laptop can achieve.
Core Idea: Scale Up and Scale Out
HPC systems achieve high performance in two complementary ways:
- Scale up: Make individual components more powerful
  Faster processors, more cores per CPU, more memory per node, more capable GPUs.
- Scale out: Use many components in parallel
  Networks of nodes (computers) working together on one problem, often spanning tens to hundreds of thousands of cores in total.
Most modern HPC systems combine both: each node is quite powerful on its own, and many such nodes are linked together into a cluster or supercomputer.
What Makes HPC Different from Everyday Computing
Several features distinguish HPC from normal computing environments:
- Performance as the primary goal
  The main objective is to complete large computations as quickly and efficiently as possible, often within fixed time or resource limits.
- Parallel execution
  Instead of one processor doing all the work, many processors work simultaneously on different parts of the problem. This requires algorithms that can be split into parallel tasks (a short sketch follows this list).
- Scale of resources
  HPC jobs may use:
  - Thousands of CPU cores
  - Terabytes of memory
  - Petabytes of storage
  - High-speed interconnects and parallel filesystems
- Batch-style usage
  Users typically do not run programs interactively on the main compute resources. Instead, they submit jobs to a scheduler and let the system decide when and where to run them.
- Focus on scientific and technical workloads
  HPC is primarily used for numerical simulations, data analysis, and modeling tasks, rather than office productivity or general web services.
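To make the idea of parallel execution concrete, here is a minimal Python sketch (not taken from any particular HPC system): one artificial workload, counting primes below a limit, is divided into ranges that several worker processes handle simultaneously on a single machine. The function names, the limit, and the worker count are illustrative choices.

```python
# Minimal sketch of parallel execution: one problem (counting primes below a
# limit) is split into independent ranges handled by several worker processes
# at the same time. The workload and numbers are illustrative only.
from concurrent.futures import ProcessPoolExecutor


def is_prime(n: int) -> bool:
    """Trial-division primality test (deliberately simple, to give workers real work)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(bounds: tuple[int, int]) -> int:
    """Count primes in the half-open range [start, stop)."""
    start, stop = bounds
    return sum(1 for n in range(start, stop) if is_prime(n))


if __name__ == "__main__":
    limit = 200_000
    workers = 4

    # Split the full range into equal chunks, one per worker.
    step = limit // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]

    # Each chunk is processed by a separate process; results are combined at the end.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(count_primes, chunks))

    print(f"Primes below {limit}: {total}")
```

The same split-and-combine pattern underlies much larger HPC codes, where the workers span many nodes rather than the cores of a single machine.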
Types of Problems Suited to HPC
HPC is used when problems have at least one of these characteristics:
- Compute-intensive
  Require an enormous number of operations, for example:
  - Simulating physical systems (fluids, weather, materials)
  - Solving large systems of equations
  - Running large ensembles of simulations or parameter sweeps (a short sketch follows below)
- Data-intensive
  Involve very large datasets that must be processed, stored, and moved efficiently:
  - Processing images from telescopes or microscopes
  - Analyzing genomic or particle physics data
  - Training large machine learning models
- Time-critical
  Results are needed quickly:
  - Weather forecasting and climate projections
  - Real-time decision support (e.g., emergency planning)
  - Rapid design and testing cycles in engineering
If a task completes on a laptop in a reasonable time, it usually does not require HPC. HPC becomes valuable when a single machine falls short, whether in compute speed, memory capacity, or data throughput.
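As an illustration of the ensemble and parameter-sweep case mentioned above, the sketch below runs several independent variants of a toy model in parallel. The model, the parameter values, and the process count are invented placeholders standing in for a real simulation code.

```python
# Sketch of a parameter sweep: many independent runs of the same (toy) model,
# each with different parameters, executed in parallel. "run_model" is a
# hypothetical stand-in for a real simulation.
from itertools import product
from multiprocessing import Pool


def run_model(params: tuple[float, float]) -> tuple[tuple[float, float], float]:
    """Toy 'simulation': iterate a simple growth/damping rule for a fixed number of steps."""
    rate, damping = params
    value = 1.0
    for _ in range(100_000):
        value += rate * value - damping * value * value
    return params, value


if __name__ == "__main__":
    # Cartesian product of parameter values: one independent task per combination.
    rates = [0.0001, 0.0002, 0.0003]
    dampings = [0.00005, 0.0001]
    sweep = list(product(rates, dampings))

    # Each parameter combination runs as a separate process; no task depends on another.
    with Pool(processes=4) as pool:
        for params, result in pool.map(run_model, sweep):
            print(f"rate={params[0]}, damping={params[1]} -> {result:.4f}")
```

Because the runs are independent, adding more workers (or more nodes) shortens the sweep almost proportionally, which is exactly the situation where HPC resources pay off.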
HPC Systems at a High Level
While details vary widely, most HPC systems share a common high-level structure:
- Compute nodes
  The workhorses where user computations run. Each node typically has:
  - Multiple CPU cores
  - Often one or more GPUs or other accelerators
  - Local memory and sometimes local storage
- High-speed interconnect
  A specialized network links nodes together so they can cooperate on parallel tasks and share data efficiently (see the sketch below).
- Shared storage
  Large central filesystems store input data, simulation outputs, software, and user files.
- Management and login nodes
  Front-end systems where users log in, edit code, compile programs, and submit jobs. These nodes orchestrate use of the shared compute resources.
The combination of many such systems worldwide, from small university clusters to national supercomputers, forms the global HPC ecosystem.
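To give a flavour of how one program spans several compute nodes, the sketch below uses MPI through the mpi4py package, which is one common option rather than the only one. Each process (rank) may run on a different node, and data moves between ranks over the interconnect. The example assumes an MPI library and mpi4py are installed; it is normally launched through the scheduler or a command such as `mpirun -n 4 python hello_mpi.py`, where the file name and process count are arbitrary.

```python
# Minimal MPI sketch (via mpi4py, assuming an MPI library is installed):
# each process (rank) may run on a different compute node, and the ranks
# cooperate over the high-speed interconnect.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()           # this process's ID within the job
size = comm.Get_size()           # total number of processes in the job
node = MPI.Get_processor_name()  # hostname of the node this rank runs on

print(f"Rank {rank} of {size} running on {node}")

# Each rank contributes a partial value; rank 0 receives the combined result
# over the interconnect.
partial = rank + 1
total = comm.reduce(partial, op=MPI.SUM, root=0)

if rank == 0:
    print(f"Sum of contributions from all ranks: {total}")
```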
HPC vs. Related Concepts
High-Performance Computing overlaps with, but is not identical to, several other computing paradigms:
- Supercomputing
  Often used synonymously with HPC, especially for the largest, most powerful systems ranked in global lists. HPC can also refer to smaller clusters, not only top-tier supercomputers.
- Cluster computing
  A common architecture used in HPC: multiple connected machines acting as one resource. HPC clusters are typically optimized for performance and scientific workloads.
- Cloud computing
  Offers on-demand access to computing resources over the internet. Cloud platforms can be used for HPC-style workloads, but traditional HPC centers differ in how they are structured, accessed, and optimized.
- High-throughput computing (HTC)
  Focuses on running many relatively small tasks over long periods (often independent of each other). HPC often focuses more on tightly coupled, large parallel jobs, though in practice many systems support both styles.
Why Parallelism Is Central to HPC
A defining feature of HPC is explicit parallelism. Because single-core performance improvements have slowed, further performance increases come mainly from:
- Using more cores per node
- Using more nodes in parallel
- Exploiting specialized accelerators like GPUs
- Using vector instructions within each core
To benefit from HPC systems, applications must be structured to exploit these forms of parallelism. This often requires:
- Identifying independent tasks or data partitions
- Minimizing communication and synchronization between tasks
- Organizing data so that each processing unit can operate efficiently
The details of how this is achieved are addressed in later chapters; here it is enough to note that parallelism is not optional for effective use of HPC resources.
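As a small, self-contained illustration of those three points, the following Python sketch partitions an index range into independent chunks, lets each worker operate on its own partition, and sends back only one number per worker, so communication stays minimal. The workload and the chunk count are invented for illustration.

```python
# Sketch of the partition-then-combine pattern: the problem is split into
# independent index ranges, each worker handles only its own partition, and
# the only communication is one small number per worker at the end.
from multiprocessing import Pool


def partial_sum_of_squares(bounds: tuple[int, int]) -> float:
    """Process one partition; in a real code each worker would read its own
    slice of the data (e.g. from shared storage) instead of generating it."""
    start, stop = bounds
    return sum((i * 0.001) ** 2 for i in range(start, stop))


if __name__ == "__main__":
    n = 1_000_000
    workers = 4

    # Partition the index space into contiguous, non-overlapping ranges.
    step = (n + workers - 1) // workers
    partitions = [(i, min(i + step, n)) for i in range(0, n, step)]

    # Workers compute independently; only one float per worker is sent back.
    with Pool(processes=workers) as pool:
        partials = pool.map(partial_sum_of_squares, partitions)

    print(f"Sum of squares for {n} values: {sum(partials):.3f}")
```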
Typical Goals When Using HPC
People turn to HPC to:
- Solve larger problems
  Increase the resolution of simulations, use more detailed models, or process more data than would fit on a single machine.
- Get results faster
  Reduce the wall-clock time from days to hours or minutes, enabling more experimentation and iteration.
- Improve accuracy and realism
  Use more complex physics, statistics, or models that would otherwise be too expensive computationally.
- Explore parameter spaces
  Run many variants of a computation (e.g., different initial conditions or model parameters) in parallel.
- Enable new science and engineering
  Perform experiments in silico that would be impossible, too dangerous, or too expensive in the real world.
Who Uses HPC
HPC is used across many domains, including:
- Physical sciences (physics, chemistry, astronomy, earth sciences)
- Engineering (aerospace, automotive, civil, mechanical)
- Life sciences and medicine (genomics, drug discovery, medical imaging)
- Finance and economics (risk modeling, option pricing)
- Data science and AI (training and deploying large models)
- Public policy and planning (climate impacts, disaster response modeling)
Despite this diversity, users share a common need: they must perform computations beyond the capabilities of ordinary computing resources.
Summary
High-Performance Computing is the practice of using large-scale, parallel computing systems to tackle problems that are too big, too slow, or too complex for standard computers. It relies on:
- Many processors working together in parallel
- Specialized hardware and networks
- Software and algorithms designed to exploit these resources
The rest of this course explains the hardware, software, and programming models that make HPC possible, and how you can use them effectively in your own work.