Understanding High-Performance Computing
High-Performance Computing (HPC) refers to using large amounts of computing power to solve problems that are too big, too slow, or too complex for ordinary computers. In practice, this usually means:
- Many processors (CPUs and/or GPUs) working together
- Large memory and storage systems
- High-speed networks connecting components
- Specialized software that can exploit this hardware
HPC is not one specific technology; it is a way of combining hardware, software, and algorithms to push performance beyond what a typical desktop or laptop can achieve.
Core Idea: Scale Up and Scale Out
HPC systems achieve high performance in two complementary ways:
- Scale up: Make individual components more powerful
  Faster processors, more cores per CPU, more memory per node, more capable GPUs.
- Scale out: Use many components in parallel
  Networks of nodes (computers) working together on one problem, often spanning tens to hundreds of thousands of cores in total.
Most modern HPC systems combine both: each node is quite powerful on its own, and many such nodes are linked together into a cluster or supercomputer.
What Makes HPC Different from Everyday Computing
Several features distinguish HPC from normal computing environments:
- Performance as the primary goal
  The main objective is to complete large computations as quickly and efficiently as possible, often within fixed time or resource limits.
- Parallel execution
  Instead of one processor doing all the work, many processors work simultaneously on different parts of the problem. This requires algorithms that can be split into parallel tasks (a short sketch follows this list).
- Scale of resources
  HPC jobs may use:
  - Thousands of CPU cores
  - Terabytes of memory
  - Petabytes of storage
  - High-speed interconnects and parallel filesystems
- Batch-style usage
  Users typically do not run programs interactively on the main compute resources. Instead, they submit jobs to a scheduler and let the system decide when and where to run them.
- Focus on scientific and technical workloads
  HPC is primarily used for numerical simulations, data analysis, and modeling tasks, rather than office productivity or general web services.
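To make the idea of parallel execution concrete, here is a minimal Python sketch (not taken from any particular HPC system): one artificial workload, counting primes below a limit, is divided into ranges that several worker processes handle simultaneously on a single machine. The function names, the limit, and the worker count are illustrative choices.

```python
# Minimal sketch of parallel execution: one problem (counting primes below a
# limit) is split into independent ranges handled by several worker processes
# at the same time. The workload and numbers are illustrative only.
from concurrent.futures import ProcessPoolExecutor


def is_prime(n: int) -> bool:
    """Trial-division primality test (deliberately simple, to give workers real work)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(bounds: tuple[int, int]) -> int:
    """Count primes in the half-open range [start, stop)."""
    start, stop = bounds
    return sum(1 for n in range(start, stop) if is_prime(n))


if __name__ == "__main__":
    limit = 200_000
    workers = 4

    # Split the full range into equal chunks, one per worker.
    step = limit // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]

    # Each chunk is processed by a separate process; results are combined at the end.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(count_primes, chunks))

    print(f"Primes below {limit}: {total}")
```

The same split-and-combine pattern underlies much larger HPC codes, where the workers span many nodes rather than the cores of a single machine.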
Types of Problems Suited to HPC
HPC is used when problems have at least one of these characteristics:
- Compute-intensive
  Require an enormous number of operations, for example:
  - Simulating physical systems (fluids, weather, materials)
  - Solving large systems of equations
  - Running large ensembles of simulations or parameter sweeps (a short sketch follows below)
- Data-intensive
  Involve very large datasets that must be processed, stored, and moved efficiently:
  - Processing images from telescopes or microscopes
  - Analyzing genomic or particle physics data
  - Training large machine learning models
- Time-critical
  Results are needed quickly:
  - Weather forecasting and climate projections
  - Real-time decision support (e.g., emergency planning)
  - Rapid design and testing cycles in engineering
If a task completes on a laptop in a reasonable time, it usually does not require HPC. HPC becomes valuable when a single machine falls short, whether in compute speed, memory capacity, or data throughput.
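As an illustration of the ensemble and parameter-sweep case mentioned above, the sketch below runs several independent variants of a toy model in parallel. The model, the parameter values, and the process count are invented placeholders standing in for a real simulation code.

```python
# Sketch of a parameter sweep: many independent runs of the same (toy) model,
# each with different parameters, executed in parallel. "run_model" is a
# hypothetical stand-in for a real simulation.
from itertools import product
from multiprocessing import Pool


def run_model(params: tuple[float, float]) -> tuple[tuple[float, float], float]:
    """Toy 'simulation': iterate a simple growth/damping rule for a fixed number of steps."""
    rate, damping = params
    value = 1.0
    for _ in range(100_000):
        value += rate * value - damping * value * value
    return params, value


if __name__ == "__main__":
    # Cartesian product of parameter values: one independent task per combination.
    rates = [0.0001, 0.0002, 0.0003]
    dampings = [0.00005, 0.0001]
    sweep = list(product(rates, dampings))

    # Each parameter combination runs as a separate process; no task depends on another.
    with Pool(processes=4) as pool:
        for params, result in pool.map(run_model, sweep):
            print(f"rate={params[0]}, damping={params[1]} -> {result:.4f}")
```

Because the runs are independent, adding more workers (or more nodes) shortens the sweep almost proportionally, which is exactly the situation where HPC resources pay off.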
HPC Systems at a High Level
While details vary widely, most HPC systems share a common high-level structure:
- Compute nodes
  The workhorses where user computations run. Each node typically has:
  - Multiple CPU cores
  - Often one or more GPUs or other accelerators
  - Local memory and sometimes local storage
- High-speed interconnect
  A specialized network links nodes together so they can cooperate on parallel tasks and share data efficiently (see the sketch below).
- Shared storage
  Large central filesystems store input data, simulation outputs, software, and user files.
- Management and login nodes
  Front-end systems where users log in, edit code, compile programs, and submit jobs. These nodes orchestrate use of the shared compute resources.
The combination of many such systems worldwide, from small university clusters to national supercomputers, forms the global HPC ecosystem.
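To give a flavour of how one program spans several compute nodes, the sketch below uses MPI through the mpi4py package, which is one common option rather than the only one. Each process (rank) may run on a different node, and data moves between ranks over the interconnect. The example assumes an MPI library and mpi4py are installed; it is normally launched through the scheduler or a command such as `mpirun -n 4 python hello_mpi.py`, where the file name and process count are arbitrary.

```python
# Minimal MPI sketch (via mpi4py, assuming an MPI library is installed):
# each process (rank) may run on a different compute node, and the ranks
# cooperate over the high-speed interconnect.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()           # this process's ID within the job
size = comm.Get_size()           # total number of processes in the job
node = MPI.Get_processor_name()  # hostname of the node this rank runs on

print(f"Rank {rank} of {size} running on {node}")

# Each rank contributes a partial value; rank 0 receives the combined result
# over the interconnect.
partial = rank + 1
total = comm.reduce(partial, op=MPI.SUM, root=0)

if rank == 0:
    print(f"Sum of contributions from all ranks: {total}")
```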
HPC vs. Related Concepts
High-Performance Computing overlaps with, but is not identical to, several other computing paradigms:
- Supercomputing
  Often used synonymously with HPC, especially for the largest, most powerful systems ranked in global lists. HPC can also refer to smaller clusters, not only top-tier supercomputers.
- Cluster computing
  A common architecture used in HPC: multiple connected machines acting as one resource. HPC clusters are typically optimized for performance and scientific workloads.
- Cloud computing
  Offers on-demand access to computing resources over the internet. Cloud platforms can be used for HPC-style workloads, but traditional HPC centers differ in how they are structured, accessed, and optimized.
- High-throughput computing (HTC)
  Focuses on running many relatively small tasks over long periods (often independent of each other). HPC often focuses more on tightly coupled, large parallel jobs, though in practice many systems support both styles.
Why Parallelism Is Central to HPC
A defining feature of HPC is explicit parallelism. Because single-core performance improvements have slowed, further performance increases come mainly from:
- Using more cores per node
- Using more nodes in parallel
- Exploiting specialized accelerators like GPUs
- Using vector instructions within each core
To benefit from HPC systems, applications must be structured to exploit these forms of parallelism. This often requires:
- Identifying independent tasks or data partitions
- Minimizing communication and synchronization between tasks
- Organizing data so that each processing unit can operate efficiently
The details of how this is achieved are addressed in later chapters; here it is enough to note that parallelism is not optional for effective use of HPC resources.
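As a small, self-contained illustration of those three points, the following Python sketch partitions an index range into independent chunks, lets each worker operate on its own partition, and sends back only one number per worker, so communication stays minimal. The workload and the chunk count are invented for illustration.

```python
# Sketch of the partition-then-combine pattern: the problem is split into
# independent index ranges, each worker handles only its own partition, and
# the only communication is one small number per worker at the end.
from multiprocessing import Pool


def partial_sum_of_squares(bounds: tuple[int, int]) -> float:
    """Process one partition; in a real code each worker would read its own
    slice of the data (e.g. from shared storage) instead of generating it."""
    start, stop = bounds
    return sum((i * 0.001) ** 2 for i in range(start, stop))


if __name__ == "__main__":
    n = 1_000_000
    workers = 4

    # Partition the index space into contiguous, non-overlapping ranges.
    step = (n + workers - 1) // workers
    partitions = [(i, min(i + step, n)) for i in range(0, n, step)]

    # Workers compute independently; only one float per worker is sent back.
    with Pool(processes=workers) as pool:
        partials = pool.map(partial_sum_of_squares, partitions)

    print(f"Sum of squares for {n} values: {sum(partials):.3f}")
```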
Typical Goals When Using HPC
People turn to HPC to:
- Solve larger problems
  Increase the resolution of simulations, use more detailed models, or process more data than would fit on a single machine.
- Get results faster
  Reduce the wall-clock time from days to hours or minutes, enabling more experimentation and iteration.
- Improve accuracy and realism
  Use more complex physics, statistics, or models that would otherwise be too expensive computationally.
- Explore parameter spaces
  Run many variants of a computation (e.g., different initial conditions or model parameters) in parallel.
- Enable new science and engineering
  Perform experiments in silico that would be impossible, too dangerous, or too expensive in the real world.
Who Uses HPC
HPC is used across many domains, including:
- Physical sciences (physics, chemistry, astronomy, earth sciences)
- Engineering (aerospace, automotive, civil, mechanical)
- Life sciences and medicine (genomics, drug discovery, medical imaging)
- Finance and economics (risk modeling, option pricing)
- Data science and AI (training and deploying large models)
- Public policy and planning (climate impacts, disaster response modeling)
Despite this diversity, users share a common need: they must perform computations beyond the capabilities of ordinary computing resources.
Summary
High-Performance Computing is the practice of using large-scale, parallel computing systems to tackle problems that are too big, too slow, or too complex for standard computers. It relies on:
- Many processors working together in parallel
- Specialized hardware and networks
- Software and algorithms designed to exploit these resources
The rest of this course explains the hardware, software, and programming models that make HPC possible, and how you can use them effectively in your own work.