Big Picture: Where HPC Is Going
High-Performance Computing is changing quickly. The main trends revolve around three themes:
- Pushing to ever larger scales (exascale and beyond)
- Combining traditional simulation with AI and data-driven methods
- Adapting to increasingly heterogeneous and specialized hardware
This chapter gives you an orientation to the directions HPC is moving in, what they mean for systems and applications, and which skills will remain important as the ecosystem changes.
You do not need to understand every technical detail yet; the goal is to recognize the trends and the kinds of adaptations they require.
Exascale and Beyond
What “exascale” really means in practice
“Exascale” refers to systems capable of sustained performance on the order of $10^{18}$ floating‑point operations per second for realistic workloads. In practice, exascale is about:
- Extreme concurrency: millions to billions of hardware threads or cores
- Deep memory hierarchies: multiple cache levels, HBM, NVRAM, and complex NUMA layouts
- Node-level heterogeneity: CPUs + multiple GPUs/accelerators per node
- Power constraints: performance gains under a (mostly) fixed power budget
From a user’s perspective, exascale does not mean you automatically run $10^3$ times faster than a petascale machine. It means:
- Your code must expose much more parallelism
- Algorithms must tolerate higher failure rates (hardware and software)
- Communication and synchronization costs take up a growing share of total runtime
- I/O and data movement are often the bottleneck, not raw flops
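To make the last point concrete, here is a back-of-the-envelope estimate in Python. Every number (node count, memory per node, file-system bandwidth) is an illustrative assumption, not a figure for any particular machine, but the arithmetic shows why moving even a fraction of system memory to storage can stall an exascale-class system for minutes.

```python
# Back-of-the-envelope I/O estimate. Every number here is an assumption
# chosen only to illustrate the order of magnitude, not a measurement.
peak_flops = 1e18            # assumed sustained compute rate: 1 exaflop/s
num_nodes = 10_000           # assumed node count
mem_per_node_gb = 512        # assumed memory per node (GB)
io_bandwidth_tb_s = 10       # assumed aggregate file-system bandwidth (TB/s)

# Suppose a checkpoint captures half of total system memory.
checkpoint_tb = num_nodes * mem_per_node_gb * 0.5 / 1000
write_time_s = checkpoint_tb / io_bandwidth_tb_s

print(f"checkpoint size: {checkpoint_tb:.0f} TB")
print(f"time to write it once: {write_time_s:.0f} s")
print(f"floating-point ops idled during the write: {peak_flops * write_time_s:.1e}")
```

Under these assumptions a single checkpoint costs several minutes of machine time, which is one reason the smarter checkpointing strategies discussed below matter.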
Programming models at extreme scale
At exascale, no single programming model is sufficient for all layers. Common patterns include:
- Hybrid parallelism: MPI + threads (OpenMP) + accelerators (CUDA, HIP, SYCL, OpenACC)
- Task-based runtime systems: delegating scheduling decisions to runtimes that can better match the hardware concurrency (e.g. asynchronous tasks, DAG-based execution)
- Domain-specific solutions: frameworks that encode knowledge about a particular class of problems, so users write less low-level parallel code
For application developers, this means:
- You still need a good grasp of concepts like locality, synchronization, and load balancing
- You increasingly rely on libraries and frameworks to manage the most complex aspects of parallelism and data movement
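As a minimal sketch of hybrid parallelism, the following Python program (assuming mpi4py and NumPy are installed) splits work across MPI ranks and leans on NumPy's vectorized, internally threaded kernels for the node-level part; a production code would more likely use C/C++/Fortran with OpenMP or a GPU programming model for that layer.

```python
# Minimal hybrid-parallel sketch: MPI across ranks, vectorized/threaded
# NumPy inside each rank (standing in for OpenMP or accelerator kernels).
# Run with e.g.:  mpirun -n 4 python hybrid_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_total = 10_000_000                       # global problem size
n_local = n_total // size                  # each rank owns one slice

# Node-level parallelism: NumPy dispatches to vectorized/threaded kernels.
x = np.linspace(rank / size, (rank + 1) / size, n_local)
local_sum = np.sum(np.sqrt(1.0 - x * x))   # piece of the quarter-circle area

# Inter-node parallelism: combine partial results with a reduction.
global_sum = comm.allreduce(local_sum, op=MPI.SUM)

if rank == 0:
    print("pi estimate:", 4.0 * global_sum / n_total)
```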
Resilience and reliability
As systems grow:
- Component counts increase dramatically
- The probability of some component failing during a long run rises
Systems and applications respond by:
- Using checkpoint/restart more intelligently (incremental, in-memory, multi-level)
- Introducing algorithmic resilience, where methods can tolerate some errors or missing data
- Adopting fault-aware runtimes, which can recover from node or process failure without killing the entire job
For you, the practical takeaway is that resilience becomes part of algorithm and software design, not just a system-level concern.
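As a minimal sketch of the first item in the list above (application-level checkpoint/restart), the Python loop below periodically saves its state and resumes from the latest checkpoint if one exists. The file names, state layout, and checkpoint interval are illustrative choices; multi-level schemes layer node-local and parallel-file-system copies on top of this basic pattern.

```python
# Minimal application-level checkpoint/restart sketch. File names, state
# layout, and the checkpoint interval are illustrative placeholders.
import os
import numpy as np

CHECKPOINT = "state_checkpoint.npz"
TMP = "state_checkpoint.tmp.npz"

def save_checkpoint(step, field):
    np.savez(TMP, step=step, field=field)
    os.replace(TMP, CHECKPOINT)        # atomic rename: no half-written files

def load_checkpoint():
    if os.path.exists(CHECKPOINT):
        data = np.load(CHECKPOINT)
        return int(data["step"]), data["field"]
    return 0, np.zeros(1024)           # no checkpoint found: fresh start

step, field = load_checkpoint()
while step < 1000:
    field = field + 0.01               # stand-in for one simulation time step
    step += 1
    if step % 100 == 0:                # checkpoint interval is a tuning choice
        save_checkpoint(step, field)
```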
AI and Machine Learning in HPC
Convergence of simulation and data-driven methods
Traditional HPC focuses on solving physics-based models with numerical methods. AI and machine learning add:
- Surrogate models: fast approximations of expensive simulations
- Data-driven components: learning parts of models directly from data (e.g. closures, subgrid-scale models)
- Automated discovery: exploring parameter spaces or designs with optimization guided by ML
This leads to hybrid workflows that combine:
- Large-scale simulations to generate data
- ML training on this data (often on the same HPC systems)
- Lightweight ML inference embedded back into simulations or used for steering
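Below is a toy version of this loop in Python. The "expensive simulation" is just a smooth stand-in function and the surrogate is a polynomial fit; in real workflows the first step is a large parallel solver and the surrogate is often a neural network trained with PyTorch or similar.

```python
# Toy surrogate-model workflow: run an "expensive" model on a few points,
# fit a cheap approximation, then query the approximation many times.
# expensive_simulation is a stand-in for a real solver.
import numpy as np

def expensive_simulation(x):
    # Placeholder for a costly physics solve; here just a smooth function.
    return np.sin(3 * x) * np.exp(-0.5 * x)

# 1. Generate training data with the expensive model (the HPC-heavy step).
x_train = np.linspace(0.0, 2.0, 20)
y_train = expensive_simulation(x_train)

# 2. Train a surrogate (here a degree-6 polynomial; often an ML model).
surrogate = np.poly1d(np.polyfit(x_train, y_train, deg=6))

# 3. Use the cheap surrogate for dense evaluation, e.g. parameter sweeps.
x_query = np.linspace(0.0, 2.0, 10_000)
y_fast = surrogate(x_query)

err = np.max(np.abs(y_fast - expensive_simulation(x_query)))
print(f"max surrogate error on [0, 2]: {err:.2e}")
```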
Workflows and software stacks
AI/ML in HPC changes typical workflows:
- Use of Python ecosystems (NumPy, PyTorch, TensorFlow) on supercomputers
- Integration of Jupyter-based analysis with batch job systems
- Pipelines that mix:
- Traditional compiled codes (Fortran/C/C++ with MPI/OpenMP/accelerators)
- Python orchestration and ML libraries
- Containerized environments for reproducibility
From a beginner’s standpoint, it is helpful to:
- Be comfortable running mixed Python + compiled code workflows on clusters
- Understand that ML jobs can be HPC jobs—they still use schedulers, nodes, accelerators, and parallel I/O
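A minimal sketch of such a mixed workflow is shown below: Python launches a compiled MPI solver and then post-processes its output. The executable name `./solver`, its command-line options, and the output file are hypothetical, and the `srun` launcher would be replaced by whatever the local scheduler provides.

```python
# Sketch of a mixed workflow: launch a compiled MPI solver from Python,
# then post-process its output with NumPy. The executable name, solver
# options, and output file are hypothetical placeholders.
import subprocess
import numpy as np

# Step 1: run the compiled code (inside a batch allocation this might be
# srun or mpirun; adjust to whatever the cluster's scheduler expects).
subprocess.run(["srun", "-n", "64", "./solver", "--output", "result.npy"],
               check=True)

# Step 2: analyze the result in Python.
data = np.load("result.npy")
print("mean of solver output:", data.mean())
```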
HPC for AI, and AI for HPC
Two complementary directions are emerging:
- HPC for AI:
- Training very large models requires HPC-class clusters with fast interconnects, huge GPU counts, and scalable storage
- Job scheduling and resource management principles are similar to traditional HPC workloads
- AI for HPC:
- Using ML to tune parameters, choose algorithms, or predict performance
- Applying ML to scheduling and resource allocation decisions
- Detecting anomalies in system logs or performance metrics
You do not need to design these systems now, but you should expect to encounter:
- ML-assisted performance tuning and auto-tuning
- Schedulers that use learned policies instead of fixed heuristics
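As a tiny, heavily simplified illustration of the anomaly-detection idea, the sketch below flags nodes whose (synthetic) memory-bandwidth measurements deviate strongly from the rest. Production systems work on real telemetry streams and usually use learned models rather than a fixed z-score rule.

```python
# Toy anomaly detection on per-node performance metrics using a z-score
# rule. The metric values are synthetic; production systems use real
# telemetry and typically more sophisticated (often learned) models.
import numpy as np

rng = np.random.default_rng(0)
node_bandwidth = rng.normal(loc=200.0, scale=5.0, size=256)  # GB/s, synthetic
node_bandwidth[17] = 120.0            # inject one degraded node

mean, std = node_bandwidth.mean(), node_bandwidth.std()
z_scores = (node_bandwidth - mean) / std

suspect_nodes = np.where(np.abs(z_scores) > 4.0)[0]
print("nodes flagged for inspection:", suspect_nodes)
```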
Heterogeneous and Specialized Architectures
Growing diversity of hardware
Future HPC systems are increasingly heterogeneous:
- Multiple CPU architectures (x86, Arm, RISC‑V) in large systems
- GPUs from different vendors (NVIDIA, AMD, Intel)
- Domain-specific accelerators, e.g. for:
- Dense linear algebra
- AI workloads (tensor cores, TPUs, NPUs)
- Graph processing or sparse workloads
- New memory and storage technologies (HBM, persistent memory, computational storage)
This diversity has important implications:
- Software must be portable across architectures
- Performance tuning becomes more complex but also more rewarding
- Tooling and standards are critical to avoid being locked to one vendor
Performance portability
“Performance portability” is the goal of writing code once so that it runs:
- Correctly across many architectures
- With good (not necessarily perfect) performance on each
Approaches include:
- Abstraction libraries and frameworks for parallel loops and data structures
- Directive-based models (compiler hints instead of explicit low-level code)
- Programming standards that can target multiple backends
For you, the long-term skill is to:
- Separate algorithmic intent from hardware-specific implementation details
- Learn to use portable programming models and profile them on different systems
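One lightweight way to practice this separation in Python is to write kernels against an array module chosen at runtime, as sketched below: the same function runs on NumPy (CPU) or, if it happens to be installed, CuPy (GPU). Compiled HPC codes achieve the same separation with frameworks such as Kokkos, RAJA, SYCL, or OpenMP offload.

```python
# Minimal performance-portability sketch in Python: write the kernel once
# against an array module "xp", then choose the backend at runtime.
# CuPy is optional; the code falls back to NumPy if it is not installed.
import numpy as np

try:
    import cupy as cp          # GPU backend, if present
    xp = cp
except ImportError:
    xp = np                    # CPU fallback

def smooth(field):
    """Algorithmic intent: simple 3-point averaging, backend-agnostic."""
    return (field[:-2] + field[1:-1] + field[2:]) / 3.0

data = xp.linspace(0.0, 1.0, 1_000_000)
result = smooth(data)
print("backend:", xp.__name__, "result mean:", float(result.mean()))
```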
Energy efficiency and green HPC
Power and energy constraints drive many hardware decisions:
- Specialized accelerators for better flops per watt
- Architecture features for dynamic voltage and frequency scaling (DVFS)
- On-node and in-network computation to reduce data movement
For users and developers, energy becomes a first-class metric alongside runtime:
- Job schedulers may consider energy budgets
- Tools are emerging to measure and attribute energy usage
- Algorithms that minimize data movement (even at the cost of more computation) become more attractive
Quantum Computing and Its Relationship to HPC
Complementary, not a replacement
Quantum computing is not expected to replace classical HPC in the near term. Instead, it is viewed as:
- A co-processor for specific tasks (e.g. some optimization or quantum chemistry problems)
- An additional resource integrated into HPC workflows, not a stand-alone solution
Most large quantum workloads still rely on classical HPC for:
- Circuit simulation (for design and verification)
- Pre- and post-processing of data
- Orchestrating hybrid quantum–classical algorithms
Hybrid quantum–classical workflows
Typical patterns include:
- Using an HPC cluster to:
- Optimize parameters of a quantum algorithm
- Run many small quantum circuits in parallel
- Aggregate and analyze measurement data
- Coordinating between:
- A classical optimizer (running on CPUs/GPUs)
- A quantum processor for specific substeps
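The sketch below shows the shape of such a loop: a classical gradient-free optimizer (SciPy's COBYLA, a common choice for variational algorithms) repeatedly asks a "quantum" evaluation function for a cost value. The function `evaluate_on_quantum_device` is a pure placeholder, standing in for submitting a parameterized circuit to real hardware or to a simulator running on the HPC system.

```python
# Sketch of a hybrid quantum-classical loop: a classical optimizer (here
# SciPy) proposes parameters, a quantum resource evaluates a cost.
# evaluate_on_quantum_device is a placeholder; in practice it would call
# a vendor SDK or a circuit simulator.
import numpy as np
from scipy.optimize import minimize

def evaluate_on_quantum_device(params):
    # Stand-in for submitting a parameterized circuit and averaging
    # measurement results; here just a smooth classical function.
    return np.sum(np.sin(params) ** 2)

initial_params = np.array([0.8, 1.9, -0.4])
result = minimize(evaluate_on_quantum_device, initial_params,
                  method="COBYLA")    # gradient-free classical optimizer

print("optimized parameters:", result.x)
print("final cost estimate:", result.fun)
```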
Even as a beginner, it is useful to understand:
- Quantum systems are likely to appear as specialized resources within broader HPC infrastructures
- Software stacks will include middleware that hides the details of interacting with quantum hardware
Evolving Software and Programming Ecosystems
Higher-level abstractions and productivity
As hardware complexity grows, relying purely on low-level programming is becoming unsustainable for many users. Trends include:
- Domain-specific languages (DSLs) that let scientists specify problems at a higher level
- Code generation tools that produce optimized kernels for specific architectures
- Increasing use of Python and high-level interfaces with performance-critical parts in compiled languages
For new practitioners, this means:
- Investing time in understanding numerical methods and algorithms remains essential
- Expert knowledge about data layout, communication patterns, and memory access is still valuable, but often expressed through higher-level tools
Automation and autotuning
Autotuning is becoming standard practice:
- Automatically exploring different:
- Tile sizes
- Vectorization strategies
- Thread/block configurations
- Using search or ML methods to find good parameter sets for a particular architecture
In the future you may:
- Specify tunable parameters and let tools search for optimal values
- Use online tuning that adjusts configuration at runtime based on problem size and hardware state
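A minimal version of the empirical-search idea is sketched below: the same tiled kernel is timed for several candidate tile sizes and the fastest is kept. Real autotuners explore far larger parameter spaces (and may use ML models to prune them), but the structure is the same.

```python
# Minimal empirical autotuning sketch: time a tiled kernel for several
# candidate tile sizes and keep the fastest.
import time
import numpy as np

data = np.random.default_rng(0).random(8_000_000)

def tiled_norm(x, tile):
    """Sum of squares computed tile by tile (stand-in for a tiled kernel)."""
    total = 0.0
    for start in range(0, x.size, tile):
        chunk = x[start:start + tile]
        total += float(chunk @ chunk)
    return total

candidates = [1_000, 10_000, 100_000, 1_000_000]
timings = {}
for tile in candidates:
    t0 = time.perf_counter()
    tiled_norm(data, tile)
    timings[tile] = time.perf_counter() - t0

best = min(timings, key=timings.get)
print("timings (s):", {k: round(v, 4) for k, v in timings.items()})
print("selected tile size:", best)
```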
Data-Centric and Workflow-Oriented HPC
From single jobs to complex workflows
Instead of one long monolithic job, more and more workloads are pipelines that:
- Consist of many interdependent tasks
- Combine simulation, data assimilation, AI, visualization, and post-processing
- Span multiple systems (local cluster, cloud, remote supercomputers)
This motivates:
- Workflow engines that interact with job schedulers
- Data management tools specialized for large, distributed datasets
- Closer integration between compute and data services (e.g. in situ analysis and visualization)
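The toy sketch below captures the core idea of a workflow engine: tasks declare their dependencies and the engine runs them in a valid order. The task bodies are stubs; real engines add scheduler integration (for example, submitting each task as a batch job), retries, parallel execution, and data staging.

```python
# Toy workflow sketch: tasks with explicit dependencies, executed in a
# valid order. Task bodies are stubs standing in for real pipeline stages.
def simulate():    print("running simulation (stub)")
def train_model(): print("training ML model on simulation output (stub)")
def visualize():   print("producing plots (stub)")

tasks = {
    "simulate": (simulate, []),
    "train_model": (train_model, ["simulate"]),
    "visualize": (visualize, ["simulate", "train_model"]),
}

done = set()

def run(name):
    func, deps = tasks[name]
    for dep in deps:                  # make sure prerequisites ran first
        if dep not in done:
            run(dep)
    func()
    done.add(name)

for task in tasks:
    if task not in done:
        run(task)
```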
In situ and streaming approaches
Moving data out to storage and back is often too slow and too expensive:
- In situ techniques process data while it is still in memory during simulation
- Streaming approaches enable real-time or near-real-time analysis and decision-making
These approaches change how you think about:
- When to analyze or visualize data
- How frequently to checkpoint
- How to reduce data volumes (compression, feature extraction) before writing to disk
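The sketch below shows the simplest form of in situ reduction: a few statistics are computed each step while the field is still in memory, and only the small summary table is written at the end instead of the full field every step. The field size, update rule, and output file name are placeholders.

```python
# In situ sketch: reduce data while it is still in memory and write only
# the small summary. Field size, update rule, and file name are placeholders.
import numpy as np

field = np.zeros(1_000_000)
summaries = []                        # tiny compared to the raw field

for step in range(100):
    # Stand-in for one simulation time step.
    field += np.random.default_rng(step).standard_normal(field.size) * 0.01
    # In situ analysis: keep a few statistics instead of the whole field.
    summaries.append((step, float(field.mean()), float(field.max())))

np.save("field_summaries.npy", np.array(summaries))   # 100 rows, not 100 fields
```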
Skills That Stay Relevant
Despite all these changes, certain foundations continue to matter:
- Understanding parallelism, data locality, and communication
- Being able to reason about performance and bottlenecks
- Writing code that is readable, testable, and reproducible
- Being comfortable using Linux, job schedulers, and batch systems
- Knowing how to leverage libraries and frameworks instead of reinventing everything
Future trends will introduce new hardware and software, but they build on the same core ideas you have encountered throughout this course. If you focus on these fundamentals, you will be well positioned to learn new models, tools, and architectures as HPC continues to evolve.