Welcome to the Course
This course is an introduction to High Performance Computing, often shortened to HPC, for absolute beginners. It assumes that you may never have logged into a cluster, written a parallel program, or used Linux before. The main goal is to help you move from curiosity about HPC to the point where you can actually run meaningful computations on a real system with confidence.
HPC combines ideas from computer architecture, operating systems, programming models, performance engineering, and scientific applications. You will encounter all of these topics, but always with a practical focus: how they help you solve larger, faster, or more complex computational problems than a single laptop can handle.
By the end of the course, you should understand what HPC is for, how typical systems are organized, and how to write, run, and evaluate simple parallel programs on a cluster.
What You Will Learn
This course has several intertwined learning goals.
First, you will learn how HPC systems are structured and why that structure matters. That includes the way processors, memory, storage, and networks interact, and how different parts of a cluster serve different roles. You will not become a hardware designer, but you will learn enough to make informed choices about how to run your computations efficiently.
Second, you will learn how to use an HPC environment in practice. This covers the basics of Linux in the context of clusters, how to work with files and directories, how software is organized, and how you access and control shared resources through schedulers and job scripts.
Third, you will learn fundamental concepts of parallel computing. This includes understanding why parallelism is necessary for modern performance, how different kinds of parallelism are organized, and how scaling behavior and efficiency are evaluated. You will also see how these concepts appear in shared memory, distributed memory, and accelerator programming models.
Fourth, you will gain hands-on experience writing and running parallel programs. You will be introduced to widely used programming interfaces such as OpenMP, MPI, and accelerator models. You will not master every feature, but you will learn enough to understand, modify, and create basic parallel codes.
Fifth, you will learn how to analyze and improve performance. That means measuring execution time, understanding where time is spent, recognizing bottlenecks in memory and communication, and applying simple optimization strategies. You will also learn about existing numerical libraries and software stacks that can save you from reimplementing complex algorithms.
Finally, you will develop good habits around data management, reproducibility, testing, and responsible use of HPC resources. The goal is not only to run things faster but also to run them in a way that others can reproduce, verify, and build upon, while using shared systems fairly and efficiently.
How the Course Is Structured
The course is divided into self-contained chapters that build on one another. Each major topic in the outline has its own chapter, so this overview only explains how the chapters fit together and what role each plays in the overall learning path.
You begin with conceptual foundations. The early chapters on what HPC is, why it matters, and example applications give context. They show you how HPC affects science, engineering, and industry and why large-scale computation has become so central.
You then move into the technical basis of HPC systems. The chapters on computer architecture explain how processors, cores, and the memory hierarchy influence performance. You will see how different levels of memory trade speed against capacity, and how data movement can dominate overall run time. The chapters on storage and accelerators introduce the wider system components that matter for large computations.
Once you have the architectural picture, you learn about the operating environment. The Linux-focused chapters teach you how to interact with HPC systems through the command line, how files are organized, and how software is managed. You will learn what environment modules are and how software installation is typically handled on clusters.
The central part of the course is about clusters and parallel execution. You will study the structure of HPC clusters, including different node types and interconnects, and the distinction between shared and distributed memory. Parallel filesystems are introduced so you know where data lives and how it is accessed at scale. On top of this hardware and system view, you then learn about job scheduling, resource management, and how batch systems such as SLURM control who runs what, where, and when.
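To make this concrete, here is a minimal sketch of what a SLURM batch script can look like. The directives shown (--job-name, --nodes, --ntasks, --time, --output) are standard SLURM options, but the module names and the program name my_program are placeholders; every cluster documents its own modules and recommended settings.

```bash
#!/bin/bash
#SBATCH --job-name=demo        # name shown in the queue
#SBATCH --nodes=1              # request one compute node
#SBATCH --ntasks=4             # run four parallel tasks
#SBATCH --time=00:10:00        # wall-clock limit (HH:MM:SS)
#SBATCH --output=demo-%j.out   # %j expands to the job id

# Load the compiler and MPI stack; these module names are
# placeholders and differ from site to site.
module load gcc openmpi

# srun launches the requested tasks under the scheduler's control.
srun ./my_program
```

You would submit such a script with sbatch and monitor it with squeue; the scheduling chapter covers these commands and the resource-request options in detail.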
Parallel computing concepts form the theoretical spine of the course. You explore why parallelism is needed, how tasks and data can be split, and what strong and weak scaling mean. You see formal statements such as Amdahl’s and Gustafson’s laws and learn about load balancing. These ideas guide your later choices in how to parallelize and evaluate applications.
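As a preview, both laws can be stated compactly. If a fraction s of a program is inherently serial and p processors are available, Amdahl's law bounds the speedup at fixed problem size, while Gustafson's law describes the scaled speedup when the problem grows with p:

```latex
% Amdahl's law: fixed problem size, serial fraction s
S_{\mathrm{Amdahl}}(p) = \frac{1}{\,s + \frac{1-s}{p}\,} \;\le\; \frac{1}{s}

% Gustafson's law: problem size scaled with the processor count p
S_{\mathrm{Gustafson}}(p) = s + (1 - s)\,p
```

For example, with s = 0.1 (ten percent of the work serial), Amdahl's law caps the speedup at 10 no matter how many processors you add.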
Programming models are introduced in layers. You first work with shared memory programming using OpenMP. Then you learn distributed memory programming with MPI. After that, you see how to combine them in hybrid models to exploit both node-level and cluster-level parallelism. GPU and accelerator computing is added to show how heterogeneous architectures extend these ideas and why accelerators are so important in modern HPC.
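As a taste of what is ahead, the following is a minimal OpenMP sketch in C: a single parallel region in which every thread reports its id. This illustrates the shared memory model only; the MPI and hybrid equivalents appear in their own chapters.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    // The parallel directive creates a team of threads;
    // each thread executes the block once.
    #pragma omp parallel
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}
```

Compiled with, for example, gcc -fopenmp, the program prints one line per thread, typically in a different order on every run, which already hints at the nondeterminism of parallel execution.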
To support writing and running codes, you learn about compilers and build systems. This includes the role of common compilers, how to use optimization flags, and how to distinguish debug and optimized builds. Basic build tools such as Make and CMake are introduced to help manage multi-file projects in a systematic way.
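To illustrate the debug-versus-optimized distinction, here is a sketch of a Makefile for a hypothetical two-file C project (main.c and solver.c are placeholder names). The flags shown are common GCC options: -O0 -g preserves debuggability, while -O3 -march=native requests aggressive optimization for the machine you compile on.

```make
CC  ?= gcc
SRC := main.c solver.c                   # placeholder source files

DEBUG_FLAGS   := -O0 -g -Wall -Wextra    # no optimization, debug symbols
RELEASE_FLAGS := -O3 -march=native -Wall # optimize for the build machine

# Note: recipe lines in a Makefile must begin with a TAB character.
debug: $(SRC)
	$(CC) $(DEBUG_FLAGS) $(SRC) -o app_debug

release: $(SRC)
	$(CC) $(RELEASE_FLAGS) $(SRC) -o app_release

.PHONY: debug release
```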
Once you can compile and run parallel programs, the course turns to performance and software ecosystems. You examine how to measure performance, benchmark applications, and use profiling tools. Chapters on memory and cache optimization, vectorization, and parallel efficiency help deepen your understanding of why performance looks the way it does. Numerical libraries, FFT libraries, and scientific software frameworks are introduced so you can leverage existing tools instead of reinventing complex algorithms.
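As one example of leaning on an existing library, the sketch below multiplies two small matrices with the CBLAS routine cblas_dgemm instead of hand-written loops. The call follows the standard CBLAS interface; the link flag depends on which BLAS implementation your system provides (for example -lopenblas).

```c
#include <stdio.h>
#include <cblas.h>  // CBLAS interface; link with e.g. -lopenblas

int main(void) {
    // Computes C = alpha * A * B + beta * C for 2x2 matrices
    // stored in row-major order.
    double A[4] = {1, 2, 3, 4};
    double B[4] = {5, 6, 7, 8};
    double C[4] = {0, 0, 0, 0};

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,        // M, N, K
                1.0, A, 2,      // alpha, A, leading dimension of A
                B, 2,           // B, leading dimension of B
                0.0, C, 2);     // beta, C, leading dimension of C

    printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
    return 0;
}
```

At realistic problem sizes, a tuned BLAS will usually outperform a naive triple loop by a wide margin, which is exactly why the course emphasizes knowing what libraries exist.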
Data management and I/O, reproducibility and environments, and debugging and testing bring in the practical concerns of running real workloads. You learn strategies for parallel I/O, checkpointing, and handling large data, as well as for building reproducible workflows with modules and containers. The debugging and testing chapters show common parallel bugs and how to find and fix them systematically.
In the later part of the course you examine HPC in practice through workflows and case studies, and you consider the ethical and sustainability aspects of large-scale computation. Finally, the course looks forward to future trends such as exascale systems, AI integration, heterogeneous architectures, and potential roles for quantum computing. The course ends with a project and hands-on exercises that integrate the main ideas.
How the Pieces Fit Together
The sequence of chapters is designed so each new idea relies on earlier foundations. Concepts like cores and caches support your understanding of shared memory programming and performance tuning. The distinction between shared and distributed memory explains why there are different programming models like OpenMP and MPI. The structure of clusters and interconnects sets the stage for job scheduling and process placement. Benchmarking and profiling chapters rely on the skills you acquire in compiling and running parallel codes. Reproducibility, data management, and debugging are interwoven across the course, even though they have dedicated chapters.
As you move through the material, you should gradually build three mental models. The first is a model of the system: what components exist in an HPC environment and how they relate. The second is a model of the software stack: from compilers and libraries up to applications and workflows. The third is a model of performance: how to think about time, memory, communication, and scaling.
The practical exercises and the final project are where these models come together. You will be asked to design or adapt an application, run it on a real or simulated cluster, evaluate its performance, and document what you did. The course is not only about individual concepts but also about integrating them into a coherent approach to computational problem solving.
What Background Is Expected
This course is intended for beginners to HPC, not necessarily beginners to all computing. You should be comfortable with basic programming in at least one language, such as C, C++, Fortran, or Python. You do not need prior exposure to parallel programming, Linux, or clusters; those are introduced from first principles in their respective chapters.
Some familiarity with elementary mathematics and scientific problem solving is helpful, especially when you interpret performance plots or reason about scaling behavior. However, the focus is on concepts and practice, not on advanced mathematics.
Whenever the course relies on a concept that belongs to another chapter or domain, that concept will be introduced where it is needed. You are not expected to know how operating systems, networks, or compilers work internally beyond what is explained in their dedicated sections.
How to Work Through the Course
You can treat the course as a linear path, starting with the introductory overview of HPC and moving through architecture, systems, programming, and performance. This is the recommended approach if you are new to most of the topics, because each chapter assumes familiarity with earlier material.
However, once you have some experience you may wish to revisit specific sections. For example, after learning MPI you might return to the parallel computing concepts to reinterpret load balancing or scaling laws in light of your coding experience. Similarly, performance analysis may make more sense when you apply it to codes you have already written.
Hands-on exercises, when available, are central. Reading about job scheduling, memory hierarchies, or vectorization is less effective than trying them on a real system. You should plan to log into a cluster or suitable training environment, run examples, and observe the behavior. The final project is designed as a capstone that encourages you to integrate theoretical understanding with practical skills.
Key Themes and Course Philosophy
Several themes run throughout the course.
One theme is that performance is a property of both hardware and software. You will see repeatedly that understanding the architecture helps you write and run code that performs well, but you do not need to become a specialist in low-level details. Instead, you learn to recognize patterns that often matter, such as data locality or communication overhead.
Another theme is that existing tools and libraries are crucial. HPC is built on reusable components, from numerical kernels to parallel I/O libraries. A major skill is learning how to find, understand, and correctly use these components rather than writing everything from scratch.
A third theme is reproducibility and responsibility. Shared HPC systems are expensive and energy intensive. Your workflows should be transparent, reliable, and fair to other users. Throughout the course you will see practices that help with version control of software environments, repeatable job scripts, and clear documentation of computational experiments.
Finally, the course emphasizes that HPC is not static. Architectures, programming models, and best practices evolve. The goal is not only to teach you the current tools but also to give you conceptual foundations that will help you adapt to new systems and technologies in the future.
What You Will Be Able to Do
If you engage fully with the course material and exercises, by the end you should be able to do the following at a basic but practical level.
You will be able to describe what HPC is, why it is used, and where it appears in science, engineering, and industry. You will be able to explain, in broad terms, how an HPC cluster is organized, what different kinds of nodes and networks do, and how memory and storage are arranged.
You will be able to log into an HPC system, navigate the Linux environment, manage files, load software via environment modules, and compile programs with appropriate optimization levels. You will be able to write job scripts for a scheduler such as SLURM, submit and monitor jobs, and request resources that match the needs of your application.
You will be able to write simple shared memory parallel programs using OpenMP and simple distributed memory programs using MPI, and you will understand how these models differ and how they can be combined. You will be able to run basic computations on GPUs or other accelerators using introductory models and to reason about when accelerators might help.
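The distributed memory counterpart of the earlier OpenMP sketch looks like this: each MPI process (rank) reports its position in the communicator. The calls shown are core MPI functions; the launch command varies by system.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);          // start the MPI runtime

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // this process's id
    MPI_Comm_size(MPI_COMM_WORLD, &size);  // total process count

    printf("hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                  // shut the runtime down cleanly
    return 0;
}
```

You would compile it with an MPI wrapper such as mpicc and launch it with, for example, mpirun -np 4, or via srun inside a job script on a cluster.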
You will be able to measure the performance of your programs, interpret timing and scaling results, and identify obvious bottlenecks. You will know how to use simple profiling tools, how to check whether your code benefits from vectorization and cache friendly access, and when to rely on existing libraries for heavy numerical work.
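Measuring wall-clock time is the simplest starting point. The sketch below times a code region with the POSIX clock_gettime function; the loop is just a stand-in workload.

```c
#define _POSIX_C_SOURCE 199309L  // expose clock_gettime under strict C modes
#include <stdio.h>
#include <time.h>

// Wall-clock time in seconds from a monotonic clock.
static double wtime(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    double t0 = wtime();

    double sum = 0.0;                       // stand-in workload
    for (long i = 1; i <= 100000000L; i++)
        sum += 1.0 / (double)i;

    double t1 = wtime();
    printf("elapsed: %.3f s (sum = %.6f)\n", t1 - t0, sum);
    return 0;
}
```

Repeating such a measurement with 1, 2, 4, and more workers and dividing the single-worker time by each result gives the speedup curve discussed in the scaling chapters.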
You will be able to manage input and output for larger runs, use checkpointing strategies to protect long simulations, and organize data in a way that supports analysis and reproducibility. You will have experience with debugging and testing parallel programs, and with using containers or similar tools to control software environments.
You will also have practiced designing and executing an HPC oriented project, documenting the choices you made, and reflecting on performance, correctness, and resource usage. This experience should make you comfortable approaching new HPC tasks and reading more advanced material.
Moving Forward
This overview chapter has described what the course aims to achieve, how it is organized, and what you can expect to learn. Each subsequent chapter will focus on one specific part of the overall picture, from basic definitions and architecture to advanced tools and future trends.
As you proceed, it is useful to keep three questions in mind for any new topic. How does this concept fit into the structure of an HPC system or workflow? How does it affect performance or scalability? How might it influence the way you design, run, or interpret computational experiments? If you can answer these questions for each major component by the end of the course, you will have a solid introductory understanding of High Performance Computing.