How the Course Is Organized
This course is designed as a guided path from “I’ve never used HPC” to “I can run and reason about real workloads on a cluster.” The outline you saw is essentially the roadmap. Here is how the pieces fit together and what you should expect to be able to do by the end.
High-Level Structure
The course is divided into three broad phases:
- Foundations
  - Course Overview
  - Fundamentals of Computer Architecture
  - Operating Systems and the Linux Environment
  - HPC Clusters and Infrastructure

  Focus: understanding what HPC is, how the hardware and operating system are organized, and how to navigate a Linux-based HPC environment.
- Core Parallel Programming and Performance
  - Job Scheduling and Resource Management
  - Parallel Computing Concepts
  - Shared-Memory Parallel Programming (OpenMP)
  - Distributed-Memory Parallel Programming (MPI)
  - Hybrid Parallel Programming
  - GPU and Accelerator Computing
  - Compilers and Build Systems
  - Performance Analysis and Optimization

  Focus: learning how to express parallelism (on CPUs and GPUs), how to get code compiled and running efficiently on clusters, and how to measure and improve performance.
- Applied HPC and Best Practices
  - Numerical Libraries and Software Stacks
  - Data Management and I/O
  - Reproducibility and Software Environments
  - Debugging and Testing Parallel Programs
  - HPC in Practice
  - Ethics, Sustainability, and Green Computing
  - Future Trends in HPC
  - Final Project and Hands-On Exercises

  Focus: using existing HPC software, working with real data, making your work reproducible, debugging and testing, understanding real workflows, and thinking about the broader impact and future of HPC.
Each major section builds on the ones before it. You will move from concepts and basic skills to practical application and then to critical reflection and future directions.
Pedagogical Flow
The course is structured so that each new unit answers a specific learner question that naturally arises from the previous units:
- “What is HPC and why should I care?”
  Addressed in the Course Overview chapters: definitions, motivation, and examples.
- “What am I actually running my code on?”
  Addressed in Fundamentals of Computer Architecture and HPC Clusters and Infrastructure.
- “How do I interact with these systems?”
  Addressed in Operating Systems and the Linux Environment and Job Scheduling and Resource Management.
- “How can I exploit parallelism?”
  Addressed in Parallel Computing Concepts, followed by OpenMP, MPI, hybrid approaches, and GPUs.
- “How do I make my code fast and reliable?”
  Addressed in Compilers and Build Systems, Performance Analysis and Optimization, and Debugging and Testing Parallel Programs.
- “How do I use existing tools and libraries instead of reinventing everything?”
  Addressed in Numerical Libraries and Software Stacks and Data Management and I/O.
- “How do I ensure others can reproduce my work and that I use resources responsibly?”
  Addressed in Reproducibility and Software Environments and Ethics, Sustainability, and Green Computing.
- “How does all of this come together in real research or industrial settings?”
  Addressed in HPC in Practice, Future Trends in HPC, and consolidated through the Final Project.
This question-driven structure is deliberate: each topic is motivated by a practical need you’ll face when using HPC systems.
Types of Learning Activities
Throughout the course, you will encounter:
- Conceptual explanations
  Short lectures or readings that give you the minimal theory needed to understand what you are doing and why it matters.
- Command-line and scripting exercises
  Hands-on tasks such as logging into a cluster, navigating the filesystem, submitting simple jobs, and automating repetitive tasks (a sample batch script follows below).
- Small, focused programming tasks
  Examples in a compiled language (commonly C, C++, or Fortran) for OpenMP and MPI, and in CUDA/OpenACC for GPU-related sections, where the emphasis is on understanding constructs rather than writing large applications.
- Profiling and optimization labs
  Activities where you run code, measure performance, analyze bottlenecks, and apply simple optimizations to see concrete speedups.
- Library- and tool-centered labs
  Experiments using numerical libraries, job schedulers, modules, containers, debugging tools, and profilers.
- Case study walkthroughs
  Guided examples of complete HPC workflows in science and industry, showing how all the components you learn about fit together in practice.
- Final Project
  A capstone where you apply the full workflow: design or adapt a small HPC application or workflow, run it at some scale, measure and analyze performance, and document your methods.
The balance leans toward doing rather than memorizing. Skills like using schedulers, debuggers, and profilers cannot be learned from theory alone.
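To give you a flavor of those scripting exercises, here is the kind of minimal batch script you will write in the job scheduling chapter. This is a sketch assuming a SLURM-managed cluster; the partition name, time limit, and program name are placeholders that will differ on your site.

```bash
#!/bin/bash
#SBATCH --job-name=hello        # name shown in the queue
#SBATCH --partition=short       # placeholder: use a partition that exists on your cluster
#SBATCH --nodes=1               # request a single node
#SBATCH --ntasks=1              # one task: a serial job
#SBATCH --time=00:05:00         # wall-clock limit (HH:MM:SS)
#SBATCH --output=hello-%j.out   # %j expands to the job ID

# Everything below runs on the compute node once the job starts.
echo "Running on $(hostname)"
./hello                         # placeholder for your compiled program
```

You would submit this with `sbatch hello.sh`, watch it with `squeue -u $USER`, and cancel it with `scancel <jobid>`; the scheduling chapter covers these commands in detail.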
Dependencies and Recommended Progression
While you can occasionally jump ahead to look at specific topics, the course is designed to be followed in order because of key dependencies:
- Before working with job schedulers, you should be comfortable with:
- Basic Linux command-line usage
- The idea of clusters (login vs compute nodes)
- Before writing OpenMP or MPI code, you should:
- Understand basic parallel computing concepts (parallelism types, scaling)
- Be able to edit, compile, and run simple programs
- Before GPU programming, you should:
- Understand the difference between CPUs and GPUs at a high level
- Be comfortable with CPU-side compilation and execution
- Before serious performance tuning, you should:
- Know how to run your code via the scheduler
- Understand at least one parallel programming model (OpenMP/MPI)
- Before the final project, you should:
- Have completed the modules on job scheduling, at least one parallel model, basic performance analysis, and reproducibility/environment management.
If you are already familiar with some topics (for example, Linux basics), you can skim those sections but should still review cluster-specific details and scheduler usage, which are often new even to experienced programmers.
Learning Goals and Expected Outcomes
By the end of the course, you should be able to:
Conceptual Understanding
- Explain the role of HPC in modern science, engineering, and industry.
- Describe the basic architecture of HPC systems, including CPUs, memory hierarchy, storage, and interconnects.
- Distinguish between shared-memory, distributed-memory, and hybrid parallelism, and understand where each is appropriate.
- Recognize when and why GPUs and accelerators are beneficial.
Practical Skills
- Log in to a Linux-based HPC cluster, navigate the filesystem, and manage files.
- Use environment modules and software stacks to access compilers, libraries, and applications.
- Submit, monitor, and manage batch jobs using a scheduler (with emphasis on SLURM).
- Write and compile small parallel programs (see the OpenMP sketch after this list) using:
- OpenMP for shared-memory parallelism
- MPI for distributed-memory parallelism
- A simple GPU programming model (CUDA or OpenACC) for accelerator usage
- Use basic compiler options to generate debug and optimized builds.
- Run simple profiling and benchmarking studies to measure performance and scaling.
- Use numerical libraries and precompiled software instead of implementing core algorithms from scratch.
- Apply basic strategies for parallel I/O and large data management when using HPC systems.
- Employ debugging tools and structured testing approaches for parallel codes.
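As a first taste of that shared-memory material, the sketch below parallelizes a simple loop with OpenMP. It is a minimal illustration rather than a course exercise; the array size and the compiler invocation in the comment are arbitrary choices.

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

/* Build with OpenMP enabled, e.g.: gcc -fopenmp -O2 sum.c -o sum */
int main(void) {
    static double a[N];
    double sum = 0.0;

    /* Split the iterations across threads; reduction(+:sum) gives each
       thread a private partial sum and combines them at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        a[i] = (double)i;
        sum += a[i];
    }

    printf("sum = %.0f (max threads available: %d)\n",
           sum, omp_get_max_threads());
    return 0;
}
```

Setting the environment variable `OMP_NUM_THREADS` before running controls how many threads the loop uses; timing the run at different thread counts is exactly the kind of measurement the profiling labs build on.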
Professional and Scientific Practices
- Set up and manage reproducible software environments using modules and, where appropriate, containers (a short module session is sketched after this list).
- Document computational workflows and configurations sufficiently for others (and your future self) to reproduce results.
- Size jobs reasonably, understand fair-share considerations, and use resources efficiently.
- Reflect on the energy and environmental impact of HPC and how to mitigate unnecessary waste.
- Relate course concepts to real-world HPC use cases and anticipate how future trends may affect your own work.
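To make the module-based part of that workflow concrete, here is a sketch of a typical session. The subcommands are standard for Environment Modules and Lmod, but the module names and versions are hypothetical; run `module avail` to see what your site actually provides.

```bash
module avail                 # list the software the site provides
module load gcc/12.2.0       # hypothetical compiler module
module load openmpi/4.1.5    # hypothetical MPI module
module list                  # confirm what is currently loaded
module purge                 # unload everything and start clean
```

Recording the exact `module load` lines in your job scripts is one of the simplest reproducibility habits the course will encourage.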
Assessment and Progress Tracking
Although assessment details depend on your specific setting (self-study, university course, or training workshop), the course structure supports:
- Formative checks
  Short quizzes or self-check questions to verify understanding of key terms and ideas.
- Practical milestones
  For example:
  - Successfully submitting and completing a batch job.
  - Demonstrating speedup from serial to parallel versions (see the worked example after this list).
  - Showing scaling behavior on multiple cores or nodes.
- Code and workflow reviews
  Evaluating job scripts, basic parallel codes, and reproducible environment setups.
- Final Project deliverables
- A working HPC application or workflow
- A performance analysis and optimization summary
- Documentation of the environment and reproducibility steps
- A short reflection on resource usage and ethical aspects
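For the speedup milestone, the standard measures are speedup S(p) = T(1) / T(p) and parallel efficiency E(p) = S(p) / p, where T(p) is the wall-clock time on p cores. With illustrative numbers: if a serial run takes 120 s and the same problem takes 20 s on 8 cores, the speedup is 120 / 20 = 6 and the efficiency is 6 / 8 = 0.75, or 75%.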
The emphasis is on demonstrating you can use HPC systems competently and thoughtfully, not on memorizing low-level details.
How to Use This Course Effectively
- Follow the sequence, especially for the foundational and core programming sections.
- Do all hands-on steps, even if they seem simple; familiarity with commands and tools is central in HPC.
- Revisit earlier sections when necessary; for example, you might return to Linux basics when learning about batch scripting or revisit performance concepts when starting GPU work.
- Use the outline as a map; each chapter title represents a competency you should be able to demonstrate in a small, concrete way (a command, a script, a short program, or a documented workflow).
- Connect topics explicitly; for instance, when you learn a new performance tool, tie it back to earlier concepts like memory hierarchy or parallel efficiency.
By the end of the course, you should feel comfortable moving from your own laptop to a shared HPC cluster, running parallel jobs, using established software stacks, and making informed choices about performance and resource usage. The remaining chapters in this course will guide you step by step along that path.