Why Develop Code Locally First?
Developing on your local machine (laptop/workstation) before moving to an HPC cluster offers:
- Faster edit–compile–run cycles.
- Less dependency on shared resources and queues.
- Easier debugging with full control over the environment.
- Ability to prototype and experiment freely.
The goal is not to fully replicate the cluster, but to make it easy to move your work there later with minimal changes.
Choosing a Local Development Environment
Operating system considerations
- Linux on your laptop/workstation
  - Easiest path to HPC clusters (which are almost always Linux).
  - Package managers (apt, dnf, pacman, etc.) provide compilers and tools similar to the cluster.
- macOS
  - Unix-like; compatible shell and many tools.
  - Use Homebrew or MacPorts to install compilers and libraries.
- Windows
  - Use WSL2 (Windows Subsystem for Linux) to run a real Linux environment.
  - Alternatively, use a Linux virtual machine (VirtualBox, VMware, etc.).
In all cases, aim to have:
- A POSIX shell (bash, zsh, etc.).
- A package manager or another reliable way to install compilers and tools.
Local vs remote development workflows
Common patterns:
- Purely local: Edit, compile, and run entirely on your machine for small/medium problems.
- Local edit, remote run: Edit locally, run on the cluster for big problems.
- Remote development in a local editor: Use SSH-based integrations in editors/IDEs to work on cluster files as if they were local.
This chapter focuses on the local side: getting code and tools set up so that moving to the cluster is smooth.
Editors and IDEs for HPC-Oriented Workflows
Text editors
- vim / neovim
  - Lightweight, ubiquitous on clusters.
  - Steeper learning curve but powerful for remote editing over SSH.
- Emacs
  - Programmable editor, popular in scientific computing communities.
- VS Code / Cursor / similar editors
  - Modern UI, strong language support.
  - Can connect to HPC clusters via “Remote SSH” extensions.
When choosing an editor, consider:
- Can you use the same or a similar editor both locally and on the cluster?
- Does it support syntax highlighting, basic code navigation, and terminals?
You do not need an elaborate IDE to write effective HPC code; consistency and familiarity matter more.
Full IDEs
If you prefer a full IDE:
- CLion, Visual Studio Code, Eclipse, Visual Studio, Xcode, etc. all work for C/C++/Fortran and Python.
- Ensure they can:
  - Configure custom compilers and build systems.
  - Integrate with CMake or Make (common in HPC).
Keep in mind: the IDE itself will not run on the cluster’s compute nodes; you will still need to understand command-line tools eventually.
Local Toolchain Setup
Installing compilers
Aim to install at least these:
- C compiler: gcc or clang
- C++ compiler: g++ or clang++
- Fortran compiler (if relevant): gfortran
- Python (for scripting, prototyping, glue code)
Examples:
- Debian/Ubuntu:

```
sudo apt update
sudo apt install build-essential gfortran python3 python3-venv
```

- Fedora/RHEL:

```
sudo dnf groupinstall "Development Tools"
sudo dnf install gcc-gfortran python3 python3-virtualenv
```

- macOS (Homebrew):

```
brew install gcc cmake make python
```

- Windows with WSL2: use the same commands as for the chosen Linux distribution.
If your cluster uses a specific compiler (e.g., Intel, NVIDIA HPC SDK), you typically won’t install the same compiler locally, but try to match language versions and major features (e.g., C++17 support).
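For instance, a rough local sanity check (assuming g++ and a project targeting C++17; adjust the standard flag to whatever your code actually needs) is to compile a tiny feature-test snippet:

```
# Quick check that the local compiler accepts the language level the project targets.
echo 'int main() { if constexpr (true) { return 0; } }' \
  | g++ -std=c++17 -fsyntax-only -x c++ - && echo "local g++ accepts C++17"
```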
Build systems and tools
Install the tools likely to be used on the cluster:
- make
- cmake
- git (for version control)
- Optional but common: ninja, pkg-config

Examples:
- Debian/Ubuntu:

```
sudo apt install cmake ninja-build git
```

- macOS (Homebrew):

```
brew install cmake ninja git
```

Project Structure for Easy Cluster Migration
A clean project layout makes it almost trivial to move code between local and cluster environments.
Suggested layout
```
myproject/
  src/
    main.c
    solver.c
    solver.h
  include/
    myproject/
      config.h
  tests/
    test_solver.c
  CMakeLists.txt      # or Makefile
  README.md
  scripts/
    run_small.sh
    run_large_cluster.sh
```

Principles:
- Separate source and tests: Makes it easier to run small tests locally and large ones on the cluster.
- Single configuration point: CMakeLists.txt or Makefile should capture how to build the project, regardless of where it is built (a minimal sketch follows this list).
- Scripts for common runs: Even locally, use simple scripts so that transitioning to job scripts later is natural.
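As a rough sketch of that single configuration point, a minimal CMakeLists.txt for the layout above could look like this (the target names and test setup are placeholders, not requirements):

```
cmake_minimum_required(VERSION 3.16)
project(myproject C)

# Main program built from the sources under src/
add_executable(myprog src/main.c src/solver.c)
target_include_directories(myprog PRIVATE include src)

# A small test binary that runs locally in seconds
enable_testing()
add_executable(test_solver tests/test_solver.c src/solver.c)
target_include_directories(test_solver PRIVATE include src)
add_test(NAME solver COMMAND test_solver)
```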
Avoiding environment-specific assumptions
Try not to hard-code:
- Absolute paths (like /home/username/...).
- Compiler names (like icc) if you can use environment variables.
Example using a Makefile with overridable variables:

```
CC ?= gcc
CFLAGS ?= -O2 -Wall

all: myprog

myprog: main.o solver.o
	$(CC) $(CFLAGS) -o $@ $^

clean:
	rm -f *.o myprog
```

On the cluster you can then run:

```
make CC=icc CFLAGS="-O3 -xHost -qopenmp"
```

without changing your source.
Developing with Parallelism Locally
You often want to test parallel concepts locally before going to the cluster, but at a smaller scale.
OpenMP on your machine
If your compiler supports OpenMP (most do):
- Enable OpenMP in your local builds, for example:

```
gcc -O2 -fopenmp -o myprog main.c
```

- Control threads locally with the same environment variables you’ll use on the cluster:

```
export OMP_NUM_THREADS=4
./myprog
```

Local machines usually have fewer cores than cluster nodes, but they still let you:
- Check correctness of threaded code.
- Catch basic race conditions and synchronization issues.
- Experiment with scheduling and environment variables.
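For example, a small OpenMP program like the following can serve as a local correctness check (a sketch only; the reduction makes the result independent of the thread count, so a wrong sum points to a bug):

```
#include <omp.h>
#include <stdio.h>

int main(void) {
    const int n = 1000000;
    double sum = 0.0;

    /* Each iteration adds 1/n, so the correct total is 1.0 regardless of threads. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += 1.0 / (double)n;
    }

    printf("max threads=%d, sum=%f (expected 1.0)\n", omp_get_max_threads(), sum);
    return 0;
}
```

Build it with gcc -O2 -fopenmp and rerun with different OMP_NUM_THREADS values; the printed sum should stay at 1.0.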
MPI on your machine
To experiment with MPI locally:
- Install an MPI implementation (e.g., MPICH, Open MPI).
- Compile and run with a small number of processes.
Example on Ubuntu:
```
sudo apt install mpich
mpicc -O2 -o mympi main.c
mpirun -np 4 ./mympi
```

You cannot reproduce cluster-scale runs locally, but you can:
- Test that basic communication patterns work.
- Validate correctness with 2–8 processes before scaling up.
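As a sketch of such a test (assuming the MPI setup above; the ring exchange and names are illustrative), each rank passes its rank number to its neighbour:

```
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int next = (rank + 1) % size;
    int prev = (rank - 1 + size) % size;
    int sent = rank, received = -1;

    /* Combined send/receive avoids the deadlock a naive Send/Recv pair could cause. */
    MPI_Sendrecv(&sent, 1, MPI_INT, next, 0,
                 &received, 1, MPI_INT, prev, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d of %d received %d from rank %d\n", rank, size, received, prev);

    MPI_Finalize();
    return 0;
}
```

Run it with mpirun -np 2 up to -np 8 and check that every rank reports the expected neighbour.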
GPUs locally
If you have a GPU and want to experiment with accelerator code:
- Install the appropriate toolkit (e.g., CUDA Toolkit for NVIDIA GPUs).
- Use small, local problem sizes.
- Focus on correctness; performance tuning is often cluster-specific.
If you do not have a GPU, still structure your code so:
- GPU-specific parts are isolated (e.g., in a separate module).
- A CPU-only path exists for local development and testing.
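One way to keep that separation in C, sketched with a hypothetical USE_GPU compile-time switch (the function names are placeholders, not a real API):

```
#include <stdio.h>
#include <stddef.h>

static void solve_cpu(double *x, size_t n) {
    for (size_t i = 0; i < n; i++) x[i] *= 2.0;   /* stand-in for real work */
}

#ifdef USE_GPU
void solve_gpu(double *x, size_t n);   /* defined in a separate CUDA/HIP source file */
#endif

static void solve(double *x, size_t n) {
#ifdef USE_GPU
    solve_gpu(x, n);    /* GPU path, only compiled when -DUSE_GPU is given */
#else
    solve_cpu(x, n);    /* CPU-only path used for local development */
#endif
}

int main(void) {
    double x[4] = {1, 2, 3, 4};
    solve(x, 4);
    printf("%f %f %f %f\n", x[0], x[1], x[2], x[3]);
    return 0;
}
```

Locally you build without -DUSE_GPU and exercise the CPU path; on a GPU node the same call site dispatches to the accelerator implementation.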
Local Testing vs Cluster-Scale Testing
Designing test cases for local runs
Your local tests should be:
- Small: Run in seconds, not hours.
- Representative: Exercise the same logic used in large runs.
- Deterministic when possible: So you can reliably detect regressions.
Strategies:
- Create “mini” input datasets (reduced grids, fewer time steps).
- Use compile-time or runtime options to select “debug” or “small” modes.
- Write regression tests that verify numerical results within tolerances.
Example:
- Cluster: simulate 10,000 time steps on a 1024×1024 grid.
- Local: simulate 10–100 time steps on a 64×64 grid.
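A regression test at this scale can be as simple as comparing a computed value against a stored reference within a tolerance. The sketch below uses a made-up mini case and reference purely for illustration:

```
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for running your real solver on a tiny "mini" input. */
static double run_small_case(void) {
    double sum = 0.0;
    for (int i = 1; i <= 100; i++) sum += (double)i;
    return sum;   /* exact result is 5050 */
}

int main(void) {
    const double reference = 5050.0;   /* value recorded from a trusted run */
    const double tolerance = 1e-9;

    double result = run_small_case();
    if (fabs(result - reference) > tolerance) {
        fprintf(stderr, "FAIL: got %.17g, expected %.17g\n", result, reference);
        return EXIT_FAILURE;
    }
    printf("PASS\n");
    return EXIT_SUCCESS;
}
```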
Debug vs release builds locally
Maintain at least two build configurations locally:
- Debug:
  - Lower optimization (e.g., -O0 or -O1).
  - Extra checks (-g, -fsanitize=address, -fstack-protector, etc.).
  - Useful for stepping through code and catching memory errors.
- Release:
  - Higher optimization (e.g., -O3).
  - Flags similar to those you’ll use on the cluster.
With CMake, for example:
```
cmake -S . -B build-debug -DCMAKE_BUILD_TYPE=Debug
cmake -S . -B build-release -DCMAKE_BUILD_TYPE=Release
cmake --build build-debug
cmake --build build-release
```

On the cluster, you can reuse the same CMakeLists.txt with different compilers and flags.
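For example, on the cluster you might configure the same project with a different compiler and flags at configure time (the compiler name and flags here are only examples; use what your site’s modules provide):

```
cmake -S . -B build-cluster \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_C_COMPILER=icx \
  -DCMAKE_C_FLAGS="-O3 -xHost"
cmake --build build-cluster
```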
Managing Dependencies Locally
Your code may rely on external libraries (e.g., BLAS, FFT, HDF5). Local handling of dependencies should prepare you for cluster usage.
Using system packages
For commonly available libraries:
- Install via your OS package manager.
- Example (Ubuntu):
```
sudo apt install libopenblas-dev libfftw3-dev libhdf5-dev
```

This is usually sufficient for development and small tests.
Using virtual environments (Python)
If your project involves Python:
- Create an isolated environment:
```
python3 -m venv venv
source venv/bin/activate
pip install numpy mpi4py
```

- Keep a requirements.txt or environment.yml so that you can reproduce the environment, or translate it into cluster modules or containers later.
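A simple way to capture the local environment, assuming the plain pip/venv setup above:

```
pip freeze > requirements.txt   # record the exact versions used locally
# later, on the cluster or inside a container:
pip install -r requirements.txt
```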
Abstracting dependency locations
Use build system features to avoid hard-coding paths.
Example with CMake’s find_package:
```
find_package(FFTW3 REQUIRED)
target_link_libraries(myprog PRIVATE FFTW3::fftw3)
```

On your local machine, CMake will find the system-installed library; on the cluster, it can find the library provided through modules or custom installations.
Using Version Control in an HPC Context
Version control is critical when your code exists both locally and on clusters.
Basic workflow with Git
Typical pattern:
- Create a repository locally:
```
git init myproject
cd myproject
git add .
git commit -m "Initial commit"
```

- Host it on a platform (GitHub, GitLab, institutional Git server).
- On the cluster, clone the same repository:

```
git clone git@github.com:username/myproject.git
```

- Synchronize changes with git pull and git push.
Benefits:
- Easy to keep local and cluster copies in sync.
- Clear history of changes affecting performance or correctness.
- Safer experimentation via branches.
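For example, a typical branch-based experiment (the branch name is arbitrary; git checkout -b works the same way on older Git versions):

```
git switch -c tune-blocking        # create a branch for the experiment
# ...edit, build, run the small tests...
git commit -am "Try cache blocking in the solver"
git push -u origin tune-blocking   # make the branch available on the cluster too
```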
Ignore local-only files
Use a .gitignore file to avoid committing build artifacts or local configuration:
```
build/
*.o
*.exe
*.out
*.log
*.swp
venv/
```
If you have IDE-specific files, add them as well (e.g., .vscode/, .idea/).
Emulating Cluster-Like Conditions Locally
You cannot fully reproduce an HPC environment, but you can approximate aspects of it to prepare your code.
Resource limitations
Test how your code behaves with limited resources:
- Restrict threads:

```
export OMP_NUM_THREADS=2
```

- Limit memory usage by using small input sizes.
- Use tools like ulimit to experiment with file limits and stack sizes (carefully; see the sketch below).
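A small sketch of such an experiment on Linux (the limits are arbitrary example values; run it in a subshell so they do not affect the rest of your session):

```
(
  ulimit -v 2097152      # cap virtual memory at roughly 2 GB (value in KB)
  ulimit -s 16384        # set a 16 MB stack limit
  ./myprog input_small.dat
)
```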
Simulating batch-like runs
Even without a scheduler:
- Write simple shell scripts that:
- Set environment variables.
- Run your program.
- Redirect output to log files.
Example:
```
#!/usr/bin/env bash
set -e
export OMP_NUM_THREADS=4
./myprog input_small.dat > output.log 2>&1
```

This is conceptually similar to a job script and prepares you for the cluster’s batch system.
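For comparison, the eventual job script often differs mainly in a few scheduler directives. A sketch assuming a Slurm-based cluster (directive names and values are site-specific):

```
#!/usr/bin/env bash
#SBATCH --job-name=myprog-small
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:10:00

set -e
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-4}
./myprog input_small.dat > output.log 2>&1
```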
Containers for closer replication
If your cluster uses containers or you want a more controlled environment:
- Use Docker or Podman locally.
- Define an image with compilers and libraries similar to the cluster.
- Develop and test inside this container.
Even if you do not use containers on the cluster, they can help standardize your local environment.
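A minimal sketch of such an image, assuming an Ubuntu base and the packages used earlier in this chapter (adjust the base image and package list to mirror your cluster):

```
FROM ubuntu:22.04

# Compilers, build tools, and the libraries used in the examples above
RUN apt-get update && apt-get install -y \
        build-essential gfortran cmake git \
        libopenblas-dev libfftw3-dev libhdf5-dev \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /work
```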
Moving from Local to Cluster
To transition smoothly:
- Ensure portability:
  - Use standard C/C++/Fortran and widely supported libraries when possible.
  - Avoid OS-specific APIs unless guarded with #ifdefs or equivalents.
- Externalize configuration:
  - Problem size, file paths, and performance tuning parameters should be controlled via command-line arguments or configuration files, not hard-coded (see the sketch after this list).
- Minimize assumptions about hardware:
  - Do not assume a fixed number of cores or GPUs.
  - Read those values from environment variables or provide them as runtime options.
- Document your local setup:
  - Write down how to build and run the code locally.
  - This documentation forms the basis for cluster-specific instructions.
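A minimal C sketch of externalizing these values (the default size, names, and choice of environment variable are illustrative):

```
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    /* Problem size from the command line, with a small default for local runs. */
    int n = (argc > 1) ? atoi(argv[1]) : 64;

    /* Thread count from the environment the batch system will set. */
    const char *env = getenv("OMP_NUM_THREADS");
    int threads = (env != NULL) ? atoi(env) : 1;

    printf("grid %dx%d, %d thread(s)\n", n, n, threads);
    /* ...set up and run the solver with these values... */
    return 0;
}
```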
When you first move to the cluster:
- Start with the same small test cases you use locally.
- Increase problem sizes and resources only after verifying correctness.
Practical Development Workflow Summary
A practical, beginner-friendly workflow:
- Locally
  - Set up compilers, build system, and editor.
  - Create a clean project structure with Git.
  - Implement features incrementally.
  - Write small tests and run them frequently.
  - Use debug builds and basic debugging tools.
- Pre-cluster check
  - Ensure the project builds cleanly with a single command (make, cmake --build, etc.).
  - Remove hard-coded paths and machine-specific settings.
  - Commit your working state.
- On the cluster
  - Clone or pull the repository.
  - Adjust compiler/build flags via environment variables or build configuration, not by changing the source every time.
  - Run the same small tests first, then scale up.
Developing code locally in this disciplined way dramatically reduces friction later, when you begin running large jobs on shared HPC systems.