
Running applications on clusters

From Login to Results: The End-to-End Flow

Running on a cluster follows a fairly standard pattern, regardless of site:

  1. Prepare your code and inputs (usually elsewhere: laptop, dev node, or login node).
  2. Stage data and executables to the cluster filesystem (see the staging sketch after this list).
  3. Request resources and submit a job to the scheduler (e.g. SLURM).
  4. Monitor while it is queued and running.
  5. Inspect output, handle errors, and iterate.
  6. Archive or clean up results.
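
For step 2, staging usually just means copying files onto the cluster over SSH. A minimal sketch with rsync (the hostname and paths are illustrative):

    rsync -av my_project/ user@cluster.example.org:~/my_project/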

This chapter focuses on how that flow looks in practice on a typical HPC cluster.

Typical roles of the main systems:

  • Login nodes: editing, compiling, submitting jobs, and light testing; never heavy computation.
  • Compute nodes: run the actual jobs, allocated to you by the scheduler.
  • Shared filesystems: hold code, inputs, and results, visible from all nodes.

Preparing to Run: Executables and Inputs

Before submitting anything, make sure the executable is built for the cluster (not just your laptop), the inputs are staged where compute nodes can read them, and output directories exist.

A typical project layout for running on a cluster:

my_project/
  code/
    main.cpp
    ...
  build/
    my_app            # compiled executable
  inputs/
    config.in
    initial_state.dat
  scripts/
    run_weak_scaling.slurm
    run_strong_scaling.slurm
  results/
    test/
    production/

Choosing How to Run: Interactive vs Batch

Most clusters support two main execution styles.

Interactive jobs

Use interactive jobs for short tests, debugging, and exploratory work.

    salloc -N 1 -n 4 -t 00:30:00 --partition=debug
    srun ./my_app input.dat

Site-specific examples (conceptually similar even if commands differ):
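
One common variant is an interactive shell launched through the scheduler; both lines below are illustrative, so check your site's documentation for the exact form:

    srun --pty -N 1 -n 4 -t 00:30:00 --partition=debug bash   # SLURM interactive shell
    qsub -I -l nodes=1:ppn=4,walltime=00:30:00                # PBS/Torque equivalent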

Batch jobs

Batch jobs are the normal way to run real workloads.

Skeleton batch flow (SLURM):

sbatch scripts/run_simulation.slurm   # submit job
squeue -u $USER                       # watch queue
cat slurm-123456.out                  # read output after completion

Practical Job Script Structure

The exact syntax depends on the scheduler; here we use SLURM as a concrete example, but the structure is similar in other systems.

A typical job script has four main parts:

  1. Shebang: which shell to use.
  2. Scheduler directives: resources, time, partition, account, etc.
  3. Environment setup: modules, variables, working directory.
  4. Execution commands: srun, mpirun, or an application launcher.

Example batch script for a simple MPI job:

#!/bin/bash
#SBATCH --job-name=mpi_test
#SBATCH --output=logs/mpi_test_%j.out
#SBATCH --error=logs/mpi_test_%j.err
#SBATCH --time=01:00:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --partition=standard
#SBATCH --account=my_project
# 1. Environment setup
module purge
module load gcc/13.2.0
module load openmpi/5.0.0
cd "$SLURM_SUBMIT_DIR"   # directory where 'sbatch' was called
# 2. Run the application
srun ./build/my_app inputs/config.in

Key practical points:

  • Record provenance at the start of the job so every log is self-describing:

    echo "Job ID: $SLURM_JOB_ID"
    echo "Running on nodes:"
    scontrol show hostnames "$SLURM_JOB_NODELIST"
    module list
    git rev-parse HEAD 2>/dev/null || echo "Not a git repo"

  • Make the script fail fast on the first error instead of continuing with bad state:

    set -euo pipefail

This helps catch missing files or environment issues early.
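
An explicit preflight check makes missing-file failures immediate and obvious; the paths below follow the project layout shown earlier:

    for f in build/my_app inputs/config.in; do
      [ -e "$f" ] || { echo "Missing: $f" >&2; exit 1; }
    done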

Running Different Types of Applications

Clusters run a wide variety of codes. The launch pattern depends on the parallel model.

Serial (single-core) applications

Even serial programs should usually be run via the scheduler.

Script:

#!/bin/bash
#SBATCH --job-name=serial_example
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
module load gcc/13.2.0
./build/serial_app inputs/config.in

Notes: running through the scheduler keeps serial work off the shared login nodes and gives you proper accounting, logs, and resource limits even for single-core jobs.

OpenMP / threaded applications

Threaded codes use multiple cores within a node.

Key practical points: request cores with --cpus-per-task (keeping --ntasks=1), and set OMP_NUM_THREADS plus thread binding to match the allocation.

Example:

#!/bin/bash
#SBATCH --job-name=openmp_example
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --time=00:30:00
module load gcc/13.2.0
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PROC_BIND=spread
export OMP_PLACES=cores
./build/openmp_app inputs/config.in
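
To confirm that the thread count and binding actually took effect, you can ask the OpenMP runtime to print its settings at startup (OMP_DISPLAY_ENV has been standard since OpenMP 4.0):

    export OMP_DISPLAY_ENV=true   # runtime prints num-threads, binding, places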

MPI applications

MPI codes use multiple processes, often across nodes.

Practical checklist: make sure --nodes × --ntasks-per-node matches the number of ranks the application expects, load the same MPI library the binary was built with, and use the launcher your site recommends (srun here).

Example:

#!/bin/bash
#SBATCH --job-name=mpi_example
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=32
#SBATCH --time=02:00:00
module load openmpi/5.0.0
srun ./build/mpi_app inputs/config.in

Some systems prefer mpirun or mpiexec:

mpirun -np $SLURM_NTASKS ./build/mpi_app inputs/config.in

Follow your site's recommendation; mixing launchers or bypassing the scheduler's process placement can leave ranks on the wrong cores or nodes.

Hybrid MPI + OpenMP

Combine both models to exploit nodes with many cores.

Key practical choices: ranks per node and threads per rank, whose product should match the cores available on a node (below, 4 ranks × 8 threads = 32 cores per node).

Example:

#!/bin/bash
#SBATCH --job-name=hybrid_example
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4        # 4 MPI ranks per node
#SBATCH --cpus-per-task=8          # 8 threads per rank
#SBATCH --time=02:00:00
module load openmpi/5.0.0
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PROC_BIND=spread
export OMP_PLACES=cores
srun ./build/hybrid_app inputs/config.in

GPU-accelerated applications

You must explicitly request GPUs and typically load CUDA or other GPU stacks.

Example for a single GPU per task:

#!/bin/bash
#SBATCH --job-name=gpu_example
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --gpus=1
#SBATCH --time=01:00:00
#SBATCH --partition=gpu
module load cuda/12.2
module load gcc/13.2.0
nvidia-smi   # sanity check
./build/gpu_app inputs/config.in

Some clusters use --gres=gpu:1 instead of --gpus=1, or special GPU partitions. Always check local documentation.
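
For reference, a gres-style request looks like the following; the exact resource string (gpu, gpu:a100, ...) is site-specific:

#SBATCH --gres=gpu:1          # gres-style: one GPU on the node
#SBATCH --gpus-per-task=1     # newer SLURM: bind one GPU to each task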

Managing Many Runs: Job Arrays and Sweeps

Real workloads often require many similar runs: parameter sweeps, ensembles with different random seeds, or the same processing applied to many input files.

Job arrays

Job arrays let you launch many nearly identical jobs in one command.

Conceptual pattern (SLURM):

#!/bin/bash
#SBATCH --job-name=array_example
#SBATCH --array=0-9
#SBATCH --time=00:20:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
module load gcc/13.2.0
PARAM_FILE=params.txt
PARAM=$(sed -n "$((SLURM_ARRAY_TASK_ID+1))p" "$PARAM_FILE")
./build/my_app --param "$PARAM"

Notes: give each array task its own output file so logs do not collide, e.g.

  #SBATCH --output=logs/run_%A_%a.out

where %A is the master array job ID and %a is the task ID.
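
The params.txt file assumed above holds one parameter value per line; a minimal way to generate ten values matching --array=0-9:

    seq 0.1 0.1 1.0 > params.txt   # 10 lines: 0.1, 0.2, ..., 1.0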

Manual parameter sweeps within a job

Sometimes you prefer to run multiple cases within a single allocation to reduce queue overhead:

for param in 0.1 0.2 0.4 0.8; do
  echo "Running with param=$param"
  ./build/my_app --param "$param" > "results/param_${param}.out"
done

This is useful for cheap or quick runs, but make sure total run time stays within your wall-clock limit.
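
If the cases are independent, they can also run concurrently as separate job steps within one allocation. A sketch, assuming the allocation has at least four tasks and a SLURM version that supports --exact:

for param in 0.1 0.2 0.4 0.8; do
  # each srun starts a one-task job step on part of the allocation
  srun -N 1 -n 1 --exact ./build/my_app --param "$param" \
      > "results/param_${param}.out" &
done
wait   # block until all background steps finish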

Resource Requests in Practice

How you request resources affects how long you wait in the queue, how much of your allocation is charged, and how well the application performs.

Key choices: number of nodes, tasks per node, CPUs per task, memory, wall time, and partition.

Practical pattern: start with small test runs (short time, 1 node) to confirm the job completes correctly and to measure realistic run times before scaling up.
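
After a test run finishes, compare what you requested with what the job actually used; the job ID below is illustrative:

    sacct -j 123456 --format=JobID,Elapsed,MaxRSS,NNodes,State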

Working with the Cluster Filesystem

How you manage data has a big impact on reliability and performance.

Choosing directories

Most clusters have distinct storage areas:

  • Home: small quota, usually backed up; keep code, scripts, and configuration here.
  • Project space: larger shared quota for a group; inputs and curated results.
  • Scratch: large and fast but purged periodically; working data for running jobs.

Typical running pattern:

  1. Put code, scripts, parameter files in home or project space.
  2. Run jobs from a working directory on scratch:
   cd /scratch/$USER/my_project/run1
   srun ./my_app ...
  3. After completion, copy important results back to project/home before scratch is purged.

Avoiding filesystem pitfalls

The main pitfall is dumping thousands of files into a single directory; give every case or run its own clearly named directory instead:

  results/
    case_A/
    case_B/
    scaling_test_32nodes/

Monitoring, Debugging, and Restarting Jobs

Your jobs will not always behave as expected. Practical handling is crucial.

Monitoring in queue and during runtime

Tools commonly available: squeue for queue state, sacct for accounting and job history, and scontrol show job for detailed job information, plus simply following the job's output file:

    tail -f logs/my_job_123456.out

Include periodic progress messages in your application or wrapper scripts:

echo "Starting at $(date)"
./build/my_app ...
echo "Finished at $(date)"

Common runtime issues

Common symptoms and practical responses: a job killed exactly at its time limit needs a longer wall-time request or checkpointing; an out-of-memory kill needs more memory per node or fewer tasks per node; "command not found" or missing-file errors usually mean incomplete environment setup, which set -euo pipefail surfaces immediately.

Checkpointing and restarts in practice

Many HPC applications support checkpoint/restart: the application periodically writes its state to disk so an interrupted or time-limited run can resume from the last checkpoint instead of starting over.

Example (using a hypothetical app):

# First run
srun ./my_app --input init.in --checkpoint checkpoint.dat
# Restart run
srun ./my_app --restart checkpoint.dat
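
A small wrapper makes the same script work for both cases, reusing the same hypothetical flags:

# resume from the checkpoint if one exists, otherwise start fresh
if [ -f checkpoint.dat ]; then
  srun ./my_app --restart checkpoint.dat
else
  srun ./my_app --input init.in --checkpoint checkpoint.dat
fi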

Scaling Up: From Test to Production

Transitioning from tiny tests to full-scale runs requires a deliberate process.

Typical progression:

  1. Functional test:
    • 1 node, very small problem size, short wall time.
  2. Performance sanity check:
    • 1 node, realistic problem size.
    • Confirm no severe bottlenecks (I/O, CPU idle, missing vectorization/GPU usage).
  3. Scaling experiments:
    • Vary nodes or GPUs (see the submission sketch after this list):
      • 1, 2, 4, 8 nodes, measuring runtime and efficiency.
    • Decide where scaling benefits flatten or reverse.
  4. Production plan:
    • Choose problem size and number of nodes based on scaling results.
    • Estimate wall time with margin (e.g. 20–30% buffer).
    • Submit final production jobs.
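
A simple way to drive the scaling experiments in step 3 is to override the node count at submission time; the script name follows the project layout shown earlier:

for nodes in 1 2 4 8; do
  # command-line options override the #SBATCH directives in the script
  sbatch --nodes=$nodes --job-name=scale_${nodes}nodes \
         scripts/run_strong_scaling.slurm
done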

Keep all scripts and logs from the test and scaling stages; they are useful for reproducing results, debugging regressions, and justifying resource requests in reports and proposals.

Practical Patterns and Tips

A few small habits significantly improve your experience. One of the most useful is wrapping each case in a small, self-contained entry script that stages inputs, runs the application, and copies results back:

  #!/bin/bash
  # run_caseA.sh
  set -euo pipefail
  module purge
  module load my_software_stack
  CASE=caseA
  WORKDIR=/scratch/$USER/my_project/$CASE
  mkdir -p "$WORKDIR"
  cd "$WORKDIR"
  cp ~/my_project/inputs/$CASE/* .
  srun ~/my_project/build/my_app config.in
  cp -r . ~/my_project/results/$CASE

Then call this entry script from your job script.
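
For example, a minimal job script that delegates to the entry script (directives abbreviated; this assumes the script lives in ~/my_project/scripts/):

#!/bin/bash
#SBATCH --job-name=caseA
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --time=02:00:00
bash ~/my_project/scripts/run_caseA.sh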

Putting It Together: A Minimal End-to-End Example

Imagine you want to run an MPI simulation on 4 nodes with 32 tasks per node.

  1. Compile on the login node:
   module load gcc/13.2.0 openmpi/5.0.0
   mkdir -p build && cd build
   mpicxx ../code/main.cpp -O3 -o my_app
  2. Create a job script scripts/run_production.slurm:
   #!/bin/bash
   #SBATCH --job-name=prod_sim
   #SBATCH --output=logs/prod_sim_%j.out
   #SBATCH --error=logs/prod_sim_%j.err
   #SBATCH --nodes=4
   #SBATCH --ntasks-per-node=32
   #SBATCH --time=04:00:00
   #SBATCH --partition=standard
   #SBATCH --account=my_project
   set -euo pipefail
   module purge
   module load gcc/13.2.0 openmpi/5.0.0
   cd "$SLURM_SUBMIT_DIR"
   # Run from scratch
   RUN_DIR=/scratch/$USER/prod_run_${SLURM_JOB_ID}
   mkdir -p "$RUN_DIR"
   cp inputs/config_prod.in "$RUN_DIR"
   cp build/my_app "$RUN_DIR"
   cd "$RUN_DIR"
   srun ./my_app config_prod.in
   # Save results
   mkdir -p "$SLURM_SUBMIT_DIR/results/prod"
   cp -r . "$SLURM_SUBMIT_DIR/results/prod/run_${SLURM_JOB_ID}"
  3. Submit:
   cd ~/my_project
   sbatch scripts/run_production.slurm
  4. Monitor:
   squeue -u $USER
   tail -f logs/prod_sim_123456.out    # replace with actual job ID
  5. Inspect results in results/prod/run_<jobid>/ after completion.

This pattern—compile, stage data, request resources with a carefully written job script, monitor, and archive results—is the core of running applications on clusters in practice.
