Goals of a Job Script
A job script is a text file that tells the scheduler:
- What resources you need (time, CPUs, memory, GPUs, etc.)
- What environment you want (modules, variables)
- What commands to run (your program, pre/post steps)
- Where to send output and error messages
In most HPC clusters, job scripts are submitted to a batch system (for example SLURM) using a command like sbatch. The rest of this chapter focuses on the practical aspects of writing such scripts.
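For example, submitting and monitoring a job from the command line typically looks like the following (my_job.sh is a placeholder name for your script, and 123456 stands for a real job ID):
# Submit the script; SLURM replies with the assigned job ID
sbatch my_job.sh
# List your queued and running jobs
squeue -u $USER
# Cancel a job if necessary
scancel 123456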
Basic Structure of a Batch Job Script
A typical batch job script has three main parts:
- Shebang line – which shell to use
- Scheduler directives – special comments describing resources and job options
- Job body – the commands to execute
Minimal SLURM example:
#!/bin/bash
#SBATCH --job-name=my_test
#SBATCH --output=my_test_%j.out
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
echo "Running on host: $(hostname)"
echo "Starting at: $(date)"
# Load environment
module purge
module load gcc
# Run the program
./my_program input.dat
Key ideas:
- Lines starting with #SBATCH are directives to SLURM, not normal shell comments.
- Everything after the directives is executed by the shell you specify in the shebang.
- #SBATCH directives are only recognized before the first executable command; directives placed later in the script are silently ignored.
Shebang and Shell Choice
The first line selects the shell:
- #!/bin/bash – most common on Linux clusters
- #!/bin/zsh, #!/bin/sh – possible alternatives if supported
The shell determines:
- Available syntax ([[ ... ]], arrays, functions, etc.)
- How environment variables and loops are written
For beginners, #!/bin/bash is usually the best default.
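As a small illustration of why the shell matters, the following constructs work in bash but may fail under a plain /bin/sh:
# Bash-specific test syntax
if [[ -f input.dat ]]; then
    echo "input file found"
fi
# Bash arrays
sizes=(128 256 512)
echo "first size: ${sizes[0]}"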
Common SLURM Directives in Job Scripts
Directives control job behavior. Most have a long form, and many also have a short form. Some of the most commonly used are listed below.
Job identification and accounting
--job-name=NAME (-J NAME)
A short, descriptive name that appears in queue listings.
--account=ACCOUNT (-A ACCOUNT)
Project/account to charge. Often required on shared systems.
--partition=PART (-p PART)
Queue/partition to use (e.g., short, long, gpu).
Example:
#SBATCH --job-name=matrix_mul
#SBATCH --account=project123
#SBATCH --partition=short
Time limits
--time=HH:MM:SS (-t HH:MM:SS)
Maximum wall-clock time you request.
Examples:
#SBATCH --time=00:30:00 # 30 minutes
#SBATCH --time=2-00:00:00 # 2 days (D-HH:MM:SS format)
Request only as much as you realistically need; this can improve queue wait times and system efficiency.
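One way to calibrate time requests is to compare the requested limit with how long similar jobs actually ran. On clusters where SLURM accounting is enabled, sacct can show this for a finished job (123456 is a placeholder job ID):
# Requested time limit vs. actual elapsed time for a completed job
sacct -j 123456 --format=JobID,JobName,Timelimit,Elapsed,State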
CPU and task layout
Typical directives (interpretation depends on how the job is launched):
--ntasks=N (-n N)
Number of tasks (often equals the number of MPI processes).
--cpus-per-task=C (-c C)
Number of CPU cores (threads) per task, often used for OpenMP.
--nodes=N (-N N)
Number of nodes to allocate.
Examples:
# Single-core serial job
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# 16 MPI processes, one CPU core per task
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=1
# 4 MPI processes, each using 8 threads (e.g., OpenMP)
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=8
The job script usually combines these directives with the appropriate launch command in the body (srun, mpirun, OpenMP settings, etc.), which is covered elsewhere.
Memory requests
Memory can be requested per node, per CPU, or per task depending on the cluster configuration.
Common forms:
--mem=4G
Total memory per node (e.g., 4 gigabytes per node).
--mem-per-cpu=2G
Memory per CPU core.
Examples:
#SBATCH --mem=8G # 8 GB total per node
#SBATCH --mem-per-cpu=2G # 2 GB per CPU core
Check the local cluster documentation to know which style is expected.
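As with time limits, accounting data can help right-size memory requests. If sacct is available on your system, the MaxRSS field reports the peak memory a finished job step actually used (123456 is again a placeholder job ID):
# Peak memory use (MaxRSS) compared with the requested memory (ReqMem)
sacct -j 123456 --format=JobID,ReqMem,MaxRSS,Elapsed,State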
GPUs and accelerators
If the cluster has GPUs, you typically request them like:
--gres=gpu:NUM
Generic resources (GRES); gpu:NUM requests NUM GPUs per allocated node.
Example:
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
Here, one GPU is requested, along with 8 CPU cores and 32 GB of memory.
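Inside the job body it can be useful to confirm that the GPU allocation is actually visible to your program. A minimal check, assuming the node provides the NVIDIA nvidia-smi tool and that the cluster exports CUDA_VISIBLE_DEVICES for the allocation:
# List the GPU(s) assigned to this job
nvidia-smi
# SLURM typically restricts visible GPUs through this variable
echo "CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES"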
Output and error handling
You can control where the job’s output and error messages go:
--output=FILE (-o FILE)
Standard output file.
--error=FILE (-e FILE)
Standard error file.
--mail-user=EMAIL
Email address for notifications.
--mail-type=BEGIN,END,FAIL
When to send emails.
Useful placeholders:
%j – job ID
%x – job name
%u – user name
Examples:
#SBATCH --output=logs/%x_%j.out
#SBATCH --error=logs/%x_%j.err
#SBATCH --mail-user=myname@example.edu
#SBATCH --mail-type=FAIL,END
Note that the scheduler does not create missing output directories: if a directory such as logs/ does not exist when the job starts, the output file cannot be written.
Organizing the Job Body
Within the job body, you write ordinary shell commands, but in a way that:
- Reconstructs the environment reliably
- Makes it easy to debug
- Records useful metadata
A common pattern:
- Initial info and safety checks
- Environment modules and variables
- Directory setup
- Run commands
- Final logging
Example:
#!/bin/bash
#SBATCH --job-name=heat_2d
#SBATCH --output=logs/%x_%j.out
#SBATCH --time=01:00:00
#SBATCH --partition=short
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=1G
# 1. Log basic info
echo "Job ID: $SLURM_JOB_ID"
echo "Job name: $SLURM_JOB_NAME"
echo "User: $USER"
echo "Running on nodes: $SLURM_NODELIST"
echo "Number of tasks: $SLURM_NTASKS"
echo "CPUs per task: $SLURM_CPUS_PER_TASK"
echo "Started at: $(date)"
# 2. Setup environment
module purge
module load gcc/12.2
module load openmpi
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# 3. Move to the directory from which the job was submitted
cd "$SLURM_SUBMIT_DIR"
# 4. Run the application
srun ./heat_2d_solver --nx 2048 --ny 2048 --steps 500
# 5. Final log
echo "Finished at: $(date)"Using Environment Variables Provided by the Scheduler
Schedulers often set environment variables that job scripts can use. For SLURM, common ones include:
SLURM_JOB_ID – numeric job ID
SLURM_JOB_NAME – job name
SLURM_SUBMIT_DIR – directory from which you ran sbatch
SLURM_NTASKS – number of tasks allocated
SLURM_CPUS_PER_TASK – CPUs per task
SLURM_NODELIST – list of nodes allocated
Practical uses:
- Ensure you are in the expected directory:
cd "$SLURM_SUBMIT_DIR"- Control OpenMP threads:
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK- Make logs self-describing:
echo "Nodes: $SLURM_NODELIST"Handling Working Directories and Paths
Common strategies in job scripts:
- Use absolute paths when possible to avoid confusion:
DATA_DIR=/scratch/$USER/data
RESULT_DIR=/scratch/$USER/results
- Create necessary directories:
mkdir -p "$RESULT_DIR"
- Stage input data on a fast scratch or node-local filesystem (if your cluster distinguishes home vs. scratch):
cp input/*.dat "$SLURM_TMPDIR"/
cd "$SLURM_TMPDIR"
srun ./my_code
cp results/* "$RESULT_DIR"/
Check your site documentation for recommended directories (e.g., $SCRATCH, $SLURM_TMPDIR, etc.).
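When staging data on a node-local directory, it is also worth making sure results are copied back even if the program fails part-way. A sketch of that pattern, assuming $SLURM_TMPDIR and $SLURM_SUBMIT_DIR are available and that the program writes into a results/ subdirectory (note that the trap cannot run if the job is killed outright at its time limit):
RESULT_DIR="$SLURM_SUBMIT_DIR/results"
mkdir -p "$RESULT_DIR"
# Copy results back when the script exits, whether it succeeded or not
trap 'cp -r "$SLURM_TMPDIR"/results/* "$RESULT_DIR"/ 2>/dev/null' EXIT
cp input/*.dat "$SLURM_TMPDIR"/
cd "$SLURM_TMPDIR"
mkdir -p results
srun ./my_code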
Serial, OpenMP, MPI, and Hybrid Job Script Patterns
The directives and body structure vary with the parallel model you use. A few common patterns:
Pure serial job
#!/bin/bash
#SBATCH --job-name=serial_test
#SBATCH --output=serial_%j.out
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
cd "$SLURM_SUBMIT_DIR"
./serial_program input.dat
Shared-memory (OpenMP-style) job
#!/bin/bash
#SBATCH --job-name=openmp_job
#SBATCH --output=openmp_%j.out
#SBATCH --time=00:30:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=8G
cd "$SLURM_SUBMIT_DIR"
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./openmp_program input.dat
Distributed-memory (MPI-style) job
#!/bin/bash
#SBATCH --job-name=mpi_job
#SBATCH --output=mpi_%j.out
#SBATCH --time=02:00:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --mem=4G
cd "$SLURM_SUBMIT_DIR"
module load openmpi
srun ./mpi_program input.dat
Hybrid MPI + OpenMP job
#!/bin/bash
#SBATCH --job-name=hybrid_job
#SBATCH --output=hybrid_%j.out
#SBATCH --time=02:00:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=8
#SBATCH --mem=8G
cd "$SLURM_SUBMIT_DIR"
module load mpi
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./hybrid_program input.dat
The details of MPI/OpenMP themselves are covered elsewhere; here the focus is on matching directives to how you intend to run the job.
Parameter Sweeps and Simple Loops in Job Scripts
Sometimes you want a single job to run multiple related simulations with different parameters. You can use shell loops inside the job body:
#!/bin/bash
#SBATCH --job-name=param_sweep
#SBATCH --output=param_sweep_%j.out
#SBATCH --time=04:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=4G
cd "$SLURM_SUBMIT_DIR"
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
for NX in 128 256 512 1024; do
echo "Running with NX=$NX at $(date)"
./solver --nx "$NX" --ny "$NX" --steps 200 > "run_NX${NX}.log"
doneThis pattern is useful for small sweeps that can fit comfortably within a single job’s time and resource limits.
For large parameter sweeps, array jobs are more appropriate (introduced elsewhere), but the basic structure still resides in a job script.
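For orientation, a minimal array-job version of the sweep above might look like the following sketch; details such as allowed array sizes are site-dependent, and array jobs themselves are covered elsewhere:
#!/bin/bash
#SBATCH --job-name=sweep_array
#SBATCH --output=sweep_%A_%a.out   # %A = array job ID, %a = array index
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=4G
#SBATCH --array=0-3
cd "$SLURM_SUBMIT_DIR"
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Map the array index to a grid size
SIZES=(128 256 512 1024)
NX=${SIZES[$SLURM_ARRAY_TASK_ID]}
./solver --nx "$NX" --ny "$NX" --steps 200 > "run_NX${NX}.log"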
Job Scripts for Interactive Sessions
Some clusters allow interactive jobs directly from the command line, but you can also use a script to request an interactive shell with certain resources:
#!/bin/bash
#SBATCH --job-name=interactive
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=4G
# This command starts an interactive shell on a compute node
srun --pty bash
Submitting this script may, on some systems, give you a shell on a compute node with the resources you requested; you can then run commands interactively within that environment. Many clusters, however, expect interactive sessions to be requested directly from the login node rather than through sbatch.
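On many SLURM systems you can request an interactive session directly from the login node with the same resource options used in batch scripts (the preferred command varies by site):
# Open an interactive shell on a compute node with the requested resources
srun --time=01:00:00 --ntasks=1 --cpus-per-task=4 --mem=4G --pty bash
# Alternatively, create an allocation first and launch commands in it with srun
salloc --time=01:00:00 --ntasks=1 --cpus-per-task=4 --mem=4G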
Common Pitfalls When Writing Job Scripts
Some frequent mistakes and how to avoid them:
- Forgetting the shebang
Result: job may fail to start or use an unexpected shell.
Fix: always start with #!/bin/bash (or another explicit shell).
- Mismatched directives and launch commands
Example: asking for --ntasks=16 but running ./program (serial) instead of srun ./program.
Fix: ensure resource requests align with how you run your code.
- Not changing to the submit directory
Programs run in a default directory that may not contain your input files.
Fix: add cd "$SLURM_SUBMIT_DIR" early in the job body.
- Requesting inconsistent memory
Using both --mem and --mem-per-cpu in ways that conflict, or requesting more memory than a node actually has.
Fix: check node specs and cluster policies; use one clear memory request style.
- Hard-coding temporary paths incorrectly
Writing to /tmp directly on systems where node-local temporary directories are different or cleaned aggressively.
Fix: use site-provided environment variables like $SLURM_TMPDIR if available.
- No logging or diagnostics
When something goes wrong, there is little information to diagnose it.
Fix: echo job info, and optionally add set -e (stop on first error) or set -x (print commands as they run) when debugging jobs.
Example for debugging:
#!/bin/bash
#SBATCH --job-name=debug_example
#SBATCH --output=debug_%j.out
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --mem=1G
set -e # exit on error
set -x # print commands
cd "$SLURM_SUBMIT_DIR"
./possibly_flaky_program
Local Customizations and Templates
Clusters often provide:
- Sample job scripts in a shared directory
- Documentation on required directives (e.g., default partition, accounting options)
- Recommended settings for specific applications
A good practice is to:
- Start from a working example provided by your site.
- Save your own minimal template scripts for common scenarios (serial, OpenMP, MPI, GPU).
- Modify copies of these templates for specific projects rather than starting from scratch.
Example template header you can adapt:
#!/bin/bash
#SBATCH --job-name=JOBNAME
#SBATCH --output=logs/%x_%j.out
#SBATCH --partition=PARTITION
#SBATCH --account=ACCOUNT
#SBATCH --time=HH:MM:SS
#SBATCH --nodes=N
#SBATCH --ntasks-per-node=T
#SBATCH --cpus-per-task=C
#SBATCH --mem=MEM
module purge
# module load ...
cd "$SLURM_SUBMIT_DIR"
# export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# srun ./program ...
Filling in the placeholders consistently reduces errors and makes your jobs easier to manage.
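A typical workflow with such a template might look like this (file names here are only placeholders):
# Copy the template and adapt it for a specific run
cp template_job.sh heat_run.sh
# Edit the placeholders (job name, partition, time, resources) with your editor of choice
vim heat_run.sh
# Submit the customized script
sbatch heat_run.sh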