From interactive use to batch submission
On an HPC cluster you normally do not run heavy computations directly on the login node. Instead, you:
- Describe what you need (resources, time, executable, input, output) in a job script.
- Submit that script to the scheduler.
- The scheduler starts your job on appropriate compute nodes when resources are available.
This chapter focuses on the practical mechanics of submitting jobs, assuming you already know what job scripts are and what a scheduler like SLURM does conceptually.
Most examples below use SLURM, since it is widely deployed. Other schedulers (PBS Pro, LSF, SGE, etc.) have similar ideas but different commands and options.
Basic SLURM submission: `sbatch`
The standard way to submit a batch job script with SLURM is:
```bash
sbatch my_job.sh
```

Key points:
- `my_job.sh` is a text file (usually a shell script) containing:
  - `#!/bin/bash` (or another shell) on the first line.
  - `#SBATCH` directives for resources and job options.
  - Commands to load modules, set up the environment, and run your application.
- `sbatch` sends the script to the scheduler. The scheduler returns a job ID, for example:

```bash
$ sbatch my_job.sh
Submitted batch job 123456
```

You will use this job ID with monitoring and cancellation commands.
Where does output go?
By default, SLURM writes job output to a file in the submission directory:
- Typically named `slurm-<jobid>.out`.
- Can be changed with `#SBATCH -o`, `#SBATCH -e`, etc. (configured in the job script).
The output file does not usually appear immediately; it is created when the job starts and the first output is written.
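For example, a script can direct standard output and standard error to explicitly named files. The following is a sketch with illustrative names and patterns:

```bash
#!/bin/bash
#SBATCH --job-name=output_demo
#SBATCH --time=00:01:00
#SBATCH --output=output_demo_%j.out   # standard output; %j expands to the job ID
#SBATCH --error=output_demo_%j.err    # standard error (optional; by default it shares the output file)

echo "this line goes to the .out file"
echo "this line goes to the .err file" >&2
```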
Submitting simple test jobs
Using small, fast jobs is a safe way to practice job submission.
Minimal job script example
Assume a script `hello.slurm`:

```bash
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --time=00:01:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --output=hello_%j.out

echo "Hello from job $SLURM_JOB_ID on host $(hostname)"
sleep 10
```

Submit it:

```bash
sbatch hello.slurm
```

What to observe:
- `sbatch` prints a job ID.
- After the job starts and completes, `hello_<jobid>.out` appears in the directory.
- Open the file to confirm your output and environment variables like `$SLURM_JOB_ID`.
Submitting multiple similar jobs
You can submit several independent jobs by running `sbatch` multiple times:

```bash
sbatch job1.slurm
sbatch job2.slurm
sbatch job3.slurm
```

The scheduler will queue them and run them when resources are free, respecting any site policies and priorities.
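If you have many such scripts with a predictable naming pattern, a small shell loop can submit them in one go. This is just a sketch, assuming the scripts above sit in the current directory:

```bash
# Submit every script matching job*.slurm in the current directory
for script in job*.slurm; do
    sbatch "$script"
done
```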
Resource requests at submission time
There are two ways to specify job options:
- Inside the script with `#SBATCH` lines (preferred for reproducibility).
- On the `sbatch` command line (useful for quick overrides or testing).
SLURM merges these; command-line options override script options.
Overriding options from the command line
Example: you have a script that requests 1 hour, but you want a shorter limit for a test run:
```bash
sbatch --time=00:05:00 my_job.sh
```

Similarly, to temporarily change the job name:

```bash
sbatch --job-name=test_run my_job.sh
```

Or redirect output for this submission only:

```bash
sbatch --output=log_test_%j.out my_job.sh
```

Common options set at submission time include:
- `--time=HH:MM:SS`
- `--partition=short` (or another partition/queue)
- `--ntasks`, `--cpus-per-task`, `--nodes`
- `--mem=4G` or `--mem-per-cpu=2G`
- `--job-name=name`
- `--output=filename`
Details of what these mean and how to size requests are covered elsewhere; here the focus is that you can pass them to `sbatch` when you submit.
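As a sketch only (the values and the partition name are placeholders that vary by site), a script header combining several of these options might look like:

```bash
#!/bin/bash
#SBATCH --job-name=analysis
#SBATCH --partition=short        # placeholder partition name; check your site's partitions
#SBATCH --time=02:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=4G
#SBATCH --output=analysis_%j.out

./my_analysis input.dat          # placeholder program and input
```

A command-line override such as `sbatch --time=00:10:00 analysis.slurm` would then shorten only the time limit for that particular submission.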
Submission directory vs working directory
By default, SLURM uses the directory from which you run `sbatch` as the job's working directory.

If your script assumes a particular working directory, you have two common options:
- Always run `sbatch` from that directory, or
- In the script, explicitly `cd` to the intended directory, e.g.:

```bash
cd /path/to/project
```

Some clusters also support an option like `#SBATCH --chdir=/path/to/project` to set this at submission.
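A minimal sketch combining both approaches (the project path and command are placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=in_project_dir
#SBATCH --time=00:10:00
#SBATCH --chdir=/path/to/project   # placeholder path; remove if your SLURM setup lacks --chdir

# Make the working directory explicit in the script as well
cd /path/to/project || exit 1

./run_step.sh                      # placeholder command
```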
Interactive job submission
Sometimes you need an interactive shell on a compute node (e.g., for debugging or exploratory runs). Many schedulers provide a way to do this.
With SLURM, typical commands are `srun`, `salloc`, or `sinteractive` (the last is often a site-specific wrapper).
Using `srun` for an interactive shell
Example:
```bash
srun --time=00:30:00 --ntasks=1 --cpus-per-task=4 --pty bash
```

Explanation of the key idea:
- You request resources with options similar to `sbatch`.
- `--pty bash` tells SLURM to start an interactive bash session on the allocated resources.
- Once the prompt appears on the compute node, any commands you run are executing within the allocation.

Exit the shell (`exit` or Ctrl-D) to release the resources.
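Inside the interactive session you set up the environment and run commands as usual; a sketch with placeholder module, path, and program names:

```bash
# These commands run on the compute node, inside the allocation
module load gcc              # placeholder module name
cd /path/to/project          # placeholder path
./my_program small_test.dat  # placeholder program and input
exit                         # leave the shell and release the allocation
```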
Using `salloc` for allocations
Another pattern:

```bash
salloc --time=01:00:00 --ntasks=4
```

When the allocation is granted, you get a shell running inside the allocation:
- You can then run `srun` from that shell to launch tasks:

  ```bash
  srun ./my_parallel_program
  ```

- When you exit the shell, the allocation ends.
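Put together, a short `salloc` session might look like this sketch (resource values and the program name are illustrative):

```bash
salloc --time=01:00:00 --ntasks=4   # blocks until the allocation is granted
srun ./my_parallel_program          # launches 4 tasks inside the allocation
exit                                # ends the allocation
```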
Interactive submissions still go through the scheduler: if the cluster is busy, you may have to wait before your interactive session starts.
Job arrays: submitting many similar jobs efficiently
If you need to submit a large number of closely related jobs (e.g., sweep over parameters, different input files), job arrays are the recommended mechanism.
Instead of hundreds of separate sbatch commands, you submit a single array job that represents many tasks.
Basic array submission
Example script `array_job.slurm`:

```bash
#!/bin/bash
#SBATCH --job-name=array_example
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --array=1-10
#SBATCH --output=array_%A_%a.out

echo "Array job index: $SLURM_ARRAY_TASK_ID"
```

Submit it:

```bash
sbatch array_job.slurm
```

Key variables:
- `$SLURM_ARRAY_TASK_ID`: index of the current array element (1–10 here).
- `%A`: master array job ID.
- `%a`: task index within the array.
This single submission creates 10 sub-jobs. The scheduler will run them according to cluster load and any configured concurrency limits.
Using array indices to select input
Common patterns:
- Mapping indices to input files:

  ```bash
  INPUT_FILE=input_${SLURM_ARRAY_TASK_ID}.dat
  ./my_program "$INPUT_FILE"
  ```

- Using a parameter list file:

  ```bash
  PARAM=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)
  ./my_program --param "$PARAM"
  ```
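Putting these pieces together, a complete array script using the parameter-list pattern might look like this sketch (it assumes a file `params.txt` with one parameter per line, and `./my_program` is a placeholder):

```bash
#!/bin/bash
#SBATCH --job-name=param_sweep
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --array=1-10
#SBATCH --output=sweep_%A_%a.out

# Select the line of params.txt matching this task's (1-based) index
PARAM=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)

./my_program --param "$PARAM"
```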
Limiting concurrent tasks
To limit the number of array elements that can run at once, use the `%` notation:

```bash
#SBATCH --array=1-100%10
```

This means: tasks 1–100 in the array, but no more than 10 running simultaneously.
You can also specify this on the command line:
```bash
sbatch --array=1-100%10 array_job.slurm
```

Submission policies and local variations
Each cluster may impose site-specific rules on job submission. Common examples:
- Maximum walltime per partition.
- Maximum number of running jobs per user.
- Maximum number of nodes/cores per job.
- Restrictions on interactive vs batch usage.
- Separate partitions/queues for short, long, GPU, or debug jobs.
These are enforced when you submit or when the scheduler tries to start your job. If a submission violates policy, `sbatch` may:
- Fail immediately with an error.
- Accept the submission, but leave the job in a held state until the problem is corrected (behavior depends on the site).
Always consult your site documentation for:
- Preferred partitions for different workloads.
- Project or account codes you may need to specify (often via options like `--account` or `--qos`).
- Any mandatory flags or templates.
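For example, on a site that requires an account code and a specific partition, a submission might look like the following (both names are placeholders standing in for values from your site documentation):

```bash
sbatch --account=proj1234 --partition=short my_job.sh
```

The same options can of course be fixed in the script with `#SBATCH --account=...` and `#SBATCH --partition=...` lines.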
Practical tips for safe and effective submission
Start small, then scale up
When developing or testing:
- Use a short walltime (`--time=00:05:00`).
- Use fewer cores/nodes than you ultimately expect.
- Confirm (e.g., with the commands sketched below):
  - Your job starts and completes.
  - Output files are as expected.
  - No obvious errors in the log files.
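A brief sketch of these checks with standard SLURM commands (the job ID and file name are illustrative; monitoring is covered in more detail elsewhere):

```bash
squeue -u "$USER"        # is the job still pending or running?
sacct -j 123456          # after it leaves the queue: final state and exit code
cat hello_123456.out     # inspect the log file for errors
```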
After that, increase the requested resources and time.
Use descriptive job names and outputs
- Set a helpful job name: `--job-name=md_sim_lipidA` instead of `job1`.
- Include the job ID in output names: `--output=md_%j.out`.
- For arrays, include `%A` and `%a` to distinguish tasks.
This makes it much easier to connect scheduler entries to log files.
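For instance, a consistent set of naming directives along these lines keeps scheduler entries and log files easy to match (the job name is illustrative):

```bash
#SBATCH --job-name=md_sim_lipidA
#SBATCH --output=md_%j.out     # %j = job ID; use md_%A_%a.out for array jobs
#SBATCH --error=md_%j.err      # optional separate error stream
```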
Keep submission environment simple
Your submission environment (where you run `sbatch`) might differ from the job environment on the compute nodes. To avoid surprises:
- In the job script, explicitly:
  - Load necessary modules.
  - Set or export critical environment variables.
  - `cd` to the intended working directory.
- Avoid relying on interactive shell customizations (`.bashrc`, `.bash_profile`) unless you know how they behave in batch mode.
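A skeleton that follows these guidelines might look like this (module names, variables, and paths are all placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=clean_env
#SBATCH --time=00:30:00
#SBATCH --ntasks=1

module purge                   # start from a clean module environment
module load gcc openmpi        # placeholder modules
export OMP_NUM_THREADS=4       # placeholder environment variable
cd /path/to/project || exit 1  # placeholder path

./my_program input.dat         # placeholder program and input
```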
Advanced submission patterns (brief overview)
You may encounter additional submission-related features:
- Dependencies: submit jobs that start only after others complete successfully, fail, or just finish (e.g., `--dependency=afterok:<jobid>`).
- Job hold/release: submit jobs in a held state, then release them later.
- Requeueing: allow jobs to be automatically requeued under certain conditions.
These are all triggered at submission time via extra sbatch options, and covered in more detail in related chapters on workflows and job management.
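As a brief sketch of these patterns (script names are placeholders), `sbatch --parsable` prints only the job ID, which makes dependency chains and hold/release easy to script:

```bash
# Chain: step2 starts only if step1 finishes successfully
step1=$(sbatch --parsable step1.slurm)
sbatch --dependency=afterok:"$step1" step2.slurm

# Hold/release: submit in a held state, release when ready
held=$(sbatch --parsable --hold step3.slurm)
scontrol release "$held"
```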
Summary
- Use `sbatch` to submit batch job scripts; it returns a job ID.
- You can override script resource requests directly on the `sbatch` command line.
- For interactive work on compute nodes, use `srun` or `salloc` to request an allocation and start a shell.
- Job arrays let you efficiently submit many similar jobs in a single command.
- Follow site policies and start with small test jobs before scaling up.