From interactive use to batch submission
On an HPC cluster you normally do not run heavy computations directly on the login node. Instead, you:
- Describe what you need (resources, time, executable, input, output) in a job script.
- Submit that script to the scheduler.
- The scheduler starts your job on appropriate compute nodes when resources are available.
This chapter focuses on the practical mechanics of submitting jobs, assuming you already know what job scripts are and what a scheduler like SLURM does conceptually.
Most examples below use SLURM, since it is widely deployed. Other schedulers (PBS Pro, LSF, SGE, etc.) have similar ideas but different commands and options.
Basic SLURM submission: `sbatch`
The standard way to submit a batch job script with SLURM is:
```bash
sbatch my_job.sh
```

Key points:
- `my_job.sh` is a text file (usually a shell script) containing:
  - `#!/bin/bash` (or another shell) on the first line.
  - `#SBATCH` directives for resources and job options.
  - Commands to load modules, set up the environment, and run your application.
- `sbatch` sends the script to the scheduler. The scheduler returns a job ID, for example:

```bash
$ sbatch my_job.sh
Submitted batch job 123456
```

You will use this job ID with monitoring and cancellation commands.
Where does output go?
By default, SLURM writes job output to a file in the submission directory:
- Typically named `slurm-<jobid>.out`.
- Can be changed with `#SBATCH -o`, `#SBATCH -e`, etc. (configured in the job script).
The output file does not usually appear immediately; it is created when the job starts and the first output is written.
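For example, a script can direct standard output and standard error to explicitly named files. The following is a sketch with illustrative names and patterns:

```bash
#!/bin/bash
#SBATCH --job-name=output_demo
#SBATCH --time=00:01:00
#SBATCH --output=output_demo_%j.out   # standard output; %j expands to the job ID
#SBATCH --error=output_demo_%j.err    # standard error (optional; by default it shares the output file)

echo "this line goes to the .out file"
echo "this line goes to the .err file" >&2
```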
Submitting simple test jobs
Using small, fast jobs is a safe way to practice job submission.
Minimal job script example
Assume a script `hello.slurm`:

```bash
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --time=00:01:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --output=hello_%j.out

echo "Hello from job $SLURM_JOB_ID on host $(hostname)"
sleep 10
```

Submit it:

```bash
sbatch hello.slurm
```

What to observe:
- `sbatch` prints a job ID.
- After the job starts and completes, `hello_<jobid>.out` appears in the directory.
- Open the file to confirm your output and environment variables like `$SLURM_JOB_ID`.
Submitting multiple similar jobs
You can submit several independent jobs by running `sbatch` multiple times:

```bash
sbatch job1.slurm
sbatch job2.slurm
sbatch job3.slurm
```

The scheduler will queue them and run them when resources are free, respecting any site policies and priorities.
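If you have many such scripts with a predictable naming pattern, a small shell loop can submit them in one go. This is just a sketch, assuming the scripts above sit in the current directory:

```bash
# Submit every script matching job*.slurm in the current directory
for script in job*.slurm; do
    sbatch "$script"
done
```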
Resource requests at submission time
There are two ways to specify job options:
- Inside the script with `#SBATCH` lines (preferred for reproducibility).
- On the `sbatch` command line (useful for quick overrides or testing).
SLURM merges these; command-line options override script options.
Overriding options from the command line
Example: you have a script that requests 1 hour, but you want a shorter limit for a test run:
```bash
sbatch --time=00:05:00 my_job.sh
```

Similarly, to temporarily change the job name:

```bash
sbatch --job-name=test_run my_job.sh
```

Or redirect output for this submission only:

```bash
sbatch --output=log_test_%j.out my_job.sh
```

Common options set at submission time include:
- `--time=HH:MM:SS`
- `--partition=short` (or another partition/queue)
- `--ntasks`, `--cpus-per-task`, `--nodes`
- `--mem=4G` or `--mem-per-cpu=2G`
- `--job-name=name`
- `--output=filename`
Details of what these mean and how to size requests are covered elsewhere; here the focus is that you can pass them to `sbatch` when you submit.
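As a sketch only (the values and the partition name are placeholders that vary by site), a script header combining several of these options might look like:

```bash
#!/bin/bash
#SBATCH --job-name=analysis
#SBATCH --partition=short        # placeholder partition name; check your site's partitions
#SBATCH --time=02:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=4G
#SBATCH --output=analysis_%j.out

./my_analysis input.dat          # placeholder program and input
```

A command-line override such as `sbatch --time=00:10:00 analysis.slurm` would then shorten only the time limit for that particular submission.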
Submission directory vs working directory
By default, SLURM uses the directory from which you run `sbatch` as the job's working directory.

If your script assumes a particular working directory, you have two common options:
- Always run `sbatch` from that directory, or
- In the script, explicitly `cd` to the intended directory, e.g.:

```bash
cd /path/to/project
```

Some clusters also support an option like `#SBATCH --chdir=/path/to/project` to set this at submission.
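A minimal sketch combining both approaches (the project path and command are placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=in_project_dir
#SBATCH --time=00:10:00
#SBATCH --chdir=/path/to/project   # placeholder path; remove if your SLURM setup lacks --chdir

# Make the working directory explicit in the script as well
cd /path/to/project || exit 1

./run_step.sh                      # placeholder command
```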
Interactive job submission
Sometimes you need an interactive shell on a compute node (e.g., for debugging or exploratory runs). Many schedulers provide a way to do this.
With SLURM, typical commands are `srun`, `salloc`, or `sinteractive` (the last is often a site-specific wrapper).
Using `srun` for an interactive shell
Example:
```bash
srun --time=00:30:00 --ntasks=1 --cpus-per-task=4 --pty bash
```

Explanation of the key idea:
- You request resources with options similar to `sbatch`.
- `--pty bash` tells SLURM to start an interactive bash session on the allocated resources.
- Once the prompt appears on the compute node, any commands you run are executing within the allocation.

Exit the shell (`exit` or Ctrl-D) to release the resources.
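Inside the interactive session you set up the environment and run commands as usual; a sketch with placeholder module, path, and program names:

```bash
# These commands run on the compute node, inside the allocation
module load gcc              # placeholder module name
cd /path/to/project          # placeholder path
./my_program small_test.dat  # placeholder program and input
exit                         # leave the shell and release the allocation
```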
Using `salloc` for allocations
Another pattern:

```bash
salloc --time=01:00:00 --ntasks=4
```

When the allocation is granted, you get a shell running inside the allocation:
- You can then run `srun` from that shell to launch tasks:

  ```bash
  srun ./my_parallel_program
  ```

- When you exit the shell, the allocation ends.
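Put together, a short `salloc` session might look like this sketch (resource values and the program name are illustrative):

```bash
salloc --time=01:00:00 --ntasks=4   # blocks until the allocation is granted
srun ./my_parallel_program          # launches 4 tasks inside the allocation
exit                                # ends the allocation
```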
Interactive submissions still go through the scheduler: if the cluster is busy, you may have to wait before your interactive session starts.
Job arrays: submitting many similar jobs efficiently
If you need to submit a large number of closely related jobs (e.g., sweep over parameters, different input files), job arrays are the recommended mechanism.
Instead of hundreds of separate sbatch commands, you submit a single array job that represents many tasks.
Basic array submission
Example script `array_job.slurm`:

```bash
#!/bin/bash
#SBATCH --job-name=array_example
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --array=1-10
#SBATCH --output=array_%A_%a.out

echo "Array job index: $SLURM_ARRAY_TASK_ID"
```

Submit it:

```bash
sbatch array_job.slurm
```

Key variables:
- `$SLURM_ARRAY_TASK_ID`: index of the current array element (1–10 here).
- `%A`: master array job ID.
- `%a`: task index within the array.
This single submission creates 10 sub-jobs. The scheduler will run them according to cluster load and any configured concurrency limits.
Using array indices to select input
Common patterns:
- Mapping indices to input files:

  ```bash
  INPUT_FILE=input_${SLURM_ARRAY_TASK_ID}.dat
  ./my_program "$INPUT_FILE"
  ```

- Using a parameter list file:

  ```bash
  PARAM=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)
  ./my_program --param "$PARAM"
  ```
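Putting these pieces together, a complete array script using the parameter-list pattern might look like this sketch (it assumes a file `params.txt` with one parameter per line, and `./my_program` is a placeholder):

```bash
#!/bin/bash
#SBATCH --job-name=param_sweep
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --array=1-10
#SBATCH --output=sweep_%A_%a.out

# Select the line of params.txt matching this task's (1-based) index
PARAM=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)

./my_program --param "$PARAM"
```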
Limiting concurrent tasks
To limit the number of array elements that can run at once, use the `%` notation:

```bash
#SBATCH --array=1-100%10
```

This means: tasks 1–100 in the array, but no more than 10 running simultaneously.
You can also specify this on the command line:
```bash
sbatch --array=1-100%10 array_job.slurm
```

Submission policies and local variations
Each cluster may impose site-specific rules on job submission. Common examples:
- Maximum walltime per partition.
- Maximum number of running jobs per user.
- Maximum number of nodes/cores per job.
- Restrictions on interactive vs batch usage.
- Separate partitions/queues for short, long, GPU, or debug jobs.
These are enforced when you submit or when the scheduler tries to start your job. If a submission violates policy, `sbatch` may:
- Fail immediately with an error.
- Accept the submission, but leave the job in a held state until the problem is corrected (behavior depends on the site).
Always consult your site documentation for:
- Preferred partitions for different workloads.
- Project or account codes you may need to specify (often via options like `--account` or `--qos`).
- Any mandatory flags or templates.
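For example, on a site that requires an account code and a specific partition, a submission might look like the following (both names are placeholders standing in for values from your site documentation):

```bash
sbatch --account=proj1234 --partition=short my_job.sh
```

The same options can of course be fixed in the script with `#SBATCH --account=...` and `#SBATCH --partition=...` lines.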
Practical tips for safe and effective submission
Start small, then scale up
When developing or testing:
- Use a short walltime (`--time=00:05:00`).
- Use fewer cores/nodes than you ultimately expect.
- Confirm (e.g., with the commands sketched below):
  - Your job starts and completes.
  - Output files are as expected.
  - No obvious errors in the log files.
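A brief sketch of these checks with standard SLURM commands (the job ID and file name are illustrative; monitoring is covered in more detail elsewhere):

```bash
squeue -u "$USER"        # is the job still pending or running?
sacct -j 123456          # after it leaves the queue: final state and exit code
cat hello_123456.out     # inspect the log file for errors
```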
After that, increase the requested resources and time.
Use descriptive job names and outputs
- Set a helpful job name: `--job-name=md_sim_lipidA` instead of `job1`.
- Include the job ID in output names: `--output=md_%j.out`.
- For arrays, include `%A` and `%a` to distinguish tasks.
This makes it much easier to connect scheduler entries to log files.
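For instance, a consistent set of naming directives along these lines keeps scheduler entries and log files easy to match (the job name is illustrative):

```bash
#SBATCH --job-name=md_sim_lipidA
#SBATCH --output=md_%j.out     # %j = job ID; use md_%A_%a.out for array jobs
#SBATCH --error=md_%j.err      # optional separate error stream
```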
Keep submission environment simple
Your submission environment (where you run `sbatch`) might differ from the job environment on the compute nodes. To avoid surprises:
- In the job script, explicitly:
  - Load necessary modules.
  - Set or export critical environment variables.
  - `cd` to the intended working directory.
- Avoid relying on interactive shell customizations (`.bashrc`, `.bash_profile`) unless you know how they behave in batch mode.
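A skeleton that follows these guidelines might look like this (module names, variables, and paths are all placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=clean_env
#SBATCH --time=00:30:00
#SBATCH --ntasks=1

module purge                   # start from a clean module environment
module load gcc openmpi        # placeholder modules
export OMP_NUM_THREADS=4       # placeholder environment variable
cd /path/to/project || exit 1  # placeholder path

./my_program input.dat         # placeholder program and input
```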
Advanced submission patterns (brief overview)
You may encounter additional submission-related features:
- Dependencies: submit jobs that start only after others complete successfully, fail, or just finish (e.g., `--dependency=afterok:<jobid>`).
- Job hold/release: submit jobs in a held state, then release them later.
- Requeueing: allow jobs to be automatically requeued under certain conditions.
These are all triggered at submission time via extra sbatch options, and covered in more detail in related chapters on workflows and job management.
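As a brief sketch of these patterns (script names are placeholders), `sbatch --parsable` prints only the job ID, which makes dependency chains and hold/release easy to script:

```bash
# Chain: step2 starts only if step1 finishes successfully
step1=$(sbatch --parsable step1.slurm)
sbatch --dependency=afterok:"$step1" step2.slurm

# Hold/release: submit in a held state, release when ready
held=$(sbatch --parsable --hold step3.slurm)
scontrol release "$held"
```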
Summary
- Use `sbatch` to submit batch job scripts; it returns a job ID.
- You can override script resource requests directly on the `sbatch` command line.
- For interactive work on compute nodes, use `srun` or `salloc` to request an allocation and start a shell.
- Job arrays let you efficiently submit many similar jobs in a single command.
- Follow site policies and start with small test jobs before scaling up.