
5.5 Submitting jobs

From interactive use to batch submission

On an HPC cluster you normally do not run heavy computations directly on the login node. Instead, you:

  1. Describe what you need (resources, time, executable, input, output) in a job script.
  2. Submit that script to the scheduler.
  3. The scheduler starts your job on appropriate compute nodes when resources are available.

This chapter focuses on the practical mechanics of submitting jobs, assuming you already know what job scripts are and what a scheduler like SLURM does conceptually.

Most examples below use SLURM, since it is widely deployed. Other schedulers (PBS Pro, LSF, SGE, etc.) have similar ideas but different commands and options.

Basic SLURM submission: `sbatch`

The standard way to submit a batch job script with SLURM is:

sbatch my_job.sh

When the submission succeeds, sbatch prints the ID that the scheduler assigned to your job:

$ sbatch my_job.sh
Submitted batch job 123456

You will use this job ID with monitoring and cancellation commands.
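Since the job ID is the last word of sbatch's message, it is easy to capture in a script. The sketch below parses a sample message in plain bash (the ID is made up; on a real cluster you would run `MSG=$(sbatch my_job.sh)`, or use `sbatch --parsable`, which prints only the ID):

```shell
# Simulated sbatch output; on a cluster: MSG=$(sbatch my_job.sh)
MSG="Submitted batch job 123456"

# Strip everything up to and including the last space, keeping the ID.
JOB_ID=${MSG##* }
echo "$JOB_ID"    # 123456

# The ID then feeds the monitoring and cancellation commands:
#   squeue -j "$JOB_ID"    # show the job's state in the queue
#   scancel "$JOB_ID"      # cancel the job if needed
```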

Where does output go?

By default, SLURM writes the job's standard output (and standard error) to a file named slurm-<jobid>.out in the directory from which you submitted. For job 123456:

slurm-123456.out

The output file does not usually appear immediately; it is created when the job starts and the first output is written.

Submitting simple test jobs

Using small, fast jobs is a safe way to practice job submission.

Minimal job script example

Assume a script hello.slurm:

#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --time=00:01:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --output=hello_%j.out
echo "Hello from job $SLURM_JOB_ID on host $(hostname)"
sleep 10

Submit it:

sbatch hello.slurm

What to observe:

    • The job appears briefly in the queue (e.g. in squeue) and then completes.
    • A file hello_<jobid>.out shows up in the submission directory.
    • It contains the greeting with the job ID and the compute node's hostname.

Submitting multiple similar jobs

You can submit several independent jobs by running sbatch multiple times:

sbatch job1.slurm
sbatch job2.slurm
sbatch job3.slurm

The scheduler will queue them and run them when resources are free, respecting any site policies and priorities.

Resource requests at submission time

There are two ways to specify job options:

  1. Inside the script with #SBATCH lines (preferred for reproducibility).
  2. On the sbatch command line (useful for quick overrides or testing).

SLURM merges these; command-line options override script options.

Overriding options from the command line

Example: you have a script that requests 1 hour, but you want a shorter limit for a test run:

sbatch --time=00:05:00 my_job.sh

Similarly, to temporarily change the job name:

sbatch --job-name=test_run my_job.sh

Or redirect output for this submission only:

sbatch --output=log_test_%j.out my_job.sh

Common options set at submission time include:

    • --time (walltime limit)
    • --ntasks and --cpus-per-task (degree of parallelism)
    • --mem or --mem-per-cpu (memory request)
    • --partition (which partition/queue to use)
    • --job-name, --output, --error (naming and logging)

Details of what these mean and how to size requests are covered elsewhere; here the focus is that you can pass them to sbatch when you submit.

Submission directory vs working directory

By default, SLURM uses the directory from which you run sbatch as the job’s working directory. A common pattern is therefore to change into the project directory before submitting:

cd /path/to/project
sbatch my_job.sh

Alternatively, #SBATCH --chdir=/path/to/project sets the working directory explicitly, regardless of where you run sbatch.
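As an illustration, a job script can pin its own working directory in its header; the path and program name below are hypothetical:

```shell
#!/bin/bash
#SBATCH --job-name=proj_run
#SBATCH --time=00:10:00
#SBATCH --chdir=/path/to/project

# The job starts in /path/to/project no matter where sbatch was invoked.
pwd
./my_program
```

Pinning the directory in the script makes the job reproducible even if you submit it from somewhere else by accident.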

Interactive job submission

Sometimes you need an interactive shell on a compute node (e.g., for debugging or exploratory runs). Many schedulers provide a way to do this.

With SLURM, typical commands are srun, salloc, or sinteractive (the last is often a site-specific wrapper).

Using `srun` for an interactive shell

Example:

srun --time=00:30:00 --ntasks=1 --cpus-per-task=4 --pty bash

Explanation of the key idea: --pty attaches your terminal to the launched task, so bash runs interactively on an allocated compute node with the requested resources (here, 4 CPU cores for up to 30 minutes).

Exit the shell (exit or Ctrl-D) to release the resources.

Using `salloc` for allocations

Another pattern:

salloc --time=01:00:00 --ntasks=4

When the allocation is granted, you get a shell running inside the allocation (typically still on the login node); use srun to launch work on the allocated compute nodes:

  srun ./my_parallel_program

Interactive submissions still go through the scheduler: if the cluster is busy, you may have to wait before your interactive session starts.

Job arrays: submitting many similar jobs efficiently

If you need to submit a large number of closely related jobs (e.g., sweep over parameters, different input files), job arrays are the recommended mechanism.

Instead of hundreds of separate sbatch commands, you submit a single array job that represents many tasks.

Basic array submission

Example script array_job.slurm:

#!/bin/bash
#SBATCH --job-name=array_example
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --array=1-10
#SBATCH --output=array_%A_%a.out
echo "Array job index: $SLURM_ARRAY_TASK_ID"

Submit it:

sbatch array_job.slurm

Key variables:

    • $SLURM_ARRAY_JOB_ID — the ID shared by the whole array (written as %A in file-name patterns).
    • $SLURM_ARRAY_TASK_ID — the index of the current task, here 1–10 (written as %a in file-name patterns).

This single submission creates 10 sub-jobs. The scheduler will run them according to cluster load and any configured concurrency limits.
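The %A and %a placeholders in the --output pattern expand to the array job ID and the task index. A plain-bash sketch of the resulting file name (the IDs here are made up; on the cluster SLURM performs this substitution itself):

```shell
# Pretend SLURM assigned array job ID 98765 and this is task 3.
ARRAY_JOB_ID=98765
TASK_ID=3

# array_%A_%a.out then becomes:
OUTFILE="array_${ARRAY_JOB_ID}_${TASK_ID}.out"
echo "$OUTFILE"    # array_98765_3.out
```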

Using array indices to select input

Common patterns:

  1. Derive an input file name from the task index:

  INPUT_FILE=input_${SLURM_ARRAY_TASK_ID}.dat
  ./my_program "$INPUT_FILE"

  2. Read the N-th line of a parameter file:

  PARAM=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)
  ./my_program --param "$PARAM"
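The parameter-file pattern can be tried in plain bash: SLURM sets SLURM_ARRAY_TASK_ID for each task on the cluster, but here we set it by hand to show how sed selects one line (file contents are illustrative):

```shell
# Create a small parameter file: one parameter per line.
printf 'alpha\nbeta\ngamma\n' > params.txt

# Simulate being array task 2 (SLURM would set this automatically).
SLURM_ARRAY_TASK_ID=2

# sed -n "Np" prints only line N of the file.
PARAM=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)
echo "$PARAM"    # beta
```

Each array task thus reads a different line of the same file, which keeps all the parameters in one place.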

Limiting concurrent tasks

To limit the number of array elements that can run at once, use the % notation:

#SBATCH --array=1-100%10

This means: tasks 1–100 in the array, but no more than 10 running simultaneously.

You can also specify this on the command line:

sbatch --array=1-100%10 array_job.slurm

Submission policies and local variations

Each cluster may impose site-specific rules on job submission. Common examples:

    • maximum walltime per partition or queue,
    • limits on how many jobs a user may have queued or running,
    • mandatory options such as an account or QOS specification.

These are enforced when you submit or when the scheduler tries to start your job. If a submission violates policy, sbatch may:

    • reject the job immediately with an error message, or
    • accept it but leave it pending indefinitely, with the reason shown in the queue.

Always consult your site documentation for:

    • the available partitions and their limits,
    • accounting and QOS requirements,
    • any local submission wrappers or conventions.

Practical tips for safe and effective submission

Start small, then scale up

When developing or testing:

  1. Use a short walltime (e.g. --time=00:05:00).
  2. Use fewer cores/nodes than you ultimately expect.
  3. Confirm:
    • Your job starts and completes.
    • Output files are as expected.
    • No obvious errors in the log files.

After that, increase the requested resources and time.

Use descriptive job names and outputs

Choose job names and output file names that identify the run, e.g. --job-name=fit_run3 with --output=fit_run3_%j.out (names here are illustrative). This makes it much easier to connect scheduler entries to log files.

Keep submission environment simple

Your submission environment (where you run sbatch) might differ from the job environment on the compute nodes. To avoid surprises:

    • load modules and set environment variables inside the job script, not only in your interactive shell;
    • avoid relying on state that exists only in the submission session;
    • be aware that sbatch exports the submission environment to the job by default, which can hide such dependencies (see the --export option).
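One way to follow these rules is to make the job script self-sufficient; a sketch of such a script (module names are hypothetical and site-specific):

```shell
#!/bin/bash
#SBATCH --job-name=clean_env
#SBATCH --time=00:10:00
#SBATCH --ntasks=1

# Start from a clean module environment, then load exactly what the job
# needs, so nothing depends on the submission shell's state.
module purge
module load gcc/12.2    # illustrative module name

./my_program
```

A script like this behaves the same whether it is submitted from a fresh login session or from a heavily customized shell.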

Advanced submission patterns (brief overview)

You may encounter additional submission-related features:

    • job dependencies (e.g. sbatch --dependency=afterok:<jobid> next_job.sh),
    • delayed start times (--begin),
    • email notifications (--mail-type, --mail-user),
    • machine-readable submission output (--parsable) for use in scripts.

These are all triggered at submission time via extra sbatch options, and covered in more detail in related chapters on workflows and job management.

Summary

Batch jobs are submitted with sbatch, which returns a job ID used for monitoring and cancellation. Options can live in the script as #SBATCH lines or be overridden on the sbatch command line. Interactive work goes through srun or salloc, large sets of similar jobs are best expressed as job arrays, and every submission is subject to site-specific policies, so start small, check the output, and consult your local documentation before scaling up.
