
Introduction to SLURM

What is SLURM?

SLURM (Simple Linux Utility for Resource Management) is one of the most widely used open-source job schedulers on HPC clusters. It is responsible for:

  • allocating compute resources (nodes, CPUs, memory, GPUs) to users,
  • starting, executing, and monitoring work on the allocated nodes,
  • managing a queue of pending jobs and deciding when each one runs.

On most modern clusters, interacting with SLURM is the main way you request and use compute resources.

Key ideas:

  • You do not run heavy computations directly on login nodes; you ask SLURM for resources.
  • Jobs wait in a queue until the requested resources become available.
  • Scheduling decisions depend on what you request and on site policies (priorities, limits, fair sharing).

SLURM’s Basic Components and Terminology

You will mainly use SLURM through a set of command-line tools. Some important terms:

  • Node: a physical machine in the cluster.
  • Partition: a group of nodes with associated limits and policies (similar to a queue).
  • Job: a resource allocation granted to a user for a period of time.
  • Job step: a set of tasks started within a job, typically with srun.
  • Task: a process started as part of a job step (for example, one MPI rank).

Core daemons (for background understanding, not something you manage):

  • slurmctld: the central controller that tracks resources and schedules jobs.
  • slurmd: runs on each compute node and starts and supervises the work there.
  • slurmdbd: an optional database daemon that stores accounting information.

Typical SLURM Workflow Overview

A minimal SLURM usage cycle looks like:

  1. (Optional) Test interactively with SLURM:
    • Use srun or salloc to get an interactive shell or run a simple command.
  2. Write a batch script:
    • A shell script with special #SBATCH lines describing resource requirements and job settings.
  3. Submit the script:
    • sbatch my_job.sh
  4. Monitor the job:
    • squeue, sacct, or site-specific tools.
  5. Inspect results:
    • Check output and error files produced by the job.

The details of writing scripts, submitting, monitoring, and modifying jobs are in later chapters; here we focus on how SLURM itself is used at a basic level.
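
As a rough sketch of this cycle (file names, job IDs, and resource values here are placeholders, not recommendations):

# 1. Write a minimal batch script
cat > my_job.sh << 'EOF'
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
hostname
EOF

# 2. Submit it
sbatch my_job.sh

# 3. Monitor it
squeue -u $USER

# 4. Inspect the output (by default written to slurm-<jobid>.out)
cat slurm-123456.out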

Interactive vs Batch Use of SLURM

SLURM supports two main ways of running work:

  • Batch jobs: you submit a script and SLURM runs it unattended when resources are free.
  • Interactive jobs: you request resources and work on them directly, for example when testing or debugging.

Batch jobs (non-interactive)

You write a script describing the work and its resource needs, submit it once, and SLURM runs it when resources become available.

Example (just the SLURM part, not a full explanation):

sbatch my_job.sh

You do not stay logged in waiting for it; SLURM handles it in the background.

Interactive jobs

Two common tools:

  • salloc: requests an allocation and gives you a shell in which to work.
  • srun: runs a single command under SLURM, allocating resources if needed.

Examples:

# Simple interactive shell on a compute node for 30 minutes
salloc --time=00:30:00 --ntasks=1 --mem=2G
# Run a single command directly under SLURM
srun --time=00:10:00 --ntasks=1 hostname

Core SLURM Commands You Will Encounter

These commands form the backbone of day-to-day SLURM usage. Later chapters will go into scripting and options in depth; here we introduce what each is for.

`sbatch`: Submit batch jobs

Typical pattern:

sbatch my_script.sh
Submitted batch job 123456

You will run this command whenever instructions say to “submit a job script.”
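
For orientation, a minimal script submitted with sbatch might look like this (the name and resource values are illustrative only):

#!/bin/bash
#SBATCH --job-name=demo        # job name shown in squeue
#SBATCH --time=00:05:00        # wall-clock limit (HH:MM:SS)
#SBATCH --ntasks=1             # number of tasks (processes)
#SBATCH --mem=1G               # memory for the job

hostname                       # the actual work goes here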

`srun`: Run programs under SLURM

Two main roles:

  • Inside a batch job: launch job steps (including parallel programs) on the resources already allocated to the job.
  • From the command line: run a command directly under SLURM, creating a small allocation on the fly if none exists.

Example inside a job script:

srun ./my_program

Think of srun as “start my parallel job under SLURM’s control.”
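
Inside an allocation that requested several tasks, srun starts one copy of the program per task; a small sketch (the program name is a placeholder):

# With --ntasks=4 requested for the job, this starts 4 copies:
srun ./my_program

# srun also accepts its own resource options:
srun --ntasks=2 hostname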

`salloc`: Request an interactive allocation

Example:

salloc --nodes=1 --ntasks=4 --time=01:00:00
# You now have an allocation; depending on site configuration,
# the shell runs on the login node or on a compute node.
# Inside it, srun places work on the allocated nodes:
srun ./debug_version

`scancel`: Cancel jobs

Example:

scancel 123456
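
scancel can also filter rather than target a single job ID; for example:

# Cancel all jobs belonging to the current user
scancel -u $USER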

Monitoring and accounting commands

These are discussed more deeply in the monitoring chapter, but you should recognize them:

  • squeue: shows jobs currently queued or running.
  • sacct: shows accounting information, including finished jobs.
  • sinfo: shows partitions and node states.
  • scontrol: shows detailed information about jobs, nodes, and partitions.
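
Typical invocations look like this (the job ID is a placeholder):

# Show your own queued and running jobs
squeue -u $USER

# Accounting information for a specific job
sacct -j 123456

# Detailed information about a job
scontrol show job 123456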

Basic Resource Requests in SLURM

SLURM commands share a common style for requesting resources. You will see the same options with sbatch, srun, and salloc.

Some commonly used options (names may vary between sites; consult your cluster docs):

  • --time: maximum wall-clock time (e.g., 02:00:00).
  • --nodes: number of nodes.
  • --ntasks: number of tasks (processes).
  • --cpus-per-task: CPU cores per task (for threaded programs).
  • --mem: memory per node (e.g., 4G).
  • --partition: which partition (queue) to use.
  • --job-name and --output: the job’s name and output file.

Example resource specification (not a complete script):

sbatch --time=02:00:00 --nodes=2 --ntasks=64 --mem=4G my_job.sh

The idea: you describe your resource needs, SLURM decides where and when to run your job.
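
The same requests can live inside the script itself as #SBATCH lines, which is equivalent to the command-line form above:

#!/bin/bash
#SBATCH --time=02:00:00
#SBATCH --nodes=2
#SBATCH --ntasks=64
#SBATCH --mem=4G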

Understanding Partitions and Policies at a High Level

Every SLURM installation is configured by the site administrators and can look a bit different. However, there are common ideas:

  • Partitions group nodes and carry limits such as maximum runtime and node counts.
  • Scheduling policies (priorities, fair sharing, backfilling) decide the order in which queued jobs start.
  • Accounts and quotas may limit how much of the cluster you can use.

To see partitions:

sinfo

You’ll typically pick a partition that matches the problem size and runtime you expect, following local guidelines.
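
Selecting a partition is then a single option (“short” is a hypothetical partition name; use the names sinfo reports on your cluster):

sbatch --partition=short my_job.sh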

SLURM Job States (High-Level)

Jobs move through several states in SLURM; understanding these helps interpret squeue output:

Common states:

  • PENDING (PD): waiting in the queue for resources or priority.
  • RUNNING (R): currently executing.
  • COMPLETING (CG): finishing up and releasing resources.
  • COMPLETED (CD): finished successfully.
  • FAILED (F): finished with a non-zero exit code.
  • CANCELLED (CA): cancelled by the user or an administrator.
  • TIMEOUT (TO): killed after exceeding its time limit.

You will see these in squeue -u $USER or in accounting output (sacct).
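
For illustration, squeue output typically looks like this (all values are made up; the ST column holds the state code):

  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 123456     short     demo    alice  R       5:02      1 node042
 123457     short     demo    alice PD       0:00      2 (Resources)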

SLURM and Parallelism (Conceptual View)

SLURM works with the parallel programming models covered elsewhere:

  • MPI programs: many tasks, possibly spread across nodes, typically launched with srun.
  • Threaded programs (e.g., OpenMP): one task with several CPUs on one node.
  • Hybrid programs: several tasks, each with several CPUs.

SLURM itself does not implement MPI or threading; it ensures that the requested processes get the appropriate resources on the cluster.
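
As a sketch, the mapping between programming model and resource request often looks like this (values are illustrative):

# MPI: many tasks, typically launched with srun
#SBATCH --ntasks=64
srun ./mpi_program

# Threading (e.g., OpenMP): one task, several CPUs on one node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./threaded_program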

Site-Specific Differences and Documentation

While SLURM options are largely standardized, clusters often have:

  • locally defined partition names, limits, and defaults,
  • environment module systems that must be loaded inside job scripts,
  • wrapper scripts or job templates of their own,
  • local policies on fair usage, memory, and maximum runtimes.

Always:

  • read your site’s documentation and example job scripts,
  • follow local guidelines for partitions and resource limits,
  • ask your support team when unsure.

You will combine this general SLURM knowledge with site-specific rules when you start running real jobs.
