What is SLURM?
SLURM (Simple Linux Utility for Resource Management) is one of the most widely used open-source job schedulers on HPC clusters. It is responsible for:
- Tracking available resources (nodes, cores, memory, GPUs, etc.)
- Accepting, queuing, and prioritizing user jobs
- Allocating resources for jobs
- Starting and stopping jobs on compute nodes
- Recording usage for accounting and reporting
On most modern clusters, interacting with SLURM is the main way you request and use compute resources.
Key ideas:
- You do not run heavy workloads directly on login nodes.
- You describe what you need (resources, time, etc.) to SLURM.
- SLURM finds a place and time for your job and runs it there.
SLURM’s Basic Components and Terminology
You will mainly use SLURM through a set of command-line tools. Some important terms:
- Job: A unit of work submitted to SLURM (batch job, interactive job, or job step).
- Partition: A logical grouping of nodes (e.g., `short`, `long`, `gpu`), often with different limits and policies.
- Node: A physical or virtual machine in the cluster.
- Task: A unit of execution, often one MPI process (`srun` launches tasks).
- Job ID: A unique identifier assigned to your job when submitted.
- Account / Project: Identifier for billing/usage tracking; often required with `--account` / `-A`.
Core daemons (for background understanding, not something you manage):
- `slurmctld`: Central controller, manages the job queue and scheduling.
- `slurmd`: Runs on each node, starts and stops job processes there.
Typical SLURM Workflow Overview
A minimal SLURM usage cycle looks like:
- (Optional) Test interactively with SLURM:
  - Use `srun` or `salloc` to get an interactive shell or run a simple command.
- Write a batch script:
  - A shell script with special `#SBATCH` lines describing resource requirements and job settings.
- Submit the script:
  - `sbatch my_job.sh`
- Monitor the job:
  - `squeue`, `sacct`, or site-specific tools.
- Inspect results:
  - Check output and error files produced by the job.
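Putting the cycle together, a minimal session might look like the sketch below (the script name and job ID are placeholders):

```bash
# (Optional) quick interactive sanity check
srun --time=00:05:00 --ntasks=1 hostname

# Write my_job.sh, then submit it
sbatch my_job.sh          # prints e.g. "Submitted batch job 123456"

# Monitor while queued or running
squeue -u $USER

# After completion, inspect the output file
# (slurm-<jobid>.out is the default name unless your script changes it)
less slurm-123456.out
```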
The details of writing scripts, submitting, monitoring, and modifying jobs are in later chapters; here we focus on how SLURM itself is used at a basic level.
Interactive vs Batch Use of SLURM
SLURM supports two main ways of running work:
Batch jobs (non-interactive)
- You prepare a script with `#SBATCH` directives.
- You submit it with `sbatch`.
- SLURM queues it, runs it when resources are available, and writes output to files.
- Best for production runs and long simulations.

Example (just the SLURM part, not a full explanation):

```bash
sbatch my_job.sh
```

You do not stay logged in waiting for it; SLURM handles it in the background.
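For orientation, a minimal `my_job.sh` might look like the sketch below; the resource values and program name are placeholders, and batch scripts are covered in detail in a later chapter:

```bash
#!/bin/bash
#SBATCH --job-name=example     # name shown in squeue
#SBATCH --time=00:30:00        # wall-clock limit: 30 minutes
#SBATCH --ntasks=1             # a single task
#SBATCH --mem=2G               # memory per node

# Everything below runs on the allocated compute node(s)
srun ./my_program              # ./my_program is a placeholder
```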
Interactive jobs
- You request resources and get an interactive shell or run a command directly on the compute node(s).
- Useful for development, debugging, testing, or running small experiments.
Two common tools:
- `salloc`: Allocate resources, then manually run commands inside that allocation.
- `srun`: Run a command under SLURM's control (within an existing allocation or creating a simple one).
Examples:
```bash
# Simple interactive shell on a compute node for 30 minutes
salloc --time=00:30:00 --ntasks=1 --mem=2G

# Run a single command directly under SLURM
srun --time=00:10:00 --ntasks=1 hostname
```

Core SLURM Commands You Will Encounter
These commands form the backbone of day-to-day SLURM usage. Later chapters will go into scripting and options in depth; here we introduce what each is for.
`sbatch`: Submit batch jobs
- Takes a script and sends it to the queue.
- Returns a job ID.
Typical pattern:
```bash
sbatch my_script.sh
# Submitted batch job 123456
```

You'll see this command used anytime you see "submit a job script."
`srun`: Run programs under SLURM
Two main roles:
- Launch tasks inside an existing allocation (e.g., MPI ranks, OpenMP processes).
- Or create a simple allocation and run a command immediately.
Example inside a job script:
```bash
srun ./my_program
```
Think of srun as “start my parallel job under SLURM’s control.”
`salloc`: Request an interactive allocation
- Allocates resources but does not immediately run a specific command.
- You get a shell; everything you run from that shell uses the allocated resources.
Example:
```bash
salloc --nodes=1 --ntasks=4 --time=01:00:00
# Now you're in a shell that has the allocation;
# you might then run:
srun ./debug_version
```

`scancel`: Cancel jobs
- Stop a queued or running job.
- You will use the job ID from `sbatch` or `squeue`.

Example:

```bash
scancel 123456
```

Monitoring and accounting commands
These are discussed more deeply in the monitoring chapter, but you should recognize them:
- `squeue`: View queued and running jobs.
- `sacct`: Show historical accounting info (once jobs have finished).
- `scontrol`: Advanced inspection and control (cluster admins and power users).
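A few typical invocations (the job ID is a placeholder):

```bash
# Your own queued and running jobs
squeue -u $USER

# Accounting summary for a finished job
sacct -j 123456 --format=JobID,JobName,State,Elapsed,MaxRSS

# Detailed record of a job (while SLURM still has it in its active records)
scontrol show job 123456
```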
Basic Resource Requests in SLURM
SLURM commands share a common style for requesting resources. You will see the same options with `sbatch`, `srun`, and `salloc`.
Some commonly used options (names may vary between sites; consult your cluster docs):
- `--time` or `-t`: Wall-clock time limit, e.g. `--time=01:30:00` for 1.5 hours.
- `--nodes` or `-N`: Number of nodes.
- `--ntasks` or `-n`: Number of parallel tasks (often MPI ranks).
- `--cpus-per-task`: Number of CPU cores per task (often for OpenMP threads).
- `--mem`: Memory per node, e.g. `--mem=8G`.
- `--partition` or `-p`: Which partition/queue to use (e.g. `short`, `long`, `gpu`).
- `--gres`: Generic resources (e.g., GPUs): `--gres=gpu:2`.
- `--account` or `-A`: Which project/account to charge.
Example resource specification (not a complete script):
```bash
sbatch --time=02:00:00 --nodes=2 --ntasks=64 --mem=4G my_job.sh
```

The idea: you describe your resource needs, SLURM decides where and when to run your job.
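The same requests can equally be placed inside the script as `#SBATCH` lines; the following sketch is equivalent to the command above (`./my_program` is a placeholder):

```bash
#!/bin/bash
#SBATCH --time=02:00:00   # 2 hours of wall-clock time
#SBATCH --nodes=2         # 2 nodes
#SBATCH --ntasks=64       # 64 tasks in total
#SBATCH --mem=4G          # 4 GB of memory per node

srun ./my_program
```

Options given on the `sbatch` command line take precedence over the corresponding `#SBATCH` lines, so the script can carry sensible defaults.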
Understanding Partitions and Policies at a High Level
Every SLURM installation is configured by the site administrators and can look a bit different. However, there are common ideas:
- Partitions group nodes and define policies:
  - Limits on maximum wall time per job.
  - Who can use them.
  - Priority rules.
- Examples you might see:
  - `short` (small jobs, short max time, fast turnaround)
  - `long` (larger/longer jobs)
  - `gpu` (nodes with GPUs)
  - `debug` (short time limit, for testing and debugging)
To see partitions:
```bash
sinfo
```

You’ll typically pick a partition that matches the problem size and runtime you expect, following local guidelines.
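If you want partition limits rather than just names, a custom output format is a common pattern (the format string below is one possibility):

```bash
# Partition, time limit, availability, and node count
sinfo -o "%P %l %a %D"
```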
SLURM Job States (High-Level)
Jobs move through several states in SLURM; understanding these helps interpret squeue output:
Common states:
- `PD` (PENDING): Waiting in the queue; not yet running.
- `R` (RUNNING): Currently executing on compute nodes.
- `CG` (COMPLETING): Finishing up; tasks ending, resources being released.
- `CA` (CANCELLED): Cancelled by user or admin.
- `F` (FAILED): Job did not complete successfully.
- `CD` (COMPLETED): Job finished successfully.
You will see these in `squeue -u $USER` or in accounting output (`sacct`).
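For example (the job ID is a placeholder):

```bash
# Job ID, compact state code, and job name for your jobs
squeue -u $USER -o "%i %t %j"

# Final state and exit code of a finished job
sacct -j 123456 --format=JobID,State,ExitCode
```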
SLURM and Parallelism (Conceptual View)
SLURM works with the parallel programming models covered elsewhere:
- MPI jobs:
  - You request multiple tasks (`--ntasks`) and use `srun` or `mpirun` (cluster-dependent) to start MPI ranks.
- OpenMP or threaded jobs:
  - You request fewer tasks but more CPUs per task (`--cpus-per-task`) and set `OMP_NUM_THREADS` accordingly.
- Hybrid jobs (MPI + OpenMP):
  - Combination: `--ntasks` for MPI ranks and `--cpus-per-task` for threads per rank.
SLURM itself does not implement MPI or threading; it ensures that the requested processes get the appropriate resources on the cluster.
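As an illustration, the resource-request part of a hybrid job might look like the sketch below (8 MPI ranks with 4 threads each; `./hybrid_program` is a placeholder, and whether you launch with `srun` or `mpirun` is cluster-dependent):

```bash
#!/bin/bash
#SBATCH --ntasks=8            # 8 MPI ranks
#SBATCH --cpus-per-task=4     # 4 CPU cores per rank, for OpenMP threads

# Let OpenMP use exactly the cores SLURM assigned to each task
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun ./hybrid_program
```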
Site-Specific Differences and Documentation
While SLURM options are largely standardized, clusters often have:
- Different partition names and policies.
- Different defaults for memory, cores, time limits.
- Additional required options (e.g., `--account`, `--qos`).
Always:
- Check your site’s user guide.
- Look at example SLURM job scripts provided by your HPC center.
- Use `man` pages for details (e.g., `man sbatch`, `man srun`).
You will combine this general SLURM knowledge with site-specific rules when you start running real jobs.