Slurm Workload Manager (Fimm)

From HPC documentation portal

Overview

Slurm is an open-source workload manager designed for Linux clusters of all sizes. It provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.

Commands

sinfo - reports the state of partitions and nodes managed by Slurm.

squeue - reports the state of jobs or job steps.

scontrol - views or modifies Slurm configuration and state; for example, scontrol show partition displays the configuration of every partition.

sbatch - submits a job script for later execution.

scancel - cancels a pending or running job or job step.

srun - submits a job for execution or initiates job steps in real time.

For more information about any Slurm command, consult its man page:

man <command>
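As a sketch of typical day-to-day usage of the commands above (the job ID 12345 is a placeholder for a real ID returned by sbatch):

```shell
sinfo                      # overview of partitions and node states
squeue -u $USER            # list only your own jobs
squeue -j 12345            # status of one specific job (placeholder job ID)
scancel 12345              # cancel that job
scontrol show partition    # configuration of every partition
```

These commands only work on a cluster running Slurm; the exact output columns depend on the site configuration.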

sbatch script

#!/bin/bash
#SBATCH --nodes=1                 # request a single node
#SBATCH --ntasks=1                # run one task
#SBATCH --mem-per-cpu=1G          # memory per allocated CPU
#SBATCH --time=30:00              # 30 minutes of walltime; the default is 15 minutes
#SBATCH --output=my.stdout        # file for the job's standard output
#SBATCH --mail-user=saerda@uib.no
#SBATCH --mail-type=ALL           # mail on job begin, end, and failure
#SBATCH --job-name="slurm_job"
#
# Put the commands for executing the job below this line
#
sleep 30
hostname
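Assuming the script above is saved as job.sh (a placeholder name), it can be submitted and monitored like this:

```shell
sbatch job.sh        # submit the script; Slurm prints the assigned job ID
squeue -u $USER      # watch the job while it is pending or running
cat my.stdout        # after completion, the output is in my.stdout
```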

MPI program

#!/bin/bash
# CPU accounting is not currently enforced.
#SBATCH -A <account>              # account to charge
#SBATCH -N 2                      # request two nodes
# Use --exclusive to reserve whole nodes exclusively for this job
#SBATCH --exclusive
#SBATCH --time=01:00:00           # one hour of walltime
#SBATCH -c 2                      # two CPUs per task
srun -n 10 ./mpi_program          # launch 10 MPI tasks across the allocated nodes
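A sketch of building and submitting the MPI job above, assuming the source file mpi_program.c and the script name mpi_job.sh (both placeholder names) and an MPI compiler wrapper such as mpicc on the path:

```shell
# Depending on the site, an MPI environment module may need to be loaded first,
# e.g. with "module load" (module name varies by cluster).
mpicc -o mpi_program mpi_program.c   # build the MPI binary
sbatch mpi_job.sh                    # submit the job script to Slurm
```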