Slurm Workload Manager (Fimm)


Overview

Slurm is an open-source workload manager designed for Linux clusters of all sizes. It provides three key functions. First it allocates exclusive and/or non-exclusive access to resources (computer nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.

Commands

sinfo - reports the state of partitions and nodes managed by Slurm.

squeue - reports the state of jobs or job steps.

scontrol - shows or modifies Slurm configuration and state; for example, "scontrol show partition" lists the available partitions.

sbatch - submits a job script for later execution.

scancel - cancels a pending or running job or job step.

srun - submits a job for execution or initiates job steps in real time.

For more information on a Slurm command, please check its man page:

man <command>
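
For example, to get a quick overview of the cluster and your own jobs (the partition name is just a placeholder):

sinfo                                  # state of all partitions and nodes
squeue -u $USER                        # your pending and running jobs
scontrol show partition <partition>    # details for a single partition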

sbatch script

#!/bin/bash
#SBATCH --nodes=1                # run on a single node
#SBATCH --ntasks=1               # run a single task
#SBATCH --mem-per-cpu=1G         # memory per allocated CPU
#SBATCH --time=30:00             # 30 minutes; the default time limit is 15 minutes
#SBATCH --output=my.stdout       # file for the job's standard output
#SBATCH --mail-user=saerda@uib.no
#SBATCH --mail-type=ALL          # send e-mail on all job state changes
#SBATCH --job-name="slurm_job"
#
# Put the commands for executing the job below this line
#
sleep 30
hostname


To submit this batch script, pass it to sbatch.
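
For example, assuming the script has been saved as job.sh (the file name is arbitrary):

sbatch job.sh

sbatch prints the job ID, which can then be used with squeue and scancel.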

MPI program

#!/bin/bash
# CPU accounting is not enforced currently.
#SBATCH -A <account>
#SBATCH -N 2                     # request two nodes
# Use --exclusive to get the whole nodes exclusively for this job
#SBATCH --exclusive
#SBATCH --time=01:00:00
#SBATCH -c 2                     # two CPUs per task
srun -n 10 ./mpi_program         # launch 10 MPI tasks on the allocated nodes
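
A typical workflow is to build the MPI program first and then submit the script with sbatch. The module and file names below are only examples; the MPI modules actually available on fimm may differ:

module load openmpi              # load an MPI environment (exact module name may differ)
mpicc -O2 -o mpi_program mpi_program.c
sbatch mpi_job.sh                # the script above, saved e.g. as mpi_job.sh
squeue -u $USER                  # follow the job in the queue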

Idle queue

To make efficient use of the computing resources we have set up a special "idle" queue in the cluster which includes all compute nodes, including those that are normally dedicated to specific groups.

Jobs submitted to the "idle" queue will be able to run on dedicated nodes if they are free.

Important: if the dedicated nodes are needed by the groups that own them (i.e. they submit a job to them), the "idle" queue jobs using those nodes will be killed and re-queued to try to run at a later time.

The "idle" queue is accessible to everyone who has an account on fimm.bccs.uib.no.

The "idle" queue gives you access to the following extra resources:

Number of nodes   CPU type                                         Cores per node   Memory per node
2                 Quad-Core Intel(R) Xeon(R) CPU E5420 @ 2.50GHz   8                32GB
30                Quad-Core Intel(R) Xeon(R) CPU E5420 @ 2.50GHz   8                16GB
32                Quad-Core Intel(R) Xeon(R) CPU L5430 @ 2.66GHz   8                16GB
12                Six-Core AMD Opteron(tm) Processor 2431          12               32GB

The best situations to use the "idle" queue are:

  • The "default" queue is fully utilized while the special queues are free.
  • You have short jobs that need a large resource allocation.
  • Your jobs can be re-run without manual intervention. If they cannot, please set the "#PBS -r n" flag.

To check which queues are available on fimm, run:

qstat -q 

The following will submit your job to the "idle" queue in interactive mode:

qsub -I -q idle 

In your PBS script you can add the following line to submit your job to the "idle" queue:

#PBS -q idle 

Please keep in mind that when you submit your job to the "idle" queue it is not guaranteed that your job will finish successfully, since the owners of the hardware can "take the resources back" at any time by submitting a job to their specific queues.
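
Putting these pieces together, a minimal "idle" queue batch script could look like the sketch below; the resource request and program name are only placeholders:

#!/bin/bash
#PBS -q idle                  # submit to the "idle" queue
#PBS -l nodes=1:ppn=8         # example resource request
##PBS -r n                    # uncomment only if the job cannot safely be re-run
cd $PBS_O_WORKDIR             # start in the directory the job was submitted from
./my_program                  # placeholder program name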

Using InfiniBand in the idle queue

We have 16 nodes with Mellanox Technologies MT25204 [InfiniHost III Lx HCA] cards, connected to each other through a 24-port Mellanox MT47396 Infiniscale-III InfiniBand switch. These nodes belong to the nanobasic group.

If you do not belong to the nanobasic group, the only way to access the InfiniBand nodes is through the "idle" queue.

You can access the InfiniBand nodes through the "idle" queue by adding the following line to your PBS script:

#PBS -l nodes=2:ppn=8:ib 

All InfiniBand nodes have "ib" set as a node feature. When your job lands on InfiniBand nodes, mpiexec will automatically pick up the InfiniBand interconnect instead of the regular Ethernet connection.
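
For example, a batch script that requests InfiniBand nodes through the "idle" queue could look like the sketch below. The walltime and program name are placeholders, and depending on the MPI installation you may need to give mpiexec the number of processes explicitly:

#!/bin/bash
#PBS -q idle                     # the "idle" queue gives access to the InfiniBand nodes
#PBS -l nodes=2:ppn=8:ib         # two whole InfiniBand nodes, 8 cores each
#PBS -l walltime=01:00:00        # placeholder walltime
cd $PBS_O_WORKDIR
mpiexec ./mpi_program            # uses the InfiniBand interconnect on :ib nodes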