Slurm Workload Manager (Fimm)
Overview
Slurm is an open-source workload manager designed for Linux clusters of all sizes. It provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
Commands
sinfo - reports the state of partitions and nodes managed by SLURM.
squeue - reports the state of jobs or job steps.
scontrol show partition - displays the configuration of the available partitions.
sbatch - used to submit a job script for later execution.
scancel - used to cancel a pending or running job or job step.
srun - used to submit a job for execution or initiate job steps in real time.
For more information regarding the Slurm commands, please check the man pages:
man <command>
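As a quick illustration of these commands (the partition name idle below is taken from the "Idle queue" section further down and is only an example; output will vary):

sinfo                          # list partitions and the state of their nodes
squeue -u $USER                # show only your own jobs
scontrol show partition idle   # detailed settings for a single partition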
sbatch script
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1G
#SBATCH --time=30:00          # default time is 15 minutes
#SBATCH --output=my.stdout
#SBATCH --mail-user=saerda@uib.no
#SBATCH --mail-type=ALL
#SBATCH --job-name="slurm_job"
#
# Put commands for executing job below this line
#
sleep 30
hostname
To submit this batch script, pass it to sbatch.
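For example, assuming the script above is saved as my_job.sh (the filename is only an illustration):

sbatch my_job.sh    # submit the script; Slurm prints the assigned job ID
squeue -u $USER     # verify that the job is pending or running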
MPI program
#!/bin/bash
#CPU accounting is not enforced currently.
#SBATCH -A <account>
#SBATCH -N 2
#use --exclusive to get the whole nodes exclusively for this job
#SBATCH --exclusive
#SBATCH --time=01:00:00
#SBATCH -c 2
srun -n 10 ./mpi_program
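A closely related sketch (not Fimm-specific): declaring the task count with --ntasks in the header lets srun inherit it from the allocation, so the number of MPI ranks is specified in only one place:

#!/bin/bash
#SBATCH -A <account>
#SBATCH --nodes=2
#SBATCH --ntasks=10     # total number of MPI ranks for the job
#SBATCH --time=01:00:00
srun ./mpi_program      # srun launches all 10 ranks across the allocated nodes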
Idle queue
To use the computing resources efficiently, we have set up a special "idle" queue in the cluster which includes all compute nodes, including those nodes which are normally dedicated to specific groups.
Jobs submitted to the "idle" queue will be able to run on dedicated nodes if they are free.
Important: if the dedicated nodes are needed by the groups that own them (i.e. they submit a job to those nodes), the "idle" queue jobs using the needed nodes will be killed and re-queued to try to run at a later time.
The "idle" queue is accessible to everyone who has an account on fimm.bccs.uib.no.
The "idle" queue gives you access to the following extra resources:
| Number of nodes | CPU type | Cores per node | Memory per node |
|---|---|---|---|
| 2 | Quad-Core Intel(R) Xeon(R) CPU E5420 @ 2.50GHz | 8 | 32GB |
| 30 | Quad-Core Intel(R) Xeon(R) CPU E5420 @ 2.50GHz | 8 | 16GB |
| 32 | Quad-Core Intel(R) Xeon(R) CPU L5430 @ 2.66GHz | 8 | 16GB |
| 12 | Six-Core AMD Opteron(tm) Processor 2431 | 12 | 32GB |
| 21 | Quad-Core Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz | 32 | 128GB |
The best situations to use the "idle" queue are:
- The "default" queue is fully utilized while the special queues are free.
- You have short jobs that need a large amount of resources.
- Your jobs are re-runnable without manual intervention. If they are not, please set the #SBATCH --no-requeue flag so that Slurm does not automatically restart them (the "#PBS -r n" directive is the PBS/Torque equivalent and has no effect under Slurm).
Please keep in mind that when you submit your job to the "idle" queue, it is not guaranteed that the job will finish successfully, since the owners of the hardware can "take the resources back" at any time by submitting a job to their specific queues.
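As a minimal sketch of a job script targeting the "idle" queue (the partition name idle is assumed from the description above, so confirm the real name with sinfo; my_program is a placeholder for your executable):

#!/bin/bash
#SBATCH --partition=idle   # run on the shared "idle" queue
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --time=00:30:00
#SBATCH --requeue          # allow Slurm to requeue the job automatically if it is preempted
./my_program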