Slurm Workload Manager (Fimm)
Overview
Slurm is an open-source workload manager designed for Linux clusters of all sizes. It provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
Commands
sinfo - reports the state of partitions and nodes managed by Slurm.
squeue - reports the state of jobs or job steps.
scontrol - views or modifies the Slurm configuration and state (for example, scontrol show partition lists the available partitions).
sbatch - submits a job script for later execution.
scancel - cancels a pending or running job or job step.
srun - submits a job for execution or initiates job steps in real time.
For more information regarding the Slurm commands, please check the man pages:
man <command>
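As a quick illustration, here are a few common invocations of the commands above; <jobid> is a placeholder for the job ID reported for your own job:

sinfo                      # list partitions and the state of their nodes
squeue -u $USER            # show only your own jobs
scontrol show partition    # show the configuration of all partitions
scancel <jobid>            # cancel the job with the given job ID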
sbatch script
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1G
#SBATCH --time=30:00              # default time is 15 minutes
#SBATCH --output=my.stdout
#SBATCH --mail-user=saerda@uib.no
#SBATCH --mail-type=ALL
#SBATCH --job-name="slurm_job"
#
# Put commands for executing job below this line
#
sleep 30
hostname
To submit this batch script, pass it to sbatch.
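For example, assuming the script above is saved as a file named job.sh (the filename is only an example):

sbatch job.sh

The job's standard output will be written to my.stdout in the directory you submitted from, as requested by the --output option above.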
MPI program
#!/bin/bash
# CPU accounting is not enforced currently.
#SBATCH -A <account>
#SBATCH -N 2
# use --exclusive to get the whole nodes exclusively for this job
#SBATCH --exclusive
#SBATCH --time=01:00:00
#SBATCH -c 2

srun -n 10 ./mpi_program
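After submitting this script with sbatch, you can check which nodes were allocated to it; a small example, where <jobid> stands for the job ID reported by sbatch:

scontrol show job <jobid>    # shows the allocated node list, task count, and time limit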
Idle queue
To use the computing resources efficiently, we have set up a special "idle" queue in the cluster which includes all computing nodes - including those nodes which are normally dedicated to specific groups.
Jobs submitted to the "idle" queue will be able to run on dedicated nodes if they are free.
Important: if the dedicated nodes are needed by the groups that own them (that is, they submit a job to them), the "idle queue" jobs using those nodes will be killed and re-queued to try to run at a later time.
The "idle" queue is accessible to everyone who has an account on fimm.bccs.uib.no.
The "idle" queue gives you access to the following extra resources:
| Number of nodes | CPU type | Cores per node | Memory per node |
|---|---|---|---|
| 2 | Quad-Core Intel(R) Xeon(R) CPU E5420 @ 2.50GHz | 8 | 32GB |
| 30 | Quad-Core Intel(R) Xeon(R) CPU E5420 @ 2.50GHz | 8 | 16GB |
| 32 | Quad-Core Intel(R) Xeon(R) CPU L5430 @ 2.66GHz | 8 | 16GB |
| 12 | Six-Core AMD Opteron(tm) Processor 2431 | 12 | 32GB |
| 21 | Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz | 32 | 128GB |
The best situations to use the "idle" queue are:
- The "default" queue is fully utilized and the special queues are free.
- You have short jobs which need a high resource specification.
- Your jobs are re-runnable without manual intervention. If they are not, please set the "#PBS -r n" flag (see the sketch after the examples below).
You can do the following to check which queues are available on fimm:
qstat -q
The following will submit your job to the "idle" queue in interactive mode:
qsub -I -q idle
In your PBS script, you can add the following to submit your job to the "idle" queue:
#PBS -q idle
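Putting these pieces together, a minimal sketch of a complete PBS script for the "idle" queue could look like the following; the node count, walltime, and the program name ./my_program are assumptions for illustration only:

#!/bin/bash
#PBS -q idle
#PBS -l nodes=1:ppn=8          # example resource request
#PBS -l walltime=01:00:00      # example walltime
#PBS -r n                      # add this line only if your job is not re-runnable (see above)
cd $PBS_O_WORKDIR              # run from the directory the job was submitted from
./my_program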
Please keep in mind that when you submit your job to the "idle" queue, it is not guaranteed that your job will finish successfully, since the owners of the hardware can "take the resources back" at any time by submitting a job to their specific queues.
Using InfiniBand in the idle queue
We have 16 nodes with Mellanox Technologies MT25204 [InfiniHost III Lx HCA] cards, connected to each other with a 24-port Mellanox MT47396 Infiniscale-III InfiniBand switch. Those nodes belong to the nanobasic group.
If you do not belong to the nanobasic group, the only way to access the InfiniBand nodes is through the idle queue.
One can access the InfiniBand nodes through the idle queue with the following line in your PBS script:
#PBS -l nodes=2:ppn=8:ib
All InfiniBand nodes have "ib" as a node feature. When your job lands on InfiniBand nodes, mpiexec will automatically pick up the InfiniBand connection instead of the regular Ethernet connection.
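As a sketch, a PBS script that reaches the InfiniBand nodes through the idle queue could combine the directives above; the walltime and the MPI binary name ./mpi_program are assumptions for illustration:

#!/bin/bash
#PBS -q idle
#PBS -l nodes=2:ppn=8:ib       # request nodes carrying the "ib" feature
#PBS -l walltime=01:00:00      # example walltime
cd $PBS_O_WORKDIR              # run from the directory the job was submitted from
mpiexec ./mpi_program          # uses the InfiniBand interconnect on these nodes

Depending on the MPI installation, you may need to pass the process count to mpiexec explicitly (for example, mpiexec -n 16).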