Before you do anything, read this!
SLURM does not force your job to run on as many cores or with as much memory as you tell it to. Your application and SLURM know nothing of each other. Instead, the proper way to think of SLURM is that it is a tool that sets aside a certain amount of resources, such as cores and memory, for your job to run within.
For example, suppose you have a single-core, single-threaded job, such as a simplistic R job that runs on one core using one thread, but you allocate more than this in your submit script with `#SBATCH --cpus-per-task=8`. This assigns 8 CPU cores to your single-threaded job. Since your application only needs one core, the remaining 7 sit unused yet unavailable to other users.
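As a point of reference, a minimal submit script for such a single-threaded R job might look like the sketch below. The job name, module name, and script file are placeholders to adapt to your own environment, not values mandated by this cluster.

```bash
#!/bin/bash
# One CPU is enough for a single-core, single-threaded R script;
# 2600M of memory is requested per allocated CPU.
#SBATCH --job-name=single_R
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2600M

# The module name and script file are placeholders; adjust them to
# whatever is actually available on your cluster.
module load R
Rscript mycode.R
```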
The reverse is similar. If you have a job that uses many cores or workers, such as a MATLAB job using the parfor() function with, say, 12 workers, it needs an allocation of 12 cores. But if you do not specify `--cpus-per-task` (or `-c`) at all, the default allocation of `--cpus-per-task=1` is used, and if you set it to 1 explicitly, all 12 MATLAB processes will stack up on the one allocated CPU, resulting in waiting processes and actually making your job slower. A sketch of a suitable submit script follows.
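The sketch below shows one way such a 12-worker parfor job could be submitted; the module name, script name (mycode.m), and MATLAB invocation are assumptions to adapt to your own setup.

```bash
#!/bin/bash
# One MATLAB process (one task) that opens 12 parfor workers,
# so it needs 12 CPUs and 2600M of memory per CPU.
#SBATCH --job-name=matlab_parfor
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12
#SBATCH --mem-per-cpu=2600M

# The module name and the script name (mycode.m) are placeholders;
# adjust them to your environment.
module load matlab
matlab -nodisplay -nosplash -r "mycode; exit"
```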
NOTE: The `--ntasks` (or `-n`) option is the number of tasks in your submit script. If you are executing one Rscript or R CMD BATCH line, that is one task, unless it spawns more than one R process. The same is true of all other applications (Python, MATLAB, etc.). If you have just one application to execute but specify `-n 4`, your single task will be executed four times on four separate allocated CPUs. When `-n` and `-c` are both specified, they multiply: for example, `--ntasks=4` and `--cpus-per-task=4` gives a total allocation of 16 CPUs, as illustrated below. In most cases you will only need to specify `-c`, not `-n`.
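For illustration only, the hypothetical combination below shows how the two options multiply; you would only want something like this if you genuinely launch four separate processes (for example, via srun), each needing four threads.

```bash
# Hypothetical combination shown only to illustrate the arithmetic:
# 4 tasks x 4 CPUs per task = 16 CPUs allocated in total.
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=4
```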
Definitions
SLURM refers to cores, threads, and CPUs in peculiar, often unintuitive ways, so we must be careful about the words we use when describing an allocation.
Word | Meaning | #SBATCH option
---|---|---
CPU | A CPU is one single processor that should have one process/worker and no more. It has no threads, because it essentially is one itself. | `--ntasks`, or `-n`
Thread | A thread is the same thing as a "CPU" to SLURM. However, we must know about threads if one application process needs more than one of them. In that case, we must specify how many threads the one process needs, and more CPUs get allocated for that one process. | `--cpus-per-task`, or `-c`
Task | A task is one iteration or execution of your application. If you submit a single execution of a job, for example R CMD BATCH mycode.R, but it spawns more R processes, then each one of those processes is a task, and each should get a CPU. | `--ntasks`, or `-n`
Node | One whole computer in the cluster. | `-N`
Core | SLURM doesn't really use the word "core," because on this cluster a core actually breaks down into two "CPUs": each thread in a core is counted as a "CPU" by SLURM. Since there are 24 cores on a node and each core has two threads, there appear to be 48 "CPUs" on each node. It's usually best to forget the word "core" when dealing with SLURM and use "CPUs" instead. | `--ntasks`, or `-n`
Memory | The amount of RAM needed by each CPU reserved for your job. Since SLURM is configured to reserve individual CPUs rather than whole cores, memory is allocated per CPU, in megabytes. The maximum allowed is 2600M (2.6 gigabytes); use only what you need. If you do not specify this parameter in your submit script, the default allocation is only 50 megabytes. | `--mem-per-cpu=2600M`
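Putting these definitions together, a submit script for one multi-threaded process might look like the sketch below; my_program and its --threads flag are hypothetical stand-ins for your own application.

```bash
#!/bin/bash
# One task (a single multi-threaded process) with 8 CPUs,
# and 2600M per CPU, i.e. roughly 8 x 2.6 GB for the whole job.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=2600M

# my_program and its --threads flag are hypothetical; many applications
# can read SLURM_CPUS_PER_TASK to size their thread pool.
./my_program --threads=${SLURM_CPUS_PER_TASK}
```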