From 0431fdc3554c85277e53f6d70ab02260cdcacec0 Mon Sep 17 00:00:00 2001
From: Martin Schroschk <martin.schroschk@tu-dresden.de>
Date: Thu, 2 May 2024 13:28:16 +0200
Subject: [PATCH] Review section on exclusive jobs

- Tighten desc.
- Remove "partitions"
---
 .../docs/jobs_and_resources/slurm_examples.md | 48 +++++++++----------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md
index c28932a8d..2f103faa7 100644
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md
@@ -253,40 +253,40 @@ enough resources in total were specified in the header of the batch script.
 
 ## Exclusive Jobs for Benchmarking
 
-Jobs ZIH systems run, by default, in shared-mode, meaning that multiple jobs (from different users)
-can run at the same time on the same compute node. Sometimes, this behavior is not desired (e.g.
-for benchmarking purposes). Thus, the Slurm parameter `--exclusive` request for exclusive usage of
-resources.
-
-Setting `--exclusive` **only** makes sure that there will be **no other jobs running on your nodes**.
-It does not, however, mean that you automatically get access to all the resources which the node
-might provide without explicitly requesting them, e.g. you still have to request a GPU via the
-generic resources parameter (`gres`) to run on the partitions with GPU, or you still have to
-request all cores of a node if you need them. CPU cores can either to be used for a task
-(`--ntasks`) or for multi-threading within the same task (`--cpus-per-task`). Since those two
-options are semantically different (e.g., the former will influence how many MPI processes will be
-spawned by `srun` whereas the latter does not), Slurm cannot determine automatically which of the
-two you might want to use. Since we use cgroups for separation of jobs, your job is not allowed to
-use more resources than requested.*
-
-If you just want to use all available cores in a node, you have to specify how Slurm should organize
-them, like with `--partition=haswell --cpus-per-tasks=24` or `--partition=haswell --ntasks-per-node=24`.
+Jobs on ZIH systems run, by default, in shared-mode, meaning that multiple jobs (from different
+users) can run at the same time on the same compute node. Sometimes, this behavior is not desired
+(e.g. for benchmarking purposes). You can request exclusive usage of resources using the Slurm
+parameter `--exclusive`.
+
+!!! note "Exclusive does not allocate all available resources"
+
+    Setting `--exclusive` **only** makes sure that there will be **no other jobs running on your
+    nodes**. It does not, however, mean that you automatically get access to all the resources
+    which the node might provide without explicitly requesting them.
+
+    E.g. you still have to request a GPU via the generic resources parameter (`gres`) on the GPU
+    cluster. On the other hand, you also have to request all cores of a node if you need them.
+
+CPU cores can either be used for a task (`--ntasks`) or for multi-threading within the same task
+(`--cpus-per-task`). Since those two options are semantically different (e.g., the former will
+influence how many MPI processes will be spawned by `srun` whereas the latter does not), Slurm
+cannot determine automatically which of the two you might want to use. Since we use cgroups for
+separation of jobs, your job is not allowed to use more resources than requested.
 
 Here is a short example to ensure that a benchmark is not spoiled by other jobs, even if it doesn't
-use up all resources in the nodes:
+use up all resources of the nodes:
 
-!!! example "Exclusive resources"
+!!! example "Job file with exclusive resources"
 
     ```Bash
     #!/bin/bash
-    #SBATCH --partition=haswell
     #SBATCH --nodes=2
     #SBATCH --ntasks-per-node=2
     #SBATCH --cpus-per-task=8
-    #SBATCH --exclusive            # ensure that nobody spoils my measurement on 2 x 2 x 8 cores
+    #SBATCH --exclusive  # ensure that nobody spoils my measurement on 2 x 2 x 8 cores
     #SBATCH --time=00:10:00
-    #SBATCH --job-name=Benchmark
-    #SBATCH --mail-type=end
+    #SBATCH --job-name=benchmark
+    #SBATCH --mail-type=start,end
     #SBATCH --mail-user=<your.email>@tu-dresden.de
 
     srun ./my_benchmark
-- 
GitLab
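
A minimal sketch contrasting the two ways to grab all cores of an exclusive node, which the revised text distinguishes (`--ntasks` vs. `--cpus-per-task`). `<cores>` stands for the node's core count and `my_benchmark` is an assumed binary name, not part of the patch; the `##SBATCH` lines are ordinary comments that `sbatch` ignores, so a variant is enabled by deleting one `#`:

```Bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --exclusive            # keep other jobs off the node
#SBATCH --time=00:05:00

# Variant A: one MPI process per core; srun will spawn <cores> tasks.
##SBATCH --ntasks-per-node=<cores>

# Variant B: a single task that multi-threads across all cores (e.g. OpenMP).
##SBATCH --ntasks-per-node=1
##SBATCH --cpus-per-task=<cores>

srun ./my_benchmark
```

After submitting with `sbatch`, `scontrol show job <jobid>` should report `OverSubscribe=NO`, confirming that the allocation is exclusive.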