Commit d14c38b2 authored by Martin Schroschk

Fix typos and spelling

parent 7b503c28
# Binding and Distribution of Tasks
Slurm provides several binding strategies to place and bind the tasks and/or threads of your job
to cores, sockets and nodes.
!!! note

    Keep in mind that the distribution method might have a direct impact on the execution time of
    your application. The manipulation of the distribution can either speed up or slow down your
    application.
## General
To specify a pattern, use the options `--cpu_bind=<cores|sockets>` and
`--distribution=<block|cyclic>`. Keep in mind that the allocation pattern also depends on your
specification.

In the following sections there are some selected examples of the combinations between `--cpu_bind`
and `--distribution` for different job types.
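For illustration, a minimal sketch combining both options on the `srun` call (`./application` is a
placeholder for your binary; adapt the values to your job):

```Bash
# Bind each task to a physical core and distribute the tasks block-wise
srun --cpu_bind=cores --distribution=block ./application
```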
## OpenMP Strategies
The illustration below shows the default binding of a pure OpenMP job on a single node with 16 CPUs
on which 16 threads are allocated.
```Bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=16
export OMP_NUM_THREADS=16
srun --ntasks 1 --cpus-per-task $OMP_NUM_THREADS ./application
```
![OpenMP](misc/openmp.png)
{: align=center}
## MPI Strategies
### Default Binding and Distribution Pattern
Applications with native checkpoint/restart support include Abaqus, Amber, Gaussian, GROMACS,
LAMMPS, NAMD, NWChem, Quantum Espresso, and STAR-CCM+.
In case your program does not natively support checkpointing, there are attempts at creating
generic checkpoint/restart solutions that should work in an application-agnostic way. One such
project that we recommend is
[Distributed MultiThreaded CheckPointing](http://dmtcp.sourceforge.net) (DMTCP).
DMTCP is available on ZIH systems after having loaded the `dmtcp` module.
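For example (the module name is taken from the sentence above; the exact version available on the
system may differ):

```console
marie@login$ module load dmtcp
```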
!!! Hints

    - If you see your first job running into the time limit, that probably
      means the timeout for writing out checkpoint files does not suffice
      and should be increased. Our tests have shown that it takes
      approximately 5 minutes to write out the memory content of a fully
      content is rather incompressible, it might be a good idea to disable
      the checkpoint file compression by setting: `export DMTCP_GZIP=0`
    - Note that all jobs the script deems necessary for your chosen
      time limit/interval values are submitted right when first calling the
      script. If your applications take considerably less time than what
      you specified, some of the individual jobs will be unnecessary. As
      soon as one job does not find a checkpoint to resume from, it will
If you wish to restart manually from one of your checkpoints (e.g., if something went wrong in your
later jobs or the jobs vanished from the queue for some reason), you have to call `dmtcp_sbatch`
with the `-r, --resume` parameter, specifying a `cpkt_*` directory to resume from. Then it will use
the same parameters as in the initial run of this job chain. If you wish to adjust the time limit,
for instance, because you realized that your original limit was too short, just use the `-t, --time`
parameter again on resume.
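A hedged sketch of such a resume call, with placeholders instead of real values:

```console
marie@login$ dmtcp_sbatch --resume <cpkt_directory> --time <new time limit>
```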
If for some reason our automatic chain job script is not suitable for your use case, just use DMTCP
on its own. In the following we will give you step-by-step instructions on how to checkpoint your
job manually:
* Load the DMTCP module: `module load dmtcp`
* DMTCP usually runs an additional process that manages the creation of checkpoints and such, the
  so-called `coordinator`. It must be started in your batch script before the actual start of your
  application. To help you with this process, we provide a helper script. It also waits until the
  first checkpoint has been created, which can be useful if you wish to implement
  chaining on your own.
* In front of your program call, you have to add the wrapper script `dmtcp_launch`. This will
  create a checkpoint automatically after 40 seconds and then terminate your application and with
  it the job. If the job runs into its time limit (here: 60 seconds), the time to write out the
  checkpoint was probably not long enough. If all went well, you should find `cpkt*` files in your
  work directory together with a script called `./dmtcp_restart_script.sh` that can be used to
  resume from the checkpoint. A minimal sketch of such a batch script is given below.
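The following is a rough sketch of such a batch script, not a verified recipe: it uses the plain
DMTCP commands `dmtcp_coordinator` and `dmtcp_launch` instead of the site-provided helper, and the
checkpoint interval and time limit are taken from the description above.

```Bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=00:01:00           # 60 seconds, as in the description above

module load dmtcp

# Start the coordinator in the background; it manages the creation of checkpoints.
# -i sets the automatic checkpoint interval (here: 40 seconds).
dmtcp_coordinator --daemon -i 40

# Run the application under DMTCP control. After the first checkpoint, cpkt* files and
# dmtcp_restart_script.sh should appear in the work directory.
dmtcp_launch ./application
```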
Using a job file has multiple advantages:
* Submit your job file to the scheduling system for later execution. In the meantime, you can grab
  a coffee and proceed with other work (e.g., start writing a paper).
!!! hint "The syntax for submitting a job file to Slurm is"

    ```console
    marie@login$ sbatch [options] <job_file>
    ```
### Job Files
```console
marie@login$ scontrol show res=<reservation name>
```
If you want to use your reservation, you have to add the parameter
`--reservation=<reservation name>` either in your sbatch script or to your `srun` or `salloc`
command.
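For example, as a line in your job file (replace the placeholder with the reservation name shown by
`scontrol`):

```Bash
#SBATCH --reservation=<reservation name>
```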
## Binding and Distribution of Tasks
Slurm provides several binding strategies to place and bind the tasks and/or threads of your job
to cores, sockets and nodes. Note: Keep in mind that the distribution method might have a direct
impact on the execution time of your application. The manipulation of the distribution can either
speed up or slow down your application. More detailed information about the binding can be found
in [Binding and Distribution of Tasks](binding_and_distribution_of_tasks.md).
The default allocation of the tasks/threads for OpenMP, MPI and Hybrid (MPI and OpenMP) jobs is as
follows.
### OpenMP
The illustration below shows the default binding of a pure OpenMP job on a single node with 16 CPUs
on which 16 threads are allocated.
```Bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=16
export OMP_NUM_THREADS=16
srun --ntasks 1 --cpus-per-task $OMP_NUM_THREADS ./application
```
![OpenMP](misc/openmp.png)
{: align=center}
### MPI
The illustration below shows the default binding of a pure MPI job in which 32 global ranks are
distributed onto two nodes with 16 cores each. Each rank has one core assigned to it.
```Bash
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --tasks-per-node=16
#SBATCH --cpus-per-task=1
srun --ntasks 32 ./application
```
![MPI](misc/mpi.png)
{: align=center}
### Hybrid (MPI and OpenMP)
The illustration below shows the default binding of a hybrid job in which eight global ranks are
distributed onto two nodes with 16 cores each. Each rank has four cores assigned to it.
```Bash
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --tasks-per-node=4
#SBATCH --cpus-per-task=4
export OMP_NUM_THREADS=4
srun --ntasks 8 --cpus-per-task $OMP_NUM_THREADS ./application
```
![Hybrid MPI and OpenMP](misc/hybrid.png)
{: align=center}
## Node Features for Selective Job Submission
The nodes in our HPC system are becoming more diverse in multiple aspects: hardware, mounted
# Job Profiling
Slurm offers the option to gather profiling data from every task/node of the job. Analyzing this
data allows for a better understanding of your jobs in terms of elapsed time, runtime, and IO
behavior, and much more.
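As a rough, hedged sketch of the workflow (the `--profile` option and the `sh5util` call follow the
generic Slurm documentation linked at the end of this page, not a site-specific example): request
profiling in your job file and merge the per-node HDF5 files afterwards.

```Bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --profile=task     # gather task-level profiling data

srun ./application
```

After the job has finished, `sh5util -j <jobid>` can be used to merge the per-node profiles into a
single HDF5 file.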
The following data can be gathered:
The data is sampled at a fixed rate (i.e., every 5 seconds) and is stored in an HDF5 file.
Please be aware that the profiling data may be quite large, depending on job size, runtime, and
sampling rate. Always remove the local profiles from `/lustre/scratch2/profiling/${USER}`,
either by running `sh5util` as shown above or by simply removing those files.
## Examples
More information about profiling with Slurm:

- [Slurm Profiling](http://slurm.schedmd.com/hdf5_profile_user_guide.html)
- [`sh5util`](http://slurm.schedmd.com/sh5util.html)
personal_ws-1.1 en 203
Abaqus
Altix
Amber
Amdahl's
analytics
anonymized
BLAS
broadwell
bsub
bullx
CCM
ccNUMA
centauri
CentOS
cgroups
checkpointing
Chemnitz
citable
conda
CSV
CUDA
cuDNN
CXFS
dask
dataframes
DataFrames
DDP
DDR
DFG
DistributedDataParallel
DMTCP
Dockerfile
Dockerfiles
DockerHub
dockerized
EasyBuild
ecryptfs
engl
english
env
Espresso
ESSL
fastfs
FFT
filesystems
Flink
foreach
Fortran
Gaussian
GBit
GFLOPS
gfortran
GiB
gifferent
GitHub
GitLab
GitLab's
glibc
gnuplot
GPU
GPUs
GROMACS
hadoop
haswell
HDF
HDFS
HDFView
Horovod
hostname
HPC
JupyterHub
JupyterLab
Keras
KNL
LAMMPS
LAPACK
lapply
LINPACK
mpifort
mpirun
multicore
multithreaded
NAMD
natively
NCCL
Neptun
NFS
NUMAlink
NumPy
Nutzungsbedingungen
NVMe
NWChem
OME
OmniOpt
OPARI
PGI
PiB
Pika
pipelining
PMI
png
PowerAI
ppc
PSOCK
Pthreads
pymdownx
Quantum
queue
randint
reachability
README
reproducibility
requeueing
RHEL
Rmpi
rome
romeo
RSA
RSS
RStudio
Rsync
runnable
squeue
srun
ssd
SSHFS
STAR
stderr
stdout
subdirectories
subdirectory
SUSE
TBB
TCP
uplink
Vampir
VampirTrace
VampirTrace's
VASP
vectorization
venv
virtualenv
VirtualGL
VMs
VMSize
VPN
WebVNC
WinSCP
Workdir