Commit d14c38b2 authored by Martin Schroschk

Fix typos and spelling

parent 7b503c28
# Binding and Distribution of Tasks
Slurm provides several binding strategies to place and bind the tasks and/or threads of your job
to cores, sockets and nodes.
!!! note
Keep in mind that the distribution method might have a direct impact on the execution time of
your application. The manipulation of the distribution can either speed up or slow down your
application.
## General
To specify a pattern, the commands `--cpu_bind=<cores|sockets>` and `--distribution=<block|cyclic>`
@@ -21,6 +30,25 @@ mind that the allocation pattern also depends on your specification.
The following sections show selected examples of combining `--cpu_bind` and `--distribution`
for different job types.
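For a quick first impression of the syntax, both options can also be passed directly to `srun`; a
minimal sketch (assuming a 32-task MPI binary `./application`):
```console
marie@login$ srun --ntasks 32 --cpu_bind=cores --distribution=cyclic ./application
```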
## OpenMP Strategies
The illustration below shows the default binding of a pure OpenMP job on a single node with 16 CPUs
on which 16 threads are allocated.
```Bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=16
export OMP_NUM_THREADS=16
srun --ntasks 1 --cpus-per-task $OMP_NUM_THREADS ./application
```
![OpenMP](misc/openmp.png)
{: align=center}
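If the default placement does not suit your application, the binding can be changed with the
options introduced above, e.g. binding to whole sockets instead of single cores; a sketch based on
the same 16-thread job:
```Bash
export OMP_NUM_THREADS=16
# bind the single task (and thus its OpenMP threads) to sockets instead of individual cores
srun --ntasks 1 --cpus-per-task $OMP_NUM_THREADS --cpu_bind=sockets ./application
```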
## MPI Strategies
### Default Binding and Distribution Pattern
......
@@ -30,7 +30,7 @@ Abaqus, Amber, Gaussian, GROMACS, LAMMPS, NAMD, NWChem, Quantum Espresso, STAR-C
In case your program does not natively support checkpointing, there are attempts at creating generic
checkpoint/restart solutions that should work application-agnostic. One such project which we
recommend is [Distributed MultiThreaded CheckPointing](http://dmtcp.sourceforge.net) (DMTCP).
recommend is [Distributed Multi-Threaded Check-Pointing](http://dmtcp.sourceforge.net) (DMTCP).
DMTCP is available on ZIH systems after having loaded the `dmtcp` module
@@ -94,7 +94,7 @@ about 2 days in total.
!!! Hints
- If you see your first job running into the timelimit, that probably
- If you see your first job running into the time limit, that probably
means the timeout for writing out checkpoint files does not suffice
and should be increased. Our tests have shown that it takes
approximately 5 minutes to write out the memory content of a fully
@@ -104,7 +104,7 @@ about 2 days in total.
content is rather incompressible, it might be a good idea to disable
the checkpoint file compression by setting: `export DMTCP_GZIP=0`
- Note that all jobs the script deems necessary for your chosen
timelimit/interval values are submitted right when first calling the
time limit/interval values are submitted right when first calling the
script. If your applications take considerably less time than what
you specified, some of the individual jobs will be unnecessary. As
soon as one job does not find a checkpoint to resume from, it will
@@ -124,7 +124,7 @@ What happens in your work directory?
If you wish to restart manually from one of your checkpoints (e.g., if something went wrong in your
later jobs or the jobs vanished from the queue for some reason), you have to call `dmtcp_sbatch`
with the `-r, --resume` parameter, specifying a cpkt\_\* directory to resume from. Then it will use
with the `-r, --resume` parameter, specifying a `cpkt_` directory to resume from. Then it will use
the same parameters as in the initial run of this job chain. If you wish to adjust the time limit,
for instance, because you realized that your original limit was too short, just use the `-t, --time`
parameter again on resume.
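Put together, a manual resume might look like the following sketch (the checkpoint directory and
the new time limit are placeholders; the long option names follow the description above):
```console
marie@login$ dmtcp_sbatch --resume cpkt_<...> --time <new time limit>
```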
@@ -135,7 +135,7 @@ If for some reason our automatic chain job script is not suitable for your use c
just use DMTCP on its own. In the following we will give you step-by-step instructions on how to
checkpoint your job manually:
* Load the dmtcp module: `module load dmtcp`
* Load the DMTCP module: `module load dmtcp`
* DMTCP usually runs an additional process that
manages the creation of checkpoints and such, the so-called `coordinator`. It must be started in
your batch script before the actual start of your application. To help you with this process, we
@@ -147,9 +147,9 @@ first checkpoint has been created, which can be useful if you wish to implement
chaining on your own.
* In front of your program call, you have to add the wrapper
script `dmtcp_launch`. This will create a checkpoint automatically after 40 seconds and then
terminate your application and with it the job. If the job runs into its timelimit (here: 60
terminate your application and with it the job. If the job runs into its time limit (here: 60
seconds), the time to write out the checkpoint was probably not long enough. If all went well, you
should find cpkt\* files in your work directory together with a script called
should find `cpkt` files in your work directory together with a script called
`./dmtcp_restart_script.sh` that can be used to resume from the checkpoint.
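Putting the visible steps together, the core of such a job script might look roughly like this
sketch (the DMTCP coordinator must be started beforehand via the helper provided on ZIH systems;
its exact call is shown in the example below):
```Bash
#!/bin/bash
#SBATCH --time=00:01:00    # 60 second time limit as in the description above

module load dmtcp
# The coordinator is assumed to be running at this point (started via the
# ZIH-provided helper, see the example below).
# Prefix the actual program call with the wrapper:
dmtcp_launch ./application
```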
???+ example
......
@@ -181,11 +181,11 @@ has multiple advantages:
* Submit your job file to the scheduling system for later execution. In the meantime, you can grab
a coffee and proceed with other work (e.g., start writing a paper).
The syntax for submitting a job file to Slurm is
!!! hint "The syntax for submitting a job file to Slurm is"
```console
marie@login$ sbatch [options] <job_file>
```
```console
marie@login$ sbatch [options] <job_file>
```
### Job Files
@@ -367,72 +367,6 @@ marie@login$ scontrol show res=<reservation name>
If you want to use your reservation, you have to add the parameter
`--reservation=<reservation name>` either in your sbatch script or to your `srun` or `salloc` command.
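For example, attaching an interactive allocation to a reservation could look like this (a sketch;
replace the placeholder with the name of your reservation):
```console
marie@login$ salloc --reservation=<reservation name> --nodes=1 --time=01:00:00
```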
## Binding and Distribution of Tasks
Slurm provides several binding strategies to place and bind the tasks and/or threads of your job
to cores, sockets and nodes. Note: Keep in mind that the distribution method might have a direct
impact on the execution time of your application. The manipulation of the distribution can either
speed up or slow down your application. More detailed information about the binding can be found
[here](binding_and_distribution_of_tasks.md).
The default allocation of tasks/threads for OpenMP, MPI and Hybrid (MPI and OpenMP) jobs is as
follows.
### OpenMP
The illustration below shows the default binding of a pure OpenMP job on a single node with 16 CPUs
on which 16 threads are allocated.
```Bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=16
export OMP_NUM_THREADS=16
srun --ntasks 1 --cpus-per-task $OMP_NUM_THREADS ./application
```
![OpenMP](misc/openmp.png)
{: align=center}
### MPI
The illustration below shows the default binding of a pure MPI job in which 32 global ranks are
distributed onto two nodes with 16 cores each. Each rank has one core assigned to it.
```Bash
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --tasks-per-node=16
#SBATCH --cpus-per-task=1
srun --ntasks 32 ./application
```
![MPI](misc/mpi.png)
{: align=center}
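Whether the intended pattern was actually applied can be checked by letting `srun` report the
binding before the tasks start; a sketch (`verbose` is a standard value of `--cpu_bind` that prints
the chosen CPU masks):
```Bash
srun --ntasks 32 --cpu_bind=verbose ./application
```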
### Hybrid (MPI and OpenMP)
The illustration below shows the default binding of a hybrid job, in which eight global ranks are
distributed onto two nodes with 16 cores each. Each rank has four cores assigned to it.
```Bash
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --tasks-per-node=4
#SBATCH --cpus-per-task=4
export OMP_NUM_THREADS=4
srun --ntasks 8 --cpus-per-task $OMP_NUM_THREADS ./application
```
![Hybrid MPI and OpenMP](misc/hybrid.png)
{: align=center}
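The `--cpu_bind` and `--distribution` options described in the linked binding documentation can be
combined with this hybrid setup as well; a sketch that additionally requests an explicit block
distribution of the ranks:
```Bash
export OMP_NUM_THREADS=4
# distribute the eight ranks block-wise across the allocated nodes
srun --ntasks 8 --cpus-per-task $OMP_NUM_THREADS --distribution=block ./application
```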
## Node Features for Selective Job Submission
The nodes in our HPC system are becoming more diverse in multiple aspects: hardware, mounted
......
# Job Profiling
Slurm offers the option to gather profiling data from every task/node of the job. Analysing this
data allows for a better understanding of your jobs in terms of walltime, runtime and IO behaviour,
and many more.
Slurm offers the option to gather profiling data from every task/node of the job. Analyzing this
data allows for a better understanding of your jobs in terms of elapsed time, runtime and IO
behavior, and many more.
The following data can be gathered:
@@ -17,7 +17,7 @@ The data is sampled at a fixed rate (i.e. every 5 seconds) and is stored in a HD
Please be aware that the profiling data may be quite large, depending on job size, runtime, and
sampling rate. Always remove the local profiles from `/lustre/scratch2/profiling/${USER}`,
either by running sh5util as shown above or by simply removing those files.
either by running `sh5util` as shown above or by simply removing those files.
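Generically, `sh5util` merges the per-node profile files of a job into one HDF5 file; a sketch (job
ID and output name are placeholders, the exact call used on ZIH systems is the one referred to
above):
```console
marie@login$ sh5util -j <jobid> -o profile_<jobid>.h5
```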
## Examples
@@ -59,4 +59,4 @@ line within your job file.
More information about profiling with Slurm:
- [Slurm Profiling](http://slurm.schedmd.com/hdf5_profile_user_guide.html)
- [sh5util](http://slurm.schedmd.com/sh5util.html)
- [`sh5util`](http://slurm.schedmd.com/sh5util.html)
personal_ws-1.1 en 203
Abaqus
Altix
Amber
Amdahl's
analytics
anonymized
@@ -10,9 +12,12 @@ BLAS
broadwell
bsub
bullx
CCM
ccNUMA
centauri
CentOS
cgroups
checkpointing
Chemnitz
citable
conda
@@ -23,7 +28,6 @@ CSV
CUDA
cuDNN
CXFS
cgroups
dask
dataframes
DataFrames
@@ -33,15 +37,17 @@ DDP
DDR
DFG
DistributedDataParallel
DockerHub
DMTCP
Dockerfile
Dockerfiles
DockerHub
dockerized
EasyBuild
ecryptfs
engl
english
env
Espresso
ESSL
fastfs
FFT
@@ -51,21 +57,25 @@ filesystems
Flink
foreach
Fortran
Gaussian
GBit
GFLOPS
gfortran
GiB
gifferent
GitHub
GitLab
GitLab's
GitHub
glibc
gnuplot
GPU
GPUs
GROMACS
hadoop
haswell
HDF
HDFS
HDFView
Horovod
hostname
HPC
@@ -88,6 +98,7 @@ JupyterHub
JupyterLab
Keras
KNL
LAMMPS
LAPACK
lapply
LINPACK
@@ -117,6 +128,8 @@ mpifort
mpirun
multicore
multithreaded
NAMD
natively
NCCL
Neptun
NFS
@@ -126,6 +139,7 @@ NUMAlink
NumPy
Nutzungsbedingungen
NVMe
NWChem
OME
OmniOpt
OPARI
@@ -151,24 +165,26 @@ PGI
PiB
Pika
pipelining
png
PMI
png
PowerAI
ppc
PSOCK
Pthreads
pymdownx
Quantum
queue
randint
reachability
requeueing
README
reproducibility
requeueing
RHEL
Rmpi
rome
romeo
RSA
RSS
RStudio
Rsync
runnable
@@ -198,8 +214,11 @@ squeue
srun
ssd
SSHFS
STAR
stderr
stdout
subdirectories
subdirectory
SUSE
TBB
TCP
@@ -218,12 +237,14 @@ uplink
Vampir
VampirTrace
VampirTrace's
VASP
vectorization
venv
virtualenv
VirtualGL
VPN
VMs
VMSize
VPN
WebVNC
WinSCP
Workdir
......