Commit d725159e authored by Taras Lazariv

Merge branch 'fix-checks' into 'preview'

Fix checks

See merge request !342
parents a0bd3124 e413ef32
Part of 5 merge requests: !392 Merge preview into contrib guide for browser users, !356 Merge preview in main, !355 Merge preview in main, !342 Fix checks, !333 Draft: update NGC containers
Showing changes with 47 additions and 48 deletions
......@@ -61,8 +61,8 @@ Check the status of the job with `squeue -u \<username>`.
## Mount BeeGFS Filesystem
You can mount BeeGFS filesystem on the ML partition (PowerPC architecture) or on the Haswell
[partition](../jobs_and_resources/partitions_and_limits.md) (x86_64 architecture)
You can mount BeeGFS filesystem on the partition ml (PowerPC architecture) or on the
partition haswell (x86_64 architecture), more information about [partitions](../jobs_and_resources/partitions_and_limits.md).
### Mount BeeGFS Filesystem on the Partition `ml`
......
......@@ -131,7 +131,7 @@ c.NotebookApp.allow_remote_access = True
```console
#!/bin/bash -l
#SBATCH --gres=gpu:1 # request GPU
#SBATCH --partition=gpu2 # use GPU partition
#SBATCH --partition=gpu2 # use partition GPU 2
#SBATCH --output=notebook_output.txt
#SBATCH --nodes=1
#SBATCH --ntasks=1
......
......@@ -11,5 +11,5 @@ Most of the UNICORE features are also available using its REST API.
* [https://sourceforge.net/p/unicore/wiki/REST_API/](https://sourceforge.net/p/unicore/wiki/REST_API/)
* Some useful examples of job submission via REST are available at:
* [https://sourceforge.net/p/unicore/wiki/REST_API_Examples/](https://sourceforge.net/p/unicore/wiki/REST_API_Examples/)
* The base address for the Taurus system at the ZIH is:
* *unicore.zih.tu-dresden.de:8080/TAURUS/rest/core*
* The base address for the system at the ZIH is:
* `unicore.zih.tu-dresden.de:8080/TAURUS/rest/core`
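For illustration only (the prompt, the placeholders, and the authentication method are assumptions, not an official recipe): the base address above can be queried with any HTTP client, e.g. `curl`:

```console
marie@local$ curl -k -H "Accept: application/json" -u <username>:<password> \
      https://unicore.zih.tu-dresden.de:8080/TAURUS/rest/core
```

The core endpoint typically answers with a JSON document linking to the available resources (jobs, storages, factories), as described in the UNICORE REST API documentation linked above.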
......@@ -19,7 +19,7 @@ It has 34 nodes, each with:
### Modules
The easiest way is using the [module system](../software/modules.md).
The software for the `alpha` partition is available in `modenv/hiera` module environment.
The software for the partition alpha is available in the `modenv/hiera` module environment.
To check the available modules for `modenv/hiera`, use the command
......
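A hedged sketch of checking the hierarchical environment (the exact command is in the elided part of this page; the listed modules depend on the currently loaded compiler/MPI level):

```console
marie@login$ module load modenv/hiera     # switch to the hierarchical module environment
marie@login$ module available             # list the modules visible at the current level of the hierarchy
```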
......@@ -56,8 +56,8 @@ checkpoint/restart bits transparently to your batch script. You just have to spe
total runtime of your calculation and the interval in which you wish to do checkpoints. The latter
(plus the time it takes to write the checkpoint) will then be the runtime of the individual jobs.
This should be targeted at below 24 hours in order to be able to run on all
[haswell64 partitions](../jobs_and_resources/partitions_and_limits.md#runtime-limits). For increased
fault-tolerance, it can be chosen even shorter.
[partitions haswell64](../jobs_and_resources/partitions_and_limits.md#runtime-limits). For
increased fault-tolerance, it can be chosen even shorter.
To use it, first add a `dmtcp_launch` before your application call in your batch script. In the case
of MPI applications, you have to add the parameters `--ib --rm` and put it between `srun` and your
......
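A minimal, hedged sketch of such a batch script (partition, runtime, and application names are assumptions; the full checkpoint/restart page documents the exact interface for setting total runtime and checkpoint interval):

```bash
#!/bin/bash
#SBATCH --time=08:00:00        # runtime of one chunk, kept below 24 h (value assumed)
#SBATCH --ntasks=1

# serial/threaded application: prefix the call with dmtcp_launch
dmtcp_launch ./my_application

# MPI application (sketch): dmtcp_launch with --ib --rm goes between srun and the program
# srun dmtcp_launch --ib --rm ./my_mpi_application
```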
# ZIH Systems
ZIH systems comprises the *High Performance Computing and Storage Complex* (HRSK-II) and its
extension *High Performance Computing – Data Analytics* (HPC-DA). In totoal it offers scientists
about 60,000 CPU cores and a peak performance of more than 1.5 quadrillion floating point operations
per second. The architecture specifically tailored to data-intensive computing, Big Data analytics,
and artificial intelligence methods with extensive capabilities for energy measurement and
performance monitoring provides ideal conditions to achieve the ambitious research goals of the
ZIH systems comprises the *High Performance Computing and Storage Complex* and its
extension *High Performance Computing – Data Analytics*. In total it offers scientists
about 60,000 CPU cores and a peak performance of more than 1.5 quadrillion floating point
operations per second. The architecture specifically tailored to data-intensive computing, Big Data
analytics, and artificial intelligence methods with extensive capabilities for energy measurement
and performance monitoring provides ideal conditions to achieve the ambitious research goals of the
users and the ZIH.
## Login Nodes
......
......@@ -62,9 +62,9 @@ Normal compute nodes are perfect for this task.
**OpenMP jobs:** SMP-parallel applications can only run **within a node**, so it is necessary to
include the [batch system](slurm.md) options `-N 1` and `-n 1`. Using `--cpus-per-task N` Slurm will
start one task and you will have `N` CPUs. The maximum number of processors for an SMP-parallel
program is 896 on [partition `julia`](partitions_and_limits.md).
program is 896 on partition `julia`, see [partitions](partitions_and_limits.md).
**GPUs** partitions are best suited for **repetitive** and **highly-parallel** computing tasks. If
Partitions with GPUs are best suited for **repetitive** and **highly-parallel** computing tasks. If
you have a task with potential [data parallelism](../software/gpu_programming.md), you most likely need the GPUs. Beyond video rendering, GPUs excel in tasks such as machine learning, financial
simulations and risk modeling. Use the partitions `gpu2` and `ml` only if you need GPUs! Otherwise
......
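A hedged sketch of an SMP/OpenMP batch script following the `-N 1`/`-n 1`/`--cpus-per-task` rules above (partition, core count, runtime, and program name are assumptions):

```bash
#!/bin/bash
#SBATCH --nodes=1                # SMP job: exactly one node
#SBATCH --ntasks=1               # exactly one task ...
#SBATCH --cpus-per-task=8        # ... with 8 CPUs for the OpenMP threads
#SBATCH --time=01:00:00

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_openmp_program
```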
# Large Shared-Memory Node - HPE Superdome Flex
- Hostname: `taurussmp8`
- Access to all shared file systems
- Access to all shared filesystems
- Slurm partition `julia`
- 32 x Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20GHz (28 cores)
- 48 TB RAM (usable: 47 TB - one TB is used for cache coherence protocols)
......
......@@ -238,13 +238,13 @@ resources.
Setting `--exclusive` **only** makes sure that there will be **no other jobs running on your nodes**.
It does not, however, mean that you automatically get access to all the resources which the node
might provide without explicitly requesting them, e.g. you still have to request a GPU via the
generic resources parameter (`gres`) to run on the GPU partitions, or you still have to request all
cores of a node if you need them. CPU cores can either to be used for a task (`--ntasks`) or for
multi-threading within the same task (`--cpus-per-task`). Since those two options are semantically
different (e.g., the former will influence how many MPI processes will be spawned by `srun` whereas
the latter does not), Slurm cannot determine automatically which of the two you might want to use.
Since we use cgroups for separation of jobs, your job is not allowed to use more resources than
requested.*
generic resources parameter (`gres`) to run on the partitions with GPU, or you still have to
request all cores of a node if you need them. CPU cores can either be used for a task
(`--ntasks`) or for multi-threading within the same task (`--cpus-per-task`). Since those two
options are semantically different (e.g., the former will influence how many MPI processes will be
spawned by `srun` whereas the latter does not), Slurm cannot determine automatically which of the
two you might want to use. Since we use cgroups for separation of jobs, your job is not allowed to
use more resources than requested.*
If you just want to use all available cores in a node, you have to specify how Slurm should organize
them, like with `-p haswell -c 24` or `-p haswell --ntasks-per-node=24`.
......
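A hedged illustration of the two layouts mentioned above (`my_threaded_app` and `my_mpi_app` are placeholders):

```console
marie@login$ # one task that multi-threads across all 24 cores of a haswell node
marie@login$ srun -p haswell --nodes=1 --ntasks=1 --cpus-per-task=24 ./my_threaded_app

marie@login$ # 24 single-core MPI tasks on one haswell node
marie@login$ srun -p haswell --nodes=1 --ntasks-per-node=24 ./my_mpi_app
```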
# Job Profiling
Slurm offers the option to gather profiling data from every task/node of the job. Analyzing this
data allows for a better understanding of your jobs in terms of elapsed time, runtime and IO
data allows for a better understanding of your jobs in terms of elapsed time, runtime and I/O
behavior, and many more.
The following data can be gathered:
......
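A hedged sketch with stock Slurm tooling (whether and how profiling is enabled on the system is described in the elided part of this page; the application name and output file are placeholders):

```console
marie@login$ srun --profile=task ./my_application     # gather per-task profiling data
marie@login$ sh5util -j <jobid> -o profile.h5         # merge the per-node HDF5 profiles after the job
```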
......@@ -6,8 +6,8 @@
[Apache Spark](https://spark.apache.org/), [Apache Flink](https://flink.apache.org/)
and [Apache Hadoop](https://hadoop.apache.org/) are frameworks for processing and integrating
Big Data. These frameworks are also offered as software [modules](modules.md) on both `ml` and
`scs5` partition. You can check module versions and availability with the command
Big Data. These frameworks are also offered as software [modules](modules.md) in both `ml` and
`scs5` software environments. You can check module versions and availability with the command
```console
marie@login$ module avail Spark
......@@ -46,20 +46,20 @@ as via [Jupyter notebook](#jupyter-notebook). All three ways are outlined in the
### Default Configuration
The Spark module is available for both `scs5` and `ml` partitions.
The Spark module is available in both `scs5` and `ml` environments.
Thus, Spark can be executed using different CPU architectures, e.g., Haswell and Power9.
Let us assume that two nodes should be used for the computation. Use a
`srun` command similar to the following to start an interactive session
using the Haswell partition. The following code snippet shows a job submission
to Haswell nodes with an allocation of two nodes with 60 GB main memory
using the partition haswell. The following code snippet shows a job submission
to haswell nodes with an allocation of two nodes with 60 GB main memory
exclusively for one hour:
```console
marie@login$ srun --partition=haswell -N 2 --mem=60g --exclusive --time=01:00:00 --pty bash -l
```
The command for different resource allocation on the `ml` partition is
The command for different resource allocation on the partition `ml` is
similar, e. g. for a job submission to `ml` nodes with an allocation of one
node, one task per node, two CPUs per task, one GPU per node, with 10000 MB for one hour:
......
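The concrete command is elided here; a sketch reconstructed from the parameters listed above could look like this (not necessarily the exact line from the documentation):

```console
marie@login$ srun --partition=ml --nodes=1 --ntasks-per-node=1 --cpus-per-task=2 --gres=gpu:1 --mem=10000 --time=01:00:00 --pty bash -l
```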
......@@ -65,9 +65,9 @@ parameters: `--ntasks-per-node` -parameter to the number of GPUs you use
per node. Also, it could be useful to increase the `memory/cpu` parameters
if you run larger models. Memory can be set up to:
`--mem=250000` and `--cpus-per-task=7` for the `ml` partition.
`--mem=250000` and `--cpus-per-task=7` for the partition `ml`.
`--mem=60000` and `--cpus-per-task=6` for the `gpu2` partition.
`--mem=60000` and `--cpus-per-task=6` for the partition `gpu2`.
Keep in mind that only one memory parameter (`--mem-per-cpu=<MB>` or `--mem=<MB>`) can be
specified
......
# Machine Learning
This is an introduction to running machine learning applications on ZIH systems.
For machine learning purposes, we recommend to use the partitions [Alpha](#alpha-partition) and/or
[ML](#ml-partition).
For machine learning purposes, we recommend using the partitions `alpha` and/or `ml`.
## ML Partition
## Partition `ml`
The compute nodes of the partition ML are based on the
[Power9 architecture](https://www.ibm.com/it-infrastructure/power/power9) from IBM. The system was created
......@@ -36,7 +35,7 @@ The following have been reloaded with a version change: 1) modenv/scs5 => moden
There are tools provided by IBM that work on partition ML and are related to AI tasks.
For more information see our [Power AI documentation](power_ai.md).
## Alpha Partition
## Partition: Alpha
Another partition for machine learning tasks is Alpha. It is mainly dedicated to
[ScaDS.AI](https://scads.ai/) topics. Each node on Alpha has 2x AMD EPYC CPUs, 8x NVIDIA A100-SXM4
......@@ -45,7 +44,7 @@ partition in our [Alpha Centauri](../jobs_and_resources/alpha_centauri.md) docum
### Modules
On the partition **Alpha** load the module environment:
On the partition alpha load the module environment:
```console
marie@alpha$ module load modenv/hiera
......
......@@ -5,7 +5,7 @@ the PowerAI Framework for Machine Learning. In the following the links
are valid for PowerAI version 1.5.4.
!!! warning
The information provided here is available from IBM and can be used on `ml` partition only!
The information provided here is available from IBM and can be used on partition ml only!
## General Overview
......@@ -47,7 +47,7 @@ are valid for PowerAI version 1.5.4.
(Open Neural Network Exchange) provides support for moving models
between those frameworks.
- [Distributed Deep Learning](https://www.ibm.com/support/knowledgecenter/SS5SF7_1.5.4/navigation/pai_getstarted_ddl.html?view=kc)
Distributed Deep Learning (DDL). Works on up to 4 nodes on `ml` partition.
Distributed Deep Learning (DDL). Works on up to 4 nodes on partition `ml`.
## PowerAI Container
......
......@@ -15,14 +15,14 @@ marie@login$ module spider pytorch
to find out which PyTorch modules are available on your partition.
We recommend using **Alpha** and/or **ML** partitions when working with machine learning workflows
We recommend using partitions alpha and/or ml when working with machine learning workflows
and the PyTorch library.
You can find detailed hardware specification in our
[hardware documentation](../jobs_and_resources/hardware_overview.md).
## PyTorch Console
On the **Alpha** partition, load the module environment:
On the partition `alpha`, load the module environment:
```console
marie@login$ srun -p alpha --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=800 bash #Job submission on alpha nodes with 1 gpu on 1 node with 800 Mb per CPU
......@@ -33,8 +33,8 @@ Die folgenden Module wurden in einer anderen Version erneut geladen:
Module GCC/10.2.0, CUDA/11.1.1, OpenMPI/4.0.5, PyTorch/1.9.0 and 54 dependencies loaded.
```
??? hint "Torchvision on alpha partition"
On the **Alpha** partition, the module torchvision is not yet available within the module
??? hint "Torchvision on partition `alpha`"
On the partition `alpha`, the module torchvision is not yet available within the module
system. (19.08.2021)
Torchvision can be made available by using a virtual environment:
......@@ -47,7 +47,7 @@ Module GCC/10.2.0, CUDA/11.1.1, OpenMPI/4.0.5, PyTorch/1.9.0 and 54 dependencies
Using the **--no-deps** option for "pip install" is necessary here as otherwise the PyTorch
version might be replaced and you will run into trouble with the cuda drivers.
On the **ML** partition:
On the partition `ml`:
```console
marie@login$ srun -p ml --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=800 bash #Job submission in ml nodes with 1 gpu on 1 node with 800 Mb per CPU
......
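A hedged sketch of the virtual-environment approach from the torchvision hint above (the path and the `--system-site-packages` flag are assumptions):

```console
marie@alpha$ python -m venv --system-site-packages /scratch/marie/torchvision_env   # location is an assumption
marie@alpha$ source /scratch/marie/torchvision_env/bin/activate
marie@alpha$ pip install --no-deps torchvision    # --no-deps keeps the module-provided PyTorch untouched
```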
......@@ -45,10 +45,10 @@ times till it succeeds.
bash-4.2$ cat /tmp/marie_2759627/activate
#!/bin/bash
if ! grep -q -- "Key for the VM on the ml partition" "/home/rotscher/.ssh/authorized_keys" >& /dev/null; then
if ! grep -q -- "Key for the VM on the partition ml" "/home/rotscher/.ssh/authorized_keys" >& /dev/null; then
cat "/tmp/marie_2759627/kvm.pub" >> "/home/marie/.ssh/authorized_keys"
else
sed -i "s|.*Key for the VM on the ml partition.*|ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3siZfQ6vQ6PtXPG0RPZwtJXYYFY73TwGYgM6mhKoWHvg+ZzclbBWVU0OoU42B3Ddofld7TFE8sqkHM6M+9jh8u+pYH4rPZte0irw5/27yM73M93q1FyQLQ8Rbi2hurYl5gihCEqomda7NQVQUjdUNVc6fDAvF72giaoOxNYfvqAkw8lFyStpqTHSpcOIL7pm6f76Jx+DJg98sXAXkuf9QK8MurezYVj1qFMho570tY+83ukA04qQSMEY5QeZ+MJDhF0gh8NXjX/6+YQrdh8TklPgOCmcIOI8lwnPTUUieK109ndLsUFB5H0vKL27dA2LZ3ZK+XRCENdUbpdoG2Czz Key for the VM on the ml partition|" "/home/marie/.ssh/authorized_keys"
sed -i "s|.*Key for the VM on the partition ml.*|ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3siZfQ6vQ6PtXPG0RPZwtJXYYFY73TwGYgM6mhKoWHvg+ZzclbBWVU0OoU42B3Ddofld7TFE8sqkHM6M+9jh8u+pYH4rPZte0irw5/27yM73M93q1FyQLQ8Rbi2hurYl5gihCEqomda7NQVQUjdUNVc6fDAvF72giaoOxNYfvqAkw8lFyStpqTHSpcOIL7pm6f76Jx+DJg98sXAXkuf9QK8MurezYVj1qFMho570tY+83ukA04qQSMEY5QeZ+MJDhF0gh8NXjX/6+YQrdh8TklPgOCmcIOI8lwnPTUUieK109ndLsUFB5H0vKL27dA2LZ3ZK+XRCENdUbpdoG2Czz Key for the VM on the partition ml|" "/home/marie/.ssh/authorized_keys"
fi
ssh -i /tmp/marie_2759627/kvm root@192.168.0.6
......
......@@ -15,7 +15,7 @@ basedir=`dirname "$basedir"`
ruleset="i \<io\> \.io
s \<SLURM\>
i file \+system HDFS
i \<taurus\> taurus\.hrsk /taurus
i \<taurus\> taurus\.hrsk /taurus /TAURUS
i \<hrskii\>
i hpc[ -]\+da\>
i \(alpha\|ml\|haswell\|romeo\|gpu\|smp\|julia\|hpdlf\|scs5\)-\?\(interactive\)\?[^a-z]*partition
......