Commit c5713b05 authored by Martin Schroschk

Brief review w.r.t. markdown

parent c696d145

## Hardware

- Slurm partition: `romeo`
- Module architecture: `rome`
- 192 nodes `taurusi[7001-7192]`, each:
    - 2x AMD EPYC CPU 7702 (64 cores) @ 2.0 GHz, Simultaneous Multithreading (SMT)
    - 512 GB RAM
    - 200 GB SSD disk mounted on `/tmp`

## Usage

There is a total of 128 physical cores in each node. SMT is also active, so in total, 256 logical
cores are available per node.

!!! note

    Multithreading is disabled per default in a job. To make use of it, include the Slurm parameter
    `--hint=multithread` in your job script or command line, or set the environment variable
    `SLURM_HINT=multithread` before job submission.

Each node brings 512 GB of main memory, so you can request roughly 1972 MB per logical core (using
`--mem-per-cpu`). Note that you will always get the memory for the logical core sibling too, even if
you do not intend to use SMT.

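As an illustration of both options together (a sketch only; the executable name `my_app` is a
placeholder and not part of this page), an interactive launch using SMT and the per-core memory
limit could look like:

```console
marie@login$ srun --partition=romeo --ntasks=1 --cpus-per-task=256 --hint=multithread --mem-per-cpu=1972M ./my_app
```
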
!!! note

    If you are running a job here with only ONE process (maybe multiple cores), please explicitly
    set the option `-n 1`!

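A brief sketch of such a single-process call (the executable name is a placeholder, not from this
page):

```console
marie@login$ srun --partition=romeo -n 1 -c 64 ./my_threaded_app
```
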
Be aware that software built with Intel compilers and `-x*` optimization flags will not run on those
AMD processors! That's why most older modules built with Intel toolchains are not available on
partition `romeo`.

We provide the script `ml_arch_avail` that can be used to check if a certain module is available on
`rome` architecture.

## Example, running CP2K on Rome

First, check what CP2K modules are available in general:
`module spider CP2K` or `module avail CP2K`.

You will see that there are several different CP2K versions available, built with different
toolchains. Now let's assume you have decided to run at least CP2K version 6, so to check if those
modules are built for rome, use:

```console
marie@login$ ml_arch_avail CP2K/6
CP2K/6.1-foss-2019a: haswell, rome
CP2K/6.1-foss-2019a-spglib: haswell, rome
CP2K/6.1-intel-2018a: sandy, haswell
CP2K/6.1-intel-2018a-spglib: haswell
```

There you will see that only the modules built with toolchain `foss` are available on architecture
`rome`, not the ones built with `intel`. So you can load, e.g., `ml CP2K/6.1-foss-2019a`.

Then, when writing your batch script, you have to specify the partition `romeo`. Also, if e.g. you
wanted to use an entire Rome node (no SMT) and fill it with MPI ranks, it could look like this:

```bash
#!/bin/bash

#SBATCH --partition=romeo       # the Rome nodes
#SBATCH --nodes=1               # an entire node
#SBATCH --ntasks-per-node=128   # one MPI rank per physical core (no SMT)

srun cp2k.popt input.inp
```

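If you wanted SMT and one rank on every logical core instead, a variant of this script (a sketch
combining it with the `--hint=multithread` note above, not taken verbatim from this page) might
look like:

```bash
#!/bin/bash

#SBATCH --partition=romeo
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=256   # one MPI rank per logical core
#SBATCH --hint=multithread      # SMT is disabled per default

srun cp2k.popt input.inp
```
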
## Using the Intel Toolchain on Rome

Currently, we have only newer toolchains starting at `intel/2019b` installed for the Rome nodes.
Even though they have AMD CPUs, you can still use the Intel compilers on there and they don't even
create bad-performing code. When using the Intel Math Kernel Library (MKL) up to version 2019,
though, you should set the following environment variable to make sure that AVX2 is used:

```bash
export MKL_DEBUG_CPU_TYPE=5
```

Without it, the MKL does a CPUID check and disables AVX2/FMA on non-Intel CPUs, leading to much
worse performance.

!!! note

    In version 2020, Intel has removed this environment variable and added separate Zen codepaths
    to the library. However, they are still incomplete and do not cover every BLAS function. Also,
    the Intel AVX2 codepaths still seem to provide somewhat better performance, so a new workaround
    would be to overwrite the `mkl_serv_intel_cpu_true` symbol with a custom function:

```c
// fakeintel.c: make MKL believe it is running on an Intel CPU
int mkl_serv_intel_cpu_true() {
    return 1;
}
```

Compile this into a shared library and preload it:

```console
marie@login$ gcc -shared -fPIC -o libfakeintel.so fakeintel.c
marie@login$ export LD_PRELOAD=libfakeintel.so
```

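Keep in mind that the dynamic linker resolves `LD_PRELOAD` entries without a slash via the regular
library search paths, so the preload may silently fail if the library is not found there; a hedged
suggestion (not part of the original instructions) is to reference the library by its full path:

```console
marie@login$ export LD_PRELOAD=$PWD/libfakeintel.so
```
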
As for compiler optimization flags, `-xHOST` does not seem to produce best-performing code in every
case on Rome. You might want to try `-mavx2 -fma` instead.

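For example, a compile line with these flags could look like the following (a sketch; the source
and binary names `my_app.c`/`my_app` are placeholders):

```console
marie@login$ icc -O2 -mavx2 -fma -o my_app my_app.c
```
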
### Intel MPI

We have seen only half the theoretical peak bandwidth via InfiniBand between two nodes, whereas
OpenMPI got close to the peak bandwidth, so you might want to avoid using Intel MPI on partition
`romeo` if your application heavily relies on MPI communication, until this issue is resolved.

...
Amdahl's
analytics
anonymized
APIs
AVX
BeeGFS
benchmarking
BLAS
...
Chemnitz
citable
conda
CPU
CPUID
CPUs
css
CSV
...
FFTW
filesystem
filesystems
Flink
FMA
foreach
Fortran
Gaussian
...
mpifort
mpirun
multicore
multithreaded
Multithreading
NAMD
natively
NCCL
...
PowerAI
ppc
Preload
preloaded
preloading
PSOCK
Pthreads
pymdownx
...
Theano
tmp
todo
ToDo
toolchain
toolchains
tracefile
tracefiles
transferability
...