node. SMT is also active, so in total, 256 logical cores are available
per node.
!!! note
    Multithreading is disabled by default in a job. To make use of it,
    include the Slurm parameter `--hint=multithread` in your job script
    or command line, or set the environment variable
    `SLURM_HINT=multithread` before job submission.
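As a minimal sketch of a job script that enables SMT (the application name is a placeholder, and the resource values are only illustrative):

```bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=256   # all logical cores of one node (SMT enabled)
#SBATCH --hint=multithread    # allow Slurm to use the SMT siblings

srun ./my_application
```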
Each node has 512 GB of main memory, so you can request roughly
1972 MB per logical core (using `--mem-per-cpu`). Note that you will
always get the memory for the logical core sibling too, even if you do
not intend to use SMT.
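For illustration, a request for two logical cores with the corresponding memory could look like this on the command line (again, the application name is a placeholder):

```console
marie@login$ srun --ntasks=1 --cpus-per-task=2 --mem-per-cpu=1972 --hint=multithread ./my_application
```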
!!! note
    If you are running a job here with only ONE process (maybe
    multiple cores), please explicitly set the option `-n 1`!
Be aware that software built with Intel compilers and `-x*` optimization
flags will not run on those AMD processors! That's why most older
modules built with Intel toolchains are not available on **romeo**.
We provide the script `ml_arch_avail` that you can use to check whether
a certain module is available on the Rome architecture.
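Assuming the script takes a module name as its argument (the module chosen here is just an example), a check could look like:

```console
marie@login$ ml_arch_avail CP2K
```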
Currently, we have only newer toolchains starting at `intel/2019b`
installed for the Rome nodes. Even though they have AMD CPUs, you can
still use the Intel compilers there, and they do not even create
badly performing code. When using the MKL up to version 2019, though,
you should set the following environment variable to make sure that AVX2
is used:
```bash
export MKL_DEBUG_CPU_TYPE=5
```
Without it, the MKL does a CPUID check and disables AVX2/FMA on
non-Intel CPUs, leading to much worse performance.

!!! note
    In version 2020, Intel removed this environment variable and added
    separate Zen codepaths to the library. However, they are still
    incomplete and do not cover every BLAS function. Also, the Intel
    AVX2 codepaths still seem to provide somewhat better performance,
    so a new workaround would be to overwrite the
    `mkl_serv_intel_cpu_true` symbol with a custom function:
```c
int mkl_serv_intel_cpu_true() {
    // Always report an Intel CPU so the MKL takes the optimized codepaths.
    return 1;
}
```
and preloading this as a shared library:
```console
marie@login$ gcc -shared -fPIC -o libfakeintel.so fakeintel.c
marie@login$ export LD_PRELOAD=libfakeintel.so
```
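Since `LD_PRELOAD` without a directory component relies on the dynamic linker's search path, it can be safer to use an absolute path. A minimal sketch, assuming the library was built in the current directory and `./my_mkl_application` is a placeholder for an MKL-linked binary:

```console
marie@login$ export LD_PRELOAD=$PWD/libfakeintel.so
marie@login$ srun -n 1 ./my_mkl_application
```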
As for compiler optimization flags, `-xHOST` does not seem to produce
best-performing code in every case on Rome. You might want to try
`-mavx2 -fma` instead.
With Intel MPI, we have seen only half the theoretical peak bandwidth
via InfiniBand between two nodes, whereas OpenMPI got close to the peak
bandwidth, so you might want to avoid using Intel MPI on **romeo** if
your application heavily relies on MPI communication until this issue
is resolved.