Commit c5713b05 in ZIH / hpcsupport / hpc-compendium
Authored 3 years ago by Martin Schroschk
Brief review w.r.t. markdown
Parent: c696d145
No related branches or tags found.
Part of 4 merge requests: !392 "Merge preview into contrib guide for browser users", !333 "Draft: update NGC containers", !327 "Merge preview into main", !317 "Jobs and resources"
Showing 2 changed files with 54 additions and 54 deletions:

- doc.zih.tu-dresden.de/docs/jobs_and_resources/rome_nodes.md: 47 additions, 54 deletions
- doc.zih.tu-dresden.de/wordlist.aspell: 7 additions, 0 deletions
doc.zih.tu-dresden.de/docs/jobs_and_resources/rome_nodes.md (+47, −54)
@@ -2,50 +2,48 @@

## Hardware

- Slurm partition: `romeo`
- Module architecture: `rome`
- 192 nodes `taurusi[7001-7192]`, each:
    - 2x AMD EPYC CPU 7702 (64 cores) @ 2.0GHz, Simultaneous Multithreading (SMT)
    - 512 GB RAM
    - 200 GB SSD disk mounted on `/tmp`
## Usage

There is a total of 128 physical cores in each node. SMT is also active, so in total, 256 logical
cores are available per node.

!!! note
    Multithreading is disabled per default in a job. To make use of it include the Slurm parameter
    `--hint=multithread` in your job script or command line, or set the environment variable
    `SLURM_HINT=multithread` before job submission.

Each node brings 512 GB of main memory, so you can request roughly 1972 MB per logical core (using
`--mem-per-cpu`). Note that you will always get the memory for the logical core sibling too, even if
you do not intend to use SMT.
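Purely as an illustration of these options (the time limit, task counts, and application name below are placeholders, not part of the diff), a job that enables SMT and requests the per-logical-core memory could look like this:

```bash
#!/bin/bash
#SBATCH --partition=romeo
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=256     # all 256 logical cores of one node
#SBATCH --hint=multithread      # enable SMT for this job
#SBATCH --mem-per-cpu=1972      # roughly 512 GB / 256 logical cores, in MB
#SBATCH --time=01:00:00         # placeholder time limit

srun ./my_threaded_app          # placeholder application
```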
!!! note
    If you are running a job here with only ONE process (maybe multiple cores), please explicitly
    set the option `-n 1`!
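For instance (a sketch; the core count is a placeholder, not from the diff), such a single-process request could look like:

```bash
#SBATCH -n 1      # exactly one process/task
#SBATCH -c 16     # placeholder: number of cores for that single process
```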
Be aware that software built with Intel compilers and `-x*` optimization flags will not run on those
AMD processors! That's why most older modules built with Intel toolchains are not available on
partition `romeo`.
We provide the script `ml_arch_avail` that can be used to check if a certain module is available on
`rome` architecture.
## Example, running CP2K on Rome

First, check what CP2K modules are available in general:
`module spider CP2K` or `module avail CP2K`.

You will see that there are several different CP2K versions available, built with different
toolchains. Now let's assume you have decided you want to run at least CP2K version 6, so to check
if those modules are built for rome, use:

```console
marie@login$ ml_arch_avail CP2K/6
@@ -55,13 +53,11 @@ CP2K/6.1-intel-2018a: sandy, haswell
CP2K/6.1-intel-2018a-spglib: haswell
```

There you will see that only the modules built with toolchain `foss` are available on architecture
`rome`, not the ones built with `intel`. So you can load, e.g. `ml CP2K/6.1-foss-2019a`.
Then, when writing your batch script, you have to specify the partition `romeo`. Also, if e.g. you
wanted to use an entire ROME node (no SMT) and fill it with MPI ranks, it could look like this:

```bash
#!/bin/bash
@@ -73,27 +69,26 @@ and fill it with MPI ranks, it could look like this:
srun cp2k.popt input.inp
```
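The hunk omits the `#SBATCH` lines in between. Purely as an illustration (the resource requests and time limit below are assumptions, not taken from the diff), a full-node, no-SMT MPI job on partition `romeo` might look like:

```bash
#!/bin/bash
#SBATCH --partition=romeo
#SBATCH --nodes=1
#SBATCH --ntasks=128            # one MPI rank per physical core
#SBATCH --cpus-per-task=1
#SBATCH --hint=nomultithread    # do not place ranks on the SMT siblings
#SBATCH --time=08:00:00         # placeholder time limit

module load CP2K/6.1-foss-2019a

srun cp2k.popt input.inp
```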
## Using the Intel Toolchain on Rome

Currently, we have only newer toolchains starting at `intel/2019b` installed for the Rome nodes.
Even though they have AMD CPUs, you can still use the Intel compilers on there and they don't even
create bad-performing code. When using the Intel Math Kernel Library (MKL) up to version 2019,
though, you should set the following environment variable to make sure that AVX2 is used:
```bash
export MKL_DEBUG_CPU_TYPE=5
```
Without it, the MKL does a CPUID check and disables AVX2/FMA on non-Intel CPUs, leading to much
worse performance.
!!! note
    In version 2020, Intel has removed this environment variable and added separate Zen codepaths
    to the library. However, they are still incomplete and do not cover every BLAS function. Also,
    the Intel AVX2 codepaths still seem to provide somewhat better performance, so a new workaround
    would be to overwrite the `mkl_serv_intel_cpu_true` symbol with a custom function:
```c
int mkl_serv_intel_cpu_true() {
@@ -108,13 +103,11 @@ marie@login$ gcc -shared -fPIC -o libfakeintel.so fakeintel.c
marie@login$ export LD_PRELOAD=libfakeintel.so
```
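For reference, a sketch of what `fakeintel.c` could contain: only the function signature appears in the hunk above; the body returning 1 is an assumption based on the commonly used workaround.

```c
/* fakeintel.c -- sketch; the diff only shows the signature of this function.
   Returning 1 makes MKL believe it runs on an Intel CPU, so the AVX2 code
   paths are used (assumption, not shown in the hunk). */
int mkl_serv_intel_cpu_true() {
    return 1;
}
```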
As for compiler optimization flags, `-xHOST` does not seem to produce best-performing code in every
case on Rome. You might want to try `-mavx2 -fma` instead.
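For illustration only (the compiler invocation and file names are assumptions, not from the diff), a compile line with these flags could look like:

```console
marie@login$ icc -O2 -mavx2 -fma -o myprog myprog.c
```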
### Intel MPI

We have seen only half the theoretical peak bandwidth via Infiniband between two nodes, whereas
OpenMPI got close to the peak bandwidth, so you might want to avoid using Intel MPI on partition
`romeo` if your application heavily relies on MPI communication until this issue is resolved.
doc.zih.tu-dresden.de/wordlist.aspell (+7, −0)
@@ -6,6 +6,7 @@ Amdahl's
analytics
anonymized
APIs
AVX
BeeGFS
benchmarking
BLAS

@@ -22,6 +23,7 @@ Chemnitz
citable
conda
CPU
CPUID
CPUs
css
CSV

@@ -56,6 +58,7 @@ FFTW
filesystem
filesystems
Flink
FMA
foreach
Fortran
Gaussian

@@ -130,6 +133,7 @@ mpifort
mpirun
multicore
multithreaded
Multithreading
NAMD
natively
NCCL

@@ -175,6 +179,7 @@ PowerAI
ppc
Preload
preloaded
preloading
PSOCK
Pthreads
pymdownx

@@ -236,6 +241,8 @@ Theano
tmp
todo
ToDo
toolchain
toolchains
tracefile
tracefiles
transferability