Commit 7cefe2eb, authored 3 years ago by Taras Lazariv

Minor style changes

parent cce78318
3 merge requests: !322 Merge preview into main, !319 Merge preview into main, !239 vorschlag für HW-Steckbrief ("proposal for a hardware spec sheet")
Showing 1 changed file: doc.zih.tu-dresden.de/docs/jobs_and_resources/rome_nodes.md with 19 additions and 20 deletions
@@ -16,25 +16,23 @@ node. SMT is also active, so in total, 256 logical cores are available
per node.

!!! note
    Multithreading is disabled by default in a job. To make use of it,
    include the Slurm parameter `--hint=multithread` in your job script
    or command line, or set the environment variable `SLURM_HINT=multithread`
    before job submission.
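A minimal sketch of how this could look on the command line; the application name and core count are placeholders, not taken from the original page:

```console
marie@login$ srun --hint=multithread --ntasks=1 --cpus-per-task=256 ./my_threaded_app
```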
Each node brings 512 GB of main memory, so you can request roughly
1972 MB per logical core (using `--mem-per-cpu`). Note that you will always
get the memory for the logical core sibling too, even if you do not
intend to use SMT.
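As a sanity check on that figure: 512 GB per node divided by 256 logical cores would be 2048 MB each, so the quoted 1972 MB presumably leaves the remainder (roughly 19 GB per node) for the operating system. A request could look like this (the task count and binary name are illustrative only):

```console
marie@login$ srun --ntasks=4 --mem-per-cpu=1972 ./my_app
```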
!!! note
    If you are running a job here with only ONE process (maybe
    multiple cores), please explicitly set the option `-n 1`!
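For instance, a single multithreaded process on 16 cores might be requested as follows (the binary name is hypothetical):

```console
marie@login$ srun -n 1 --cpus-per-task=16 ./my_openmp_app
```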
Be aware that software built with Intel compilers and `-x*` optimization
flags will not run on those AMD processors! That's why most older
modules built with intel toolchains are not available on **romeo**.
We provide the script `ml_arch_avail` that you can use to check whether a
certain module is available on the Rome architecture.
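A sketch of such a check; the calling convention of `ml_arch_avail` is an assumption here, since this commit does not show the script's interface, and the module name is only an example:

```console
marie@login$ ml_arch_avail CP2K
```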
@@ -80,7 +78,7 @@ srun cp2k.popt input.inp

Currently, we have only newer toolchains starting at `intel/2019b`
installed for the Rome nodes. Even though they have AMD CPUs, you can
still use the Intel compilers there, and they don't even create
badly performing code. When using the MKL up to version 2019, though,
you should set the following environment variable to make sure that AVX2
is used:
@@ -89,12 +87,13 @@ export MKL_DEBUG_CPU_TYPE=5

Without it, the MKL does a CPUID check and disables AVX2/FMA on
non-Intel CPUs, leading to much worse performance.

!!! note
    In version 2020, Intel has removed this environment variable and added
    separate Zen codepaths to the library. However, they are still incomplete
    and do not cover every BLAS function. Also, the Intel AVX2 codepaths still
    seem to provide somewhat better performance, so a new workaround would be
    to overwrite the `mkl_serv_intel_cpu_true` symbol with a custom function:

```c
int mkl_serv_intel_cpu_true() {
```
@@ -105,8 +104,8 @@ int mkl_serv_intel_cpu_true() {

and preloading this in a library:

```console
marie@login$ gcc -shared -fPIC -o libfakeintel.so fakeintel.c
marie@login$ export LD_PRELOAD=libfakeintel.so
```
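The diff collapses the body of the custom function, but given its name and purpose, a minimal sketch of `fakeintel.c` would presumably just report an Intel CPU unconditionally (the `return 1` is an assumption, not shown in this commit):

```c
/* fakeintel.c: override MKL's internal CPU vendor check.
 * Returning 1 ("true") makes MKL take its Intel codepaths,
 * keeping AVX2/FMA enabled on the AMD Rome nodes. */
int mkl_serv_intel_cpu_true() {
    return 1;
}
```

Note also that the dynamic linker resolves an `LD_PRELOAD` entry without a slash via the normal library search path, so `export LD_PRELOAD=./libfakeintel.so` (or an absolute path) may be needed when the library sits in the current directory.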
As for compiler optimization flags, `-xHOST` does not seem to produce
@@ -118,4 +117,4 @@ best-performing code in every case on Rome. You might want to try

With Intel MPI, we have seen only half the theoretical peak bandwidth via
InfiniBand between two nodes, whereas OpenMPI got close to the peak
bandwidth, so you might want to avoid using Intel MPI on romeo if your
application heavily relies on MPI communication until this issue is resolved.