Commit ea67e57b authored by Martin Schroschk

Merge branch '476-fix-remeo-smt' into 'preview'

Resolve "Romeo SMT enabled by default?"

Closes #476

See merge request !821
parents 1050b7a4 165b164c
2 merge requests: !835 Automated merge from preview to main, !821 Resolve "Romeo SMT enabled by default?"
@@ -12,6 +12,14 @@ The hardware specification is documented on the page
The NVIDIA A100 GPUs may only be used with **CUDA 11** or later. Earlier versions do not
recognize the new hardware properly. Make sure the software you are using is built with CUDA 11.
Each node has 48 physical cores. SMT is also active, so 96 logical cores are available per
node.
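The core counts above can be checked with a little shell arithmetic. This is a minimal sketch, assuming the node layout given in the hardware overview (2 sockets with 24 cores each) and two hardware threads per physical core when SMT is used; the variable names are illustrative only.

```bash
#!/usr/bin/env bash
# Assumed layout: 2 sockets x 24 cores, 2 hardware threads per core with SMT.
SOCKETS=2
CORES_PER_SOCKET=24
THREADS_PER_CORE=2

# Physical cores are counted per node; logical cores include the SMT threads.
PHYSICAL_CORES=$(( SOCKETS * CORES_PER_SOCKET ))
LOGICAL_CORES=$(( PHYSICAL_CORES * THREADS_PER_CORE ))

echo "physical cores per node: ${PHYSICAL_CORES}"
echo "logical cores per node:  ${LOGICAL_CORES}"
```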
!!! note
Multithreading is disabled by default in a job.
See the [Slurm page](slurm.md) for how to enable it.
### Modules
The easiest way is using the [module system](../software/modules.md).
......
@@ -26,7 +26,7 @@ users and the ZIH.
  - 34 nodes, each with
  - 8 x NVIDIA A100-SXM4 Tensor Core-GPUs
- - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, Multithreading disabled
+ - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, Multithreading available
  - 1 TB RAM
  - 3.5 TB local memory on NVMe device at `/tmp`
  - Hostnames: `taurusi[8001-8034]`
@@ -36,7 +36,7 @@ users and the ZIH.
  ## Island 7 - AMD Rome CPUs
  - 192 nodes, each with
- - 2 x AMD EPYC CPU 7702 (64 cores) @ 2.0 GHz, Multithreading enabled,
+ - 2 x AMD EPYC CPU 7702 (64 cores) @ 2.0 GHz, Multithreading available
  - 512 GB RAM
  - 200 GB local memory on SSD at `/tmp`
  - Hostnames: `taurusi[7001-7192]`
......
@@ -316,6 +316,32 @@ provide a comprehensive collection of job examples.
* Submission: `marie@login$ sbatch batch_script.sh`
* Run with fewer MPI tasks: `marie@login$ sbatch --ntasks=14 batch_script.sh`
## Using Simultaneous Multithreading (SMT)
Most modern architectures offer simultaneous multithreading (SMT), where the physical cores of a
CPU are split into virtual cores (also known as threads). This technique allows two instruction
streams to run in parallel on each physical core.
On ZIH systems, SMT is available on the partitions `rome` and `alpha`. It is deactivated by
default because the environment variable `SLURM_HINT` is set to `nomultithread`.
If you wish to make use of the SMT cores, you need to activate it explicitly.
In principle, there are two ways:
1. Change the value of the environment variable via `export SLURM_HINT=multithread` in your current
shell, then submit your job file or invoke your `srun` or `salloc` command line.
1. Clear the environment variable via `unset SLURM_HINT` and pass the option `--hint=multithread`
on the `sbatch`, `srun`, or `salloc` command line.
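The two ways listed above can be sketched as a short shell session. This is a dry run under stated assumptions: `job.sh` is a hypothetical job file name, and the script only prints the `sbatch` commands instead of submitting anything, so it can be tried outside of a Slurm environment.

```bash
#!/usr/bin/env bash
# job.sh is a hypothetical job file name used for illustration.
JOB_FILE="job.sh"

# Way 1: switch the hint in the current shell, then submit as usual.
export SLURM_HINT=multithread
echo "way 1: SLURM_HINT=${SLURM_HINT}; would run: sbatch ${JOB_FILE}"

# Way 2: clear the variable and pass the hint on the command line instead.
unset SLURM_HINT
echo "way 2: SLURM_HINT=${SLURM_HINT:-<unset>}; would run: sbatch --hint=multithread ${JOB_FILE}"
```

Way 1 affects every later submission from the same shell, while way 2 keeps the choice visible on each command line.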
??? warning
If you want to activate SMT via the directive
```
#SBATCH --hint=multithread
```
within your job file, you also have to clear the environment variable `SLURM_HINT` before
submitting the job file. Otherwise, the environment variable `SLURM_HINT` takes precedence.
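The directive-based variant might look like the following sketch. The file name `smt_job.sh` and the job body are assumptions made for illustration; `SLURM_CPUS_ON_NODE` is set by Slurm inside a running job, and the submission is skipped when `sbatch` is not installed.

```bash
#!/usr/bin/env bash
# Write a minimal job file (smt_job.sh is a hypothetical name).
cat > smt_job.sh <<'EOF'
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --hint=multithread
# Report the number of CPUs Slurm made available on the node.
echo "CPUs on node: ${SLURM_CPUS_ON_NODE:-unknown}"
EOF

# Clear the default so that the directive in the job file is not overridden.
unset SLURM_HINT

# Submit only where Slurm is available.
if command -v sbatch >/dev/null 2>&1; then
    sbatch smt_job.sh
else
    echo "sbatch not found; job file written to smt_job.sh"
fi
```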
## Heterogeneous Jobs
A heterogeneous job consists of several job components, all of which can have individual job
......