# ZIH Systems

ZIH systems comprise the *High Performance Computing and Storage Complex* (HRSK-II) and its
extension *High Performance Computing – Data Analytics* (HPC-DA). In total, they offer scientists
about 60,000 CPU cores and a peak performance of more than 1.5 quadrillion floating point
operations per second. The architecture, specifically tailored to data-intensive computing, Big
Data analytics, and artificial intelligence methods, with extensive capabilities for energy
measurement and performance monitoring, provides ideal conditions to achieve the ambitious
research goals of the users and the ZIH.

## Login Nodes

- Login-Nodes (`tauruslogin[3-6].hrsk.tu-dresden.de`)
    - each with 2x Intel(R) Xeon(R) CPU E5-2680 v3 (12 cores) @ 2.50GHz, MultiThreading disabled,
      64 GB RAM, 128 GB SSD local disk
    - IPs: 141.30.73.\[102-105\]
- Transfer-Nodes (`taurusexport3/4.hrsk.tu-dresden.de`, DNS alias
  `taurusexport.hrsk.tu-dresden.de`)
    - 2 servers without interactive login, only available via file transfer protocols
      (`rsync`, `ftp`), as shown in the sketch below
    - IPs: 141.30.73.82/83
- Direct access to these nodes is granted via IP whitelisting (contact
  hpcsupport@zih.tu-dresden.de) - otherwise use TU Dresden VPN.
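
The following is a minimal sketch of a typical workflow with these nodes: interactive work on a
login node and data transfer through the export nodes. The ZIH login `<zihlogin>` and the target
directory `/scratch/<project>/` are placeholders, not actual accounts or paths.

```bash
# Interactive login to one of the login nodes listed above
ssh <zihlogin>@tauruslogin6.hrsk.tu-dresden.de

# Copy input data to the HPC system via the export nodes (no interactive login there);
# <zihlogin> and /scratch/<project>/ are placeholders
rsync -avP ./input_data/ <zihlogin>@taurusexport.hrsk.tu-dresden.de:/scratch/<project>/
```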

## AMD Rome CPUs + NVIDIA A100

- 32 nodes, each with
    - 8 x NVIDIA A100-SXM4
    - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, MultiThreading disabled
    - 1 TB RAM
    - 3.5 TB local storage on NVMe device at `/tmp`
- Hostnames: `taurusi[8001-8034]`
- Slurm partition `alpha` (see the sketch below)
- Dedicated mostly for ScaDS-AI
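
As a minimal sketch of how such a node could be requested, the following batch script asks for one
of the eight A100 GPUs on the `alpha` partition. The resource values and the application
`my_training_script.py` are illustrative placeholders, not site-mandated settings.

```bash
#!/bin/bash
#SBATCH --partition=alpha        # A100 partition described above
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=6        # illustrative share of the 48 cores per node
#SBATCH --gres=gpu:1             # one of the 8 A100 GPUs
#SBATCH --mem=100G               # illustrative share of the 1 TB RAM
#SBATCH --time=01:00:00

srun python my_training_script.py   # placeholder application
```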

## Island 7 - AMD Rome CPUs

- 192 nodes, each with
    - 2x AMD EPYC CPU 7702 (64 cores) @ 2.0GHz, MultiThreading enabled
    - 512 GB RAM
    - 200 GB local storage on SSD at `/tmp`
- Hostnames: `taurusi[7001-7192]`
- Slurm partition `romeo`
- More information under [RomeNodes](rome_nodes.md)

## Large SMP System HPE Superdome Flex

- 32 x Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20GHz (28 cores)
- 47 TB RAM
- Currently configured as one single node
- Hostname: `taurussmp8`
- Slurm partition `julia` (see the sketch below)
- More information under [HPE SD Flex](sd_flex.md)
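
Since the whole machine is exposed as a single node, a job on the `julia` partition mainly differs
in its memory request. The following interactive call is only a sketch; the 1 TB memory request
and the two-hour walltime are arbitrary example values.

```bash
# Interactive shell on the Superdome Flex with a large shared-memory allocation;
# adjust --mem and --time to your actual needs
srun --partition=julia --ntasks=1 --cpus-per-task=28 --mem=1T --time=02:00:00 --pty bash
```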

## IBM Power9 Nodes for Machine Learning

For machine learning, we have 32 IBM AC922 nodes installed with this configuration:

- 2 x IBM Power9 CPU (2.80 GHz, 3.10 GHz boost, 22 cores)
- 256 GB RAM DDR4 2666MHz
- 6x NVIDIA VOLTA V100 with 32GB HBM2
- NVLINK bandwidth 150 GB/s between GPUs and host
- Slurm partition `ml`
- Hostnames: `taurusml[1-32]`

## Island 4 to 6 - Intel Haswell CPUs

- 1456 nodes, each with 2x Intel(R) Xeon(R) CPU E5-2680 v3 (12 cores) @ 2.50GHz,
  MultiThreading disabled, 128 GB SSD local disk
- Hostname: `taurusi4[001-232]`, `taurusi5[001-612]`, `taurusi6[001-612]`
- Varying amounts of main memory (selected automatically by the batch system for you according
  to your job requirements, as sketched below)
    - 1328 nodes with 2.67 GB RAM per core (64 GB total): `taurusi[4001-4104,5001-5612,6001-6612]`
    - 84 nodes with 5.34 GB RAM per core (128 GB total): `taurusi[4105-4188]`
    - 44 nodes with 10.67 GB RAM per core (256 GB total): `taurusi[4189-4232]`
- Slurm partition `haswell`

??? hint "Node topology"

    ![Node topology](misc/i4000.png)
    {: align=center}
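
The memory request is what steers a job onto one of the three Haswell node classes listed above.
The following commands are only a sketch: the per-core values are rough approximations of the
2.67 GB and 5.34 GB figures (actual limits may be slightly lower), and `my_job.sh` is a
placeholder script.

```bash
# Fits on the default 64 GB nodes (about 2.67 GB RAM per core)
sbatch --partition=haswell --ntasks=24 --mem-per-cpu=2500M my_job.sh

# A larger per-core request steers the job to the 128 GB or 256 GB nodes
sbatch --partition=haswell --ntasks=24 --mem-per-cpu=5000M my_job.sh
```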

### Extension of Island 4 with Broadwell CPUs

* 32 nodes, each with 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz (**14 cores**),
  MultiThreading disabled, 64 GB RAM, 256 GB SSD local disk
* From the users' perspective: Broadwell is like Haswell
* Hostname: `taurusi[4233-4264]`
* Slurm partition `broadwell`

## SMP Nodes - up to 2 TB RAM

- 5 nodes, each with 4x Intel(R) Xeon(R) CPU E7-4850 v3 (14 cores) @ 2.20GHz,
  MultiThreading disabled, 2 TB RAM
- Hostname: `taurussmp[3-7]`
- Slurm partition `smp2`

??? hint "Node topology"

    ![Node topology](misc/smp2.png)
    {: align=center}

## Island 2 Phase 1 - Intel Sandybridge CPUs + NVIDIA K20x GPUs

- 44 nodes, each with 2x Intel(R) Xeon(R) CPU E5-2450 (8 cores) @ 2.10GHz,
  MultiThreading disabled, 48 GB RAM (3 GB per core), 128 GB SSD local disk,
  2x NVIDIA Tesla K20x (6 GB GDDR RAM) GPUs
- Hostname: `taurusi2[001-044]`
- Slurm partition `gpu1`

??? hint "Node topology"

    ![Node topology](misc/i2000.png)
    {: align=center}