Commit 2698d527, authored 3 years ago by Martin Schroschk
Parent: 52c3794d

Review: make node topo graphics foldable

Part of 4 merge requests: !392 "Merge preview into contrib guide for browser users", !333 "Draft: update NGC containers", !327 "Merge preview into main", !317 "Jobs and resources"

Showing 1 changed file: doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md (+74, −73)

# ZIH Systems

ZIH systems comprise the *High Performance Computing and Storage Complex* (HRSK-II) and its
extension *High Performance Computing – Data Analytics* (HPC-DA). In total, they offer scientists
about 60,000 CPU cores and a peak performance of more than 1.5 quadrillion floating point
operations per second. The architecture, specifically tailored to data-intensive computing, Big
Data analytics, and artificial intelligence methods, with extensive capabilities for energy
measurement and performance monitoring, provides ideal conditions to achieve the ambitious
research goals of the users and the ZIH.

## Login Nodes

- Login-Nodes (`tauruslogin[3-6].hrsk.tu-dresden.de`)
    - each with 2x Intel(R) Xeon(R) CPU E5-2680 v3 (12 cores)
      @ 2.50GHz, MultiThreading disabled, 64 GB RAM, 128 GB SSD local disk
    - IPs: 141.30.73.[102-105]
- Transfer-Nodes (`taurusexport3/4.hrsk.tu-dresden.de`, DNS Alias
  `taurusexport.hrsk.tu-dresden.de`)
    - 2 servers without interactive login, only available via file transfer protocols
      (`rsync`, `ftp`); see the example after this list
    - IPs: 141.30.73.82/83
- Direct access to these nodes is granted via IP whitelisting (contact
  hpcsupport@zih.tu-dresden.de) - otherwise use TU Dresden VPN.

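As a hedged illustration of how these nodes are typically reached, the following assumes a valid
ZIH login; the username `marie` and both paths are placeholders, not actual locations:

```bash
# Interactive work: log in to one of the login nodes
# ("marie" is a placeholder username)
ssh marie@tauruslogin3.hrsk.tu-dresden.de

# Bulk data transfer: go through the export nodes with rsync
# (source and target paths are hypothetical examples)
rsync -avP ./input_data/ marie@taurusexport.hrsk.tu-dresden.de:/scratch/marie/input_data/
```
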
## AMD Rome CPUs + NVIDIA A100

- 32 nodes, each with
    - 8 x NVIDIA A100-SXM4
    - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, MultiThreading disabled
    - 1 TB RAM
    - 3.5 TB local storage on NVMe device at `/tmp`
- Hostnames: `taurusi[8001-8034]`
- Slurm partition `alpha` (see the example after this list)
- Dedicated mostly for ScaDS-AI

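A minimal sketch, assuming standard Slurm usage, of requesting a single A100 on this partition for
interactive work; the core, memory, and time values are illustrative choices, not site requirements:

```bash
# Interactive allocation on partition "alpha" with one GPU
# (core, memory and time values are example choices)
srun --partition=alpha --gres=gpu:1 --cpus-per-task=6 --mem=64G --time=01:00:00 --pty bash

# Inside the allocation: check the assigned A100
nvidia-smi
```
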
## Island 7 - AMD Rome CPUs

- 192 nodes, each with
    - 2x AMD EPYC CPU 7702 (64 cores) @ 2.0GHz, MultiThreading enabled
    - 512 GB RAM
    - 200 GB `/tmp` on local SSD
- Hostnames: `taurusi[7001-7192]`
- Slurm partition `romeo`
- More information under [RomeNodes](rome_nodes.md)

## Large SMP System HPE Superdome Flex

- 32 x Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20GHz (28 cores)
- 47 TB RAM
- Currently configured as one single node
- Hostname: `taurussmp8`
- Slurm partition `julia` (see the example after this list)
- More information under [HPE SD Flex](sd_flex.md)

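A minimal batch-script sketch, assuming standard Slurm usage, for a shared-memory job on this
single large node; the job name, resource values, and program call are placeholders:

```bash
#!/bin/bash
#SBATCH --partition=julia          # HPE Superdome Flex (taurussmp8)
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=28         # example: one full CPU socket
#SBATCH --mem=2000G                # example large-memory request
#SBATCH --time=08:00:00
#SBATCH --job-name=smp_sketch      # placeholder job name

# Placeholder call of a threaded, shared-memory application
./my_smp_application --threads "$SLURM_CPUS_PER_TASK"
```
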
## IBM Power9 Nodes for Machine Learning

For machine learning, we have 32 IBM AC922 nodes installed with this configuration:

- 2 x IBM Power9 CPU (2.80 GHz, 3.10 GHz boost, 22 cores)
- 256 GB RAM DDR4 2666MHz
- 6x NVIDIA VOLTA V100 with 32GB HBM2
- NVLINK bandwidth 150 GB/s between GPUs and host
- Slurm partition `ml`
- Hostnames: `taurusml[1-32]`

## Island 4 to 6 - Intel Haswell CPUs

- 1456 nodes, each with 2x Intel(R) Xeon(R) CPU E5-2680 v3 (12 cores)
  @ 2.50GHz, MultiThreading disabled, 128 GB SSD local disk
- Hostname: `taurusi4[001-232]`, `taurusi5[001-612]`, `taurusi6[001-612]`
- Varying amounts of main memory (selected automatically by the batch
  system for you according to your job requirements; see the example below)
    - 1328 nodes with 2.67 GB RAM per core (64 GB total):
      `taurusi[4001-4104,5001-5612,6001-6612]`
    - 84 nodes with 5.34 GB RAM per core (128 GB total):
      `taurusi[4105-4188]`
    - 44 nodes with 10.67 GB RAM per core (256 GB total):
      `taurusi[4189-4232]`
- Slurm partition `haswell`

??? hint "Node topology"

    (node topology graphic)

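A hedged sketch of how the automatic memory-based placement mentioned above is usually driven: the
per-core memory request decides which of the Haswell node flavors can host the job (the values and
the program name are illustrative):

```bash
# Fits the default 64 GB nodes (about 2.67 GB per core)
srun --partition=haswell --ntasks=24 --mem-per-cpu=2500M ./my_program

# Requesting ~5 GB per core steers the job onto the 128 GB nodes
# (taurusi[4105-4188]); "./my_program" is a placeholder
srun --partition=haswell --ntasks=24 --mem-per-cpu=5000M ./my_program
```
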
### Extension of Island 4 with Broadwell CPUs

* 32 nodes, each with 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
  (**14 cores**), MultiThreading disabled, 64 GB RAM, 256 GB SSD local disk
* from the users' perspective: Broadwell is like Haswell
* Hostname: `taurusi[4233-4264]`
* Slurm partition `broadwell`

...

## SMP Nodes - up to 2 TB RAM

- 5 nodes, each with 4x Intel(R) Xeon(R) CPU E7-4850 v3 (14 cores) @
  2.20GHz, MultiThreading disabled, 2 TB RAM
- Hostname: `taurussmp[3-7]`
- Slurm partition `smp2`

??? hint "Node topology"

    (node topology graphic)

## Island 2 Phase 1 - Intel Sandybridge CPUs + NVIDIA K20x GPUs

- 44 nodes, each with 2x Intel(R) Xeon(R) CPU E5-2450 (8 cores) @
  2.10GHz, MultiThreading disabled, 48 GB RAM (3 GB per core), 128 GB
  SSD local disk, 2x NVIDIA Tesla K20x (6 GB GDDR RAM) GPUs
- Hostname: `taurusi2[001-044]`
- Slurm partition `gpu1`

??? hint "Node topology"

    (node topology graphic)