Commit ebca616a
Authored 1 year ago by Ulf Markwardt

update

Parent: 8efb9f00
Merge requests: !850 (Automated merge from preview to main), !845 (Barnard)

Showing 1 changed file with 70 additions and 47 deletions:
doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md
 # HPC Resources

-HPC resources in ZIH systems comprise the *High Performance Computing and Storage Complex* and its
-extension *High Performance Computing – Data Analytics*. In total it offers scientists
-about 60,000 CPU cores and a peak performance of more than 1.5 quadrillion floating point
-operations per second. The architecture specifically tailored to data-intensive computing, Big Data
-analytics, and artificial intelligence methods with extensive capabilities for energy measurement
-and performance monitoring provides ideal conditions to achieve the ambitious research goals of the
-users and the ZIH.
+The architecture specifically tailored to data-intensive computing, Big Data
+analytics, and artificial intelligence methods with extensive capabilities
+for performance monitoring provides ideal conditions to achieve the ambitious
+research goals of the users and the ZIH.

-## Login and Export Nodes
+## Overview

-- 4 Login-Nodes `tauruslogin[3-6].hrsk.tu-dresden.de`
-    - Each login node is equipped with 2x Intel(R) Xeon(R) CPU E5-2680 v3 with 24 cores in total @
-      2.50 GHz, Multithreading disabled, 64 GB RAM, 128 GB SSD local disk
-    - IPs: 141.30.73.\[102-105\]
-- 2 Data-Transfer-Nodes `taurusexport[3-4].hrsk.tu-dresden.de`
-    - DNS Alias `taurusexport.hrsk.tu-dresden.de`
-    - 2 Servers without interactive login, only available via file transfer protocols
-      (`rsync`, `ftp`)
-    - IPs: 141.30.73.\[82,83\]
-    - Further information on the usage is documented on the site
-      [Export Nodes](../data_transfer/export_nodes.md)
+From the users' perspective, there are separate clusters, all of them with their subdomains:
+
+| Name | Description | Year | DNS |
+| --- | --- | --- | --- |
+| **Barnard** | CPU cluster | 2023 | n[1001-1630].barnard.hpc.tu-dresden.de |
+| **Romeo** | CPU cluster | 2020 | i[8001-8190].romeo.hpc.tu-dresden.de |
+| **Alpha Centauri** | GPU cluster | 2021 | i[8001-8037].alpha.hpc.tu-dresden.de |
+| **Julia** | single SMP system | 2021 | smp8.julia.hpc.tu-dresden.de |
+| **Power** | IBM Power/GPU system | 2018 | ml[1-29].power9.hpc.tu-dresden.de |
+
+They run with their own Slurm batch system. Job submission is possible only from
+their respective login nodes.
+
+All clusters have access to these shared parallel file systems:
+
+| File system | Usable directory | Type | Capacity | Purpose |
+| --- | --- | --- | --- | --- |
+| Home | `/home` | Lustre | quota per user: 20 GB | permanent user data |
+| Project | `/projects` | Lustre | quota per project | permanent project data |
+| Scratch for large data / streaming | `/data/horse` | Lustre | 20 PB | h
+| Scratch for random access | `/data/rabbit` | Lustre | 2 PB |
+
+These mount points are planned (September 2023):
+
+| Scratch for random access | `/data/weasel` | WEKA | 232 TB |
+| Scratch for random access | `/data/squirrel` | BeeGFS | xxx TB |
+
+## Barnard - Intel Sapphire Rapids CPUs
+
+- 630 diskless nodes, each with
+    - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 (52 cores) @ 2.50 GHz, Multithreading enabled
+    - 512 GB RAM
+- Hostnames: `n1[001-630].barnard.hpc.tu-dresden.de`
+- Login nodes: `login[1-4].barnard.hpc.tu-dresden.de`
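
The hostname entries in this file use a bracketed range notation such as `n1[001-630].barnard.hpc.tu-dresden.de`. As a minimal sketch of how such a pattern can be read, the Python helper below expands a single bracket group into individual names. The function name `expand_hostlist` is hypothetical and not part of the documentation; on a cluster, Slurm's own `scontrol show hostnames` does the same job.

```python
import re


def expand_hostlist(pattern: str) -> list[str]:
    """Expand one bracketed group, e.g. 'n[08-10].x' -> n08.x, n09.x, n10.x."""
    match = re.search(r"\[([^\]]+)\]", pattern)
    if match is None:
        # No bracket group: the pattern is already a single hostname.
        return [pattern]
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    hosts = []
    for part in match.group(1).split(","):
        if "-" in part:
            first, last = part.split("-")
            width = len(first)  # preserve leading zeros such as 001
            for number in range(int(first), int(last) + 1):
                hosts.append(f"{prefix}{number:0{width}d}{suffix}")
        else:
            hosts.append(f"{prefix}{part}{suffix}")
    return hosts


barnard = expand_hostlist("n1[001-630].barnard.hpc.tu-dresden.de")
print(len(barnard), barnard[0], barnard[-1])
```

The sketch handles only one bracket group per pattern, which is enough for every hostname range listed in this file.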
 ## AMD Rome CPUs + NVIDIA A100
...
@@ -48,8 +29,8 @@ These mount points are planned (September 2023):
 - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, Multithreading available
 - 1 TB RAM
 - 3.5 TB local memory on NVMe device at `/tmp`
-- Hostnames: `taurusi[8001-8034]`
-- Slurm partition: `alpha`
+- Hostnames: `taurusi[8001-8034]` -> `i[8001-8037].alpha.hpc.tu-dresden.de`
+- Login nodes: `login[1-2].alpha.hpc.tu-dresden.de`
 - Further information on the usage is documented on the site [Alpha Centauri Nodes](alpha_centauri.md)
 ## Island 7 - AMD Rome CPUs
...
@@ -58,8 +39,8 @@ These mount points are planned (September 2023):
 - 2 x AMD EPYC CPU 7702 (64 cores) @ 2.0 GHz, Multithreading available
 - 512 GB RAM
 - 200 GB local memory on SSD at `/tmp`
-- Hostnames: `taurusi[7001-7192]`
-- Slurm partition: `romeo`
+- Hostnames: `taurusi[7001-7192]` -> `i[7001-7190].romeo.hpc.tu-dresden.de`
+- Login nodes: `login[1-2].romeo.hpc.tu-dresden.de`
 - Further information on the usage is documented on the site [AMD Rome Nodes](rome_nodes.md)
 ## Large SMP System HPE Superdome Flex
...
@@ -70,7 +51,8 @@ These mount points are planned (September 2023):
 - Configured as one single node
 - 48 TB RAM (usable: 47 TB - one TB is used for cache coherence protocols)
 - 370 TB of fast NVME storage available at `/nvme/<projectname>`
-- Hostname: `taurussmp8`
+- Hostname: `taurussmp8` -> `smp8.julia.hpc.tu-dresden.de`
+- Slurm partition: `julia`
 - Further information on the usage is documented on the site [HPE Superdome Flex](sd_flex.md)
 ## IBM Power9 Nodes for Machine Learning
...
@@ -82,5 +64,46 @@ For machine learning, we have IBM AC922 nodes installed with this configuration:
 - 256 GB RAM DDR4 2666 MHz
 - 6 x NVIDIA VOLTA V100 with 32 GB HBM2
 - NVLINK bandwidth 150 GB/s between GPUs and host
-- Hostnames: `taurusml[1-32]`
-- Slurm partition: `ml`
+- Hostnames: `taurusml[1-32]` -> `ml[1-29].power9.hpc.tu-dresden.de`
+- Login nodes: `login[1-2].power9.hpc.tu-dresden.de`
+
+## Island 6 - Intel Haswell CPUs
+
+- 612 nodes, each with
+    - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 (12 cores) @ 2.50 GHz, Multithreading disabled
+    - 128 GB local memory on SSD
+- Varying amounts of main memory (selected automatically by the batch system for you according to
+  your job requirements)
+    - 594 nodes with 2.67 GB RAM per core (64 GB in total): `taurusi[6001-6540,6559-6612]`
+    - 18 nodes with 10.67 GB RAM per core (256 GB in total): `taurusi[6541-6558]`
+- Hostnames: `taurusi[6001-6612]`
+- Slurm Partition: `haswell`
+
+??? hint "Node topology"
+
+    {: align=center}
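
The per-core memory figures in the Island 6 list follow from dividing each node's total RAM by its 24 cores (two sockets with 12 cores each). The short Python sketch below reproduces that arithmetic and the node count; the function name `ram_per_core` is hypothetical and used only for illustration.

```python
def ram_per_core(total_gb: float, sockets: int, cores_per_socket: int) -> float:
    """Return the RAM available per core in GB for one node."""
    return total_gb / (sockets * cores_per_socket)


# Island 6 nodes: 2 sockets x 12 cores, with either 64 GB or 256 GB of RAM.
for total_gb in (64, 256):
    print(f"{total_gb} GB total: {ram_per_core(total_gb, 2, 12):.2f} GB per core")

# The two memory classes together make up the full island.
total_nodes = 594 + 18
print(total_nodes)
```

The same division gives the 2.67 GB per core quoted for the 64 GB nodes of Island 2 Phase 2.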
+
+## Island 2 Phase 2 - Intel Haswell CPUs + NVIDIA K80 GPUs
+
+- 64 nodes, each with
+    - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 (12 cores) @ 2.50 GHz, Multithreading disabled
+    - 64 GB RAM (2.67 GB per core)
+    - 128 GB local memory on SSD
+    - 4 x NVIDIA Tesla K80 (12 GB GDDR RAM) GPUs
+- Hostnames: `taurusi[2045-2108]`
+- Slurm Partition: `gpu2`
+- Node topology, same as [island 4 - 6](#island-6-intel-haswell-cpus)
+
+## SMP Nodes - up to 2 TB RAM
+
+- 5 Nodes, each with
+    - 4 x Intel(R) Xeon(R) CPU E7-4850 v3 (14 cores) @ 2.20 GHz, Multithreading disabled
+    - 2 TB RAM
+- Hostnames: `taurussmp[3-7]`
+- Slurm partition: `smp2`
+
+??? hint "Node topology"
+
+    {: align=center}
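
The new overview states that each cluster runs its own Slurm instance and accepts jobs only from its own login nodes. As a rough illustration, the Python sketch below maps a compute node's hostname to the login node pattern this commit lists for its cluster. The lookup table only restates the values added in the diff; the function name `login_nodes_for` and the idea of deriving the cluster from the second DNS label are assumptions made for this example.

```python
from typing import Optional

# Login node patterns as listed in this commit (Julia has none listed).
LOGIN_NODES = {
    "barnard": "login[1-4].barnard.hpc.tu-dresden.de",
    "alpha": "login[1-2].alpha.hpc.tu-dresden.de",
    "romeo": "login[1-2].romeo.hpc.tu-dresden.de",
    "power9": "login[1-2].power9.hpc.tu-dresden.de",
}


def login_nodes_for(hostname: str) -> Optional[str]:
    """Return the login node pattern for a fully qualified compute hostname."""
    labels = hostname.split(".")
    if len(labels) < 2:
        # Short names such as 'taurusi6001' carry no cluster subdomain.
        return None
    return LOGIN_NODES.get(labels[1])


print(login_nodes_for("n1001.barnard.hpc.tu-dresden.de"))
print(login_nodes_for("smp8.julia.hpc.tu-dresden.de"))
```

Hostnames without a cluster subdomain, and clusters for which the diff lists no login nodes, yield `None` rather than a guess.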