update

ebca616a · Ulf Markwardt · 8efb9f00 · ebca616a
Commit ebca616a authored 1 year ago by Ulf Markwardt
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md
 # HPC Resources
-The architecture specifically tailored to data-intensive computing, Big Data
+HPC resources in ZIH systems comprise the *High Performance Computing and Storage Complex* and its
-analytics, and artificial intelligence methods with extensive capabilities
+extension *High Performance Computing – Data Analytics*. In total it offers scientists
-for performance monitoring provides ideal conditions to achieve the ambitious
+about 60,000 CPU cores and a peak performance of more than 1.5 quadrillion floating point
-research goals of the users and the ZIH.
+operations per second. The architecture specifically tailored to data-intensive computing, Big Data
+analytics, and artificial intelligence methods with extensive capabilities for energy measurement
-## Overview
+and performance monitoring provides ideal conditions to achieve the ambitious research goals of the
+users and the ZIH.
-From the users' perspective, there are separate clusters, all of them with their subdomains:
+## Login and Export Nodes
-| Name | Description | Year| DNS |
-| --- | --- | --- | --- |
+- 4 Login-Nodes `tauruslogin[3-6].hrsk.tu-dresden.de`
-| **Barnard** | CPU cluster |2023| n[1001-1630].barnard.hpc.tu-dresden.de |
+    - Each login node is equipped with 2x Intel(R) Xeon(R) CPU E5-2680 v3 with 24 cores in total @
-| **Romeo** | CPU cluster |2020|i[8001-8190].romeo.hpc.tu-dresden.de |
+      2.50 GHz, Multithreading disabled, 64 GB RAM, 128 GB SSD local disk
-| **Alpha Centauri** | GPU cluster |2021|i[8001-8037].alpha.hpc.tu-dresden.de |
+    - IPs: 141.30.73.\[102-105\]
-| **Julia** | single SMP system |2021|smp8.julia.hpc.tu-dresden.de |
+- 2 Data-Transfer-Nodes `taurusexport[3-4].hrsk.tu-dresden.de`
-| **Power** | IBM Power/GPU system |2018|ml[1-29].power9.hpc.tu-dresden.de |
+    - DNS Alias `taurusexport.hrsk.tu-dresden.de`
+    - 2 Servers without interactive login, only available via file transfer protocols
-They run with their own Slurm batch system. Job submission is possible only from
+      (`rsync`, `ftp`)
-their respective login nodes.
+    - IPs: 141.30.73.\[82,83\]
+    - Further information on the usage is documented on the site
-All clusters have access to these shared parallel file systems:
+      [Export Nodes](../data_transfer/export_nodes.md)
-| File system | Usable directory | Type | Capacity | Purpose |
-| --- | --- | --- | --- | --- |
-| Home | `/home` | Lustre | quota per user: 20 GB | permanent user data |
-| Project | `/projects` | Lustre | quota per project | permanent project data |
-| Scratch for large data / streaming | `/data/horse` | Lustre | 20 PB | h
-| Scratch for random access | `/data/rabbit` | Lustre | 2 PB |
-These mount points are planned (September 2023):
-| Scratch for random access | `/data/weasel` | WEKA | 232 TB |
-| Scratch for random access | `/data/squirrel` | BeeGFS | xxx TB |
-## Barnard - Intel Sapphire Rapids CPUs
- 630 diskless nodes, each with
-    - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 (52 cores) @ 2.50 GHz, Multithreading enabled
-    - 512 GB RAM
- Hostnames: `n1[001-630].barnard.hpc.tu-dresden.de`
- Login nodes: `login[1-4].barnard.hpc.tu-dresden.de`
 ## AMD Rome CPUs + NVIDIA A100
@@ -48,8 +29,8 @@ These mount points are planned (September 2023):
    - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, Multithreading available
    - 1 TB RAM
    - 3.5 TB local memory on NVMe device at `/tmp`
- Hostnames: `taurusi[8001-8034]`  -> `i[8001-8037].alpha.hpc.tu-dresden.de`
+- Hostnames: `taurusi[8001-8034]`
- Login nodes: `login[1-2].alpha.hpc.tu-dresden.de`
+- Slurm partition: `alpha`
 - Further information on the usage is documented on the site [Alpha Centauri Nodes](alpha_centauri.md)
 ## Island 7 - AMD Rome CPUs
@@ -58,8 +39,8 @@ These mount points are planned (September 2023):
    - 2 x AMD EPYC CPU 7702 (64 cores) @ 2.0 GHz, Multithreading available
    - 512 GB RAM
    - 200 GB local memory on SSD at `/tmp`
- Hostnames: `taurusi[7001-7192]` -> `i[7001-7190].romeo.hpc.tu-dresden.de`
+- Hostnames: `taurusi[7001-7192]`
- Login nodes: `login[1-2].romeo.hpc.tu-dresden.de`
+- Slurm partition: `romeo`
 - Further information on the usage is documented on the site [AMD Rome Nodes](rome_nodes.md)
 ## Large SMP System HPE Superdome Flex
@@ -70,7 +51,8 @@ These mount points are planned (September 2023):
 - Configured as one single node
 - 48 TB RAM (usable: 47 TB - one TB is used for cache coherence protocols)
 - 370 TB of fast NVME storage available at `/nvme/<projectname>`
- Hostname: `taurussmp8` -> `smp8.julia.hpc.tu-dresden.de`
+- Hostname: `taurussmp8`
+- Slurm partition: `julia`
 - Further information on the usage is documented on the site [HPE Superdome Flex](sd_flex.md)
 ## IBM Power9 Nodes for Machine Learning
@@ -82,5 +64,46 @@ For machine learning, we have IBM AC922 nodes installed with this configuration:
    - 256 GB RAM DDR4 2666 MHz
    - 6 x NVIDIA VOLTA V100 with 32 GB HBM2
    - NVLINK bandwidth 150 GB/s between GPUs and host
- Hostnames: `taurusml[1-32]` -> `ml[1-29].power9.hpc.tu-dresden.de`
+- Hostnames: `taurusml[1-32]`
- Login nodes: `login[1-2].power9.hpc.tu-dresden.de`
+- Slurm partition: `ml`
+## Island 6 - Intel Haswell CPUs
+- 612 nodes, each with
+    - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 (12 cores) @ 2.50 GHz, Multithreading disabled
+    - 128 GB local memory on SSD
+- Varying amounts of main memory (selected automatically by the batch system for you according to
+  your job requirements)
+    - 594 nodes with 2.67 GB RAM per core (64 GB in total): `taurusi[6001-6540,6559-6612]`
+    - 18 nodes with 10.67 GB RAM per core (256 GB in total): `taurusi[6541-6558]`
+- Hostnames: `taurusi[6001-6612]`
+- Slurm partition: `haswell`
+??? hint "Node topology"
+    ![Node topology](misc/i4000.png)
+    {: align=center}
+## Island 2 Phase 2 - Intel Haswell CPUs + NVIDIA K80 GPUs
+- 64 nodes, each with
+    - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 (12 cores) @ 2.50 GHz, Multithreading disabled
+    - 64 GB RAM (2.67 GB per core)
+    - 128 GB local memory on SSD
+    - 4 x NVIDIA Tesla K80 (12 GB GDDR RAM) GPUs
+- Hostnames: `taurusi[2045-2108]`
+- Slurm Partition: `gpu2`
+- Node topology, same as [island 4 - 6](#island-6-intel-haswell-cpus)
+## SMP Nodes - up to 2 TB RAM
+- 5 Nodes, each with
+    - 4 x Intel(R) Xeon(R) CPU E7-4850 v3 (14 cores) @ 2.20 GHz, Multithreading disabled
+    - 2 TB RAM
+- Hostnames: `taurussmp[3-7]`
+- Slurm partition: `smp2`
+??? hint "Node topology"
+    ![Node topology](misc/smp2.png)
+    {: align=center}
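
The hostname entries in this diff use a compact bracket notation, for example `taurusi[6001-6540,6559-6612]`. As a quick way to check that such a range matches the stated node count, the notation can be expanded with a few lines of Python. This is only a minimal sketch: the helper name `expand_hostlist` is hypothetical and not part of the documentation or of Slurm, and it assumes at most one bracket group per pattern. On a system with Slurm installed, `scontrol show hostnames` performs the same kind of expansion.

```python
import re


def expand_hostlist(pattern: str) -> list[str]:
    """Expand one bracket group, e.g. 'taurusi[6001-6540,6559-6612]'.

    Hypothetical helper for illustration; handles a single [...] group only.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if match is None:
        return [pattern]  # no bracket group: treat as a single host name
    prefix, body, suffix = match.groups()
    hosts = []
    for part in body.split(","):
        start, _, end = part.partition("-")
        end = end or start  # a lone number such as '82' is a range of one
        width = len(start)  # preserve zero padding, if any
        for number in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{number:0{width}d}{suffix}")
    return hosts


# Island 6: the two memory configurations should add up to 612 nodes.
small = expand_hostlist("taurusi[6001-6540,6559-6612]")
large = expand_hostlist("taurusi[6541-6558]")
print(len(small), len(large), len(small) + len(large))

# Login nodes, including the domain suffix after the bracket group.
print(expand_hostlist("tauruslogin[3-6].hrsk.tu-dresden.de"))
```

The counts agree with the node numbers given for Island 6 (594 + 18 = 612), and the per-core memory figures follow from 24 cores per node: 64 GB / 24 ≈ 2.67 GB and 256 GB / 24 ≈ 10.67 GB.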