Commit 1ea3278e authored by Natalie Breidenbach, committed by Sebastian Döbel

Update 7 files

- /doc.zih.tu-dresden.de/docs/jobs_and_resources/barnard.md
- /doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md
- /doc.zih.tu-dresden.de/docs/jobs_and_resources/julia.md
- /doc.zih.tu-dresden.de/docs/jobs_and_resources/power9.md
- /doc.zih.tu-dresden.de/docs/jobs_and_resources/romeo.md
- /doc.zih.tu-dresden.de/docs/jobs_and_resources/capella.md
- /doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md
parent 22d5bb38
2 merge requests: !1164 Automated merge from preview to main, !1152 Add page for Capella
# GPU Cluster Alpha Centauri
## Overview
The multi-GPU cluster `Alpha Centauri` has been installed for AI-related computations (ScaDS.AI).
The hardware specification is documented on the page
[HPC Resources](hardware_overview.md#alpha-centauri).
## Details
- 34 nodes, each with
    - 8 x NVIDIA A100-SXM4 Tensor Core-GPUs
    - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, Multithreading available
    - 1 TB RAM (16 x 32 GB DDR4-2933 MT/s per socket)
    - 3.5 TB local storage on NVMe device at `/tmp`
- Login nodes: `login[1-2].alpha.hpc.tu-dresden.de`
- Hostnames: `i[8001-8037].alpha.hpc.tu-dresden.de`
- Operating system: Rocky Linux 8.9
- Further information on the usage is documented below
## Filesystems
......
@@ -6,10 +6,19 @@ In 2023, Taurus was replaced by the cluster Barnard. Barnard is a general purpos
and is based on Intel Sapphire Rapids CPUs.
The cluster consists of 630 nodes; see the [hardware specifications](hardware_overview.md#barnard) for details.
## Usage
Barnard has four [login nodes](hardware_overview.md#barnard).
The [filesystems](../data_lifecycle/file_systems.md)
(`/home`, `/software`, `/data/horse`, `/data/walrus`, etc.) are available.
## Details
- 630 nodes, each with
    - 2 x Intel Xeon Platinum 8470 (52 cores) @ 2.00 GHz, Multithreading enabled
    - 512 GB RAM (8 x 32 GB DDR5-4800 MT/s per socket)
- 12 nodes provide 1.8 TB local storage on NVMe device at `/tmp`
- All other nodes are diskless and have no or very limited local storage (i.e. `/tmp`)
- Login nodes: `login[1-4].barnard.hpc.tu-dresden.de`
- Hostnames: `n[1001-1630].barnard.hpc.tu-dresden.de`
- Operating system: Red Hat Enterprise Linux 8.9
The [filesystems](../data_lifecycle/file_systems.md) are available on `barnard`
(`/home`, `/software`, `/data/horse`, `/data/walrus`, etc.).
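Available software modules can be searched with `module spider`, for example: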
```console
marie@login.barnard$ module spider <module_name>
......
# GPU Cluster Capella
## Overview
The multi-GPU cluster `Capella` has been installed for AI-related computations and traditional
HPC simulations.
The hardware specification is documented on the page
[HPC Resources](hardware_overview.md#capella).
## Details
Capella has two [login nodes](hardware_overview.md#capella).
- 144 nodes, each with
    - 4 x NVIDIA H100-SXM5 Tensor Core-GPUs
    - 2 x AMD EPYC CPU 9334 (32 cores) @ 2.7 GHz, Multithreading disabled
    - 768 GB RAM (12 x 32 GB DDR5-4800 MT/s per socket)
    - 800 GB local storage on NVMe device at `/tmp`
- Login nodes: `login[1-2].capella.hpc.tu-dresden.de`
- Hostnames: `c[1-144].capella.hpc.tu-dresden.de`
- Operating system: Alma Linux 9.4
## Filesystems
......
@@ -8,20 +8,6 @@ analytics, and artificial intelligence methods with extensive capabilities for e
and performance monitoring provides ideal conditions to achieve the ambitious research goals of the
users and the ZIH.
HPC resources at ZIH comprise a total of **six systems**:
| Name | Description | Year of Installation | DNS |
| ----------------------------------- | ----------------------| -------------------- | --- |
| [`Capella`](#capella) | GPU cluster | 2024 | `c[1-144].capella.hpc.tu-dresden.de` |
| [`Barnard`](#barnard) | CPU cluster | 2023 | `n[1001-1630].barnard.hpc.tu-dresden.de` |
| [`Alpha Centauri`](#alpha-centauri) | GPU cluster | 2021 | `i[8001-8037].alpha.hpc.tu-dresden.de` |
| [`Julia`](#julia) | Single SMP system | 2021 | `julia.hpc.tu-dresden.de` |
| [`Romeo`](#romeo) | CPU cluster | 2020 | `i[8001-8190].romeo.hpc.tu-dresden.de` |
| [`Power9`](#power9) | IBM Power/GPU cluster | 2018 | `ml[1-29].power9.hpc.tu-dresden.de` |
All clusters will run with their own [Slurm batch system](slurm.md) and job submission is possible
only from their respective login nodes.
## Architectural Design
Over the last decade we have been running our HPC system of high heterogeneity with a single
@@ -38,10 +24,25 @@ permanent filesystems on the page [Filesystems](../data_lifecycle/file_systems.m
![Architecture overview 2024](../jobs_and_resources/misc/architecture_2024.png)
{: align=center}
HPC resources at ZIH comprise a total of **six systems**:
| Name | Description | Year of Installation | DNS |
| ----------------------------------- | ----------------------| -------------------- | --- |
| [`Capella`](capella.md) | GPU cluster | 2024 | `c[1-144].capella.hpc.tu-dresden.de` |
| [`Barnard`](barnard.md) | CPU cluster | 2023 | `n[1001-1630].barnard.hpc.tu-dresden.de` |
| [`Alpha Centauri`](alpha_centauri.md) | GPU cluster | 2021 | `i[8001-8037].alpha.hpc.tu-dresden.de` |
| [`Julia`](julia.md) | Single SMP system | 2021 | `julia.hpc.tu-dresden.de` |
| [`Romeo`](romeo.md) | CPU cluster | 2020 | `i[7001-7190].romeo.hpc.tu-dresden.de` |
| [`Power9`](power9.md) | IBM Power/GPU cluster | 2018 | `ml[1-29].power9.hpc.tu-dresden.de` |
All clusters run their own [Slurm batch system](slurm.md), and job submission is possible
only from their respective login nodes.
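For example, an interactive job on `Barnard` is requested directly from one of its login nodes
(a minimal sketch; the resource and time values are placeholders, adjust them to your needs):

```console
marie@login.barnard$ srun --nodes=1 --ntasks=1 --cpus-per-task=4 --time=01:00:00 --pty bash
```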
## Login and Dataport Nodes
- Login-Nodes
- Individual for each cluster. See sections below.
- Individual for each cluster. See the specifics in each cluster chapter.
- 2 Data-Transfer-Nodes
- 2 servers without interactive login, only available via file transfer protocols
(`rsync`, `ftp`)
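A transfer from a local machine to one of the HPC filesystems via a data transfer node could look
like the following (a minimal sketch; the hostname `dataport1.hpc.tu-dresden.de` and the target
workspace path are placeholders, not verified names):

```console
marie@local$ rsync -avP my_dataset/ marie@dataport1.hpc.tu-dresden.de:/data/horse/ws/marie-my_workspace/
```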
@@ -52,87 +53,34 @@ permanent filesystems on the page [Filesystems](../data_lifecycle/file_systems.m
## Barnard
The cluster `Barnard` is a general purpose cluster by Bull. It is based on Intel Sapphire Rapids
CPUs.
- 630 nodes, each with
- 2 x Intel Xeon Platinum 8470 (52 cores) @ 2.00 GHz, Multithreading enabled
- 512 GB RAM (8 x 32 GB DDR5-4800 MT/s per socket)
- 12 nodes provide 1.8 TB local storage on NVMe device at `/tmp`
- All other nodes are diskless and have no or very limited local storage (i.e. `/tmp`)
- Login nodes: `login[1-4].barnard.hpc.tu-dresden.de`
- Hostnames: `n[1001-1630].barnard.hpc.tu-dresden.de`
- Operating system: Red Hat Enterpise Linux 8.9
The cluster `Barnard` is a general purpose cluster by Bull. It is based on Intel Sapphire Rapids CPUs.
Further details in [`Barnard` Chapter](barnard.md).
## Alpha Centauri
The cluster `Alpha Centauri` (short: `Alpha`) by NEC provides AMD Rome CPUs and NVIDIA A100 GPUs
and is designed for AI and ML tasks.
- 34 nodes, each with
- 8 x NVIDIA A100-SXM4 Tensor Core-GPUs
- 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, Multithreading available
- 1 TB RAM (16 x 32 GB DDR4-2933 MT/s per socket)
- 3.5 TB local storage on NVMe device at `/tmp`
- Login nodes: `login[1-2].alpha.hpc.tu-dresden.de`
- Hostnames: `i[8001-8037].alpha.hpc.tu-dresden.de`
- Operating system: Rocky Linux 8.9
- Further information on the usage is documented on the site [GPU Cluster Alpha Centauri](alpha_centauri.md)
Further details in [`Alpha Centauri` Chapter](alpha_centauri.md).
## Capella
The cluster `Capella` by MEGWARE provides AMD Genoa CPUs and NVIDIA H100 GPUs
and is designed for AI and ML tasks.
- 144 nodes, each with
- 4 x NVIDIA H100-SXM5 Tensor Core-GPUs
- 2 x AMD EPYC CPU 9334 (32 cores) @ 2.7 GHz, Multithreading disabled
- 768 GB RAM (12 x 32 GB DDR5-4800 MT/s per socket)
- 800 GB local storage on NVMe device at `/tmp`
- Login nodes: `login[1-2].capella.hpc.tu-dresden.de`
- Hostnames: `c[1-144].capella.hpc.tu-dresden.de`
- Operating system: Alma Linux 9.4
Further details in [`Capella` Chapter](capella.md).
## Romeo
The cluster `Romeo` is a general purpose cluster by NEC based on AMD Rome CPUs.
- 192 nodes, each with
- 2 x AMD EPYC CPU 7702 (64 cores) @ 2.0 GHz, Multithreading available
- 512 GB RAM (8 x 32 GB DDR4-3200 MT/s per socket)
- 200 GB local storage on SSD at `/tmp`
- Login nodes: `login[1-2].romeo.hpc.tu-dresden.de`
- Hostnames: `i[7001-7190].romeo.hpc.tu-dresden.de`
- Operating system: Rocky Linux 8.9
- Further information on the usage is documented on the site [CPU Cluster Romeo](romeo.md)
Further details in [`Romeo` Chapter](romeo.md).
## Julia
The cluster `Julia` is a large SMP (shared memory parallel) system by HPE based on Superdome Flex
architecture.
- 1 node, with
- 32 x Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20 GHz (28 cores)
- 47 TB RAM (12 x 128 GB DDR4-2933 MT/s per socket)
- Configured as one single node
- 48 TB RAM (usable: 47 TB - one TB is used for cache coherence protocols)
- 370 TB of fast NVME storage available at `/nvme/<projectname>`
- Login node: `julia.hpc.tu-dresden.de`
- Hostname: `julia.hpc.tu-dresden.de`
- Operating system: Rocky Linux 8.7
- Further information on the usage is documented on the site [SMP System Julia](julia.md)
Further details in [`Julia` Chapter](julia.md).
## Power9
The cluster `Power9` by IBM is based on Power9 CPUs and provides NVIDIA V100 GPUs.
`Power9` is specifically designed for machine learning (ML) tasks.
- 32 nodes, each with
- 2 x IBM Power9 CPU (2.80 GHz, 3.10 GHz boost, 22 cores)
- 256 GB RAM (8 x 16 GB DDR4-2666 MT/s per socket)
- 6 x NVIDIA VOLTA V100 with 32 GB HBM2
- NVLINK bandwidth 150 GB/s between GPUs and host
- Login nodes: `login[1-2].power9.hpc.tu-dresden.de`
- Hostnames: `ml[1-29].power9.hpc.tu-dresden.de`
- Operating system: Alma Linux 8.7
- Further information on the usage is documented on the site [GPU Cluster Power9](power9.md)
Further details in [`Power9` Chapter](power9.md).
# SMP Cluster Julia
## Overview
The HPE Superdome Flex is a large shared-memory node. It is especially well suited for
data-intensive application scenarios, for example processing extremely large data sets completely
in main memory or in very fast NVMe storage.
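Since the whole system is managed as a single node by its own [Slurm batch system](slurm.md),
large-memory jobs are requested simply by asking Slurm for a correspondingly high amount of memory
(a minimal sketch; the resource values are placeholders):

```console
marie@julia$ srun --ntasks=1 --cpus-per-task=28 --mem=2T --time=04:00:00 --pty bash
```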
## Hardware Resources
The hardware specification is documented on the page
[HPC Resources](hardware_overview.md#julia).
## Details
- 1 node, with
    - 32 x Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20 GHz (28 cores)
    - 48 TB RAM (12 x 128 GB DDR4-2933 MT/s per socket), of which 47 TB are usable (one TB is used for cache coherence protocols)
- Configured as one single node
- 370 TB of fast NVMe storage available at `/nvme/<projectname>`
- Login node: `julia.hpc.tu-dresden.de`
- Hostname: `julia.hpc.tu-dresden.de`
- Operating system: Rocky Linux 8.7
- Further information on the usage is documented below
!!! note
......
# GPU Cluster Power9
## Overview
The multi-GPU cluster `Power9` was installed in 2018. Until the end of 2023, it was available as
partition `power` within the now decommissioned `Taurus` system. With the decommissioning of `Taurus`,
`Power9` has been re-engineered and is now a homogeneous, standalone cluster with its own
[Slurm batch system](slurm.md) and its own login nodes.
## Hardware Resources
## Details
- 32 nodes, each with
    - 2 x IBM Power9 CPU (2.80 GHz, 3.10 GHz boost, 22 cores)
    - 256 GB RAM (8 x 16 GB DDR4-2666 MT/s per socket)
    - 6 x NVIDIA VOLTA V100 with 32 GB HBM2
    - NVLINK bandwidth 150 GB/s between GPUs and host
- Login nodes: `login[1-2].power9.hpc.tu-dresden.de`
- Hostnames: `ml[1-29].power9.hpc.tu-dresden.de`
- Operating system: Alma Linux 8.7
- Further information on the usage is documented below
The hardware specification is documented on the page [HPC Resources](hardware_overview.md#power9).
## Usage
If you want to use containers on `Power9`, please refer to the page
[Singularity for Power9 Architecture](../software/singularity_power9.md).
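As a rough sketch of running a containerized command (the image name is a placeholder; note that
containers for `Power9` must be built for the ppc64le architecture, see the page linked above):

```console
marie@login.power9$ singularity exec my_container_ppc64le.sif python3 my_script.py
```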
The compute nodes of the cluster `power` are built on the base of
The compute nodes of the cluster `power9` are built on the base of
[Power9 architecture](https://www.ibm.com/it-infrastructure/power/power9) from IBM. The system was created
for AI challenges, analytics, data-intensive workloads, and accelerated databases.
The main feature of the nodes is the ability to work with the
[NVIDIA Tesla V100](https://www.nvidia.com/en-gb/data-center/tesla-v100/) GPU with **NV-Link**
support that allows a total bandwidth of up to 300 GB/s. Each node on the
cluster `power` has 6x Tesla V-100 GPUs. You can find a detailed specification of the cluster in our
cluster `power9` has 6x Tesla V-100 GPUs. You can find a detailed specification of the cluster in our
[Power9 documentation](../jobs_and_resources/hardware_overview.md).
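A batch job that uses all six GPUs of one node could be submitted as follows (a minimal sketch;
the jobscript name and resource values are placeholders, and it is assumed that the GPUs are
requested via Slurm's generic `--gres=gpu` option):

```console
marie@login.power9$ sbatch --nodes=1 --ntasks=6 --gres=gpu:6 --time=02:00:00 my_jobscript.sh
```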
......
@@ -7,10 +7,16 @@ of 2023, it was available as partition `romeo` within `Taurus`. With the decommi
`Romeo` has been re-engineered and is now a homogeneous, standalone cluster with its own
[Slurm batch system](slurm.md) and its own login nodes.
## Hardware Resources
The hardware specification is documented on the page
[HPC Resources](hardware_overview.md#romeo).
## Details
- 192 nodes, each with
    - 2 x AMD EPYC CPU 7702 (64 cores) @ 2.0 GHz, Multithreading available
    - 512 GB RAM (8 x 32 GB DDR4-3200 MT/s per socket)
    - 200 GB local storage on SSD at `/tmp`
- Login nodes: `login[1-2].romeo.hpc.tu-dresden.de`
- Hostnames: `i[7001-7190].romeo.hpc.tu-dresden.de`
- Operating system: Rocky Linux 8.9
- Further information on the usage is documented below
## Usage
......