Commit c44dfd46 authored by Sebastian Döbel

Move hardware info of clusters back to hardware_overview

parent dfb71a70
Included in merge requests !1164 (Automated merge from preview to main) and !1152 (Add page for Capella).
@@ -40,7 +40,7 @@ approved again.
## Barnard
-The cluster [`Barnard`](../jobs_and_resources/barnard.md) can be accessed via the
+The cluster [`Barnard`](../jobs_and_resources/hardware_overview.md#barnard) can be accessed via the
four login nodes `login[1-4].barnard.hpc.tu-dresden.de`. (Please choose one concrete login node when
connecting, see example below.)
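To illustrate the login procedure described above, here is a minimal connection sketch, assuming a standard OpenSSH client; `marie` is used as a placeholder ZIH username:

```bash
# Pick one concrete Barnard login node, as recommended above.
ssh marie@login2.barnard.hpc.tu-dresden.de

# The same pattern applies to the other clusters, e.g. Romeo:
ssh marie@login1.romeo.hpc.tu-dresden.de
```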
@@ -56,7 +56,7 @@ connecting, see example below.)
## Romeo
-The cluster [`Romeo`](../jobs_and_resources/romeo.md) can be accessed via the two
+The cluster [`Romeo`](../jobs_and_resources/hardware_overview.md#romeo) can be accessed via the two
login nodes `login[1-2].romeo.hpc.tu-dresden.de`. (Please choose one concrete login node when
connecting, see example below.)
@@ -72,7 +72,7 @@ connecting, see example below.)
## Capella
-The cluster [`Capella`](../jobs_and_resources/capella.md) can be accessed via the two
+The cluster [`Capella`](../jobs_and_resources/hardware_overview.md#capella) can be accessed via the two
login nodes `login[1-2].capella.hpc.tu-dresden.de`. (Please choose one concrete login node when
connecting, see example below.)
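If you want to stick to one concrete login node, a client-side SSH configuration entry can encode that choice. This is only a sketch; the host alias `capella` and the username `marie` are assumptions, not part of the documentation:

```bash
# Append a host alias to the local SSH configuration so that
# "ssh capella" always connects to the same login node.
cat >> ~/.ssh/config <<'EOF'
Host capella
    HostName login1.capella.hpc.tu-dresden.de
    User marie
EOF

ssh capella
```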
@@ -88,9 +88,9 @@ connecting, see example below.)
## Alpha Centauri
-The cluster [`Alpha Centauri`](../jobs_and_resources/alpha_centauri.md) can be accessed via the two
-login nodes `login[1-2].alpha.hpc.tu-dresden.de`. (Please choose one concrete login node when
-connecting, see example below.)
+The cluster [`Alpha Centauri`](../jobs_and_resources/hardware_overview.md#alpha-centauri) can be
+accessed via the two login nodes `login[1-2].alpha.hpc.tu-dresden.de`. (Please choose one concrete
+login node when connecting, see example below.)
| Key Type | Fingerprint |
|:---------|:------------------------------------------------------|
@@ -104,7 +104,7 @@ connecting, see example below.)
## Julia
-The cluster [`Julia`](../jobs_and_resources/julia.md) can be accessed via `julia.hpc.tu-dresden.de`.
+The cluster [`Julia`](../jobs_and_resources/hardware_overview.md#julia) can be accessed via `julia.hpc.tu-dresden.de`.
(Note, there is no separate login node.)
| Key Type | Fingerprint |
@@ -4,18 +4,6 @@
The multi-GPU cluster `Alpha Centauri` has been installed for AI-related computations (ScaDS.AI).
-## Details
-- 34 nodes, each with
-    - 8 x NVIDIA A100-SXM4 Tensor Core-GPUs
-    - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, Multithreading available
-    - 1 TB RAM (16 x 32 GB DDR4-2933 MT/s per socket)
-    - 3.5 TB local storage on NVMe device at `/tmp`
-- Login nodes: `login[1-2].alpha.hpc.tu-dresden.de`
-- Hostnames: `i[8001-8037].alpha.hpc.tu-dresden.de`
-- Operating system: Rocky Linux 8.9
-- Further information on the usage is documented below
## Filesystems
Since 5th July 2024, `Alpha Centauri` is fully integrated in the InfiniBand infrastructure of
@@ -6,6 +6,11 @@ The multi-GPU cluster `Capella` has been installed for AI-related computations a
HPC simulations. Capella is fully integrated into the ZIH HPC infrastructure.
Therefore, the usage should be similar to the other clusters.
+## Hardware Resources
+The hardware specification is documented on the page
+[HPC Resources](hardware_overview.md#capella).
## Login
You use `login[1-2].capella.hpc.tu-dresden.de` to access the system from campus (or VPN).
@@ -30,7 +30,7 @@ HPC resources at ZIH comprise a total of the **six systems**:
| ----------------------------------- | ----------------------| -------------------- | --- |
| [`Capella`](capella.md) | GPU cluster | 2024 | `c[1-144].capella.hpc.tu-dresden.de` |
| [`Barnard`](barnard.md) | CPU cluster | 2023 | `n[1001-1630].barnard.hpc.tu-dresden.de` |
-| [`Alpha Centauri`](alpha-centauri.md) | GPU cluster | 2021 | `i[8001-8037].alpha.hpc.tu-dresden.de` |
+| [`Alpha Centauri`](alpha_centauri.md) | GPU cluster | 2021 | `i[8001-8037].alpha.hpc.tu-dresden.de` |
| [`Julia`](julia.md) | Single SMP system | 2021 | `julia.hpc.tu-dresden.de` |
| [`Romeo`](romeo.md) | CPU cluster | 2020 | `i[8001-8190].romeo.hpc.tu-dresden.de` |
| [`Power9`](power9.md) | IBM Power/GPU cluster | 2018 | `ml[1-29].power9.hpc.tu-dresden.de` |
@@ -53,33 +53,86 @@ only from their respective login nodes.
## Barnard
The cluster `Barnard` is a general purpose cluster by Bull. It is based on Intel Sapphire Rapids CPUs.
-Further details in [`Barnard` Chapter](barnard.md).
+- 630 nodes, each with
+    - 2 x Intel Xeon Platinum 8470 (52 cores) @ 2.00 GHz, Multithreading enabled
+    - 512 GB RAM (8 x 32 GB DDR5-4800 MT/s per socket)
+- 12 nodes provide 1.8 TB local storage on NVMe device at `/tmp`
+- All other nodes are diskless and have no or very limited local storage (i.e. `/tmp`)
+- Login nodes: `login[1-4].barnard.hpc.tu-dresden.de`
+- Hostnames: `n[1001-1630].barnard.hpc.tu-dresden.de`
+- Operating system: Red Hat Enterprise Linux 8.9
## Alpha Centauri
The cluster `Alpha Centauri` (short: `Alpha`) by NEC provides AMD Rome CPUs and NVIDIA A100 GPUs
and is designed for AI and ML tasks.
-Further details in [`Alpha Centauri` Chapter](alpha_centauri.md).
+- 37 nodes, each with
+    - 8 x NVIDIA A100-SXM4 Tensor Core-GPUs
+    - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, Multithreading available
+    - 1 TB RAM (16 x 32 GB DDR4-2933 MT/s per socket)
+    - 3.5 TB local storage on NVMe device at `/tmp`
+- Login nodes: `login[1-2].alpha.hpc.tu-dresden.de`
+- Hostnames: `i[8001-8037].alpha.hpc.tu-dresden.de`
+- Operating system: Rocky Linux 8.9
+- Further information on the usage is documented on the site [GPU Cluster Alpha Centauri](alpha_centauri.md)
## Capella
The cluster `Capella` by MEGWARE provides AMD Genoa CPUs and NVIDIA H100 GPUs
and is designed for AI and ML tasks.
-Further details in [`Capella` Chapter](capella.md).
+- 144 nodes, each with
+    - 4 x NVIDIA H100-SXM5 Tensor Core-GPUs
+    - 2 x AMD EPYC CPU 9334 (32 cores) @ 2.7 GHz, Multithreading disabled
+    - 768 GB RAM (12 x 32 GB DDR5-4800 MT/s per socket)
+    - 800 GB local storage on NVMe device at `/tmp`
+- Login nodes: `login[1-2].capella.hpc.tu-dresden.de`
+- Hostnames: `c[1-144].capella.hpc.tu-dresden.de`
+- Operating system: Alma Linux 9.4
+- Further information on the usage is documented on the site [GPU Cluster Capella](capella.md)
## Romeo
The cluster `Romeo` is a general purpose cluster by NEC based on AMD Rome CPUs.
-Further details in [`Romeo` Chapter](romeo.md).
+- 188 nodes, each with
+    - 2 x AMD EPYC CPU 7702 (64 cores) @ 2.0 GHz, Multithreading available
+    - 512 GB RAM (8 x 32 GB DDR4-3200 MT/s per socket)
+    - 200 GB local storage on SSD at `/tmp`
+- Login nodes: `login[1-2].romeo.hpc.tu-dresden.de`
+- Hostnames: `i[7001-7186].romeo.hpc.tu-dresden.de`
+- Operating system: Rocky Linux 8.9
+- Further information on the usage is documented on the site [CPU Cluster Romeo](romeo.md)
## Julia
The cluster `Julia` is a large SMP (shared memory parallel) system by HPE based on Superdome Flex
architecture.
-Further details in [`Julia` Chapter](julia.md).
+- 1 node, with
+    - 32 x Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20 GHz (28 cores)
+    - 47 TB RAM (12 x 128 GB DDR4-2933 MT/s per socket)
+- Configured as one single node
+- 48 TB RAM (usable: 47 TB - one TB is used for cache coherence protocols)
+- 370 TB of fast NVMe storage available at `/nvme/<projectname>`
+- Login node: `julia.hpc.tu-dresden.de`
+- Hostname: `julia.hpc.tu-dresden.de`
+- Operating system: Rocky Linux 8.7
+- Further information on the usage is documented on the site [SMP System Julia](julia.md)
## Power9
The cluster `Power9` by IBM is based on Power9 CPUs and provides NVIDIA V100 GPUs.
`Power9` is specifically designed for machine learning (ML) tasks.
-Further details in [`Power9` Chapter](power9.md).
+- 32 nodes, each with
+    - 2 x IBM Power9 CPU (2.80 GHz, 3.10 GHz boost, 22 cores)
+    - 256 GB RAM (8 x 16 GB DDR4-2666 MT/s per socket)
+    - 6 x NVIDIA VOLTA V100 with 32 GB HBM2
+    - NVLINK bandwidth 150 GB/s between GPUs and host
+- Login nodes: `login[1-2].power9.hpc.tu-dresden.de`
+- Hostnames: `ml[1-29].power9.hpc.tu-dresden.de`
+- Operating system: Alma Linux 8.7
+- Further information on the usage is documented on the site [GPU Cluster Power9](power9.md)
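As a rough usage sketch for the GPU clusters listed above (the Slurm options shown here are illustrative assumptions; the linked cluster pages describe the recommended way to request resources):

```bash
# Start a short interactive job with a single GPU on one of the GPU clusters,
# then verify that the device is visible. Run this from the cluster's login node.
srun --nodes=1 --ntasks=1 --gres=gpu:1 --time=00:30:00 --pty bash

# Inside the allocation:
nvidia-smi
```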
@@ -6,18 +6,10 @@ The HPE Superdome Flex is a large shared memory node. It is especially well suit
intensive application scenarios, for example to process extremely large data sets completely in main
memory or in very fast NVMe memory.
-## Details
-- 1 node, with
-    - 32 x Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20 GHz (28 cores)
-    - 47 TB RAM (12 x 128 GB DDR4-2933 MT/s per socket)
-- Configured as one single node
-- 48 TB RAM (usable: 47 TB - one TB is used for cache coherence protocols)
-- 370 TB of fast NVMe storage available at `/nvme/<projectname>`
-- Login node: `julia.hpc.tu-dresden.de`
-- Hostname: `julia.hpc.tu-dresden.de`
-- Operating system: Rocky Linux 8.7
-- Further information on the usage is documented below
+## Hardware Resources
+The hardware specification is documented on the page
+[HPC Resources](hardware_overview.md#julia).
!!! note
@@ -7,17 +7,9 @@ partition `power` within the now decommissioned `Taurus` system. With the decomm
`Power9` has been re-engineered and is now a homogeneous, standalone cluster with own
[Slurm batch system](slurm.md) and own login nodes.
-## Details
-- 32 nodes, each with
-    - 2 x IBM Power9 CPU (2.80 GHz, 3.10 GHz boost, 22 cores)
-    - 256 GB RAM (8 x 16 GB DDR4-2666 MT/s per socket)
-    - 6 x NVIDIA Tesla V100 with 32 GB HBM2
-    - NVLINK bandwidth 150 GB/s between GPUs and host
-- Login nodes: `login[1-2].power9.hpc.tu-dresden.de`
-- Hostnames: `ml[1-29].power9.hpc.tu-dresden.de`
-- Operating system: Alma Linux 8.7
-- Further information on the usage is documented below
+## Hardware Resources
+The hardware specification is documented on the page [HPC Resources](hardware_overview.md#power9).
## Usage
@@ -7,16 +7,10 @@ of 2023, it was available as partition `romeo` within `Taurus`. With the decommi
`Romeo` has been re-engineered and is now a homogeneous, standalone cluster with own
[Slurm batch system](slurm.md) and own login nodes.
-## Details
-- 192 nodes, each with
-    - 2 x AMD EPYC CPU 7702 (64 cores) @ 2.0 GHz, Multithreading available
-    - 512 GB RAM (8 x 32 GB DDR4-3200 MT/s per socket)
-    - 200 GB local storage on SSD at `/tmp`
-- Login nodes: `login[1-2].romeo.hpc.tu-dresden.de`
-- Hostnames: `i[7001-7190].romeo.hpc.tu-dresden.de`
-- Operating system: Rocky Linux 8.9
-- Further information on the usage is documented below
+## Hardware Resources
+The hardware specification is documented on the page
+[HPC Resources](hardware_overview.md#romeo).
## Usage
@@ -109,7 +109,7 @@ But, do you need to request tasks or CPUs from Slurm in order to provide resourc
Slurm will allocate one or many GPUs for your job if requested.
Please note that GPUs are only available in the GPU clusters, like
-[Alpha Centauri](hardware_overview.md#alpha-centauri), [Capella](hardware_overview#capella) and
+[Alpha Centauri](hardware_overview.md#alpha-centauri), [Capella](hardware_overview.md#capella) and
[Power9](hardware_overview.md#power9).
The option for `sbatch/srun` in this case is `--gres=gpu:[NUM_PER_NODE]`,
where `NUM_PER_NODE` is the number of GPUs **per node** that will be used for the job.
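A minimal batch-script sketch using this option; the job name, time limit, and the application `./my_gpu_app` are placeholders, not prescribed by the documentation:

```bash
#!/bin/bash
#SBATCH --job-name=gpu_example
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:2        # request two GPUs per node, as described above
#SBATCH --time=01:00:00

srun ./my_gpu_app           # placeholder GPU application
```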
@@ -72,16 +72,16 @@ The physical installed memory might differ from the amount available for Slurm j
so-called diskless compute nodes, i.e., nodes without additional local drive. At these nodes, the
operating system and other components reside in the main memory, lowering the available memory for
jobs. The reserved amount of memory for the system operation might vary slightly over time. The
following table depicts the resource limits for [all our HPC systems](hardware_overview.md).
| HPC System | Nodes | # Nodes | Cores per Node | Threads per Core | Memory per Node [in MB] | Memory per (SMT) Core [in MB] | GPUs per Node | Cores per GPU | Job Max Time [in days] |
|:-----------|:------|--------:|---------------:|-----------------:|------------------------:|------------------------------:|--------------:|--------------:|-------------:|
-| [`Barnard`](barnard.md) | `n[1001-1630].barnard` | 630 | 104 | 2 | 515,000 | 4,951 | - | - | 7 |
-| [`Capella`](capella.md) | `c[1-144].capella` |144 | 64 | 1 | 768,000 | 13,438 | 4 | 14 | 7 |
-| [`Power9`](power9.md) | `ml[1-29].power9` | 29 | 44 | 4 | 254,000 | 1,443 | 6 | - | 7 |
-| [`Romeo`](romeo.md) | `i[8001-8190].romeo` | 190 | 128 | 2 | 505,000 | 1,972 | - | - | 7 |
-| [`Julia`](julia.md) | `julia` | 1 | 896 | 1 | 48,390,000 | 54,006 | - | - | 7 |
-| [`Alpha Centauri`](alpha_centauri.md) | `i[8001-8037].alpha` | 37 | 48 | 2 | 990,000 | 10,312 | 8 | 6 | 7 |
+| [`Barnard`](hardware_overview.md#barnard) | `n[1001-1630].barnard` | 630 | 104 | 2 | 515,000 | 4,951 | - | - | 7 |
+| [`Capella`](hardware_overview.md#capella) | `c[1-144].capella` | 144 | 64 | 1 | 768,000 | 13,438 | 4 | 14 | 7 |
+| [`Power9`](hardware_overview.md#power9) | `ml[1-29].power9` | 29 | 44 | 4 | 254,000 | 1,443 | 6 | - | 7 |
+| [`Romeo`](hardware_overview.md#romeo) | `i[8001-8190].romeo` | 190 | 128 | 2 | 505,000 | 1,972 | - | - | 7 |
+| [`Julia`](hardware_overview.md#julia) | `julia` | 1 | 896 | 1 | 48,390,000 | 54,006 | - | - | 7 |
+| [`Alpha Centauri`](hardware_overview.md#alpha-centauri) | `i[8001-8037].alpha` | 37 | 48 | 2 | 990,000 | 10,312 | 8 | 6 | 7 |
{: summary="Slurm resource limits table" align="bottom"}
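To make the table concrete, here is a small sketch of how the per-core memory limit feeds into a job request; the memory value is taken from the `Barnard` row, while the task count, time limit, and the application `./my_cpu_app` are placeholder assumptions:

```bash
#!/bin/bash
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4951   # MB per (SMT) core on Barnard, see table above
#SBATCH --time=08:00:00

srun ./my_cpu_app            # placeholder application
```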
All HPC systems have Simultaneous Multithreading (SMT) enabled. You request for this