Commit c44dfd46 authored by Sebastian Döbel

Move hardware info of clusters back to hardware_overview

parent dfb71a70
Included in merge requests !1164 (Automated merge from preview to main) and !1152 (Add page for Capella).
@@ -40,7 +40,7 @@ approved again.
## Barnard
-The cluster [`Barnard`](../jobs_and_resources/barnard.md) can be accessed via the
+The cluster [`Barnard`](../jobs_and_resources/hardware_overview.md#barnard) can be accessed via the
four login nodes `login[1-4].barnard.hpc.tu-dresden.de`. (Please choose one concrete login node when
connecting, see example below.)
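To illustrate the login procedure described above, here is a minimal connection sketch, assuming a standard OpenSSH client; `marie` is used as a placeholder ZIH username:

```bash
# Pick one concrete Barnard login node, as recommended above.
ssh marie@login2.barnard.hpc.tu-dresden.de

# The same pattern applies to the other clusters, e.g. Romeo:
ssh marie@login1.romeo.hpc.tu-dresden.de
```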
@@ -56,7 +56,7 @@ connecting, see example below.)
## Romeo
-The cluster [`Romeo`](../jobs_and_resources/romeo.md) can be accessed via the two
+The cluster [`Romeo`](../jobs_and_resources/hardware_overview.md#romeo) can be accessed via the two
login nodes `login[1-2].romeo.hpc.tu-dresden.de`. (Please choose one concrete login node when
connecting, see example below.)
@@ -72,7 +72,7 @@ connecting, see example below.)
## Capella
-The cluster [`Capella`](../jobs_and_resources/capella.md) can be accessed via the two
+The cluster [`Capella`](../jobs_and_resources/hardware_overview.md#capella) can be accessed via the two
login nodes `login[1-2].capella.hpc.tu-dresden.de`. (Please choose one concrete login node when
connecting, see example below.)
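If you want to stick to one concrete login node, a client-side SSH configuration entry can encode that choice. This is only a sketch; the host alias `capella` and the username `marie` are assumptions, not part of the documentation:

```bash
# Append a host alias to the local SSH configuration so that
# "ssh capella" always connects to the same login node.
cat >> ~/.ssh/config <<'EOF'
Host capella
    HostName login1.capella.hpc.tu-dresden.de
    User marie
EOF

ssh capella
```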
@@ -88,9 +88,9 @@ connecting, see example below.)
## Alpha Centauri
-The cluster [`Alpha Centauri`](../jobs_and_resources/alpha_centauri.md) can be accessed via the two
-login nodes `login[1-2].alpha.hpc.tu-dresden.de`. (Please choose one concrete login node when
-connecting, see example below.)
+The cluster [`Alpha Centauri`](../jobs_and_resources/hardware_overview.md#alpha-centauri) can be
+accessed via the two login nodes `login[1-2].alpha.hpc.tu-dresden.de`. (Please choose one concrete
+login node when connecting, see example below.)
| Key Type | Fingerprint |
|:---------|:------------------------------------------------------|
@@ -104,7 +104,7 @@ connecting, see example below.)
## Julia
-The cluster [`Julia`](../jobs_and_resources/julia.md) can be accessed via `julia.hpc.tu-dresden.de`.
+The cluster [`Julia`](../jobs_and_resources/hardware_overview.md#julia) can be accessed via `julia.hpc.tu-dresden.de`.
(Note, there is no separate login node.)
| Key Type | Fingerprint |
@@ -4,18 +4,6 @@
The multi-GPU cluster `Alpha Centauri` has been installed for AI-related computations (ScaDS.AI).
-## Details
-- 34 nodes, each with
-    - 8 x NVIDIA A100-SXM4 Tensor Core-GPUs
-    - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, Multithreading available
-    - 1 TB RAM (16 x 32 GB DDR4-2933 MT/s per socket)
-    - 3.5 TB local storage on NVMe device at `/tmp`
-- Login nodes: `login[1-2].alpha.hpc.tu-dresden.de`
-- Hostnames: `i[8001-8037].alpha.hpc.tu-dresden.de`
-- Operating system: Rocky Linux 8.9
-- Further information on the usage is documented below
## Filesystems
Since 5th July 2024, `Alpha Centauri` is fully integrated in the InfiniBand infrastructure of
@@ -6,6 +6,11 @@ The multi-GPU cluster `Capella` has been installed for AI-related computations a
HPC simulations. Capella is fully integrated into the ZIH HPC infrastructure.
Therefore, the usage should be similar to the other clusters.
+## Hardware Resources
+The hardware specification is documented on the page
+[HPC Resources](hardware_overview.md#capella).
## Login
You use `login[1-2].capella.hpc.tu-dresden.de` to access the system from campus (or VPN).
@@ -30,7 +30,7 @@ HPC resources at ZIH comprise a total of the **six systems**:
| ----------------------------------- | ----------------------| -------------------- | --- |
| [`Capella`](capella.md) | GPU cluster | 2024 | `c[1-144].capella.hpc.tu-dresden.de` |
| [`Barnard`](barnard.md) | CPU cluster | 2023 | `n[1001-1630].barnard.hpc.tu-dresden.de` |
-| [`Alpha Centauri`](alpha-centauri.md) | GPU cluster | 2021 | `i[8001-8037].alpha.hpc.tu-dresden.de` |
+| [`Alpha Centauri`](alpha_centauri.md) | GPU cluster | 2021 | `i[8001-8037].alpha.hpc.tu-dresden.de` |
| [`Julia`](julia.md) | Single SMP system | 2021 | `julia.hpc.tu-dresden.de` |
| [`Romeo`](romeo.md) | CPU cluster | 2020 | `i[8001-8190].romeo.hpc.tu-dresden.de` |
| [`Power9`](power9.md) | IBM Power/GPU cluster | 2018 | `ml[1-29].power9.hpc.tu-dresden.de` |
@@ -53,33 +53,86 @@ only from their respective login nodes.
## Barnard
The cluster `Barnard` is a general purpose cluster by Bull. It is based on Intel Sapphire Rapids CPUs.
-Further details in [`Barnard` Chapter](barnard.md).
+- 630 nodes, each with
+    - 2 x Intel Xeon Platinum 8470 (52 cores) @ 2.00 GHz, Multithreading enabled
+    - 512 GB RAM (8 x 32 GB DDR5-4800 MT/s per socket)
+- 12 nodes provide 1.8 TB local storage on NVMe device at `/tmp`
+- All other nodes are diskless and have no or very limited local storage (i.e. `/tmp`)
+- Login nodes: `login[1-4].barnard.hpc.tu-dresden.de`
+- Hostnames: `n[1001-1630].barnard.hpc.tu-dresden.de`
+- Operating system: Red Hat Enterprise Linux 8.9
## Alpha Centauri
The cluster `Alpha Centauri` (short: `Alpha`) by NEC provides AMD Rome CPUs and NVIDIA A100 GPUs
and is designed for AI and ML tasks.
-Further details in [`Alpha Centauri` Chapter](alpha_centauri.md).
+- 37 nodes, each with
+    - 8 x NVIDIA A100-SXM4 Tensor Core-GPUs
+    - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, Multithreading available
+    - 1 TB RAM (16 x 32 GB DDR4-2933 MT/s per socket)
+    - 3.5 TB local storage on NVMe device at `/tmp`
+- Login nodes: `login[1-2].alpha.hpc.tu-dresden.de`
+- Hostnames: `i[8001-8037].alpha.hpc.tu-dresden.de`
+- Operating system: Rocky Linux 8.9
+- Further information on the usage is documented on the site [GPU Cluster Alpha Centauri](alpha_centauri.md)
## Capella
The cluster `Capella` by MEGWARE provides AMD Genoa CPUs and NVIDIA H100 GPUs
and is designed for AI and ML tasks.
-Further details in [`Capella` Chapter](capella.md).
+- 144 nodes, each with
+    - 4 x NVIDIA H100-SXM5 Tensor Core-GPUs
+    - 2 x AMD EPYC CPU 9334 (32 cores) @ 2.7 GHz, Multithreading disabled
+    - 768 GB RAM (12 x 32 GB DDR5-4800 MT/s per socket)
+    - 800 GB local storage on NVMe device at `/tmp`
+- Login nodes: `login[1-2].capella.hpc.tu-dresden.de`
+- Hostnames: `c[1-144].capella.hpc.tu-dresden.de`
+- Operating system: Alma Linux 9.4
+- Further information on the usage is documented on the site [GPU Cluster Capella](capella.md)
## Romeo
The cluster `Romeo` is a general purpose cluster by NEC based on AMD Rome CPUs.
-Further details in [`Romeo` Chapter](romeo.md).
+- 188 nodes, each with
+    - 2 x AMD EPYC CPU 7702 (64 cores) @ 2.0 GHz, Multithreading available
+    - 512 GB RAM (8 x 32 GB DDR4-3200 MT/s per socket)
+    - 200 GB local storage on SSD at `/tmp`
+- Login nodes: `login[1-2].romeo.hpc.tu-dresden.de`
+- Hostnames: `i[7001-7186].romeo.hpc.tu-dresden.de`
+- Operating system: Rocky Linux 8.9
+- Further information on the usage is documented on the site [CPU Cluster Romeo](romeo.md)
## Julia
The cluster `Julia` is a large SMP (shared memory parallel) system by HPE based on Superdome Flex
architecture.
-Further details in [`Julia` Chapter](julia.md).
+- 1 node, with
+    - 32 x Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20 GHz (28 cores)
+    - 47 TB RAM (12 x 128 GB DDR4-2933 MT/s per socket)
+- Configured as one single node
+- 48 TB RAM (usable: 47 TB - one TB is used for cache coherence protocols)
+- 370 TB of fast NVMe storage available at `/nvme/<projectname>`
+- Login node: `julia.hpc.tu-dresden.de`
+- Hostname: `julia.hpc.tu-dresden.de`
+- Operating system: Rocky Linux 8.7
+- Further information on the usage is documented on the site [SMP System Julia](julia.md)
## Power9
The cluster `Power9` by IBM is based on Power9 CPUs and provides NVIDIA V100 GPUs.
`Power9` is specifically designed for machine learning (ML) tasks.
-Further details in [`Power9` Chapter](power9.md).
+- 32 nodes, each with
+    - 2 x IBM Power9 CPU (2.80 GHz, 3.10 GHz boost, 22 cores)
+    - 256 GB RAM (8 x 16 GB DDR4-2666 MT/s per socket)
+    - 6 x NVIDIA VOLTA V100 with 32 GB HBM2
+    - NVLINK bandwidth 150 GB/s between GPUs and host
+- Login nodes: `login[1-2].power9.hpc.tu-dresden.de`
+- Hostnames: `ml[1-29].power9.hpc.tu-dresden.de`
+- Operating system: Alma Linux 8.7
+- Further information on the usage is documented on the site [GPU Cluster Power9](power9.md)
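As a rough usage sketch for the GPU clusters listed above (the Slurm options shown here are illustrative assumptions; the linked cluster pages describe the recommended way to request resources):

```bash
# Start a short interactive job with a single GPU on one of the GPU clusters,
# then verify that the device is visible. Run this from the cluster's login node.
srun --nodes=1 --ntasks=1 --gres=gpu:1 --time=00:30:00 --pty bash

# Inside the allocation:
nvidia-smi
```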
@@ -6,18 +6,10 @@ The HPE Superdome Flex is a large shared memory node. It is especially well suit
intensive application scenarios, for example to process extremely large data sets completely in main
memory or in very fast NVMe memory.
-## Details
-- 1 node, with
-    - 32 x Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20 GHz (28 cores)
-    - 47 TB RAM (12 x 128 GB DDR4-2933 MT/s per socket)
-- Configured as one single node
-- 48 TB RAM (usable: 47 TB - one TB is used for cache coherence protocols)
-- 370 TB of fast NVMe storage available at `/nvme/<projectname>`
-- Login node: `julia.hpc.tu-dresden.de`
-- Hostname: `julia.hpc.tu-dresden.de`
-- Operating system: Rocky Linux 8.7
-- Further information on the usage is documented below
+## Hardware Resources
+The hardware specification is documented on the page
+[HPC Resources](hardware_overview.md#julia).
!!! note
@@ -7,17 +7,9 @@ partition `power` within the now decommissioned `Taurus` system. With the decomm
`Power9` has been re-engineered and is now a homogeneous, standalone cluster with own
[Slurm batch system](slurm.md) and own login nodes.
-## Details
-- 32 nodes, each with
-    - 2 x IBM Power9 CPU (2.80 GHz, 3.10 GHz boost, 22 cores)
-    - 256 GB RAM (8 x 16 GB DDR4-2666 MT/s per socket)
-    - 6 x NVIDIA Tesla V100 with 32 GB HBM2
-    - NVLINK bandwidth 150 GB/s between GPUs and host
-- Login nodes: `login[1-2].power9.hpc.tu-dresden.de`
-- Hostnames: `ml[1-29].power9.hpc.tu-dresden.de`
-- Operating system: Alma Linux 8.7
-- Further information on the usage is documented below
+## Hardware Resources
+The hardware specification is documented on the page [HPC Resources](hardware_overview.md#power9).
## Usage
@@ -7,16 +7,10 @@ of 2023, it was available as partition `romeo` within `Taurus`. With the decommi
`Romeo` has been re-engineered and is now a homogeneous, standalone cluster with own
[Slurm batch system](slurm.md) and own login nodes.
-## Details
-- 192 nodes, each with
-    - 2 x AMD EPYC CPU 7702 (64 cores) @ 2.0 GHz, Multithreading available
-    - 512 GB RAM (8 x 32 GB DDR4-3200 MT/s per socket)
-    - 200 GB local storage on SSD at `/tmp`
-- Login nodes: `login[1-2].romeo.hpc.tu-dresden.de`
-- Hostnames: `i[7001-7190].romeo.hpc.tu-dresden.de`
-- Operating system: Rocky Linux 8.9
-- Further information on the usage is documented below
+## Hardware Resources
+The hardware specification is documented on the page
+[HPC Resources](hardware_overview.md#romeo).
## Usage
@@ -109,7 +109,7 @@ But, do you need to request tasks or CPUs from Slurm in order to provide resourc
Slurm will allocate one or many GPUs for your job if requested.
Please note that GPUs are only available in the GPU clusters, like
-[Alpha Centauri](hardware_overview.md#alpha-centauri), [Capella](hardware_overview#capella) and
+[Alpha Centauri](hardware_overview.md#alpha-centauri), [Capella](hardware_overview.md#capella) and
[Power9](hardware_overview.md#power9).
The option for `sbatch/srun` in this case is `--gres=gpu:[NUM_PER_NODE]`,
where `NUM_PER_NODE` is the number of GPUs **per node** that will be used for the job.
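A minimal batch-script sketch using this option; the job name, time limit, and the application `./my_gpu_app` are placeholders, not prescribed by the documentation:

```bash
#!/bin/bash
#SBATCH --job-name=gpu_example
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:2        # request two GPUs per node, as described above
#SBATCH --time=01:00:00

srun ./my_gpu_app           # placeholder GPU application
```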
@@ -72,16 +72,16 @@ The physical installed memory might differ from the amount available for Slurm j
so-called diskless compute nodes, i.e., nodes without additional local drive. At these nodes, the
operating system and other components reside in the main memory, lowering the available memory for
jobs. The reserved amount of memory for the system operation might vary slightly over time. The
following table depicts the resource limits for [all our HPC systems](hardware_overview.md).
| HPC System | Nodes | # Nodes | Cores per Node | Threads per Core | Memory per Node [in MB] | Memory per (SMT) Core [in MB] | GPUs per Node | Cores per GPU | Job Max Time [in days] |
|:-----------|:------|--------:|---------------:|-----------------:|------------------------:|------------------------------:|--------------:|--------------:|-------------:|
-| [`Barnard`](barnard.md) | `n[1001-1630].barnard` | 630 | 104 | 2 | 515,000 | 4,951 | - | - | 7 |
-| [`Capella`](capella.md) | `c[1-144].capella` |144 | 64 | 1 | 768,000 | 13,438 | 4 | 14 | 7 |
-| [`Power9`](power9.md) | `ml[1-29].power9` | 29 | 44 | 4 | 254,000 | 1,443 | 6 | - | 7 |
-| [`Romeo`](romeo.md) | `i[8001-8190].romeo` | 190 | 128 | 2 | 505,000 | 1,972 | - | - | 7 |
-| [`Julia`](julia.md) | `julia` | 1 | 896 | 1 | 48,390,000 | 54,006 | - | - | 7 |
-| [`Alpha Centauri`](alpha_centauri.md) | `i[8001-8037].alpha` | 37 | 48 | 2 | 990,000 | 10,312 | 8 | 6 | 7 |
+| [`Barnard`](hardware_overview.md#barnard) | `n[1001-1630].barnard` | 630 | 104 | 2 | 515,000 | 4,951 | - | - | 7 |
+| [`Capella`](hardware_overview.md#capella) | `c[1-144].capella` | 144 | 64 | 1 | 768,000 | 13,438 | 4 | 14 | 7 |
+| [`Power9`](hardware_overview.md#power9) | `ml[1-29].power9` | 29 | 44 | 4 | 254,000 | 1,443 | 6 | - | 7 |
+| [`Romeo`](hardware_overview.md#romeo) | `i[8001-8190].romeo` | 190 | 128 | 2 | 505,000 | 1,972 | - | - | 7 |
+| [`Julia`](hardware_overview.md#julia) | `julia` | 1 | 896 | 1 | 48,390,000 | 54,006 | - | - | 7 |
+| [`Alpha Centauri`](hardware_overview.md#alpha-centauri) | `i[8001-8037].alpha` | 37 | 48 | 2 | 990,000 | 10,312 | 8 | 6 | 7 |
{: summary="Slurm resource limits table" align="bottom"}
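To make the table concrete, here is a small sketch of how the per-core memory limit feeds into a job request; the memory value is taken from the `Barnard` row, while the task count, time limit, and the application `./my_cpu_app` are placeholder assumptions:

```bash
#!/bin/bash
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4951   # MB per (SMT) core on Barnard, see table above
#SBATCH --time=08:00:00

srun ./my_cpu_app            # placeholder application
```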
All HPC systems have Simultaneous Multithreading (SMT) enabled. You request for this