HPC resources in ZIH systems comprise the *High Performance Computing and Storage Complex* and its
extension *High Performance Computing – Data Analytics*. In total, they offer scientists
about 100,000 CPU cores and a peak performance of more than 1.5 quadrillion floating point
operations per second. The architecture is specifically tailored to data-intensive computing, Big
Data analytics, and artificial intelligence methods. Its extensive capabilities for energy
measurement and performance monitoring provide ideal conditions to achieve the ambitious research
goals of the users and the ZIH.

## Architectural Design
Over the last decade, we have been running a highly heterogeneous HPC system with a single
Slurm batch system. This made things very complicated, especially for inexperienced users. With
the replacement of the Taurus system by the cluster [Barnard](#barnard) in 2023, we have a new
architectural design comprising **six homogeneous clusters with their own Slurm instances and with
cluster-specific login nodes** running on the same CPU. Job submission is possible only from
within the corresponding cluster (compute or login node).
All clusters are integrated into the new InfiniBand fabric and have the same access to
the shared filesystems. You can find comprehensive documentation on the available working and
permanent filesystems on the page [Filesystems](../data_lifecycle/file_systems.md).
![Architecture overview 2023](../jobs_and_resources/misc/architecture_2024.png)
{: align=center}
HPC resources at ZIH comprise a total of **six systems**:

| Name                                | Description           | Year of Installation | DNS |
| ----------------------------------- | ----------------------| -------------------- | --- |
| [`Capella`](capella.md)               | GPU cluster           | 2024                 | `c[1-144].capella.hpc.tu-dresden.de` |
| [`Barnard`](barnard.md)               | CPU cluster           | 2023                 | `n[1001-1630].barnard.hpc.tu-dresden.de` |
| [`Alpha Centauri`](alpha_centauri.md) | GPU cluster           | 2021                 | `i[8001-8037].alpha.hpc.tu-dresden.de` |
| [`Julia`](julia.md)                   | Single SMP system     | 2021                 | `julia.hpc.tu-dresden.de` |
| [`Romeo`](romeo.md)                   | CPU cluster           | 2020                 | `i[8001-8190].romeo.hpc.tu-dresden.de` |
| [`Power9`](power9.md)                 | IBM Power/GPU cluster | 2018                 | `ml[1-29].power9.hpc.tu-dresden.de` |

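The DNS column uses bracketed ranges (e.g. `c[1-144]`) as a shorthand for the individual node
names. A minimal shell sketch of how such a range expands, using the Capella naming scheme from
the table (on the clusters themselves, Slurm's `scontrol show hostnames` expands node lists
natively):

```shell
# Expand a bracketed hostname range such as c[1-144].capella.hpc.tu-dresden.de
# into individual node names.
expand_hosts() {
  prefix=$1; start=$2; end=$3; suffix=$4
  i=$start
  while [ "$i" -le "$end" ]; do
    echo "${prefix}${i}${suffix}"
    i=$((i + 1))
  done
}

# First three Capella nodes, following the naming scheme from the table
expand_hosts "c" 1 3 ".capella.hpc.tu-dresden.de"
```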
All clusters run their own [Slurm batch system](slurm.md), and job submission is possible
only from their respective login nodes.
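In practice, this means you first log in to the cluster you want to compute on and submit from
there. A minimal sketch, where the login node name `login1` and the job script `my_job.sh` are
hypothetical placeholders (see the cluster chapters for the actual login node names):

```shell
# Log in to the target cluster first; 'login1' is a hypothetical
# placeholder -- check the Barnard chapter for the real login node names.
ssh login1.barnard.hpc.tu-dresden.de

# On that login node, submit to Barnard's own Slurm instance
# ('my_job.sh' is a placeholder job script).
sbatch --nodes=1 --time=00:10:00 my_job.sh

# Jobs for another cluster (e.g. Capella) cannot be submitted from here;
# switch to that cluster's login node instead.
```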

## Login and Dataport Nodes

- Login nodes
    - Individual for each cluster. See the specifics in each cluster chapter.
- 2 Data-Transfer-Nodes
    - 2 servers without interactive login, only available via file transfer protocols
      (`rsync`, `ftp`)
    - `dataport[3-4].hpc.tu-dresden.de`
    - IPs: 141.30.73.\[4,5\]
    - Further information on their usage is documented on the page
      [Dataport Nodes](../data_transfer/dataport_nodes.md)
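Since the dataport nodes offer no interactive login, data is moved via file transfer tools only.
A minimal `rsync` sketch, where `myuser` and the remote paths are hypothetical placeholders:

```shell
# Push local results to the HPC filesystems via a dataport node
# ('myuser' and the paths are placeholders).
rsync -avP ./results/ myuser@dataport3.hpc.tu-dresden.de:/your/target/directory/

# Pull data from the HPC filesystems back to the local machine.
rsync -avP myuser@dataport4.hpc.tu-dresden.de:/your/source/directory/ ./local_copy/
```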

## Barnard

The cluster `Barnard` is a general purpose cluster by Bull. It is based on Intel Sapphire Rapids CPUs.
Further details in [`Barnard` Chapter](barnard.md).

## Alpha Centauri

The cluster `Alpha Centauri` (short: `Alpha`) by NEC provides AMD Rome CPUs and NVIDIA A100 GPUs
and is designed for AI and ML tasks.
Further details in [`Alpha Centauri` Chapter](alpha_centauri.md).
## Capella

The cluster `Capella` by MEGWARE provides AMD Genoa CPUs and NVIDIA H100 GPUs
and is designed for AI and ML tasks.
Further details in [`Capella` Chapter](capella.md).

## Romeo

The cluster `Romeo` is a general purpose cluster by NEC based on AMD Rome CPUs.
Further details in [`Romeo` Chapter](romeo.md).

## Julia

The cluster `Julia` is a large SMP (shared memory parallel) system by HPE based on the Superdome
Flex architecture.
Further details in [`Julia` Chapter](julia.md).
## Power9

The cluster `Power9` by IBM is based on Power9 CPUs and provides NVIDIA V100 GPUs.
`Power9` is specifically designed for machine learning (ML) tasks.
Further details in [`Power9` Chapter](power9.md).