From fa31e4e80bb7550f8295478fc19a882e93e255c6 Mon Sep 17 00:00:00 2001
From: Ulf Markwardt <ulf.markwardt@tu-dresden.de>
Date: Mon, 30 Sep 2024 13:35:57 +0200
Subject: [PATCH] added Capella

---
 .../jobs_and_resources/hardware_overview.md | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md
index c8d49918b..7d702a54c 100644
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md
@@ -15,6 +15,7 @@ perspective, there will be **five separate clusters**:
 
 | Name | Description | Year of Installation | DNS |
 | ----------------------------------- | ----------------------| -------------------- | --- |
+| [`Capella`](#capella) | GPU cluster | 2024 | `c[1-144].capella.hpc.tu-dresden.de` |
 | [`Barnard`](#barnard) | CPU cluster | 2023 | `n[1001-1630].barnard.hpc.tu-dresden.de` |
 | [`Alpha Centauri`](#alpha-centauri) | GPU cluster | 2021 | `i[8001-8037].alpha.hpc.tu-dresden.de` |
 | [`Julia`](#julia) | Single SMP system | 2021 | `julia.hpc.tu-dresden.de` |
@@ -36,7 +37,7 @@ nodes** running on the same CPU. Job submission will be possible only from with
 All clusters will be integrated to the new InfiniBand fabric and have then the same access to
 the shared filesystems. This recabling will require a brief downtime of a few days.
 
-
+
 {: align=center}
 
 ### Compute Systems
@@ -133,6 +134,20 @@ and is designed for AI and ML tasks.
 - Operating system: Rocky Linux 8.7
 - Further information on the usage is documented on the site [GPU Cluster Alpha Centauri](alpha_centauri.md)
 
+## Capella
+
+The cluster `Capella` by MEGWARE provides AMD Genoa CPUs and NVIDIA H100 GPUs
+and is designed for AI and ML tasks.
+
+- 144 nodes, each with
+    - 4 x NVIDIA H100-SXM5 Tensor Core-GPUs
+    - 2 x AMD EPYC CPU 9334 (32 cores) @ 2.7 GHz, Multithreading disabled
+    - 768 GB RAM (24 x 32 GB TruDDR5, 4800 MHz)
+    - 800 GB local memory on NVMe device at `/tmp`
+- Login nodes: `login[1-2].capella.hpc.tu-dresden.de`
+- Hostnames: `c[1-144].capella.hpc.tu-dresden.de`
+- Operating system: Alma Linux 9.4
+
 ## Romeo
 
 The cluster `Romeo` is a general purpose cluster by NEC based on AMD Rome CPUs.
-- 
GitLab