diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md
index 561d0ed622ae6b514866404a38ee5bc7d2f0c4ba..5a0e15613665ececfaa2d0915fd7022c742a9288 100644
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md
@@ -1,22 +1,17 @@
 # Alpha Centauri

-The multi-GPU sub-cluster "Alpha Centauri" had been installed for AI-related computations (ScaDS.AI).
-It has 34 nodes, each with:
+The multi-GPU sub-cluster "Alpha Centauri" has been installed for AI-related computations (ScaDS.AI).

-* 8 x NVIDIA A100-SXM4 (40 GB RAM)
-* 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz with multi-threading enabled
-* 1 TB RAM
-* 3.5 TB `/tmp` local NVMe device
-* Hostnames: `taurusi[8001-8034]`
-* Slurm partition `alpha` for batch jobs and `alpha-interactive` for interactive jobs
+The hardware specification is documented on the page
+[HPC Resources](hardware_overview.md#amd-rome-cpus-nvidia-a100).
+
+## Usage

 !!! note

     The NVIDIA A100 GPUs may only be used with **CUDA 11** or later. Earlier versions do not
     recognize the new hardware properly. Make sure the software you are using is built with
     CUDA11.

-## Usage
-
 ### Modules

 The easiest way is using the [module system](../software/modules.md).
diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md
index 7405eec766fa216e5eccdab2b4a1856ca5f98b2b..1d06a620e89ae43b796286e6add849644beae530 100644
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md
@@ -1,6 +1,6 @@
 # HPC Resources

-HPC resources in ZIH systems comprises the *High Performance Computing and Storage Complex* and its
+HPC resources in ZIH systems comprise the *High Performance Computing and Storage Complex* and its
 extension *High Performance Computing – Data Analytics*. In total it offers scientists about
 60,000 CPU cores and a peak performance of more than 1.5 quadrillion floating point operations per
 second. The architecture specifically tailored to data-intensive computing, Big Data
@@ -8,80 +8,76 @@ analytics, and artificial intelligence methods with extensive capabilities for e
 and performance monitoring provides ideal conditions to achieve the ambitious research goals of the
 users and the ZIH.

-## Login Nodes
+## Login and Export Nodes

-- Login-Nodes (`tauruslogin[3-6].hrsk.tu-dresden.de`)
-  - each with 2x Intel(R) Xeon(R) CPU E5-2680 v3 each with 12 cores
-    2.50GHz, Multithreading Disabled, 64 GB RAM, 128 GB SSD local disk
+- 4 Login-Nodes `tauruslogin[3-6].hrsk.tu-dresden.de`
+  - Each login node is equipped with 2 x Intel(R) Xeon(R) CPU E5-2680 v3 with 24 cores in total @
+    2.50 GHz, Multithreading disabled, 64 GB RAM, 128 GB SSD local disk
   - IPs: 141.30.73.\[102-105\]
-- Transfer-Nodes (`taurusexport[3-4].hrsk.tu-dresden.de`, DNS Alias
-  `taurusexport.hrsk.tu-dresden.de`)
-  - 2 Servers without interactive login, only available via file transfer protocols (`rsync`, `ftp`)
-  - IPs: 141.30.73.82/83
-- Direct access to these nodes is granted via IP whitelisting (contact
-  hpcsupport@zih.tu-dresden.de) - otherwise use TU Dresden VPN.
-
-!!! warning "Run time limit"
-
-    Any process on login nodes is stopped after 5 minutes.
+- 2 Data-Transfer-Nodes `taurusexport[3-4].hrsk.tu-dresden.de`
+  - DNS Alias `taurusexport.hrsk.tu-dresden.de`
+  - 2 Servers without interactive login, only available via file transfer protocols
+    (`rsync`, `ftp`)
+  - IPs: 141.30.73.\[82,83\]
+  - Further information on the usage is documented on the page
+    [Export Nodes](../data_transfer/export_nodes.md)

 ## AMD Rome CPUs + NVIDIA A100

 - 34 nodes, each with
-  - 8 x NVIDIA A100-SXM4
+  - 8 x NVIDIA A100-SXM4 Tensor Core GPUs
   - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, Multithreading disabled
   - 1 TB RAM
-  - 3.5 TB local memory at NVMe device at `/tmp`
+  - 3.5 TB local memory on NVMe device at `/tmp`
 - Hostnames: `taurusi[8001-8034]`
-- Slurm partition `alpha`
-- Dedicated mostly for ScaDS-AI
+- Slurm partition: `alpha`
+- Further information on the usage is documented on the page [Alpha Centauri](alpha_centauri.md)

 ## Island 7 - AMD Rome CPUs

 - 192 nodes, each with
-  - 2x AMD EPYC CPU 7702 (64 cores) @ 2.0GHz, Multithreading
-    enabled,
+  - 2 x AMD EPYC CPU 7702 (64 cores) @ 2.0 GHz, Multithreading enabled
   - 512 GB RAM
-  - 200 GB /tmp on local SSD local disk
+  - 200 GB local memory on SSD at `/tmp`
 - Hostnames: `taurusi[7001-7192]`
-- Slurm partition `romeo`
-- More information under [Rome Nodes](rome_nodes.md)
+- Slurm partition: `romeo`
+- Further information on the usage is documented on the page [AMD Rome Nodes](rome_nodes.md)

 ## Large SMP System HPE Superdome Flex

 - 1 node, with
-  - 32 x Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20GHz (28 cores)
+  - 32 x Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20 GHz (28 cores)
   - 47 TB RAM
-- Currently configured as one single node
+- Configured as one single node
+- 48 TB RAM (usable: 47 TB, as 1 TB is used for cache coherence protocols)
+- 370 TB of fast NVMe storage available at `/nvme/<projectname>`
 - Hostname: `taurussmp8`
-- Slurm partition `julia`
-- More information under [HPE SD Flex](sd_flex.md)
+- Slurm partition: `julia`
+- Further information on the usage is documented on the page [HPE Superdome Flex](sd_flex.md)

 ## IBM Power9 Nodes for Machine Learning

-For machine learning, we have 32 IBM AC922 nodes installed with this configuration:
+For machine learning, we have IBM AC922 nodes installed with this configuration:

 - 32 nodes, each with
   - 2 x IBM Power9 CPU (2.80 GHz, 3.10 GHz boost, 22 cores)
-  - 256 GB RAM DDR4 2666MHz
-  - 6x NVIDIA VOLTA V100 with 32GB HBM2
+  - 256 GB RAM DDR4 2666 MHz
+  - 6 x NVIDIA VOLTA V100 with 32 GB HBM2
   - NVLINK bandwidth 150 GB/s between GPUs and host
 - Hostnames: `taurusml[1-32]`
-- Slurm partition `ml`
+- Slurm partition: `ml`

 ## Island 6 - Intel Haswell CPUs

 - 612 nodes, each with
-  - 2x Intel(R) Xeon(R) CPU E5-2680 v3 (12 cores)
-    @ 2.50GHz, Multithreading disabled, 128 GB SSD local disk
-- Varying amounts of main memory (selected automatically by the batch
-  system for you according to your job requirements)
-  - 594 nodes with 2.67 GB RAM per core (64 GB total):
-    `taurusi[6001-6540,6559-6612]`
-  - 18 nodes with 10.67 GB RAM per core (256 GB total):
-    `taurusi[6541-6558]`
+  - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 (12 cores) @ 2.50 GHz, Multithreading disabled
+  - 128 GB local memory on SSD
+- Varying amounts of main memory (selected automatically by the batch system for you according to
+  your job requirements)
+  - 594 nodes with 2.67 GB RAM per core (64 GB in total): `taurusi[6001-6540,6559-6612]`
+  - 18 nodes with 10.67 GB RAM per core (256 GB in total): `taurusi[6541-6558]`
 - Hostnames: `taurusi[6001-6612]`
-- Slurm Partition `haswell`
+- Slurm partition: `haswell`

 ??? hint "Node topology"

@@ -91,23 +87,21 @@ For machine learning, we have 32 IBM AC922 nodes installed with this configurati

 ## Island 2 Phase 2 - Intel Haswell CPUs + NVIDIA K80 GPUs

 - 64 nodes, each with
-  - 2x Intel(R) Xeon(R) CPU E5-E5-2680 v3 (12 cores)
-    @ 2.50GHz, Multithreading Disabled
+  - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 (12 cores) @ 2.50 GHz, Multithreading disabled
   - 64 GB RAM (2.67 GB per core)
-  - 128 GB SSD local disk
-  - 4x NVIDIA Tesla K80 (12 GB GDDR RAM) GPUs
+  - 128 GB local memory on SSD
+  - 4 x NVIDIA Tesla K80 (12 GB GDDR RAM) GPUs
 - Hostnames: `taurusi[2045-2108]`
-- Slurm Partition `gpu2`
+- Slurm partition: `gpu2`
 - Node topology, same as [island 4 - 6](#island-6-intel-haswell-cpus)

 ## SMP Nodes - up to 2 TB RAM

 - 5 Nodes, each with
-  - 4x Intel(R) Xeon(R) CPU E7-4850 v3 (14 cores) @
-    2.20GHz, Multithreading Disabled
+  - 4 x Intel(R) Xeon(R) CPU E7-4850 v3 (14 cores) @ 2.20 GHz, Multithreading disabled
   - 2 TB RAM
 - Hostnames: `taurussmp[3-7]`
-- Slurm partition `smp2`
+- Slurm partition: `smp2`

 ??? hint "Node topology"

diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/rome_nodes.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/rome_nodes.md
index 530e2714668c0695e5fd57df8052c609bdc9a28f..905110c775721ded6ce280ef069b0b05e7ce146f 100644
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/rome_nodes.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/rome_nodes.md
@@ -1,13 +1,7 @@
 # Island 7 - AMD Rome Nodes

-## Hardware
-
-- Slurm partition: `romeo`
-- Module architecture: `rome`
-- 192 nodes `taurusi[7001-7192]`, each:
-  - 2x AMD EPYC CPU 7702 (64 cores) @ 2.0GHz, Simultaneous Multithreading (SMT)
-  - 512 GB RAM
-  - 200 GB SSD disk mounted on `/tmp`
+The hardware specification is documented on the page
+[HPC Resources](hardware_overview.md#island-7-amd-rome-cpus).

 ## Usage

diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/sd_flex.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/sd_flex.md
index 34505f93de1673aea883574459157b41c9f56357..27a39b06a6444ebd13a5a4e86c74cf1b17317e8d 100644
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/sd_flex.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/sd_flex.md
@@ -4,16 +4,10 @@ The HPE Superdome Flex is a large shared memory node. It is especially well suit
 intensive application scenarios, for example to process extremely large data sets completely in
 main memory or in very fast NVMe memory.

-## Configuration Details
+The hardware specification is documented on the page
+[HPC Resources](hardware_overview.md#large-smp-system-hpe-superdome-flex).

-- Hostname: `taurussmp8`
-- Access to all shared filesystems
-- Slurm partition `julia`
-- 32 x Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20GHz (28 cores)
-- 48 TB RAM (usable: 47 TB - one TB is used for cache coherence protocols)
-- 370 TB of fast NVME storage available at `/nvme/<projectname>`
-
-## Local Temporary NVMe Storage
+## Local Temporary Storage on NVMe

 There are 370 TB of NVMe devices installed. For immediate access for all projects, a volume of
 87 TB of fast NVMe storage is available at `/nvme/1/<projectname>`. A quota of
@@ -28,7 +22,13 @@ project's quota can be increased or dedicated volumes of up to the full capacity

 - Granularity should be a socket (28 cores)
 - Can be used for OpenMP applications with large memory demands
 - To use OpenMPI it is necessary to export the following environment
-  variables, so that OpenMPI uses shared memory instead of Infiniband
-  for message transport. `export OMPI_MCA_pml=ob1;   export OMPI_MCA_mtl=^mxm`
+  variables, so that OpenMPI uses shared memory instead of Infiniband
+  for message transport:
+
+  ```bash
+  export OMPI_MCA_pml=ob1
+  export OMPI_MCA_mtl=^mxm
+  ```
+
 - Use `I_MPI_FABRICS=shm` so that Intel MPI doesn't even consider
   using Infiniband devices itself, but only shared-memory instead
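+
+As a rough sketch, a batch job that runs one MPI rank per socket on the `julia` partition could
+look like the following. The executable name `./my_shmem_app` and the concrete resource numbers
+are only placeholders; adjust them to your application:
+
+```bash
+#!/bin/bash
+#SBATCH --partition=julia      # HPE Superdome Flex
+#SBATCH --nodes=1              # the system is a single node
+#SBATCH --ntasks=4             # e.g. 4 MPI ranks
+#SBATCH --cpus-per-task=28     # one full socket (28 cores) per rank
+#SBATCH --time=02:00:00
+
+# OpenMPI: use shared memory instead of Infiniband for message transport
+export OMPI_MCA_pml=ob1
+export OMPI_MCA_mtl=^mxm
+
+srun ./my_shmem_app            # placeholder executable
+```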