diff --git a/doc.zih.tu-dresden.de/docs/access/jupyterhub.md b/doc.zih.tu-dresden.de/docs/access/jupyterhub.md index 3627573d09bca98013810b4861f8b1751240cdc9..79894da0d17ca651d3d402965128e7fe3d68c0b4 100644 --- a/doc.zih.tu-dresden.de/docs/access/jupyterhub.md +++ b/doc.zih.tu-dresden.de/docs/access/jupyterhub.md @@ -29,9 +29,6 @@ case. JupyterHub is available at [https://jupyterhub.hpc.tu-dresden.de](https://jupyterhub.hpc.tu-dresden.de). -Old taurus -[https://taurus.hrsk.tu-dresden.de/jupyter](https://taurus.hrsk.tu-dresden.de/jupyter). - ## Login Page At login page please use your ZIH credentials (without @tu-dresden.de). @@ -60,6 +57,9 @@ We have created three profiles for each cluster, namely: | Barnard | 1 core, 1.5 GB, 1 hour | x86_64 (Intel) | Python programming | | Barnard | 2 core, 3 GB, 4 hours | x86_64 (Intel) | Julia and R programming | | Barnard | 4 core, 8 GB, 8 hours | x86_64 (Intel) | | +| Capella | 1 core, 1.5 GB, 1 hour | x86_64 (AMD) | Python programming | +| Capella | 2 core, 3 GB, 4 hours | x86_64 (AMD) | R programming | +| Capella | 4 core, 8 GB, 8 hours | x86_64 (AMD) | | | Romeo | 1 core, 1.5 GB 1 hour | x86_64 (AMD) | Python programming | | Romeo | 2 core, 3 GB, 4 hours | x86_64 (AMD) | R programming | | Romeo | 4 core, 8 GB, 8 hours | x86_64 (AMD) | | @@ -106,6 +106,7 @@ At the following table it's possible to see what is available at each cluster. 
|----------------------|--------|----|---------|--------|------------|--------|-----| | Alpha | - | OK | OK | OK | OK* | OK* | - | | Barnard | OK | OK | OK | OK | OK* | OK* | - | +| Capella | OK | OK | OK | OK | - | - | - | | Romeo | - | OK | OK | OK | OK* | OK* | - | | VIS | OK | OK | OK | OK | OK* | OK* | OK | diff --git a/doc.zih.tu-dresden.de/docs/access/jupyterhub_custom_environments.md b/doc.zih.tu-dresden.de/docs/access/jupyterhub_custom_environments.md index 44e162a0d9377447a99694de53e2a3e4b22edecc..8e9f42b4ac9afaeda4a4f868802ea1ad81bfeb57 100644 --- a/doc.zih.tu-dresden.de/docs/access/jupyterhub_custom_environments.md +++ b/doc.zih.tu-dresden.de/docs/access/jupyterhub_custom_environments.md @@ -18,6 +18,7 @@ will work. Depending on that hardware, allocate resources as follows. |------------|-------------------------| | Alpha | x86_64 (AMD) | | Barnard | x86_64 (Intel) | +| Capella | x86_64 (AMD) | | Romeo | x86_64 (AMD) | | Power9 | ppc64le (IBM) | diff --git a/doc.zih.tu-dresden.de/docs/access/jupyterlab.md b/doc.zih.tu-dresden.de/docs/access/jupyterlab.md index 93d4f883ac9b6eb900adb46ff84185fa70e3f6ae..b1c200b15bd881089a306e7dc5840102596a9f6a 100644 --- a/doc.zih.tu-dresden.de/docs/access/jupyterlab.md +++ b/doc.zih.tu-dresden.de/docs/access/jupyterlab.md @@ -23,7 +23,7 @@ Wait until you receive a message with further instructions on how to connect to ```console Message from marie@login on <no tty> at 14:22 ... 
- At your local machine, run: + At your local machine, run: ssh marie@login.<cluster>.hpc.tu-dresden.de -NL 8946:<node>:8138 and point your browser to http://localhost:8946/?token=M7SHy...5HnsY...GMaRj0e2X @@ -33,7 +33,7 @@ Wait until you receive a message with further instructions on how to connect to ### Access with X11 forwarding -=== "alpha" +=== "Alpha Centauri" ```console marie@local$ ssh -XC marie@login1.alpha.hpc.tu-dresden.de @@ -44,7 +44,7 @@ Wait until you receive a message with further instructions on how to connect to marie@compute$ jupyter lab -y ``` -=== "barnard" +=== "Barnard" ```console marie@local$ ssh -XC marie@login1.barnard.hpc.tu-dresden.de @@ -55,7 +55,18 @@ Wait until you receive a message with further instructions on how to connect to marie@compute$ jupyter lab -y ``` -=== "romeo" +=== "Capella" + + ```console + marie@local$ ssh -XC marie@login1.capella.hpc.tu-dresden.de + marie@login$ module load release/24.04 GCCcore/12.2.0 + marie@login$ module load Python/3.10.8 + marie@login$ source /software/util/JupyterLab/barnard/jupyterlab-4.0.4/bin/activate + marie@login$ srun --nodes=1 --ntasks=1 --cpus-per-task=4 --mem-per-cpu=8192 --x11 --pty --gres=gpu:1 bash -l + marie@compute$ jupyter lab -y + ``` + +=== "Romeo" ```console marie@local$ ssh -XC marie@login1.romeo.hpc.tu-dresden.de @@ -66,7 +77,7 @@ Wait until you receive a message with further instructions on how to connect to marie@compute$ jupyter lab -y ``` -=== "visualization" +=== "Visualization" ```console marie@local$ ssh -XC marie@login4.barnard.hpc.tu-dresden.de diff --git a/doc.zih.tu-dresden.de/docs/access/key_fingerprints.md b/doc.zih.tu-dresden.de/docs/access/key_fingerprints.md index 4b980ad5d0c400098b5da70c0a519afeb6b84417..a2002499745ad82f017a512014bcdc63c27ed31a 100644 --- a/doc.zih.tu-dresden.de/docs/access/key_fingerprints.md +++ b/doc.zih.tu-dresden.de/docs/access/key_fingerprints.md @@ -56,7 +56,7 @@ connecting, see example below.) 
## Romeo -The cluster [`Romeo`](../jobs_and_resources/romeo.md) can be accessed via the two +The cluster [`Romeo`](../jobs_and_resources/hardware_overview.md#romeo) can be accessed via the two login nodes `login[1-2].romeo.hpc.tu-dresden.de`. (Please choose one concrete login node when connecting, see example below.) @@ -70,12 +70,28 @@ connecting, see example below.) |ED25519 | `MD5:e4:4e:7a:76:aa:87:da:17:92:b1:17:c6:a1:25:29:7e` | {: summary="List of valid fingerprints for Romeo login[1-2] node"} -## Alpha Centauri +## Capella -The cluster [`Alpha Centauri`](../jobs_and_resources/alpha_centauri.md) can be accessed via the two -login nodes `login[1-2].alpha.hpc.tu-dresden.de`. (Please choose one concrete login node when +The cluster [`Capella`](../jobs_and_resources/hardware_overview.md#capella) can be accessed via the two +login nodes `login[1-2].capella.hpc.tu-dresden.de`. (Please choose one concrete login node when connecting, see example below.) +| Key Type | Fingerprint | +|:---------|:------------------------------------------------------| +|RSA | `` | +|RSA | `` | +|ECDSA | `` | +|ECDSA | `` | +|ED25519 | `` | +|ED25519 | `` | +{: summary="List of valid fingerprints for Capella login[1-2] node"} + +## Alpha Centauri + +The cluster [`Alpha Centauri`](../jobs_and_resources/hardware_overview.md#alpha-centauri) can be +accessed via the two login nodes `login[1-2].alpha.hpc.tu-dresden.de`. (Please choose one concrete +login node when connecting, see example below.) + | Key Type | Fingerprint | |:---------|:------------------------------------------------------| | RSA | `SHA256:BvYEYJtIYDGr3U0up58q5F7aog7JA2RP+w53XKmwO8I` | @@ -88,7 +104,7 @@ connecting, see example below.) ## Julia -The cluster [`Julia`](../jobs_and_resources/julia.md) can be accessed via `julia.hpc.tu-dresden.de`. +The cluster [`Julia`](../jobs_and_resources/hardware_overview.md#julia) can be accessed via `julia.hpc.tu-dresden.de`. (Note, there is no separate login node.) 
| Key Type | Fingerprint | diff --git a/doc.zih.tu-dresden.de/docs/access/ssh_login.md b/doc.zih.tu-dresden.de/docs/access/ssh_login.md index 38f3b3046e4c2fc6bb6713fdabe5bed4f32c73cc..00cf3b7cb7e784777ea326a94d542a28158a13be 100644 --- a/doc.zih.tu-dresden.de/docs/access/ssh_login.md +++ b/doc.zih.tu-dresden.de/docs/access/ssh_login.md @@ -118,8 +118,8 @@ for more information on Dataport nodes. !!! note "Gernalization to all HPC systems" In the above `.ssh/config` file, the HPC system `Barnard` is chosen as an example. - The very same settings can be made for individuall or all ZIH systems, e.g. `Alpha`, `Julia`, - `Romeo` etc. + The very same settings can be made for individual or all ZIH systems, e.g. `Capella`, `Alpha`, + `Julia`, `Romeo` etc. ## X11-Forwarding diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/working.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/working.md index 6c514c7737e7fe70e30e68c39a73942c2ed0706b..ad8cbbc93996c2fcb1ac777a496f886cc7286be7 100644 --- a/doc.zih.tu-dresden.de/docs/data_lifecycle/working.md +++ b/doc.zih.tu-dresden.de/docs/data_lifecycle/working.md @@ -10,6 +10,7 @@ performance and permanence. | `Lustre` | `/data/walrus` | 20 PB | global | Only accessible via [Workspaces](workspaces.md). For moderately low bandwidth, low IOPS. Mounted read-only on compute nodes. | | `WEKAio` | `/data/weasel` | 1 PB | global (w/o Power) | *Coming 2024!* For high IOPS | | `ext4` | `/tmp` | 95 GB | node local | Systems: tbd. Is cleaned up after the job automatically. | +| `WEKAio` | `/data/cat` | 1 PB | only Capella | For high IOPS. Only available on [`Capella`](../jobs_and_resources/hardware_overview.md#capella). 
| ## Recommendations for Filesystem Usage diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md index 66fcbf7d83e8bb06ff82521afad82a927afe9ea9..0e671a3c5a2a34604bc476a96b45a3a9f065d647 100644 --- a/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md +++ b/doc.zih.tu-dresden.de/docs/data_lifecycle/workspaces.md @@ -35,6 +35,7 @@ renewals are provided in the following table. |:------------------------------------------------------------|---------------:|-----------:|---------:| | `horse` | 100 | 10 | 30 | | `walrus` | 100 | 10 | 60 | +| `cat` | | 2 | 30 | {: summary="Settings for Workspace Filesystems."} !!! note @@ -67,6 +68,16 @@ provides information which filesystem is available on which cluster. walrus ``` +=== "Capella" + + ```console + marie@login.capella$ ws_list -l + available filesystems: + horse + walrus + cat (default) + ``` + === "Romeo" ```console diff --git a/doc.zih.tu-dresden.de/docs/data_transfer/datamover.md b/doc.zih.tu-dresden.de/docs/data_transfer/datamover.md index 26388b51b16fdd6781eeff79a1051f7544d1be60..7e20ab7c07eb2656414b4d3f2ceb169928482768 100644 --- a/doc.zih.tu-dresden.de/docs/data_transfer/datamover.md +++ b/doc.zih.tu-dresden.de/docs/data_transfer/datamover.md @@ -31,11 +31,12 @@ To identify the mount points of the different filesystems on the data transfer m | Directory on Datamover | Mounting Clusters | Directory on Cluster | |:----------- |:--------- |:-------- | -| `/home` | Alpha,Barnard,Julia,Power9,Romeo | `/home` | -| `/projects` | Alpha,Barnard,Julia,Power9,Romeo | `/projects` | -| `/data/horse` | Alpha,Barnard,Julia,Power9,Romeo | `/data/horse` | -| `/data/walrus` | Alpha,Barnard,Julia,Power9 | `/data/walrus` | -| `/data/octopus` | Alpha,Barnard,Power9,Romeo | `/data/octopus` | +| `/home` | Alpha,Barnard,Capella,Julia,Power9,Romeo | `/home` | +| `/projects` | Alpha,Barnard,Capella,Julia,Power9,Romeo | `/projects` | +| `/data/horse` | 
Alpha,Barnard,Capella,Julia,Power9,Romeo | `/data/horse` | +| `/data/walrus` | Alpha,Barnard,Capella,Julia,Power9 | `/data/walrus` | +| `/data/octopus` | Alpha,Barnard,Capella,Power9,Romeo | `/data/octopus` | +| `/data/cat` | Capella | `/data/cat` | | `/data/archiv` | | | ## Usage of Datamover diff --git a/doc.zih.tu-dresden.de/docs/index.md b/doc.zih.tu-dresden.de/docs/index.md index 02ea407408a8a7e4f7151ef10faef788d2e63ca6..2c7041ffd5ddb516e1c2a01c3851f83d342cce49 100644 --- a/doc.zih.tu-dresden.de/docs/index.md +++ b/doc.zih.tu-dresden.de/docs/index.md @@ -31,6 +31,8 @@ Please also find out the other ways you could contribute in our ## News +* **2024-11-08** Early access phase of the + [new GPU cluster `Capella`](jobs_and_resources/capella.md) started * **2024-11-04** Slides from the HPC Introduction tutorial in October 2024 are available for [download now](misc/HPC-Introduction.pdf) diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md index 3b2fb5502b7963bbe360598c05cdf5f7150a1e3a..986f7d93a305f239654a51193a1e25f41693bb0f 100644 --- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md +++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md @@ -1,7 +1,11 @@ # GPU Cluster Alpha Centauri +## Overview + The multi-GPU cluster `Alpha Centauri` has been installed for AI-related computations (ScaDS.AI). +## Hardware Specification + The hardware specification is documented on the page [HPC Resources](hardware_overview.md#alpha-centauri). @@ -20,6 +24,8 @@ Since 5th July 2024, `Alpha Centauri` is fully integrated in the InfiniBand infr There is a total of 48 physical cores in each node. SMT is also active, so in total, 96 logical cores are available per node. +Each node on the cluster `Alpha` has 2x AMD EPYC CPUs, 8x NVIDIA +A100-SXM4 GPUs, 1 TB RAM and 3.5 TB local space (`/tmp`) on an NVMe device. !!! 
note diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/capella.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/capella.md new file mode 100644 index 0000000000000000000000000000000000000000..8cffc84b2975c65560a10f3d606a8abc41f72a2e --- /dev/null +++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/capella.md @@ -0,0 +1,123 @@ +# GPU Cluster Capella + +!!! warning "Acceptance phase" + + The cluster `Capella` is currently in the acceptance phase, i.e., + interruptions, reboots without notice, and node failures are possible. Furthermore, the system's + configuration might be adjusted further. + + Do not yet move your "production" to `Capella`, but feel free to test it using moderately sized + workloads. Please read this page carefully to understand what you need to adapt in your + existing workflows w.r.t. [filesystem](#filesystems), [software and + modules](#software-and-modules) and [batch jobs](#batchsystem). + + We highly appreciate your hints and would be pleased to receive your comments and experiences + regarding its operation via e-mail to + [hpc-support@tu-dresden.de](mailto:hpc-support@tu-dresden.de) using the subject + *Capella: <subject>*. + + Please understand that our current priority is the acceptance, configuration and rollout of + the system. Consequently, we are unable to address any support requests at this time. + +## Overview + +The multi-GPU cluster `Capella` has been installed for AI-related computations and traditional +HPC simulations. Capella is fully integrated into the ZIH HPC infrastructure. +Therefore, the usage should be similar to the other clusters. + +## Hardware Specifications + +The hardware specification is documented on the page +[HPC Resources](hardware_overview.md#capella). + +## Access and Login Nodes + +You use `login[1-2].capella.hpc.tu-dresden.de` to access the cluster `Capella` from the campus +(or VPN) network. 
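+For instance, a connection from a local machine could look like the following (the concrete login
+node number is interchangeable):
+
+```console
+marie@local$ ssh marie@login1.capella.hpc.tu-dresden.de
+```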
+In order to verify the SSH fingerprints of the login nodes, please refer to the page +[Key Fingerprints](../access/key_fingerprints.md#capella). + +On the login nodes you have access to the same filesystems and the software stack +as on the compute nodes. GPUs are **not** available there. + +In the subsections [Filesystems](#filesystems) and [Software and Modules](#software-and-modules) we +provide further information on these two topics. + +## Filesystems + +As with all other clusters, your `/home` directory is also available on `Capella`. +For reasons of convenience, the filesystems `horse` and `walrus` are also accessible. +Please note that the filesystem `horse` **is not to be used** as the working filesystem on the cluster +`Capella`. + +With `Capella` comes the **new filesystem `cat`** designed to meet the high I/O requirements of AI +and ML workflows. It is a WEKAio filesystem and mounted under `/data/cat`. It is **only available** +on the cluster `Capella` and the [Datamover nodes](../data_transfer/datamover.md). + +!!! hint "Main working filesystem is `cat`" + + The filesystem `cat` should be used as the + main working filesystem and has to be used with [workspaces](../data_lifecycle/file_systems.md). + Workspaces on the filesystem `cat` can only be created on the login and compute nodes, not on + the other clusters since `cat` is not available there. + +Nevertheless, all other [filesystems](../data_lifecycle/workspaces.md) +(`/home`, `/software`, `/data/horse`, `/data/walrus`, etc.) are also available. + +!!! hint "Data transfer to and from `/data/cat`" + + Please utilize the new filesystem `cat` as the working filesystem on `Capella`. It has limited + capacity, so we advise you to only hold hot data on `cat`. + To transfer input and result data from and to the filesystems `horse` and `walrus`, respectively, + you will need to use the [Datamover nodes](../data_transfer/datamover.md).
Regardless of the + direction of transfer, you should pack your data into archives (e.g., using the `dttar` command) + for the transfer. + + **Do not** invoke data transfer to the filesystems `horse` and `walrus` from login nodes. + Both login nodes are part of the cluster. Failures, reboots and other work + might affect your data transfer, resulting in data corruption. + +## Software and Modules + +The most straightforward method for utilizing the software is through the well-known +[module system](../software/modules.md). +All software available from the module system has been **specifically built** for the cluster +`Capella`, i.e., with optimization for the Zen4 (Genoa) microarchitecture and CUDA support enabled. + +### Python Virtual Environments + +[Virtual environments](../software/python_virtual_environments.md) allow you to install +additional Python packages and create an isolated runtime environment. We recommend using +`venv` for this purpose. + +!!! hint "Virtual environments in workspaces" + + We recommend using [workspaces](../data_lifecycle/workspaces.md) for your virtual environments. + +## Batchsystem + +The batch system Slurm may be used as usual. Please refer to the page [Batch System Slurm](slurm.md) +for detailed information. In addition, the page [Job Examples](slurm_examples.md#requesting-gpus) +provides examples of GPU allocation with Slurm. + +You can find out about upcoming reservations (e.g., for acceptance benchmarks) via `sinfo -T`. +Acceptance has priority, so your reservation requests cannot currently be considered. + +!!! note "Slurm limits and job runtime" + + Although each compute node is equipped with 64 CPU cores in total, only a **maximum of 56** can + be requested via Slurm + (cf. [Slurm Resource Limits Table](slurm_limits.md#slurm-resource-limits-table)). + + The **maximum runtime** of jobs and interactive sessions is currently 24 hours. 
However, to + allow for greater fluctuation in testing, please make the jobs shorter if possible. You can use + [Chain Jobs](slurm_examples.md#chain-jobs) to split a long-running job exceeding the batch queue's + limits into parts and chain these parts. Applications with built-in checkpoint/restart + functionality are very suitable for this approach! If your application provides + checkpoint/restart, please use `/data/cat` for temporary data and remove the data afterwards! + +The partition `capella-interactive` can be used for your small tests and compilation of software. +To address this partition, add `#SBATCH --partition=capella-interactive` to your job file or pass +`--partition=capella-interactive` on the `sbatch`, `srun`, or `salloc` command line. +The partition configuration might be adapted during the acceptance phase. +You get the current settings via `scontrol show partition capella-interactive`. diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md index a7e41719deb1587eaadeb55f154efe1ae452a0be..f2be5d9f79c532b9d7feb2bfd05580353351f687 100644 --- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md +++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md @@ -8,20 +8,6 @@ analytics, and artificial intelligence methods with extensive capabilities for e and performance monitoring provides ideal conditions to achieve the ambitious research goals of the users and the ZIH. 
-HPC resources at ZIH comprise a total of **six systems**: - -| Name | Description | Year of Installation | DNS | -| ----------------------------------- | ----------------------| -------------------- | --- | -| [`Capella`](#capella) | GPU cluster | 2024 | `c[1-144].capella.hpc.tu-dresden.de` | -| [`Barnard`](#barnard) | CPU cluster | 2023 | `n[1001-1630].barnard.hpc.tu-dresden.de` | -| [`Alpha Centauri`](#alpha-centauri) | GPU cluster | 2021 | `i[8001-8037].alpha.hpc.tu-dresden.de` | -| [`Julia`](#julia) | Single SMP system | 2021 | `julia.hpc.tu-dresden.de` | -| [`Romeo`](#romeo) | CPU cluster | 2020 | `i[7001-7186].romeo.hpc.tu-dresden.de` | -| [`Power9`](#power9) | IBM Power/GPU cluster | 2018 | `ml[1-29].power9.hpc.tu-dresden.de` | - -All clusters will run with their own [Slurm batch system](slurm.md) and job submission is possible -only from their respective login nodes. - ## Architectural Design Over the last decade we have been running our HPC system of high heterogeneity with a single @@ -38,10 +24,24 @@ permanent filesystems on the page [Filesystems](../data_lifecycle/file_systems.m  {: align=center} +HPC resources at ZIH comprise a total of **six systems**: + +| Name | Description | Year of Installation | DNS | +| ----------------------------------- | ----------------------| -------------------- | --- | +| [`Capella`](#capella) | GPU cluster | 2024 | `c[1-144].capella.hpc.tu-dresden.de` | +| [`Barnard`](#barnard) | CPU cluster | 2023 | `n[1001-1630].barnard.hpc.tu-dresden.de` | +| [`Alpha Centauri`](#alpha-centauri) | GPU cluster | 2021 | `i[8001-8037].alpha.hpc.tu-dresden.de` | +| [`Julia`](#julia) | Single SMP system | 2021 | `julia.hpc.tu-dresden.de` | +| [`Romeo`](#romeo) | CPU cluster | 2020 | `i[7001-7186].romeo.hpc.tu-dresden.de` | +| [`Power9`](#power9) | IBM Power/GPU cluster | 2018 | `ml[1-29].power9.hpc.tu-dresden.de` | + +All clusters will run with their own [Slurm batch system](slurm.md) and job submission is possible +only from their 
respective login nodes. + +## Login and Dataport Nodes - Login-Nodes - - Individual for each cluster. See sections below. + - Individual for each cluster. See the specifics in each cluster chapter. - 2 Data-Transfer-Nodes - 2 servers without interactive login, only available via file transfer protocols (`rsync`, `ftp`) @@ -52,8 +52,7 @@ permanent filesystems on the page [Filesystems](../data_lifecycle/file_systems.m ## Barnard -The cluster `Barnard` is a general purpose cluster by Bull. It is based on Intel Sapphire Rapids -CPUs. +The cluster `Barnard` is a general purpose cluster by Bull. It is based on Intel Sapphire Rapids CPUs. - 630 nodes, each with - 2 x Intel Xeon Platinum 8470 (52 cores) @ 2.00 GHz, Multithreading enabled @@ -92,6 +91,7 @@ and is designed for AI and ML tasks. - Login nodes: `login[1-2].capella.hpc.tu-dresden.de` - Hostnames: `c[1-144].capella.hpc.tu-dresden.de` - Operating system: Alma Linux 9.4 +- Further information on the usage is documented on the page [GPU Cluster Capella](capella.md) diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/julia.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/julia.md index e193e54aaad3da39147e379474ecd094486393b7..f95287ec52565e542ed1b4335d4b751ae005ed4b 100644 --- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/julia.md +++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/julia.md @@ -1,5 +1,7 @@ # SMP Cluster Julia +## Overview + The HPE Superdome Flex is a large shared memory node. It is especially well suited for data intensive application scenarios, for example to process extremely large data sets completely in main memory or in very fast NVMe memory. 
diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/architecture_2024.png b/doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/architecture_2024.png index 5d4a57dd95206fddb7677e46d0ae350d081f741f..54f9f8172319f350572bf4800c85a98ebbcf794c 100644 Binary files a/doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/architecture_2024.png and b/doc.zih.tu-dresden.de/docs/jobs_and_resources/misc/architecture_2024.png differ diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/overview.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/overview.md index 9fcb3e1a1c285b8c54ed7af7c2f5d1ff077ac28f..dd7b090ea2da95b24a5765fc3b1c4f65bda23db7 100644 --- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/overview.md +++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/overview.md @@ -17,11 +17,13 @@ components. ## Selection of Suitable Hardware -The six clusters `Capella`, [`Barnard`](hardware_overview.md#barnard), -[`Alpha Centauri`](alpha_centauri.md), -[`Romeo`](romeo.md), +The six clusters +[`Barnard`](hardware_overview.md#barnard), +[`Alpha Centauri`](hardware_overview.md#alpha-centauri), +[`Capella`](hardware_overview.md#capella), +[`Romeo`](hardware_overview.md#romeo), [`Power9`](hardware_overview.md#power9) and -[`Julia`](julia.md) +[`Julia`](hardware_overview.md#julia) differ, among others, in number of nodes, cores per node, and GPUs and memory. The particular [characteristica](hardware_overview.md) qualify them for different applications. diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/power9.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/power9.md index 915a93c521de0f78ab6594571240da805eb63c20..113c258ae7be7a967fb87638648d889f428089a1 100644 --- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/power9.md +++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/power9.md @@ -1,5 +1,7 @@ # GPU Cluster Power9 +## Overview + The multi-GPU cluster `Power9` was installed in 2018. 
Until the end of 2023, it was available as partition `power` within the now decommissioned `Taurus` system. With the decommission of `Taurus`, `Power9` has been re-engineered and is now a homogeneous, standalone cluster with own @@ -7,9 +9,34 @@ partition `power` within the now decommissioned `Taurus` system. With the decomm ## Hardware Resources -The hardware specification is documented on the page [HPC Resources](hardware_overview.md#power9). +The hardware specification of the cluster `Power9` is documented on the page +[HPC Resources](hardware_overview.md#power9). + +We provide additional architectural information in the following. +The compute nodes of the cluster `Power9` are based on the +[Power9 architecture](https://www.ibm.com/it-infrastructure/power/power9) from IBM. +The system was created for AI challenges, analytics, data-intensive workloads, and +accelerated databases. + +The main feature of the nodes is the ability to work with the +[NVIDIA Tesla V100](https://www.nvidia.com/en-gb/data-center/tesla-v100/) GPU with **NVLink** +support, which allows a total bandwidth of up to 300 GB/s. Each node on the +cluster `Power9` has six Tesla V100 GPUs. You can find a detailed specification of the cluster in our +[Power9 documentation](../jobs_and_resources/hardware_overview.md#power9). + +!!! note + + The cluster `Power9` is based on the PPC64 architecture, which means that the software built + for x86_64 will not work on this cluster. ## Usage +### Containers + If you want to use containers on `Power9`, please refer to the page -[Singularity for Power9 Architecuture](../software/singularity_power9.md). +[Singularity for Power9 Architecture](../software/singularity_power9.md). + +### Power AI + +IBM provides tools related to AI tasks that work on the cluster `Power9`. +For more information, see our [Power AI documentation](../software/power_ai.md). 
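+As the architecture note above implies, you can quickly verify which architecture you are running
+on, e.g., when in doubt whether a prebuilt binary will work here (illustrative session):
+
+```console
+marie@login.power9$ uname -m
+ppc64le
+```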
diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md index 66d06f557678db40abfad7a39cac328b62d12e71..37b7a74bc410217255182cee45e38ad64d9ffb98 100644 --- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md +++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md @@ -109,8 +109,8 @@ But, do you need to request tasks or CPUs from Slurm in order to provide resourc Slurm will allocate one or many GPUs for your job if requested. Please note that GPUs are only available in the GPU clusters, like -[Alpha Centauri](hardware_overview.md#alpha-centauri) and -[Power9](hardware_overview.md#power9). +[`Alpha`](hardware_overview.md#alpha-centauri), [`Capella`](hardware_overview.md#capella) +and [`Power9`](hardware_overview.md#power9). The option for `sbatch/srun` in this case is `--gres=gpu:[NUM_PER_NODE]`, where `NUM_PER_NODE` is the number of GPUs **per node** that will be used for the job. @@ -144,7 +144,8 @@ This is because we do not wish that GPUs become unusable due to all cores on a n a single job which does not, at the same time, request all GPUs. E.g., if you specify `--gres=gpu:2`, your total number of cores per node (meaning: -`ntasks`*`cpus-per-task`) may not exceed 12 on [`Alpha Centauri`](alpha_centauri.md). +`ntasks`*`cpus-per-task`) may not exceed 12 on [`Alpha`](alpha_centauri.md) or 28 on +[`Capella`](capella.md). Note that this also has implications for the use of the `--exclusive` parameter. Since this sets the number of allocated cores to the maximum, you also **must** request all GPUs @@ -168,8 +169,8 @@ srun: error: Unable to allocate resources: Job violates accounting/QOS policy (j ### Running Multiple GPU Applications Simultaneously in a Batch Job Our starting point is a (serial) program that needs a single GPU and four CPU cores to perform its -task (e.g. TensorFlow). 
The following batch script shows how to run such a job on the cluster -`Power9`. +task (e.g. TensorFlow). The following batch script shows how to run such a job on any of +the GPU clusters `Power9`, `Alpha` or `Capella`. !!! example @@ -181,7 +182,6 @@ task (e.g. TensorFlow). The following batch script shows how to run such a job o #SBATCH --gpus-per-task=1 #SBATCH --time=01:00:00 #SBATCH --mem-per-cpu=1443 - #SBATCH --partition=power9 srun some-gpu-application ``` @@ -224,7 +224,6 @@ three things: #SBATCH --gpus-per-task=1 #SBATCH --time=01:00:00 #SBATCH --mem-per-cpu=1443 - #SBATCH --partition=power9 srun --exclusive --gres=gpu:1 --ntasks=1 --cpus-per-task=4 --gpus-per-task=1 --mem-per-cpu=1443 some-gpu-application & srun --exclusive --gres=gpu:1 --ntasks=1 --cpus-per-task=4 --gpus-per-task=1 --mem-per-cpu=1443 some-gpu-application & @@ -251,7 +250,7 @@ enough resources in total were specified in the header of the batch script. Jobs on ZIH systems run, by default, in shared-mode, meaning that multiple jobs (from different users) can run at the same time on the same compute node. Sometimes, this behavior is not desired -(e.g. for benchmarking purposes). You can request for exclusive usage of resources using the Slurm +(e.g. for benchmarking purposes). You can request exclusive usage of resources using the Slurm parameter `--exclusive`. !!! note "Exclusive does not allocate all available resources" diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_limits.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_limits.md index 1105dac4cfa60b68ebbf9ebc02679b47eed85dee..9360a7fc023ea26ed3f3de5d94bcb1ed921415f6 100644 --- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_limits.md +++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_limits.md @@ -74,13 +74,14 @@ operating system and other components reside in the main memory, lowering the av jobs. The reserved amount of memory for the system operation might vary slightly over time. 
The following table depicts the resource limits for [all our HPC systems](hardware_overview.md). -| HPC System | Nodes | # Nodes | Cores per Node | Threads per Core | Memory per Node [in MB] | Memory per (SMT) Core [in MB] | GPUs per Node | Cores per GPU | Job Max Time | +| HPC System | Nodes | # Nodes | Cores per Node | Threads per Core | Memory per Node [in MB] | Memory per (SMT) Core [in MB] | GPUs per Node | Cores per GPU | Job Max Time [in days] | |:-----------|:------|--------:|---------------:|-----------------:|------------------------:|------------------------------:|--------------:|--------------:|-------------:| -| [`Barnard`](hardware_overview.md#barnard) | `n[1001-1630].barnard` | 630 | 104 | 2 | 515,000 | 4,951 | - | - | unlimited | -| [`Alpha Centauri`](alpha_centauri.md) | `i[8001-8037].alpha` | 37 | 48 | 2 | 990,000 | 10,312 | 8 | 6 | unlimited | -| [`Julia`](julia.md) | `julia` | 1 | 896 | 1 | 48,390,000 | 54,006 | - | - | unlimited | -| [`Romeo`](romeo.md) | `i[7001-7186].romeo` | 186 | 128 | 2 | 505,000 | 1,972 | - | - | unlimited | -| [`Power9`](hardware_overview.md#power9) | `ml[1-29].power9` | 29 | 44 | 4 | 254,000 | 1,443 | 6 | - | unlimited | +| [`Capella`](hardware_overview.md#capella) | `c[1-144].capella` | 144 | 56 | 1 | 768,000 | 13,438 | 4 | 14 | 1 | +| [`Barnard`](hardware_overview.md#barnard) | `n[1001-1630].barnard` | 630 | 104 | 2 | 515,000 | 4,951 | - | - | unlimited | +| [`Alpha Centauri`](hardware_overview.md#alpha-centauri) | `i[8001-8037].alpha` | 37 | 48 | 2 | 990,000 | 10,312 | 8 | 6 | unlimited | +| [`Julia`](hardware_overview.md#julia) | `julia` | 1 | 896 | 1 | 48,390,000 | 54,006 | - | - | unlimited | +| [`Romeo`](hardware_overview.md#romeo) | `i[7001-7186].romeo` | 186 | 128 | 2 | 505,000 | 1,972 | - | - | unlimited | +| [`Power9`](hardware_overview.md#power9) | `ml[1-29].power9` | 29 | 44 | 4 | 254,000 | 1,443 | 6 | - | unlimited | {: summary="Slurm resource limits table" align="bottom"} All HPC systems have 
Simultaneous Multithreading (SMT) enabled. You request for this
diff --git a/doc.zih.tu-dresden.de/docs/software/compilers.md b/doc.zih.tu-dresden.de/docs/software/compilers.md
index 1ee33edd78d2394f69d90b3478f3167bf2869b72..25ae9c7cf1ab13b5825b5649de301459a529587e 100644
--- a/doc.zih.tu-dresden.de/docs/software/compilers.md
+++ b/doc.zih.tu-dresden.de/docs/software/compilers.md
@@ -71,6 +71,7 @@ The following matrix shows proper compiler flags for the architectures at the ZI
 |------------|--------------------|----------------------|----------------------|-----|
 | [`Alpha Centauri`](../jobs_and_resources/alpha_centauri.md) | AMD Rome | `-march=znver2` | `-march=core-avx2` | `-tp=zen2` |
 | [`Barnard`](../jobs_and_resources/hardware_overview.md#barnard) | Intel Sapphire Rapids | `-march=sapphirerapids` | `-march=core-sapphirerapids` | |
+| [`Capella`](../jobs_and_resources/capella.md) | AMD Genoa | `-march=znver4` | | `-tp=zen4` |
 | [`Julia`](../jobs_and_resources/julia.md) | Intel Cascade Lake | `-march=cascadelake` | `-march=cascadelake` | `-tp=cascadelake` |
 | [`Romeo`](../jobs_and_resources/romeo.md) | AMD Rome | `-march=znver2` | `-march=core-avx2` | `-tp=zen2` |
 | All x86 | Host's architecture | `-march=native` | `-xHost` or `-march=native` | `-tp=host` |
diff --git a/doc.zih.tu-dresden.de/docs/software/data_analytics_with_r.md b/doc.zih.tu-dresden.de/docs/software/data_analytics_with_r.md
index 7874b0fb110a435fad9660286ce159ae57ca60a6..f28e32a553ac2a78e49dc546ff459d7b274e8fb6 100644
--- a/doc.zih.tu-dresden.de/docs/software/data_analytics_with_r.md
+++ b/doc.zih.tu-dresden.de/docs/software/data_analytics_with_r.md
@@ -62,8 +62,9 @@ marie@compute$ R -e 'install.packages("ggplot2")'
 The deep learning frameworks perform extremely fast when run on accelerators such as GPU.
 Therefore, using nodes with built-in GPUs, e.g., clusters
-[Power9](../jobs_and_resources/hardware_overview.md)
-and [Alpha](../jobs_and_resources/alpha_centauri.md), is beneficial for the examples here.
+[`Capella`](../jobs_and_resources/hardware_overview.md#capella),
+[`Alpha`](../jobs_and_resources/hardware_overview.md#alpha-centauri) and
+[`Power9`](../jobs_and_resources/hardware_overview.md#power9), is beneficial for the examples here.
 ### R Interface to TensorFlow
diff --git a/doc.zih.tu-dresden.de/docs/software/gpu_programming.md b/doc.zih.tu-dresden.de/docs/software/gpu_programming.md
index dc45a045cafbbd2777845fbb7a648fd542e82345..5832c15bb4d40779089035bc68e830036b26c393 100644
--- a/doc.zih.tu-dresden.de/docs/software/gpu_programming.md
+++ b/doc.zih.tu-dresden.de/docs/software/gpu_programming.md
@@ -30,7 +30,7 @@ For general information on how to use Slurm, read the respective [page in this c
 When allocating resources on a GPU-node, you must specify the number of requested GPUs by using the
 `--gres=gpu:<N>` option, like this:
-=== "cluster `alpha`"
+=== "cluster `Alpha` or `Capella`"
     ```bash
     #!/bin/bash    # Batch script starts with shebang line
diff --git a/doc.zih.tu-dresden.de/docs/software/machine_learning.md b/doc.zih.tu-dresden.de/docs/software/machine_learning.md
index 13a823e0fe7311469537f011416296b6c85dc116..2ec53fed230283dce7a9397c8ed4a6a966213f4b 100644
--- a/doc.zih.tu-dresden.de/docs/software/machine_learning.md
+++ b/doc.zih.tu-dresden.de/docs/software/machine_learning.md
@@ -1,45 +1,23 @@
 # Machine Learning
-This is an introduction of how to run machine learning applications on ZIH systems.
-For machine learning purposes, we recommend to use the cluster `alpha` and/or `power`.
+This is an introduction to running machine learning (ML) applications on ZIH systems.
+We recommend using the GPU clusters `Alpha`, `Capella`, and `Power9` for machine learning purposes.
+The hardware specification of each cluster can be found on the page
+[HPC Resources](../jobs_and_resources/hardware_overview.md).
-## Cluster: `power`
+## Modules
-The compute nodes of the cluster `power` are built on the base of
-[Power9 architecture](https://www.ibm.com/it-infrastructure/power/power9) from IBM. The system was created
-for AI challenges, analytics and working with data-intensive workloads and accelerated databases.
+Loading the module environment works identically on each cluster. The following example shows how
+to do it on the cluster `Alpha`:
-The main feature of the nodes is the ability to work with the
-[NVIDIA Tesla V100](https://www.nvidia.com/en-gb/data-center/tesla-v100/) GPU with **NV-Link**
-support that allows a total bandwidth with up to 300 GB/s. Each node on the
-cluster `power` has 6x Tesla V-100 GPUs. You can find a detailed specification of the cluster in our
-[Power9 documentation](../jobs_and_resources/hardware_overview.md).
+```console
+marie@alpha$ module load release/23.04
+```
 !!! note
-    The cluster `power` is based on the Power9 architecture, which means that the software built
-    for x86_64 will not work on this cluster.
-
-### Power AI
-
-There are tools provided by IBM, that work on cluster `power` and are related to AI tasks.
-For more information see our [Power AI documentation](power_ai.md).
-
-## Cluster: Alpha
-
-Another cluster for machine learning tasks is `alpha`. It is mainly dedicated to
-[ScaDS.AI](https://scads.ai/) topics. Each node on the cluster `alpha` has 2x AMD EPYC CPUs, 8x NVIDIA
-A100-SXM4 GPUs, 1 TB RAM and 3.5 TB local space (`/tmp`) on an NVMe device. You can find more
-details of the cluster in our [Alpha Centauri](../jobs_and_resources/alpha_centauri.md)
-documentation.
-
-### Modules
-
-On the cluster `alpha` load the module environment:
-
-```console
-[marie@alpha ]$ module load release/23.04
-```
+    The available software and its versions may differ among the clusters. Check the available
+    modules with `module spider <module_name>`.
 ## Machine Learning via Console
@@ -67,7 +45,7 @@ create documents containing live code, equations, visualizations, and narrative
 TensorFlow or PyTorch) on ZIH systems and to run your Jupyter notebooks on HPC nodes.
 After accessing JupyterHub, you can start a new session and configure it. For machine learning
-purposes, select either cluster `alpha` or `power` and the resources, your application requires.
+purposes, select one of the clusters `Alpha`, `Capella` or `Power9` and the resources your application requires.
 In your session you can use [Python](data_analytics_with_python.md#jupyter-notebooks),
 [R](data_analytics_with_r.md#r-in-jupyterhub) or [RStudio](data_analytics_with_rstudio.md) for your
@@ -88,13 +66,13 @@ the [PowerAI container](https://hub.docker.com/r/ibmcom/powerai/) DockerHub repo
 You could find other versions of software in the container on the "tag" tab on the Docker web
 page of the container.
-In the following example, we build a Singularity container with TensorFlow from the DockerHub and
-start it:
+In the following example, we build a Singularity container with TensorFlow from the DockerHub on
+the cluster `Power9` and start it:
 ```console
-marie@ml$ singularity build my-ML-container.sif docker://ibmcom/powerai:1.6.2-tensorflow-ubuntu18.04-py37-ppc64le #create a container from the DockerHub with TensorFlow version 1.6.2
+marie@power9$ singularity build my-ML-container.sif docker://ibmcom/powerai:1.6.2-tensorflow-ubuntu18.04-py37-ppc64le #create a container from the DockerHub with TensorFlow version 1.6.2
 [...]
-marie@ml$ singularity run --nv my-ML-container.sif #run my-ML-container.sif container supporting the Nvidia's GPU. You can also work with your container by: singularity shell, singularity exec
+marie@power9$ singularity run --nv my-ML-container.sif #run my-ML-container.sif with Nvidia GPU support.
You can also work with your container via `singularity shell` or `singularity exec`.
 [...]
 ```
diff --git a/doc.zih.tu-dresden.de/docs/software/pytorch.md b/doc.zih.tu-dresden.de/docs/software/pytorch.md
index efbce5ac4265fbd5f85eb113ac22891662be8362..5f3228f64c1e376a4495983bc188ee7c7a7c090b 100644
--- a/doc.zih.tu-dresden.de/docs/software/pytorch.md
+++ b/doc.zih.tu-dresden.de/docs/software/pytorch.md
@@ -15,8 +15,8 @@ marie@login$ module spider pytorch
 to find out, which PyTorch modules are available.
-We recommend using the cluster `alpha` and/or `power` when working with machine learning workflows
-and the PyTorch library.
+We recommend using the clusters `Alpha`, `Capella` and/or `Power9` when working with machine
+learning workflows and the PyTorch library.
 You can find detailed hardware specification in our
 [hardware documentation](../jobs_and_resources/hardware_overview.md).
@@ -49,46 +49,46 @@ Module PyTorch/1.12.1-CUDA-11.7.0 and 42 dependencies loaded.
 Using the **--no-deps** option for "pip install" is necessary here as otherwise the PyTorch
 version might be replaced and you will run into trouble with the CUDA drivers.
-On the cluster `power`:
+On the cluster `Power9`:
 ```console
 # Job submission in power nodes with 1 gpu on 1 node with 800 Mb per CPU
-marie@login.power$ srun --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=800 bash
+marie@login.power9$ srun --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=800 bash
 ```
 After calling
 ```console
-marie@login.power$ module spider pytorch
+marie@login.power9$ module spider pytorch
 ```
 we know that we can load PyTorch (including torchvision) with
 ```console
-marie@power$ module load release/23.04 GCC/11.3.0 OpenMPI/4.1.4 torchvision/0.13.1
+marie@power9$ module load release/23.04 GCC/11.3.0 OpenMPI/4.1.4 torchvision/0.13.1
 Modules GCC/11.3.0, OpenMPI/4.1.4, torchvision/0.13.1 and 62 dependencies loaded.
 ```
 Now, we check that we can access PyTorch:
 ```console
-marie@{power,alpha}$ python -c "import torch; print(torch.__version__)"
+marie@power9$ python -c "import torch; print(torch.__version__)"
 ```
 The following example shows how to create a python virtual environment and import PyTorch.
 ```console
 # Create folder
-marie@power$ mkdir python-environments
+marie@power9$ mkdir python-environments
 # Check which python are you using
-marie@power$ which python
+marie@power9$ which python
 /sw/installed/Python/3.7.4-GCCcore-8.3.0/bin/python
 # Create virtual environment "env" which inheriting with global site packages
-marie@power$ virtualenv --system-site-packages python-environments/env
+marie@power9$ virtualenv --system-site-packages python-environments/env
 [...]
 # Activate virtual environment "env". Example output: (env) bash-4.2$
-marie@power$ source python-environments/env/bin/activate
-marie@power$ python -c "import torch; print(torch.__version__)"
+marie@power9$ source python-environments/env/bin/activate
+marie@power9$ python -c "import torch; print(torch.__version__)"
 ```
 ## PyTorch in JupyterHub
diff --git a/doc.zih.tu-dresden.de/docs/software/tensorflow.md b/doc.zih.tu-dresden.de/docs/software/tensorflow.md
index eb644266350860ee34df6bc29ff0ccd7077a5cab..206212222ad5387690911fb1014363ff9a42f677 100644
--- a/doc.zih.tu-dresden.de/docs/software/tensorflow.md
+++ b/doc.zih.tu-dresden.de/docs/software/tensorflow.md
@@ -17,13 +17,14 @@ to find out, which TensorFlow modules are available on your cluster.
 On ZIH systems, TensorFlow 2 is the default module version. For compatibility hints between
 TensorFlow 2 and TensorFlow 1, see the corresponding [section below](#compatibility-tf2-and-tf1).
-We recommend using the clusters `alpha` and/or `power` when working with machine learning workflows
-and the TensorFlow library.
You can find detailed hardware specification in our
+We recommend using the clusters `Alpha`, `Capella` and/or `Power9` when working with machine
+learning workflows and the TensorFlow library. You can find detailed hardware specification in our
 [Hardware](../jobs_and_resources/hardware_overview.md) documentation.
+Available software may differ among the clusters.
 ## TensorFlow Console
-On the cluster `alpha`, load the module environment:
+On the cluster `Alpha`, load the module environment:
 ```console
 marie@alpha$ module load release/23.04
diff --git a/doc.zih.tu-dresden.de/mkdocs.yml b/doc.zih.tu-dresden.de/mkdocs.yml
index bc76a7c6e8b9f00d306b2fd9bf86c0a4846689d4..3b7f3f8171042830a881ccd7013bd1a80446a42b 100644
--- a/doc.zih.tu-dresden.de/mkdocs.yml
+++ b/doc.zih.tu-dresden.de/mkdocs.yml
@@ -101,6 +101,7 @@ nav:
   - HPC Resources:
     - Overview: jobs_and_resources/hardware_overview.md
     - GPU Cluster Alpha Centauri: jobs_and_resources/alpha_centauri.md
+    - GPU Cluster Capella: jobs_and_resources/capella.md
    - SMP Cluster Julia: jobs_and_resources/julia.md
    - CPU Cluster Romeo: jobs_and_resources/romeo.md
    - GPU Cluster Power9: jobs_and_resources/power9.md
diff --git a/doc.zih.tu-dresden.de/wordlist.aspell b/doc.zih.tu-dresden.de/wordlist.aspell
index 48c6daedbdd9475dc7418cde1aeee93522218b0a..23b96778659d78de2d4a1f6045959718a7f26f71 100644
--- a/doc.zih.tu-dresden.de/wordlist.aspell
+++ b/doc.zih.tu-dresden.de/wordlist.aspell
@@ -233,7 +233,7 @@ mem
 Memcheck
 MFlop
 MiB
-Microarchitecture
+microarchitecture
 MIMD
 Miniconda
 mkdocs
@@ -482,6 +482,7 @@ VPN
 VRs
 walltime
 WebVNC
+WEKAio
 WinSCP
 WML
 Workdir
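Editorial aside on the PyTorch virtual-environment example changed above: the documented workflow uses `virtualenv --system-site-packages`. A minimal, hypothetical sketch of the same idea using only the standard-library `venv` module is shown below; the folder name `python-environments` follows the documented example, everything else is an assumption:

```shell
# Create a folder for virtual environments, as in the documented example
mkdir -p python-environments
# Create an environment that can still see globally installed site packages
# (python3 -m venv is the stdlib equivalent of virtualenv --system-site-packages)
python3 -m venv --system-site-packages python-environments/env
# Activate it and verify that the environment's interpreter is picked up
. python-environments/env/bin/activate
which python
python -c "import sys; print(sys.prefix)"
```

After activation, `which python` should point into `python-environments/env/bin`, so subsequent `pip install` calls stay inside the environment.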