diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/working.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/working.md index 6c514c7737e7fe70e30e68c39a73942c2ed0706b..87b6e58f0d1adbaea9a91cbabddd829f35ad9414 100644 --- a/doc.zih.tu-dresden.de/docs/data_lifecycle/working.md +++ b/doc.zih.tu-dresden.de/docs/data_lifecycle/working.md @@ -10,6 +10,7 @@ performance and permanence. | `Lustre` | `/data/walrus` | 20 PB | global | Only accessible via [Workspaces](workspaces.md). For moderately low bandwidth, low IOPS. Mounted read-only on compute nodes. | | `WEKAio` | `/data/weasel` | 1 PB | global (w/o Power) | *Coming 2024!* For high IOPS | | `ext4` | `/tmp` | 95 GB | node local | Systems: tbd. Is cleaned up after the job automatically. | +| `WEKAio` | `/data/cat` | 1 PB | shared on Capella | For high IOPS. Only available on Capella. | ## Recommendations for Filesystem Usage diff --git a/doc.zih.tu-dresden.de/docs/data_transfer/datamover.md b/doc.zih.tu-dresden.de/docs/data_transfer/datamover.md index 26388b51b16fdd6781eeff79a1051f7544d1be60..7e20ab7c07eb2656414b4d3f2ceb169928482768 100644 --- a/doc.zih.tu-dresden.de/docs/data_transfer/datamover.md +++ b/doc.zih.tu-dresden.de/docs/data_transfer/datamover.md @@ -31,11 +31,12 @@ To identify the mount points of the different filesystems on the data transfer m | Directory on Datamover | Mounting Clusters | Directory on Cluster | |:----------- |:--------- |:-------- | -| `/home` | Alpha,Barnard,Julia,Power9,Romeo | `/home` | -| `/projects` | Alpha,Barnard,Julia,Power9,Romeo | `/projects` | -| `/data/horse` | Alpha,Barnard,Julia,Power9,Romeo | `/data/horse` | -| `/data/walrus` | Alpha,Barnard,Julia,Power9 | `/data/walrus` | -| `/data/octopus` | Alpha,Barnard,Power9,Romeo | `/data/octopus` | +| `/home` | Alpha,Barnard,Capella,Julia,Power9,Romeo | `/home` | +| `/projects` | Alpha,Barnard,Capella,Julia,Power9,Romeo | `/projects` | +| `/data/horse` | Alpha,Barnard,Capella,Julia,Power9,Romeo | `/data/horse` | +| `/data/walrus` | 
Alpha,Barnard,Capella,Julia,Power9 | `/data/walrus` |
+| `/data/octopus` | Alpha,Barnard,Capella,Power9,Romeo | `/data/octopus` |
+| `/data/cat` | Capella | `/data/cat` |
 | `/data/archiv` | | |
 
 ## Usage of Datamover
diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/capella.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/capella.md
new file mode 100644
index 0000000000000000000000000000000000000000..9da289d33652233c9c091261dc1b5b713b2c9931
--- /dev/null
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/capella.md
@@ -0,0 +1,63 @@
+# GPU Cluster Capella
+
+The multi-GPU cluster `Capella` has been installed for AI-related computations and traditional
+HPC simulations.
+
+The hardware specification is documented on the page
+[HPC Resources](hardware_overview.md#capella).
+
+## Filesystems
+
+Capella has a fast WEKAio file system mounted on `/data/cat`. It is only mounted on Capella and the
+[Datamover nodes](../data_transfer/datamover.md).
+It should be used as the main working file system on Capella,
+although all other [filesystems](../data_lifecycle/file_systems.md)
+(`/home`, `/software`, `/data/horse`, `/data/walrus`, etc.) are also available.
+
+### Modules
+
+The easiest way to use software is via the [module system](../software/modules.md).
+All software available from the module system has been specifically built for the cluster `Capella`,
+i.e., with optimization for the Zen4 (Genoa) microarchitecture and CUDA support enabled.
+
+To check the available modules for `Capella`, use the command
+
+```console
+marie@login.capella$ module spider <module_name>
+```
+
+??? example "Example: Searching and loading PyTorch"
+
+    For example, to check which `PyTorch` versions are available you can invoke
+
+    ```console
+    marie@login.capella$ module spider PyTorch
+
+    ------------------------------------------------------------------------------------------------------------------
+      PyTorch: PyTorch/2.1.2-CUDA-12.1.1
+    ------------------------------------------------------------------------------------------------------------------
+        Description:
+          Tensors and Dynamic neural networks in Python with strong GPU acceleration. PyTorch is a deep learning framework that puts Python first.
+
+
+        You will need to load all module(s) on any one of the lines below before the "PyTorch/2.1.2-CUDA-12.1.1" module is available to load.
+
+          release/24.04  GCC/12.3.0  OpenMPI/4.1.5
+
+        Help:
+          Description
+          ===========
+          Tensors and Dynamic neural networks in Python with strong GPU acceleration.
+          PyTorch is a deep learning framework that puts Python first.
+
+
+          More information
+          ================
+           - Homepage: https://pytorch.org/
+    ```
+
+    ```console
+    marie@login.capella$ python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
+    2.1.2
+    True
+    ```