From f15b6e221d0f382aeb824c600332211bff1ed174 Mon Sep 17 00:00:00 2001
From: Martin Schroschk <martin.schroschk@tu-dresden.de>
Date: Wed, 6 Dec 2023 18:19:38 +0100
Subject: [PATCH] Review Alpha page: state of stand-alone cluster and module
 load example

---
 .../docs/jobs_and_resources/alpha_centauri.md | 116 +++++++++++++-----
 1 file changed, 86 insertions(+), 30 deletions(-)

diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md
index 7dacca7f5..1e4d6cced 100644
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md
@@ -1,10 +1,46 @@
 # GPU Cluster Alpha Centauri
 
-The multi-GPU sub-cluster "Alpha Centauri" has been installed for AI-related computations (ScaDS.AI).
+The multi-GPU cluster `Alpha Centauri` has been installed for AI-related computations (ScaDS.AI).
 
 The hardware specification is documented on the page
 [HPC Resources](hardware_overview.md#alpha-centauri).
 
+## Becoming a Stand-Alone Cluster
+
+The former HPC system Taurus is being partly switched off and partly split up into separate
+clusters until the end of 2023. One of these separate clusters is what you have known as partition
+`alpha` so far. With the end of the maintenance on November 30, 2023, `Alpha Centauri` is now a
+stand-alone cluster with
+
+* homogeneous hardware resources including two login nodes `login[1-2].alpha.hpc.tu-dresden.de`,
+* and its own Slurm batch system.
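+
+For example, you can reach one of the login nodes via SSH as usual (illustrative session; replace
+`marie` with your login):
+
+```console
+marie@local$ ssh marie@login1.alpha.hpc.tu-dresden.de
+```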
+
+### Filesystems
+
+Your new `/home` directory (from `Barnard`) is also your `/home` on `Alpha Centauri`.
+If you have not
+[migrated your `/home` from Taurus to your **new** `/home` on Barnard](barnard.md#data-management-and-data-transfer),
+please do so as soon as possible!
+
+!!! warning "Current limitations w.r.t. filesystems"
+
+    For now, `Alpha Centauri` will not be integrated in the InfiniBand fabric of Barnard. This
+    comes with a severe restriction: **the only work filesystem for Alpha Centauri** will be the
+    `/beegfs` filesystem. (`/scratch` and `/lustre/ssd` are not usable any longer.)
+
+    Please prepare your stage-in/stage-out workflows using our
+    [datamovers](../data_transfer/datamover.md) to enable working with larger datasets that might
+    be stored on Barnard’s new capacity filesystem `/data/walrus`. The datamover commands are not
+    yet running on `Alpha Centauri`. Thus, you need to invoke them from Barnard!
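+
+    For example, a stage-in from `/data/walrus` to a `/beegfs` workspace could look like the
+    following (run from Barnard; the workspace paths are placeholders for illustration):
+
+    ```console
+    marie@barnard$ dtcp -r /data/walrus/ws/marie-input-data /beegfs/ws/marie-training-run/
+    ```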
+
+    The new Lustre filesystems, namely `horse` and `walrus`, will be mounted as soon as `Alpha` is
+    recabled (planned for May 2024).
+
+!!! warning "Current limitations w.r.t. workspace management"
+
+    Workspace management commands do not work for `beegfs` yet. (Use them from Taurus!)
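+
+    For example, allocating a 30-day workspace on `beegfs` from a Taurus login node might look
+    like this (illustrative name and duration):
+
+    ```console
+    marie@taurus$ ws_allocate -F beegfs my_workspace 30
+    ```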
+
 ## Usage
 
 !!! note
@@ -23,48 +59,68 @@ cores are available per node.
 ### Modules
 
 The easiest way is using the [module system](../software/modules.md).
-The software for the cluster `alpha` is available in module environment `modenv/hiera`.
+All software available from the module system has been specifically built for the cluster `Alpha`,
+i.e., with optimization for the Zen2 microarchitecture and CUDA support enabled.
 
-To check the available modules for `modenv/hiera`, use the command
+To check the available modules for `Alpha`, use the command
 
 ```console
-marie@alpha$ module spider <module_name>
+marie@login.alpha$ module spider <module_name>
 ```
 
-For example, to check whether PyTorch is available in version 1.7.1:
+??? example "Searching and loading PyTorch"
 
-```console
-marie@alpha$ module spider PyTorch/1.7.1
+    For example, to check which `PyTorch` versions are available you can invoke
 
------------------------------------------------------------------------------------------------------------------------------------------
-  PyTorch: PyTorch/1.7.1
------------------------------------------------------------------------------------------------------------------------------------------
-    Description:
-      Tensors and Dynamic neural networks in Python with strong GPU acceleration. PyTorch is a deep learning framework that puts Python
-      first.
+    ```console
+    marie@login.alpha$ module spider PyTorch
+    -------------------------------------------------------------------------------------------------------------------------
+      PyTorch:
+    -------------------------------------------------------------------------------------------------------------------------
+        Description:
+          Tensors and Dynamic neural networks in Python with strong GPU acceleration. PyTorch is a deep learning framework
+          that puts Python first.
 
+         Versions:
+            PyTorch/1.12.0
+            PyTorch/1.12.1-CUDA-11.7.0
+            PyTorch/1.12.1
+    [...]
+    ```
 
-    You will need to load all module(s) on any one of the lines below before the "PyTorch/1.7.1" module is available to load.
+    Not all modules can be loaded directly. Most modules are built with a certain compiler or
+    toolchain that needs to be loaded beforehand.
+    Luckily, the module system can tell us what we need to do for a specific module or software
+    version:
 
-      modenv/hiera  GCC/10.2.0  CUDA/11.1.1  OpenMPI/4.0.5
+    ```console
+    marie@login.alpha$ module spider PyTorch/1.12.1-CUDA-11.7.0
 
-[...]
-```
+    -------------------------------------------------------------------------------------------------------------------------
+      PyTorch: PyTorch/1.12.1-CUDA-11.7.0
+    -------------------------------------------------------------------------------------------------------------------------
+        Description:
+          Tensors and Dynamic neural networks in Python with strong GPU acceleration. PyTorch is a deep learning framework
+          that puts Python first.
 
-The output of `module spider <module_name>` provides hints which dependencies should be loaded beforehand:
 
-```console
-marie@alpha$ module load modenv/hiera GCC/10.2.0 CUDA/11.1.1 OpenMPI/4.0.5
-Module GCC/10.2.0, CUDA/11.1.1, OpenMPI/4.0.5 and 15 dependencies loaded.
-marie@alpha$ module avail PyTorch
--------------------------------------- /sw/modules/hiera/all/MPI/GCC-CUDA/10.2.0-11.1.1/OpenMPI/4.0.5 ---------------------------------------
-   PyTorch/1.7.1 (L)    PyTorch/1.9.0 (D)
-marie@alpha$ module load PyTorch/1.7.1
-Module PyTorch/1.7.1 and 39 dependencies loaded.
-marie@alpha$ python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
-1.7.1
-True
-```
+        You will need to load all module(s) on any one of the lines below before the "PyTorch/1.12.1" module is available to load.
+
+          release/23.04  GCC/11.3.0  OpenMPI/4.1.4
+    [...]
+    ```
+
+    Finally, the command line to load the `PyTorch/1.12.1-CUDA-11.7.0` module is
+
+    ```console
+    marie@login.alpha$ module load release/23.04  GCC/11.3.0  OpenMPI/4.1.4 PyTorch/1.12.1-CUDA-11.7.0
+    Module GCC/11.3.0, OpenMPI/4.1.4, PyTorch/1.12.1-CUDA-11.7.0 and 64 dependencies loaded.
+    ```
+
+    ```console
+    marie@login.alpha$ python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
+    1.12.1
+    True
+    ```
 
 ### Python Virtual Environments
 
-- 
GitLab