diff --git a/doc.zih.tu-dresden.de/docs/index.md b/doc.zih.tu-dresden.de/docs/index.md
index 5eb83dcd0dff764e5a582b0e696ac7a202a9b0a8..3835bc8551395bf409c6d79c655ee110492fd5d7 100644
--- a/doc.zih.tu-dresden.de/docs/index.md
+++ b/doc.zih.tu-dresden.de/docs/index.md
@@ -31,6 +31,7 @@ Please also find out the other ways you could contribute in our
 
 ## News
 
+* **2024-04-19** [Maintenance at `Alpha Centauri` - User action required!](jobs_and_resources/alpha_centauri.md#recabling-maintenance)
 * **2024-02-05** [New Support Form: HPC Café Questions & Answers](support/support.md#open-qa-sessions)
   ([to the Event](https://tu-dresden.de/zih/qa-sessions-nhr-at-tud))
 * **2024-02-05** [New JupyterHub now available](access/jupyterhub.md)
diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md
index 4d5efef8f255b571b40eebdd247875c101b05742..e74a67eef78cdae2e9705b978858e8fc615d1cce 100644
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/alpha_centauri.md
@@ -5,6 +5,53 @@ The multi-GPU cluster `Alpha Centauri` has been installed for AI-related computa
 The hardware specification is documented on the page
 [HPC Resources](hardware_overview.md#alpha-centauri).
 
+## Recabling Maintenance
+
+!!! warning "User Action Required"
+
+    Please read the following information carefully and follow the provided instructions.
+
+We are in the
+[process of making `Alpha Centauri` a stand-alone cluster](#becoming-a-stand-alone-cluster).
+The next planned step is the integration of the cluster into the InfiniBand infrastructure
+of the new cluster [`Barnard`](barnard.md).
+
+!!! hint "Maintenance Work"
+
+    On **June 4 and 5**, we will shut down `Alpha Centauri` and migrate it to the `Barnard`
+    InfiniBand infrastructure.
+
+As a consequence,
+
+* BeeGFS will no longer be available,
+* all `Barnard` filesystems (`/home`, `/software`, `/data/horse`, `/data/walrus`) can be
+  used normally.
+
+For your convenience, we have already started migrating your data from `/beegfs` to
+`/data/horse/beegfs`. At the start of the downtime, we will synchronize this data once more.
+
+!!! hint "User Action Required"
+
+    The less data we have to synchronize, the faster the overall process. So please clean up
+    as much as possible as soon as possible.
+
+Important for your work:
+
+* Do not add terabytes of data to `/beegfs` if you cannot "consume" it before June 4.
+* After the final successful data transfer to `/data/horse/beegfs`, you have to
+  move it to normal workspaces on `/data/horse`.
+* Be prepared to adapt your workflows to the new paths.
+
+What happens afterward:
+
+  * complete deletion of all user data in `/beegfs`
+  * complete recabling of the storage nodes (BeeGFS hardware)
+  * software and firmware updates
+  * set-up of a new WEKA filesystem for high I/O demands on the same hardware
+
+If you have any questions regarding this maintenance or the required action, please do not
+hesitate to contact the [HPC support team](../support/support.md).
+
 ## Becoming a Stand-Alone Cluster
 
 The former HPC system Taurus is partly switched-off and partly split up into separate clusters
@@ -12,7 +59,7 @@ until the end of 2023. One such upcoming separate cluster is what you have known
 `alpha` so far. With the end of the maintenance at November 30 2023, `Alpha Centauri` is now a
 stand-alone cluster with
 
-* homogenous hardware resources incl. two login nodes `login[1-2].alpha.hpc.tu-dresden.de`,
+* homogeneous hardware resources incl. two login nodes `login[1-2].alpha.hpc.tu-dresden.de`,
 * and own [Slurm batch system](slurm.md).
 
 ### Filesystems
@@ -22,7 +69,7 @@ If you have not
 [migrated your `/home` from Taurus to your **new** `/home` on Barnard](barnard.md#data-management-and-data-transfer)
 , please do so as soon as possible!
 
-!!! warning "Current limititations w.r.t. filesystems"
+!!! warning "Current limitations w.r.t. filesystems"
 
     For now, `Alpha Centauri` will not be integrated in the InfiniBand fabric of Barnard. With this
     comes a dire restriction: **the only work filesystems for Alpha Centauri** will be the `/beegfs`
@@ -37,7 +84,7 @@ If you have not
     The new Lustre filesystems, namely `horse` and `walrus`, will be mounted as soon as `Alpha` is
     recabled (planned for May 2024).
 
-!!! warning "Current limititations w.r.t. workspace management"
+!!! warning "Current limitations w.r.t. workspace management"
 
     Workspace management commands do not work for `beegfs` yet. (Use them from Taurus!)
 
@@ -60,7 +107,7 @@ cores are available per node.
 
 The easiest way is using the [module system](../software/modules.md).
 All software available from the module system has been specifically build for the cluster `Alpha`
-i.e., with optimzation for Zen2 microarchitecture and CUDA-support enabled.
+i.e., with optimization for Zen2 microarchitecture and CUDA-support enabled.
 
 To check the available modules for `Alpha`, use the command