diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/migration_2023.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/migration_2023.md
index c1098361618300a996abccb614ba8ccabae41658..1d93d1d038796928b1b596c6acd619f3d8d67ba1 100644
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/migration_2023.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/migration_2023.md
@@ -20,7 +20,7 @@ More details can be found in the [overview](/jobs_and_resources/hardware_overvie
 
 Over the last decade we have been running our HPC system of high heterogeneity with a single
 Slurm batch system. This made things very complicated, especially to inexperienced users.
-To lower this hurdle we now create homogenous clusters with their own Slurm instances and with
+To lower this hurdle we now create homogeneous clusters with their own Slurm instances and with
 cluster specific login nodes running on the same CPU. Job submission is possible only from
 within the cluster (compute or login node).
 
@@ -36,7 +36,7 @@ all operating system will be updated to the same versions of OS, Mellanox and Lu
 With this all application software was re-built consequently using GIT and CI for handling the
 multitude of versions.
 
-We start with `release/23.10` which is based on software reqeusts from user feedbacks of our
+We start with `release/23.10` which is based on software requests from user feedbacks of our
 HPC users. Most major software versions exist on all hardware platforms.
 
 ## Migration Path
@@ -49,15 +49,15 @@ of the action items.
 
 |When?|TODO ZIH |TODO users |Remark |
 |---|---|---|---|
-| done (May 2023) |first sync /scratch to /data/horse/old_scratch2| |copied 4 PB in about 3 weeks|
+| done (May 2023) |first sync `/scratch` to `/data/horse/old_scratch2`| |copied 4 PB in about 3 weeks|
 | done (June 2023) |enable access to Barnard| |initialized LDAP tree with Taurus users|
 | done (July 2023) | |install new software stack|tedious work |
 | ASAP | |adapt scripts|new Slurm version, new resources, no partitions|
 | August 2023 | |test new software stack on Barnard|new versions sometimes require different prerequisites|
-| August 2023| |test new software stack on other clusters|a few nodes will be made available with the new sw stack, but with the old filesystems|
+| August 2023| |test new software stack on other clusters|a few nodes will be made available with the new software stack, but with the old filesystems|
 | ASAP | |prepare data migration|The small filesystems `/beegfs` and `/lustre/ssd`, and `/home` are mounted on the old systems "until the end". They will *not* be migrated to the new system.|
 | July 2023 | sync `/warm_archive` to new hardware| |using datamover nodes with Slurm jobs |
-| September 2023 |prepare recabling of older hardware (Bull)| |integrate other clusters in the IB infrastructure |
+| September 2023 |prepare re-cabling of older hardware (Bull)| |integrate other clusters in the IB infrastructure |
 | Autumn 2023 |finalize integration of other clusters (Bull)| |**~2 days downtime**, final rsync and migration of `/projects`, `/warm_archive`|
 | Autumn 2023 ||transfer last data from old filesystems | `/beegfs`, `/lustre/scratch`, `/lustre/ssd` are no longer available on the new systems|
 
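The patch above stresses that job submission is possible only from within a cluster (login or compute node), so working on the new system starts with logging in to a cluster-specific login node. A minimal sketch, assuming the Barnard login hosts follow a `login[1-4].barnard.hpc.tu-dresden.de` naming scheme and using `marie` as a placeholder user name:

```bash
# Log in to one of the cluster-specific login nodes.
# Host name pattern is an assumption; "marie" is a placeholder user name.
ssh marie@login1.barnard.hpc.tu-dresden.de

# All further job submission then happens from here, inside the cluster.
```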
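The "adapt scripts" row is the item most users will feel directly: each cluster runs its own Slurm instance, there are no partitions to select anymore, and the resources per node differ from the old system. The job script below is only a sketch of such an adaptation; the project name and the module names are placeholders and assumptions, not values taken from the patch.

```bash
#!/bin/bash
# Sketch of a job script adapted to the new per-cluster Slurm instances.
# There is no "#SBATCH --partition=..." line anymore; the cluster is chosen
# by submitting from its own login nodes.
#SBATCH --job-name=migration_test
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=01:00:00
#SBATCH --account=p_my_project      # placeholder project name

# Load software from the rebuilt stack; the exact module names under
# release/23.10 are assumptions and may differ per cluster.
module purge
module load release/23.10
module load GCC

srun ./my_application
```

Such a script would be submitted as usual with `sbatch` from a login node of the target cluster.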
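The last user-side item, transferring data off `/beegfs`, `/lustre/scratch` and `/lustre/ssd` before they disappear, fits the datamover nodes mentioned in the table. The snippet below is a sketch only, assuming the `dtls`/`dtrsync` wrappers known from the old system also reach the new filesystems; all paths are placeholders.

```bash
# Sketch of a user-driven transfer via the datamover nodes. The dt* wrappers
# submit the actual copy as a Slurm job on the datamover nodes, so the
# transfer runs asynchronously. All paths below are placeholders.

# Inspect what is still left on an old filesystem
dtls /beegfs/ws/my_old_workspace

# Mirror it into a workspace on the new Lustre filesystem
dtrsync -a /beegfs/ws/my_old_workspace/ /data/horse/ws/my_new_workspace/
```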