From 154c1383b37e8d836d557b758e2ce5c3fdc91609 Mon Sep 17 00:00:00 2001
From: Alexander Grund <alexander.grund@tu-dresden.de>
Date: Wed, 6 Mar 2024 10:01:23 +0100
Subject: [PATCH] Check and correct uses of `partition` in changed places

---
 .../docs/jobs_and_resources/slurm_examples.md               | 2 +-
 doc.zih.tu-dresden.de/docs/software/distributed_training.md | 4 ++--
 doc.zih.tu-dresden.de/docs/software/modules.md              | 5 +++--
 doc.zih.tu-dresden.de/docs/software/power_ai.md             | 4 ++--
 doc.zih.tu-dresden.de/docs/software/virtual_machines.md     | 2 +-
 5 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md
index b6e9e2408..1dcc8f1d9 100644
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md
@@ -196,7 +196,7 @@ When `srun` is used within a submission script, it inherits parameters from `sba
 `--ntasks=1`, `--cpus-per-task=4`, etc. So we actually implicitly run the following

 ```bash
-srun --ntasks=1 --cpus-per-task=4 [...] --partition=power9 some-gpu-application
+srun --ntasks=1 --cpus-per-task=4 [...] some-gpu-application
 ```

 Now, our goal is to run four instances of this program concurrently in a single batch script. Of
diff --git a/doc.zih.tu-dresden.de/docs/software/distributed_training.md b/doc.zih.tu-dresden.de/docs/software/distributed_training.md
index df7652493..a1d1f0ee5 100644
--- a/doc.zih.tu-dresden.de/docs/software/distributed_training.md
+++ b/doc.zih.tu-dresden.de/docs/software/distributed_training.md
@@ -208,8 +208,8 @@ parameter `--ntasks-per-node=<N>` equals the number of GPUs you use per node.
 Also, it can be useful to increase `memory/cpu` parameters if you run larger models.
 Memory can be set up to:

-- `--mem=250G` and `--cpus-per-task=7` for the partition `power9`.
-- `--mem=900G` and `--cpus-per-task=6` for the partition `alpha`.
+- `--mem=250G` and `--cpus-per-task=7` for the `Power9` cluster.
+- `--mem=900G` and `--cpus-per-task=6` for the `Alpha` cluster.

 Keep in mind that only one memory parameter (`--mem-per-cpu=<MB>` or `--mem=<MB>`) can be specified.

diff --git a/doc.zih.tu-dresden.de/docs/software/modules.md b/doc.zih.tu-dresden.de/docs/software/modules.md
index 88379b8cd..1f26e8896 100644
--- a/doc.zih.tu-dresden.de/docs/software/modules.md
+++ b/doc.zih.tu-dresden.de/docs/software/modules.md
@@ -338,10 +338,11 @@ So the concept if this hierarchical toolchains is already built into this module
 ## Per-Architecture Builds

 Since we have a heterogeneous cluster, we do individual builds of the software for each
-architecture present. This ensures that, no matter what partition the software runs on, a build
+architecture present.
+This ensures that, no matter what partition/cluster the software runs on, a build
 optimized for the host architecture is used automatically.

-However, not every module will be available for each node type or partition.
+However, not every module will be available on all clusters.
 Use `ml av` or `ml spider` to search for modules available on the sub-cluster you are on.

 ## Advanced Usage
diff --git a/doc.zih.tu-dresden.de/docs/software/power_ai.md b/doc.zih.tu-dresden.de/docs/software/power_ai.md
index e4578ba69..a4fc430ff 100644
--- a/doc.zih.tu-dresden.de/docs/software/power_ai.md
+++ b/doc.zih.tu-dresden.de/docs/software/power_ai.md
@@ -5,7 +5,7 @@ the PowerAI Framework for Machine Learning. In the following the links
 are valid for PowerAI version 1.5.4.

 !!! warning
-    The information provided here is available from IBM and can be used on partition `power9` only!
+    The information provided here is available from IBM and can be used on the `Power9` cluster only!

 ## General Overview

@@ -47,7 +47,7 @@ are valid for PowerAI version 1.5.4.
   (Open Neural Network Exchange) provides support for moving models between those frameworks.
 - [Distributed Deep Learning](https://www.ibm.com/support/knowledgecenter/SS5SF7_1.5.4/navigation/pai_getstarted_ddl.html?view=kc)
-  Distributed Deep Learning (DDL). Works on up to 4 nodes on partition `power9`.
+  Distributed Deep Learning (DDL). Works on up to 4 nodes on cluster `Power9`.

 ## PowerAI Container

diff --git a/doc.zih.tu-dresden.de/docs/software/virtual_machines.md b/doc.zih.tu-dresden.de/docs/software/virtual_machines.md
index dcd240778..d8391dc2d 100644
--- a/doc.zih.tu-dresden.de/docs/software/virtual_machines.md
+++ b/doc.zih.tu-dresden.de/docs/software/virtual_machines.md
@@ -47,7 +47,7 @@ times till it succeeds.
 bash-4.2$ cat /tmp/marie_2759627/activate
 #!/bin/bash

-if ! grep -q -- "Key for the VM on the partition power9" "/home/marie/.ssh/authorized_keys" > /dev/null; then
+if ! grep -q -- "Key for the VM on the cluster power" "/home/marie/.ssh/authorized_keys" > /dev/null; then
   cat "/tmp/marie_2759627/kvm.pub" >> "/home/marie/.ssh/authorized_keys"
 else
   sed -i "s|.*Key for the VM on the cluster power.*|ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3siZfQ6vQ6PtXPG0RPZwtJXYYFY73TwGYgM6mhKoWHvg+ZzclbBWVU0OoU42B3Ddofld7TFE8sqkHM6M+9jh8u+pYH4rPZte0irw5/27yM73M93q1FyQLQ8Rbi2hurYl5gihCEqomda7NQVQUjdUNVc6fDAvF72giaoOxNYfvqAkw8lFyStpqTHSpcOIL7pm6f76Jx+DJg98sXAXkuf9QK8MurezYVj1qFMho570tY+83ukA04qQSMEY5QeZ+MJDhF0gh8NXjX/6+YQrdh8TklPgOCmcIOI8lwnPTUUieK109ndLsUFB5H0vKL27dA2LZ3ZK+XRCENdUbpdoG2Czz Key for the VM on the cluster power|" "/home/marie/.ssh/authorized_keys"
-- 
GitLab