diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md
index b6e9e2408f74a7083bb9f4bad0e13cfcf7cb3ce6..1dcc8f1d995ac0b467d3ecce67eea9879e38215c 100644
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md
@@ -196,7 +196,7 @@ When `srun` is used within a submission script, it inherits parameters from `sba
 `--ntasks=1`, `--cpus-per-task=4`, etc. So we actually implicitly run the following
 
 ```bash
-srun --ntasks=1 --cpus-per-task=4 [...] --partition=power9 some-gpu-application
+srun --ntasks=1 --cpus-per-task=4 [...] some-gpu-application
 ```
 
 Now, our goal is to run four instances of this program concurrently in a single batch script. Of
diff --git a/doc.zih.tu-dresden.de/docs/software/distributed_training.md b/doc.zih.tu-dresden.de/docs/software/distributed_training.md
index df76524930c2e41cbc7e00a0ec2b35865d722030..a1d1f0ee5797003cae77c4433c9d6037276e7439 100644
--- a/doc.zih.tu-dresden.de/docs/software/distributed_training.md
+++ b/doc.zih.tu-dresden.de/docs/software/distributed_training.md
@@ -208,8 +208,8 @@ parameter `--ntasks-per-node=<N>` equals the number of GPUs you use per node.
 Also, it can be useful to increase `memory/cpu` parameters if you run larger models.
 Memory can be set up to:
 
-- `--mem=250G` and `--cpus-per-task=7` for the partition `power9`.
-- `--mem=900G` and `--cpus-per-task=6` for the partition `alpha`.
+- `--mem=250G` and `--cpus-per-task=7` for the `Power9` cluster.
+- `--mem=900G` and `--cpus-per-task=6` for the `Alpha` cluster.
 
 Keep in mind that only one memory parameter (`--mem-per-cpu=<MB>` or `--mem=<MB>`) can be specified.
diff --git a/doc.zih.tu-dresden.de/docs/software/modules.md b/doc.zih.tu-dresden.de/docs/software/modules.md
index 88379b8cd0970c40eacca58f52d871e677e08d5e..1f26e889631030fa0bfae9b2a22f9ad421379e0c 100644
--- a/doc.zih.tu-dresden.de/docs/software/modules.md
+++ b/doc.zih.tu-dresden.de/docs/software/modules.md
@@ -338,10 +338,11 @@ So the concept if this hierarchical toolchains is already built into this module
 ## Per-Architecture Builds
 
 Since we have a heterogeneous cluster, we do individual builds of the software for each
-architecture present. This ensures that, no matter what partition the software runs on, a build
+architecture present.
+This ensures that, no matter what partition/cluster the software runs on, a build
 optimized for the host architecture is used automatically.
 
-However, not every module will be available for each node type or partition.
+However, not every module will be available on all clusters.
 Use `ml av` or `ml spider` to search for modules available on the sub-cluster you are on.
 
 ## Advanced Usage
diff --git a/doc.zih.tu-dresden.de/docs/software/power_ai.md b/doc.zih.tu-dresden.de/docs/software/power_ai.md
index e4578ba6982eb875bfd45e6dc5bdac09c19111ed..a4fc430fff59b646530b974a98254513e02fe645 100644
--- a/doc.zih.tu-dresden.de/docs/software/power_ai.md
+++ b/doc.zih.tu-dresden.de/docs/software/power_ai.md
@@ -5,7 +5,7 @@ the PowerAI Framework for Machine Learning. In the following the links
 are valid for PowerAI version 1.5.4.
 
 !!! warning
-    The information provided here is available from IBM and can be used on partition `power9` only!
+    The information provided here is available from IBM and can be used on the `Power9` cluster only!
 
 ## General Overview
 
@@ -47,7 +47,7 @@ are valid for PowerAI version 1.5.4.
   (Open Neural Network Exchange) provides support for moving models between those frameworks.
 - [Distributed Deep Learning](https://www.ibm.com/support/knowledgecenter/SS5SF7_1.5.4/navigation/pai_getstarted_ddl.html?view=kc)
-  Distributed Deep Learning (DDL). Works on up to 4 nodes on partition `power9`.
+  Distributed Deep Learning (DDL). Works on up to 4 nodes on cluster `Power9`.
 
 ## PowerAI Container
 
diff --git a/doc.zih.tu-dresden.de/docs/software/virtual_machines.md b/doc.zih.tu-dresden.de/docs/software/virtual_machines.md
index dcd240778fbd0a8b668aa27372f3c260131e59c0..d8391dc2d0eed5e5ff71ac4f2d25f8e85c2cbd67 100644
--- a/doc.zih.tu-dresden.de/docs/software/virtual_machines.md
+++ b/doc.zih.tu-dresden.de/docs/software/virtual_machines.md
@@ -47,7 +47,7 @@ times till it succeeds.
 bash-4.2$ cat /tmp/marie_2759627/activate
 #!/bin/bash
 
-if ! grep -q -- "Key for the VM on the partition power9" "/home/marie/.ssh/authorized_keys" > /dev/null; then
+if ! grep -q -- "Key for the VM on the cluster power" "/home/marie/.ssh/authorized_keys" > /dev/null; then
     cat "/tmp/marie_2759627/kvm.pub" >> "/home/marie/.ssh/authorized_keys"
 else
     sed -i "s|.*Key for the VM on the cluster power.*|ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3siZfQ6vQ6PtXPG0RPZwtJXYYFY73TwGYgM6mhKoWHvg+ZzclbBWVU0OoU42B3Ddofld7TFE8sqkHM6M+9jh8u+pYH4rPZte0irw5/27yM73M93q1FyQLQ8Rbi2hurYl5gihCEqomda7NQVQUjdUNVc6fDAvF72giaoOxNYfvqAkw8lFyStpqTHSpcOIL7pm6f76Jx+DJg98sXAXkuf9QK8MurezYVj1qFMho570tY+83ukA04qQSMEY5QeZ+MJDhF0gh8NXjX/6+YQrdh8TklPgOCmcIOI8lwnPTUUieK109ndLsUFB5H0vKL27dA2LZ3ZK+XRCENdUbpdoG2Czz Key for the VM on the cluster power|" "/home/marie/.ssh/authorized_keys"