Commit 4622418b authored by Guilherme Calandrini

fixes for using clusters instead of partitions

parent 04120b8c
2 merge requests: !990 Automated merge from preview to main, !980 fixes for using clusters instead of partitions
......@@ -212,16 +212,16 @@ there is a list of conventions w.r.t. spelling and technical wording.
| ZIH system(s) | Taurus, HRSK II, our HPC systems, etc. |
| workspace | work space |
| | HPC-DA |
| partition `ml` | ML partition, ml partition, `ml` partition, "ml" partition, etc. |
| cluster `romeo` | ROMEO cluster, romeo cluster, `romeo` cluster, "romeo" cluster, etc. |
### Code Blocks and Command Prompts
* Use ticks to mark code blocks and commands, not an italic font.
* Specify language for code blocks ([see below](#code-blocks-and-syntax-highlighting)).
* All code blocks and commands should be runnable from a login node or a node within a specific
partition (e.g., `ml`).
cluster (e.g., `alpha`).
* It should be clear from the [prompt](#list-of-prompts) where the command is run (e.g., local
machine, login node, or specific partition).
machine, login node, or specific cluster).
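For example, a command intended for a login node could be presented like this (a minimal sketch; the prompt and module name are illustrative):

```bash
marie@login.alpha$ module load GCC
```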
#### Code Blocks and Syntax Highlighting
......
......@@ -4,14 +4,12 @@
The full hardware specifications of the GPU-compute nodes may be found in the
[HPC Resources](../jobs_and_resources/hardware_overview.md#hpc-resources) page.
Each node uses a different [module environment](modules.md#module-environments):
Each cluster uses a different [module environment](modules.md#module-environments):
* [NVIDIA Tesla K80 GPUs nodes](../jobs_and_resources/hardware_overview.md#island-2-phase-2-intel-haswell-cpus-nvidia-k80-gpus)
(partition `gpu2`): use the default `scs5` module environment (`module switch modenv/scs5`).
* [NVIDIA Tesla V100 nodes](../jobs_and_resources/hardware_overview.md#ibm-power9-nodes-for-machine-learning)
(partition `ml`): use the `ml` module environment (`module switch modenv/ml`)
* [NVIDIA A100 nodes](../jobs_and_resources/hardware_overview.md#amd-rome-cpus-nvidia-a100)
(partition `alpha`): use the `hiera` module environment (`module switch modenv/hiera`)
(cluster `alpha`): use the `hiera` module environment (`module switch modenv/hiera`)
* [NVIDIA Tesla V100 nodes](../jobs_and_resources/hardware_overview.md#ibm-power9-nodes-for-machine-learning)
(cluster `power9`): use `module spider <module name>` to search for available modules
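For instance, on the A100 nodes you would switch to the `hiera` module environment before searching for and loading modules (a minimal sketch; `<module name>` is a placeholder):

```bash
marie@login.alpha$ module switch modenv/hiera
marie@login.alpha$ module spider <module name>
```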
## Using GPUs with Slurm
......@@ -19,45 +17,7 @@ For general information on how to use Slurm, read the respective [page in this c
When allocating resources on a GPU-node, you must specify the number of requested GPUs by using the
`--gres=gpu:<N>` option, like this:
=== "partition `gpu2`"
```bash
#!/bin/bash # Batch script starts with shebang line
#SBATCH --ntasks=1 # All #SBATCH lines have to follow uninterrupted
#SBATCH --time=01:00:00 # after the shebang line
#SBATCH --account=<KTR> # Comments start with # and do not count as interruptions
#SBATCH --job-name=fancyExp
#SBATCH --output=simulation-%j.out
#SBATCH --error=simulation-%j.err
#SBATCH --partition=gpu2
#SBATCH --gres=gpu:1 # request GPU(s) from Slurm
module purge # Set up environment, e.g., clean modules environment
module switch modenv/scs5 # switch module environment
module load <modules> # and load necessary modules
srun ./application [options] # Execute parallel application with srun
```
=== "partition `ml`"
```bash
#!/bin/bash # Batch script starts with shebang line
#SBATCH --ntasks=1 # All #SBATCH lines have to follow uninterrupted
#SBATCH --time=01:00:00 # after the shebang line
#SBATCH --account=<KTR> # Comments start with # and do not count as interruptions
#SBATCH --job-name=fancyExp
#SBATCH --output=simulation-%j.out
#SBATCH --error=simulation-%j.err
#SBATCH --partition=ml
#SBATCH --gres=gpu:1 # request GPU(s) from Slurm
module purge # Set up environment, e.g., clean modules environment
module switch modenv/ml # switch module environment
module load <modules> # and load necessary modules
srun ./application [options] # Execute parallel application with srun
```
=== "partition `alpha`"
=== "cluster `alpha`"
```bash
#!/bin/bash # Batch script starts with shebang line
......@@ -67,20 +27,18 @@ When allocating resources on a GPU-node, you must specify the number of requeste
#SBATCH --job-name=fancyExp
#SBATCH --output=simulation-%j.out
#SBATCH --error=simulation-%j.err
#SBATCH --partition=alpha
#SBATCH --gres=gpu:1 # request GPU(s) from Slurm
module purge # Set up environment, e.g., clean modules environment
module switch modenv/hiera # switch module environment
module load <modules> # and load necessary modules
srun ./application [options] # Execute parallel application with srun
```
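The job file can then be submitted from a login node of the respective cluster (a sketch; the file name is illustrative):

```bash
marie@login.alpha$ sbatch gpu_job.sh
```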
Alternatively, you can work on the partitions interactively:
Alternatively, you can work on the clusters interactively:
```bash
marie@login$ srun --partition=<partition>-interactive --gres=gpu:<N> --pty bash
marie@login.<cluster_name>$ srun --nodes=1 --gres=gpu:<N> --time=00:30:00 --pty bash
marie@compute$ module purge; module switch modenv/<env>
```
......@@ -149,7 +107,7 @@ the [official list](https://www.openmp.org/resources/openmp-compilers-tools/) fo
Furthermore, some compilers, such as GCC, have basic support for target offloading, but do not
enable these features by default and/or achieve only poor performance.
On the ZIH system, compilers with OpenMP target offloading support are provided on the partitions
On the ZIH system, compilers with OpenMP target offloading support are provided on the clusters
`ml` and `alpha`. Two compilers with good performance can be used: the NVIDIA HPC compiler and the
IBM XL compiler.
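As a minimal sketch, compiling a C source file with OpenMP target offloading using the NVIDIA HPC compiler could look like this (the source file name is illustrative; `cc80` targets the A100 GPUs of cluster `alpha`):

```bash
marie@compute$ nvc -mp=gpu -gpu=cc80 -o saxpy saxpy.c
```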
......@@ -167,7 +125,7 @@ available for OpenMP, including the `-gpu=ccXY` flag as mentioned above.
#### Using OpenMP target offloading with the IBM XL compilers
The IBM XL compilers (`xlc` for C, `xlc++` for C++ and `xlf` for Fortran (with sub-version for
different versions of Fortran)) are only available on the partition `ml` with NVIDIA Tesla V100 GPUs.
different versions of Fortran)) are only available on the cluster `ml` with NVIDIA Tesla V100 GPUs.
They are available by default when switching to `modenv/ml`.
* The `-qsmp -qoffload` combination of flags enables OpenMP target offloading support
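A minimal compile sketch with the IBM XL C++ compiler (assuming `modenv/ml` is active and a C++ source file `saxpy.cpp`):

```bash
marie@ml$ xlc++ -qsmp -qoffload -o saxpy saxpy.cpp
```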
......@@ -328,8 +286,7 @@ documentation for a
### NVIDIA Nsight Compute
Nsight Compute is used for the analysis of individual GPU-kernels. It supports GPUs from the Volta
architecture onward (on the ZIH system: V100 and A100). Therefore, you cannot use Nsight Compute on
the partition `gpu2`. If you are familiar with nvprof, you may want to consult the
architecture onward (on the ZIH system: V100 and A100). If you are familiar with nvprof, you may want to consult the
[Nvprof Transition Guide](https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#nvprof-guide),
as Nsight Compute uses a new scheme for metrics.
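A typical invocation on a compute node with a V100 or A100 GPU might look like this (a sketch; the report name and application are placeholders):

```bash
marie@compute$ ncu --set full -o kernel_report ./application [options]
```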
We recommend those kernels as optimization targets that require a large portion of your run time,
......
......@@ -283,7 +283,7 @@ Using OmniOpt for a first trial example, it is often sufficient to concentrate o
configuration parameters:
1. **Optimization run name:** A name for an OmniOpt run and its associated configuration.
1. **Partition:** Choose the partition on the ZIH system that fits the programs' needs.
1. **Partition:** Choose the cluster on the ZIH system that fits the programs' needs.
1. **Enable GPU:** Decide whether a program could benefit from GPU usage or not.
1. **Workdir:** The directory where OmniOpt saves its working files and all results. Each
   configuration creates its own directory, derived from the optimization run name.
......
......@@ -151,11 +151,10 @@ It is also possible to run this command using a job file to retrieve the topolog
```bash
#!/bin/bash
#SBATCH --job-name=topo_haswell
#SBATCH --job-name=topo_node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=300m
#SBATCH --partition=haswell
#SBATCH --time=00:05:00
#SBATCH --output=get_topo.out
#SBATCH --error=get_topo.err
......
......@@ -110,9 +110,8 @@ cards (GPUs) specified by the device index. For that, make sure to use the modul
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=12
#SBATCH --gres=gpu:2
#SBATCH --partition=gpu2
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
# Make sure to only use ParaView
......