Commit 4622418b authored by Guilherme Calandrini

fixes for using clusters instead of partitions

parent 04120b8c
2 merge requests: !990 Automated merge from preview to main, !980 fixes for using clusters instead of partitions
......@@ -212,16 +212,16 @@ there is a list of conventions w.r.t. spelling and technical wording.
| ZIH system(s) | Taurus, HRSK II, our HPC systems, etc. |
| workspace | work space |
| | HPC-DA |
| partition `ml` | ML partition, ml partition, `ml` partition, "ml" partition, etc. |
| cluster `romeo` | ROMEO cluster, romeo cluster, `romeo` cluster, "romeo" cluster, etc. |
### Code Blocks and Command Prompts
* Use ticks to mark code blocks and commands, not an italic font.
* Specify language for code blocks ([see below](#code-blocks-and-syntax-highlighting)).
* All code blocks and commands should be runnable from a login node or a node within a specific
partition (e.g., `ml`).
cluster (e.g., `alpha`).
* It should be clear from the [prompt](#list-of-prompts) where the command is run (e.g., local
machine, login node, or specific partition).
machine, login node, or specific cluster).
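For example, a command intended for a login node could be presented like this (a minimal sketch; the prompt and module name are illustrative):

```bash
marie@login.alpha$ module load GCC
```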
#### Code Blocks and Syntax Highlighting
......
......@@ -4,14 +4,12 @@
The full hardware specifications of the GPU-compute nodes may be found in the
[HPC Resources](../jobs_and_resources/hardware_overview.md#hpc-resources) page.
Each node uses a different [module environment](modules.md#module-environments):
Each cluster uses a different [module environment](modules.md#module-environments):
* [NVIDIA Tesla K80 GPUs nodes](../jobs_and_resources/hardware_overview.md#island-2-phase-2-intel-haswell-cpus-nvidia-k80-gpus)
(partition `gpu2`): use the default `scs5` module environment (`module switch modenv/scs5`).
* [NVIDIA Tesla V100 nodes](../jobs_and_resources/hardware_overview.md#ibm-power9-nodes-for-machine-learning)
(partition `ml`): use the `ml` module environment (`module switch modenv/ml`)
* [NVIDIA A100 nodes](../jobs_and_resources/hardware_overview.md#amd-rome-cpus-nvidia-a100)
(partition `alpha`): use the `hiera` module environment (`module switch modenv/hiera`)
(cluster `alpha`): use the `hiera` module environment (`module switch modenv/hiera`)
* [NVIDIA Tesla V100 nodes](../jobs_and_resources/hardware_overview.md#ibm-power9-nodes-for-machine-learning)
(cluster `power9`): use `module spider <module name>` to search for available modules
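For instance, on the A100 nodes you would switch to the `hiera` module environment before searching for and loading modules (a minimal sketch; `<module name>` is a placeholder):

```bash
marie@login.alpha$ module switch modenv/hiera
marie@login.alpha$ module spider <module name>
```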
## Using GPUs with Slurm
......@@ -19,45 +17,7 @@ For general information on how to use Slurm, read the respective [page in this c
When allocating resources on a GPU-node, you must specify the number of requested GPUs by using the
`--gres=gpu:<N>` option, like this:
=== "partition `gpu2`"
```bash
#!/bin/bash # Batch script starts with shebang line
#SBATCH --ntasks=1 # All #SBATCH lines have to follow uninterrupted
#SBATCH --time=01:00:00 # after the shebang line
#SBATCH --account=<KTR> # Comments start with # and do not count as interruptions
#SBATCH --job-name=fancyExp
#SBATCH --output=simulation-%j.out
#SBATCH --error=simulation-%j.err
#SBATCH --partition=gpu2
#SBATCH --gres=gpu:1 # request GPU(s) from Slurm
module purge # Set up environment, e.g., clean modules environment
module switch modenv/scs5 # switch module environment
module load <modules> # and load necessary modules
srun ./application [options] # Execute parallel application with srun
```
=== "partition `ml`"
```bash
#!/bin/bash # Batch script starts with shebang line
#SBATCH --ntasks=1 # All #SBATCH lines have to follow uninterrupted
#SBATCH --time=01:00:00 # after the shebang line
#SBATCH --account=<KTR> # Comments start with # and do not count as interruptions
#SBATCH --job-name=fancyExp
#SBATCH --output=simulation-%j.out
#SBATCH --error=simulation-%j.err
#SBATCH --partition=ml
#SBATCH --gres=gpu:1 # request GPU(s) from Slurm
module purge # Set up environment, e.g., clean modules environment
module switch modenv/ml # switch module environment
module load <modules> # and load necessary modules
srun ./application [options] # Execute parallel application with srun
```
=== "partition `alpha`"
=== "cluster `alpha`"
```bash
#!/bin/bash # Batch script starts with shebang line
......@@ -67,20 +27,18 @@ When allocating resources on a GPU-node, you must specify the number of requeste
#SBATCH --job-name=fancyExp
#SBATCH --output=simulation-%j.out
#SBATCH --error=simulation-%j.err
#SBATCH --partition=alpha
#SBATCH --gres=gpu:1 # request GPU(s) from Slurm
module purge # Set up environment, e.g., clean modules environment
module switch modenv/hiera # switch module environment
module load <modules> # and load necessary modules
srun ./application [options] # Execute parallel application with srun
```
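The job file can then be submitted from a login node of the respective cluster (a sketch; the file name is illustrative):

```bash
marie@login.alpha$ sbatch gpu_job.sh
```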
Alternatively, you can work on the partitions interactively:
Alternatively, you can work on the clusters interactively:
```bash
marie@login$ srun --partition=<partition>-interactive --gres=gpu:<N> --pty bash
marie@login.<cluster_name>$ srun --nodes=1 --gres=gpu:<N> --time=00:30:00 --pty bash
marie@compute$ module purge; module switch modenv/<env>
```
......@@ -149,7 +107,7 @@ the [official list](https://www.openmp.org/resources/openmp-compilers-tools/) fo
Furthermore, some compilers, such as GCC, have basic support for target offloading, but do not
enable these features by default and/or achieve only poor performance.
On the ZIH system, compilers with OpenMP target offloading support are provided on the partitions
On the ZIH system, compilers with OpenMP target offloading support are provided on the clusters
`ml` and `alpha`. Two compilers with good performance can be used: the NVIDIA HPC compiler and the
IBM XL compiler.
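As a minimal sketch, compiling a C source file with OpenMP target offloading using the NVIDIA HPC compiler could look like this (the source file name is illustrative; `cc80` targets the A100 GPUs of cluster `alpha`):

```bash
marie@compute$ nvc -mp=gpu -gpu=cc80 -o saxpy saxpy.c
```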
......@@ -167,7 +125,7 @@ available for OpenMP, including the `-gpu=ccXY` flag as mentioned above.
#### Using OpenMP target offloading with the IBM XL compilers
The IBM XL compilers (`xlc` for C, `xlc++` for C++ and `xlf` for Fortran (with sub-version for
different versions of Fortran)) are only available on the partition `ml` with NVIDIA Tesla V100 GPUs.
different versions of Fortran)) are only available on the cluster `ml` with NVIDIA Tesla V100 GPUs.
They are available by default when switching to `modenv/ml`.
* The `-qsmp -qoffload` combination of flags enables OpenMP target offloading support
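A minimal compile sketch with the IBM XL C++ compiler (assuming `modenv/ml` is active and a C++ source file `saxpy.cpp`):

```bash
marie@ml$ xlc++ -qsmp -qoffload -o saxpy saxpy.cpp
```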
......@@ -328,8 +286,7 @@ documentation for a
### NVIDIA Nsight Compute
Nsight Compute is used for the analysis of individual GPU-kernels. It supports GPUs from the Volta
architecture onward (on the ZIH system: V100 and A100). Therefore, you cannot use Nsight Compute on
the partition `gpu2`. If you are familiar with nvprof, you may want to consult the
architecture onward (on the ZIH system: V100 and A100). If you are familiar with nvprof, you may want to consult the
[Nvprof Transition Guide](https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#nvprof-guide),
as Nsight Compute uses a new scheme for metrics.
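A typical invocation on a compute node with a V100 or A100 GPU might look like this (a sketch; the report name and application are placeholders):

```bash
marie@compute$ ncu --set full -o kernel_report ./application [options]
```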
We recommend those kernels as optimization targets that require a large portion of your run time,
......
......@@ -283,7 +283,7 @@ Using OmniOpt for a first trial example, it is often sufficient to concentrate o
configuration parameters:
1. **Optimization run name:** A name for an OmniOpt run and its associated configuration.
1. **Partition:** Choose the partition on the ZIH system that fits the programs' needs.
1. **Partition:** Choose the cluster on the ZIH system that fits the programs' needs.
1. **Enable GPU:** Decide whether a program could benefit from GPU usage or not.
1. **Workdir:** The directory where OmniOpt saves its working files and all results. Each
   configuration creates its own directory, derived from the optimization run name.
......
......@@ -151,11 +151,10 @@ It is also possible to run this command using a job file to retrieve the topolog
```bash
#!/bin/bash
#SBATCH --job-name=topo_haswell
#SBATCH --job-name=topo_node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=300m
#SBATCH --partition=haswell
#SBATCH --time=00:05:00
#SBATCH --output=get_topo.out
#SBATCH --error=get_topo.err
......
......@@ -110,9 +110,8 @@ cards (GPUs) specified by the device index. For that, make sure to use the modul
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=12
#SBATCH --gres=gpu:2
#SBATCH --partition=gpu2
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
# Make sure to only use ParaView
......