
# GPU Programming

## Available GPUs

The full hardware specifications of the GPU compute nodes can be found on the HPC Resources page. Each node type uses a different module environment:

- partition `gpu2` (NVIDIA Tesla K80 GPUs): `modenv/scs5`
- partition `ml` (NVIDIA Tesla V100 GPUs): `modenv/ml`
- partition `alpha` (NVIDIA A100 GPUs): `modenv/hiera`

## Using GPUs with Slurm

For general information on how to use Slurm, read the respective page in this compendium. When allocating resources on a GPU node, you must specify the number of requested GPUs with the `--gres=gpu:<N>` option, like this:

=== "partition gpu2" ```bash #!/bin/bash # Batch script starts with shebang line

#SBATCH --ntasks=1                    # All #SBATCH lines have to follow uninterrupted
#SBATCH --time=01:00:00               # after the shebang line
#SBATCH --account=<KTR>               # Comments start with # and do not count as interruptions
#SBATCH --job-name=fancyExp
#SBATCH --output=simulation-%j.out
#SBATCH --error=simulation-%j.err
#SBATCH --partition=gpu2
#SBATCH --gres=gpu:1                  # request GPU(s) from Slurm

module purge                          # Set up environment, e.g., clean modules environment
module switch modenv/scs5             # switch module environment
module load <modules>                 # and load necessary modules

srun ./application [options]          # Execute parallel application with srun
```

=== "partition ml" ```bash #!/bin/bash # Batch script starts with shebang line

#SBATCH --ntasks=1                    # All #SBATCH lines have to follow uninterrupted
#SBATCH --time=01:00:00               # after the shebang line
#SBATCH --account=<KTR>               # Comments start with # and do not count as interruptions
#SBATCH --job-name=fancyExp
#SBATCH --output=simulation-%j.out
#SBATCH --error=simulation-%j.err
#SBATCH --partition=ml
#SBATCH --gres=gpu:1                  # request GPU(s) from Slurm

module purge                          # Set up environment, e.g., clean modules environment
module switch modenv/ml               # switch module environment
module load <modules>                 # and load necessary modules

srun ./application [options]          # Execute parallel application with srun
```

=== "partition alpha" ```bash #!/bin/bash # Batch script starts with shebang line

#SBATCH --ntasks=1                    # All #SBATCH lines have to follow uninterrupted
#SBATCH --time=01:00:00               # after the shebang line
#SBATCH --account=<KTR>               # Comments start with # and do not count as interruptions
#SBATCH --job-name=fancyExp
#SBATCH --output=simulation-%j.out
#SBATCH --error=simulation-%j.err
#SBATCH --partition=alpha
#SBATCH --gres=gpu:1                  # request GPU(s) from Slurm

module purge                          # Set up environment, e.g., clean modules environment
module switch modenv/hiera            # switch module environment
module load <modules>                 # and load necessary modules

srun ./application [options]          # Execute parallel application with srun
```

Alternatively, you can work on the partitions interactively:

```console
marie@login$ srun --partition=<partition>-interactive --gres=gpu:<N> --pty bash
marie@compute$ module purge; module switch modenv/<env>
```

## Directive Based GPU Programming

Directives are special compiler commands in your C/C++ or Fortran source code. They tell the compiler how to parallelize and offload work to a GPU. This section explains how to use this technique.

### OpenACC

OpenACC is a directive-based GPU programming model. It currently only supports NVIDIA GPUs as a target.

Please use the following information as a starting point for OpenACC:

#### Introduction

OpenACC can be used with the PGI and NVIDIA HPC compilers. The NVIDIA HPC compiler, as part of the NVIDIA HPC SDK, supersedes the PGI compiler.

Various versions of the PGI compiler are available on the NVIDIA Tesla K80 GPU nodes (partition `gpu2`).

The `nvc` compiler (NOT the `nvcc` compiler, which is used for CUDA) is available for the NVIDIA Tesla V100 and NVIDIA A100 nodes.

#### Using OpenACC with PGI compilers

- Load the latest version via `module load PGI` or search for available versions with `module search PGI`
- For compilation, please add the compiler flag `-acc` to enable OpenACC interpretation by the compiler (a small example is sketched after this list)
- `-Minfo` tells you what the compiler is actually doing to your code
- Add `-ta=nvidia:kepler` to enable optimizations for the K80 GPUs
- You may find further information on the PGI compiler in the user guide and in the reference guide, which includes descriptions of the available command line options
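
To make the flags above concrete, here is a minimal OpenACC sketch in C together with a possible compile line in the leading comment. The file name `saxpy.c`, the vector length, and the invocation via `pgcc` are illustrative assumptions, not prescriptions from this compendium; the flags are the ones listed above.

```c
/* saxpy.c - minimal OpenACC sketch (illustrative example, file name assumed)
 *
 * Possible compilation on partition gpu2 with the PGI C compiler (flags as above):
 *   pgcc -acc -Minfo -ta=nvidia:kepler saxpy.c -o saxpy
 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int n = 1 << 20;                      /* vector length (arbitrary) */
    float *x = malloc(n * sizeof(float));
    float *y = malloc(n * sizeof(float));

    for (int i = 0; i < n; ++i) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    /* The directive asks the compiler to offload this loop to the GPU;
       the data clauses describe which arrays are copied to and from the device. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = 2.0f * x[i] + y[i];
    }

    printf("y[0] = %f\n", y[0]);
    free(x);
    free(y);
    return 0;
}
```

With `-Minfo`, the compiler reports for this loop whether a GPU kernel was generated and how it was parallelized.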

#### Using OpenACC with NVIDIA HPC compilers

- Switch into the correct module environment for your selected compute nodes (see the list of available GPUs above)
- Load the NVHPC module for that module environment. Either load the default (`module load NVHPC`) or search for a specific version.
- Use the correct compiler for your code: `nvc` for C, `nvc++` for C++ and `nvfortran` for Fortran
- Use the `-acc` and `-Minfo` flags as with the PGI compiler (see the sketch after this list)
- To create optimized code for either the V100 or the A100, use `-gpu=cc70` or `-gpu=cc80`, respectively
- Further information on this compiler is provided in the user guide and the reference guide, which includes descriptions of the available command line options
- Information specific to the use of OpenACC with the NVIDIA HPC compiler is compiled in a dedicated guide
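
The same workflow with the NVIDIA HPC compiler might look as sketched below; this time the example adds an explicit `data` region so that the arrays stay on the GPU across two kernels. The file name, vector size, and exact compile line are again illustrative assumptions; the flags follow the list above.

```c
/* vecops.c - OpenACC sketch for the NVIDIA HPC compiler (illustrative example)
 *
 * Possible compilation on partition alpha (A100, flags as listed above):
 *   nvc -acc -Minfo -gpu=cc80 vecops.c -o vecops
 * On partition ml (V100), -gpu=cc70 would be used instead.
 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int n = 1 << 20;                      /* vector length (arbitrary) */
    double *a = malloc(n * sizeof(double));
    double *b = malloc(n * sizeof(double));

    for (int i = 0; i < n; ++i)
        a[i] = (double)i;

    /* Structured data region: a and b are kept on the GPU for both loops,
       avoiding a host round trip between the two kernels. */
    #pragma acc data copyin(a[0:n]) copyout(b[0:n])
    {
        #pragma acc parallel loop
        for (int i = 0; i < n; ++i)
            b[i] = 2.0 * a[i];

        #pragma acc parallel loop
        for (int i = 0; i < n; ++i)
            b[i] = b[i] + 1.0;
    }

    printf("b[n-1] = %f\n", b[n - 1]);
    free(a);
    free(b);
    return 0;
}
```

Keeping the `data` region around both loops avoids transferring the arrays between host and device after every kernel.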

### OpenMP target offloading

OpenMP supports target offloading as of version 4.0. A dedicated set of compiler directives can be used to annotate code sections that are intended for execution on the GPU (i.e., target offloading). Not all compilers that support OpenMP implement target offloading; refer to the official list for details. Furthermore, some compilers, such as GCC, have only basic support for target offloading, do not enable these features by default, or achieve poor performance.

On the ZIH system, compilers with OpenMP target offloading support are provided on the partitions `ml` and `alpha`. Two compilers with good performance can be used: the NVIDIA HPC compiler and the IBM XL compiler.
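
As a rough illustration of the directive set (not an official ZIH example), the sketch below offloads a loop with `#pragma omp target teams distribute parallel for`. The file name and the compile commands in the leading comment are assumptions based on the compilers named above and should be checked against the respective compiler documentation.

```c
/* omp_offload.c - OpenMP target offloading sketch (illustrative example)
 *
 * Possible compilation (assumed invocations, check the compiler documentation):
 *   NVIDIA HPC compiler:    nvc -mp=gpu omp_offload.c -o omp_offload
 *   IBM XL (partition ml):  xlc_r -qsmp=omp -qoffload omp_offload.c -o omp_offload
 */
#include <stdio.h>

#define N (1 << 20)                  /* vector length (arbitrary) */

int main(void)
{
    static double x[N], y[N];

    for (int i = 0; i < N; ++i) {
        x[i] = 1.0;
        y[i] = 2.0;
    }

    /* The target directive offloads this region to the GPU; the map clauses
       describe which data is transferred to and from the device. */
    #pragma omp target teams distribute parallel for map(to: x[0:N]) map(tofrom: y[0:N])
    for (int i = 0; i < N; ++i)
        y[i] = 2.0 * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);
    return 0;
}
```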