diff --git a/doc.zih.tu-dresden.de/docs/software/gpu_programming.md b/doc.zih.tu-dresden.de/docs/software/gpu_programming.md
index 28612b63b28990323f514a3edeb268f6ae616abb..dc45a045cafbbd2777845fbb7a648fd542e82345 100644
--- a/doc.zih.tu-dresden.de/docs/software/gpu_programming.md
+++ b/doc.zih.tu-dresden.de/docs/software/gpu_programming.md
@@ -6,6 +6,24 @@ The full hardware specifications of the GPU-compute nodes may be found in the
 [HPC Resources](../jobs_and_resources/hardware_overview.md#hpc-resources) page.
 Note that the clusters may have different [modules](modules.md#module-environments) available:
 
+For example, the available CUDA versions can be listed with:
+
+```bash
+marie@compute$ module spider CUDA
+```
+
+Note that some modules are built against a specific CUDA version, which is visible in the module name,
+e.g. `GDRCopy/2.1-CUDA-11.1.1` or `Horovod/0.28.1-CUDA-11.7.0-TensorFlow-2.11.0`.
+
+This especially applies to optimized CUDA libraries such as `cuDNN`, `NCCL`, and `magma`.
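+
+To check which compiler toolchain and CUDA modules such a module requires, query it with its full
+version string, for example (the exact output depends on the module environment of the cluster):
+
+```bash
+marie@compute$ module spider GDRCopy/2.1-CUDA-11.1.1
+```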
+
+!!! important "CUDA-aware MPI"
+
+    When running CUDA applications that use MPI for inter-process communication, you additionally need to load
+    the modules that enable CUDA-aware MPI, which may provide improved performance.
+    These are `UCX-CUDA` and `UCC-CUDA`, which supplement the `UCX` and `UCC` modules, respectively.
+    Some modules, such as `NCCL`, load them automatically.
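+
+    A minimal sketch of enabling CUDA-aware MPI after your compiler and MPI modules are loaded
+    (a version-less `module load` picks the default version; check `module spider UCX-CUDA` for the
+    versions actually installed on the cluster):
+
+    ```bash
+    marie@compute$ module load UCX-CUDA UCC-CUDA
+    ```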
+
 ## Using GPUs with Slurm
 
 For general information on how to use Slurm, read the respective [page in this compendium](../jobs_and_resources/slurm.md).