# Known Issues with MPI
This page holds known issues observed with MPI and concrete MPI implementations.
## Open MPI
### Version v4.1.x - Performance Loss with MPI-IO-Module OMPIO
Open MPI v4.1.x introduced a couple of major enhancements, e.g., the `OMPIO` module is now the
default module for MPI-IO on **all** filesystems incl. Lustre (cf.
[NEWS file in Open MPI source code](https://raw.githubusercontent.com/open-mpi/ompi/v4.1.x/NEWS)).
Prior to this, `ROMIO` was the default MPI-IO module for Lustre.
Colleagues of ZIH have found that some MPI-IO access patterns suffer a significant performance loss
using `OMPIO` as MPI-IO module with `OpenMPI/4.1.x` modules on ZIH systems. At the moment, the root
cause is unclear and needs further investigation.
**A workaround** for this performance loss is to use the "old" MPI-IO module, i.e., `ROMIO`. This
is achieved by setting the environment variable `OMPI_MCA_io` before executing the application as
follows
```console
marie@login$ export OMPI_MCA_io=^ompio
marie@login$ srun ...
```
or by setting the option as an argument, in case you invoke `mpirun` directly
```console
marie@login$ mpirun --mca io ^ompio ...
```
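
If you want to check whether your own I/O pattern is affected, a small MPI-IO test program can be
timed once with and once without the `OMPI_MCA_io=^ompio` setting. The following C sketch is a
hypothetical example (file name and block size are arbitrary choices), not the workload in which the
performance loss was observed; it merely exercises the MPI-IO layer with a collective write.

```c
/* Minimal MPI-IO test: each rank writes one block collectively.
 * Hypothetical example for comparing the ROMIO and OMPIO modules;
 * file name and block size are arbitrary choices. */
#include <mpi.h>
#include <stdlib.h>

#define BLOCK_SIZE (16 * 1024 * 1024)   /* 16 MiB per rank */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc(BLOCK_SIZE);     /* uninitialized payload suffices for a timing test */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "testfile.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write: rank i writes its block at offset i * BLOCK_SIZE. */
    MPI_Offset offset = (MPI_Offset)rank * BLOCK_SIZE;
    MPI_File_write_at_all(fh, offset, buf, BLOCK_SIZE, MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```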
### Mpirun on Partitions `alpha` and `ml`
Using `mpirun` on partitions `alpha` and `ml` leads to a wrong resource distribution when more than
one node is involved. This yields a strange distribution, e.g., `SLURM_NTASKS_PER_NODE=15,1`.
Another issue arises when using the Intel toolchain: `mpirun` calls a different MPI and caused an
8-9x slowdown in the PALM app in comparison to using `srun` or the GCC-compiled version of the app
(which uses the correct MPI).
### R Parallel Library on Multiple Nodes
Using the R parallel library on MPI clusters has shown problems when using more than a few compute
nodes. The error messages indicate that there are buggy interactions of R/Rmpi/Open MPI and UCX.
Disabling UCX has solved these problems in our experiments.
We invoked the R script successfully with the following command:
```console
marie@login$ mpirun -mca btl_openib_allow_ib true --mca pml ^ucx --mca osc ^ucx -np 1 Rscript --vanilla the-script.R
```
where the arguments `-mca btl_openib_allow_ib true --mca pml ^ucx --mca osc ^ucx` disable usage of
UCX.
### MPI Function `MPI_Win_allocate`
The function `MPI_Win_allocate` is a one-sided MPI call that allocates memory and returns a window
object for RDMA operations (ref. [man page](https://www.open-mpi.org/doc/v3.0/man3/MPI_Win_allocate.3.php)).
It was observed at least for the `OpenMPI/4.0.5` module that using `MPI_Win_allocate` instead of
`MPI_Alloc_mem` in conjunction with `MPI_Win_create` leads to segmentation faults in the calling
application. To be precise, the segfaults occurred at partition `romeo` when about 200 GB per node
were allocated. In contrast, the segmentation faults vanished when the implementation was
refactored to call the `MPI_Alloc_mem` + `MPI_Win_create` functions.
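
The refactoring boils down to replacing the single `MPI_Win_allocate` call with an explicit
`MPI_Alloc_mem` allocation followed by `MPI_Win_create`. The following C sketch illustrates both
variants; the helper functions, the displacement unit of 1, and the communicator argument are
illustrative choices, not code from the affected application.

```c
/* Sketch of both variants; size is a placeholder for the actual
 * per-process window size (~200 GB per node in the observed case). */
#include <mpi.h>

void create_window(MPI_Aint size, MPI_Comm comm, void **base, MPI_Win *win)
{
    /* Variant 1: MPI_Win_allocate - allocation and window creation in one call.
     * This variant led to segmentation faults with the OpenMPI/4.0.5 module. */
    /* MPI_Win_allocate(size, 1, MPI_INFO_NULL, comm, base, win); */

    /* Variant 2: MPI_Alloc_mem + MPI_Win_create - the refactored version. */
    MPI_Alloc_mem(size, MPI_INFO_NULL, base);
    MPI_Win_create(*base, size, 1, MPI_INFO_NULL, comm, win);
}

void destroy_window(void *base, MPI_Win *win)
{
    MPI_Win_free(win);
    /* Memory obtained from MPI_Alloc_mem must be released with MPI_Free_mem. */
    MPI_Free_mem(base);
}
```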