# Known Issues with MPI
This page holds known issues observed with MPI and concrete MPI implementations.
## Open MPI
### Version v4.1.x - Performance Loss with MPI-IO-Module OMPIO
Open MPI v4.1.x introduced a couple of major enhancements, e.g., the `OMPIO` module is now the
default module for MPI-IO on **all** filesystems incl. Lustre (cf.
[NEWS file in Open MPI source code](https://raw.githubusercontent.com/open-mpi/ompi/v4.1.x/NEWS)).
Prior to this, `ROMIO` was the default MPI-IO module for Lustre.
Colleagues of ZIH have found that some MPI-IO access patterns suffer a significant performance loss
using `OMPIO` as MPI-IO module with `OpenMPI/4.1.x` modules on ZIH systems. At the moment, the root
cause is unclear and needs further investigation.
**A workaround** for this performance loss is to use the "old" MPI-IO module, i.e., `ROMIO`. This
is achieved by setting the environment variable `OMPI_MCA_io` before executing the application as
follows
```console
marie@login$ export OMPI_MCA_io=^ompio
marie@login$ srun ...
```
or by setting the option as an argument, in case you invoke `mpirun` directly
```console
marie@login$ mpirun --mca io ^ompio ...
```
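
If you want to check whether your own I/O pattern is affected, a small MPI-IO test program can be
timed once with and once without the `OMPI_MCA_io=^ompio` setting. The following C sketch is a
hypothetical example (file name and block size are arbitrary choices), not the workload in which the
performance loss was observed; it merely exercises the MPI-IO layer with a collective write.

```c
/* Minimal MPI-IO test: each rank writes one block collectively.
 * Hypothetical example for comparing the ROMIO and OMPIO modules;
 * file name and block size are arbitrary choices. */
#include <mpi.h>
#include <stdlib.h>

#define BLOCK_SIZE (16 * 1024 * 1024)   /* 16 MiB per rank */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc(BLOCK_SIZE);     /* uninitialized payload suffices for a timing test */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "testfile.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write: rank i writes its block at offset i * BLOCK_SIZE. */
    MPI_Offset offset = (MPI_Offset)rank * BLOCK_SIZE;
    MPI_File_write_at_all(fh, offset, buf, BLOCK_SIZE, MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```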
### Mpirun on Partitions `alpha` and `ml`
Using `mpirun` on partitions `alpha` and `ml` leads to a wrong resource distribution when more than
one node is involved. This yields a strange distribution, e.g., `SLURM_NTASKS_PER_NODE=15,1`.
Another issue arises when using the Intel toolchain: `mpirun` calls a different MPI and caused an
8-9x slowdown in the PALM app in comparison to using `srun` or the GCC-compiled version of the app
(which uses the correct MPI).
### R Parallel Library on Multiple Nodes
Using the R parallel library on MPI clusters has shown problems when using more than a few compute
nodes. The error messages indicate that there are buggy interactions of R/Rmpi/Open MPI and UCX.
Disabling UCX has solved these problems in our experiments.
We invoked the R script successfully with the following command:
```console
marie@login$ mpirun -mca btl_openib_allow_ib true --mca pml ^ucx --mca osc ^ucx -np 1 Rscript --vanilla the-script.R
```
where the arguments `-mca btl_openib_allow_ib true --mca pml ^ucx --mca osc ^ucx` disable usage of
UCX.
### MPI Function `MPI_Win_allocate`
The function `MPI_Win_allocate` is a one-sided MPI call that allocates memory and returns a window
object for RDMA operations (ref. [man page](https://www.open-mpi.org/doc/v3.0/man3/MPI_Win_allocate.3.php)).
It was observed at least for the `OpenMPI/4.0.5` module that using `MPI_Win_allocate` instead of
`MPI_Alloc_mem` in conjunction with `MPI_Win_create` leads to segmentation faults in the calling
application. To be precise, the segfaults occurred at partition `romeo` when about 200 GB per node
were allocated. In contrast, the segmentation faults vanished when the implementation was
refactored to call the `MPI_Alloc_mem` + `MPI_Win_create` functions.
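
The refactoring boils down to replacing the single `MPI_Win_allocate` call with an explicit
`MPI_Alloc_mem` allocation followed by `MPI_Win_create`. The following C sketch illustrates both
variants; the helper functions, the displacement unit of 1, and the communicator argument are
illustrative choices, not code from the affected application.

```c
/* Sketch of both variants; size is a placeholder for the actual
 * per-process window size (~200 GB per node in the observed case). */
#include <mpi.h>

void create_window(MPI_Aint size, MPI_Comm comm, void **base, MPI_Win *win)
{
    /* Variant 1: MPI_Win_allocate - allocation and window creation in one call.
     * This variant led to segmentation faults with the OpenMPI/4.0.5 module. */
    /* MPI_Win_allocate(size, 1, MPI_INFO_NULL, comm, base, win); */

    /* Variant 2: MPI_Alloc_mem + MPI_Win_create - the refactored version. */
    MPI_Alloc_mem(size, MPI_INFO_NULL, base);
    MPI_Win_create(*base, size, 1, MPI_INFO_NULL, comm, win);
}

void destroy_window(void *base, MPI_Win *win)
{
    MPI_Win_free(win);
    /* Memory obtained from MPI_Alloc_mem must be released with MPI_Free_mem. */
    MPI_Free_mem(base);
}
```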