Commit ce6f840f authored 2 years ago by Martin Schroschk
Collect MPI usage issues on new page
* Resolves #291, #270, and #163
parent ca8766e1
2 merge requests: !707 Automated merge from preview to main, !630 Mpi errors
2 changed files with 34 additions and 0 deletions:
doc.zih.tu-dresden.de/docs/jobs_and_resources/mpi_issues.md (33 additions, 0 deletions)
doc.zih.tu-dresden.de/mkdocs.yml (1 addition, 0 deletions)
doc.zih.tu-dresden.de/docs/jobs_and_resources/mpi_issues.md 0 → 100644 (+33 −0)
# Known MPI-Usage Issues
This page collects known issues observed with MPI and with concrete MPI implementations.
## R Parallel Library on Multiple Nodes
Using the R parallel library on MPI clusters has caused problems when running on more than a few
compute nodes. The error messages point to faulty interactions between R/Rmpi/OpenMPI and UCX.
Disabling UCX resolved these problems in our experiments.
We invoked the R script successfully with the following command:
```console
mpirun -mca btl_openib_allow_ib true --mca pml ^ucx --mca osc ^ucx -np 1 Rscript \
    --vanilla the-script.R
```
where the arguments `-mca btl_openib_allow_ib true --mca pml ^ucx --mca osc ^ucx` disable the use
of UCX.
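Open MPI also reads MCA parameters from environment variables named `OMPI_MCA_<parameter>`, which can be handy when the `mpirun` call is buried inside a wrapper or job script. The following is a minimal sketch of that alternative, not a configuration we have tested on the cluster; it assumes that `mpirun` comes from an Open MPI installation and that `the-script.R` is in the current directory.

```shell
# Sketch: set the same MCA parameters as in the command above via the environment.
# Quote the values so that no shell interprets the leading '^'.
export OMPI_MCA_pml="^ucx"                    # exclude the UCX point-to-point layer
export OMPI_MCA_osc="^ucx"                    # exclude the UCX one-sided component
export OMPI_MCA_btl_openib_allow_ib="true"    # let the openib BTL use InfiniBand

# Launch only if mpirun and the script are actually available
# (assumption: an Open MPI module has been loaded beforehand).
if command -v mpirun >/dev/null 2>&1 && [ -f the-script.R ]; then
    mpirun -np 1 Rscript --vanilla the-script.R
fi
```

Parameters given on the `mpirun` command line take precedence over the environment, so the explicit flags shown earlier remain the more visible way to document what a job actually used.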
## MPI Function `MPI_Win_allocate`
The function `MPI_Win_allocate` belongs to MPI's one-sided communication interface. It allocates
memory and returns a window object for RDMA operations (ref.
[man page](https://www.open-mpi.org/doc/v3.0/man3/MPI_Win_allocate.3.php)).
> Using MPI_Win_allocate rather than separate MPI_Alloc_mem + MPI_Win_create may allow the MPI implementation to optimize the memory allocation.
> (Using advanced MPI)
It was observed, at least for the `OpenMPI/4.0.5` module, that using `MPI_Alloc_mem` in
conjunction with `MPI_Win_create` instead of `MPI_Win_allocate` leads to segmentation faults in the
calling application. To be precise, the segfaults occurred on partition `romeo` when about 200 GB
per node were allocated. In contrast, the segmentation faults vanished when the implementation was
refactored to call the `MPI_Win_allocate` function.
doc.zih.tu-dresden.de/mkdocs.yml (+1 −0)
@@ -80,6 +80,7 @@ nav:
       - Compilers and Flags: software/compilers.md
       - GPU Programming: software/gpu_programming.md
       - Mathematics Libraries: software/math_libraries.md
+      - MPI Usage Issues: software/mpi_issues.md
       - Debugging: software/debuggers.md
       - Software Engineering Tools:
         - MPI Error Detection: software/mpi_usage_error_detection.md