Commit 36f59fb3 authored by Artem Polyakov, committed by Danny Auble

mpi/pmix: Fix UCX connection error case handling


Signed-off-by: Artem Polyakov <artpol84@gmail.com>
parent bc916aeb
@@ -228,9 +228,9 @@ static inline int pmixp_dconn_connect(
 	if (SLURM_SUCCESS == rc){
 		dconn->state = PMIXP_DIRECT_CONNECTED;
 	} else {
-		/* drop the state to INIT so we will try again later
-		 * if it will always be failing - we will always use
-		 * SLURM's protocol
+		/*
+		 * Abort the application - we can't do what user requested.
+		 * Make sure to provide enough info
 		 */
 		char *nodename = pmixp_info_job_host(dconn->nodeid);
 		xassert(nodename);
@@ -239,10 +239,12 @@ static inline int pmixp_dconn_connect(
 				    dconn->nodeid);
 			abort();
 		}
-		dconn->state = PMIXP_DIRECT_INIT;
 		PMIXP_ERROR("Cannot establish direct connection to %s (%d)",
 			    nodename, dconn->nodeid);
 		xfree(nodename);
+		pmixp_debug_hang(0); /* enable hang to debug this! */
+		slurm_kill_job_step(pmixp_info_jobid(),
+				    pmixp_info_stepid(), SIGKILL);
 	}
 	return rc;
 }
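
The behavioral change in this diff can be sketched in isolation. The following is a minimal illustration, not the plugin's code: struct conn, the CONN_* states, kill_step_stub() and both on_connect_result_*() functions are hypothetical names invented for this sketch, and the stub only records that a kill was requested, standing in for the role slurm_kill_job_step() plays in the commit. It contrasts the old policy (reset the state so a later attempt can retry, leaving the caller on the fallback protocol) with the new one (report the failure and request termination of the step).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for this sketch; none of these are Slurm APIs. */
enum conn_state { CONN_INIT, CONN_CONNECTED };

struct conn {
	enum conn_state state;
	int nodeid;
};

/* Set by the stub below so callers can observe that a kill was requested. */
bool step_kill_requested = false;

/* Stub for the step-kill call; it only records the request. */
void kill_step_stub(void)
{
	step_kill_requested = true;
}

/* Old policy: on failure, drop back to INIT so a later call can retry. */
int on_connect_result_old(struct conn *c, int rc)
{
	assert(c != NULL);
	if (rc == 0)
		c->state = CONN_CONNECTED;
	else
		c->state = CONN_INIT;
	return rc;
}

/* New policy: on failure, report it and request that the step be killed. */
int on_connect_result_new(struct conn *c, int rc)
{
	assert(c != NULL);
	if (rc == 0) {
		c->state = CONN_CONNECTED;
	} else {
		fprintf(stderr,
			"Cannot establish direct connection to node %d\n",
			c->nodeid);
		kill_step_stub();
	}
	return rc;
}
```

Under these assumptions, a failed connection handled by on_connect_result_old() leaves the step running with the state reset, while on_connect_result_new() leaves the state untouched and flags the step for termination. Both return the original error code to the caller.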