Commit 6e1e8c3e authored by Morris Jette

Keep srun running after it receives an unrecognized message

This can happen if something outside of Slurm opens the srun socket
and writes to it, since the data will not be of a form that Slurm
can decode.
Bug 354
parent 2d28ef52
@@ -207,6 +207,16 @@ static bool _retry(void)
                       slurm_strerror(ESLURM_NODES_BUSY));
                 error_exit = immediate_exit;
                 return false;
+        } else if ((errno == SLURM_PROTOCOL_AUTHENTICATION_ERROR) ||
+                   (errno == SLURM_UNEXPECTED_MSG_ERROR) ||
+                   (errno == SLURM_PROTOCOL_INSANE_MSG_LENGTH)) {
+                static int external_msg_count = 0;
+                error("Srun communication socket apparently being written to "
+                      "by something other than Slurm");
+                if (external_msg_count++ < 4)
+                        return true;
+                error("Unable to allocate resources: %m");
+                return false;
         } else {
                 error("Unable to allocate resources: %m");
                 return false;
......
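
The bounded-retry behaviour added in this hunk can be looked at in isolation. The following is a minimal sketch, not the Slurm source: `demo_retry`, the `DEMO_*` codes, and `MAX_EXTERNAL_MSGS` are hypothetical names invented for illustration, and `fprintf` stands in for Slurm's `error()`. It assumes, as the diff suggests, that a function-local `static` counter tolerates the first four undecodable messages and gives up on the fifth.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the Slurm error codes named in the diff. */
enum demo_err {
    DEMO_AUTH_ERROR = 1,       /* cf. SLURM_PROTOCOL_AUTHENTICATION_ERROR */
    DEMO_UNEXPECTED_MSG,       /* cf. SLURM_UNEXPECTED_MSG_ERROR */
    DEMO_INSANE_MSG_LENGTH,    /* cf. SLURM_PROTOCOL_INSANE_MSG_LENGTH */
    DEMO_OTHER_ERROR           /* any error that is not retried */
};

/* Number of undecodable messages tolerated before giving up (4 in the diff). */
#define MAX_EXTERNAL_MSGS 4

/*
 * Return true if the caller should retry, false if it should give up.
 * The counter is static, so it persists across calls and is never reset:
 * once the limit is exceeded, every later undecodable message is fatal.
 */
static bool demo_retry(int err)
{
    static int external_msg_count = 0;

    if (err == DEMO_AUTH_ERROR ||
        err == DEMO_UNEXPECTED_MSG ||
        err == DEMO_INSANE_MSG_LENGTH) {
        fprintf(stderr, "socket apparently being written to by "
                        "something other than Slurm\n");
        /* Post-increment: the comparison sees 0, 1, 2, 3 on the
         * first four calls, then 4 on the fifth. */
        if (external_msg_count++ < MAX_EXTERNAL_MSGS)
            return true;
        fprintf(stderr, "too many undecodable messages, giving up\n");
        return false;
    }

    fprintf(stderr, "unrecoverable error %d\n", err);
    return false;
}
```

Under these assumptions, calling `demo_retry(DEMO_UNEXPECTED_MSG)` five times in a row requests a retry on the first four calls and gives up on the fifth, while an unrelated error never triggers a retry and does not touch the counter.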