Commit 680a2faf authored by Moe Jette

Increase timeout on no-allocate job initiation because in testing we
were starting multiple jobs simultaneously and slurmd was not able
to respond to all of the requests without generating a message timeout.
(gnats:319)
parent 8be59afe
@@ -484,6 +484,7 @@ _accept_msg_connection(job_t *job, int fdnum)
     slurm_addr cli_addr;
     char host[256];
     short port;
+    int timeout = 0; /* slurm default value */
     if ((fd = slurm_accept_msg_conn(job->jfd[fdnum], &cli_addr)) < 0) {
         error("Unable to accept connection: %m");
@@ -494,8 +495,14 @@ _accept_msg_connection(job_t *job, int fdnum)
     debug2("got message connection from %s:%d", host, ntohs(port));
     msg = xmalloc(sizeof(*msg));
+    /* multiple jobs (easily induced via no_alloc) sometimes result
+     * in slow message responses and timeouts. Raise the timeout
+     * to 5 seconds for no_alloc option only */
+    if (opt.no_alloc)
+        timeout = 5;
  again:
-    if (slurm_receive_msg(fd, msg, 0) < 0) {
+    if (slurm_receive_msg(fd, msg, timeout) < 0) {
         if (errno == EINTR)
             goto again;
         error("slurm_receive_msg[%s]: %m", host);
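
The pattern in this change (choose a longer timeout for one mode, then retry the receive when it is interrupted by a signal) can be illustrated outside of SLURM. The following is only a minimal sketch under stated assumptions: it uses plain POSIX `poll()` and `read()` rather than `slurm_receive_msg()`, and the helpers `recv_with_timeout()` and `choose_timeout()` are hypothetical names that do not appear in the commit. Unlike the patch, where 0 means "slurm default value", the sketch takes the default timeout as an explicit argument.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper, not part of SLURM: wait up to timeout_sec seconds
 * for fd to become readable, then read up to len bytes.
 * Returns the number of bytes read, or -1 with errno set
 * (ETIMEDOUT if nothing arrived in time). */
static ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_sec)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    ssize_t n;
    int rc;

    /* Retry when a signal interrupts the wait, like the "goto again"
     * in the patch. Note that each retry restarts the full timeout. */
    do {
        rc = poll(&pfd, 1, timeout_sec * 1000);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;              /* poll() failed; errno already set */
    if (rc == 0) {
        errno = ETIMEDOUT;      /* no data within the timeout */
        return -1;
    }

    /* Data (or EOF/error) is pending; retry the read on EINTR as well. */
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);

    return n;
}

/* Hypothetical helper: raise the timeout to 5 seconds only when the
 * no-allocate option is set, otherwise keep the caller's default. */
static int choose_timeout(int no_alloc, int default_timeout)
{
    return no_alloc ? 5 : default_timeout;
}
```

A caller would compute the timeout once with `choose_timeout()` and pass it to every receive, which mirrors how the patch sets `timeout` before the `again:` label so that retries reuse the same value.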