Commit 7dbf1ccd authored by Moe Jette

Make agent logic more robust in the face of communications errors.

Send multiple SIGALRMs if needed and deal with possible abort of
a thread.
parent 598397ae
@@ -362,8 +362,9 @@ static void *_wdog(void *args)
 			if (thread_ptr[i].end_time <= now) {
 				debug3("agent thread %lu timed out\n",
 				       (unsigned long) thread_ptr[i].thread);
-				pthread_kill(thread_ptr[i].thread,
-					     SIGALRM);
+				if (pthread_kill(thread_ptr[i].thread,
+						 SIGALRM) == ESRCH)
+					thread_ptr[i].state = DSH_FAILED;
 			}
 			break;
 		case DSH_NEW:
@@ -612,12 +613,12 @@ static void *_thread_per_node_rpc(void *args)
 }
 /*
- * SIGALRM handler. This is just a stub because we are really interested
- * in interrupting connect() in k4cmd/rcmd or select() in rsh() below and
- * causing them to return EINTR.
+ * SIGALRM handler. We are really interested in interrupting hung communications
+ * and causing them to return EINTR. Multiple interrupts might be required.
  */
 static void _alarm_handler(int dummy)
 {
+	xsignal(SIGALRM, _alarm_handler);
 }