Commit 61e7f91e authored by Moe Jette

Increase the throughput rate of outgoing slurmctld message traffic.

Fixes a problem when running a very large number of simultaneous jobs:
their inactivity limit was reached because of a backlog of messages.
parent eedc7d11
@@ -31,6 +31,9 @@ documents those changes that are of interest to users and admins.
     systems (QsNet and QsNetII).
  -- Added some missing read locks for references for slurmctld's
     configuration data structure
+ -- Modify processing of queued slurmctld message traffic to get better
+    throughput (resulted in job inactivity limit being reached improperly
+    when hundreds of jobs running simultaneously)
 * Changes in SLURM 0.3.0.0-pre6
 ===============================
......
@@ -771,7 +771,7 @@ static void _list_delete_retry(void *retry_entry)
  * agent_retry - Agent for retrying pending RPCs. One pending request is
  *	issued if it has been pending for at least min_wait seconds
  * IN min_wait - Minimum wait time between re-issue of a pending RPC
- * RET count of queued requests remaining
+ * RET count of queued requests remaining (zero if none are old)
  */
 extern int agent_retry (int min_wait)
 {
@@ -782,14 +782,14 @@ extern int agent_retry (int min_wait)
 	slurm_mutex_lock(&retry_mutex);
 	if (retry_list) {
 		double age = 0;
-		list_size = list_count(retry_list);
 		queued_req_ptr = (queued_request_t *) list_peek(retry_list);
 		if (queued_req_ptr) {
 			age = difftime(now, queued_req_ptr->last_attempt);
-			if (age > min_wait)
+			if (age > min_wait) {
 				queued_req_ptr = (queued_request_t *)
 					list_pop(retry_list);
-			else /* too new */
+				list_size = list_count(retry_list);;
+			} else /* too new */
 				queued_req_ptr = NULL;
 		}
 	}
......
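The new return contract of agent_retry() can be illustrated with a small standalone sketch. This is not SLURM code: the fixed-size ring buffer and every sketch_* name are hypothetical stand-ins for retry_list and its entries, and the current time is passed in explicitly so the behavior is deterministic. It only shows the rule from the patch: the remaining count is reported after a successful pop, and zero is returned when nothing was old enough to issue.

```c
#include <assert.h>
#include <time.h>

#define SKETCH_QUEUE_MAX 8

/* Hypothetical stand-in for retry_list: a FIFO of last-attempt times. */
static time_t sketch_queue[SKETCH_QUEUE_MAX];
static int sketch_head = 0;
static int sketch_count = 0;

/* Append one pending request, recorded by the time it was last attempted. */
static void sketch_enqueue(time_t last_attempt)
{
	assert(sketch_count < SKETCH_QUEUE_MAX);
	sketch_queue[(sketch_head + sketch_count) % SKETCH_QUEUE_MAX] =
		last_attempt;
	sketch_count++;
}

/*
 * Pop the oldest entry only if it has waited longer than min_wait seconds.
 * Returns the number of entries still queued after a pop, or zero when
 * nothing was popped (queue empty, or oldest entry too new).
 */
static int sketch_agent_retry(time_t now, int min_wait)
{
	int list_size = 0;

	if (sketch_count > 0 &&
	    difftime(now, sketch_queue[sketch_head]) > min_wait) {
		sketch_head = (sketch_head + 1) % SKETCH_QUEUE_MAX;
		sketch_count--;
		list_size = sketch_count;	/* counted after the pop */
	}
	return list_size;
}
```

With entries last attempted at times 900, 910 and 999, three calls at time 1000 with min_wait 60 would pop the first two entries and leave the third queued. A caller looping on a nonzero return therefore stops as soon as the head of the queue is too new.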
@@ -613,6 +613,7 @@ static void *_slurmctld_background(void *no_data)
 	static time_t last_timelimit_time;
 	static time_t last_assert_primary_time;
 	time_t now;
+	int i;
 	/* Locks: Read config */
 	slurmctld_lock_t config_read_lock = {
@@ -693,7 +694,11 @@ static void *_slurmctld_background(void *no_data)
 			unlock_slurmctld(job_read_lock);
 		}
-		(void) agent_retry(RPC_RETRY_INTERVAL);
+		/* Process pending agent work, issues several requests */
+		for (i=0; i<5; i++) {
+			if (agent_retry(RPC_RETRY_INTERVAL) == 0)
+				break;
+		}
 		if (difftime(now, last_group_time) >= PERIODIC_GROUP_CHECK) {
 			last_group_time = now;
......
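The change to the background loop can be sketched in isolation as a bounded drain: up to five retries per cycle instead of one, stopping early once the retry function returns zero. This is a minimal sketch under stated assumptions, not the slurmctld implementation. The callback type, the sketch_* names and the toy backlog counter are all hypothetical; only the limit of 5 and the break-on-zero rule come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Matches the literal 5 used in the patch. */
#define SKETCH_MAX_RETRIES_PER_CYCLE 5

/* Hypothetical callback type standing in for agent_retry(min_wait). */
typedef int (*sketch_retry_fn)(int min_wait);

/*
 * Issue up to SKETCH_MAX_RETRIES_PER_CYCLE retries in one background
 * cycle, stopping early when the callback returns zero.
 * Returns the number of calls made to the callback.
 */
static int sketch_drain_cycle(sketch_retry_fn retry, int min_wait)
{
	int i, calls = 0;

	assert(retry != NULL);
	for (i = 0; i < SKETCH_MAX_RETRIES_PER_CYCLE; i++) {
		calls++;
		if (retry(min_wait) == 0)
			break;
	}
	return calls;
}

/* Toy backlog for demonstration: each call issues one queued request. */
static int sketch_backlog = 0;

static int sketch_fake_retry(int min_wait)
{
	(void) min_wait;	/* unused in this toy model */
	if (sketch_backlog == 0)
		return 0;
	sketch_backlog--;
	return sketch_backlog;	/* requests still queued */
}
```

Calling sketch_drain_cycle(sketch_fake_retry, 60) with a backlog of 12 makes five calls and leaves 7 requests queued. With a backlog of 2 it stops after the second call. The cap keeps a single cycle from being monopolized by retries while still draining a backlog several times faster than one request per cycle.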