Commit 8f91454f authored by Moe Jette

If slurmd can't respond to ping (e.g. paging is keeping it from
responding in a timely fashion) then send a registration RPC
to slurmctld.
parent 01f3cad8
@@ -6,6 +6,9 @@ documents those changes that are of interest to users and admins.
  -- Srun wakes immeditely upon resource allocation (via new RPC)
     rather than polling.
  -- Deamons log current version number at startup
+ -- If slurmd can't respond to ping (e.g. paging is keeping it from
+    responding in a timely fashion) then send a registration RPC
+    to slurmctld
 * Changes in SLURM 0.3.1
 ========================
......
@@ -477,8 +477,17 @@ _rpc_ping(slurm_msg_t *msg, slurm_addr *cli_addr)
 		rc = ESLURM_USER_ID_MISSING;	/* or bad in this case */
 	}
-	/* return result */
-	slurm_send_rc_msg(msg, rc);
+	/* Return result. If the reply can't be sent this indicates that
+	 * 1. The network is broken OR
+	 * 2. slurmctld has died OR
+	 * 3. slurmd was paged out due to full memory
+	 * If the reply request fails, we send an registration message to
+	 * slurmctld in hopes of avoiding having the node set DOWN due to
+	 * slurmd paging and not being able to respond in a timely fashion. */
+	if (slurm_send_rc_msg(msg, rc)) {
+		error("Error responding to ping: %m");
+		send_registration_msg(SLURM_SUCCESS);
+	}
 	return rc;
 }
......
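
The fallback this commit adds to `_rpc_ping` can be tried in isolation with the rough sketch below. It is not SLURM code: `stub_send_rc_msg`, `stub_send_registration_msg`, `reply_should_fail`, `registrations_sent`, and `handle_ping` are all hypothetical stand-ins invented for illustration, and the only assumption carried over from the diff is that the reply call returns nonzero on failure.

```c
#include <stdio.h>

/* Hypothetical test knobs, not part of SLURM. */
static int reply_should_fail = 0;   /* force the reply stub to fail */
static int registrations_sent = 0;  /* count fallback registrations */

/* Hypothetical stand-in for slurm_send_rc_msg(): nonzero means failure. */
static int stub_send_rc_msg(int rc)
{
    (void)rc;
    return reply_should_fail ? -1 : 0;
}

/* Hypothetical stand-in for send_registration_msg(). */
static void stub_send_registration_msg(void)
{
    registrations_sent++;
}

/* Same control flow as the patched _rpc_ping(): try to reply, and if the
 * reply cannot be sent, fall back to a registration message so the
 * controller still hears from the node. The return code is unchanged. */
static int handle_ping(int rc)
{
    if (stub_send_rc_msg(rc)) {
        fprintf(stderr, "Error responding to ping\n");
        stub_send_registration_msg();
    }
    return rc;
}
```

Calling `handle_ping(0)` with `reply_should_fail` left at 0 sends no registration; setting `reply_should_fail = 1` first causes exactly one registration per failed reply, while the function still returns the `rc` it was given.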