- Sep 21, 2016
-
-
Morris Jette authored
When powering up a node to change its state (e.g. KNL NUMA or MCDRAM mode), pass the job ID assigned to the nodes to the ResumeProgram in the SLURM_JOB_ID environment variable. bug 3100
-
Morris Jette authored
Don't log an error for a job's end_time being zero if the node health check is still running. bug 3053
-
Brian Christiansen authored
The previous logic duplicated the checking of error codes returned from job_allocate(). job_allocate() will set the job state to FAILED if there was an actual issue.
-
Brian Christiansen authored
Was only checking for ESLURM_REQUESTED_PART_CONFIG_UNAVAILABLE and ENFORCE_ALL; however, _slurm_rpc_allocate_resources() and _slurm_rpc_submit_batch_job() both check for ANY and ALL.
-
- Sep 20, 2016
-
-
Danny Auble authored
to siblings (if not already connected). This will happen when the next message is sent to them.
-
Tim Wickberg authored
-
Danny Auble authored
-
Danny Auble authored
sibling clusters.
-
Danny Auble authored
back a message to the caller.
-
Danny Auble authored
a federation connection, if someone is adding and removing the cluster from the federation many times at the same time, the cluster might not be found.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Tim Wickberg authored
Fixes build issue caused by 844830d4.
-
Ben Matthews authored
-
- Sep 19, 2016
-
-
Danny Auble authored
connection
-
Danny Auble authored
-
Danny Auble authored
error.
-
Danny Auble authored
-
Danny Auble authored
at startup. Starting it up when you get a connection from another cluster could cause delays in processing the request.
-
Danny Auble authored
want to only wait for message_timeout instead of forever. Otherwise we could hit a deadlock if the other side is trying to do the same thing.
-
Danny Auble authored
-
Danny Auble authored
processed at a time. Otherwise you could get issues if you are rapidly adding and removing a cluster from a federation. This is probably unlikely in real life, but in testing it is a different story.
-
Danny Auble authored
slurmctld.
-
Danny Auble authored
scenario when first added to a federation.
-
Danny Auble authored
-
Morris Jette authored
-
Damien François authored
-
- Sep 17, 2016
-
-
Danny Auble authored
the same logic that was found in the slurmdbd. Now both functionalities share the same code. This was done with the merge right before this commit.
-
Danny Auble authored
-
Danny Auble authored
update is sent to a slurmctld.
-
Danny Auble authored
with real persistent connections.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
the fd is set to -1 so we don't try to actually use it.
-
Danny Auble authored
when querying if there are runaway jobs.
-
Danny Auble authored
-
Danny Auble authored
out in, but it is back now!
-
Danny Auble authored
-