- Dec 08, 2015
-
-
Brian Christiansen authored
-
Tim Wickberg authored
This also corrects the per-user burst buffer space calculation.
-
Brian Christiansen authored
-
Danny Auble authored
_calc_billable_tres that was missed there.
-
Danny Auble authored
-
Danny Auble authored
requests no time limit. http://bugs.schedmd.com/show_bug.cgi?id=2177
-
- Dec 07, 2015
-
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Avoid logging every interaction with the elasticsearch server, since that would generate too many log messages. Fix an alignment problem in the code without changing logic.
-
Alejandro Sanchez authored
-
Alejandro Sanchez authored
Added a limit, MAX_JOBS (100000), on the number of jobs enqueued and waiting to be indexed.
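A minimal sketch of a capped pending queue of this kind, in Python rather than the project's C. Only the name MAX_JOBS and the value 100000 come from the commit message. The class name, the method names, and the choice to reject new jobs once the cap is reached are assumptions made for illustration, since the message does not say what happens at the limit.

```python
from collections import deque

MAX_JOBS = 100000  # cap named in the commit message


class PendingIndexQueue:
    """Hypothetical bounded queue of jobs waiting to be indexed."""

    def __init__(self, max_jobs=MAX_JOBS):
        self.max_jobs = max_jobs
        self._jobs = deque()

    def enqueue(self, job):
        """Add a job; return False if the cap is reached (assumed policy)."""
        if len(self._jobs) >= self.max_jobs:
            return False
        self._jobs.append(job)
        return True

    def dequeue(self):
        """Remove and return the oldest pending job, or None if empty."""
        return self._jobs.popleft() if self._jobs else None

    def __len__(self):
        return len(self._jobs)
```

Bounding the queue keeps memory use predictable if the indexing server is unreachable for a long time.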
-
Alejandro Sanchez authored
-
Tim Wickberg authored
Usernames are comma separated, not colon delimited. Bug #2222. While here, fix a few spelling mistakes.
-
- Dec 06, 2015
-
-
Sourav Chakraborty authored
Fixed a typo that was causing slurm_cred_copy to segfault
-
jette authored
-
Tim Wickberg authored
If a job is configured to be sent a signal when approaching its timeout and the job is requeued after the signal is sent, then resend the signal at the appropriate time after requeue. Bug 1415
-
jette authored
If the backfill scheduler cannot start at the appointed time, say because there are too many pending RPCs, then only sleep 1 second and try again rather than the full bf_interval (30 seconds by default).
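The sleep selection described above can be sketched as follows. This is an illustrative Python model, not the backfill plugin's C code. The function name and the boolean input are hypothetical; the 1 second retry and the 30 second default for bf_interval are taken from the commit message.

```python
BF_INTERVAL_DEFAULT = 30  # seconds, default bf_interval per the commit message
RETRY_SLEEP = 1           # seconds, short retry when the scheduler cannot start


def backfill_sleep_schedule(blocked_flags, bf_interval=BF_INTERVAL_DEFAULT):
    """Return the sleep (in seconds) chosen after each scheduling attempt.

    blocked_flags[i] is True when attempt i could not start, for example
    because too many RPCs were pending (hypothetical input for this sketch).
    """
    return [RETRY_SLEEP if blocked else bf_interval for blocked in blocked_flags]


# Two blocked attempts followed by one that runs normally.
print(backfill_sleep_schedule([True, True, False]))
```

The short retry means a brief burst of RPC traffic costs the scheduler seconds rather than a full interval.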
-
jette authored
The same logic was in both the backfill plugin and slurmctld/job_scheduler.c. No change in logic.
-
- Dec 05, 2015
-
-
Brian Christiansen authored
Bug 2130
-
Brian Christiansen authored
Adopted processes didn't have access to the job's devices. Bug 2130
-
Brian Christiansen authored
-
- Dec 04, 2015
-
-
Danny Auble authored
Full revert of c2fbf88f. 13b64c35 had caught part of this, but this reverts it completely. The code just wasn't needed in modern Slurm. It appears the patch came from an older version of Slurm that didn't handle this correctly.
-
David Bigagli authored
This reverts commit 29f25688. Conflicts: NEWS. Looks like this isn't needed: commit c2fbf88f doesn't appear to be needed and is what is causing this issue. c2fbf88f was added from an older version of Slurm where this was already handled correctly in commit 815e5a44.
-
Danny Auble authored
get messages like slurmctld: error: _handle_assoc_tres_run_secs: job 33355: assoc 1 TRES node grp_used_tres_run_secs underflow, tried to remove 21 seconds when only 0 remained. which are bad.
-
Morris Jette authored
-
Alejandro Sanchez authored
If a store fails, do not retry for 30 seconds
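One way to picture this hold-off is a small gate that remembers the time of the last failed store. The sketch below is Python and purely illustrative: the class and method names are hypothetical, and only the 30 second delay comes from the commit message. Timestamps are passed in explicitly so the behavior is easy to check.

```python
RETRY_DELAY = 30  # seconds, from the commit message


class StoreRetryGate:
    """Hypothetical helper that blocks retries for RETRY_DELAY seconds
    after a failed store."""

    def __init__(self, retry_delay=RETRY_DELAY):
        self.retry_delay = retry_delay
        self.last_failure = None  # time of the most recent failed store

    def record_failure(self, now):
        """Remember when a store attempt failed."""
        self.last_failure = now

    def may_attempt(self, now):
        """Return True if no failure is recorded or the delay has elapsed."""
        if self.last_failure is None:
            return True
        return now - self.last_failure >= self.retry_delay
```

A caller would check may_attempt() before each store and call record_failure() whenever one fails.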
-
- Dec 03, 2015
-
-
Morris Jette authored
Cray job NHC delayed until after burst buffer released and epilog completes on all allocated nodes. bugs 2099 and 2192
-
Morris Jette authored
Add new buffer state of BB_STATE_POST_RUN, indicating that the post_run operation has been started. Add new bb_p_job_test_post_run function to test the status of the post_run operation. Execute the post_run operation before the stage_out operation to optimize use of compute nodes (rather than burst buffer space).
-
Morris Jette authored
Release a job's allocated licenses only after epilog runs on all nodes rather than at start of termination process. bug 2192
-
Morris Jette authored
sched/backfill - Delay backfill scheduler for completing jobs only if CompleteWait configuration parameter is set (make code match documentation).
-
David Bigagli authored
-
David Bigagli authored
-
David Bigagli authored
-
Tim Wickberg authored
-
Tim Wickberg authored
moved code to common location.
-
- Dec 02, 2015
-
-
Brian Christiansen authored
Addition to b9668b2b
-
Josko Plazonic authored
Bug 2030
-
Morris Jette authored
Invalid user input could generate a slurmctld error(); changed to info().
-
Morris Jette authored
-