- Jul 19, 2016
-
-
Brian Christiansen authored
-
Brian Christiansen authored
Happens when there are running jobs on the cluster.
-
Brian Christiansen authored
_process_running_jobs_result() only does something if there were results returned.
-
Brian Christiansen authored
-
Morris Jette authored
-
Morris Jette authored
If the user is not allowed to use the partition, then do not check that user's group access again for 5 seconds. bug 2913
-
Morris Jette authored
Improve partition AllowGroups caching. Update the table of UIDs permitted to use a partition, based upon its AllowGroups configuration parameter, as new valid UIDs are found, rather than looking up that user's group information for every job they submit, which can involve considerable overhead on some systems. bug 2913
-
Morris Jette authored
-
Morris Jette authored
Minimize preempted jobs for configurations with multiple jobs per node. Previous logic would preempt every job on a node allocated to the pending job. bug 2906
-
Morris Jette authored
Fix for core selection with job --gres-flags=enforce-binding option. Previous logic would in some cases allocate a job zero cores, resulting in slurmctld abort. bug 2808
-
- Jul 18, 2016
-
-
Morris Jette authored
Add some indentation so that GRES topology-specific information logged is more readable.
-
Morris Jette authored
-
Morris Jette authored
A job allocation selecting nodes and no cores/CPUs could write off the end of arrays and corrupt memory. Now to figure out how the logic reached this point in the first place. bug 2808
-
Morris Jette authored
-
- Jul 16, 2016
-
-
Danny Auble authored
In commit b8190e5d many places that were meant to be pending step ids were changed to be the extern step id. The main problem was that when we came up with the idea of the extern step we reused -1 (INFINITE) for the id, so pending steps also appeared to be extern steps. Hopefully this fixes the situation. Bug 2907
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Start power save thread only after the partition information is read in order to avoid trying to interpret the SuspendExcParts configuration information before the partition information is available, which would result in a slurmctld abort.
-
Morris Jette authored
Do not try to access the part_list variable (partition list pointer) if it is not yet initialized. Return a NULL pointer rather than aborting on a NULL pointer.
-
- Jul 15, 2016
-
-
Tim Wickberg authored
-
Jacek Budzowski authored
bug 2900
-
Nicolas Joly authored
-
Nicolas Joly authored
-
Nicolas Joly authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Don't register newly found buffers that are less than OtherTimeout old to avoid possible duplicates.
-
Morris Jette authored
This hardens the code with respect to a race condition if the slurmctld restarts while a burst buffer creation for a job is in progress. Eliminate the possibility of a duplicate job allocation record.
-
Morris Jette authored
No change in functionality, just moved function call and added comment
-
Danny Auble authored
delete_step_records which would delete the steps without the killing flag set.
-
Danny Auble authored
This treats the extern step like a normal step on exit. It doesn't appear the original code is needed anymore, and this simplifies the code. The select_cray change is relevant since the add is needed only when killing the step, as that is the only place _internal_step_complete isn't used.
-
Danny Auble authored
functions.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
Previously it was shown as TBD since pending steps and the extern step have the same stepid.
-
Danny Auble authored
This sets the state earlier to match a normal step. Remove the unneeded _send_pending_exit_msgs. There is only one task and we have the message for it, so don't worry about that one. Most important, wait for the other slurmstepds to send their messages, otherwise they could be lost on the other end.
-
Morris Jette authored
Only execute the DataWarp real_size function if there is a job burst buffer. Calling the function when the job only references persistent buffers generates an error that is not useful.
-
- Jul 14, 2016
-
-
Morris Jette authored
Wrong argument type
-
Morris Jette authored
Preserve variable resp_msg for use in error message and use a different variable for temporary storage.
-
Morris Jette authored
-