- Jul 27, 2016
-
-
Brian Christiansen authored
Missed in b5bba34c
-
Danny Auble authored
on batch script completes.
-
Danny Auble authored
-
Danny Auble authored
code change.
-
- Jul 26, 2016
-
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Morris Jette authored
The problem only exists on a subset of KNL models, those with 68 cores. bug 2941
-
Danny Auble authored
-
Morris Jette authored
"NOTE: Runaway jobs are jobs that don't exist in the controller but are still considered running in the datbase" should be "NOTE: Runaway jobs are jobs that don't exist in the controller but are still considered running in the database".
-
Danny Auble authored
(difference between start of job and when it was eligible).
-
- Jul 25, 2016
-
-
Danny Auble authored
-
David Gloe authored
Bug 2939.
-
Danny Auble authored
-
Morris Jette authored
This reverts commit fb8e3558 and moves the place where the SuspendExcNodes and SuspendExcParts configuration parameters are processed (this needs to happen AFTER the partition and node tables in the slurmctld daemon are built). Bug 2934
-
- Jul 23, 2016
-
-
Morris Jette authored
-
- Jul 22, 2016
-
-
Dominik Bartkiewicz authored
Inadvertently broken in commit 05eac196. Bug 2912.
-
Danny Auble authored
or failed based on the signal that would always be killing it.
-
Danny Auble authored
end of the job to do it.
-
Danny Auble authored
make them using the master job ID instead of the normal job ID.
-
- Jul 21, 2016
-
-
Morris Jette authored
-
Morris Jette authored
Treat invalid user ID in AllowUserBoot option of knl.conf file as error rather than fatal (log and do not exit).
-
- Jul 20, 2016
-
-
Morris Jette authored
Prevent slurmctld abort if a job is killed or requeued while waiting for reboot of its allocated compute nodes. The _wait_boot() function would reference job_ptr->node_bitmap, which would be NULL.
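The guard described above can be sketched as follows. This is a minimal illustration only, not the actual slurmctld code: the demo_bitmap_t and demo_job_t types and the demo_job_still_allocated() helper are hypothetical stand-ins, and only the node_bitmap field name is taken from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a node bitmap; not Slurm's real type. */
typedef struct {
    unsigned long bits;
} demo_bitmap_t;

/* Hypothetical stand-in for a job record; not Slurm's real type. */
typedef struct {
    unsigned int job_id;
    demo_bitmap_t *node_bitmap; /* assumed NULL once the job is killed or requeued */
} demo_job_t;

/*
 * Return true if the job still holds a node allocation, so a boot-wait
 * loop may keep inspecting node_bitmap. Return false if the record or
 * its bitmap is NULL, in which case the caller should stop waiting
 * instead of dereferencing the pointer.
 */
static bool demo_job_still_allocated(const demo_job_t *job_ptr)
{
    if (job_ptr == NULL || job_ptr->node_bitmap == NULL)
        return false;
    return true;
}
```

A wait loop would call such a check on every iteration, since the job can lose its allocation at any point while the nodes are rebooting.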
-
Boris Karasev authored
Bug 2908
-
Danny Auble authored
-
Tim Wickberg authored
The step hasn't been assigned resources, so the select_jobinfo struct hasn't yet been populated. Calling select_g_step_finish will dereference it, causing a segfault. Bug 2922.
-
- Jul 19, 2016
-
-
Morris Jette authored
-
Gennaro Oliva authored
-
Morris Jette authored
If the user is not allowed to use the partition, then do not check that user's group access again for 5 seconds. bug 2913
-
Morris Jette authored
Improve partition AllowGroups caching. Update the table of UIDs permitted to use a partition based upon its AllowGroups configuration parameter as new valid UIDs are found, rather than looking up that user's group information for every job they submit, which can involve considerable overhead for some systems. bug 2913
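The caching idea in this message can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the slurmctld implementation: demo_part_cache_t, demo_group_lookup(), demo_uid_allowed(), and the fixed table size are all hypothetical, and the "even UID" rule merely stands in for an expensive group database lookup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define DEMO_MAX_UIDS 64 /* hypothetical fixed table size */

/* Hypothetical per-partition table of UIDs already known to be allowed. */
typedef struct {
    uid_t uids[DEMO_MAX_UIDS];
    size_t cnt;
} demo_part_cache_t;

/* Counts how often the (assumed expensive) group lookup actually runs. */
static int demo_lookup_calls = 0;

/* Stand-in for a costly group membership lookup: even UIDs are "members". */
static bool demo_group_lookup(uid_t uid)
{
    demo_lookup_calls++;
    return (uid % 2) == 0;
}

/*
 * Check the cached table first; fall back to the group lookup only for
 * UIDs not yet seen, and remember each newly validated UID so later
 * job submissions by the same user skip the lookup.
 */
static bool demo_uid_allowed(demo_part_cache_t *cache, uid_t uid)
{
    for (size_t i = 0; i < cache->cnt; i++) {
        if (cache->uids[i] == uid)
            return true;
    }
    if (!demo_group_lookup(uid))
        return false;
    if (cache->cnt < DEMO_MAX_UIDS)
        cache->uids[cache->cnt++] = uid;
    return true;
}
```

With this shape, a user who submits many jobs triggers the group lookup once; every later submission is answered from the table.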
-
Morris Jette authored
Minimize preempted jobs for configurations with multiple jobs per node. Previous logic would preempt every job on a node allocated to the pending job. bug 2906
-
Morris Jette authored
Fix for core selection with job --gres-flags=enforce-binding option. Previous logic would in some cases allocate a job zero cores, resulting in slurmctld abort. bug 2808
-
- Jul 18, 2016
-
-
Morris Jette authored
Add some indentation so that GRES topology-specific information logged is more readable.
-
Morris Jette authored
A job allocation selecting nodes and no cores/CPUs could write off the end of arrays and corrupt memory. Now to figure out how the logic reached this point in the first place. bug 2808
-
Morris Jette authored
-
- Jul 16, 2016
-
-
Danny Auble authored
In commit b8190e5d many places that were meant to be pending step IDs were changed to be the extern step ID. The main problem was that when we came up with the idea of the extern step, we reused -1 (INFINITE) for the ID, so pending steps also appeared to be extern steps. Hopefully this fixes the situation. Bug 2907
-
Morris Jette authored
-
Morris Jette authored
Start power save thread only after the partition information is read in order to avoid trying to interpret the SuspendExcParts configuration information before the partition information is available, which would result in a slurmctld abort.
-
Morris Jette authored
Do not try to access the part_list variable (partition list pointer) if it is not yet initialized. Return a NULL pointer rather than aborting on a NULL pointer dereference.
-
- Jul 15, 2016
-
-
Tim Wickberg authored
-