- Dec 20, 2013
-
-
Danny Auble authored
midplane block that starts on a higher coordinate than it ends (e.g., if a block has midplanes [0010,0013], 0013 is the start even though it is listed second in the hostlist).
-
- Dec 19, 2013
-
-
Morris Jette authored
It has been changed to improve the calculated value for pending jobs and to use the actual node count value for jobs that have been started (including suspended, completed, etc.). Bug 549
-
- Dec 18, 2013
-
-
Danny Auble authored
being in error.
-
- Dec 17, 2013
-
-
Danny Auble authored
-
Danny Auble authored
will return ENOTCONN and not initialize the addr_str causing valgrind errors.
-
- Dec 16, 2013
-
-
Hughes, Doug authored
This allows hold, uhold, resume, suspend, release, etc. to operate on multiple job IDs.
-
- Dec 14, 2013
-
-
Danny Auble authored
-
- Dec 13, 2013
-
-
Danny Auble authored
-
Morris Jette authored
Fix slurmstepd race condition when separate threads are reading and modifying the job's environment, which can result in the slurmstepd failing with an invalid memory reference. Observed at shutdown when trying to run the task epilog and trying to read the env var: SLURM_STEP_KILLED_MSG_NODE_ID
-
- Dec 12, 2013
-
-
Morris Jette authored
Without this patch, free() is called on a random memory location (i.e. whatever is on the stack), which can result in slurmstepd dying and a completed job not being purged in a timely fashion.
-
- Dec 11, 2013
-
-
Danny Auble authored
-
Morris Jette authored
Fix race condition in authentication credential creation that could corrupt memory. (NOTE: This race condition has existed since 2003 and would be exceedingly rare.)
-
- Dec 09, 2013
-
-
Morris Jette authored
This is needed for job arrays with discontiguous task ID values (e.g. "123_[1,3,5,...99999]")
-
Morris Jette authored
Previously, job arrays were only listed with their native job ID (e.g. 123_0 listed as 123, 123_1 as 124, etc.). Now lists the job ID using both formats (e.g. "123_1 (124)"). The same format is used for job step IDs (e.g. "123_1.2 (124.2)").
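The combined display format described above can be sketched as follows. This is only an illustration in Python: the function name `format_job_id` and its parameters are hypothetical and are not taken from the Slurm source.

```python
def format_job_id(job_id, array_job_id=None, array_task_id=None, step_id=None):
    """Return the display form of a job (or job step) ID.

    Hypothetical helper for illustration only; not Slurm's API.
    job_id is the native job ID; array_job_id/array_task_id are set
    only for elements of a job array.
    """
    suffix = "" if step_id is None else f".{step_id}"
    if array_job_id is None or array_task_id is None:
        # Not a job array element: only the native ID exists.
        return f"{job_id}{suffix}"
    # Array element: array form first, native form in parentheses.
    return f"{array_job_id}_{array_task_id}{suffix} ({job_id}{suffix})"


print(format_job_id(124, array_job_id=123, array_task_id=1))
print(format_job_id(124, array_job_id=123, array_task_id=1, step_id=2))
```

The two calls reproduce the examples quoted in the commit message, "123_1 (124)" and "123_1.2 (124.2)".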
-
- Dec 08, 2013
-
-
jette authored
-
- Dec 07, 2013
-
-
Danny Auble authored
-
Philip D. Eckert authored
-
- Dec 06, 2013
-
-
Trofinoff Stephen authored
This adds a mechanism to kill a hung apbasil command
-
Jason Bacon authored
-
- Dec 05, 2013
-
-
Danny Auble authored
news.html.
-
- Dec 04, 2013
-
-
Morris Jette authored
Previous logic never reopened the file, preventing proper functioning of logrotate.
-
- Dec 03, 2013
-
-
Morris Jette authored
Use hash function to locate job records for improved performance.
-
Morris Jette authored
Change partition write lock to a read lock as we use a different mechanism for hidden partitions in getting individual jobs.
-
Morris Jette authored
Correct logic returning remaining job dependencies in job information reported by scontrol and squeue. Eliminates vestigial descriptors with no job ID values (e.g. "afterany"). As dependencies were removed, the job ID values were removed from the strings, but not the descriptors. This eliminates both. It also checks the full job ID to make sure we do not remove "afterany:1234" when job "123" completes.
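A minimal sketch of the behavior described above, written in Python rather than Slurm's C. It assumes a simplified dependency string of comma-separated `descriptor:id[:id...]` terms; the function name `remove_dependency` is hypothetical. Keeping terms that never carried a job ID is also an assumption of this sketch.

```python
def remove_dependency(dependency, done_job_id):
    """Drop a completed job ID from a dependency string.

    Illustrative only. Assumed format: 'afterany:123:1234,afterok:55'.
    IDs are compared as whole tokens, so completing job 123 never
    touches 'afterany:1234'.
    """
    done = str(done_job_id)
    kept = []
    for term in dependency.split(","):
        if not term:
            continue
        descriptor, *job_ids = term.split(":")
        if not job_ids:
            # Assumption: a term that never had job IDs is left alone.
            kept.append(term)
            continue
        remaining = [j for j in job_ids if j != done]
        if remaining:
            kept.append(":".join([descriptor] + remaining))
        # else: no IDs left, so the descriptor is dropped as well
    return ",".join(kept)


print(remove_dependency("afterany:123:1234,afterok:123", 123))
```

Splitting on the separators and comparing whole tokens is what avoids the substring problem: a plain text search for "123" would also match inside "1234".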
-
- Dec 02, 2013
-
-
Morris Jette authored
Fix race condition on batch job termination that could result in a job exit code of 0xfffffffe if the slurmd on node zero registers its active jobs at the same time that slurmstepd is recording the job's exit code. Bug 535
-
David Bigagli authored
-
- Nov 29, 2013
-
-
Morris Jette authored
proctrack/cgroup - Add locking to prevent race condition where one job step is ending for a user or job at the same time another job step is starting and the user or job container is deleted from under the starting job step. Bug 447
-
David Bigagli authored
Substantial performance improvement for systems with Shared=YES or FORCE and large numbers of running jobs (replace bubble sort with quick sort). Bug 525
-
- Nov 27, 2013
-
-
Morris Jette authored
Original code worked only for Cray systems. For other systems it set gres_alloc to the total number of each GRES allocated on each node to any job.
-
- Nov 26, 2013
-
-
Chris Scheller authored
-
- Nov 14, 2013
-
-
Morris Jette authored
bug 511
-
- Nov 13, 2013
-
-
Morris Jette authored
This might have worked fine for core reservations or when there are sufficient idle nodes to use, but the select_g_resv_test() function clears the node bitmap for nodes that it cannot use, and the reservation create logic did not restore that bitmap after a failed resource selection attempt. This logic restores the node bitmap on a failed call to select_g_resv_test() so we can add nodes to the bitmap of available nodes rather than having it repeatedly cleared. The logic also adds some performance enhancements that I will add to in the next commit.
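The save-and-restore pattern described above can be illustrated with a short Python sketch. A Python set stands in for the node bitmap, and `stub_select` is a hypothetical stand-in for a selection routine that removes unusable nodes in place; neither is taken from the Slurm source.

```python
def select_with_restore(node_set, select_fn):
    """Call select_fn, which may modify node_set in place.

    On failure, put node_set back the way it was so the caller can add
    more candidate nodes and retry.
    """
    saved = set(node_set)          # copy before the call can clear entries
    ok = select_fn(node_set)
    if not ok:
        node_set.clear()
        node_set.update(saved)     # restore after a failed attempt
    return ok


def stub_select(node_set, needed=3, unusable=frozenset({"n2"})):
    """Hypothetical selector: drops unusable nodes, then checks the count."""
    node_set.difference_update(unusable)
    return len(node_set) >= needed


nodes = {"n1", "n2"}
first = select_with_restore(nodes, stub_select)    # too few nodes: fails
nodes.update({"n3", "n4"})                         # widen the candidate set
second = select_with_restore(nodes, stub_select)   # retry with more nodes
print(first, second, sorted(nodes))
```

Without the restore step, each failed attempt would leave the candidate set smaller than before, which is the failure mode the commit message describes.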
-
- Nov 08, 2013
-
-
Danny Auble authored
-
- Nov 05, 2013
-
-
Morris Jette authored
Correction to hostlist parsing bug introduced in v2.6.4 for hostlists with more than one numeric range in brackets (e.g. "rack[0-3]_blade[0-63]"). Bug 505
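To show what an expression with more than one bracketed range is expected to expand to, here is a simplified Python sketch. It is not Slurm's hostlist implementation: the function names are hypothetical, and it assumes a single expression with no commas outside the brackets.

```python
import itertools
import re


def _expand_ranges(spec):
    """Yield each number in a bracket body such as '0-3' or '1,4-6'."""
    for item in spec.split(","):
        lo, _, hi = item.partition("-")
        hi = hi or lo
        width = len(lo)  # preserve zero padding, e.g. 01-03
        for n in range(int(lo), int(hi) + 1):
            yield str(n).zfill(width)


def expand_hostlist(expr):
    """Expand an expression containing any number of bracketed ranges."""
    # With a capture group, re.split alternates literal text and
    # bracket bodies: ['rack', '0-3', '_blade', '0-63', ''].
    parts = re.split(r"\[([^\]]*)\]", expr)
    choices = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            choices.append([part])                      # literal text
        else:
            choices.append(list(_expand_ranges(part)))  # numeric range
    return ["".join(combo) for combo in itertools.product(*choices)]


hosts = expand_hostlist("rack[0-3]_blade[0-63]")
print(len(hosts), hosts[0], hosts[-1])
```

The example from the commit message expands to 4 x 64 = 256 names, from rack0_blade0 through rack3_blade63.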
-
- Nov 04, 2013
-
-
Morris Jette authored
-
David Bigagli authored
-
- Nov 01, 2013
-
-
Morris Jette authored
Add argument to priority plugin's priority_p_reconfig function to note when the association and QOS used_cpu_run_secs field has been reset. Without this flag, we remove time on "scontrol setdebug" or "scontrol setdebugflag", which can result in used_cpu_run_secs going negative or otherwise getting bad values. Correction to logic added in commit 6d793189. Bug 423
-
Morris Jette authored
Fix to work with the scheduling logic change introduced in Slurm version 2.6.3, which prevented Maui/Moab from starting jobs.
-
- Oct 29, 2013
-
-
David Bigagli authored
-
Morris Jette authored
Add support for -W block=true (wait for job completion). Clear PBS_NODEFILE environment variable. Credit to NCSC.
-