- Apr 30, 2014
-
-
Morris Jette authored
-
Morris Jette authored
-
David Bigagli authored
together.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Switch/nrt - Properly track usage of CAU and RDMA resources with multiple tasks per compute node. Previous logic would allocate resources once per task and then deallocate once per node, leaking CAU and RDMA resources and preventing their use by future jobs.
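The leak described above comes down to unbalanced counting, which a few lines of arithmetic can illustrate. This is a minimal Python sketch, not the switch/nrt plugin code. The function name `leaked_units` and its parameters are hypothetical. The commit message does not say which side of the accounting was changed, so the fixed behavior is modeled here simply as "releases match allocations".

```python
def leaked_units(tasks_per_node, nodes, balanced_release):
    """Count resource units still marked in use after a job ends.

    Hypothetical model: one unit is allocated per task. With
    balanced_release=False only one unit per node is released (the
    behavior the commit message describes as leaking); with
    balanced_release=True every allocated unit is released.
    """
    allocated = tasks_per_node * nodes
    released = allocated if balanced_release else nodes
    return allocated - released


# Four tasks on each of two nodes: six units stay allocated under the
# old accounting, none once releases match allocations.
old_leak = leaked_units(tasks_per_node=4, nodes=2, balanced_release=False)
new_leak = leaked_units(tasks_per_node=4, nodes=2, balanced_release=True)
```

With a single task per node the two schemes agree, which is presumably why the problem only shows up with multiple tasks per compute node.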
-
Morris Jette authored
-
Morris Jette authored
If a job is held, then only release it with the "scontrol release <jobid>" command rather than a simple reset of the job's priority. This is needed to support job arrays better. Otherwise a priority reset of a job array would free all requeued/held jobs from that job array rather than leaving them held.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
If a job's priority is set non-zero then always clear the JOB_SPECIAL_EXIT job state flag, not only when the prior state is HELD_USER or HELD. I'm not sure how the job could have cleared the HELD state and changed to NO_REASON, but this would fix the problem. bug 760
-
- Apr 29, 2014
-
-
Morris Jette authored
Modify slurmd to keep track of which jobs have already been launched. If the launch is complete, then process suspend requests immediately. Previously the suspend request was always delayed by 1 second, which adversely impacts gang scheduling performance. If the job can't be found (say after a slurmd restart), then delay the suspend by up to 3 seconds, but only once.
-
Morris Jette authored
Change the integer-to-hex function to support 32-bit unsigned integers and exit on systems with more than 32 CPUs per node, since Expect cannot work with numbers that large.
-
David Bigagli authored
-
Morris Jette authored
Change the integer-to-hex function to support 32-bit unsigned integers and exit on systems with more than 32 CPUs per node, since Expect cannot work with numbers that large.
-
David Bigagli authored
-
David Bigagli authored
-
David Bigagli authored
-
- Apr 28, 2014
-
-
Morris Jette authored
Conflicts: src/slurmd/slurmstepd/fname.c
-
Morris Jette authored
This corrects some anomalies in task_id handling. The most significant fallout is that the "%a" task ID printed for a non-array job is now 0xfffffffe rather than 0xfffe, but this should not be used anyway.
-
Morris Jette authored
If the job's stdout or stderr file name contains a "%A" (Job array ID) and it is not a job array, then treat it like "%j" (job ID).
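The fallback described above can be sketched as follows. This is a minimal Python illustration, not the slurmstepd C implementation. The function name `expand_output_name` and the use of `None` to mark a job that is not part of a job array are assumptions made for the example. Only "%j" and "%A" are handled; any other text is copied through unchanged.

```python
def expand_output_name(pattern, job_id, array_job_id=None):
    """Expand "%j" and "%A" in a stdout/stderr file name pattern.

    array_job_id is None when the job is not part of a job array
    (an assumption of this sketch). In that case "%A" expands to the
    plain job ID, exactly as "%j" does.
    """
    array_id = job_id if array_job_id is None else array_job_id
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "%" and i + 1 < len(pattern):
            spec = pattern[i + 1]
            if spec == "j":
                out.append(str(job_id))
                i += 2
                continue
            if spec == "A":
                out.append(str(array_id))
                i += 2
                continue
        # Anything else, including unknown specifiers, is copied as-is.
        out.append(pattern[i])
        i += 1
    return "".join(out)


# Not a job array: "%A" behaves like "%j".
plain_name = expand_output_name("slurm-%A.out", 1234)
# Job array: "%A" is the array's job ID, "%j" the individual job ID.
array_name = expand_output_name("slurm-%A_%j.out", 1235, array_job_id=1234)
```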
-
Morris Jette authored
If the job's stdout or stderr file name contains a "%A" (Job array ID) and it is not a job array, then treat it like "%j" (job ID).
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
This actually causes more problems since there are a bunch of places where we set the reason only if it was previously NO_REASON, but we want to treat this the same way, i.e. untested. Adding logic everywhere to clear and reset this value seems to create too much overhead for very little value.
-
Morris Jette authored
-
Morris Jette authored
Previously partition priority was only considered when used as a component of a job's priority with the priority/multifactor plugin. Now the partition priority is considered first, as documented, and the job priority is considered second. bug 764
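The ordering described above amounts to a two-level sort key. This is a minimal Python sketch, not the scheduler's actual comparison code. The `PendingJob` record and the `scheduling_order` function are hypothetical names, and the sketch assumes that larger numbers mean higher priority for both partitions and jobs.

```python
from collections import namedtuple

# Hypothetical record for a pending job; field names are illustrative.
PendingJob = namedtuple("PendingJob", "job_id part_priority job_priority")


def scheduling_order(jobs):
    """Order jobs by partition priority first, then by job priority.

    Both keys are negated so that larger values sort first
    (an assumption of this sketch).
    """
    return sorted(jobs, key=lambda j: (-j.part_priority, -j.job_priority))


jobs = [
    PendingJob(job_id=1, part_priority=1, job_priority=900),
    PendingJob(job_id=2, part_priority=10, job_priority=100),
    PendingJob(job_id=3, part_priority=10, job_priority=500),
]
ordered_ids = [j.job_id for j in scheduling_order(jobs)]
```

Job 1 has the highest job priority but sits in the lowest-priority partition, so it sorts last. Within the higher-priority partition, job 3 precedes job 2.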
-
Morris Jette authored
-
Morris Jette authored
There were a couple of cases where a job's reason should have been set, but was not. In addition the state_desc string should have been cleared in some cases, but was not.
-
Morris Jette authored
Plus clear a job's state_desc when the reason is set to NO_REASON
-
Danny Auble authored
-
Danny Auble authored
in 2.0 :)
-
Morris Jette authored
See bug 757
-
Morris Jette authored
Previously partition priority was only considered when used as a component of a job's priority with the priority/multifactor plugin. Now the partition priority is considered first, as documented, and the job priority is considered second. bug 764
-
Morris Jette authored
-
- Apr 26, 2014
-
-
Morris Jette authored
-
Stuart Midgley authored
Add --priority option to the salloc, sbatch and srun commands.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
This code was originally put here to enforce checks to make sure jobs didn't go over the limit. If they didn't request the amount then we set the limit and worked off that as if it were a request. If we do this now we could get jobs denied, which would cancel the job at submit with a very unrelated note as to why the job failed. Since we now check these limits after the node selection, this isn't needed.
-