- Dec 15, 2015
-
-
David Bigagli authored
-
Artem Polyakov authored
-
Artem Polyakov authored
-
Artem Polyakov authored
-
Artem Polyakov authored
-
Artem Polyakov authored
-
Artem Polyakov authored
-
Artem Polyakov authored
Fix the state machine to support corner cases where the fan-in message for the next Fence arrives before the fan-out message completing the current Fence. This was observed on Orion as well as in a container environment, in cases where the Fences were light but the number of processes was reasonably high.
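The corner case above can be illustrated with a minimal sketch. This is not the plugin's actual code: the `coll` structure, the per-fence sequence number, and the function names are all hypothetical, chosen only to show one way an early fan-in for the next Fence could be held back until the current Fence's fan-out completes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical collective states; the real state machine may differ. */
enum coll_state { COLL_FAN_IN, COLL_FAN_OUT };

struct coll {
    enum coll_state state;
    unsigned seq;     /* number of the Fence currently in progress */
    int expected;     /* contributions needed to finish fan-in */
    int got;          /* contributions received for Fence `seq` */
    int got_next;     /* early contributions received for Fence `seq + 1` */
};

void coll_init(struct coll *c, int expected)
{
    c->state = COLL_FAN_IN;
    c->seq = 0;
    c->expected = expected;
    c->got = 0;
    c->got_next = 0;
}

/* Handle a fan-in message tagged with the Fence number it belongs to.
 * Returns false if the message fits neither the current nor the next Fence. */
bool coll_on_fan_in(struct coll *c, unsigned msg_seq)
{
    if (msg_seq == c->seq && c->state == COLL_FAN_IN) {
        c->got++;
        if (c->got == c->expected)
            c->state = COLL_FAN_OUT;   /* wait for fan-out to complete */
        return true;
    }
    if (msg_seq == c->seq + 1) {
        c->got_next++;                 /* early arrival: remember it */
        return true;
    }
    return false;
}

/* Called when the fan-out completing the current Fence has been processed. */
void coll_on_fan_out_done(struct coll *c)
{
    assert(c->state == COLL_FAN_OUT);
    c->seq++;
    c->got = c->got_next;              /* replay the early contributions */
    c->got_next = 0;
    c->state = (c->got == c->expected) ? COLL_FAN_OUT : COLL_FAN_IN;
}
```

In this sketch an early message is counted rather than rejected, so a fast child cannot break a parent that is still finishing the previous Fence.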
-
Artem Polyakov authored
-
Artem Polyakov authored
Change the ordering of operations during fan-in. We will need this for the improved state machine. These changes are verified to work.
-
Artem Polyakov authored
Optimization of the fan-in collective stage. Probe the parent by sending it a zero-sized message before sending it a chunk of data. This helps to resolve cases where the parent slurmstepd in the collective hasn't been bootstrapped yet.
-
Artem Polyakov authored
-
Artem Polyakov authored
Fix EIO: try to process the message in place if possible instead of always switching to the AIO event handling.
-
Artem Polyakov authored
-
Artem Polyakov authored
-
Artem Polyakov authored
-
Artem Polyakov authored
-
Artem Polyakov authored
-
Artem Polyakov authored
-
Artem Polyakov authored
-
Artem Polyakov authored
-
Artem Polyakov authored
-
Artem Polyakov authored
-
David Bigagli authored
This reverts commit 76098daf.
-
Morris Jette authored
-
David Bigagli authored
bug 2171
-
- Dec 14, 2015
-
-
Morris Jette authored
Decrease parallelism in job cancel request to prevent denial of service when cancelling huge numbers of jobs. bug 2256
-
Morris Jette authored
-
Morris Jette authored
Prevent triggering gang scheduling within a partition if configured with PreemptType=partition_prio and PreemptMode=suspend,gang. The essence of this fix is to change a "<=" to "<" in cons_res/job_test.c:
    -    if ((p_ptr->part_ptr->priority <= jp_ptr->part_ptr->priority) &&
    +    if ((p_ptr->part_ptr->priority < jp_ptr->part_ptr->priority) &&
but logic was also added to ensure that a partition configuration with PreemptMode did not override PreemptType != partition_prio. bug 2232
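The effect of the one-character change can be seen in isolation. The sketch below is not taken from cons_res/job_test.c: `struct part` and both helper functions are hypothetical stand-ins that reproduce only the priority comparison, without the rest of the condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a partition record; only the priority matters here. */
struct part {
    unsigned priority;
};

/* Old comparison: also true when the priorities are equal, which includes
 * the case where both jobs belong to the same partition. */
static bool prio_test_old(const struct part *p, const struct part *jp)
{
    assert(p && jp);
    return p->priority <= jp->priority;
}

/* New comparison: true only when the first partition has strictly lower priority. */
static bool prio_test_new(const struct part *p, const struct part *jp)
{
    assert(p && jp);
    return p->priority < jp->priority;
}
```

With two jobs in the same partition the old test succeeds and the new one does not, which matches the stated goal of not triggering gang scheduling within a partition.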
-
David Bigagli authored
-
David Bigagli authored
Avoiding conditional directives that break statements
-
David Bigagli authored
-
- Dec 13, 2015
-
-
Romero Malaquias authored
-
- Dec 11, 2015
-
-
Morris Jette authored
Conflicts: src/slurmctld/job_mgr.c
-
Tim Wickberg authored
No changes to logic
-
Morris Jette authored
-
Tim Wickberg authored
Previously an error() would be logged when the attempt to open the job script using the new directory format failed but the subsequent fallback to the old directory structure succeeded, leading to confusion when troubleshooting. Move emitted warnings to debug(), and only error() after failing to open in both directory structures. Add a note about backwards compatibility to both functions: we cannot remove these fallbacks, as the directory structure for pending jobs does not change on a Slurm version update, and people may need to chain multiple version updates together to get to a current Slurm version, which would correctly update slurmctld state files but leave pending jobs in the old directory structure. Bug #2244.
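The logging policy described above can be sketched as follows. This is an illustration under assumptions, not the slurmctld code: `open_script_with_fallback`, `log_debug`, `log_error`, and the counters are hypothetical names, and the two paths are passed in by the caller rather than built from any real directory layout.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Counters standing in for a real logger so the behaviour is observable. */
static int debug_count = 0;
static int error_count = 0;

static void log_debug(const char *msg, const char *path)
{
    debug_count++;
    fprintf(stderr, "debug: %s: %s\n", msg, path);
}

static void log_error(const char *msg, const char *path)
{
    error_count++;
    fprintf(stderr, "error: %s: %s\n", msg, path);
}

/* Try the new-layout path first, then the old-layout path.
 * A failure on the first attempt is only a debug message; an error is
 * logged only when both attempts fail. Returns a file descriptor or -1. */
int open_script_with_fallback(const char *new_path, const char *old_path)
{
    assert(new_path && old_path);

    int fd = open(new_path, O_RDONLY);
    if (fd >= 0)
        return fd;
    log_debug("could not open, trying old layout", new_path);

    fd = open(old_path, O_RDONLY);
    if (fd >= 0)
        return fd;
    log_error("could not open in either layout", old_path);
    return -1;
}
```

A caller that finds the script only in the old layout sees one debug message and no error, so the log no longer suggests a failure that did not happen.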
-
Morris Jette authored
If a job is requeued while in the process of being launched, remove its job ID from slurmd's record of active jobs in order to avoid generating a duplicate job ID error when launched for the second time (which would drain the node). bug 2240
-
Morris Jette authored
In the slurmctld log file, log a duplicate job ID found by slurmd. Previously this was being logged as a prolog/epilog failure. bug 2240
-
David Bigagli authored
-