- Dec 22, 2014
-
-
Daniel Ahlin authored
Correct parsing of AccountingStoragePass when specified in old format (just a path name)
-
Brian Christiansen authored
-
Brian Christiansen authored
Bug 1331
-
Rémi Palancher authored
On MPI job initialisation through PMI, Intel MPI calls PMI_KVS_Put() many times from the task at rank 0, and each of these calls is followed by PMI_KVS_Commit(). The Slurm implementation of PMI_KVS_Commit() imposes a delay to avoid a DDoS on the original srun. This delay is proportional to the total number of tasks and can be up to 3 seconds for large jobs, e.g. with 7168 tasks. Therefore, when Intel MPI calls PMI_KVS_Commit() 475 times (measured on a test case) from the task at rank 0, 28 minutes are spent in the delay function. All other tasks in the job are waiting at a PMI_Barrier, so there is no risk of a DDoS from this single task 0. The patch alters the delay calculation so that the task at rank 0 is not delayed. All other tasks are spread over the same time range as before.
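The arithmetic behind this commit message can be sketched as follows. This is an illustration only, not the code from the patch: the function names are hypothetical, the linear per-rank spreading is an assumed model, and the 500 microseconds per task is an assumed value chosen because it gives a window of roughly 3.6 seconds for 7168 tasks, in line with the figures quoted above.

```python
# Hypothetical model of a rank-based commit delay (not Slurm's actual code).

def commit_delay_us(rank: int, n_tasks: int, per_task_us: int) -> int:
    """Delay for one commit, in microseconds.

    Ranks are spread evenly over a window of n_tasks * per_task_us,
    and rank 0 sits at the very start of the window, so it never waits.
    """
    if not 0 <= rank < n_tasks:
        raise ValueError("rank must be in [0, n_tasks)")
    return rank * per_task_us


def total_delay_s(n_commits: int, delay_us: int) -> float:
    """Total time spent sleeping if every commit waits delay_us."""
    return n_commits * delay_us / 1_000_000


N_TASKS = 7168      # job size from the commit message
N_COMMITS = 475     # commits issued by rank 0, from the commit message
PER_TASK_US = 500   # ASSUMPTION: per-task spacing in microseconds

# Worst case: a task waits the whole window on every commit.
full_window_us = N_TASKS * PER_TASK_US
worst_case_minutes = total_delay_s(N_COMMITS, full_window_us) / 60

# In this model, rank 0 is not delayed at all.
rank0_minutes = total_delay_s(
    N_COMMITS, commit_delay_us(0, N_TASKS, PER_TASK_US)
) / 60

print(f"full-window delay per commit: {full_window_us / 1e6:.3f} s")
print(f"475 commits at full window:   {worst_case_minutes:.1f} min")
print(f"475 commits from rank 0:      {rank0_minutes:.1f} min")
```

Under these assumptions, a full-window wait on each of the 475 commits adds up to about 28 minutes, which matches the measurement reported in the commit message, while a rank 0 that is placed at the start of the window spends no time waiting.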
-
- Dec 20, 2014
-
-
Danny Auble authored
of Slurm daemons. The slurmstepd still needs to be fixed, which most likely can't be fixed until 15.08.
-
Danny Auble authored
-
Danny Auble authored
-
- Dec 19, 2014
-
-
Danny Auble authored
of Slurm daemons.
-
Danny Auble authored
but then sets CPUs to only represent the number of cores on the node.
-
Danny Auble authored
-
Danny Auble authored
-
- Dec 17, 2014
-
-
Brian Christiansen authored
Bug 1327
-
Danny Auble authored
doesn't request a number of tasks.
-
- Dec 16, 2014
-
-
Morris Jette authored
Fix job array hash table bug that could result in a slurmctld infinite loop or an invalid memory reference. Bug 1309
-
Nathan Yee authored
-
David Bigagli authored
-
David Bigagli authored
as it may cause a core dump in squeue. This reverts commit 322c783c.
-
- Dec 12, 2014
-
-
Morris Jette authored
If a master job array record is complete, then consider all pending tasks as also complete. This problem happens when a master job array record is pending (has pending tasks) and is cancelled. The result previously was a job record not visible to squeue/scontrol, but occupying memory. The same type of problem happened with respect to a dependency on a job array which was cancelled.
-
Morris Jette authored
-
Morris Jette authored
This change will better reveal any vestigial job records not being purged
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
Conflicts: META
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Done in response to bug 1323
-
- Dec 11, 2014
-
-
Danny Auble authored
other config files like bluegene.conf and such.
-
Danny Auble authored
-
Danny Auble authored
If a QOS was added for the job and then removed, and it happened to be the QOS with the largest ID, then restarting the slurmctld before the job was flushed out could corrupt things.
-
Danny Auble authored
Conflicts: src/plugins/task/cray/task_cray.c
-
David Bigagli authored
-
Brian Christiansen authored
-
Nicolas Joly authored
slurmdb_purge_string(), there is no reason to check for this specific value anymore in read_config().
-
David Bigagli authored
-
Hongjia Cao authored
and initialize kvs_seq on mpi/pmi2 setup to support launching.
-
David Bigagli authored
it also needs the client side.
-
Morris Jette authored
-
Brian Christiansen authored
-
Morris Jette authored
-
Danny Auble authored
-