- Mar 11, 2015
-
-
Morris Jette authored
This should not be a problem in v14.03, but the logic will be important in v14.11 due to larger job arrays being supported.
-
Morris Jette authored
-
Morris Jette authored
Make sure variables are initialized before passing a pointer to them as a function argument. The variable is set in the underlying function, so this is mostly cosmetic.
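The pattern this commit describes can be sketched as follows. This is a minimal illustration, not code from the Slurm tree: `lookup_count()` and `caller()` are hypothetical names introduced only for the example.

```c
#include <stddef.h>

/* Hypothetical callee (not a Slurm function): stores a result through
 * `out` and returns 0 on success, -1 if `out` is NULL. */
int lookup_count(int key, int *out)
{
    if (out == NULL)
        return -1;
    *out = key * 2;
    return 0;
}

/* Hypothetical caller: `count` is initialized before its address is
 * passed, so it holds a defined value on every path, and static
 * analyzers have nothing to flag. */
int caller(int key)
{
    int count = 0;

    if (lookup_count(key, &count) != 0)
        return -1;
    return count;
}
```

Because the callee always writes the value on success, the initialization changes no behavior here, which matches the "mostly cosmetic" description in the commit message.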
-
- Feb 05, 2015
-
-
Brian Christiansen authored
Improve "Prolog and Epilog Scripts" in slurm.conf(5)
-
Pär Lindfors authored
-
Pär Lindfors authored
The environment variable name SLURM_JOB_CLUSTER_NAME should be SLURM_CLUSTER_NAME. This is also available in Prolog and Epilog, so remove note about it only being available in PrologSlurmctld and EpilogSlurmctld.
-
- Jan 30, 2015
-
-
David Bigagli authored
-
- Jan 21, 2015
-
-
Morris Jette authored
Squeue modified to not merge tasks of a job array if their wait reasons differ. bug 1388
-
- Jan 07, 2015
-
-
David Bigagli authored
Slurm 14.03 nccs
-
Aaron Knister authored
-
Rémi Palancher authored
During MPI job initialisation through PMI, Intel MPI calls PMI_KVS_Put() many times from the task at rank 0, and each of these calls is followed by PMI_KVS_Commit(). The Slurm implementation of PMI_KVS_Commit() imposes a delay to avoid a DDoS on the original srun. This delay is proportional to the total number of tasks and can reach 3 seconds for large jobs, for example one with 7168 tasks. Therefore, when Intel MPI calls PMI_KVS_Commit() 475 times (measured on a test case) from the task at rank 0, 28 minutes are spent in the delay function. All other tasks in the job are waiting for a PMI_Barrier, so there is no risk of a DDoS from the single task at rank 0. The patch alters the delay calculation so that the task at rank 0 is not delayed. All other tasks are spread over the same time range as before.
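The idea behind the change can be sketched in C as below. This is an illustration under stated assumptions, not the patch itself: the function name `commit_delay_usec()`, its parameters, and the linear per-rank spreading are all hypothetical, and the real delay formula in Slurm's PMI code may differ.

```c
#include <stdint.h>

/* Hypothetical sketch (not Slurm's actual function): compute how long a
 * task should wait before sending its commit to srun.
 *
 * Assumptions made for illustration only:
 *   - the total window grows linearly with the number of tasks
 *     (ntasks * usec_per_task);
 *   - each task gets a slot inside that window based on its rank;
 *   - rank 0 is never delayed, since the other tasks are blocked in a
 *     barrier while rank 0 issues its repeated commits.
 */
uint64_t commit_delay_usec(uint32_t rank, uint32_t ntasks,
                           uint64_t usec_per_task)
{
    /* Rank 0 skips the delay; an out-of-range rank is treated the same
     * way in this sketch rather than reported as an error. */
    if (rank == 0 || rank >= ntasks)
        return 0;

    /* Remaining ranks stay spread across the same time range. */
    return (uint64_t)rank * usec_per_task;
}
```

With an assumed 400 microseconds per task and 7168 tasks, the highest rank waits just under 3 seconds while rank 0 waits nothing, so repeated commits from rank 0 no longer add up to minutes of sleep.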
-
Aaron Knister authored
-
Artem Polyakov authored
-
- Jan 05, 2015
-
-
David Bigagli authored
-
- Dec 20, 2014
-
-
Danny Auble authored
-
- Dec 19, 2014
-
-
Danny Auble authored
of Slurm daemons.
-
- Dec 12, 2014
-
-
Danny Auble authored
-
Danny Auble authored
-
- Dec 11, 2014
-
-
Danny Auble authored
If a QOS was added for the job and then removed, and it happened to be the QOS with the largest ID, then restarting slurmctld before the job was flushed out could mess things up.
-
David Bigagli authored
-
Danny Auble authored
accounting_storage/filetxt.
-
- Dec 08, 2014
-
-
Artem Polyakov authored
Logic introduced in version 14.03.10 to support requeueing of jobs with GRES allocated to currently running steps broke select/linear due to differences in the plugin logic. The commit with the bad logic is 1209a664
-
- Dec 05, 2014
-
-
Brian Christiansen authored
Bug 1301
-
- Dec 04, 2014
-
-
Brian Christiansen authored
Prevent jobs that won't finish before a "maint" reservation begins from starting in overlapping reservations. Bug 1290
-
Danny Auble authored
when the DBD is down.
-
Danny Auble authored
-
- Dec 03, 2014
-
-
Morris Jette authored
This only prints the message if the user enables more detailed logging. Bug 1171
-
Morris Jette authored
Log Cray MPI job calling exit() without mpi_fini(), but do not treat it as a fatal error. This partially reverts logic added in version 14.03.9. bug 1171
-
- Dec 02, 2014
-
-
Danny Auble authored
better.
-
Danny Auble authored
in BASIL was changed.
-
Brian Christiansen authored
-
- Dec 01, 2014
-
-
Brian Christiansen authored
-
- Nov 24, 2014
-
-
Artem Polyakov authored
Double the maximum string size that Slurm can pack, from 16MB to 32MB, to support larger MPI2 configurations.
-
- Nov 21, 2014
-
-
Danny Auble authored
-
Danny Auble authored
-
Dominik Bartkiewicz authored
This can happen if the specified job ID is not found.
-
- Nov 20, 2014
-
-
David Bigagli authored
cgroup plugin.
-
Morris Jette authored
-
- Nov 18, 2014
-
-
David Bigagli authored
impact functionality.
-
- Nov 13, 2014
-
-
Brian Christiansen authored
Bug 1253
-