- Jun 28, 2013
-
-
Morris Jette authored
-
Morris Jette authored
Affects jobs with the --exclusive and --cpus-per-task options. Bug 355
-
Stephen Trofinoff authored
A simple one-line fix to the "_adjust_cpus_nppcu" function, which I added as part of the NPPCU functionality. It was not a problem until that squeue patch, which updated squeue to use this function. In this one case the default value of the internal variable "ntasks_per_core" turned out to be 0 rather than the 0xffff (65535) that I had coded for (as in "select/cons_res"). In the adjustment function, I therefore added a second clause to the if-statement that checks for the sentinel value, so that it also checks for 0. This resolved the problem. I did not notice it earlier because we do not usually use "select/serial".
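The check described above can be sketched roughly as follows. This is a minimal illustration, not the actual body of "_adjust_cpus_nppcu": the macro name and the helper function are hypothetical, and only the two values (0xffff and 0) come from the commit message.

```c
#include <stdint.h>

/* Hypothetical name for the "not set" sentinel (0xffff per the commit message). */
#define NTASKS_PER_CORE_UNSET ((uint16_t) 0xffff)

/*
 * Illustrative stand-in, not the real _adjust_cpus_nppcu(): returns 1 when
 * ntasks_per_core carries no usable limit, i.e. it is either the 0xffff
 * sentinel or 0 (the default seen with select/serial), and 0 otherwise.
 */
int ntasks_per_core_is_unset(uint16_t ntasks_per_core)
{
	return (ntasks_per_core == NTASKS_PER_CORE_UNSET) ||
	       (ntasks_per_core == 0);
}
```

Treating both values as "unset" keeps the adjustment from acting on a default of 0 as if it were a real per-core task limit.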
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
-
Phil Eckert authored
-
Daniel M. Weeks authored
-
Morris Jette authored
This can happen if something outside of Slurm opens the srun socket and writes to it, since the data will not be of a form that Slurm can decode. Bug 354
-
Morris Jette authored
Rather than a job
-
Morris Jette authored
This removes logic added three years ago that would automatically set a job's cpus_per_task value in order to reset a job's mem_per_cpu value and scale the cpus_per_task by the same value. Equivalent logic did not exist in the step allocation logic. Just return an error instead. This change will be made in Slurm version 2.6, but this batch is made for version 2.5. The original patch introducing the problem is in commit cc00cc70b9c90816afc511e0261e449857176332. Bug 352
-
Danny Auble authored
enum.
-
- Jun 27, 2013
-
-
Rod Schultz authored
Bug 351
-
Morris Jette authored
-
Morris Jette authored
-
Matthieu Hautreux authored
-
Morris Jette authored
This extends the logic of commit ba58d59c to the following RPC types: job complete, batch script complete, and job step complete
-
Danny Auble authored
-
- Jun 26, 2013
-
-
Martin Perry authored
In acct_gather_energy/rapl initialization, if the fopen of /proc/cpuinfo fails, we should treat this as a fatal condition rather than continue. Patch is attached. Problem found by Coverity tool, CID 20186. Bug 331
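The pattern behind this fix, checking the fopen result before using the file, might look like the sketch below. The helper name is hypothetical and this is not the plugin's code. It also returns an error code where the actual patch treats the failure as fatal, so the caller is assumed to abort initialization on -1.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns 0 if the file at "path" can be opened for
 * reading, or -1 if fopen fails, so the caller can stop initialization
 * instead of continuing with a NULL FILE pointer.
 */
int check_cpuinfo_readable(const char *path)
{
	FILE *fd = fopen(path, "r");

	if (fd == NULL) {
		fprintf(stderr, "unable to open %s\n", path);
		return -1;
	}
	fclose(fd);
	return 0;
}
```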
-
Morris Jette authored
-
Morris Jette authored
-
Dominik Friedrich authored
-
Morris Jette authored
This applies the same logic that commit ba58d59c added for job signal and batch job submit
-
- Jun 25, 2013
-
-
Danny Auble authored
is larger than the number of tasks spread across the node.
-
Danny Auble authored
-
Danny Auble authored
since neither of those should be divided by 100 instead of 1024.
-
Danny Auble authored
divisible by 1024.
-
Danny Auble authored
future.
-
Thomas Cadeau authored
-
Danny Auble authored
for as the job/step id.
-
Morris Jette authored
-
Danny Auble authored
-
David Gloe authored
The SLURM Makefile.am scripts use pkglibexecdir. One source indicates that this was not added until automake 1.10.2 (https://github.com/rerun/rerun/issues/167), so we made that the minimum required version.
-
jette authored
The logic added to wake pending job steps as soon as resources become available lacked signal handling logic. This adds signal handling logic. Fix for bug 339
-
- Jun 24, 2013
-
-
jette authored
Under very heavy load with many thousands of batch job submissions or job signals, the write lock can be held for very long periods of time, preventing job scheduling, squeue response, etc. This code inserts a timing break to permit other functions to get the locks.
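One way to picture such a timing break is a check on how long the lock has been held, as in the sketch below. The threshold value, the macro name, and the function name are all assumptions made for illustration. The commit message does not state how the break is implemented or how long the interval is.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical budget: longest time to hold the write lock before yielding. */
#define MAX_LOCK_HOLD_USEC 500000

/*
 * Illustrative only: returns true once the lock has been held for at least
 * the budget, signalling that the holder should release it briefly so that
 * other functions (scheduling, squeue requests, etc.) can acquire it.
 */
bool should_yield_lock(int64_t lock_start_usec, int64_t now_usec)
{
	return (now_usec - lock_start_usec) >= MAX_LOCK_HOLD_USEC;
}
```

A request-processing loop would call this between requests and, when it returns true, unlock, pause briefly, and lock again before continuing.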
-
jette authored
-
jette authored
-
- Jun 21, 2013
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-