- Jul 06, 2013
-
Morris Jette authored
-
- Jul 05, 2013
-
John Thiltges authored
When using ThreadsPerCore > 1, it appears that DefMemPerCPU is being scaled by slurmctld, but not by slurmd/slurmstepd. For example, with ThreadsPerCore=2 and DefMemPerCPU=100, running a single-core job we would expect two threads to be allocated and AllocMem on the assigned node to increase by 200MB. scontrol reports that AllocMem increased by 200MB, but the task/cgroup plugin only sees 100MB of RAM. The problem appears to lie in common/slurm_cred.c:format_core_allocs(): the function counts the job/step cores and multiplies the mem_limit values, but it does not scale the CPU count as slurmd/slurmd/req.c:_check_job_credential() does. See bug 309
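A minimal standalone C sketch of the arithmetic at issue (not Slurm source; alloc_mem_mb is a made-up helper): the per-node memory limit has to be computed from the allocated thread count (cores times ThreadsPerCore), as _check_job_credential() does, not from the core count alone.

    /* Standalone illustration (not Slurm source). With ThreadsPerCore=2 and
     * DefMemPerCPU=100, a one-core allocation owns two CPU threads, so the
     * node's memory limit should be 2 * 100 = 200MB. */
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t alloc_mem_mb(uint32_t cores_alloc, uint32_t threads_per_core,
                                 uint64_t mem_per_cpu_mb)
    {
        uint32_t cpus_alloc = cores_alloc * threads_per_core; /* scale CPU count */
        return (uint64_t) cpus_alloc * mem_per_cpu_mb;
    }

    int main(void)
    {
        /* Unscaled, as in the buggy path: 1 core * 100MB = 100MB in task/cgroup */
        printf("unscaled: %llu MB\n", (unsigned long long) alloc_mem_mb(1, 1, 100));
        /* Scaled as slurmctld accounts it: 1 core * 2 threads * 100MB = 200MB */
        printf("scaled:   %llu MB\n", (unsigned long long) alloc_mem_mb(1, 2, 100));
        return 0;
    }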
-
Morris Jette authored
-
Morris Jette authored
-
jette authored
-
jette authored
-
jette authored
-
jette authored
-
jette authored
-
- Jul 03, 2013
-
Morris Jette authored
-
Piotr Lesnicki authored
This corrects the Slurm PMI2 client, which otherwise fails in the PMI2_Init() step. The hidden bug was introduced in the standalone client derived from BULL's code, and David's modification made it surface: a size one byte too short was passed to strncpy for keys, and David later replaced strncpy with MPICH's MPIU_Strncpy, which forces a terminating '\0'. Bug 359
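A minimal sketch of the off-by-one described above, with a hypothetical buffer size (this is not the PMI2 client code): a copy size one byte too short can go unnoticed with plain strncpy() into a zeroed buffer, but a routine that forces a terminating '\0' inside that size, as the commit describes MPIU_Strncpy doing, chops the last character of the key.

    /* Illustrative only; KEYLEN and the key value are hypothetical. */
    #include <stdio.h>
    #include <string.h>

    #define KEYLEN 16                        /* hypothetical key buffer size */

    /* Stand-in for a '\0'-forcing copy: always terminates within size n. */
    static void copy_forcing_nul(char *dst, const char *src, size_t n)
    {
        strncpy(dst, src, n - 1);
        dst[n - 1] = '\0';
    }

    int main(void)
    {
        const char *key = "PMI_process_map";     /* 15 chars + '\0'            */
        char dst[KEYLEN] = {0};                  /* pre-zeroed buffer          */

        strncpy(dst, key, KEYLEN - 1);           /* size one too short, but    */
        printf("plain strncpy: %s\n", dst);      /* the zeroed buffer saves it */

        copy_forcing_nul(dst, key, KEYLEN - 1);  /* same too-short size now    */
        printf("forced '\\0':   %s\n", dst);     /* chops the last character   */
        return 0;
    }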
-
Morris Jette authored
-
Danny Auble authored
messed up output. Use the parsable options if you want this kind of view.
-
- Jul 02, 2013
-
Morris Jette authored
-
Danny Auble authored
-
- Jul 01, 2013
-
Morris Jette authored
-
Nathan Yee authored
-
- Jun 28, 2013
-
Morris Jette authored
-
Morris Jette authored
Affects jobs with --exclusive and --cpus-per-task options. Bug 355
-
Stephen Trofinoff authored
A simple one-line fix to the "_adjust_cpus_nppcu" function that I had added as part of the NPPCU functionality. It was not a problem until the squeue patch, because squeue was then updated to use this function, and in that one case the default value of the internal variable "ntasks_per_core" ended up being 0 rather than the 0xffff (65535) sentinel I had previously coded for (as in "select/cons_res"). I therefore added a second clause to the if-statement that checks for the sentinel value, also checking whether it is 0. This resolved the problem. Because we do not usually use "select/serial", I had not noticed this before.
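A hedged sketch of the sentinel check described above (the function name matches, but the body is illustrative, not the actual Slurm code): ntasks_per_core can arrive either as the 0xffff "not set" sentinel used by select/cons_res or, on the squeue path against select/serial, as 0, and both cases must leave the CPU count unchanged.

    /* Sentinel-check sketch; the adjustment formula is placeholder math. */
    #include <stdint.h>
    #include <stdio.h>

    static uint16_t adjust_cpus_nppcu(uint16_t ntasks_per_core,
                                      uint16_t threads_per_core,
                                      uint16_t cpus)
    {
        /* The second clause of this if-statement is the one-line fix: treat 0
         * the same as the 0xffff "not set" sentinel and leave cpus alone. */
        if ((ntasks_per_core == 0xffff) || (ntasks_per_core == 0))
            return cpus;
        return cpus * ntasks_per_core / threads_per_core;  /* placeholder math */
    }

    int main(void)
    {
        printf("%u\n", (unsigned) adjust_cpus_nppcu(0xffff, 2, 8)); /* 8: sentinel */
        printf("%u\n", (unsigned) adjust_cpus_nppcu(0,      2, 8)); /* 8: squeue/0 */
        printf("%u\n", (unsigned) adjust_cpus_nppcu(1,      2, 8)); /* 4: adjusted */
        return 0;
    }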
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
-
Phil Eckert authored
-
Daniel M. Weeks authored
-
Morris Jette authored
This can happen if something outside of Slurm opens the srun socket and writes to it, since the data will not be of a form that Slurm can decode. Bug 354
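A hedged sketch of the defensive handling described above, with a hypothetical message header and magic value (not srun's actual protocol code): bytes written by a foreign process will not decode, so they are logged and discarded rather than treated as a fatal condition.

    /* Defensive-decode sketch; the header layout and magic are made up. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define MSG_MAGIC 0x5af3u                /* made-up protocol marker */

    struct msg_header {
        uint16_t magic;
        uint16_t type;
        uint32_t body_len;
    };

    /* Returns 0 on success, -1 if the bytes do not form a valid header. */
    static int decode_header(const unsigned char *buf, size_t len,
                             struct msg_header *hdr)
    {
        if (len < sizeof(*hdr))
            return -1;
        memcpy(hdr, buf, sizeof(*hdr));
        if (hdr->magic != MSG_MAGIC)
            return -1;                       /* garbage from an outside writer */
        return 0;
    }

    int main(void)
    {
        const unsigned char garbage[] = "GET / HTTP/1.0\r\n"; /* e.g. a port scan */
        struct msg_header hdr;

        if (decode_header(garbage, sizeof(garbage) - 1, &hdr) < 0)
            fprintf(stderr, "error: ignoring un-decodable message\n");
        else
            printf("type=%u len=%u\n", (unsigned) hdr.type, (unsigned) hdr.body_len);
        return 0;
    }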
-
Morris Jette authored
Rather than a job
-
Morris Jette authored
This removes logic added three years ago that would automatically set a job's cpus_per_task value in order to reset the job's mem_per_cpu value, scaling cpus_per_task by the same factor. Equivalent logic did not exist in the step allocation logic, so just return an error instead. This change will be made in Slurm version 2.6; this patch is for version 2.5. The original patch introducing the problem is commit cc00cc70b9c90816afc511e0261e449857176332. Bug 352
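A hypothetical sketch of the behavior change (function name and error code invented, not Slurm source): a request whose mem_per_cpu exceeds the limit is now rejected outright instead of being silently rewritten by scaling cpus_per_task.

    /* Behavior-change sketch; names and the error code are invented. */
    #include <stdint.h>
    #include <stdio.h>

    #define ESLURM_MEM_LIMIT 1   /* hypothetical error code */

    static int validate_mem_per_cpu(uint64_t req_mem_per_cpu_mb,
                                    uint64_t max_mem_per_cpu_mb)
    {
        /* Old behavior (removed): shrink mem_per_cpu to the limit and scale
         * cpus_per_task up by the same factor. New behavior: just say no. */
        if (max_mem_per_cpu_mb && (req_mem_per_cpu_mb > max_mem_per_cpu_mb))
            return ESLURM_MEM_LIMIT;
        return 0;
    }

    int main(void)
    {
        /* 4096MB per CPU against a 2048MB limit: rejected, not rewritten. */
        printf("%s\n", validate_mem_per_cpu(4096, 2048) ? "rejected" : "ok");
        return 0;
    }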
-
Danny Auble authored
enum.
-
- Jun 27, 2013
-
Rod Schultz authored
Bug 351
-
Morris Jette authored
-
Morris Jette authored
-
Matthieu Hautreux authored
-
Morris Jette authored
This extends the logic of commit ba58d59c to the following RPC types: job complete, batch script complete, and job step complete
-
Danny Auble authored
-
- Jun 26, 2013
-
Martin Perry authored
In acct_gather_energy/rapl initialization, if the fopen of /proc/cpuinfo fails we should treat this as a fatal condition rather than continue. Patch is attached. Problem found by Coverity tool, CID 20186. Bug 331
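A minimal sketch of the hardening described above (not the plugin source; Slurm's own fatal() logging is stood in for by an error message and exit): a failed fopen() of /proc/cpuinfo ends initialization instead of continuing with an unusable setup.

    /* Hardening sketch: bail out if /proc/cpuinfo cannot be opened. */
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        FILE *fp = fopen("/proc/cpuinfo", "r");
        if (!fp) {
            /* The plugin would call Slurm's fatal(); exiting here has the
             * same effect of refusing to continue after the failure. */
            fprintf(stderr, "fatal: cannot open /proc/cpuinfo: %s\n",
                    strerror(errno));
            exit(1);
        }
        /* ... parse CPU/socket information needed for RAPL readings ... */
        fclose(fp);
        return 0;
    }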
-
Morris Jette authored
-
Morris Jette authored
-
Dominik Friedrich authored
-
Morris Jette authored
This applies the same logic added for job signal and batch job submit in commit ba58d59c
-