- Jan 29, 2016
-
-
Morris Jette authored
When slurmctld is in background mode, it will issue double free calls on the incoming message buffers, likely leading to an abort.
-
Morris Jette authored
If an invalid trigger message is received by slurmctld, it could result in a non-zero array counter and a NULL element array. If the element array is NULL, then clear the counter to avoid xfree calls of bad pointers.
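The guard described above can be sketched as follows. This is a minimal illustration rather than the actual slurmctld code: the struct `demo_trigger_msg_t`, its fields, and both helper functions are hypothetical names, and plain `free()` stands in for `xfree`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an unpacked trigger message (not Slurm's struct). */
typedef struct {
    uint32_t record_count;   /* number of entries the message claims to hold */
    char   **trigger_array;  /* entries; NULL if the message was malformed */
} demo_trigger_msg_t;

/* If the element array is NULL, clear the counter so that later loops
 * never dereference or free entries that do not exist. */
static void demo_sanitize_trigger_msg(demo_trigger_msg_t *msg)
{
    if (msg && !msg->trigger_array)
        msg->record_count = 0;
}

/* Free every entry, then the array itself, leaving the message empty. */
static void demo_free_trigger_msg(demo_trigger_msg_t *msg)
{
    if (!msg)
        return;
    demo_sanitize_trigger_msg(msg);
    for (uint32_t i = 0; i < msg->record_count; i++)
        free(msg->trigger_array[i]);
    free(msg->trigger_array);   /* free(NULL) is a no-op */
    msg->trigger_array = NULL;
    msg->record_count = 0;
}
```

With a message that claims three records but carries a NULL array, `demo_free_trigger_msg()` resets the count to zero before the loop, so no invalid pointer is ever freed.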
-
- Jan 28, 2016
-
-
Morris Jette authored
Do not automatically relocate an advanced reservation for individual cores that spans multiple nodes when nodes in that reservation go down (e.g. a 1 core reservation on node "tux1" will be moved if node "tux1" goes down, but a reservation containing 2 cores on node "tux1" and 3 cores on "tux2" will not be moved if node "tux1" goes down). Advanced reservations for whole nodes will be moved by default for down nodes. bug 2326
-
Tim Wickberg authored
Avoid attempting to execve() a directory whose name happens to match that of the desired command. bug 2392.
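One way to perform such a check before calling execve() is sketched below. It assumes a POSIX system; the helper name `demo_is_executable_file` is hypothetical and is not taken from the Slurm source, which may implement the test differently.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return true only if path names a regular file the caller may execute.
 * A directory usually has its execute (search) bit set, so access(X_OK)
 * alone would accept it; the S_ISREG() test rules that case out. */
static bool demo_is_executable_file(const char *path)
{
    struct stat st;

    if (!path || stat(path, &st) != 0)
        return false;            /* missing or unreadable path */
    if (!S_ISREG(st.st_mode))
        return false;            /* directory, device, socket, ... */
    return access(path, X_OK) == 0;
}
```

A caller walking the entries of PATH would skip any candidate for which this returns false and keep searching, rather than passing a directory to execve().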
-
Morris Jette authored
Allow an existing reservation with running jobs to be modified without Flags=IGNORE_JOBS. bug 2389
-
Morris Jette authored
burst_buffer/cray - Increase size of intermediate variable used to store buffer byte size read from DW instance from 32 to 64-bits to avoid overflow and reporting invalid buffer sizes. bug 2378
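The overflow can be illustrated with a small sketch. The function names are hypothetical and the unit conversion is only an example; the commit itself concerns the variable that holds the byte size read from the DataWarp instance.

```c
#include <stdint.h>

/* Convert a size in GiB to bytes using a 64-bit intermediate value. */
static uint64_t demo_gib_to_bytes64(uint32_t gib)
{
    return (uint64_t)gib * 1024u * 1024u * 1024u;
}

/* Same conversion, but stored in a 32-bit variable: anything of 4 GiB
 * or more wraps modulo 2^32 and yields a wrong (too small) size. */
static uint32_t demo_gib_to_bytes32(uint32_t gib)
{
    return (uint32_t)demo_gib_to_bytes64(gib);
}
```

For 5 GiB, the 64-bit version returns 5368709120 bytes, while the 32-bit version wraps around to 1073741824 bytes (1 GiB).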
-
Danny Auble authored
-
- Jan 27, 2016
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
gres types without a File.
-
Danny Auble authored
-
Danny Auble authored
to debug3 when trying to find the correct association. a continuation to commit 87d9370f
-
Alejandro Sanchez authored
-
- Jan 26, 2016
-
-
Morris Jette authored
Both the original logic and modified logic failed to lock the job data structure prior to decrementing "prolog_running" counter.
-
- Jan 25, 2016
-
-
Danny Auble authored
in the forward message logic.
-
Morris Jette authored
Previously, under some conditions, the boot completion was ignored and the job remained pending.
-
Tim Wickberg authored
-
Sergey Meirovich authored
-
- Jan 22, 2016
-
-
Danny Auble authored
-
- Jan 21, 2016
-
-
Danny Auble authored
Bug 2364
-
Danny Auble authored
Commit fa331e30 fixes this. The logic was bad to begin with... uint32_t new_cpus = detail_ptr->num_tasks / detail_ptr->cpus_per_task; The / should have been * this whole time. This was the reason we found this in the first place.
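The corrected arithmetic can be sketched as below. The struct `demo_job_details_t` and the function `demo_required_cpus` are hypothetical stand-ins; only the field names `num_tasks` and `cpus_per_task` come from the commit message, and their types here are assumptions.

```c
#include <stdint.h>

/* Hypothetical subset of a job's detail record. */
typedef struct {
    uint32_t num_tasks;      /* tasks requested by the job */
    uint16_t cpus_per_task;  /* CPUs requested for each task */
} demo_job_details_t;

/* Total CPUs needed is tasks multiplied by CPUs per task.
 * Dividing instead would under-count whenever cpus_per_task > 1. */
static uint32_t demo_required_cpus(const demo_job_details_t *detail_ptr)
{
    return detail_ptr->num_tasks * (uint32_t)detail_ptr->cpus_per_task;
}
```

For a job with 4 tasks and 2 CPUs per task this yields 8 CPUs, whereas the original division would have produced 2.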
-
Morris Jette authored
bug 2369
-
Gennaro Oliva authored
-
Morris Jette authored
If scancel is operating on a large number of jobs and RPC responses from the slurmctld daemon are slow, then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. bug 2256
-
Morris Jette authored
If a job launch is delayed, the test was failing due to bad parsing. These lines were being interpreted as a counter followed by node names of "queued" and "has": "srun: job 1332712 queued and waiting for resources" and "srun: job 1332712 has been allocated resources"
-
Morris Jette authored
-
Morris Jette authored
bug 2366
-
Danny Auble authored
-
Morris Jette authored
Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. bug 2350
-
Danny Auble authored
-
- Jan 20, 2016
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
This corrects logic from commit e5a61746 that could result in use of NULL pointer
-
Morris Jette authored
It was previously triggered by executing "scontrol reconfig" on a front-end system while there was a job in completing state.
-
Morris Jette authored
Properly account for memory, CPUs and GRES when slurmctld is reconfigured while there is a suspended job. Previous logic would add the CPUs, but not memory or GPUs. This would result in underflow/overflow errors in select cons_res plugin. bug 2353
-
Morris Jette authored
The counter is really intended to reflect the count of running or suspended jobs rather than running jobs alone. Previous logic would report an underflow for the "job_cnt_run" variable if 1. job submitted 2. job suspended 3. scontrol reconfig 4. job cancelled
-
- Jan 19, 2016
-
-
Morris Jette authored
Log the length of bitmaps in addition to the bits set. Also increase the string length used for logging.
-
Morris Jette authored
Previous logic would prevent allocation of sockets to a job unless the entire socket was available. If there were any specialized cores, the socket was treated as being not available and unusable. For example, if a node had 2 sockets, then a job requesting 2 specialized cores would reserve one core on each of the two sockets and render the job not runnable.
-