- Nov 11, 2016
-
-
David Gloe authored
Bug 3253.
-
Tim Wickberg authored
-
- Nov 10, 2016
-
-
Tim Wickberg authored
If the input value mod 512 == 0, the value would be subject to unintended rounding. Rework the function to check against this on each unit promotion. Bug 3252.
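A minimal sketch of the per-promotion check described above, written in Python rather than Slurm's C. The function name `format_units`, the unit table, and the 1024 divisor are illustrative assumptions and not the actual Slurm code. It shows only the general idea: promote to the next unit only while the value divides evenly, so nothing is rounded away.

```python
# Hypothetical helper: names and unit table are assumptions for illustration.
UNITS = ["", "K", "M", "G", "T", "P"]


def format_units(value: int, divisor: int = 1024) -> str:
    """Render value with the largest unit suffix that loses no precision."""
    if value < 0:
        raise ValueError("value must be non-negative")
    idx = 0
    # Re-check divisibility on every promotion instead of once up front.
    while value >= divisor and value % divisor == 0 and idx < len(UNITS) - 1:
        value //= divisor
        idx += 1
    return f"{value}{UNITS[idx]}"
```

With this approach a value such as 1536 stays in its original unit because it is not an exact multiple of 1024, while 524288 is promoted once to 512K and then stops.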
-
Morris Jette authored
It was causing the loss of node available_features on startup with node_features/knl_cray. bug 3241
-
Morris Jette authored
Check for zonesort file first, to save time over attempting to load a module that is already loaded. It may be loaded by default per administrator configuration.
-
- Nov 09, 2016
-
-
Morris Jette authored
Set per-node HBM availability as a GRES based upon the KNL node's MCDRAM state. bug 3171
-
Alejandro Sanchez authored
Caused by race for local_energy which is dynamically allocated. Bail out of the update if that hasn't been allocated yet. Bug 3237.
-
- Nov 08, 2016
-
-
Morris Jette authored
Add new node state flag of NODE_STATE_REBOOT for node reboots triggered by "scontrol reboot" commands. Previous logic re-used NODE_STATE_MAINT flag, which could lead to inconsistencies. Add "ASAP" option to "scontrol reboot" command that will drain a node in order to reboot it as soon as possible, then return it to service. bug 3210
-
Morris Jette authored
bug 3213
-
Morris Jette authored
select/linear plugin modified to better support heterogeneous clusters when topology/none is also configured. Note that use of the select/cons_res plugin is strongly recommended for heterogeneous clusters. OverSubscribe=exclusive can be used if whole-node allocations are desired. bug 3212
-
Alejandro Sanchez authored
Bug 3224.
-
Morris Jette authored
If a job is started by the main scheduling logic and requeued while the backfill scheduler has locks released, that can result in an invalid data structure in select/cons_res. Namely, the backfill scheduler's attempt to start the job would clear the job resources node_bitmap. That leaves a NULL pointer in the select/cons_res plugin, which generates an abort. (That pointer is needed to clean up the job allocation records when the Epilog or Cray Node Health Check, NHC, is complete and the resources become available for another job.) bug 3230
-
- Nov 07, 2016
-
-
Morris Jette authored
Backup slurmctld will now: 1. Not abort due to a NULL pointer (needed to move code around on restart). 2. Recover KNL MCDRAM and NUMA modes from state save files if capmc and cnselect are not available. bug 3241
-
- Nov 05, 2016
-
-
Morris Jette authored
cray/burst_buffer - Update "instance" parsing to match updated dw_wlm_cli output. bug 3222
-
- Nov 04, 2016
-
-
Morris Jette authored
Add "FreeSpace" information for each pool to the "scontrol show burstbuffer" output. Required changes to the burst_buffer_info_t data structure. bug 3222
-
Morris Jette authored
cray/burst_buffer - Preserve job ID and don't translate to job array ID after slurmctld restart. Prior logic would not set array_task_id to NO_VAL, so all job-buffer IDs would be reported in the form "JobID=0_0(123)" rather than "JobID=123"
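The two report formats mentioned above can be illustrated with a small Python sketch. The function `format_job_id` is hypothetical, and `NO_VAL` is assumed to be the 0xfffffffe sentinel defined in Slurm's headers. The point is only that a record whose array_task_id is left at 0 instead of NO_VAL gets printed as an array task.

```python
# Assumed sentinel value for "not set"; the function below is illustrative only.
NO_VAL = 0xFFFFFFFE


def format_job_id(job_id: int, array_job_id: int = 0,
                  array_task_id: int = NO_VAL) -> str:
    """Format a job ID, using the array form only for real array tasks."""
    if array_task_id == NO_VAL:
        # Regular job: report the plain job ID.
        return f"JobID={job_id}"
    # Job array task: report ArrayJobID_TaskID(JobID).
    return f"JobID={array_job_id}_{array_task_id}({job_id})"
```

Calling `format_job_id(123)` gives "JobID=123", while `format_job_id(123, 0, 0)` reproduces the misleading "JobID=0_0(123)" form that appears when the task ID is not reset to NO_VAL.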
-
Morris Jette authored
cray/burst_buffer - Internally track both allocated and unusable space. The reported UsedSpace in a pool is now the allocated space (previously it was the unusable space). Base available space on whichever value leaves the least free space. bug 3222
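As a rough illustration of the rule "whichever value leaves least free space", the calculation might look like the Python sketch below. The function and parameter names are hypothetical, and clamping the result at zero is an added assumption rather than something stated in the commit.

```python
def pool_free_space(total: int, allocated: int, unusable: int) -> int:
    """Return free space in a pool, using the more pessimistic usage figure."""
    # The larger of allocated and unusable space leaves the least free space.
    used = max(allocated, unusable)
    # Assumption: never report negative free space.
    return max(total - used, 0)
```

For a pool of 1000 units with 300 allocated and 450 unusable, this yields 550 free units; if the allocated figure were the larger one, it would be used instead.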
-
- Nov 02, 2016
-
-
Morris Jette authored
Add LaunchParameters=mem_sort option to configure running of zonesort by default at step startup. Also add documentation about zonesort to the KNL web page. bug 3188
-
- Nov 01, 2016
-
-
Danny Auble authored
and request --ntasks-per-core=1 and only 1 task on the node the slurmd would abort on an infinite loop fatal. Regression is from commit 5265420d. Without this fix you can get into an infinite loop in the task/affinity plugin. The loop is handled by producing a fatal. Bug 3118
-
Morris Jette authored
cray/burst_buffer - Fix for double counting of used_space at slurmctld startup. bug 3222
-
Morris Jette authored
cray/burst_buffer - If total_space in a pool decreases, reset used_space rather than trying to account for buffer allocations in progress. bug 3222
-
- Oct 31, 2016
-
-
Morris Jette authored
bug 3188
-
- Oct 28, 2016
-
-
Danny Auble authored
more time than should be allowed would be accounted for. This only happened on jobs in the completing state when the slurmctld was shutdown. This will also be enhanced in 17.02 as the job's end_time_exp is not stored which is needed to determine if the job has already been through the decay_thread at end of job. Bug 3162
-
- Oct 27, 2016
-
-
Morris Jette authored
-
Morris Jette authored
bug 3139
-
Danny Auble authored
issue with gang scheduling. Bug 3211
-
Tim Wickberg authored
-
Brian Christiansen authored
Federated submissions
-
Morris Jette authored
-
- Oct 26, 2016
-
-
Morris Jette authored
Fix bug that was clearing MAINT mode on nodes scheduled for reboot (bug introduced in version 16.05.5 to address bug in overlapping reservations, commit 5eee1d28). Note that a node's MAINT flag is used for both a requested reboot and a maintenance reservation. What I'd like to do is add a new node state flag to differentiate between these two cases, but that involves some significant changes that could introduce instability, so it will be deferred to version 17.02. bug 3210
-
Alejandro Sanchez authored
salloc are requested with -n tasks < hosts from -w hostlist or from -N.
-
Danny Auble authored
-
Danny Auble authored
requested with -n tasks < hosts from -w hostlist.
-
Morris Jette authored
bug 2149
-
Morris Jette authored
Add new SchedulerParameter (max_array_tasks) to limit the maximum number of tasks in a job array independently from the maximum task ID (MaxArraySize). bug 2676
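The difference between the two limits can be sketched in Python as follows. The function `array_request_allowed` is hypothetical, and it assumes (per the slurm.conf documentation) that the largest permitted task ID is MaxArraySize minus one. It is not the slurmctld implementation, only an illustration of checking the highest task ID and the task count independently.

```python
from typing import Iterable


def array_request_allowed(task_ids: Iterable[int], max_array_size: int,
                          max_array_tasks: int) -> bool:
    """Check a job array request against both limits separately."""
    ids = set(task_ids)
    if not ids:
        return False
    # Limit on the task ID values (assumed: highest ID is MaxArraySize - 1).
    if max(ids) >= max_array_size:
        return False
    # Limit on how many tasks the array contains (max_array_tasks).
    return len(ids) <= max_array_tasks
```

Under these assumptions a sparse array such as tasks 0 and 500000 passes with a large MaxArraySize and a small max_array_tasks, whereas a dense array of 2000 tasks is rejected when max_array_tasks is 1000.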
-
- Oct 25, 2016
-
-
Dominik Bartkiewicz authored
Bug 3194
-
Morris Jette authored
Add SbcastParameters configuration option to control default file destination directory and compression algorithm. bug 2977
-
Morris Jette authored
Replace sjstat, seff and sjobexit RPM packages with a single "contribs" package.
-
Danny Auble authored
-
Morris Jette authored
Remove separate slurm_blcr package. If Slurm is built with BLCR support, the files will now be part of the main Slurm packages. bug 2061
-