- Sep 17, 2016
-
-
Morris Jette authored
Restore ability to manually power down nodes, broken in 15.08.12 in commit b4904661. The patch introduced in commit b4904661 (not powering down dead nodes) has a bad side effect. Adding the "(node_ptr->last_idle != 0)" condition prevents nodes from being powered down with the following command: scontrol update nodename=nX state=power_down. This happens because the state update function relies on zeroing the "last_idle" variable when a power_down is requested (see src/slurmctld/node_mgr.c, line 1589). Reverting this commit should solve the problem, but I'll let you decide. Didier GAZEN
-
- Sep 16, 2016
-
-
Morris Jette authored
node_features/knl_cray: If a node is rebooted outside of Slurm's direction, update its active features with current MCDRAM and NUMA mode information. bug 3071
-
- Sep 15, 2016
-
-
Morris Jette authored
Fix race condition that could result in MCDRAM state information coming from capmc rather than cnselect (used state for next boot rather than latest boot). bug 3080
-
Nicolas Joly authored
-
- Sep 14, 2016
-
-
Alejandro Sanchez authored
No functional change, just silencing the warning message in this instance. Bug 3079.
-
Alejandro Sanchez authored
Bug 3073.
-
- Sep 09, 2016
-
-
Morris Jette authored
Modify srun task completion handling to only build the task/node string for logging purposes if it is needed. Modified for performance purposes. bug 3044
-
Tim Wickberg authored
This reverts commit 1ec2a4ae.
-
Alejandro Sanchez authored
Bug 3063.
-
- Sep 08, 2016
-
-
Morris Jette authored
Restructure srun command locking for task_exit processing logic for improved parallelism. This change decreases the amount of time consumed by serial logic by 2 orders of magnitude. bug 3044
-
- Sep 07, 2016
-
-
Morris Jette authored
Preserve node "RESERVATION" state when one of multiple overlapping reservations ends. Previous logic would clear the node's RESERVATION state flag when any one of the reservations on the node ended rather than keeping the node in RESERVATION state until the last reservation ended. bug 3057
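The behavior described above can be illustrated with a minimal sketch. It is not Slurm's actual code: the `struct resv` type, its fields, and both function names are hypothetical and exist only to show the idea that a node keeps its RESERVATION flag until no active reservation covers it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified reservation record (not a Slurm structure). */
struct resv {
    bool active;          /* reservation is currently in effect */
    const bool *node_map; /* node_map[i] is true if node i belongs to it */
};

/* Return true if any active reservation still covers the given node. */
static bool node_still_reserved(const struct resv *resvs, size_t resv_cnt,
                                size_t node_inx)
{
    for (size_t i = 0; i < resv_cnt; i++) {
        if (resvs[i].active && resvs[i].node_map[node_inx])
            return true;
    }
    return false;
}

/*
 * End one reservation, then recompute the node's RESERVATION flag from
 * the remaining reservations instead of clearing it unconditionally.
 */
static bool end_resv_and_update(struct resv *resvs, size_t resv_cnt,
                                size_t ending, size_t node_inx)
{
    resvs[ending].active = false;
    return node_still_reserved(resvs, resv_cnt, node_inx);
}
```

With two overlapping reservations on a node, ending the first leaves the flag set; only ending the second clears it.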
-
Morris Jette authored
Handle the case when the slurmctld daemon restarts while a compute node reboot is in progress. Return the node to service rather than setting it DOWN. bug 3042
-
- Sep 06, 2016
-
-
Morris Jette authored
Add salloc_wait_nodes option to the SchedulerParameters parameter in the slurm.conf file controlling when the salloc command returns in relation to when nodes are ready for use (i.e. booted). bug 3043
-
Gennaro Oliva authored
bug 3055
-
Gennaro Oliva authored
bug 3054
-
- Sep 02, 2016
-
-
Danny Auble authored
reservations.
-
- Sep 01, 2016
-
-
Morris Jette authored
sched/backfill - Check that a user's QOS is allowed to use a partition before trying to schedule resources on that partition for the job. bug 3039
-
Morris Jette authored
bug 3035 and 3009
-
- Aug 30, 2016
-
-
Tim Wickberg authored
-
Tim Wickberg authored
Otherwise blade_cnt is potentially greater than bit_size(jobinfo->blade_map) which leads to an assertion failure. Bug 3033.
-
- Aug 27, 2016
-
-
Artem Polyakov authored
with hwloc.
-
Morris Jette authored
This patch has two parts: 1. When a job is initially submitted, Slurm was failing to set an initial reason for the job not starting. 2. After a job was submitted, it was sometimes failing to reset the job's reason. It was also failing to reset the "last_job_update" time, so something like "squeue -i1" would not get the new reason. bug 3025
-
- Aug 26, 2016
-
-
Alejandro Sanchez authored
Fix multipart srun submission with EnforcePartLimits=NO and job violating the partition limits. bug 3025
-
Alejandro Sanchez authored
bug 3011
-
- Aug 25, 2016
-
-
Morris Jette authored
If all GRES were not defined on all nodes OR if a regular expression was used for a GRES file configuration (e.g. in gres.conf "Type=gpu Files=/dev/nvidia[0-4]"), then memory corruption was likely. The logic has been bad since its inception several years ago.
-
- Aug 24, 2016
-
-
Joseph Mingrone authored
POLLRDHUP does not exist on BSD, define to POLLHUP as done elsewhere.
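A minimal sketch of this kind of portability fallback follows. The `peer_hung_up` helper is hypothetical and only demonstrates how the fallback is used; the exact placement of the define in the Slurm sources is not shown here.

```c
#include <poll.h>

/*
 * POLLRDHUP is a Linux extension. Where it is not defined (for example
 * on BSD), map it to POLLHUP so code testing for it still compiles.
 */
#ifndef POLLRDHUP
#define POLLRDHUP POLLHUP
#endif

/* Hypothetical helper: nonzero if revents reports a hangup by the peer. */
static int peer_hung_up(short revents)
{
    return (revents & (POLLHUP | POLLRDHUP)) != 0;
}
```

Because the fallback aliases POLLRDHUP to POLLHUP, a half-closed connection cannot be distinguished from a full hangup on platforms that lack the flag.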
-
- Aug 23, 2016
-
-
David Gloe authored
The attached patch switches to a more reliable method of detecting service nodes, using xtcli status. In addition, it switches to the print function for better compatibility with Python 3.
-
- Aug 22, 2016
-
-
Boris Karasev authored
-
Boris Karasev authored
To ease the distribution process, plugin names will be automatically adjusted to identify the version of the API that each can support, i.e. pmix_v1 and pmix_v2. This allows distros to create separate non-conflicting packages for each API generation. Bug 2986
-
- Aug 20, 2016
-
-
Morris Jette authored
Ensure the reported expected job start time is not in the past for pending jobs. bug 3002
-
- Aug 19, 2016
-
-
Morris Jette authored
burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. bug 3009
-
- Aug 17, 2016
-
-
Morris Jette authored
-
- Aug 16, 2016
-
-
Alejandro Sanchez authored
Only mark job_id as zero for the batch step (when all job steps would be cleared), not for individual steps, which prevented successive steps from being cancelled. Bug 2984.
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. bug 2334
-
- Aug 15, 2016
-
-
Danny Auble authored
-
- Aug 12, 2016
-
-
Danny Auble authored
-
Morris Jette authored
-
- Aug 11, 2016
-
-
Morris Jette authored
bug 2655
-