- Sep 12, 2017
-
-
Morris Jette authored
Enable them only with SchedulerParameters=enable_hetero_jobs or when the MPI type is "none".
-
Morris Jette authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
Do pointer comparisons rather than strcmps. ~80x speedup. Bug 3529.
e.g. 1000 nodes, 8000 tasks:
[Sep 11 14:24:15.873639 20992 srvcn 0x7f8c1cdda700] _task_layout_hostfile: hostfile processing took usec=2152678 (orig)
[Sep 11 14:27:46.173424 20992 srvcn 0x7f8c1c6d3700] _task_layout_hostfile: hostfile processing took usec=2142997 (orig)
[Sep 11 14:32:32.245420 4037 srvcn 0x7f12de4e4700] _task_layout_hostfile: hostfile processing took usec=26198 (node ptrs)
[Sep 11 14:36:12.88769 4037 srvcn 0x7f12de6e6700] _task_layout_hostfile: hostfile processing took usec=25515 (node ptrs)
[Sep 11 14:41:38.339162 4037 srvcn 0x7f132c8d5700] _task_layout_hostfile: hostfile processing took usec=27459 (node ptrs)
[Sep 11 15:16:59.575189 1874 srvcn 0x7f3dae3f0700] _task_layout_hostfile: hostfile processing took usec=30129 (node ptrs)
[Sep 11 15:20:50.365004 1874 srvcn 0x7f3dc8b34700] _task_layout_hostfile: hostfile processing took usec=29884 (node ptrs)
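The idea behind this change can be sketched as follows: resolve each host name to a node record once, then compare record pointers in the hot loop instead of comparing strings. This is only an illustration. `struct node`, `find_node()` and `count_tasks_on()` are hypothetical names, not the structures or functions used by `_task_layout_hostfile`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node record; not Slurm's actual node structure. */
struct node {
	const char *name;
};

/* Resolve a host name to its node record. This is the only place
 * that pays for a string comparison, and it runs once per entry. */
static struct node *find_node(struct node *nodes, size_t node_cnt,
			      const char *name)
{
	assert(nodes != NULL && name != NULL);
	for (size_t i = 0; i < node_cnt; i++) {
		if (strcmp(nodes[i].name, name) == 0)
			return &nodes[i];
	}
	return NULL;
}

/* Count the tasks placed on 'target'. Entries were resolved earlier,
 * so each test is a single pointer comparison, not a strcmp(). */
static size_t count_tasks_on(struct node *const *task_nodes, size_t task_cnt,
			     const struct node *target)
{
	size_t cnt = 0;

	assert(task_nodes != NULL);
	for (size_t i = 0; i < task_cnt; i++) {
		if (task_nodes[i] == target)
			cnt++;
	}
	return cnt;
}
```

The saving comes from the inner loop: with many tasks per node, each per-task test drops from a character-by-character comparison to one pointer equality check.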
-
Brian Christiansen authored
-
- Sep 11, 2017
-
-
Tim Wickberg authored
-
Danny Auble authored
-
Morris Jette authored
Previous logic was recording each character in each node name as a separate job component's host name
-
Morris Jette authored
-
Morris Jette authored
Commit 241f31d7 resulted in heterogeneous jobs not building a valid script unless some valid burst buffer plugin was configured (having no burst buffer plugin resulted in a NULL batch script).
-
Tim Wickberg authored
After auditing the four calls using this function, it's clear that none of these fd's are ever meant to leak to a fork()'d process.
-
Tim Wickberg authored
-
Tim Wickberg authored
Created by VIM if .swp is already in use.
-
Morris Jette authored
-
Morris Jette authored
-
Ole H Nielsen authored
-
- Sep 09, 2017
-
-
Tim Wickberg authored
Per the creat() man page, creat() is equivalent to calling open with flags of O_CREAT|O_WRONLY|O_TRUNC. Add O_CLOEXEC as well.
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
Only cut over when the heartbeat file is not being updated any longer. Bug 4142.
-
Tim Wickberg authored
Will write out a timestamp into a 'heartbeat' file in StateSaveLocation every (SlurmctldTimeout / 4) seconds to demonstrate that the primary controller still has access to the directory, and thus the backup should avoid taking control. Bug 4142.
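A rough sketch of a heartbeat writer along these lines. The file name, the write-then-rename step, and both function names are assumptions made for illustration; the commit text only states that a timestamp is written every (SlurmctldTimeout / 4) seconds.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Interval between heartbeats, as stated in the commit message. */
static unsigned heartbeat_interval(unsigned slurmctld_timeout)
{
	return slurmctld_timeout / 4;
}

/* Hypothetical writer: store 'now' in <dir>/heartbeat. Writing to a
 * temporary file and renaming it is an assumption of this sketch; it
 * keeps readers from seeing a partially written timestamp. */
static int write_heartbeat(const char *dir, time_t now)
{
	char tmp_path[4096], final_path[4096];
	FILE *fp;
	int n;

	assert(dir != NULL);
	n = snprintf(final_path, sizeof(final_path), "%s/heartbeat", dir);
	if (n < 0 || (size_t) n >= sizeof(final_path))
		return -1;
	n = snprintf(tmp_path, sizeof(tmp_path), "%s/heartbeat.new", dir);
	if (n < 0 || (size_t) n >= sizeof(tmp_path))
		return -1;

	fp = fopen(tmp_path, "w");
	if (!fp)
		return -1;
	if (fprintf(fp, "%lld\n", (long long) now) < 0) {
		fclose(fp);
		return -1;
	}
	if (fclose(fp) != 0)
		return -1;
	return rename(tmp_path, final_path);
}
```

A caller would invoke `write_heartbeat()` once per `heartbeat_interval()` seconds; a failed write is itself a signal that the primary may have lost access to the directory.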
-
- Sep 08, 2017
-
-
Morris Jette authored
-
Isaac Hartung authored
-
Brian Christiansen authored
The change from strncpy to strlcpy was chopping off the last character of the name.
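One way this kind of truncation can arise, shown as a sketch: strncpy() copies up to the given number of characters without reserving room for a terminator, while strlcpy() treats the same argument as the full buffer size and copies at most size - 1 characters. Because strlcpy() is not available in every libc, the block uses a local stand-in, `copy_bounded()`, which is a hypothetical name. Whether the original bug passed the string length as the size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in with strlcpy() semantics: copy at most size - 1
 * characters, always NUL-terminate, return strlen(src). */
static size_t copy_bounded(char *dst, const char *src, size_t size)
{
	size_t len;

	assert(dst != NULL && src != NULL);
	len = strlen(src);
	if (size > 0) {
		size_t n = (len < size - 1) ? len : size - 1;
		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

With a 6-character name such as "node12", passing a size of 6 yields "node1", while passing 7 (length plus terminator) yields the full name. A return value greater than or equal to the size argument indicates truncation.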
-
Tim Wickberg authored
xgroup_set_param() avoids the string parsing overhead and should be used instead.
-
Tim Wickberg authored
-
Tim Wickberg authored
Save re-parsing the input string back into the components.
-
Tim Wickberg authored
Since ReleaseAgent is no longer required, we can strip out all the supporting logic for it.
-
Morris Jette authored
Accidentally removed bracket in checked-in code
-
Morris Jette authored
-
Morris Jette authored
-
Dominik Bartkiewicz authored
If /proc was inaccessible, proc_name would leak. Put an explicit length cap in sprintf to avoid a warning. The size is checked immediately before here, so this is just making the 10-char limit explicit. Bug 4062.
-
Morris Jette authored
-
Morris Jette authored
-
Dominik Bartkiewicz authored
-
Tim Wickberg authored
If the network path to shared storage used for the StateSaveLocation is separate from that used to communicate with the cluster, both the primary and backup controllers can end up acting as master on loss of the cluster network. Alter the HA takeover code path to make sure that the job state save file is not still being updated by the primary slurmctld. If it is, refuse to take over and retry later. Bug 3592.
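The takeover guard described here can be sketched as a check on the state file's modification time. The function name, the `max_age` threshold, and the use of stat() are assumptions for illustration; the commit text does not say how slurmctld detects that the file is still being updated.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical check used before a backup takes control.
 * Returns  1 if the file was last modified more than max_age seconds ago,
 *          0 if it was modified recently (primary still writing),
 *         -1 if the file cannot be examined (caller decides what to do). */
static int state_file_age_check(const char *path, time_t now, time_t max_age)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) != 0)
		return -1;
	return (now - st.st_mtime) > max_age ? 1 : 0;
}
```

In this sketch the backup would proceed only on a return of 1, and would retry later on 0. Treating -1 as "do not take over" is the conservative choice, since an unreadable StateSaveLocation says nothing about whether the primary is still active.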
-
Tim Wickberg authored
-
Tim Wickberg authored
-