- Oct 05, 2016
-
-
Morris Jette authored
node_features/knl_cray plugin: Drain any node not reported by "capmc node_status" on startup or reconfig. Also re-test after a failed node restart for a job.
-
Morris Jette authored
node_features/knl_cray plugin: Remove any KNL MCDRAM or NUMA features from a node's configuration if capmc does NOT report the node as being KNL. For example, we don't want a non-KNL node with features="quad,cache".
-
Brian Christiansen authored
Found by clang. Continuation of 76d62ae4
-
- Oct 04, 2016
-
-
Morris Jette authored
Add new knl.conf configuration parameter CapmcRetries. Modify capmc_suspend and capmc_resume to retry operations when the Cray State Manager is down. Add retry logic to node_features/knl_cray to handle the Cray State Manager being down. Bug 3100.
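The bounded-retry idea behind CapmcRetries can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: `capmc_op_fn`, `run_with_retries`, and `fail_n_times` are hypothetical names, and a real implementation would probably pause between attempts, which this sketch omits.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: returns true when the operation succeeded. */
typedef bool (*capmc_op_fn)(void *arg);

/*
 * Call op up to (retries + 1) times, stopping at the first success.
 * Returns the number of attempts used, or -1 if every attempt failed.
 * "retries" plays the role of a CapmcRetries-style setting (assumption).
 */
int run_with_retries(capmc_op_fn op, void *arg, int retries)
{
	for (int attempt = 1; attempt <= retries + 1; attempt++) {
		if (op(arg))
			return attempt;
	}
	return -1;
}

/*
 * Hypothetical stand-in for a capmc call made while the state manager
 * is down: fails while *remaining > 0, then succeeds.
 */
bool fail_n_times(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return false;
	}
	return true;
}
```

With two simulated failures and four retries allowed, the operation succeeds on the third attempt; with more failures than attempts, the caller gets -1 and can fall back to whatever error handling applies.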
-
Danny Auble authored
from commit ee4a9776.
-
Danny Auble authored
-
Danny Auble authored
reset by list_count. Also remove a nested if for cleaner code.
-
Danny Auble authored
-
Tim Wickberg authored
Logs go to both locations when running in non-daemonized mode. Don't refer to this as "debug" mode; while useful for debugging, it's not directly related. Bug 3146.
-
- Oct 03, 2016
-
-
Dominik Bartkiewicz authored
-
- Sep 30, 2016
-
-
Alejandro Sanchez authored
Otherwise they'll be truncated when packed into the RPC and end up as some bizarre value at the controller. Bug 3098.
-
Dominik Bartkiewicz authored
Set completed time for pending/running runaway jobs to the max of (start, eligible, submit) times. Bug 3075.
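The rule above amounts to taking the latest of three timestamps. A minimal sketch, assuming unset times are stored as 0 (so a job that never started falls back to its eligible or submit time); `runaway_end_time` and `max_time` are hypothetical names, not functions from the Slurm source:

```c
#include <time.h>

/* Return the later of two timestamps. */
static time_t max_time(time_t a, time_t b)
{
	return (a > b) ? a : b;
}

/*
 * Completed time for a runaway job: the max of its start, eligible and
 * submit times. Assumption: a time that was never set is stored as 0,
 * so it never wins the comparison against a real timestamp.
 */
time_t runaway_end_time(time_t start, time_t eligible, time_t submit)
{
	return max_time(start, max_time(eligible, submit));
}
```

A running job normally resolves to its start time, while a pending job with no start time resolves to whichever of eligible or submit is later.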
-
Gennaro Oliva authored
-
Artem Polyakov authored
Avoid using slurm_forward_data because it spawns a thread, which introduces unwanted delays. Bug 3102.
-
Tim Wickberg authored
-
Morris Jette authored
-
- Sep 29, 2016
-
-
Morris Jette authored
Fix indent and add brackets
-
Morris Jette authored
-
Morris Jette authored
-
Alejandro Sanchez authored
Also correct the value of NICE_OFFSET used within the perl API. Bug 3098.
-
Morris Jette authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Artem Polyakov authored
Bug 3051.
-
Tim Wickberg authored
Otherwise updates would be rejected for running jobs even if there would be no impact. Most common when the job_submit plugin is overriding QOS/GRES values on everything; without this change an update to "comment" or other fields would fail with ESLURM_JOB_NOT_PENDING. Bug 3117.
-
Tim Wickberg authored
Never run NHC, even in the edge case where NHC_NO would still launch NHC afterward. Bug 3105.
-
Tim Wickberg authored
Switch to list_for_each, and check if access list actually changed after each update before updating last_prat_update. This prevents the backfill scheduler from resetting mid-cycle unnecessarily. Bug 3123.
-
Josko Plazonic authored
Buffer was being allocated based upon the size of the wrong structure. The buffer would have been larger than required, so this bug should not have caused any failures. Bug 3127.
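A common way to avoid this class of mistake is to size the allocation from the pointer being assigned rather than from a named type. The sketch below uses two hypothetical structures (`small_rec`, `large_rec`) purely for illustration; they are not the structures involved in the bug.

```c
#include <stdlib.h>

/* Hypothetical record types of different sizes. */
struct small_rec {
	int id;
};

struct large_rec {
	int id;
	char name[64];
};

/*
 * Allocate "count" zeroed small_rec entries. Writing sizeof(*recs)
 * instead of sizeof(struct large_rec) or sizeof(struct small_rec) keeps
 * the element size tied to the pointer's type, so it cannot silently
 * refer to the wrong structure.
 */
struct small_rec *alloc_small_recs(size_t count)
{
	struct small_rec *recs = calloc(count, sizeof(*recs));

	return recs;
}
```

Using `sizeof(struct large_rec)` here would still work, as in the bug described above, but would over-allocate; the reverse mistake would under-allocate and corrupt memory.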
-
- Sep 28, 2016
-
-
Morris Jette authored
Added memory limits to the job and step. Without these, only one step may be able to run at a time, which breaks the test.
-
Morris Jette authored
Add "sbatch_wait_nodes" to SchedulerParameters to control default sbatch behaviour with respect to waiting for all allocated nodes to be ready for use. A job can override the configuration option using the --wait-all-nodes=# option. Bug 3120.
-
- Sep 27, 2016
-
-
Morris Jette authored
Prior logic would parse a command line such as "sbatch --wait-all-nodes -N3 tmp" with "-N3" as the argument to the "--wait-all-nodes" option. See bug 3120.
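One way a misparse like this can arise with getopt_long() is sketched below. It does not reproduce sbatch's option table or the actual fix; `parse_wait_all_nodes` is a hypothetical helper that only shows how the `has_arg` mode of a long option changes what happens to the following "-N3". The behaviour described is that of the GNU C library.

```c
#include <getopt.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Parse argv with one long option, "wait-all-nodes", declared with the
 * given has_arg mode (required_argument or optional_argument), plus a
 * short option -N that takes a value. Returns the argument seen for
 * --wait-all-nodes, or NULL if it had none.
 */
const char *parse_wait_all_nodes(int argc, char **argv, int has_arg)
{
	struct option long_opts[] = {
		{"wait-all-nodes", has_arg, NULL, 'w'},
		{NULL, 0, NULL, 0}
	};
	const char *value = NULL;
	int c;

	optind = 1;	/* restart scanning for each new argument vector */
	while ((c = getopt_long(argc, argv, "N:", long_opts, NULL)) != -1) {
		if (c == 'w')
			value = optarg;
	}
	return value;
}
```

With `required_argument`, the next word is consumed as the option's value even though it starts with a dash, so "-N3" is never seen as a node count. With `optional_argument`, a value is only accepted in the attached form (`--wait-all-nodes=1`), and "-N3" is parsed as its own option.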
-
Morris Jette authored
Add salloc/sbatch/srun option --use-min-nodes to prefer smaller node counts when a range of node counts is specified (e.g. "-N 2-4"). Bug 2996.
-
Tim Wickberg authored
Switch a few SLURM mentions to Slurm as well.
-
- Sep 26, 2016
-
-
Morris Jette authored
-
Morris Jette authored
It was out of alphabetic order before (e.g. after --power).
-
Morris Jette authored
Add salloc/sbatch/srun --priority option of "TOP" to set job priority to the highest possible value. This option is only available to Slurm operators and administrators. bug 3115
-
Morris Jette authored
The problem reported is just a configuration warning and not an error. Also change the test from ">=" to ">". bug 3086
-
Morris Jette authored
-
Morris Jette authored
This patch finally resolves absolute/relative CPU mapping for nodes where the NUMA nodes (or sockets) have different core counts (e.g. KNL SNC4).
-
- Sep 25, 2016
-
-
Morris Jette authored
-