- Sep 21, 2023
-
-
Megan Dahl authored
Without this patch, when a node is removed from an allocation that uses --no-kill, the extern step gets killed on all nodes, which breaks things on the remaining nodes that rely on the extern step, such as x11, job_container/tmpfs, and pam_slurm_adopt. To avoid this, when one or more nodes fail, only kill the extern step on the failed nodes. The extern step will still be killed when the job completes or is killed. If the job did not specify --no-kill and a node fails, then the job will be killed anyway, including the extern step. Bug 17395
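The decision described above can be sketched as a small set computation. This is an illustration only, not Slurm's implementation: the function name `extern_step_kill_targets` and its parameters are hypothetical, and node names are placeholders.

```python
def extern_step_kill_targets(job_nodes, failed_nodes, no_kill, job_ending=False):
    """Return the set of nodes on which the extern step should be killed.

    Hypothetical helper that mirrors the rules stated in the commit message.
    """
    job_nodes = set(job_nodes)
    # Only failures on nodes that belong to the job are relevant.
    failed = set(failed_nodes) & job_nodes

    if job_ending:
        # Job completed or was killed: the extern step goes away everywhere.
        return job_nodes
    if failed and not no_kill:
        # Without --no-kill a node failure kills the whole job,
        # including the extern step on every node.
        return job_nodes
    # With --no-kill, only the failed nodes lose their extern step.
    return failed


nodes = ["n1", "n2", "n3"]
print(sorted(extern_step_kill_targets(nodes, ["n2"], no_kill=True)))
print(sorted(extern_step_kill_targets(nodes, ["n2"], no_kill=False)))
```

With --no-kill the surviving nodes keep their extern step, so features that depend on it keep working there.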
-
Nathan Rini authored
Bug 17438, 17476
-
Nathan Rini authored
Regression from 56a3fd79. Bug 17438
-
Nathan Rini authored
Regression from 56a3fd79. Bug 17438
-
Nathan Rini authored
Bug 17438
-
- Sep 20, 2023
-
-
Nathan Rini authored
Avoid having plugins->function being NULL if plugins->count > 0. Bug 17673
-
Nathan Rini authored
Bug 17673
-
Brian Christiansen authored
-
Ben Glines authored
Bug 15608
-
Ben Glines authored
The error message originally added in 23018040 does not apply to dynamic nodes, so this commit ensures that it does not appear for dynamic nodes. Bug 15603
-
Ben Glines authored
_set_features() and _restore_node_state() will attempt to find records and do work for dynamic nodes, but _preserve_dynamic_nodes() needs to be called before these functions so that the dynamic node records exist and can be found. Bug 15603
-
- Sep 19, 2023
-
-
Benjamin Witham authored
The srun cpu-bind option would return different cpu-binds for "=v" and "=verbose" even though these should be equivalent. The case for the "=v" option is added to the if statement. The CPU_BIND_VERBOSE flag can't be used for this check because the flags are not set until after the _validate_threads_per_core_option function has run. This fixes a regression from commit 40a3bf37. Bug 17571
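A minimal sketch of the idea, assuming the check has to work on the raw option string because the parsed flags are not yet available. The function name `cpu_bind_is_verbose` is hypothetical and this is not the code in srun; it only shows "v" and "verbose" being treated identically.

```python
def cpu_bind_is_verbose(cpu_bind_opt):
    """Return True if the raw cpu-bind string requests verbose output.

    Hypothetical helper: both the short form "v" and the long form
    "verbose" are accepted, so the two spellings behave the same.
    """
    if not cpu_bind_opt:
        return False
    # The option is a comma-separated list of tokens.
    tokens = [tok.strip().lower() for tok in cpu_bind_opt.split(",")]
    return any(tok in ("v", "verbose") for tok in tokens)


for opt in ("v", "verbose", "v,cores", "cores"):
    print(opt, cpu_bind_is_verbose(opt))
```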
-
Albert Gil authored
The port for slurmctld was not accounted for in the original docs. Bug 17619
-
Albert Gil authored
Bug 17661
-
Caden Ellis authored
anon_thp is included in the anon stat, so we don't need to use it when calculating total_rss in cgroups v2. This fixes the formula used in commit fc814a39. Bug 16686
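The double counting can be illustrated with a small parser for cgroup v2 `memory.stat` text. This is a sketch under stated assumptions: the sample values are invented, the helper names are hypothetical, and only the anonymous-memory term is shown, since the log does not spell out the remaining terms of the total_rss formula.

```python
def parse_memory_stat(text):
    """Parse 'key value' lines from a cgroup v2 memory.stat dump."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def anon_contribution(stats, double_count=False):
    """Anonymous-memory term of an RSS estimate (illustration only).

    With double_count=True, anon_thp is added on top of anon, which counts
    the transparent-hugepage bytes twice because anon already includes them.
    """
    total = stats.get("anon", 0)
    if double_count:
        total += stats.get("anon_thp", 0)
    return total


# Invented sample values: 6 MiB anonymous memory, of which 4 MiB is THP-backed.
SAMPLE = """anon 6291456
file 1048576
anon_thp 4194304
"""

stats = parse_memory_stat(SAMPLE)
fixed = anon_contribution(stats)
inflated = anon_contribution(stats, double_count=True)
print(fixed, inflated, inflated - fixed)
```

The difference between the two results equals the anon_thp value, which is the overcount the commit removes.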
-
- Sep 18, 2023
-
-
Tim McMullan authored
Bug 17693
-
Tim McMullan authored
Bug 17693
-
Nathan Prisbrey authored
Bug 17238
-
Nathan Prisbrey authored
There is a known race condition reported in bug 16459 with "slurmd -b" that might lead to false failures. Add a temporary sleep to avoid it until it's fixed. Bug 17238
-
Nathan Prisbrey authored
Slurm code might be too fast for the test to observe the intermediate, transient states POWER_DOWN and COMPLETING, which may lead to false failures. That is, the test might not poll quickly enough to catch those states, so we should test for the final expected states only. Bug 17238
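The approach of asserting only on the final state can be sketched as a polling helper. This is not the testsuite's code: `wait_for_final_state`, `make_fake_source`, and the "IDLE" end state are hypothetical placeholders used to show that a skipped intermediate state does not cause a failure.

```python
import time


def wait_for_final_state(get_state, expected, timeout=5.0, interval=0.1):
    """Poll get_state() until it returns `expected` or the timeout expires.

    Intermediate states are ignored on purpose: they may be too short-lived
    to observe, so only the final state decides the outcome.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_state() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def make_fake_source(states):
    """Hypothetical state source: yields each state once, then repeats the last."""
    it = iter(states)
    last = states[-1]
    return lambda: next(it, last)


# The intermediate state may or may not be seen; both runs succeed.
print(wait_for_final_state(make_fake_source(["COMPLETING", "IDLE"]), "IDLE", interval=0.0))
print(wait_for_final_state(make_fake_source(["IDLE"]), "IDLE", interval=0.0))
```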
-
Nathan Prisbrey authored
The NOT_RESPONDING assert needs to be removed. Bug 17238
-
Nathan Prisbrey authored
Bug 17238
-
Albert Gil authored
Bug 17238 Signed-off-by: Nathan Prisbrey <nathan@schedmd.com>
-
Albert Gil authored
-
Nathan Prisbrey authored
test1.39 was removed in commit 8c17aa69.
-
- Sep 15, 2023
-
-
Marcin Stolarek authored
Bug 17632
-
Tim Wickberg authored
Link to JSSPP'23 paper, and provide an example citation block.
-
Marcin Stolarek authored
This function doesn't support removal of elements from gres_list, so we will need to generate a new gres_list instead of modifying the current one. Revert commit e652133c, which attempted to add the removal logic incorrectly. gres_list wasn't yet merged at that point, so it still contained non-typed elements representing cpus_per_tres and mem_per_tres, which needed to be merged into typed gres (e.g. gpu:v100:x) first. Since those elements didn't contribute to GRES allocation but only affected the allocation of other resources, they didn't contribute to total_gres, which was 0, and they were incorrectly removed. Bug 17184
-
Marcin Stolarek authored
Bug 17184
-
Felip Moll authored
When the GRES specification is updated, we call gres_job_state_validate on the job_desc requested by the update. If some member of the struct wasn't specified, we need to use what we had before, so we need to set the TRES job_desc fields. Reset them after validating. This is preparation for the next commits and implicitly fixes CpusPerTres not being updateable.
Before:
$scontrol update job=55 CpusPerTres=gres:gpu:2
Invalid generic resource (gres) specification for job 55
After:
$scontrol show job | grep CpusPerTres
CpusPerTres=gres:gpu:1
$scontrol update job=55 CpusPerTres=gres:gpu:2
$scontrol show job | grep CpusPerTres
CpusPerTres=gres:gpu:2
Bug 17184
-
- Sep 14, 2023
-
-
Tim McMullan authored
-
Tim McMullan authored
-
Tim McMullan authored
Bug 13256 Signed-off-by: Ben Roberts <ben@schedmd.com>
-
Ben Roberts authored
Bug 13256
-
Ben Roberts authored
Bug 13256
-
Ben Roberts authored
This information was present for many options already, but missing for these. Bug 13256
-
Ben Roberts authored
Make QOS format align with other sections in this page. There should be more detail about the parameter and how to set it in the 'SPECIFICATIONS FOR ...' section and a brief description in the 'LIST/SHOW ...' section. Bug 13256
-
Marcin Stolarek authored
Job submit plugin functions are executed on the slurmctld side. The way the RPC was generated (API, specific tool) doesn't matter. Bug 17411
-
- Sep 12, 2023
-
-
Tim Wickberg authored
Bug 17669.
-
Tim Wickberg authored
-