- May 08, 2021
-
-
Danny Auble authored
-
Nate Rini authored
Bug 11416
-
Danny Auble authored
Bug 11566
-
- May 07, 2021
-
-
Albert Gil authored
If a child slurmstepd gets a timeout when sending a REQUEST_STEP_COMPLETE to its parent, it will retry it (_one_step_complete_msg()). Those messages are handled in _handle_completion(), which calls jobacctinfo_aggregate(). This commit detects whether the message was already processed to avoid errors in the aggregated jobacctinfo. The same thing can happen on the slurmctld side as well. Bug 10723
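The duplicate detection described above can be sketched as follows. This is a minimal illustration, not the actual slurmstepd code: `struct step_state`, `handle_completion()`, `MAX_RANKS`, and the `total_cpu_sec` counter are hypothetical stand-ins for the real step state and for jobacctinfo_aggregate().

```c
#include <stdbool.h>

#define MAX_RANKS 64 /* hypothetical upper bound on child ranks */

/* Hypothetical per-step state; not Slurm's actual structures. */
struct step_state {
    bool done[MAX_RANKS]; /* ranks whose completion was already processed */
    long total_cpu_sec;   /* stand-in for the aggregated accounting data */
};

/*
 * Process a completion message covering ranks [first, last].
 * Returns true if the accounting was aggregated, false if the message
 * was a retry of one already processed (or had an invalid range).
 */
bool handle_completion(struct step_state *st, int first, int last,
                       long cpu_sec)
{
    if (first < 0 || last >= MAX_RANKS || first > last)
        return false;

    /* A retried message names ranks that are already marked done. */
    for (int r = first; r <= last; r++)
        if (st->done[r])
            return false;

    for (int r = first; r <= last; r++)
        st->done[r] = true;

    /* Aggregate only once per rank range. */
    st->total_cpu_sec += cpu_sec;
    return true;
}
```

With this shape, a retried message is still acknowledged by the caller, but its accounting data is not added a second time.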
-
Nate Rini authored
slurmrestd accepts multiple host-port arguments. Bug 11530
-
- May 06, 2021
-
-
Scott Hilton authored
Bug 11513
-
Tim McMullan authored
Bug 11502
-
Oriol Vilarrubi authored
Bug 11517
-
- May 05, 2021
-
-
Danny Auble authored
-
Tim McMullan authored
Bug 11375
-
Tim McMullan authored
Bug 11375
-
- May 01, 2021
-
-
Albert Gil authored
-
Albert Gil authored
Use test_name to avoid test_id. Use get_config_parameter to simplify checking requirements. Use submit_job to simplify code avoiding unnecessary spawn/expect. Use wait_for and new auxiliary function to avoid sleeps. Use subtest/subpass to avoid exit_code. Use tolerance to simplify subtests. Bug 10315 Signed-off-by: Scott Jackson <scottmo@schedmd.com>
-
Albert Gil authored
Bug 10315 Signed-off-by: Scott Jackson <scottmo@schedmd.com>
-
Scott Jackson authored
By creating the file to read as a setup step instead of in one of the tasks. Bug 10315
-
Albert Gil authored
-
Scott Jackson authored
Bug 10604
-
- Apr 30, 2021
-
-
John Thiltges authored
print_job_select() included an unnecessary argument in a printf() call, causing a 'Redundant argument in printf' error to be shown. Bug 11491.
-
- Apr 29, 2021
-
-
Ben Roberts authored
For salloc, sbatch and srun Bug 11461 Signed-off-by: Tim Wickberg <tim@schedmd.com>
-
Ben Roberts authored
Bug 11430 Signed-off-by: Tim Wickberg <tim@schedmd.com>
-
Ben Roberts authored
Bug 11206 Signed-off-by: Tim Wickberg <tim@schedmd.com>
-
Scott Jackson authored
-
Danny Auble authored
-
Marcin Stolarek authored
This looks like a typo in initial commit b71efa62: the lock should be released before returning, not locked again. Bug 11480
-
Marshall Garey authored
In a multi-node job it was possible to be in a situation where there were more CPUs available for steps to use but steps would not launch. For example, if a node has 2 cores and 1 thread per core and this job is submitted:

    sbatch -N2 --ntasks-per-node=2 --mem=1000 job.bash

And job.bash contains the following:

    for i in {1..4}
    do
        srun --exact --mem=100 -N1 -c1 -n1 sleep 60 &
    done
    wait

In this case, two steps would run on the first node and one step would run on the second node, but the fourth step would not run until the first step completed, even though there is an available task and CPU on the second node in the allocation.

Why does this happen? If the step requests CPUs <= number of nodes, then when _pick_step_nodes() calls _pick_step_nodes_cpus():

    node_tmp = _pick_step_nodes_cpus(job_ptr, nodes_avail, nodes_needed, cpus_needed, usable_cpu_cnt);

it will simply return the first N nodes from the nodes_avail bitmap, where N is the number of nodes that the step requested. In this example job, all the CPUs on the first node are allocated, but the first node remains in the nodes_avail bitmap. Then _pick_step_nodes_cpus() selects the first node and adds it to the nodes_picked bitmap. Right after that, _pick_step_nodes() gets the number of CPUs from nodes in the nodes_picked bitmap, which is 0 CPUs.

The fix is to remove fully allocated nodes from the nodes_avail bitmap. But this also creates a problem: once all the nodes are fully allocated and another valid step request comes, an incorrect error message of ESLURM_REQUESTED_NODE_CONFIG_UNAVAILABLE would be returned, when the correct error message is ESLURM_NODES_BUSY. So we increment job_blocked_nodes if there are no available CPUs on the node.

Bug 11357
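The fix described above can be illustrated with a simplified sketch. It is not the real _pick_step_nodes() code: the 32-bit mask standing in for the nodes_avail bitmap, the `free_cpus` array, the `pick_step_nodes()` function, and the `PICK_*` result codes are all hypothetical, chosen only to show why fully allocated nodes are dropped and how blocked nodes select the "busy" error.

```c
#include <stdint.h>

/* Hypothetical result codes mirroring the two errors named above. */
enum pick_result {
    PICK_OK,
    PICK_NODES_BUSY,         /* analogous to ESLURM_NODES_BUSY */
    PICK_CONFIG_UNAVAILABLE  /* analogous to
                                ESLURM_REQUESTED_NODE_CONFIG_UNAVAILABLE */
};

/*
 * Pick the first nodes_needed nodes that still have free CPUs.
 * nodes_avail is a bitmask over at most 32 nodes (assumption of this
 * sketch); free_cpus[i] is the number of unallocated CPUs on node i.
 */
enum pick_result pick_step_nodes(uint32_t nodes_avail, const int *free_cpus,
                                 int node_cnt, int nodes_needed,
                                 uint32_t *picked)
{
    int blocked_nodes = 0;
    int usable_nodes = 0;

    *picked = 0;
    for (int i = 0; i < node_cnt; i++) {
        uint32_t bit = UINT32_C(1) << i;
        if (!(nodes_avail & bit))
            continue;
        if (free_cpus[i] <= 0) {
            /* Fully allocated: drop it, but remember it is only busy. */
            nodes_avail &= ~bit;
            blocked_nodes++;
            continue;
        }
        usable_nodes++;
    }

    if (usable_nodes < nodes_needed)
        return blocked_nodes ? PICK_NODES_BUSY : PICK_CONFIG_UNAVAILABLE;

    /* Take the first nodes_needed nodes remaining in the mask. */
    for (int i = 0, n = 0; i < node_cnt && n < nodes_needed; i++) {
        uint32_t bit = UINT32_C(1) << i;
        if (nodes_avail & bit) {
            *picked |= bit;
            n++;
        }
    }
    return PICK_OK;
}
```

In the two-node example, once the first node has no free CPUs the sketch picks the second node for a one-node step, and it reports the "busy" result rather than the "configuration unavailable" result when every node is fully allocated.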
-
Carlos Tripiana Montes authored
Continuation of 8475ae9d CID 221511 Bug 11401
-
Albert Gil authored
-
Scott Jackson authored
Fix job submission count message Bug 10439
-
Scott Jackson authored
Bug 10439
-
- Apr 28, 2021
-
-
Marcin Stolarek authored
Bug 11059
-
Ben Roberts authored
Add note in cgroup.conf to match note in slurm.conf Bug 11209 Signed-off-by: Tim Wickberg <tim@schedmd.com>
-
Carlos Tripiana Montes authored
Jobs requesting resources that could fit in 1 leaf switch are incorrectly spread across switches. Fixing this code also makes "--switches" work again. select/cons_res already works according to documentation. Bug 11401
-
Michael Hinton authored
Bug 10944
-
Ben Roberts authored
Fix extra newlines that were causing links to be created where they weren't needed. Remove an extra .TP that prevented a link from being added. Correctly close a bold tag that prevented a link. Bug 10944
-
Ben Roberts authored
Extra .LP tags and other white space created larger than normal gaps between certain paragraphs. Bug 10944
-
Tim Wickberg authored
-
Ben Roberts authored
Fix items with descriptions that started on the same lines. Address cases where some descriptions were indented more than the rest of the descriptions in the list. Bug 10944
-
Ben Roberts authored
Fixed for salloc, sbatch and srun Bug 10944
-
Ben Roberts authored
Bug 10944
-
- Apr 27, 2021
-
-
Tim McMullan authored
Send a 0 as the file length over the pipe to the parent to break out of the safe_read(), rather than hanging indefinitely since the child has already exited. Bug 11460
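The zero-length sentinel described above can be sketched with a length-prefixed pipe protocol. This is an assumption-laden illustration rather than the Slurm implementation: `send_payload()`, `recv_payload()`, and the `read_full()`/`write_full()` helpers are hypothetical names, and the real code uses its own safe_read() macro.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Read exactly len bytes; return 0 on success, -1 on EOF or error. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Write exactly len bytes; return 0 on success, -1 on error. */
static int write_full(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Child side: send a length header followed by the data. On failure
 * the child calls this with len == 0, so the parent's blocking read
 * of the header always completes.
 */
int send_payload(int fd, const void *data, uint32_t len)
{
    if (write_full(fd, &len, sizeof(len)) < 0)
        return -1;
    return len ? write_full(fd, data, len) : 0;
}

/*
 * Parent side: read the header, then the data. A length of 0 means
 * "nothing is coming"; return immediately instead of waiting.
 */
int recv_payload(int fd, void *buf, uint32_t cap, uint32_t *len_out)
{
    uint32_t len;

    if (read_full(fd, &len, sizeof(len)) < 0)
        return -1;
    *len_out = len;
    if (len == 0)
        return 0;
    if (len > cap)
        return -1;
    return read_full(fd, buf, len);
}
```

The point of the design is that the reader never has to infer failure from silence: the writer always sends a header, and a header of 0 tells the reader to stop waiting.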
-