- Apr 21, 2020
-
-
Nate Rini authored
Avoid blindly overwriting the job wait reason when a job is unable to run due to partition configuration. As jobs finish, they can cause pending jobs to randomly flip their reason to Priority and back to the real reason they are waiting. Bug 8347.
-
Marcin Stolarek authored
The command-line tools store this option as a signed int, which is closer to int16_t than uint32_t. The value of INFINITE is effectively -1, which sbatch/salloc use as "no val", effectively preventing job_desc.ntasks_per_core from being set as a result of --hint=multithread. On the slurmctld side, ntasks_per_core == INFINITE16 (see _set_multi_core_data()) is used as "no val". In the case of CR_ONE_TASK_PER_CORE, using --multithread did not override this to allow more than one task per core. Bug 8147
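The sentinel mismatch described above can be illustrated with a small sketch. It is not the Slurm code: the macro values are assumed to match Slurm's INFINITE and INFINITE16, while CLI_NO_VAL, cli_option_is_set() and copy_to_request() are hypothetical names introduced only for illustration. The conversion of INFINITE to int is assumed to yield -1, as it does on common two's-complement platforms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INFINITE   0xffffffffu /* assumed 32-bit "infinite" sentinel */
#define INFINITE16 0xffffu     /* assumed 16-bit "infinite" sentinel */
#define CLI_NO_VAL (-1)        /* hypothetical CLI marker for "option not given" */

/* Hypothetical: the CLI keeps the option in a plain signed int. */
bool cli_option_is_set(int opt)
{
    return opt != CLI_NO_VAL;
}

/*
 * Hypothetical: copy the option into a 16-bit request field only when the
 * CLI considers it set. Returns true if the field was written.
 */
bool copy_to_request(int opt, uint16_t *field)
{
    assert(field != NULL);
    if (!cli_option_is_set(opt))
        return false;        /* (int) INFINITE lands here: it looks like -1 */
    *field = (uint16_t) opt; /* INFINITE16 survives the copy unchanged */
    return true;
}
```

In this sketch, storing INFINITE in the signed int makes the value indistinguishable from "not given", so it is never forwarded, whereas INFINITE16 stays distinct and reaches the 16-bit field intact.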
-
Jason Booth authored
Errors out, pointing to possible database corruption. Bug 7583
-
Marcin Stolarek authored
The test runs sacctmgr show for various entities and checks if all headers are accepted in format= specification. Bug 7201
-
Marcin Stolarek authored
Both MaxSubmitPA and MaxSubmitPU are displayed in the sacctmgr show qos header, so we should handle them correctly when they are specified in the format= option. Bug 7201
-
- Apr 20, 2020
-
-
Nate Rini authored
Bug 8834.
-
Nate Rini authored
No functional change. Bug 8834.
-
Scott Jackson authored
Adjust regex in response to changes to the GRES format in scontrol show job, added in commit 51a95b70. Bug 8799
-
Albert Gil authored
-
Albert Gil authored
Commit 6eef5739 broke tests 1.88 and 38.7 on systems without strlcpy. We don't want to add any extra Slurm dependency for MPI jobs in tests 1.88 and 38.7, and we want them to run on systems without strlcpy too. This commit restores the original code of those tests. The changes to test7.17 are kept, as it already depends on Slurm and is a candidate to become a unit test. Bug 7265 Signed-off-by:
Nate Rini <nate@schedmd.com>
-
- Apr 18, 2020
-
-
Dominik Bartkiewicz authored
Bug 8563
-
Marshall Garey authored
This is a partial revert of commit 74184e4d. From that commit message: "At the start of a scheduling cycle, the job's "reason" field can be cleared. If the scheduler fails to reach that job and set its value to a new reason, the original reason was lost and the state reports would report NoReason." This isn't true anymore so we revert that change (packing the previous state_reason) and just pack the job's state_reason. Bug 8848
-
- Apr 17, 2020
-
-
Danny Auble authored
-
Danny Auble authored
# Conflicts: # src/plugins/select/cons_tres/dist_tasks.c
-
Marcin Stolarek authored
Revert the changes from 6dcb598c and break out of the appropriate loop. Bug 8818. Co-authored-by:
Danny Auble <da@schedmd.com>
-
Dominik Bartkiewicz authored
This was missed in commit 60c67e82. Bug 8859
-
Danny Auble authored
-
Marshall Garey authored
(1) In _pick_step_nodes(), we get a bitmap (nodes_idle) of "idle" nodes in the job, in other words nodes that don't have any steps running on them.
(2) A little farther down, we check whether nodes_idle has any nodes, and try to allocate resources from the idle nodes for the step.
(3) Then we check whether there are any "available" nodes in the job (nodes that have some resources free but aren't completely idle), and try to allocate resources from them. We fill up a node before allocating from the next one.
The problem: when we check if nodes are idle[1], we check which nodes are being used by all running steps, *including* the extern and batch steps. This means that with PrologFlags=contain (which enables the extern step), there will never be any "idle" nodes for a job, because the extern step runs on all the nodes. We skip section (2) and go straight to section (3). Steps still run, but they are placed differently than they would be without the extern or batch steps.
The buggy behavior: we pack all the steps onto the first node, then we pack all the steps onto the second node, and so on.
The fixed behavior is the same as without a batch step or extern steps: we run new steps on idle nodes first, then we pack subsequent steps onto each node until it is filled (only moving on to the next node once the current node is filled or doesn't have enough resources for the step request). Bug 8020.
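The two-pass placement described above can be sketched roughly as follows. This is a simplified model, not the code in _pick_step_nodes(): struct node_state, pick_step_node() and the count_extern flag are hypothetical, and CPUs stand in for all resources. Passing count_extern as true reproduces the buggy idle check; false models the fixed one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node bookkeeping for one job. */
struct node_state {
    int cpus_total;
    int cpus_used;        /* CPUs held by regular steps */
    int regular_steps;    /* steps other than extern/batch */
    bool has_extern_step; /* true on every node with PrologFlags=contain */
};

static int free_cpus(const struct node_state *n)
{
    return n->cpus_total - n->cpus_used;
}

/* count_extern == true models the bug: an extern step makes a node "busy". */
static bool node_is_idle(const struct node_state *n, bool count_extern)
{
    if (count_extern && n->has_extern_step)
        return false;
    return n->regular_steps == 0;
}

static void place_step(struct node_state *n, int cpus)
{
    n->cpus_used += cpus;
    n->regular_steps++;
}

/* Returns the index of the node chosen for the step, or -1 if none fits. */
int pick_step_node(struct node_state *nodes, size_t count, int cpus,
                   bool count_extern)
{
    assert(nodes != NULL && cpus > 0);

    /* Pass 1: prefer a completely idle node. */
    for (size_t i = 0; i < count; i++) {
        if (node_is_idle(&nodes[i], count_extern) &&
            free_cpus(&nodes[i]) >= cpus) {
            place_step(&nodes[i], cpus);
            return (int) i;
        }
    }

    /* Pass 2: otherwise take the first node with enough free resources. */
    for (size_t i = 0; i < count; i++) {
        if (free_cpus(&nodes[i]) >= cpus) {
            place_step(&nodes[i], cpus);
            return (int) i;
        }
    }
    return -1;
}
```

With two 4-CPU nodes that both carry an extern step, three 1-CPU steps land on nodes 0, 1, 0 when count_extern is false, and on nodes 0, 0, 0 when it is true.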
-
Dominik Bartkiewicz authored
Bug 8584 Co-authored-by:
Brian Christiansen <brian@schedmd.com>
-
Tim Wickberg authored
-
- Apr 16, 2020
-
-
Danny Auble authored
# Conflicts: # src/plugins/select/cons_tres/select_cons_tres.c
-
Danny Auble authored
Bug 8370
-
Danny Auble authored
# Conflicts: # src/plugins/select/cons_tres/job_test.c
-
Felip Moll authored
Continuation of commit b00ce193, fixes declaration of opt variable in scancel. Bug 8370
-
Felip Moll authored
GCC 10 defaults to -fno-common, which means we can no longer define global variables in .h files that are included by more than one source file. This patch fixes the build for the current version by removing an unnecessary block in the cons_tres code. Bug 8370
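As a generic illustration of the pattern that builds under -fno-common (not the actual Slurm change), a header carries only an extern declaration and exactly one source file provides the definition. The names shared_counter and bump_shared_counter() are hypothetical, and both parts are shown in a single file here for brevity.

```c
#include <assert.h>

/* What would live in a header (hypothetical shared.h): declaration only. */
extern int shared_counter;

/* What would live in exactly one .c file: the single definition. */
int shared_counter = 0;

/* Any source file that includes the header can then use the variable. */
int bump_shared_counter(int step)
{
    assert(step > 0);
    shared_counter += step;
    return shared_counter;
}
```

A bare `int shared_counter;` in the header would instead be a tentative definition in every file that includes it, which -fno-common turns into a multiple-definition error at link time.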
-
Albert Gil authored
-
Albert Gil authored
Previously, if a job ended with, for example, OUT_OF_MEMORY and we called wait_for_job DONE, the function would keep polling until max_job_state_delay. Now all DONE states are checked, in order. NOT_FOUND is not a real state; it is created manually in the function. Bug 8837 Signed-off-by:
Michael Hinton <hinton@schedmd.com>
-
Albert Gil authored
Test 1.35 was getting OOM in the steps. The patch also specifies the memory unit to avoid problems if the user sets "default_gbytes". Bug 8837 Signed-off-by:
Michael Hinton <hinton@schedmd.com>
-
Albert Gil authored
-
Scott Jackson authored
We need to ensure that the memset is not optimized away, by removing -O at compile time, touching some of the memory, and marking it volatile. Bug 8755
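One common way to keep a memory fill from being elided by the optimizer is to write and read back through a volatile-qualified pointer. The following is a minimal sketch of that idea, not the code from this commit; touch_memory() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill len bytes of buf with value through a volatile pointer, so the
 * compiler must perform every store, then read one byte back.
 * Returns the first byte as read back, or 0 when len is 0.
 */
unsigned char touch_memory(void *buf, size_t len, unsigned char value)
{
    assert(buf != NULL || len == 0);

    volatile unsigned char *p = buf;
    for (size_t i = 0; i < len; i++)
        p[i] = value; /* volatile store: cannot be optimized out */

    return len ? p[0] : 0; /* volatile load: forces an actual read */
}
```

Because accesses through a volatile lvalue count as observable behavior, the stores stay in place even at higher optimization levels, whereas a plain memset on a buffer that is never read again may be dropped.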
-
- Apr 15, 2020
-
-
Brian Christiansen authored
To be reverted in 20.11, when network is no longer overloaded with a new option. Continuation of 6ee50489. Bug 7039 Signed-off-by:
Danny Auble <da@schedmd.com>
-
Brian Christiansen authored
Continuation of 6ee50489 Bug 7039 Signed-off-by:
Danny Auble <da@schedmd.com>
-
Colby Ashley authored
Bug 8734
-
Albert Gil authored
-
Nate Rini authored
Bug 7265
-
Felip Moll authored
Document the existence of the XCC plugin in the acct_gather man page and HTML, and in slurm.conf. Bug 8484
-
- Apr 14, 2020
-
-
Dominik Bartkiewicz authored
e.g. if job_hash[JOB_HASH_INX()] contains jobs but not the job we are looking to remove. Bug 8861
-
Albert Gil authored
The documentation of the preemption parameters tries to be as close as possible to their documentation in slurm.conf. Bug 8710
-
Albert Gil authored
There is no GraceTime related to an assoc; it always showed 0 because the default print_routine was used with a NULL value. Bug 8710
-
Albert Gil authored
Improves and fixes the documentation about Preemption and Gang, especially for preempt/qos and Suspend, in both the slurm.conf and sacctmgr man pages. Bug 8710
-