- Apr 26, 2021
-
-
Ben Roberts authored
Bug 11318
-
- Apr 23, 2021
-
-
Brian Christiansen authored
This reverts commit 69d5d94b. submit_dir != working dir. In 7092, the customer was seeing the interactive srun set SLURM_SUBMIT_DIR to the cwd after -D had been applied, which is wrong behavior. This will be fixed in 21.08. Bug 11407
-
Ben Roberts authored
Bug 11447
-
Marcin Stolarek authored
Bug 11396
-
- Apr 21, 2021
-
-
Ben Roberts authored
Bug 11363 Signed-off-by: Tim Wickberg <tim@schedmd.com>
-
Danny Auble authored
Bug 11093
-
Carlos Tripiana Montes authored
A child process can fail at some point and return a nonzero exit code, but the parent did not check this properly and returned without error. Now, if the child fails, the parent fails too. This propagates the error back, so we know the job will have a problem. Bug 11093
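The parent-side check described above can be sketched as follows. This is a minimal illustration, not the Slurm code: run_in_child, child_ok and child_fail are hypothetical names, and the sketch assumes the child reports its result only through its exit code.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the Slurm tree): run child_fn in a forked
 * child and return 0 only if the child exited normally with exit code 0.
 */
static int run_in_child(int (*child_fn)(void))
{
	int status = 0;
	pid_t pid = fork();

	if (pid < 0)
		return -1;		/* fork itself failed */

	if (pid == 0)
		_exit(child_fn());	/* child: report result via exit code */

	/* parent: wait for the child, retrying if interrupted by a signal */
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}

	/* abnormal termination or a nonzero exit code both count as failure */
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;

	return 0;
}

/* Hypothetical child bodies used to exercise both paths */
static int child_ok(void) { return 0; }
static int child_fail(void) { return 3; }
```

With these definitions, run_in_child(child_ok) returns 0 and run_in_child(child_fail) returns -1, so a caller can no longer mistake a failed child for success.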
-
Carlos Tripiana Montes authored
container_p_restore now gets the list of running jobs from the spool dir with stepd_available. It then iterates over the basepath entries and, for those that seem to have been a mount point (they have a .ns file), tries to mount them again. If that succeeds (it must) and the job for this mount point is dead, it releases resources and tries to delete the files. Remember that the removal can fail if a resource is leaked. These would be fixed if slurmd starts after a HW reboot (no kernel leaks). Bug 11093
-
Carlos Tripiana Montes authored
Then, make the _rm_data call in _create_ns keep the old behavior: there, data is only removed if something fails during creation, and no previous NS leak is possible, so force removal or fail. _delete_ns, however, could be called at the job's end after slurmd was killed and restarted, thus having leaked the NS. Even though slurmd can recreate the NS and mount it in the same place, the .ns file can't be removed at the end of the job because of EBUSY. Bug 11093
-
Carlos Tripiana Montes authored
In preparation for subsequent changes. No functional change. Bug 11093
-
Carlos Tripiana Montes authored
This flag controls whether we allow NS creation over a previously used mount point. Right now there is no change in behavior, since the only call to _create_ns sets remount to false and thus falls back to the old behavior. In preparation for subsequent changes. Bug 11093
-
Carlos Tripiana Montes authored
In preparation for subsequent changes. No functional change. Bug 11093
-
Carlos Tripiana Montes authored
We shouldn't umount the NSs until the step FD is closed. In preparation for the next commit as well. Bug 11093
-
Carlos Tripiana Montes authored
Now, if umount2 for basepath fails, it still frees everything up. In preparation for subsequent changes as well. Bug 11093
-
Danny Auble authored
Bug 11257
-
Marcin Stolarek authored
assoc_mgr_user_list should never be NULL here if we're enforcing associations. However, the issue of going into _get_assoc_mgr_user_list will only happen if locks were already acquired. This part will be revisited in master for code simplification. Bug 11257
-
Marcin Stolarek authored
After 6db0aca5 we acquire assoc_mgr_lock for QoS outside of the loop, while in assoc_mgr_get_admin_level we need to acquire assoc_mgr_lock for user. This is technically OK, but we want to avoid nested use of assoc_mgr_lock, which is checked by the _store_locks assertion. validate_operator is often called in loops, while assoc_mgr_get_admin_level is a heavy operation. This commit changes the functions to lighter versions getting slurmdb_user_rec_t instead of uid. Bug 11257
-
Marcin Stolarek authored
Add versions of part_is_visible and validate_operator that take a slurmdb_user_rec_t* instead of a uid. When executing in a loop, it is better for performance to get the slurmdb_user_rec once and pass it to functions that only perform logical checks. Bug 11257
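The pattern of looking the user up once and passing the record into cheap checks can be sketched as follows. Everything here is hypothetical and only mirrors the idea: user_rec_t, lookup_user, is_operator_uid, is_operator_user and count_visible are illustration names, not the slurmdb_user_rec_t type or the Slurm functions, and the lookup table stands in for the locked association manager lookup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

typedef enum { ADMIN_NONE, ADMIN_OPERATOR, ADMIN_SUPER } admin_level_t;

/* Hypothetical stand-in for a user record */
typedef struct {
	uid_t uid;
	admin_level_t admin_level;
} user_rec_t;

static const user_rec_t user_table[] = {
	{ 1000, ADMIN_NONE },
	{ 1001, ADMIN_OPERATOR },
};

static int lookup_count;	/* counts how often the "heavy" lookup runs */

/* Stand-in for the expensive, lock-protected lookup by uid */
static const user_rec_t *lookup_user(uid_t uid)
{
	lookup_count++;
	for (size_t i = 0; i < sizeof(user_table) / sizeof(user_table[0]); i++) {
		if (user_table[i].uid == uid)
			return &user_table[i];
	}
	return NULL;
}

/* uid-based check: performs a lookup on every call */
static bool is_operator_uid(uid_t uid)
{
	const user_rec_t *user = lookup_user(uid);
	return user && (user->admin_level >= ADMIN_OPERATOR);
}

/* record-based check: pure logic, no lookup and no locking */
static bool is_operator_user(const user_rec_t *user)
{
	return user && (user->admin_level >= ADMIN_OPERATOR);
}

/* Loop that looks the user up once, then reuses the record per item */
static int count_visible(uid_t uid, int n_parts)
{
	const user_rec_t *user = lookup_user(uid);
	int visible = 0;

	for (int i = 0; i < n_parts; i++) {
		if (is_operator_user(user))
			visible++;
	}
	return visible;
}
```

In this sketch count_visible performs one lookup regardless of n_parts, whereas calling is_operator_uid inside the loop would perform one lookup per iteration.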
-
Albert Gil authored
-
Danny Auble authored
This problem has been around a while, but the compiler (gcc 10.2.0) just now noticed it. Bug 10407
-
- Apr 19, 2021
-
-
Marcin Stolarek authored
Bug 11334.
-
Albert Gil authored
Bug 11245
-
- Apr 17, 2021
-
-
Scott Hilton authored
Bug 11301
-
Marcin Stolarek authored
Fix regression from a2678ab5. The original intent of a2678ab5 was to set the port in the node structure from SlurmdPort when nothing was given in the NodeName line. Unfortunately, it changed the behavior for multiple nodes defined in one line with only a single Port value. For example, NodeName=test[01-02] Port=3022 will set test01 on 3022, but test02 on SlurmdPort. Bug 11384
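The intended port selection for a NodeName line that expands to several nodes can be sketched as follows. This is an illustration under assumptions, not the Slurm parser: node_port is a hypothetical helper, and falling back to the default when the index is beyond the port list is a choice made only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose the port for the node at position idx of a
 * NodeName line that expanded to several nodes.
 *  - no Port value given: use default_port (the SlurmdPort)
 *  - a single Port value: shared by every node on the line
 *  - several Port values: one per node, by position
 */
static uint16_t node_port(const uint16_t *ports, size_t port_cnt,
			  size_t idx, uint16_t default_port)
{
	assert(ports || (port_cnt == 0));

	if (port_cnt == 0)
		return default_port;
	if (port_cnt == 1)
		return ports[0];	/* the case the regression broke */
	if (idx < port_cnt)
		return ports[idx];

	/* Assumption for this sketch only: fall back to the default */
	return default_port;
}
```

For NodeName=test[01-02] Port=3022, both index 0 (test01) and index 1 (test02) resolve to 3022 in this sketch.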
-
- Apr 16, 2021
-
-
Kevin Buckley authored
It's short for '--error'. Bug 11393.
-
Ben Roberts authored
Also update quickstart to list all client commands. Bug 10835
-
Albert Gil authored
-
Albert Gil authored
Bug 10439 Signed-off-by: Scott Jackson <scottmo@schedmd.com>
-
Albert Gil authored
Also remove unnecessary user_name variable. Bug 10439 Signed-off-by: Scott Jackson <scottmo@schedmd.com>
-
Scott Jackson authored
This avoids intermittent race condition failures and significantly reduces the time to run the test. Bug 10439
-
Scott Jackson authored
Created wait_for_command that repeats a command until successful or until a specified condition is met. wait_for_command_match is now implemented in terms of the new wait_for_command. Tests using the former wait_for_command were refactored as necessary. Bug 10439
-
- Apr 15, 2021
-
-
Marcin Stolarek authored
Continued from 22b995c0 Bug 9405
-
Marshall Garey authored
Bug 11341
-
Marcin Stolarek authored
Instead of using validate_slurm_user() in auth/jwt, implement the check directly. The function is an external symbol in slurmctld, and calling it from slurmdbd segfaults due to the missing symbol. Bug 11350
-
Ben Roberts authored
Fix typo for 'priority'. ConstrainRAMSpace can't be used to enforce swappiness. Improve wording in ConstrainSwapSpace. Bug 11307
-
- Apr 14, 2021
-
-
Scott Hilton authored
Bug 9363
-
Danny Auble authored
Coverity CID 220675
-
Ben Roberts authored
Bug 11262 Signed-off-by: Tim Wickberg <tim@schedmd.com>
-
Ben Roberts authored
Continuation of commit ea667809. Bug 11262 Signed-off-by: Tim Wickberg <tim@schedmd.com>
-
- Apr 13, 2021
-
-
Skyler Malinowski authored
UnkillableStepProgram exposes environment variables to the script. Those variables are now documented. Bug 11231
-