- Jun 01, 2017
-
-
Danny Auble authored
# Conflicts: # src/slurmctld/job_mgr.c
-
Pablo Escobar authored
bug 3846
-
Danny Auble authored
-
Danny Auble authored
purge_files_list.
-
Danny Auble authored
-
Tim Wickberg authored
File deletion can be slow, especially when StateSaveLocation is on NFS or other network filesystems. Since purge_old_job() holds all the slurmctld write locks, this is especially performance sensitive. Moving this to an independent thread lets the slower filesystem cleanup happen without owning these locks. purge_old_job() then results in the purged job ids being queued in the purge_list. A race with the job id potentially wrapping around again is already prevented by _dup_job_file_test() in get_next_job_id(). Bug 3763.
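The hand-off described above is a producer/consumer queue. What follows is a minimal sketch of that pattern, not the slurmctld code: `purge_queue_t`, `purge_enqueue()`, `purge_worker()`, `purge_queue_shutdown()` and the `delete_files` callback are hypothetical names chosen for illustration, and the real purge_list handling may differ.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch only; names and layout are not taken from slurmctld. */
typedef struct purge_node {
    uint32_t job_id;
    struct purge_node *next;
} purge_node_t;

typedef struct {
    pthread_mutex_t lock;                  /* protects only this queue */
    pthread_cond_t cond;
    purge_node_t *head, *tail;
    bool shutdown;
    unsigned long purged;                  /* job ids processed so far */
    void (*delete_files)(uint32_t job_id); /* slow filesystem work, may be NULL */
} purge_queue_t;

void purge_queue_init(purge_queue_t *q, void (*delete_files)(uint32_t))
{
    assert(q != NULL);
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cond, NULL);
    q->head = q->tail = NULL;
    q->shutdown = false;
    q->purged = 0;
    q->delete_files = delete_files;
}

/* Producer side: cheap enough to call while the caller holds its big locks. */
int purge_enqueue(purge_queue_t *q, uint32_t job_id)
{
    purge_node_t *n = malloc(sizeof(*n));
    if (!n)
        return -1;
    n->job_id = job_id;
    n->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Consumer thread: the slow deletion runs with the queue lock released. */
void *purge_worker(void *arg)
{
    purge_queue_t *q = arg;
    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (!q->head && !q->shutdown)
            pthread_cond_wait(&q->cond, &q->lock);
        if (!q->head)
            break;                         /* shutdown requested and queue drained */
        purge_node_t *n = q->head;
        q->head = n->next;
        if (!q->head)
            q->tail = NULL;
        pthread_mutex_unlock(&q->lock);
        if (q->delete_files)
            q->delete_files(n->job_id);
        free(n);
        pthread_mutex_lock(&q->lock);
        q->purged++;
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Ask the worker to exit once everything already queued has been processed. */
void purge_queue_shutdown(purge_queue_t *q)
{
    pthread_mutex_lock(&q->lock);
    q->shutdown = true;
    pthread_cond_broadcast(&q->cond);
    pthread_mutex_unlock(&q->lock);
}
```

The point of the design is that the producer only touches a small queue mutex, so the caller's own locks are never held across a filesystem call. The sketch does not cover the job id wrap-around race that the commit message says is handled elsewhere.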
-
Tim Wickberg authored
Only called from _list_delete_job once the MinJobAge has passed.
-
Tim Wickberg authored
This will need to be handled differently. The timeout can lead to the purge process falling further and further behind on high throughput systems if the number of job scripts that can be deleted within a second is lower than the job submission and completion rate of the cluster, eventually leading to the MaxJobCount limit being reached. Bug 3763.
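The backlog argument above is simple arithmetic, sketched here under loud assumptions: constant per-second rates, and a limit that counts only records waiting to be purged (a real MaxJobCount also covers active jobs). `seconds_until_limit()` and its parameters are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/*
 * Hypothetical model: each second, completed_per_sec job records become
 * purgeable and at most purged_per_sec of them are actually removed.
 * Returns the number of seconds until the pending backlog reaches
 * max_job_count, or -1 if the purger keeps up and the backlog never grows.
 */
int64_t seconds_until_limit(uint32_t completed_per_sec,
                            uint32_t purged_per_sec,
                            uint32_t max_job_count)
{
    if (completed_per_sec <= purged_per_sec)
        return -1;

    uint64_t backlog = 0;
    int64_t seconds = 0;
    while (backlog < max_job_count) {
        backlog += completed_per_sec - purged_per_sec;
        seconds++;
    }
    return seconds;
}
```

With 50 completions per second, 40 purges per second and a limit of 10000, the model reaches the limit after 1000 seconds; with the two rates swapped it never does.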
-
Danny Auble authored
-
Danny Auble authored
-
- May 31, 2017
-
-
Danny Auble authored
it works better on multi-slurmd installs.
-
Isaac Hartung authored
Should be fed1,2,3 and not fed1,2,2
-
Isaac Hartung authored
Bug 3839
-
Tim Wickberg authored
Revert some of my b50f4661. Elaborate on tradeoffs, and point to the HTC page as well, which is a better location for this info.
-
Danny Auble authored
-
Brian Christiansen authored
-
Tim Wickberg authored
This is better discussed in the high_throughput.shtml doc. Also, "Contrain" is misspelled, adding to the confusion.
-
Isaac Hartung authored
To submit sibling jobs to clusters that don't have the specified features. Bug 3859
-
Isaac Hartung authored
-
Brian Christiansen authored
-
Isaac Hartung authored
Bug 3640
-
Brian Christiansen authored
-
Brian Christiansen authored
Instead of waiting for all jobs to clear out of the system, wait until all jobs are completed. This way you don't have to wait as long for the cluster to be drained and/or removed.
-
Brian Christiansen authored
Clusters in the federation could be running different rpc_versions, so each cluster needs to speak the other's language.
-
Brian Christiansen authored
Routes request to origin cluster if it isn't running the job.
-
Brian Christiansen authored
to give more flexibility.
-
Brian Christiansen authored
if a federated job.
-
Brian Christiansen authored
-
Brian Christiansen authored
Move from slurm_send_recv_controller_rc_msg to slurm_send_recv_controller_msg. This allows scontrol requeue to be rerouted to the origin.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Shaw authored
Bug 3840.
-
Tim Shaw authored
-
Tim Wickberg authored
Another issue inadvertently caused by 8d10ebf5.
-
Tim Wickberg authored
Fixes build for elasticsearch plugin, inadvertently broken by commit 8d10ebf5.
-
- May 30, 2017
-
-
Morris Jette authored
-
Tim Shaw authored
node_features/knl_cray plugin: Don't clear configured GRES from non-KNL node. bug 3768
-
Morris Jette authored
-