- Sep 06, 2017
Tim Wickberg authored
This creates an empty directory, since we no longer bother copying the job script or environment as of 28b7f853.
- Sep 04, 2017
Alejandro Sanchez authored
CID 175193 (FORWARD_NULL): Theoretically we shouldn't have a job_desc_msg_t without an associated part_record, but just in case let's harden the code. Introduced in previous commit: 24365514.
Alejandro Sanchez authored
Initially, job memory limits were tested at submission time through _validate_min_mem_partition() -> _valid_pn_min_mem(), but not tested again at scheduling time, leading to jobs incorrectly being scheduled against partitions where the job exceeded their MaxMemPer* limit (which can in turn be inherited from the system-wide limit). NOTE: A new WAIT_PN_MEM_LIMIT job_state_reason enum value was added to support this new waiting reason. Bug 2291.
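A minimal sketch of the idea in C, with illustrative stand-in types rather than slurmctld's real structures: re-test the per-CPU memory request against the partition limit at scheduling time, record the new waiting reason on failure, and guard against a missing partition record (cf. CID 175193 above).

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative stand-ins for Slurm's internal records. */
    typedef struct {
        uint64_t max_mem_per_cpu;   /* 0 means no partition limit */
    } part_record_sketch_t;

    typedef struct {
        uint64_t pn_min_mem;        /* requested memory per CPU */
        int      state_reason;
    } job_record_sketch_t;

    enum { WAIT_NO_REASON = 0, WAIT_PN_MEM_LIMIT_SKETCH };

    /* Re-check the memory limit at scheduling time, not only at
     * submission.  Also hardened against a missing partition record. */
    static bool job_fits_mem_limit(job_record_sketch_t *job,
                                   const part_record_sketch_t *part)
    {
        if (!part)                  /* no partition record: nothing to test */
            return true;
        if (part->max_mem_per_cpu &&
            job->pn_min_mem > part->max_mem_per_cpu) {
            job->state_reason = WAIT_PN_MEM_LIMIT_SKETCH;
            return false;
        }
        return true;
    }

    int main(void)
    {
        part_record_sketch_t part = { .max_mem_per_cpu = 4096 };
        job_record_sketch_t  job  = { .pn_min_mem = 8192 };

        if (!job_fits_mem_limit(&job, &part))
            puts("job waits: WAIT_PN_MEM_LIMIT");
        return 0;
    }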
- Sep 01, 2017
Danny Auble authored
checked on submit. This only mattered when submitting a job to multiple partitions. Bug 4066
Danny Auble authored
on node 0. Bug 4035
Tim Wickberg authored
Add to _remove_job_hash() the xassert() that the inline version has.
- Aug 22, 2017
Danny Auble authored
- Aug 15, 2017
Morris Jette authored
Coverity CID 174399
Morris Jette authored
This ensures that the batch script can have appropriate environment variables set for all components (e.g. the node list, CPU count, etc. for all pack groups).
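As a sketch of what that means, with hypothetical variable names (the exact SLURM_*_PACK_GROUP_n spelling is an assumption, not taken from the commit):

    #include <stdio.h>

    /* Sketch of exporting per-component values to a pack job's batch
     * script.  printf() stands in for setenv(); names are hypothetical. */
    int main(void)
    {
        const char *nodelists[]  = { "node[01-02]", "node[03-08]" };
        const int   cpu_counts[] = { 4, 24 };
        int ncomps = 2;
        char name[64];

        for (int i = 0; i < ncomps; i++) {
            snprintf(name, sizeof(name),
                     "SLURM_JOB_NODELIST_PACK_GROUP_%d", i);
            printf("%s=%s\n", name, nodelists[i]);
            snprintf(name, sizeof(name),
                     "SLURM_JOB_CPUS_PACK_GROUP_%d", i);
            printf("%s=%d\n", name, cpu_counts[i]);
        }
        return 0;
    }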
- Aug 14, 2017
Morris Jette authored
When the "scontrol update jobid=#" command specifies a pack job leader, then modify all components of the pack job. To modiify only the pack job leader, specify "scontrol update jobid=#+0".
- Aug 12, 2017
Morris Jette authored
Modify scontrol job hold/release and update to operate with heterogeneous job id specification (e.g. "scontrol hold 123+4").
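A rough sketch of parsing such a specification, not the actual scontrol code:

    #include <stdio.h>
    #include <stdlib.h>

    /* Illustrative parser for a heterogeneous job ID of the form
     * "<job_id>+<component>", e.g. "123+4".  Error handling is reduced
     * to a pass/fail return. */
    static int parse_het_job_id(const char *spec, long *job_id, long *comp)
    {
        char *end = NULL;

        *comp = -1;                       /* -1: whole job, no "+N" given */
        *job_id = strtol(spec, &end, 10);
        if (end == spec)
            return -1;                    /* no leading job id */
        if (*end == '+') {
            const char *p = end + 1;
            *comp = strtol(p, &end, 10);
            if (end == p)
                return -1;                /* "+" with no component */
        }
        return (*end == '\0') ? 0 : -1;   /* reject trailing junk */
    }

    int main(void)
    {
        long id, comp;

        if (parse_het_job_id("123+4", &id, &comp) == 0)
            printf("hold job %ld, component %ld\n", id, comp);
        return 0;
    }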
- Aug 11, 2017
Morris Jette authored
Morris Jette authored
Doing so would break the current scheduling logic.
- Aug 09, 2017
Morris Jette authored
This situation would be caused by an invalid user request, not a slurmctld error.
Morris Jette authored
Morris Jette authored
In all of these cases, the input account name is NULL, so there should never be a failure. In every case, the returned association pointer is checked anyway. Coverity CID 44719, 44720, 44721
Morris Jette authored
Coverity CID 45150.
- Aug 02, 2017
Marshall Garey authored
srun jobs that could start immediately and requested multiple partitions didn't run in the highest priority partition if it wasn't listed first. Now that the job's partition list gets sorted by priority, scontrol show job may display the partition list in priority order. Bug 4015.
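A minimal sketch of the new ordering, using invented field names rather than slurmctld's real partition record:

    #include <stdio.h>
    #include <stdlib.h>

    /* Illustrative candidate-partition record. */
    typedef struct {
        const char *name;
        int priority;      /* higher value = higher priority */
    } part_rec_sketch_t;

    /* qsort comparator: highest priority partition first. */
    static int cmp_prio_desc(const void *a, const void *b)
    {
        const part_rec_sketch_t *pa = a, *pb = b;
        return pb->priority - pa->priority;
    }

    int main(void)
    {
        part_rec_sketch_t parts[] =
            { {"debug", 1}, {"high", 10}, {"normal", 5} };
        size_t n = sizeof(parts) / sizeof(parts[0]);

        qsort(parts, n, sizeof(parts[0]), cmp_prio_desc);
        for (size_t i = 0; i < n; i++)
            printf("%s (priority %d)\n", parts[i].name, parts[i].priority);
        return 0;
    }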
- Jul 28, 2017
Morris Jette authored
If a pack job is only partially allocated resources (likely due to limits), deallocate resources from those components which have been started and requeue them.
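Sketched in C with illustrative names, the all-or-nothing rule looks roughly like this: if any component failed to get resources, release the components that did start and requeue them.

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative pack job component. */
    typedef struct {
        int  comp_id;
        bool allocated;
    } pack_comp_sketch_t;

    /* If any component lacks an allocation, back out the others. */
    static void requeue_if_incomplete(pack_comp_sketch_t *comps, int n)
    {
        bool all_allocated = true;

        for (int i = 0; i < n; i++)
            if (!comps[i].allocated)
                all_allocated = false;
        if (all_allocated)
            return;

        for (int i = 0; i < n; i++) {
            if (comps[i].allocated) {
                comps[i].allocated = false;   /* stand-in for deallocation */
                printf("component %d deallocated and requeued\n",
                       comps[i].comp_id);
            }
        }
    }

    int main(void)
    {
        pack_comp_sketch_t comps[] = { {0, true}, {1, true}, {2, false} };
        requeue_if_incomplete(comps, 3);
        return 0;
    }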
- Jul 27, 2017
Morris Jette authored
This change adds a new function and moves some logic around so that limits can be tested on a pack job as a whole (that logic still needs to be developed).
- Jul 25, 2017
Morris Jette authored
Don't requeue a batch pack job component that is not found on node zero of the allocation. Only the first pack job component is expected to have a running script.
-
Danny Auble authored
Morris Jette authored
- Jul 24, 2017
Dominik Bartkiewicz authored
Bug 3953
- Jul 19, 2017
Morris Jette authored
This removes several #define statements with different names in various functions.
- Jul 05, 2017
Brian Christiansen authored
It wasn't doing it for origin jobs.
Brian Christiansen authored
Previously, remote jobs would be removed from the job_list as quickly as possible to prevent collisions with requeued jobs, while the origin job would stay around until MinJobAge on the origin cluster. But the origin job didn't have the details from the job that ran on a remote cluster. Now just don't show revoked jobs: the origin tracking job will remain revoked and not shown, and the remote job will hang around for display until MinJobAge. scontrol show jobs will show the job from the cluster that ran it. The job is requeueable as long as the origin job is still in the origin cluster's job_list.
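A minimal sketch of the display rule, with invented state names: revoked jobs stay in the job_list (so requeue still works) but are skipped when jobs are shown.

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative job states; "revoked" marks a federation job that
     * ran (or is running) on a sibling cluster. */
    enum job_state_sketch { JOB_PENDING, JOB_RUNNING, JOB_REVOKED };

    /* Display rule sketched from the commit: keep revoked jobs in the
     * list, but don't show them. */
    static bool show_job(enum job_state_sketch state)
    {
        return state != JOB_REVOKED;
    }

    int main(void)
    {
        enum job_state_sketch jobs[] =
            { JOB_RUNNING, JOB_REVOKED, JOB_PENDING };

        for (int i = 0; i < 3; i++)
            printf("job %d: %s\n", i,
                   show_job(jobs[i]) ? "shown" : "hidden");
        return 0;
    }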
Brian Christiansen authored
Just check for the revoked state instead of checking if it's a tracker job, since an origin job will be revoked if it can't run on the origin or if it's running on a remote cluster.
- Jun 27, 2017
Tim Wickberg authored
No longer require part write lock.
Tim Wickberg authored
Replace with direct references to the struct.
- Jun 22, 2017
Brian Christiansen authored
This allows the origin to be able to sync up jobs after it has been down.
Brian Christiansen authored
Isaac Hartung authored
When a non-origin cluster is removed:
- running jobs remain
- fed_details are removed so the job can't call home
- the origin cluster removes the tracking job for running jobs
- pending jobs are removed
- pending srun/sallocs don't get notified
- other clusters remove the removed cluster from viable and active sibs
When an origin cluster is removed:
- all pending jobs are removed from all clusters that had the job
- pending srun/sallocs are notified of termination
- running jobs remain
- Jun 21, 2017
Dominik Bartkiewicz authored
Bug 3757.
- Jun 20, 2017
Danny Auble authored
more than 1 partition or when the partition is changed with scontrol. Bug 3849
- Jun 19, 2017
Isaac Hartung authored
Continuation of b9719be2
Brian Christiansen authored
CID 170772, 170773. Introduced by commit: 250378c2.
- Jun 16, 2017
Tim Shaw authored
Bug 3502.
- Jun 13, 2017
Danny Auble authored
Bug 3888
- Jun 08, 2017
Dominik Bartkiewicz authored
Prevent a segfault from dereferencing a pointer to a QOS that is being deleted. Fix to commit 3e8aa451.
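The idea of the fix, sketched with invented types (not Slurm's actual records): when a QOS is deleted, clear any job pointer that still references it so later dereferences can't hit freed memory.

    #include <stddef.h>
    #include <stdio.h>

    /* Illustrative stand-ins for QOS and job records. */
    typedef struct { const char *name; } qos_rec_sketch_t;
    typedef struct { qos_rec_sketch_t *qos_ptr; } job_rec_sketch_t;

    /* On QOS deletion, NULL out every job's reference to it. */
    static void qos_delete_notify(job_rec_sketch_t *jobs, int njobs,
                                  qos_rec_sketch_t *dead)
    {
        for (int i = 0; i < njobs; i++)
            if (jobs[i].qos_ptr == dead)
                jobs[i].qos_ptr = NULL;   /* prevent dangling dereference */
    }

    int main(void)
    {
        qos_rec_sketch_t q = { "expedite" };
        job_rec_sketch_t jobs[] = { { &q }, { NULL } };

        qos_delete_notify(jobs, 2, &q);
        printf("job 0 qos: %s\n",
               jobs[0].qos_ptr ? jobs[0].qos_ptr->name : "(none)");
        return 0;
    }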