- Mar 12, 2012
Danny Auble authored
Conflicts: src/plugins/select/bluegene/bg_dynamic_block.c
Danny Auble authored
the queue when trying to place a larger than midplane job.
Morris Jette authored
- Mar 11, 2012
Danny Auble authored
- Mar 09, 2012
Danny Auble authored
Danny Auble authored
being honored for QOS.
Danny Auble authored
Danny Auble authored
Danny Auble authored
- Mar 08, 2012
Danny Auble authored
midplane added. If we don't do that then we are hosed. So we just always add it first to avoid issues.
Danny Auble authored
Danny Auble authored
always right and much easier to come by.
Danny Auble authored
people are polling the system at the exact same time.
Danny Auble authored
Danny Auble authored
that just would be silly.
Danny Auble authored
Morris Jette authored
Morris Jette authored
Morris Jette authored
- Mar 07, 2012
Morris Jette authored
Morris Jette authored
Danny Auble authored
Danny Auble authored
an admin updates the node to idle/resume, the compute nodes will go instantly to idle instead of idle*, which means no response.
- Mar 06, 2012
Danny Auble authored
Danny Auble authored
gone. Previously it had a time limit, which has proven not to be the right thing.
Danny Auble authored
Danny Auble authored
Danny Auble authored
Danny Auble authored
Danny Auble authored
Morris Jette authored
Morris Jette authored
- Mar 02, 2012
Morris Jette authored
In SLURM version 2.4, we now schedule jobs at priority=1 and no longer treat it as a special case.
Morris Jette authored
Morris Jette authored
Morris Jette authored
In the cray/srun wrapper, only include the aprun "-q" option when the srun "--quiet" option is used.
Morris Jette authored
Morris Jette authored
Morris Jette authored
Here's what seems to have happened:
- A job was pending, waiting for resources.
- slurm.conf was changed to remove some nodes, and a scontrol reconfigure was done.
- As a result of the reconfigure, the pending job became non-runnable due to "Requested node configuration is not available". The scheduler set the job state to JOB_FAILED and called delete_job_details.
- scontrol reconfigure was done again.
- read_slurm_conf called _restore_job_dependencies.
- _restore_job_dependencies called build_feature_list for each job in the job list.
- When build_feature_list tried to reference the now-deleted job details for the failed job, it got a segmentation fault.
The problem was reported by a customer on Slurm 2.2.7. I have not been able to reproduce it on 2.4.0-pre3, although the relevant code looks the same. There may be a timing window. The attached patch attempts to fix the problem by adding a check to _restore_job_dependencies: if the job state is JOB_FAILED, the job is skipped.
Regards,
Martin
This is an alternative solution to bug316980fix.patch.
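A minimal sketch of the guard described above, using simplified stand-in types rather than the actual SLURM structures; the field names, the linked-list layout, and the stub build_feature_list are illustrative assumptions, not the verbatim patch:

    #include <stdio.h>

    /* Simplified stand-ins for SLURM's internal job types (assumed). */
    enum job_state { JOB_PENDING, JOB_RUNNING, JOB_FAILED };

    struct job_details { const char *features; };

    struct job_record {
        enum job_state state;
        struct job_details *details;   /* freed once delete_job_details() runs */
        struct job_record *next;
    };

    /* Stand-in for build_feature_list(): dereferences job->details,
     * which is exactly what segfaulted on the failed job. */
    static void build_feature_list(struct job_record *job)
    {
        printf("features: %s\n", job->details->features);
    }

    /* The fix described above: skip failed jobs whose details have
     * already been deleted, so the dangling pointer is never touched. */
    static void restore_job_dependencies(struct job_record *head)
    {
        for (struct job_record *job = head; job; job = job->next) {
            if (job->state == JOB_FAILED)
                continue;   /* details already freed; referencing them would crash */
            build_feature_list(job);
        }
    }

    int main(void)
    {
        struct job_details ok = { "gpu,bigmem" };
        struct job_record failed = { JOB_FAILED, NULL, NULL };   /* details deleted */
        struct job_record pending = { JOB_PENDING, &ok, &failed };

        restore_job_dependencies(&pending);   /* prints once, skips the failed job */
        return 0;
    }

Without the JOB_FAILED check, the loop would call build_feature_list on the failed job and dereference its freed details, reproducing the reported segmentation fault.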
- Mar 01, 2012
Morris Jette authored