- May 06, 2014
-
-
Danny Auble authored
-
- May 05, 2014
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
Related to bug 771
-
Danny Auble authored
cnode counts.
-
- May 02, 2014
-
-
Danny Auble authored
-
Danny Auble authored
-
- Apr 30, 2014
-
-
Morris Jette authored
Switch/nrt - Properly track usage of CAU and RDMA resources with multiple tasks per compute node. Previous logic would allocate resources once per task and then deallocate once per node, leaking CAU and RDMA resources and preventing their use by future jobs.
-
- Apr 18, 2014
-
-
Morris Jette authored
On switch resource allocation failure, free the partial allocation. The failure mode was that CAU could be allocated on some nodes but not others. The CAU allocated on nodes and switches up to the failure point were never released.
-
- Apr 08, 2014
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Fix logic bugs for SchedulerParameters option of max_rpc_cnt. Scheduling would be delayed for job arrays and backfill scheduling would be disabled unless max_rpc_cnt > 0.
-
Danny Auble authored
-
Danny Auble authored
on Mixed state.
-
- Apr 07, 2014
-
-
Morris Jette authored
This largely reverts commit 0ec2af27 just to cut down on some logging
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
in it. Signed-off-by: Danny Auble <da@schedmd.com>
-
Danny Auble authored
-
Danny Auble authored
-
- Apr 05, 2014
-
-
Morris Jette authored
Rather than treat invalid SchedulerParameters options as a fatal error, print an error and use the default value.
-
Morris Jette authored
Disables job scheduling when there are too many pending RPCs
-
Morris Jette authored
-
Morris Jette authored
This is related to deferring batch job scheduling if there are a bunch of requests pending
-
Morris Jette authored
rather than 1 sec interval retries
-
Morris Jette authored
If pthread_create call fails, decrease sleep before retry from 1 sec to 0.1 sec
-
- Apr 04, 2014
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
This also reverts commits 8cff3b08 and ced2fa3f
-
Danny Auble authored
-
Danny Auble authored
slurmdbd plugin.
-
Danny Auble authored
-
Danny Auble authored
9368ff2d
-
Danny Auble authored
-
- Apr 03, 2014
-
-
Danny Auble authored
new associations were added since it was started.
-
Morris Jette authored
Permit multiple batch job submissions to be made for each run of the scheduler logic if the job submissions occur at nearly the same time. Bug 616
-
- Apr 02, 2014
-
-
Morris Jette authored
Decrease maximum scheduler main loop run time from 10 secs to 4 secs for improved performance. If running with sched/backfill, do not run through all jobs on the periodic scheduling loop, but only the default depth. The backfill scheduler can go through more jobs anyway due to its ability to relinquish and recover locks. See bug 616
-
Morris Jette authored
If a job step's network value is set by poe, either by directly executing poe or by srun launching poe, that value was not being propagated to the job step creation RPC and the network was not being set up for the proper protocol (e.g. mpi, lapi, pami, etc.). The previous logic would only work if the srun execute line explicitly set the protocol using the --network option.
-