- May 05, 2014
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
Related to bug 771
-
Danny Auble authored
cnode counts.
-
- May 02, 2014
-
-
Danny Auble authored
-
Danny Auble authored
-
- Apr 30, 2014
-
-
Morris Jette authored
Switch/nrt - Properly track usage of CAU and RDMA resources with multiple tasks per compute node. Previous logic would allocate resources once per task and then deallocate once per node, leaking CAU and RDMA resources and preventing their use by future jobs.
-
- Apr 18, 2014
-
-
Morris Jette authored
On switch resource allocation failure, free the partial allocation. The failure mode was that CAU could be allocated on some nodes but not others, and the CAU allocated on nodes and switches up to the failure point was never released.
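The all-or-nothing behaviour described in this commit message can be sketched as follows. This is only an illustration, not the switch/nrt plugin code: the `allocate_on_all` function, the `free_units` dictionary, and the `AllocationError` exception are hypothetical names invented for the example.

```python
class AllocationError(Exception):
    """Raised when a node cannot supply the requested resource."""


def allocate_on_all(nodes, free_units, want):
    """Reserve `want` units on every node, or reserve nothing at all.

    `free_units` maps node name -> units still available. This is a
    hypothetical model, not Slurm's internal bookkeeping.
    """
    done = []  # nodes from which units have already been taken
    try:
        for node in nodes:
            if free_units[node] < want:
                raise AllocationError(node)
            free_units[node] -= want
            done.append(node)
    except AllocationError:
        # Roll back the partial allocation before reporting failure,
        # so nothing taken before the failure point stays reserved.
        for node in done:
            free_units[node] += want
        return False
    return True
```

With `free_units = {"n1": 2, "n2": 0}`, a request for one unit on both nodes fails and leaves `n1` with its two units intact.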
-
- Apr 08, 2014
-
-
Morris Jette authored
-
Morris Jette authored
Fix logic bugs for SchedulerParameters option of max_rpc_cnt. Scheduling would be delayed for job arrays and backfill scheduling would be disabled unless max_rpc_cnt > 0.
-
Danny Auble authored
-
Danny Auble authored
on Mixed state.
-
- Apr 07, 2014
-
-
Morris Jette authored
-
Danny Auble authored
in it. Signed-off-by: Danny Auble <da@schedmd.com>
-
Danny Auble authored
-
- Apr 05, 2014
-
-
Morris Jette authored
Disables job scheduling when there are too many pending RPCs
-
- Apr 04, 2014
-
-
Danny Auble authored
-
Danny Auble authored
This also reverts commit 8cff3b08 and ced2fa3f
-
Danny Auble authored
-
- Apr 03, 2014
-
-
Danny Auble authored
new associations were added since it was started.
-
Morris Jette authored
Permit multiple batch job submissions to be made for each run of the scheduler logic if the job submissions occur at nearly the same time (bug 616).
-
- Apr 02, 2014
-
-
Morris Jette authored
If a job step's network value is set by poe, either by directly executing poe or by srun launching poe, that value was not being propagated to the job step creation RPC and the network was not being set up for the proper protocol (e.g. mpi, lapi, pami, etc.). The previous logic would only work if the srun execute line explicitly set the protocol using the --network option.
-
- Mar 31, 2014
-
-
Marcin Stolarek authored
Prevent preemption of jobs in partition where PreemptMode=off
-
- Mar 26, 2014
-
-
David Bigagli authored
processes.
-
- Mar 25, 2014
-
-
Danny Auble authored
-
- Mar 24, 2014
-
-
Morris Jette authored
When slurmctld restarted, it would not recover dependencies on job array elements and would just discard the dependency. This corrects the parsing problem to recover the dependency. The old code would print a message like this and discard it: slurmctld: error: Invalid dependencies discarded for job 51: afterany:47_*
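To show the shape of the dependency string in that log message, here is a minimal parsing sketch. It is not slurmctld's parser: the `parse_dependency` helper and its return values are hypothetical, and it assumes a single `type:jobid` entry with an optional `_index` or `_*` suffix. Real dependency strings can be more complex.

```python
import re

# One dependency entry: a type, a job ID, and an optional array suffix
# that is either a task index or "*" for every task in the array.
_DEP_RE = re.compile(r"(?P<kind>[a-z]+):(?P<job>\d+)(?:_(?P<task>\*|\d+))?")


def parse_dependency(text):
    """Return (kind, job_id, scope), or None if the text does not match.

    scope is "job" (no suffix), "all_tasks" (suffix "_*"), or an int
    task index. These labels are illustrative only.
    """
    m = _DEP_RE.fullmatch(text)
    if m is None:
        return None
    task = m.group("task")
    if task is None:
        scope = "job"
    elif task == "*":
        scope = "all_tasks"
    else:
        scope = int(task)
    return (m.group("kind"), int(m.group("job")), scope)
```

A parser that does not handle the `_*` suffix would reject `afterany:47_*` outright, which matches the symptom described in the commit message.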
-
- Mar 21, 2014
-
-
Danny Auble authored
be set up for 1-node jobs. Here are some of the reasons from IBM: 1. PE expects it. 2. For failover, if there was some challenge or difficulty with the shared-memory method of data transfer, the protocol stack might want to go through the adapter instead. 3. For flexibility, the protocol stack might want to be able to transfer data using some variable combination of shared memory and adapter-based communication. 4. Possibly most important, for overall performance, bandwidth or efficiency (BW per CPU cycle) might be better using the adapter resources. (An obvious case is large messages: it might require a lot fewer CPU cycles to program the DMA engines on the adapter to move data between tasks, rather than depend on the CPU to move the data with loads and stores, or page re-mapping -- and a DMA engine might actually move the data more quickly, if it's well integrated with the memory system, as it is in the P775 case.)
-
- Mar 20, 2014
-
-
Danny Auble authored
than you really have.
-
Danny Auble authored
doesn't get chopped off.
-
- Mar 19, 2014
-
-
David Bigagli authored
-
Gennaro Oliva authored
a minus sign for options was intended.
-
- Mar 18, 2014
-
-
Danny Auble authored
-
Danny Auble authored
Some of these were resulting in the state of a job not being updated correctly to tools like sview.
-
Danny Auble authored
in waiting reason ReqNodeNotAvail.
-
- Mar 17, 2014
-
-
Danny Auble authored
-
- Mar 15, 2014
-
-
Morris Jette authored
Add support for job array options in the qsub command, in #PBS options for sbatch scripts and set the appropriate environment variables in the spank_pbs plugin (PBS_ARRAY_ID and PBS_ARRAY_INDEX). Note that Torque uses the "-t" option and PBS Pro uses the "-J" option.
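The two option spellings mentioned above can be illustrated with a small translation sketch. It is not the qsub wrapper or the sbatch `#PBS` parser: `pbs_array_to_sbatch` is a hypothetical helper, and it assumes the index specification (for example a plain range such as `1-10`) can be passed unchanged to sbatch's `--array` option.

```python
def pbs_array_to_sbatch(directive):
    """Map '#PBS -t 1-10' (Torque) or '#PBS -J 1-10' (PBS Pro)
    to an sbatch-style '--array=1-10' option.

    Returns None for any line that is not a simple array directive.
    Assumption: the index specification needs no rewriting.
    """
    parts = directive.split()
    if len(parts) != 3 or parts[0] != "#PBS":
        return None
    if parts[1] not in ("-t", "-J"):
        return None
    return "--array=" + parts[2]
```

Both `#PBS -t 1-10` and `#PBS -J 1-10` map to `--array=1-10` here, while other `#PBS` lines are left for separate handling.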
-
- Mar 14, 2014
-
-
Danny Auble authored
-
Danny Auble authored
slurm.conf. Rebooting daemons after adding nodes to the slurm.conf is highly recommended.
-
- Mar 11, 2014
-
-
Danny Auble authored
-
Danny Auble authored
-