Commits · c3736a1b2368b6e8bbf0d58886d0cee3009612d2 · tud-zih-energy / Slurm

May 05, 2014
- Fix perlapi to compile correctly with perl 5.18 · 21ebf585
  Danny Auble authored 10 years ago
  
  21ebf585
- Handle node ranges better when dealing with accounting max node limits. · d849aadb
  Danny Auble authored 10 years ago
  
  d849aadb
- BGQ - Move code to only start job on a block after limits are checked. · 3a4246cc
  Danny Auble authored 10 years ago
  
  Related to bug 771
  3a4246cc
- BGQ - Fix issue where limits were checked on midplane counts instead of · 836b654f
  Danny Auble authored 10 years ago
  
  cnode counts.
  836b654f
May 02, 2014
- BGQ - Temp fix issue where job could be left on job_list after it finished. · e4f1a099
  Danny Auble authored 10 years ago
  
  e4f1a099
- Fix issue where user is requesting --acctg-freq=0 and no memory limits. · 17e4e2ac
  Danny Auble authored 10 years ago
  
  17e4e2ac
Apr 30, 2014

switch/nrt - CAU and RDMA tracking correction · 6f66fdef

Morris Jette authored 10 years ago

Switch/nrt - Properly track usage of CAU and RDMA resources with multiple
tasks per compute node. Previous logic would allocate resources once per
task and then deallocate once per node, leaking CAU and RDMA resources
and preventing their use by future jobs.

6f66fdef
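
The leak described in 6f66fdef can be illustrated with a toy model. This is a minimal sketch, not Slurm code: `WindowPool`, `run_job`, and the node names are hypothetical, and the pool only counts resource units per node. It shows that allocating once per task while releasing once per node leaves units held, and that releasing once per task balances the count.

```python
from collections import Counter


class WindowPool:
    """Toy per-node resource counter (hypothetical, not a Slurm structure)."""

    def __init__(self, per_node_capacity):
        self.capacity = per_node_capacity
        self.in_use = Counter()

    def allocate(self, node):
        if self.in_use[node] >= self.capacity:
            raise RuntimeError(f"no free resources on {node}")
        self.in_use[node] += 1

    def release(self, node):
        if self.in_use[node] > 0:
            self.in_use[node] -= 1


def run_job(pool, tasks_per_node, release_per_task):
    """Allocate once per task, then release per task or only once per node."""
    for node, ntasks in tasks_per_node.items():
        for _ in range(ntasks):
            pool.allocate(node)
    for node, ntasks in tasks_per_node.items():
        for _ in range(ntasks if release_per_task else 1):
            pool.release(node)


layout = {"node1": 4, "node2": 4}

# Old behaviour in this model: one release per node, so units stay held.
leaky = WindowPool(per_node_capacity=8)
run_job(leaky, layout, release_per_task=False)
leaked = dict(leaky.in_use)

# Corrected behaviour in this model: one release per task.
fixed = WindowPool(per_node_capacity=8)
run_job(fixed, layout, release_per_task=True)
remaining = sum(fixed.in_use.values())
```

In this model every job with more than one task per node permanently reduces the capacity left for later jobs, which matches the symptom the commit message describes.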

Apr 18, 2014

switch/nrt - free partial allocation · a197a1da

Morris Jette authored 10 years ago

On switch resource allocation failure, free the partial allocation.
The failure mode was that CAU could be allocated on some nodes but not
others. The CAU allocated on nodes and switches up to the failure
point was never released.

a197a1da
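
The "free partial allocation" idea in a197a1da is a standard roll-back pattern. The following is a minimal sketch under stated assumptions: `allocate_on_nodes`, `AllocationError`, and the `fake_*` callbacks are hypothetical names invented for illustration and do not correspond to functions in the switch/nrt plugin.

```python
class AllocationError(Exception):
    """Raised by the (hypothetical) per-node allocator when it runs out."""


def allocate_on_nodes(nodes, allocate, release):
    """Allocate on every node, or on none.

    `allocate` and `release` are stand-in callbacks for the per-node
    switch resource calls.
    """
    done = []
    try:
        for node in nodes:
            allocate(node)
            done.append(node)
    except AllocationError:
        # Undo the partial allocation in reverse order, then re-raise.
        for node in reversed(done):
            release(node)
        raise
    return done


held = set()


def fake_allocate(node):
    if node == "n3":  # simulate exhaustion on the third node
        raise AllocationError(node)
    held.add(node)


def fake_release(node):
    held.discard(node)


try:
    allocate_on_nodes(["n1", "n2", "n3", "n4"], fake_allocate, fake_release)
    failed = False
except AllocationError:
    failed = True
```

After the failure on `n3`, nothing remains held on `n1` or `n2`, which is the property the commit restores.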

Apr 08, 2014
- Start NEWS for v2.6.10 · 3114d035
  Morris Jette authored 10 years ago
  
  3114d035
- Fix logic bugs for max_rpc_cnt SchedulerParameters · 78f9b4cc
  Morris Jette authored 10 years ago
  
  Fix logic bugs for SchedulerParameters option of max_rpc_cnt. Scheduling would be delayed for job arrays and backfill scheduling would be disabled unless max_rpc_cnt > 0.
  78f9b4cc
- Fix sacctmgr update user with no "where" condition. · 7ad6df27
  Danny Auble authored 10 years ago
  
  7ad6df27
- Fix sinfo to work correctly with draining/mixed nodes as well as filtering · 2fb004cf
  Danny Auble authored 10 years ago
  
  on Mixed state.
  2fb004cf
Apr 07, 2014
- Start NEWS for v2.6.9 · a51f6fbf
  Morris Jette authored 10 years ago
  
  a51f6fbf
- BGQ - Fix sub block steps using a block when the block has passthroughs · 91c70cc9
  Danny Auble authored 10 years ago
  
  in it. Signed-off-by: Danny Auble <da@schedmd.com>
  91c70cc9
- BGQ - Fix deny_pass to work correctly. · bee1ec08
  Danny Auble authored 10 years ago
  
  bee1ec08
Apr 05, 2014
- added SchedulerParameters option of max_rpc_cnt · ab381fd3
  Morris Jette authored 10 years ago
  
  Disables job scheduling when there are too many pending RPCs
  ab381fd3
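
The check introduced by ab381fd3, together with the later logic fix in 78f9b4cc, can be sketched as a single guard. This is an illustration only: `should_defer_scheduling` and its parameters are hypothetical names, and treating a value of 0 as "no limit configured" is an assumption made here, not something this log states.

```python
def should_defer_scheduling(pending_rpcs, max_rpc_cnt):
    """Defer only when a limit is configured and has been reached.

    Assumption: max_rpc_cnt == 0 means the option is not in use, so
    scheduling is never deferred on its account.
    """
    return max_rpc_cnt > 0 and pending_rpcs >= max_rpc_cnt


unset_case = should_defer_scheduling(pending_rpcs=500, max_rpc_cnt=0)
below_limit = should_defer_scheduling(pending_rpcs=50, max_rpc_cnt=100)
at_limit = should_defer_scheduling(pending_rpcs=100, max_rpc_cnt=100)
```

Writing the guard with an explicit `max_rpc_cnt > 0` test avoids the class of bug noted in 78f9b4cc, where scheduling paths were affected even though no limit had been set.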
Apr 04, 2014
- MySQL - Fix it so a lock isn't held unnecessarily. · 9fe5c605
  Danny Auble authored 10 years ago
  
  9fe5c605
- Fix sinfo to work correctly with draining/mixed nodes. (copied sview code) · 1d3b553c
  Danny Auble authored 10 years ago
  
  This also reverts commit 8cff3b08 and ced2fa3f
  1d3b553c
- NEWS for the last 2 commits · ac4b337a
  Danny Auble authored 10 years ago
  
  ac4b337a
Apr 03, 2014
- Fix issue where associations weren't correct if backup takes control and · 9368ff2d
  Danny Auble authored 10 years ago
  
  new associations were added since it was started.
  9368ff2d
- Defer scheduling for many batch jobs · dd4aa1c3
  Morris Jette authored 10 years ago
  
  Permit multiple batch job submissions to be made for each run of the scheduler logic if the job submissions occur at nearly the same time. Bug 616.
  dd4aa1c3
Apr 02, 2014

launch/poe - fix network value · ad7100b8

Morris Jette authored 10 years ago

If a job step's network value is set by poe, either by directly
executing poe or by srun launching poe, that value was not being
propagated to the job step creation RPC and the network was not
being set up for the proper protocol (e.g. mpi, lapi, pami, etc.).
The previous logic would only work if the srun execute line
explicitly set the protocol using the --network option.

ad7100b8

Mar 31, 2014
- preempt/partition_prio fix · a0ba1865
  Marcin Stolarek authored 10 years ago
  
  Prevent preemption of jobs in partition where PreemptMode=off
  a0ba1865
Mar 26, 2014
- Lock the /cgroup/freezer subsystem when creating files for tracking · bd05aaf2
  David Bigagli authored 10 years ago
  
  processes.
  bd05aaf2
Mar 25, 2014
- mysql - Fix invalid memory reference. · 00cabba3
  Danny Auble authored 11 years ago
  
  00cabba3
Mar 24, 2014

job array dependency recovery fix · fca71890

Morris Jette authored 11 years ago

When slurmctld restarted, it would not recover dependencies on
job array elements and would just discard the dependency. This
corrects the parsing problem to recover the dependency. The old code
would print a message like this and discard it:
slurmctld: error: Invalid dependencies discarded for job 51: afterany:47_*

fca71890
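
To show what a dependency such as `afterany:47_*` has to be split into, here is a minimal parsing sketch. It is not Slurm's parser: the grammar (one dependency of the form `type:jobid[_task]`, where the task is a number or `*`) and the name `parse_dependency` are assumptions made for illustration.

```python
import re

# Assumed grammar: type:jobid[_task], where task is a number or "*"
# (meaning every element of the job array).
DEP_RE = re.compile(r"^(?P<type>[a-z]+):(?P<job>\d+)(?:_(?P<task>\d+|\*))?$")


def parse_dependency(spec):
    """Return (type, job_id, task), or None if the spec does not match.

    task is None for a plain job, "*" for the whole array, else an int.
    """
    m = DEP_RE.match(spec)
    if m is None:
        return None
    task = m.group("task")
    if task is not None and task != "*":
        task = int(task)
    return m.group("type"), int(m.group("job")), task


whole_array = parse_dependency("afterany:47_*")
one_element = parse_dependency("afterany:47_3")
plain_job = parse_dependency("afterany:47")
malformed = parse_dependency("afterany:47_")
```

A parser that only accepts the plain `type:jobid` form would reject the `_*` suffix, which is consistent with the "Invalid dependencies discarded" message quoted in the commit.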

Mar 21, 2014

NRT - Fix issue with 1 node jobs. It turns out the network does need to · 440932df

Danny Auble authored 11 years ago

be set up for 1 node jobs. Here are some of the reasons from IBM...

1. PE expects it.
2. For failover, if there was some challenge or difficulty with the
shared-memory method of data transfer, the protocol stack might
want to go through the adapter instead.
3. For flexibility, the protocol stack might want to be able to transfer
data using some variable combination of shared memory and adapter-based
communication, and
4. Possibly most important, for overall performance, it might be that
bandwidth or efficiency (BW per CPU cycles) might be better using the
adapter resources. (An obvious case is for large messages, it might
require a lot fewer CPU cycles to program the DMA engines on the
adapter to move data between tasks, rather than depend on the CPU
to move the data with loads and stores, or page re-mapping -- and
a DMA engine might actually move the data more quickly, if it's well
integrated with the memory system, as it is in the P775 case.)

440932df

Mar 20, 2014
- task/affinity - Protect against zero divide when simulating more hardware · 92b4de3c
  Danny Auble authored 11 years ago
  
  than you really have.
  92b4de3c
- sinfo - Make sure if partition name is long and the default the last char · c4bd5ba8
  Danny Auble authored 11 years ago
  
  doesn't get chopped off.
  c4bd5ba8
Mar 19, 2014
- Move the comment from 2.6.7 to 2.6.8 · 9950679b
  David Bigagli authored 11 years ago
  
  9950679b
- Fixed sacct.1 and srun.1 manual pages which contained a hyphen where · e1c8e670
  Gennaro Oliva authored 11 years ago
  
  a minus sign for options was intended.
  e1c8e670
Mar 18, 2014
- Free job_ptr->state_desc wherever state_reason is set. · c2ae6cfc
  Danny Auble authored 11 years ago
  
  c2ae6cfc
- Update last_job_update when a job's state_reason was modified. · b0cc7126
  Danny Auble authored 11 years ago
  
  Some of these were resulting in the state of a job not being updated correctly to tools like sview.
  b0cc7126
- Fix issue where jobs still pending after a reservation would remain · 77555c30
  Danny Auble authored 11 years ago
  
  in waiting reason ReqNodeNotAvail.
  77555c30
Mar 17, 2014
- CRAY/ALPS - Add support for CLE52 · a45170c2
  Danny Auble authored 11 years ago
  
  a45170c2
Mar 15, 2014

Add support for Torque/PBS job arrays · 11968284

Morris Jette authored 11 years ago

Add support for job array options in the qsub command, in #PBS
options for sbatch scripts and set the appropriate environment
variables in the spank_pbs plugin (PBS_ARRAY_ID and PBS_ARRAY_INDEX).
Note that Torque uses the "-t" option and PBS Pro uses the "-J"
option.

11968284
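
The note about Torque's "-t" and PBS Pro's "-J" suggests a simple translation step. The sketch below is hypothetical and is not the wrapper this commit touches: `array_option_to_sbatch` is an invented helper, and it assumes the index range can be passed unchanged to sbatch's `--array` option.

```python
def array_option_to_sbatch(argv):
    """Translate Torque "-t RANGE" or PBS Pro "-J RANGE" to "--array=RANGE".

    All other arguments are passed through untouched. Assumption: the
    range syntax is compatible enough to forward as-is.
    """
    out = []
    args = iter(argv)
    for arg in args:
        if arg in ("-t", "-J"):
            rng = next(args, None)
            if rng is None:
                raise ValueError(f"{arg} requires an index range")
            out.append(f"--array={rng}")
        else:
            out.append(arg)
    return out


torque_style = array_option_to_sbatch(["-t", "1-10", "job.sh"])
pbspro_style = array_option_to_sbatch(["-J", "1-10", "job.sh"])
```

Both spellings end up as the same sbatch argument, so the rest of the submission path does not need to know which PBS dialect the user wrote.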

Mar 14, 2014
- update for potential next 2.6 · ea90d9d4
  Danny Auble authored 11 years ago
  
  ea90d9d4
- Fix a couple of issues with scontrol reconfig and adding nodes to · d03c9300
  Danny Auble authored 11 years ago
  
  slurm.conf. Rebooting daemons after adding nodes to the slurm.conf is highly recommended.
  d03c9300
Mar 11, 2014
- Add missing options to the print of TaskPluginParam. · 51acd3a5
  Danny Auble authored 11 years ago
  
  51acd3a5
- Change SLURM -> Slurm · 85336299
  Danny Auble authored 11 years ago
  
  85336299