Commits · 922df01a04474847a1be8c3c29698764cfccca7f · tud-zih-energy / Slurm

Jul 13, 2012
- switch/nrt: release resources from failed allocate · 922df01a
  Morris Jette authored 12 years ago
- switch/nrt: Improve logging · bb5d2e81
  Morris Jette authored 12 years ago
- switch/nrt: merge CAU and RDMA de/alloc to same functions · 9b2df990
  Morris Jette authored 12 years ago
- Merge branch 'slurm-2.4' · d4281c6a
  Morris Jette authored 12 years ago
  
  Conflicts: doc/html/high_throughput.shtml
- Update to high throughput computing web page with more option descriptions · 46a3767e
  Morris Jette authored 12 years ago
- POE web page: describe new Power7 options · 1e563165
  Morris Jette authored 12 years ago
- switch/nrt: support dynamic changes to adapter port_id, network_id, cau values etc. · eca2dcd7
  Morris Jette authored 12 years ago
  
  Without this change the values known to slurmctld do not change without a cold-start.
Jul 12, 2012
- move an info message to be debug · 2d3c09ae
  Danny Auble authored 12 years ago
- BGQ - add correct locking to ensure protected structures · 7432016e
  Danny Auble authored 12 years ago
- switch/nrt: Track actual CAU resource allocations · 9a60113a
  Morris Jette authored 12 years ago
- BGQ - add creation of bitmap if it does not already exist · 8640832b
  Danny Auble authored 12 years ago
- BGQ - Make it possible for a multi midplane allocation to run on more than 1 midplane but not the entire allocation. · 010570f4
  Danny Auble authored 12 years ago
- BGQ - correct logic to place multiple (< 1 midplane) steps inside a multi midplane block allocation. · 5ed86088
  Danny Auble authored 12 years ago
- switch/nrt: add tracking of immed_send_slots · e8964696
  e8964696
- switch/nrt: add cau_indexes & immed_slots to job record · 134a4aa9
  134a4aa9
- switch/nrt: add adapter cau & immed_send counts to adapter data structures · c5704adc
  c5704adc
- Merge remote-tracking branch 'origin/slurm-2.4' · 366f357d
  366f357d
- BGQ - correctly remove running jobs when freeing a shared block. · a1f9b6a7
  a1f9b6a7
- update slurm spec file to correctly build on a cray · 5fa2a17d
  5fa2a17d
- BLUEGENE - better debug messages · eeb31e78
  eeb31e78
- BLUEGENE - Handle job completion correctly if an admin removes a block where other blocks on an overlapping midplane are running jobs. · 5430c095
  Danny Auble authored 12 years ago
- switch/nrt: Add some hooks for CAU and Immediate_slots support · a61cc1b9
  Morris Jette authored 12 years ago
- Merge branch 'slurm-2.4' · 8f525f29
  Morris Jette authored 12 years ago
- Minor format change to sbatch man page · aedc5be9
  Morris Jette authored 12 years ago
- Fix for bad merge from v2.4, change in node data structure port field from number to string · 38aa3708
  Morris Jette authored 12 years ago
- switch/nrt: Fix mem leaks and don't allocate tables for IP_ONLY devices · 85f23211
  Morris Jette authored 12 years ago
- switch/nrt: treat --network=ip the same as --network=ipv4 · c9506dec
  Morris Jette authored 12 years ago
Jul 11, 2012
- Merge remote-tracking branch 'origin/slurm-2.4' · 7d125ca4
  Danny Auble authored 12 years ago
- BLUEGENE - If a large block (> 1 midplane) is in error and underlying hardware is marked bad, remove the larger block and create a block over just the bad hardware, making the other hardware available to run on. · 0c371d36
  Danny Auble authored 12 years ago
- switch/nrt: Support wider range of NRT versions · 3797d6e7
  Morris Jette authored 12 years ago
- BGQ - make sure we have a valid block when creating or finishing a step allocation. · 4731a11b
  Danny Auble authored 12 years ago
- BLUEGENE - Sanity check just to make sure BLOCK_MAGIC is correct · 74b70963
  Danny Auble authored 12 years ago
- BLUEGENE - remove race condition where, if a block is removed while waiting for a job to finish on it, the number of unused cpus wasn't updated correctly. · 11e2759f
  Danny Auble authored 12 years ago
- launch/poe: document poe configuration file · 087862f2
  Morris Jette authored 12 years ago
- switch/nrt: Note known bug in NRT API · 447ecb7e
  Morris Jette authored 12 years ago
- switch/nrt: fix logic for sn_single if multiple adapters of same type and network ID · cdb3973c
  Morris Jette authored 12 years ago
  
  Add logic to match adapter name also. This is needed due to the additional IP_ONLY adapter named virbr0, as used for virtualization.
- switch/nrt: fix several problems related to new CentOS and PE software · ffa3d713
  Morris Jette authored 12 years ago
- switch/nrt Changes for latest CentOS · 9938facd
  Morris Jette authored 12 years ago
Jul 10, 2012
- NRT - Add information about if PMD is calling the libpermapi plugin or not · ccd24a9e
  Danny Auble authored 12 years ago
- Correct job node_cnt value for job completion plugin · 97ce2e19
  Morris Jette authored 12 years ago
  
  When using the jobcomp/script interface, we have noticed the NODECNT
  environment variable is off-by-one when logging completed jobs in
  the NODE_FAIL state (though the NODELIST is correct).
  
  This appears to be because in many places job_completion_logger()
  is called after deallocate_nodes(), which appears to decrement
  job->node_cnt for DOWN nodes.
  
  If job_completion_logger() only called the job completion plugin,
  then I would guess that it might be safe to move this call ahead
  of deallocate_nodes(). However, it seems like job_completion_logger()
  also does a bunch of accounting stuff (?), so perhaps that would
  need to be split out first?
  
  Also, there is the possibility that this is working as designed,
  though if so a well-placed comment in the code might be appreciated.
  If the decreased node count is intended, though, should the DOWN
  nodes also be removed from the job's NODELIST? - Mark Grondona
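The ordering problem described in the report can be reduced to a minimal C sketch. It assumes, as the report does, that deallocate_nodes() subtracts DOWN nodes from job->node_cnt. The struct and every `*_sketch` / `nodecnt_log_*` function are hypothetical simplifications, not Slurm's real job_record or jobcomp API, and this is not necessarily the change made in 97ce2e19.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for a job record; only the two
 * fields needed to show the ordering issue are modeled. */
struct job_sketch {
    int node_cnt;   /* nodes currently counted against the job */
    int down_nodes; /* allocated nodes that went DOWN (NODE_FAIL) */
};

/* Hypothetical stand-in for deallocate_nodes(): per the report,
 * DOWN nodes are subtracted from node_cnt during deallocation. */
static void deallocate_nodes_sketch(struct job_sketch *job)
{
    assert(job->down_nodes >= 0 && job->down_nodes <= job->node_cnt);
    job->node_cnt -= job->down_nodes;
    job->down_nodes = 0;
}

/* Hypothetical stand-in for the value the jobcomp plugin exports
 * as NODECNT: it simply reads the job's current node_cnt. */
static int logged_nodecnt_sketch(const struct job_sketch *job)
{
    return job->node_cnt;
}

/* Order described in the report: deallocate first, then log.
 * With DOWN nodes present, the logged count comes out low. */
static int nodecnt_log_after_dealloc(int nodes, int down)
{
    struct job_sketch job = { nodes, down };
    deallocate_nodes_sketch(&job);
    return logged_nodecnt_sketch(&job);
}

/* Alternative floated in the report: read the count before
 * deallocating, so NODECNT still matches NODELIST. */
static int nodecnt_log_before_dealloc(int nodes, int down)
{
    struct job_sketch job = { nodes, down };
    int logged = logged_nodecnt_sketch(&job);
    deallocate_nodes_sketch(&job);
    return logged;
}
```

For a 4-node job with one DOWN node, the first ordering reports 3 while the second reports 4; with no DOWN nodes both report 4. Capturing the count before deallocation avoids reordering job_completion_logger() as a whole, which the report notes may be entangled with accounting work.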