Commits · 1b0c4a33590c24a6509750f48b2108c0baea4fbe · tud-zih-energy / Slurm

Mar 02, 2014

Add SchedulerParameters option bf_max_job_start · 1b0c4a33

jette authored 11 years ago

Add support for SchedulerParameters value of bf_max_job_start that limits
the total number of jobs that can be started in a single iteration of the
backfill scheduler.
bug 607

1b0c4a33
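
The effect of such a cap can be pictured with a minimal sketch. This is not Slurm's backfill code: the function `backfill_iteration` and the `can_start` callback are hypothetical names used for illustration, and treating a value of 0 as "no limit" is an assumption of the sketch.

```python
def backfill_iteration(pending_jobs, can_start, bf_max_job_start=0):
    """Return the jobs started in one simulated backfill pass.

    bf_max_job_start=0 is treated as "no limit" (assumption of this sketch).
    """
    started = []
    for job in pending_jobs:
        if bf_max_job_start and len(started) >= bf_max_job_start:
            break  # cap reached: remaining jobs wait for the next iteration
        if can_start(job):
            started.append(job)
    return started


jobs = ["job1", "job2", "job3", "job4"]
print(backfill_iteration(jobs, lambda job: True, bf_max_job_start=2))
# prints ['job1', 'job2']
```

With the cap set to 2, the remaining jobs stay pending until a later iteration even though they could have started.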

Feb 27, 2014
- Add SLURM_STEP_ID to Prolog environment · 34da34fd
  Morris Jette authored 11 years ago
  
  bug 607
  34da34fd
- NRT - Fix to supply correct error messages to poe/pmd when a launch fails. · b0460a3d
  Danny Auble authored 11 years ago
  
  b0460a3d
- MYSQL - Fix when updating QOS on an association. · 324a089e
  Danny Auble authored 11 years ago
  
  324a089e
Feb 26, 2014
- MYSQL - Fixed memory leak when querying clusters · c63eb7e6
  Danny Auble authored 11 years ago
  
  c63eb7e6
- Fixed minor memory leak in backfill scheduler. · 911beb99
  Danny Auble authored 11 years ago
  
  911beb99
Feb 25, 2014
- BGQ - Fix minor memory leak when selecting blocks that can't immediately be · 1330681b
  Danny Auble authored 11 years ago
  
  placed.
  1330681b
- Update sacct man page description of job states. · 57b3f0c4
  David Bigagli authored 11 years ago
  
  57b3f0c4
Feb 21, 2014
- Fix minor memory leak when adding a reservation with a nodelist and core · 950231cc
  Danny Auble authored 11 years ago
  
  count.
  950231cc
- Fix minor memory leak when updating a reservation on a partition using "ALL" · d0ab7ba8
  Danny Auble authored 11 years ago
  
  nodes.
  d0ab7ba8
- Fix minor memory leak when updating a job's name. · 7211ce17
  Danny Auble authored 11 years ago
  
  7211ce17
Feb 20, 2014

Scheduling fix for jobs with specific nodes required · eafc0a4f

Morris Jette authored 11 years ago

If a job requires specific nodes and cannot run due to those nodes being
busy, the main scheduling loop will block those specific nodes rather than
the entire queue/partition.
bug 595

eafc0a4f
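
The behavior described in this commit message can be sketched roughly as follows. This is a simplified model, not the slurmctld scheduling loop: the `schedule` function, the job dictionaries, and the node names are all hypothetical.

```python
def schedule(jobs, idle_nodes):
    """Simulate one pass over jobs given in priority order.

    Each job is a dict with 'name', 'required_nodes' (a set, possibly empty)
    and 'node_count'. All names here are illustrative.
    """
    idle = set(idle_nodes)
    blocked = set()  # nodes held back for a higher-priority waiting job
    started = []
    for job in jobs:
        required = job["required_nodes"]
        if required and not required <= idle:
            # Some required node is busy: block only these nodes,
            # not the whole partition.
            blocked |= required
            continue
        usable = idle - blocked
        if required:
            if required & blocked:
                continue
            chosen = set(required)
        else:
            if len(usable) < job["node_count"]:
                continue
            chosen = set(sorted(usable)[: job["node_count"]])
        idle -= chosen
        started.append((job["name"], sorted(chosen)))
    return started


jobs = [
    {"name": "A", "required_nodes": {"n1", "n2"}, "node_count": 2},
    {"name": "B", "required_nodes": set(), "node_count": 1},
]
# n1 is busy, so only n2 and n3 are idle.
print(schedule(jobs, ["n2", "n3"]))
# prints [('B', ['n3'])]
```

Job A cannot start because n1 is busy, so n1 and n2 are held for it, while job B still starts on n3 instead of waiting behind A.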

Fix typo in NEWS · 5db06c04
Morris Jette authored 11 years ago

5db06c04

Feb 19, 2014
- Fix slurmctld core dump when a job gets its qos updated but there · 2e0f8062
  David Bigagli authored 11 years ago
  
  is not a corresponding association.
  2e0f8062
- Fix slurmctld core dump when a job gets its qos updated but there · 72a53d4f
  David Bigagli authored 11 years ago
  
  is not a corresponding association.
  72a53d4f
Feb 14, 2014
- Update srun.1 man page documenting the PMI2 support. · 2c83c36e
  David Bigagli authored 11 years ago
  
  2c83c36e
- Fix issue where if using munge and munge wasn't running and a slurmd · ddc0b5c3
  Danny Auble authored 11 years ago
  
  needed to forward a message the slurmd would core dump.
  ddc0b5c3
Feb 13, 2014
- Correct the slurm.conf man pages and checkpoint_blcr.html page · b5a79c9f
  David Bigagli authored 11 years ago
  
  describing that jobs must be drained from the cluster before deploying any checkpoint plugin.
  b5a79c9f
Feb 12, 2014

enforce cpus-per-task with mem-per-cpu option · cf367bb0

Morris Jette authored 11 years ago

Properly enforce a job's cpus-per-task option when a job's allocation is
constrained on some nodes by the mem-per-cpu option.
bug 590

cf367bb0
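
One plausible reading of this interaction, shown as a hedged sketch: when memory limits how many CPUs a node can offer, the usable count is rounded down to a whole number of tasks. The function `usable_cpus` and its parameters are hypothetical and do not come from the Slurm source.

```python
def usable_cpus(free_cpus, free_mem_mb, mem_per_cpu_mb, cpus_per_task):
    """CPUs a node can contribute, in whole multiples of cpus_per_task."""
    by_memory = free_mem_mb // mem_per_cpu_mb  # CPUs the free memory can back
    cpus = min(free_cpus, by_memory)
    return cpus - cpus % cpus_per_task  # drop CPUs that cannot form a full task


# 8 free CPUs, but memory only backs 7 of them; with 4 CPUs per task
# only one complete task (4 CPUs) fits on this node.
print(usable_cpus(free_cpus=8, free_mem_mb=7000, mem_per_cpu_mb=1000, cpus_per_task=4))
# prints 4
```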

Feb 10, 2014
- Updates for start of v2.6.7 work · 2c9b35c3
  Morris Jette authored 11 years ago
  
  2c9b35c3
Feb 09, 2014
- CRAY - fix memory leak when using accelerators · ac64f883
  Moe Jette authored 11 years ago
  
  ac64f883
Feb 08, 2014
- replace old commit note erroneously taken out · 099f00ab
  Danny Auble authored 11 years ago
  
  099f00ab
- CRAY - fix issue with using CR_ONE_TASK_PER_CORE · fbb37db1
  Danny Auble authored 11 years ago
  
  fbb37db1
Feb 07, 2014
- Properly enforce GrpSubmit limit for job arrays. · 9469053d
  Morris Jette authored 11 years ago
  
  bug 586
  9469053d
Feb 05, 2014
- take back 2.6.6 · 9f97c2e9
  Danny Auble authored 11 years ago
  
  9f97c2e9
- Added support for selecting AMD GPU · f728ee8e
  Dominik Bartkiewicz authored 11 years ago
  
  Set GPU_DEVICE_ORDINAL environment variable.
  f728ee8e
- new news file · c59fa258
  Danny Auble authored 11 years ago
  
  c59fa258
Feb 04, 2014
- Fix to reserving all nodes in partition · c4c462a2
  Morris Jette authored 11 years ago
  
  Previous logic would try to pick a specific node count, and on a heterogeneous system this would cause a problem. This change largely reverts commit a270417b.
  c4c462a2
- Retry task exit message from slurmstepd to srun on message timeout. · 2ccda7f2
  Danny Auble authored 11 years ago
  
  2ccda7f2
Feb 03, 2014
- Update documentation about QOS limits · f9cfa21a
  Danny Auble authored 11 years ago
  
  f9cfa21a
Jan 31, 2014

Removed obsolete slurm_terminate_job() API. · 31d409b7
David Bigagli authored 11 years ago

31d409b7

Make sure node limits get assessed if no node count was given in request. · 5b0f9c39

Danny Auble authored 11 years ago

For example, salloc -n32 doesn't request a number of nodes. With the previous
code, if this request used 4 nodes and only 1 was left in GrpNodes, it
would just run with no issue, since the limits were checked before the
number of nodes was selected.

The limits are now checked afterwards, so the limits on nodes, cpus and
memory are always checked against what is actually to be used.

5b0f9c39
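
The ordering problem described above can be illustrated with a small sketch. The helper names, the 8 CPUs per node, and the limit of 8 nodes with 7 already in use are assumptions chosen to reproduce the "4 nodes needed, 1 left" case; none of this is the actual slurmctld code.

```python
import math


def select_node_count(ntasks, cpus_per_node):
    """Nodes the request ends up using once node selection has run."""
    return math.ceil(ntasks / cpus_per_node)


def check_grp_nodes(node_count, nodes_in_use, grp_nodes):
    """True if the request fits under the GrpNodes limit."""
    if node_count is None:
        return True  # no node count known yet, so nothing to compare
    return nodes_in_use + node_count <= grp_nodes


# salloc -n32 gives no node count, so a check before selection passes.
before = check_grp_nodes(None, nodes_in_use=7, grp_nodes=8)

# After selection the request is known to need 4 nodes, and the check fails.
selected = select_node_count(ntasks=32, cpus_per_node=8)
after = check_grp_nodes(selected, nodes_in_use=7, grp_nodes=8)

print(before, selected, after)
# prints True 4 False
```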

Fix step allocation failure due to memory use · 8b76b93c

Morris Jette authored 11 years ago

Fix step allocation when some CPUs are not available due to memory limits.
This happens when one step is active and using memory that blocks the
scheduling of another step on a portion of the CPUs needed. The new step
is now delayed rather than aborting with "Requested node configuration is
not available".
bug 577

8b76b93c
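
The distinction between delaying and aborting can be sketched as below. This is a simplified single-node model with hypothetical names (`step_decision` and its parameters); the real step allocation logic considers far more state.

```python
def step_decision(cpus_needed, total_cpus, total_mem_mb, mem_per_cpu_mb, mem_in_use_mb):
    """Return 'start', 'delay' or 'abort' for a new step on one node."""
    mem_needed = cpus_needed * mem_per_cpu_mb
    if cpus_needed > total_cpus or mem_needed > total_mem_mb:
        return "abort"  # the node could never satisfy this step
    if mem_needed > total_mem_mb - mem_in_use_mb:
        return "delay"  # blocked only by memory held by an active step
    return "start"


# An active step holds 6000 MB, leaving 2000 MB; the new step needs 4000 MB.
print(step_decision(cpus_needed=4, total_cpus=8, total_mem_mb=8000,
                    mem_per_cpu_mb=1000, mem_in_use_mb=6000))
# prints delay
```

The step is only rejected outright when the node could not run it even with nothing else active.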

Jan 28, 2014
- BLUEGENE - If IONodesPerMP changes in bluegene.conf recalculate bitmaps · ee3844aa
  Danny Auble authored 11 years ago
  
  based on ionode count correctly on slurmctld restart.
  ee3844aa
Jan 23, 2014
- MYSQL - If starting the plugin and the database isn't up attempt to · 9ef64da7
  Danny Auble authored 11 years ago
  
  connect in a loop instead of producing a fatal.
  9ef64da7
- Fix purging of old reservation errors in database. · 8961739e
  Danny Auble authored 11 years ago
  
  8961739e
Jan 21, 2014
- Don't allow PMI_TIME to be zero, which would cause a floating point exception. · 9f19dc67
  David Bigagli authored 11 years ago
  
  9f19dc67
- Revert "Increase the PW_BUF_SIZE so the getgrnam_r() can process large user" · 570c36bb
  David Bigagli authored 11 years ago
  
  This reverts commit 2fa28eb6. Conflicts: NEWS
  570c36bb
Jan 18, 2014

Fix the acct_gather_filesystem_lustre.c to compute the Lustre accounting · ab375c5a

David Bigagli authored 11 years ago

data correctly accumulating differences between sampling intervals.
Fix the data structure mismatch between acct_gather_filesystem_lustre.c
and slurm_jobacct_gather.h which caused the hdf5 plugin to log incorrect
data.

ab375c5a
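
Accumulating differences between sampling intervals can be sketched as follows, assuming the sampled values are cumulative counters (an assumption of this sketch; the function name `accumulate_deltas` is hypothetical and not taken from the plugin).

```python
def accumulate_deltas(samples):
    """Sum the increase between consecutive samples of a cumulative counter."""
    total = 0
    previous = None
    for value in samples:
        if previous is not None:
            # Ignore negative jumps, e.g. after a counter reset.
            total += max(value - previous, 0)
        previous = value
    return total


# Counter readings at four sampling points: increases of 50, 0 and 250.
print(accumulate_deltas([100, 150, 150, 400]))
# prints 300
```

Summing the raw readings instead (800 in this example) would overstate the usage during the sampled period.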

Jan 16, 2014
- Correct the documentation to read filesystem instead of Lustre. Update · a08d9ae3
  David Bigagli authored 11 years ago
  
  the srun help.
  a08d9ae3