  1. Mar 17, 2016
      Prevent uid update from corrupting assoc_hash table. · 60b58b70
      Tim Wickberg authored
      The uid is used as part of the hash function, so the old reference
      must be removed and the hash recalculated whenever the uid may change.
      Otherwise _delete_assoc_hash will not find the entry again when the
      association is removed, causing slurmctld to segfault.
      
      Bug 2560.
      60b58b70
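
The failure mode described in this commit can be illustrated with a minimal sketch. This is not Slurm's actual code: the `assoc_t` struct, the `bucket_of` hash, the table size, and the helper names are all hypothetical, chosen only to show why a record must be unlinked before a field that feeds the hash is changed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_SIZE 8  /* hypothetical bucket count */

/* Hypothetical, simplified association record; not Slurm's real struct. */
typedef struct assoc {
    uint32_t id;
    uint32_t uid;
    struct assoc *next;  /* chain within one bucket */
} assoc_t;

static assoc_t *table[HASH_SIZE];

/* The bucket depends on uid, so changing uid changes the bucket. */
static unsigned bucket_of(const assoc_t *a)
{
    return (a->id + a->uid) % HASH_SIZE;
}

static void hash_add(assoc_t *a)
{
    unsigned b = bucket_of(a);
    a->next = table[b];
    table[b] = a;
}

/* Returns 1 if the record was found and unlinked, 0 otherwise. */
static int hash_remove(assoc_t *a)
{
    assoc_t **pp = &table[bucket_of(a)];
    for (; *pp; pp = &(*pp)->next) {
        if (*pp == a) {
            *pp = a->next;
            a->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* Safe update: unlink under the old key, change uid, relink under the new key. */
static void update_uid(assoc_t *a, uint32_t new_uid)
{
    hash_remove(a);
    a->uid = new_uid;
    hash_add(a);
}
```

If `uid` is assigned directly while the record is still linked, a later `hash_remove` looks in the wrong bucket and returns 0, leaving a stale pointer in the table. Once the record is freed, that stale pointer is the kind of dangling reference that can lead to a crash.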
  2. Mar 11, 2016
  3. Feb 24, 2016
  4. Jan 14, 2016
  5. Jan 07, 2016
  6. Jan 05, 2016
  7. Dec 31, 2015
  8. Dec 15, 2015
  9. Nov 25, 2015
  10. Nov 16, 2015
  11. Nov 13, 2015
  12. Nov 04, 2015
  13. Oct 22, 2015
  14. Oct 19, 2015
  15. Oct 09, 2015
  16. Oct 07, 2015
  17. Oct 06, 2015
  18. Oct 05, 2015
  19. Oct 03, 2015
  20. Oct 02, 2015
      Don't mark powered down node as not responding · 8c03a8bc
      Morris Jette authored
      This will only happen if a PING RPC for the node is already queued
        when the decision is made to power it down, then fails to get
        a response for the ping (since the node is already down).
      bug 1995
      8c03a8bc
  21. Sep 30, 2015
      Reset job CPU count if CPUs/task ratio increased for mem limit · 836912bf
      Morris Jette authored
      If a job's CPUs/task ratio is increased due to configured MaxMemPerCPU,
      then increase its allocated CPU count in order to enforce CPU limits.
      Previous logic would increase/set the cpus_per_task as needed if a
      job's --mem-per-cpu was above the configured MaxMemPerCPU, but NOT
      increase the min_cpus or max_cpus variables. This resulted in allocating
      the wrong CPU count.
      836912bf
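
The arithmetic behind this fix can be sketched as follows. This is an illustrative model, not Slurm's implementation: the `job_req_t` struct and `apply_max_mem_per_cpu` function are hypothetical, and the sketch assumes that `min_cpus` and `max_cpus` are multiples of the original `cpus_per_task`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified job request; field names echo the commit text. */
typedef struct {
    uint32_t cpus_per_task;
    uint32_t min_cpus;
    uint32_t max_cpus;
    uint64_t mem_per_cpu;  /* MB */
} job_req_t;

/*
 * If the requested memory per CPU exceeds the limit, spread each task's
 * memory over more CPUs, then scale the CPU counts by the same ratio.
 * The scaling of min_cpus/max_cpus is the step the commit says was missing.
 */
static void apply_max_mem_per_cpu(job_req_t *job, uint64_t max_mem_per_cpu)
{
    if (max_mem_per_cpu == 0 || job->mem_per_cpu <= max_mem_per_cpu)
        return;  /* no limit configured, or request already within it */

    uint32_t old_cpt = job->cpus_per_task ? job->cpus_per_task : 1;
    uint64_t mem_per_task = job->mem_per_cpu * old_cpt;

    /* Ceiling division: fewest CPUs per task that satisfy the limit. */
    uint32_t new_cpt =
        (uint32_t)((mem_per_task + max_mem_per_cpu - 1) / max_mem_per_cpu);

    job->cpus_per_task = new_cpt;
    /* Round up so the task still gets at least the memory it asked for. */
    job->mem_per_cpu = (mem_per_task + new_cpt - 1) / new_cpt;

    /* Assumes the CPU counts are multiples of old_cpt (see lead-in). */
    job->min_cpus = job->min_cpus / old_cpt * new_cpt;
    job->max_cpus = job->max_cpus / old_cpt * new_cpt;
}
```

For example, a request for 4 tasks with 1 CPU each and 4096 MB per CPU, under a 2048 MB limit, ends up with 2 CPUs per task and 8 CPUs in total. Without the last two assignments the job would still be counted as using only 4 CPUs.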
      Enable srun -I to use pending step logic. · 0bf0e71f
      Brian Christiansen authored
      Continuation of 1252d1a1
      Bug 1938
      0bf0e71f
      Don't start duplicate batch job · c1513956
      Morris Jette authored
      Requeue/hold batch job launch request if job already running. This is
        possible if a node went to DOWN state, but jobs remained active.
      In addition, if a prolog/epilog failed, DRAIN the node rather than
        setting it DOWN, which could kill jobs that could otherwise
        continue to run.
      bug 1985
      c1513956
  22. Sep 29, 2015
  23. Sep 28, 2015
      Fix for node state when shrinking jobs · 6c9d4540
      Morris Jette authored
      When nodes have been allocated to a job and then released by the
        job while resizing, this patch prevents the nodes from continuing
        to appear allocated and unavailable to other jobs. Requires
        exclusive node allocation to trigger. This prevents the previously
        reported failure, but a proper fix will be quite complex and is
        deferred to the next major release of Slurm (v 16.05).
      bug 1851
      6c9d4540
  24. Sep 23, 2015
  25. Sep 22, 2015
  26. Sep 21, 2015