  1. Mar 09, 2016
    • cray job requeue bug · fec5e03b
      Morris Jette authored
      Fix Cray NHC spawning on job requeue. Previous logic would leave nodes
      allocated to a requeued job unusable after the job terminated.
      
      Specifically, each job has a "cleaning/cleaned" flag. When a job
      terminates, the flag is set to "cleaning"; once the job's node health
      check (NHC) completes, it is set to "cleaned". If the job is requeued,
      then on its second (or subsequent) termination the select/cray plugin
      is again called to launch the NHC. The plugin sees the "cleaned" flag
      already set, logs:
      error: select_p_job_fini: Cleaned flag already set for job 1283858, this should never happen
      and returns without launching the NHC. Because completion of the
      job's NHC is what triggers the release of job resources (CPUs, memory,
      and GRES), those resources are never released for use by other jobs.
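
The failure sequence described above can be illustrated with a small model. This is only a sketch in Python, not the select/cray plugin's C code: the `Job` class, the `Cleaning` enum, and the functions `start`, `job_fini`, `nhc_complete`, `requeue`, and `run_twice` are all hypothetical names invented for illustration. The `reset_flag` option shows one possible remedy (clearing the flag on requeue); that is an assumption, and the actual commit may fix the problem differently.

```python
from dataclasses import dataclass, field
from enum import Enum


class Cleaning(Enum):
    # Hypothetical model of the per-job "cleaning/cleaned" flag.
    NOT_STARTED = 0
    CLEANING = 1   # job terminated, NHC launched and still running
    CLEANED = 2    # NHC completed


@dataclass
class Job:
    job_id: int
    cleaning: Cleaning = Cleaning.NOT_STARTED
    resources_held: bool = False
    log: list = field(default_factory=list)


def start(job):
    # The job runs and holds CPUs, memory, and GRES.
    job.resources_held = True


def job_fini(job):
    """Model of the termination hook. Returns True if the NHC was launched."""
    if job.cleaning is Cleaning.CLEANED:
        # Old behaviour: log an error and return without launching the NHC.
        job.log.append(f"error: Cleaned flag already set for job {job.job_id}")
        return False
    job.cleaning = Cleaning.CLEANING
    return True


def nhc_complete(job):
    # NHC completion marks the job cleaned and releases its resources.
    job.cleaning = Cleaning.CLEANED
    job.resources_held = False


def requeue(job, reset_flag):
    # Assumed remedy (not confirmed by the commit): clear the flag on requeue.
    if reset_flag:
        job.cleaning = Cleaning.NOT_STARTED


def run_twice(reset_flag):
    """Run a job, requeue it, run it again, and return the final job state."""
    job = Job(job_id=1)
    for attempt in range(2):
        start(job)
        if job_fini(job):
            nhc_complete(job)
        if attempt == 0:
            requeue(job, reset_flag)
    return job
```

With `reset_flag=False` the second termination takes the error path, the NHC is never launched, and `resources_held` stays true, which mirrors the stuck nodes described in the commit message. With `reset_flag=True` the second termination launches the NHC and the resources are released.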
      
      Bug 2384