  1. Mar 11, 2016
  2. Mar 10, 2016
  3. Mar 09, 2016
      cray job requeue bug · fec5e03b
      Morris Jette authored
      Fix Cray NHC spawning on job requeue. Previous logic would leave nodes
      allocated to a requeued job as non-usable on job termination.
      
      Specifically, each job has a "cleaning/cleaned" flag. Once a job
      terminates, the cleaning flag is set, then after the job node health
      check completes, the value gets set to cleaned. If the job is requeued,
      on its second (or subsequent) termination, the select/cray plugin
      is called to launch the NHC. The plugin sees the "cleaned" flag
      already set, so it logs:
      error: select_p_job_fini: Cleaned flag already set for job 1283858, this should never happen
      and returns, never launching the NHC. Since the termination of the
      job NHC triggers releasing job resources (CPUs, memory, and GRES),
      those resources are never released for use by other jobs.
      
      Bug 2384
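The failure described in this commit message can be modeled as a small state machine. The sketch below is illustrative only: the type and function names (`clean_state_t`, `job_fini`, `nhc_complete`, `job_requeue`) are hypothetical and are not taken from the select/cray plugin, and resetting the flag on requeue is just one plausible way to avoid the problem, not necessarily what the commit does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-job cleanup state; names are illustrative, not Slurm's. */
typedef enum {
    CLEAN_NONE,     /* job has not terminated yet */
    CLEAN_CLEANING, /* job terminated, node health check (NHC) running */
    CLEAN_CLEANED   /* NHC finished, resources released */
} clean_state_t;

typedef struct {
    unsigned job_id;
    clean_state_t state;
    bool resources_held; /* CPUs, memory, GRES still allocated */
} job_t;

/* Called on job termination. Returns true if the NHC was launched. */
static bool job_fini(job_t *job)
{
    assert(job != NULL);
    if (job->state == CLEAN_CLEANED) {
        /* Stale flag from an earlier run: the NHC is never launched. */
        fprintf(stderr, "error: cleaned flag already set for job %u\n",
                job->job_id);
        return false;
    }
    job->state = CLEAN_CLEANING;
    return true;
}

/* NHC completion is what releases the job's resources. */
static void nhc_complete(job_t *job)
{
    job->state = CLEAN_CLEANED;
    job->resources_held = false;
}

/* Requeue: the job runs again and holds resources again.
 * reset_flag selects whether the cleanup state is cleared (assumed fix). */
static void job_requeue(job_t *job, bool reset_flag)
{
    job->resources_held = true;
    if (reset_flag)
        job->state = CLEAN_NONE;
}

/* Runs terminate -> requeue -> terminate.
 * Returns true if resources are still held at the end (leaked). */
static bool simulate_leak(bool reset_flag)
{
    job_t job = { 1283858, CLEAN_NONE, true };

    if (job_fini(&job))
        nhc_complete(&job);
    job_requeue(&job, reset_flag);
    if (job_fini(&job))
        nhc_complete(&job);
    return job.resources_held;
}
```

Without the reset, the second termination hits the stale "cleaned" state and the resources stay allocated; with the reset, the second NHC runs and releases them.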
      Correctly parse nids in slurmconfgen_smw.py · 88ccc111
      David Gloe authored
      An error in slurmconfgen_smw.py caused it to parse the nic as the nid.
      On some systems those values differ, causing the generated slurm.conf file to
      be incorrect.
      
      Bug 2532.
  4. Mar 08, 2016
  5. Mar 07, 2016
      add additional tuning notes for mysql/mariadb · 49dc5d8d
      Tim Wickberg authored
      In particular, it seems that MariaDB has lowered the default for
      innodb_lock_wait_timeout, which can cause issues for the various
      rollup processes on systems with high job counts.
  6. Mar 05, 2016
  7. Mar 04, 2016
  8. Mar 03, 2016
  9. Mar 02, 2016
  10. Mar 01, 2016
  11. Feb 29, 2016
  12. Feb 26, 2016
  13. Feb 25, 2016
      Add missing definition for val_to_char() · 344c74fc
      Tim Wickberg authored
      Since the function is inlined, the single definition let GCC build everything
      properly, but debug builds (which disable inlining) resulted in:
      slurmstepd: [465.0]: symbol lookup error:
      (trimmed path)/task_cgroup.so: undefined symbol: val_to_char
      when running srun --cpu_bind=v.
      
      task/affinity had this definition already, task/cgroup didn't.
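The symbol lookup error above is typical of an inline function that has no out-of-line definition available once the compiler stops inlining. A minimal sketch of a definition that resolves in both optimized and debug builds follows. It assumes `val_to_char()` maps a value in 0..15 to a lowercase hex digit and returns `'\0'` for anything else; that behavior, the `static inline` linkage, and the `byte_to_hex()` helper are assumptions for illustration, not the definition added by this commit.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative definition (assumed behavior): convert 0..15 to a
 * lowercase hex digit, '\0' for out-of-range input.
 *
 * `static inline` gives every translation unit that includes this
 * definition its own copy, so the symbol resolves even when the
 * compiler does not inline the call (for example at -O0).
 */
static inline char val_to_char(int v)
{
    if (v >= 0 && v < 10)
        return (char)('0' + v);
    if (v >= 10 && v < 16)
        return (char)('a' + (v - 10));
    return '\0';
}

/* Hypothetical helper: format one byte as two hex digits plus a NUL. */
static void byte_to_hex(unsigned char b, char out[3])
{
    assert(out != NULL);
    out[0] = val_to_char(b >> 4);   /* high nibble */
    out[1] = val_to_char(b & 0x0f); /* low nibble */
    out[2] = '\0';
}
```

A plugin that carries its own copy of the helper in this form does not depend on another object file exporting the symbol.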
      Fix for uninitialized memory · c0509864
      Morris Jette authored
      Reported by valgrind running test7.2, but shouldn't cause any real problem