  1. Jan 14, 2013
    • Prevent srun abort on task launch failure · 163d9547
      Hongjia Cao authored
      On job step launch failure, the function
      "slurm_step_launch_wait_finish()" is called twice in launch/slurm,
      which causes srun to abort:
      
      srun: error: Task launch for 22495.0 failed on node cn6: Job credential expired
      srun: error: Application launch failed: Job credential expired
      srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
      cn5
      cn4
      cn7
      srun: error: Timed out waiting for job step to complete
      srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
      srun: error: Timed out waiting for job step to complete
      srun: bitstring.c:174: bit_test: Assertion `(b) != ((void *)0)' failed.
      Aborted (core dumped)
      
      The attached patch (version 2.5.1) fixes the abort, but the messages
      "
      Job step aborted: Waiting up to 2 seconds for job step to finish.
      Timed out waiting for job step to complete
      "
      will still be printed twice.
    • Add debugging hint to MPI guide for MPICH2 · dd8c22c7
      Morris Jette authored
    • Fix bug in accounting_storage/pgsql · 667cbf15
      Yair Yarom authored
    • 08cfbf0a
      Morris Jette authored
    • Revision of gres topology bug fix · e9c216c4
      Morris Jette authored
  2. Jan 11, 2013
  3. Jan 10, 2013
  4. Jan 09, 2013
  5. Jan 08, 2013
  6. Jan 03, 2013
  7. Dec 28, 2012