-
Morris Jette authored
Treat Cray MPI job calling exit() without mpi_fini() as fatal error for that specific task and let srun handle all timeout logic. Previous logic would cancel the entire job step and srun options for wait time and kill on exit were ignored. The new logic provides users with the following type of response: $ srun -n3 -K0 -N3 --wait=60 ./tmp Task:0 Cycle:1 Task:2 Cycle:1 Task:1 Cycle:1 Task:0 Cycle:2 Task:2 Cycle:2 slurmstepd: step 14927.0 task 1 exited without calling mpi_fini() srun: error: tux2: task 1: Killed Task:0 Cycle:3 Task:2 Cycle:3 Task:0 Cycle:4 ... bug 1171
Morris Jette authoredTreat Cray MPI job calling exit() without mpi_fini() as fatal error for that specific task and let srun handle all timeout logic. Previous logic would cancel the entire job step and srun options for wait time and kill on exit were ignored. The new logic provides users with the following type of response: $ srun -n3 -K0 -N3 --wait=60 ./tmp Task:0 Cycle:1 Task:2 Cycle:1 Task:1 Cycle:1 Task:0 Cycle:2 Task:2 Cycle:2 slurmstepd: step 14927.0 task 1 exited without calling mpi_fini() srun: error: tux2: task 1: Killed Task:0 Cycle:3 Task:2 Cycle:3 Task:0 Cycle:4 ... bug 1171