Morris Jette authored
The old logic caused test16.4 to fail intermittently. The failure occurred when the sattach command attached to a job step before the original srun command had received its RESPONSE_LAUNCH_TASKS message. That message would then be sent to the salloc command. Since srun never got the message, it would hang. With this change, the job step is not marked RUNNING until the RESPONSE_LAUNCH_TASKS message has been sent to the original srun, and sattach requests are blocked until that time.
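The ordering described above can be illustrated with a minimal sketch. This is not the actual Slurm code: the names `struct step`, `send_launch_response`, `handle_attach_request`, and the `ATTACH_*` return codes are hypothetical, and the sketch rejects an early attach with a retry code rather than truly blocking the request.

```c
#include <stdbool.h>

/* Hypothetical step states for illustration; not Slurm's real definitions. */
enum step_state { STEP_STARTING, STEP_RUNNING };

/* Hypothetical return codes for an attach attempt. */
enum { ATTACH_OK = 0, ATTACH_RETRY = -1 };

struct step {
    enum step_state state;
    bool launch_resp_sent; /* has the original srun been sent its response? */
};

/*
 * Deliver the launch response to the original srun, and only afterwards
 * mark the step RUNNING. A real implementation would write the
 * RESPONSE_LAUNCH_TASKS message to srun's connection at this point.
 */
static void send_launch_response(struct step *s)
{
    s->launch_resp_sent = true;
    s->state = STEP_RUNNING;
}

/*
 * An attach request is refused until the step is RUNNING, so it cannot
 * intercept the launch response intended for the original srun.
 */
static int handle_attach_request(const struct step *s)
{
    if (s->state != STEP_RUNNING)
        return ATTACH_RETRY;
    return ATTACH_OK;
}
```

Because the state change happens only after the response is sent, an attach request that arrives early sees a step that is not yet RUNNING and has to wait.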
Commit 38089f2b