Commit e1147ea9 authored by Moe Jette

Improve fault-tolerance for batch jobs. If a node fails to respond to the
batch_job_launch RPC, then deallocate those resources and requeue the job.
If a node registers and fails to show a batch job that should have a
script running there (node zero of allocation), then consider the job
complete.
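
The two recovery rules in this message can be illustrated with a small sketch. This is not the code from the commit: `BatchJob`, `JobState`, `on_launch_timeout`, and `on_node_registration` are hypothetical names chosen for illustration, and the model assumes the batch script runs on the first node of the allocation (node zero), as the message describes.

```python
from dataclasses import dataclass
from enum import Enum, auto


class JobState(Enum):
    PENDING = auto()
    RUNNING = auto()
    COMPLETE = auto()


@dataclass
class BatchJob:
    """Hypothetical job record; not the controller's real data structure."""
    job_id: int
    nodes: list[str]
    state: JobState = JobState.RUNNING

    @property
    def script_node(self):
        # Assumption: the batch script runs on node zero of the allocation.
        return self.nodes[0] if self.nodes else None


def on_launch_timeout(job: BatchJob) -> None:
    """Rule 1: no reply to the batch_job_launch RPC, so free the
    allocation and put the job back in the queue."""
    job.nodes = []
    job.state = JobState.PENDING


def on_node_registration(job: BatchJob, node: str, reported_job_ids: set) -> None:
    """Rule 2: node zero registers without reporting the job whose script
    should be running there, so treat the job as complete."""
    if (
        job.state is JobState.RUNNING
        and node == job.script_node
        and job.job_id not in reported_job_ids
    ):
        job.nodes = []
        job.state = JobState.COMPLETE
```

In this sketch a registration from any node other than node zero leaves the job untouched, since only node zero is expected to report the batch script.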
parent 9d351634