job_container/tmpfs: add functionality to restore NSs state after restart (eeec6cd2) · Commits · tud-zih-energy / Slurm

Commit eeec6cd2 authored 3 years ago by Carlos Tripiana Montes Committed by Danny Auble 3 years ago

job_container/tmpfs: add functionality to restore NSs state after restart

container_p_restore now gets the list of running jobs from the spool
dir with stepd_available.

Then it iterates over the basepath entries and, for each one that seems
to have been a mount point (it has a .ns file), tries to mount it again.

If the mount succeeds (it must) and the job for this mount point is
dead, it releases the resources and tries to delete the files. Note that
the removal can fail if a resource has leaked. Such leftovers are
cleaned up when slurmd starts after a HW reboot (no kernel leaks).
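
The scan described above can be sketched roughly as follows. This is not
the code from this commit: the helper names (job_is_running, has_ns_file,
find_stale_job_dirs) are hypothetical, the basepath/<jobid>/.ns layout is
an assumption, and the remount and cleanup steps are left as a comment
because they need root privileges. The sketch only shows how entries
with a .ns file could be matched against the list of running jobs.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: true if job_id is in the list of running jobs. */
static bool job_is_running(const uint32_t *running, size_t n_running,
			   uint32_t job_id)
{
	for (size_t i = 0; i < n_running; i++)
		if (running[i] == job_id)
			return true;
	return false;
}

/* Hypothetical helper: true if <basepath>/<name>/.ns exists (assumed layout). */
static bool has_ns_file(const char *basepath, const char *name)
{
	char path[PATH_MAX];
	struct stat st;
	int len = snprintf(path, sizeof(path), "%s/%s/.ns", basepath, name);

	if (len < 0 || (size_t) len >= sizeof(path))
		return false;
	return stat(path, &st) == 0;
}

/*
 * Walk basepath and collect the job ids of entries that look like former
 * mount points (they have a .ns file) but whose job is no longer running.
 * Returns the number of ids stored in stale[], or -1 if basepath cannot
 * be opened. At most max_stale ids are stored.
 */
static int find_stale_job_dirs(const char *basepath,
			       const uint32_t *running, size_t n_running,
			       uint32_t *stale, size_t max_stale)
{
	DIR *dp;
	struct dirent *ep;
	int count = 0;

	assert(basepath != NULL);

	if (!(dp = opendir(basepath)))
		return -1;

	while ((ep = readdir(dp))) {
		char *end = NULL;
		unsigned long id = strtoul(ep->d_name, &end, 10);

		/* Skip ".", ".." and anything that is not a plain job id. */
		if (end == ep->d_name || *end != '\0')
			continue;
		if (!has_ns_file(basepath, ep->d_name))
			continue;

		/*
		 * A real plugin would try to mount the namespace again here,
		 * before deciding whether to keep it or tear it down.
		 */
		if (job_is_running(running, n_running, (uint32_t) id))
			continue;

		/* Dead job: candidate for releasing resources and removal. */
		if ((size_t) count < max_stale)
			stale[count++] = (uint32_t) id;
	}

	closedir(dp);
	return count;
}
```

In this sketch the caller supplies the running job ids (in the commit
they come from the spool dir via stepd_available) and gets back the ids
whose directories should be released and removed.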

Bug 11093

parent c530b15e


Showing with 95 additions and 0 deletions