job_container/tmpfs: add functionality to restore namespace state after restart
container_p_restore() now gets the list of running jobs from the spool directory using stepd_available(). It then iterates over the basepath entries and, for each one that appears to have been a mount point (i.e. it has a .ns file), tries to mount it again. If the mount succeeds (which it must) and the job owning that mount point is no longer running, it releases the job's resources and tries to delete its files. Note that the removal can fail if a resource has been leaked. Such leftovers would be cleaned up when slurmd starts after a hardware reboot, since no kernel resources can still be leaked at that point.

Bug 11093
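A minimal sketch of the per-entry decision described above, for illustration only. It assumes each job has a directory named by its numeric job ID directly under basepath, with the namespace holder file at <basepath>/<job_id>/.ns, and it takes the running job IDs as a plain array instead of calling stepd_available(). The helpers parse_job_id(), classify_entry() and scan_basepath() are hypothetical, not the plugin's code, and the re-mount and cleanup steps are only counted, not performed.

```c
#define _XOPEN_SOURCE 700 /* POSIX.1-2008: opendir, stat, mkdtemp, PATH_MAX */
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

typedef enum { ENTRY_SKIP, ENTRY_KEEP, ENTRY_CLEANUP } entry_action_t;

/* Hypothetical: treat a purely numeric directory name as a job ID. */
static bool parse_job_id(const char *name, uint32_t *job_id)
{
	char *end = NULL;
	unsigned long val;

	if (name[0] < '0' || name[0] > '9') /* also rejects "", ".", ".." */
		return false;
	errno = 0;
	val = strtoul(name, &end, 10);
	if (errno || *end != '\0' || val > UINT32_MAX)
		return false;
	*job_id = (uint32_t) val;
	return true;
}

/* Stand-in for the list obtained from stepd_available(). */
static bool job_is_running(uint32_t job_id, const uint32_t *running,
			   size_t n_running)
{
	for (size_t i = 0; i < n_running; i++)
		if (running[i] == job_id)
			return true;
	return false;
}

/* Decide what restore would do with one basepath entry. */
static entry_action_t classify_entry(const char *basepath, const char *name,
				     const uint32_t *running, size_t n_running)
{
	char ns_path[PATH_MAX];
	struct stat st;
	uint32_t job_id;

	if (!parse_job_id(name, &job_id))
		return ENTRY_SKIP;
	/* Only entries holding a .ns file were namespace mount points. */
	if (snprintf(ns_path, sizeof(ns_path), "%s/%s/.ns", basepath, name) >=
	    (int) sizeof(ns_path))
		return ENTRY_SKIP;
	if (stat(ns_path, &st) != 0)
		return ENTRY_SKIP;
	/* The real plugin would re-mount here before deciding. */
	return job_is_running(job_id, running, n_running) ?
		ENTRY_KEEP : ENTRY_CLEANUP;
}

/* Walk basepath and count entries that would be kept or cleaned up. */
static int scan_basepath(const char *basepath, const uint32_t *running,
			 size_t n_running, int *n_keep, int *n_cleanup)
{
	DIR *dp;
	struct dirent *ep;

	assert(basepath && n_keep && n_cleanup);
	*n_keep = *n_cleanup = 0;
	if (!(dp = opendir(basepath)))
		return -1;
	while ((ep = readdir(dp))) {
		switch (classify_entry(basepath, ep->d_name, running,
				       n_running)) {
		case ENTRY_KEEP:
			(*n_keep)++;
			break;
		case ENTRY_CLEANUP:
			(*n_cleanup)++;
			break;
		default:
			break;
		}
	}
	closedir(dp);
	return 0;
}
```

Splitting the pure decision (classify_entry) from the directory walk keeps the "is this a stale mount point?" logic testable without root privileges; the privileged re-mount, umount and file removal would hang off the ENTRY_KEEP and ENTRY_CLEANUP branches.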