    slurmctld/power_save - Down nodes if not resumed after ResumeTimeout. · 7d246784
    Alejandro Sanchez authored
    Mark waking nodes DOWN as soon as ResumeTimeout is reached if they are
    still not responding. Otherwise the work is left to ping_nodes(), so
    SlurmdTimeout comes into play and, to the end user, the nodes appear
    stuck in ALLOCATED# and the job in CF state until ping_nodes() marks
    them DOWN and requeues the job.
    
    Bug 4182
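
The check described above might be sketched roughly as follows. This is an illustrative sketch only, not the code from this commit: the `sketch_node` structure, its fields, and `down_unresumed_nodes()` are hypothetical names, and the real slurmctld node record and state flags differ. Requeueing the affected job is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified node record; not the real slurmctld structure. */
enum sketch_state { SKETCH_IDLE, SKETCH_ALLOCATED, SKETCH_DOWN };

struct sketch_node {
    enum sketch_state state;
    bool powering_up;       /* resume requested, node not yet back (assumed to match the '#' suffix) */
    bool responding;        /* node has answered since the resume request */
    time_t resume_started;  /* when the resume request was issued */
};

/*
 * Mark DOWN every node that was asked to resume, has not responded, and
 * has been waking for at least resume_timeout seconds.
 * Returns the number of nodes marked DOWN.
 */
size_t down_unresumed_nodes(struct sketch_node *nodes, size_t count,
                            time_t now, time_t resume_timeout)
{
    size_t downed = 0;

    assert(nodes != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        struct sketch_node *n = &nodes[i];

        /* Only nodes that are still waking and silent are candidates. */
        if (!n->powering_up || n->responding)
            continue;
        /* ResumeTimeout has not been reached yet for this node. */
        if (now - n->resume_started < resume_timeout)
            continue;

        n->state = SKETCH_DOWN;
        n->powering_up = false;
        downed++;
    }
    return downed;
}
```

Running a check like this from the power-save loop, rather than waiting for the periodic ping logic, is what would let the node leave ALLOCATED# without also waiting for SlurmdTimeout.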