    Clear DRAIN on node after failing to resume before ResumeTimeout · 52d17c74
    Brian Christiansen authored
    
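    DRAIN could linger on the node from a previous "reboot asap" request. If the
    node then failed to resume before ResumeTimeout, it was marked DOWN but kept
    the stale DRAIN flag.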
    
    For example:
    $ scontrol show nodes cloud1 | grep State
       State=IDLE+CLOUD+POWERED_DOWN
    $ sbatch --wrap="sleep 30" -pcloud
    Submitted batch job 222134
    $ scontrol show nodes cloud1 | grep State
       State=ALLOCATED+CLOUD+NOT_RESPONDING+POWERING_UP
    $ scontrol reboot asap cloud1
    $ scontrol show nodes cloud1 | grep State
       State=ALLOCATED+CLOUD+DRAIN+REBOOT_REQUESTED+NOT_RESPONDING+POWERING_UP
       NextState=RESUME
    $ scontrol update nodename=cloud1 state=power_down
    $ scontrol show nodes cloud1 | grep State
       State=ALLOCATED+CLOUD+DRAIN+POWER_DOWN+NOT_RESPONDING+POWERING_UP
    $ # ...wait until ResumeTimeout is reached...
    $ scontrol show nodes cloud1 | grep State
       State=DOWN+CLOUD+DRAIN+POWER_DOWN+POWERED_DOWN+NOT_RESPONDING
    
    $ scontrol show nodes cloud1 | grep "State\|Reason"
       State=DOWN+CLOUD+DRAIN+POWER_DOWN+POWERED_DOWN+NOT_RESPONDING
       Reason=ResumeTimeout reached [brian@2022-03-28T22:29:50]
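
    To spot nodes still showing this symptom, here is a minimal bash sketch. It
    assumes the State= and Reason= values have already been extracted from
    "scontrol show nodes" output; has_lingering_drain is a hypothetical helper,
    not part of Slurm.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): given the State= and Reason=
# values printed by `scontrol show nodes`, report whether the node shows
# the symptom described above: DRAIN still set after ResumeTimeout.
has_lingering_drain() {
  local state="$1" reason="$2"
  # State flags are joined with '+'; wrap in '+' so only the whole DRAIN flag matches.
  [[ "+${state}+" == *"+DRAIN+"* ]] || return 1
  # The reason set on resume failure starts with "ResumeTimeout".
  [[ "$reason" == ResumeTimeout* ]]
}

# Values copied from the transcript above.
if has_lingering_drain "DOWN+CLOUD+DRAIN+POWER_DOWN+POWERED_DOWN+NOT_RESPONDING" \
     "ResumeTimeout reached [brian@2022-03-28T22:29:50]"; then
  echo "lingering DRAIN"   # prints "lingering DRAIN"
fi
```

    Matching on whole '+'-delimited flags avoids false positives from other
    state names that merely contain the substring DRAIN.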
    
    Bug 13515
    
    Signed-off-by: Skyler Malinowski <malinowski@schedmd.com>