Commit d407b4dd authored by Morris Jette

Clarify node state configuration in slurm.conf

parent 25b346b2
@@ -2998,20 +2998,54 @@ The default value is 1.
 .TP
 \fBState\fR
 State of the node with respect to the initiation of user jobs.
-Acceptable values are "DOWN", "DRAIN", "FAIL", "FAILING" and "UNKNOWN".
-"DOWN" indicates the node failed and is unavailable to be allocated work.
-"DRAIN" indicates the node is unavailable to be allocated work.
-"FAIL" indicates the node is expected to fail soon, has
+Acceptable values are "CLOUD", "DOWN", "DRAIN", "FAIL", "FAILING", "FUTURE"
+and "UNKNOWN".
+Node states of "BUSY" and "IDLE" should not be specified in the node
+configuration, but set the node state to "UNKNOWN" instead.
+Setting the node state to "UNKNOWN" will result in the node state being set to
+"BUSY", "IDLE" or other appropriate state based upon recovered system state
+information.
+The default value is "UNKNOWN".
+Also see the \fBDownNodes\fR parameter below.
+.RS
+.TP 10
+\fBCLOUD\fP
+Indicates the node exists in the cloud.
+It's initial state will be treated as powered down.
+The node will be available for use after it's state is recovered from SLURM's
+state save file or the slurmd daemon starts on the compute node.
+.TP
+\fBDOWN\fP
+Indicates the node failed and is unavailable to be allocated work.
+.TP
+\fBDRAIN\fP
+Indicates the node is unavailable to be allocated work.on.
+.TP
+\fBFAIL\fP
+Indicates the node is expected to fail soon, has
 no jobs allocated to it, and will not be allocated
 to any new jobs.
-"FAILING" indicates the node is expected to fail soon, has
+.TP
+\fBFAILING\fP
+Indicates the node is expected to fail soon, has
 one or more jobs allocated to it, but will not be allocated
 to any new jobs.
-"UNKNOWN" indicates the node's state is undefined (BUSY or IDLE),
+.TP
+\fBFUTURE\fP
+Indicates the node is defined for future use and need not
+exist when the SLURM daemons are started. These nodes can be made available
+for use simply by updating the node state using the scontrol command rather
+than restarting the slurmctld daemon. After these nodes are made available,
+change their \fRState\fR in the slurm.conf file. Until these nodes are made
+available, they will not be seen using any SLURM commands or nor will
+any attempt be made to contact them.
+.TP
+\fBUNKNOWN\fP
+Indicates the node's state is undefined (BUSY or IDLE),
 but will be established when the \fBslurmd\fR daemon on that node
 registers.
 The default value is "UNKNOWN".
-Also see the \fBDownNodes\fR parameter below.
+.RE
 .TP
 \fBThreadsPerCore\fR
@@ -3085,16 +3119,15 @@ Identifies the reason for a node being in state "DOWN", "DRAIN",
 .TP
 \fBState\fR
 State of the node with respect to the initiation of user jobs.
-Acceptable values are "BUSY", "DOWN", "DRAIN", "FAIL",
-"FAILING, "IDLE", and "UNKNOWN".
+Acceptable values are "DOWN", "DRAIN", "FAIL", "FAILING" and "UNKNOWN".
+Node states of "BUSY" and "IDLE" should not be specified in the node
+configuration, but set the node state to "UNKNOWN" instead.
+Setting the node state to "UNKNOWN" will result in the node state being set to
+"BUSY", "IDLE" or other appropriate state based upon recovered system state
+information.
+The default value is "UNKNOWN".
 .RS
 .TP 10
-\fBCLOUD\fP
-Indicates the node exists in the cloud.
-It's initial state will be treated as powered down.
-The node will be available for use after it's state is recovered from SLURM's
-state save file or the slurmd daemon starts on the compute node.
-.TP
 \fBDOWN\fP
 Indicates the node failed and is unavailable to be allocated work.
 .TP
@@ -3111,15 +3144,6 @@ Indicates the node is expected to fail soon, has
 one or more jobs allocated to it, but will not be allocated
 to any new jobs.
 .TP
-\fBFUTURE\fP
-Indicates the node is defined for future use and need not
-exist when the SLURM daemons are started. These nodes can be made available
-for use simply by updating the node state using the scontrol command rather
-than restarting the slurmctld daemon. After these nodes are made available,
-change their \fRState\fR in the slurm.conf file. Until these nodes are made
-available, they will not be seen using any SLURM commands or nor will
-any attempt be made to contact them.
-.TP
 \fBUNKNOWN\fP
 Indicates the node's state is undefined (BUSY or IDLE),
 but will be established when the \fBslurmd\fR daemon on that node
......
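
The State rules documented by this change can be summarized as a small check. What follows is only an illustrative Python sketch, not code from Slurm: `check_state`, the set names, and the `section` argument are hypothetical, and exact uppercase matching is an assumption (Slurm's own parser may behave differently). The accepted values are copied from the new man page text: the node configuration accepts "CLOUD" and "FUTURE" in addition to the values accepted by DownNodes, and "BUSY" or "IDLE" should be written as "UNKNOWN".

```python
# Hypothetical helper, not part of Slurm. The value lists are taken from the
# man page text in this commit; everything else is an assumption.
NODE_STATES = {"CLOUD", "DOWN", "DRAIN", "FAIL", "FAILING", "FUTURE", "UNKNOWN"}
DOWNNODES_STATES = {"DOWN", "DRAIN", "FAIL", "FAILING", "UNKNOWN"}
RUNTIME_ONLY = {"BUSY", "IDLE"}  # established from recovered state, not configured


def check_state(value, section="NodeName"):
    """Return (ok, message) for a State value in the given config section."""
    allowed = NODE_STATES if section == "NodeName" else DOWNNODES_STATES
    if value in allowed:
        return True, "ok"
    if value in RUNTIME_ONLY:
        # The man page says to configure "UNKNOWN" instead of BUSY or IDLE.
        return False, 'use "UNKNOWN" instead'
    return False, "not an acceptable value"
```

For example, `check_state("FUTURE")` accepts the value for a node definition, while `check_state("FUTURE", section="DownNodes")` rejects it, matching the two value lists in the diff.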