diff --git a/NEWS b/NEWS
index 8c03bff1b7688fa6899d289ec41a70e5f4fce5f7..9d6d3ceff8809e91f22cade9a003b2aee00277db 100644
--- a/NEWS
+++ b/NEWS
@@ -1,8 +1,12 @@
 This file describes changes in recent versions of SLURM. It primarily
 documents those changes that are of interest to users and admins.
 
+* Changes in SLURM 1.0.1
+========================
+ -- Assorted updates and clarifications in documentation.
+
 * Changes in SLURM 1.0.0
-=============================
+========================
  -- Fix sinfo filtering bug, especially "sinfo -R" output.
  -- Fix node state change bug, resuming down or drained nodes.
  -- Fix "scontrol show config" to display JobCredentialPrivateKey instead
diff --git a/doc/html/faq.shtml b/doc/html/faq.shtml
index f740c2a5e3077a47eaefcfe8f48d3c781447863d..dfcb24d059a3f42d245c67316c1e28e8a190ff8a 100644
--- a/doc/html/faq.shtml
+++ b/doc/html/faq.shtml
@@ -14,9 +14,10 @@ to run on nodes?</a></li>
 job?</a></li>
 <li><a href="#suspend">How is job suspend/resume useful?</a></li>
 <li><a href="#fast_schedule">How can I configure SLURM to use the resources actually
-found on a node rather than what is defined in <i>slurm.conf</i>?</li>
+found on a node rather than what is defined in <i>slurm.conf</i>?</a></li>
 <li><a href="#return_to_service">Why is a node shown in state DOWN when the node
-has registered for service?</li>
+has registered for service?</a></li>
+<li><a href="#down_node">What happens when a node crashes?</a></li>
 </ol>
 <p><a name="comp"><b>1. Why is my job/node in COMPLETING state?</b></a><br>
 When a job is terminating, both the job and its nodes enter the state "completing."
@@ -186,7 +187,7 @@ Set it's value to zero in order to use the resources actually
 found on each node, but with a higher overhead for scheduling.
 A value of one is the default and results in the node configuration
 defined in <i>slurm.conf</i> being used. See "man slurm.conf"
-for more details.
+for more details.</p>
 
 <p><a name="return_to_service"><b>11. Why is a node shown in state DOWN when the node
 has registered for service?</b></a><br>
@@ -198,8 +199,19 @@ with a valid node configuration.
 A value of zero is the default and results in a node staying DOWN
 until an administrator explicity returns it to service using
 the command "scontrol update NodeName=whatever State=RESUME".
-See "man slurm.conf" and "man scontrol" for more details.
+See "man slurm.conf" and "man scontrol" for more
+details.</p>
+
+<p><a name="down_node"><b>12. What happens when a node crashes?</b></a><br>
+A node is set DOWN when the slurmd daemon on it stops responding
+for <i>SlurmdTimeout</i> as defined in <i>slurm.conf</i>.
+The node can also be set DOWN when certain errors occur or the
+node's configuration is inconsistent with that defined in <i>slurm.conf</i>.
+Any active job on that node will be killed unless it was submitted
+with the srun option <i>--no-kill</i>.
+Any active job step on that node will be killed.
+See the slurm.conf and srun man pages for more information.</p>
 
-<p style="text-align:center;">Last modified 16 January 2006</p>
+<p style="text-align:center;">Last modified 18 January 2006</p>
 
 <!--#include virtual="footer.txt"-->