From 27036a7407175e2e81a73a0c31c67a6a41ae159d Mon Sep 17 00:00:00 2001
From: Moe Jette <jette1@llnl.gov>
Date: Tue, 17 Jan 2006 16:38:21 +0000
Subject: [PATCH] Clarify node state changes with v0.7 state mods (flag for DRAIN).

---
 doc/html/faq.shtml | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/doc/html/faq.shtml b/doc/html/faq.shtml
index ad353c58d6c..af420e6cb6d 100644
--- a/doc/html/faq.shtml
+++ b/doc/html/faq.shtml
@@ -26,8 +26,11 @@ the job and one or more nodes can remain in the completing state for an extended
 period of time. This may be indicative of processes hung waiting for a core file
 to complete I/O or operating system failure. If this state persists, the system
 administrator should use the <span class="commandline">scontrol</span> command
-to change the node's state to "down," reboot the node, then reset the
-node's state to idle.</p>
+to change the node's state to <i>DOWN</i> (e.g. "scontrol update
+NodeName=<i>name</i> State=DOWN Reason=hung_completing"), reboot the node,
+then reset the node's state to IDLE (e.g. "scontrol update
+NodeName=<i>name</i> State=RESUME").</p>
+
 <p><a name="rlimit"><b>2. Why do I see the error "Can't propagate RLIMIT_..."?</b></a><br>
 When the <span class="commandline">srun</span> command executes, it captures the
 resource limits in effect at that time. These limits are propagated to the allocated
@@ -168,6 +171,6 @@ Suspending and resuming a job makes use of the SIGSTOP and SIGCONT signals
 respectively, so swap and disk space should be sufficient to accommodate all
 jobs allocated to a node, either running or suspended.
 
-<p style="text-align:center;">Last modified 22 December 2005</p>
+<p style="text-align:center;">Last modified 16 January 2006</p>
 
 <!--#include virtual="footer.txt"-->
-- 
GitLab