From 98c45d4cbea76a7d76cfd54627b9fdee0417466a Mon Sep 17 00:00:00 2001
From: Moe Jette <jette1@llnl.gov>
Date: Thu, 18 Jan 2007 22:12:19 +0000
Subject: [PATCH] Minor format updates. No change in content.

---
 doc/html/faq.shtml  | 40 +++++++++++++++++++++-------------------
 doc/html/mail.shtml |  4 ++--
 2 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/doc/html/faq.shtml b/doc/html/faq.shtml
index 550df4caa8a..702c1c2aa4a 100644
--- a/doc/html/faq.shtml
+++ b/doc/html/faq.shtml
@@ -111,37 +111,39 @@ The Maui Scheduler</a> or Moab Cluster Suite</a>.
Please refer to its documentation for help. For any scheduler, you can check priorities of jobs
using the command <span class="commandline">scontrol show job</span>.</p>
+
<p><a name="sharing"><b>4. Why does the srun --overcommit option not permit multiple jobs
to run on nodes?</b></a><br>
The <b>--overcommit</b> option is a means of indicating that a job or job step is willing
to execute more than one task per processor in the job's allocation. For example,
consider a cluster of two-processor nodes. The srun command line may be something
-of this sort
+of this sort</p>
<blockquote>
<p><span class="commandline">srun --ntasks=4 --nodes=1 a.out</span></p>
</blockquote>
-This will result in not one, but two nodes being allocated so that each of the four
+<p>This will result in not one, but two nodes being allocated so that each of the four
tasks is given its own processor. Note that the srun <b>--nodes</b> option specifies
-a minimum node count and optionally a maximum node count. A command line of
+a minimum node count and optionally a maximum node count. A command line of</p>
<blockquote>
<p><span class="commandline">srun --ntasks=4 --nodes=1-1 a.out</span></p>
</blockquote>
-would result in the request being rejected. If the <b>--overcommit</b> option
+<p>would result in the request being rejected. If the <b>--overcommit</b> option
is added to either command line, then only one node will be allocated for all
-four tasks to use.
+four tasks to use.</p>
<p>More than one job can execute simultaneously on the same nodes through the use
of srun's <b>--shared</b> option in conjunction with the <b>Shared</b> parameter
in SLURM's partition configuration. See the man pages for srun and slurm.conf for
-more information.
+more information.</p>
+
<p><a name="purge"><b>5. Why is my job killed prematurely?</b></a><br>
SLURM has a job purging mechanism to remove inactive jobs (resource allocations)
before they reach their time limit, which could be infinite.
This inactivity time limit is configurable by the system administrator.
-You can check its value with the command
+You can check its value with the command</p>
<blockquote>
<p><span class="commandline">scontrol show config | grep InactiveLimit</span></p>
</blockquote>
-The value of InactiveLimit is in seconds.
+<p>The value of InactiveLimit is in seconds.
A zero value indicates that job purging is disabled.
A job is considered inactive if it has no active job steps or if the srun
command creating the job is not responding.
@@ -156,11 +158,11 @@ Everything after the command <span class="commandline">srun</span> is
examined to determine if it is a valid option for srun. The first
token that is not a valid option for srun is considered the command
to execute and everything after that is treated as an option to
-the command. For example:
+the command. For example:</p>
<blockquote>
<p><span class="commandline">srun -N2 hostname -pdebug</span></p>
</blockquote>
-srun processes "-N2" as an option to itself. "hostname" is the
+<p>srun processes "-N2" as an option to itself. "hostname" is the
command to execute and "-pdebug" is treated as an option to
the hostname command, which will change the name of the computer on which
SLURM executes the command. Very bad! <b>Don't run
@@ -180,7 +182,7 @@ It was designed to perform backfill node scheduling
for a homogeneous cluster. It does not manage scheduling on individual processors
(or other consumable resources). It also does not update the required or excluded
node list of individual jobs. These are the current limitations. You can use the
-scontrol show command to check if these conditions apply.
+scontrol show command to check if these conditions apply.</p>
<ul>
<li>partition: State=UP</li>
<li>partition: RootOnly=NO</li>
@@ -193,7 +195,7 @@ scontrol show command to check if these conditions apply.
the partition</li>
<li>job: MinProcs or MinNodes not to exceed partition's MaxNodes</li>
</ul>
-As soon as any priority-ordered job in the partition's queue fails to
+<p>As soon as any priority-ordered job in the partition's queue fails to
satisfy the request, no lower priority job in that partition's queue will
be considered as a backfill candidate. Any programmer wishing to augment
the existing code is welcome to do so.
@@ -346,23 +348,23 @@ or access to compute nodes?</b></a><br>
First, enable SLURM's use of PAM by setting <i>UsePAM=1</i> in
<i>slurm.conf</i>.<br>
Second, establish a PAM configuration file for slurm in <i>/etc/pam.d/slurm</i>.
-A basic configuration you might use is:<br>
+A basic configuration you might use is:</p>
<pre>
auth required pam_localuser.so
account required pam_unix.so
session required pam_limits.so
</pre>
-Third, set the desired limits in <i>/etc/security/limits.conf</i>.
-For example, to set the locked memory limit to unlimited for all users:<br>
+<p>Third, set the desired limits in <i>/etc/security/limits.conf</i>.
+For example, to set the locked memory limit to unlimited for all users:</p>
<pre>
* hard memlock unlimited
* soft memlock unlimited
</pre>
-Finally, you need to disable SLURM's forwarding of the limits from the
+<p>Finally, you need to disable SLURM's forwarding of the limits from the
session from which the <i>srun</i> command initiating the job was run. By default
all resource limits are propagated from that session. For example, adding
the following line to <i>slurm.conf</i> will prevent the locked memory
-limit from being propagated: <i>PropagateResourceLimitsExcept=MEMLOCK</i>.
+limit from being propagated: <i>PropagateResourceLimitsExcept=MEMLOCK</i>.</p>
<p>We also have a PAM module for SLURM that prevents users
from logging into nodes that they have not been allocated
(except for user
@@ -395,7 +397,7 @@ partition. You can control the frequency of this ping with the
backup controller?</b></a><br>
If the cluster's computers used for the primary or backup controller
will be out of service for an extended period of time, it may be desirable
-to relocate them. In order to do so, follow this procedure:
+to relocate them. In order to do so, follow this procedure:</p>
<ol>
<li>Stop all SLURM daemons</li>
<li>Modify the <i>ControlMachine</i>, <i>ControlAddr</i>,
@@ -403,7 +405,7 @@ to relocate them. In order to do so, follow this procedure:
<li>Distribute the updated <i>slurm.conf</i> file to all nodes</li>
<li>Restart all SLURM daemons</li>
</ol>
-There should be no loss of any running or pending jobs. Ensure that
+<p>There should be no loss of any running or pending jobs. Ensure that
any nodes added to the cluster have a current <i>slurm.conf</i> file
installed.
<b>CAUTION:</b> If two nodes are simultaneously configured as the primary
diff --git a/doc/html/mail.shtml b/doc/html/mail.shtml
index 78e4a20a266..d86764a727d 100644
--- a/doc/html/mail.shtml
+++ b/doc/html/mail.shtml
@@ -1,14 +1,14 @@
<!--#include virtual="header.txt"-->
<h1>Mailing Lists</h1>
-<p>We maintain two SLURM mailing lists:
+<p>We maintain two SLURM mailing lists:</p>
<ul>
<li><b>slurm-announce</b> is designated for communications about SLURM releases [low traffic].</li>
<li><b>slurm-dev</b> is designated for communications to SLURM developers [high traffic at times].</li>
</ul>
-To subscribe to either list, send a message to
+<p>To subscribe to either list, send a message to
<a href="mailto:majordomo@lists.llnl.gov">majordomo@lists.llnl.gov</a> with
the body of the message containing the word "subscribe" followed by the list
name and your e-mail address (if not the sender). For example: <br>
-- GitLab
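A quick illustration of the --overcommit behavior described in the faq.shtml hunk above; this is a sketch only, combining the options the FAQ itself shows (a.out is the FAQ's placeholder program):

    # Per the FAQ text: this request is rejected, since four tasks will not
    # fit on one two-processor node unless tasks may share processors.
    srun --ntasks=4 --nodes=1-1 a.out

    # With --overcommit added, only one node is allocated and all four tasks
    # share it, running more than one task per processor.
    srun --ntasks=4 --nodes=1-1 --overcommit a.out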