From ac73463ce70cd572ca88a3b5c6ed770b429c210e Mon Sep 17 00:00:00 2001 From: Moe Jette <jette1@llnl.gov> Date: Thu, 27 Jan 2005 20:05:15 +0000 Subject: [PATCH] Various updates to html documents. --- NEWS | 1 + doc/html/faq.html | 25 +++++++++++++-- doc/html/help.html | 8 +++-- doc/html/news.html | 16 +++++----- doc/html/overview.html | 13 +++++--- doc/html/quickstart.html | 57 +++++++++++++++++++++++++--------- doc/html/quickstart_admin.html | 24 ++++++++------ doc/html/team.html | 6 ++-- 8 files changed, 105 insertions(+), 45 deletions(-) diff --git a/NEWS b/NEWS index d46ffec2dd6..146d4cbebc6 100644 --- a/NEWS +++ b/NEWS @@ -8,6 +8,7 @@ documents those changes that are of interest to users and admins. prolog is running (which can be slow on BlueGene). -- Add new error code, ESLURM_BATCH_ONLY for attepts to launch job steps on front-end system (e.g. Blue Gene). + -- Updates to html documents. * Changes in SLURM 0.4.0-pre9 ============================= diff --git a/doc/html/faq.html b/doc/html/faq.html index 963f3080f16..cae031dcc05 100644 --- a/doc/html/faq.html +++ b/doc/html/faq.html @@ -9,7 +9,7 @@ <meta http-equiv="keywords" content="Simple Linux Utility for Resource Management, SLURM, resource management, Linux clusters, high-performance computing, Livermore Computing"> <meta name="LLNLRandR" content="UCRL-WEB-204324"> -<meta name="LLNLRandRdate" content="12 January 2004"> +<meta name="LLNLRandRdate" content="27 January 2005"> <meta name="distribution" content="global"> <meta name="description" content="Simple Linux Utility for Resource Management"> <meta name="copyright" @@ -59,6 +59,7 @@ structure:Laboratories and Other Field Facilities"> <li><a href="#sharing">Why does the srun --overcommit option not permit multiple jobs to run on nodes?</a></li> <li><a href="#purge">Why is my job killed prematurely?</a></li> +<li><a href="#opts">Why are my srun options ignored?</a></li> </ol> <p><a name="comp"><b>1. 
Why is my job/node in "completing" state?</b></a><br> When a job is terminating, both the job and its nodes enter the state "completing." @@ -82,6 +83,7 @@ submit host are higher than the hard resource limits on the allocated host, SLUR will be unable to propagate the resource limits and print an error of the type shown above. It is recommended that the system administrator establish uniform hard resource limits on all nodes within a cluster to prevent this from occurring.</p> + <p><a name="pending"><b>3. Why is my job not running?</b></a><br> The answer to this question depends upon the scheduler used by SLURM. Executing the command</p> @@ -131,7 +133,7 @@ SLURM has a job purging mechanism to remove inactive jobs (resource allocations) before reaching its time limit, which could be infinite. This inactivity time limit is configurable by the system administrator. You can check it's value with the command -<blockquite> +<blockquote> <p><span class="commandline">scontrol show config | grep InactiveLimit</span></p> </blockquote> The value of InactiveLimit is in seconds. @@ -143,13 +145,30 @@ is submitted. Therefore batch job pre- and post-processing is limited to the InactiveLimit. Contact your system administrator if you believe the InactiveLimit value should be changed. + +<p><a name="opts"><b>6. Why are my srun options ignored?</b></a><br> +Everything after the command <span class="commandline">srun</span> is +examined to determine if it is a valid option for srun. The first +token that is not a valid option for srun is considered the command +to execute and everything after that is treated as an option to +the command. For example: +<blockquote> +<p><span class="commandline">srun -N2 hostname -pdebug</span></p> +</blockquote> +srun processes "-N2" as an option to itself. "hostname" is the +command to execute and "-pdebug" is treated as an option to the +hostname command. 
This would change the name of the computer +on which SLURM executes the command. <b>Do not run +this command as user root!</b></p> + +<br> </td> </tr> <tr> <td colspan="3"><hr> <p>For information about this page, contact <a href="mailto:slurm-dev@lists.llnl.gov">slurm-dev@lists.llnl.gov</a>.</p> <p><a href="http://www.llnl.gov/"><img align=middle src="lll.gif" width="32" height="32" border="0"></a></p> <p class="footer">UCRL-WEB-207187<br> -Last modified March 24, 2004</p></td> +Last modified 27 January 2005</p></td> </tr> </table> </td> diff --git a/doc/html/help.html b/doc/html/help.html index 7e5e3476d31..9a26050e653 100644 --- a/doc/html/help.html +++ b/doc/html/help.html @@ -9,7 +9,7 @@ <meta http-equiv="keywords" content="Simple Linux Utility for Resource Management, SLURM, resource management, Linux clusters, high-performance computing, Livermore Computing"> <meta name="LLNLRandR" content="UCRL-WEB-204324"> -<meta name="LLNLRandRdate" content="12 January 2004"> +<meta name="LLNLRandRdate" content="27 January 2005"> <meta name="distribution" content="global"> <meta name="description" content="Simple Linux Utility for Resource Management"> <meta name="copyright" @@ -58,14 +58,16 @@ structure:Laboratories and Other Field Facilities"> <li>For run-time problems, try running the command or daemons in verbose mode (<span class="commandline">-v</span> option), and see if additional information helps you resolve the problem.</li> -<li>Send a detailed description of the problem and logs to <a href="mailto:slurm-dev@lists.llnl.gov">slurm-dev@lists.llnl.gov</a>.</li> +<li>Customers of HP and Linux NetworX should contact their support staff.</li> +<li>Send a detailed description of the problem and logs to +<a href="mailto:slurm-dev@lists.llnl.gov">slurm-dev@lists.llnl.gov</a>.</li> </ol></td> </tr> <tr> <td colspan="3"><hr> <p>For information about this page, contact <a href="mailto:slurm-dev@lists.llnl.gov">slurm-dev@lists.llnl.gov</a>.</p> <p><a 
href="http://www.llnl.gov/"><img align=middle src="lll.gif" width="32" height="32" border="0"></a></p> <p class="footer">UCRL-WEB-207187<br> -Last modified January 15, 2004</p></td> +Last modified 27 January 2005</p></td> </tr> </table> </td> diff --git a/doc/html/news.html b/doc/html/news.html index 37ffa58a7b7..67e3c987828 100644 --- a/doc/html/news.html +++ b/doc/html/news.html @@ -9,7 +9,7 @@ <meta http-equiv="keywords" content="Simple Linux Utility for Resource Management, SLURM, resource management, Linux clusters, high-performance computing, Livermore Computing"> <meta name="LLNLRandR" content="UCRL-WEB-204324"> -<meta name="LLNLRandRdate" content="13 October 2004"> +<meta name="LLNLRandRdate" content="27 January 2005"> <meta name="distribution" content="global"> <meta name="description" content="Simple Linux Utility for Resource Management"> <meta name="copyright" @@ -53,7 +53,7 @@ structure:Laboratories and Other Field Facilities"> <td><img src="spacer.gif" width="10" height="1" alt=""></td> <td valign="top"><h2>What's New</h2> <h3>Major Updates in SLURM Version 0.3</h3> -<p>SLURM Version 0.3 become available in May 2004. +<p>SLURM Version 0.3 became available in May 2004. Major enhancements included: <ul> <li>Scheduler plugin developed for backfill scheduling and <a href="http://supercluster.org/maui"> @@ -73,8 +73,8 @@ For a complete list of enhancements in SLURM version 0.3, please see the NEWS file with the code distribution.</p> <h3>Major Updates in SLURM Version 0.4</h3> -<p>We expect to make SLURM Version 0.4 available in Deccember 2004. -Major enhancements include: +<p>SLURM Version 0.4 became available in January 2005. 
+Major enhancements included: <ul> <li>Support for <a href="http://www.platform.com/">Load Sharing Facility (LSF)</a></li> <li>Add support for <a href="http://www.myri.com/scs/">MPICH-GM</a> @@ -93,9 +93,9 @@ Major enhancements include: Major enhancements include: <ul> <li>Support for the IBM Federation switch.</li> -<li>Checkpoint plugin added with support for IBM system checkpoint.</li> -<li>I/O streams for all tasks on a node are transmitted through one pair of sockets -instead of distinct sockets for each task. This improves performance and scalability.</li> +<li>I/O streams for all tasks on a node are transmitted through one pair of +sockets instead of distinct sockets for each task. This improves performance +and scalability.</li> <li>Support for task communication/synchronization primitives.</li> </ul> @@ -105,6 +105,8 @@ not been finalized. Anyone desiring to perform SLURM development should notify <a href="mailto:slurm-dev@lists.llnl.gov">slurm-dev@lists.llnl.gov</a> to coordinate activies. Future development plans includes: <ul> +<li>Support for Infiniband</li> +<li>Support for IBM system-level checkpoint</li> <li>Support of various MPI types via a plugin mechanism.</li> <li>Permit resource allocations (jobs) to change size.</li> <li>Manage consumable resources on a per-node (e.g,. 
memory, disk space) diff --git a/doc/html/overview.html b/doc/html/overview.html index 42be473784c..f2735409976 100644 --- a/doc/html/overview.html +++ b/doc/html/overview.html @@ -9,7 +9,7 @@ <meta http-equiv="keywords" content="Simple Linux Utility for Resource Management, SLURM, resource management, Linux clusters, high-performance computing, Livermore Computing"> <meta name="LLNLRandR" content="UCRL-WEB-204324"> -<meta name="LLNLRandRdate" content="26 October 2004"> +<meta name="LLNLRandRdate" content="27 January 2005"> <meta name="distribution" content="global"> <meta name="description" content="Simple Linux Utility for Resource Management"> <meta name="copyright" @@ -93,9 +93,12 @@ infrastructure. These plugins presently include: <li>Node selection: Blue Gene (a 3-D torus interconnect) or linear.</li> <li>Scheduler: <a href="http://supercluster.org/maui">The Maui Scheduler</a>, backfill, or FIFO (default).</li> -<li>Switch or interconnect: <a href="http://www.quadrics.com/">Quadrics</a> (Elan3 -or Elan4) or none (actually means nothing requiring special handling, such -as ethernet, default).</li> +<li>Switch or interconnect: <a href="http://www.quadrics.com/">Quadrics</a> +(Elan3 or Elan4), Federation +(<a href="http://publib-b.boulder.ibm.com/Redbooks.nsf/f338d71ccde39f08852568dd006f956d/55258945787efc2e85256db00051980a?OpenDocument"> +IBM High Performance Switch</a>), or none (actually means nothing requiring +special handling, such as Ethernet or +<a href="http://www.myricom.com/">Myrinet</a>, default).</li> </ul> <p class="footer"><a href="#top">top</a></p> @@ -158,7 +161,7 @@ excellent. 
The throughput rate of simple 2000 task jobs across 1000 nodes is ove <td colspan="3"><hr> <p>For information about this page, contact <a href="mailto:slurm-dev@lists.llnl.gov">slurm-dev@lists.llnl.gov</a>.</p> <p><a href="http://www.llnl.gov/"><img align=middle src="lll.gif" width="32" height="32" border="0"></a></p> <p class="footer">UCRL-WEB-207187<br> -Last modified 26 October 2004</p></td> +Last modified 27 January 2005</p></td> </tr> </table> </td> diff --git a/doc/html/quickstart.html b/doc/html/quickstart.html index a9fee55b0a5..43af20d395f 100644 --- a/doc/html/quickstart.html +++ b/doc/html/quickstart.html @@ -9,7 +9,7 @@ <meta http-equiv="keywords" content="Simple Linux Utility for Resource Management, SLURM, resource management, Linux clusters, high-performance computing, Livermore Computing"> <meta name="LLNLRandR" content="UCRL-WEB-204324"> -<meta name="LLNLRandRdate" content="12 January 2004"> +<meta name="LLNLRandRdate" content="27 January 2005"> <meta name="distribution" content="global"> <meta name="description" content="Simple Linux Utility for Resource Management"> <meta name="copyright" @@ -184,19 +184,26 @@ batch 1 DOWN 2 3448 82306 adev8 7 IDLE 2 3448-3458 82306 adev[9-15] </pre> <p class="footer"><a href="#top">top</a></p> <h3>MPI</h3> -<p>MPI use depends upon the type of MPI being used. <a href="http://www.quadrics.com/">Quadrics</a> -MPI relies upon SLURM to allocate resources for the job and <span class="commandline">srun</span> -to initiate the tasks. One would build the MPI program in the normal manner then -initiate it using a command line of this sort:</p> +<p>MPI use depends upon the type of MPI being used. +Instructions for using several varieties of MPI with SLURM are +provided below.</p> + +<p> <a href="http://www.quadrics.com/">Quadrics MPI</a> relies upon SLURM to +allocate resources for the job and <span class="commandline">srun</span> +to initiate the tasks. 
One would build the MPI program in the normal manner +then initiate it using a command line of this sort:</p> <p class="commandline"> srun [OPTIONS] <program> [program args]</p> -<p> <a href="http://www.lam-mpi.org/">LAM/MPI</a> relies upon the SLURM <span class="commandline">srun</span> -command to allocate resources using either the <span class="commandline">--allocate</span> -or the <span class="commandline">--batch</span> option. In either case, specify -the maximum number of tasks required for the job. Then execute the <span class="commandline">lamboot</span> -command to start lamd daemons. <span class="commandline">lamboot</span> utilizes -SLURM's <span class="commandline">srun</span> command to launch these daemons. -Do not directly execute the <span class="commandline">srun</span> command to launch -LAM/MPI tasks. For example: + +<p> <a href="http://www.lam-mpi.org/">LAM/MPI</a> relies upon the SLURM +<span class="commandline">srun</span> command to allocate resources using +either the <span class="commandline">--allocate</span> or the +<span class="commandline">--batch</span> option. In either case, specify +the maximum number of tasks required for the job. Then execute the +<span class="commandline">lamboot</span> command to start lamd daemons. +<span class="commandline">lamboot</span> utilizes SLURM's +<span class="commandline">srun</span> command to launch these daemons. +Do not directly execute the <span class="commandline">srun</span> command +to launch LAM/MPI tasks. For example: <pre> adev0: srun -n16 -A # allocates resources and spawns shell for job adev0: lamboot @@ -207,13 +214,33 @@ etc. adev0: lamclean adev0: lamhalt adev0: exit # exits shell spawned by initial srun command -</pre> <p class="footer"><a href="#top">top</a></p></td> +</pre> <p class="footer"><a href="#top">top</a></p> + +<p><a href="http://www.hp.com/go/mpi">HP-MPI</a> uses the +<span class="commandline">mpirun</span> command with the <b>-srun</b> +option to launch jobs. 
For example: +<pre> +$MPI_ROOT/bin/mpirun -TCP -srun -N8 ./a.out +</pre></p> + +<p><a href="http://www.research.ibm.com/bluegene/">BlueGene MPI</a> relies +upon SLURM to create the resource allocation and then uses the native +<span class="commandline">mpirun</span> command to launch tasks. +Build a job script containing one or more invocations of the +<span class="commandline">mpirun</span> command. Then submit +the script to SLURM using the <span class="commandline">srun</span> +command with the <b>--batch</b> option. For example: +<pre> +srun -N2 --batch my.script +</pre></p> + +</td> </tr> <tr> <td colspan="3"><hr> <p>For information about this page, contact <a href="mailto:slurm-dev@lists.llnl.gov">slurm-dev@lists.llnl.gov</a>.</p> <p><a href="http://www.llnl.gov/"><img align=middle src="lll.gif" width="32" height="32" border="0"></a></p> <p class="footer">UCRL-WEB-207187<br> -Last modified January 15, 2004</p></td> +Last modified 27 January 2005</p></td> </tr> </table> </td> diff --git a/doc/html/quickstart_admin.html b/doc/html/quickstart_admin.html index 4bbb9ee437c..a00dd468eb3 100644 --- a/doc/html/quickstart_admin.html +++ b/doc/html/quickstart_admin.html @@ -9,7 +9,7 @@ <meta http-equiv="keywords" content="Simple Linux Utility for Resource Management, SLURM, resource management, Linux clusters, high-performance computing, Livermore Computing"> <meta name="LLNLRandR" content="UCRL-WEB-204324"> -<meta name="LLNLRandRdate" content="11 November 2004"> +<meta name="LLNLRandRdate" content="27 January 2005"> <meta name="distribution" content="global"> <meta name="description" content="Simple Linux Utility for Resource Management"> <meta name="copyright" @@ -138,18 +138,24 @@ auth/munge plugin for communication, the clocks on all nodes will need to be synchronized. </p> <h4>MPI support</h4> -<p>Quadrics MPI works directly with SLURM on systems having Quadrics interconnects. 
-For non-Quadrics interconnect systems, <a href="http://www.lam-mpi.org/">LAM/MPI</a> -is the preferred MPI infrastructure. LAM/MPI uses the command <i>lamboot</i> to -initiate job-specific daemons on each node using SLURM's +<p>Quadrics MPI works directly with SLURM on systems having Quadrics +interconnects and is the preferred version of MPI for those systems.</p> + +<p>For <a href="http://www.myricom.com/">Myrinet</a> systems, MPICH-GM +is preferred. In order to use MPICH-GM, set the <b>MpichGmDirectSupport</b> +and <b>KillTree</b> configuration parameters in slurm.conf.</p> + +<p>HP customers would be well served by using +<a href="http://www.hp.com/go/mpi">HP-MPI</a>.</p> + +<p>A good open-source MPI for use with SLURM is +<a href="http://www.lam-mpi.org/">LAM/MPI</a>. LAM/MPI uses the command +<i>lamboot</i> to initiate job-specific daemons on each node using SLURM's <span class="commandline">srun</span> command. This places all MPI processes in a process-tree under the control of the <b>slurmd</b> daemon. LAM/MPI version 7.1 or higher contains support for SLURM.</p> -<p>In order to use MPICH-GM, set the <b>MpichGmDirectSupport</b> and <b>KillTree</b> -configuration parameters.</p> - <p>Note that the ordering of tasks within an job's allocation matches that of nodes in the slurm.conf configuration file. SLURM presently lacks the ability to arbitrarily order tasks across nodes.</p> @@ -455,7 +461,7 @@ in the NEWS file. 
<td colspan="3"><hr> <p>For information about this page, contact <a href="mailto:slurm-dev@lists.llnl.gov">slurm-dev@lists.llnl.gov</a>.</p> <p><a href="http://www.llnl.gov/"><img align=middle src="lll.gif" width="32" height="32" border="0"></a></p> <p class="footer">UCRL-WEB-207187<br> -Last modified 11 November 2004</p></td> +Last modified 27 January 2005</p></td> </tr> </table> </td> diff --git a/doc/html/team.html b/doc/html/team.html index 5ca5da31086..0bcd439f573 100644 --- a/doc/html/team.html +++ b/doc/html/team.html @@ -9,7 +9,7 @@ <meta http-equiv="keywords" content="Simple Linux Utility for Resource Management, SLURM, resource management, Linux clusters, high-performance computing, Livermore Computing"> <meta name="LLNLRandR" content="UCRL-WEB-204324"> -<meta name="LLNLRandRdate" content="24 January 2005"> +<meta name="LLNLRandRdate" content="27 January 2005"> <meta name="distribution" content="global"> <meta name="description" content="Simple Linux Utility for Resource Management"> <meta name="copyright" @@ -58,12 +58,12 @@ Livermore National Laboratory</a> (LLNL) and The primary SLURM development staff includes: </p> <ul> <li>Morris Jette (LLNL, Project leader)</li> +<li>Danny Auble (LLNL)</li> <li>Mark Grondona (LLNL)</li> <li>Jay Windley (Linux NetworX)</li> </ul> <p> SLURM contributers include: <ul> -<li>Danny Auble (LLNL)</li> <li>Susanne Balle (HP)</li> <li>Chris Dunlap (LLNL)</li> <li>Joey Eckstrom (LLNL/Bringham Young University)</li> @@ -79,7 +79,7 @@ The primary SLURM development staff includes: </p> <td colspan="3"><hr> <p>For information about this page, contact <a href="mailto:slurm-dev@lists.llnl.gov">slurm-dev@lists.llnl.gov</a>.</p> <p><a href="http://www.llnl.gov/"><img align=middle src="lll.gif" width="32" height="32" border="0"></a></p> <p class="footer">UCRL-WEB-207187<br> -Last modified 24 January 2005</p></td> +Last modified 27 January 2005</p></td> </tr> </table> </td> -- GitLab
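A note for reviewers of the new FAQ entry #6: the option-scanning rule it documents (srun consumes tokens as its own options until the first non-option word; that word becomes the command and everything after it belongs to the command) can be sketched as a tiny stand-alone shell loop. This is a hypothetical illustration of the rule only; the token list and variable names are invented here and this is not srun's actual parser.

```shell
# Illustrative sketch only: srun's real option handling is far richer;
# this loop just mimics the documented scanning rule on a stand-in list.
args="-N2 hostname -pdebug"

srun_opts=""
cmd_and_args=""
for tok in $args; do
  # Tokens before the first non-option word belong to srun itself;
  # the first non-option word and everything after it form the command.
  if [ -z "$cmd_and_args" ] && [ "${tok#-}" != "$tok" ]; then
    srun_opts="$srun_opts $tok"
  else
    cmd_and_args="$cmd_and_args $tok"
  fi
done

echo "srun options:$srun_opts"      # prints: srun options: -N2
echo "command:$cmd_and_args"        # prints: command: hostname -pdebug
```

With the FAQ's example, "-N2" lands in srun's own options while "hostname -pdebug" becomes the command line to execute, which is why "-pdebug" never reaches srun.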