From ff4515742ea997289f89ba09134fa6177cd4b989 Mon Sep 17 00:00:00 2001
From: Mark Grondona <mgrondona@llnl.gov>
Date: Mon, 10 May 2004 23:12:19 +0000
Subject: [PATCH] o many updates to quickstart guide for admins

---
 doc/html/quickstart_admin.html | 115 ++++++++++++++++++++++-----------
 1 file changed, 77 insertions(+), 38 deletions(-)

diff --git a/doc/html/quickstart_admin.html b/doc/html/quickstart_admin.html
index 60f287d3abe..2917d2e608a 100644
--- a/doc/html/quickstart_admin.html
+++ b/doc/html/quickstart_admin.html
@@ -55,6 +55,52 @@ structure:Laboratories and Other Field Facilities">
 <h3>Overview</h3>
 Please see the <a href="quickstart.html">Quick Start User Guide</a> for a
 general overview.
+
+<h3>Building and Installing</h3>
+<p>Basic instructions to build and install SLURM from source are shown below.
+See the README and INSTALL files in the source distribution for more details.
+</p>
+<ol>
+<li><span class="commandline">cd</span> to the directory containing the SLURM
+source and type <span class="commandline">./configure</span> with appropriate
+options.</li>
+<li>Type <span class="commandline">make</span> to compile SLURM.</li>
+<li>Type <span class="commandline">make install</span> to install the programs,
+documentation, libraries, header files, etc.</li>
+</ol>
+<p>The most commonly used arguments to the <span class="commandline">configure</span>
+command include: </p>
+<p style="margin-left:.2in"><span class="commandline">--enable-debug</span><br>
+Enable debugging of individual modules.</p>
+<p style="margin-left:.2in"><span class="commandline">--prefix=<i>PREFIX</i></span><br>
+</i>
+Install architecture-independent files in PREFIX; default value is /usr/local.</p>
+<p style="margin-left:.2in"><span class="commandline">--sysconfdir=<i>DIR</i></span><br>
+</i>
+Specify location of SLURM configuration file.</p>
+<p>Optional SLURM plugins will be built automatically when the
+<span class="commandline">configure</span> script detects that their
+build requirements are present. Build dependencies for the various plugins
+are noted below.
+</p>
+<ul>
+<li> <b>Munge</b> The auth/munge plugin will be built if Chris Dunlap's Munge
+     library is installed. </li>
+<li> <b>Authd</b> The auth/authd plugin will be built and installed if
+     the libauth library and its dependency libe are installed.
+     </li>
+<li> <b>QsNet</b> QsNet support in the form of the switch/elan plugin requires
+     that the qsnetlibs package (from Quadrics) be installed along
+     with its development counterpart (i.e., the qsnetheaders
+     package). The switch/elan plugin also requires the
+     presence of the libelanhosts library and the /etc/elanhosts
+     configuration file. (See the elanhosts(5) man page in that
+     package for more details.)</li>
+</ul>
+Please see the <a href="download.html">Download</a> page for references to
+the software required to build these plugins.
+<p class="footer"><a href="#top">top</a></p>
+
 <h3>Daemons</h3>
 <p><b>slurmctld</b> is sometimes called the "controller" daemon. It orchestrates
 SLURM activities, including queuing of job, monitoring node state,
@@ -71,15 +117,27 @@ shell daemon to export control to SLURM. Because slurmd initiates and manages
 user jobs, it must execute as the user root.</p>
 <p><b>slurmctld</b> and/or <b>slurmd</b> should be initiated at node startup
 time per the SLURM configuration.</p>
+
 <h3>Infrastructure</h3>
-<p>All communications between SLURM components are authenticated. The authentication
-infrastructure used is specified in the SLURM configuration file and options include:
+<h4>Authentication of SLURM communications</h4>
+<p>All communications between SLURM components are authenticated. The
+authentication infrastructure is provided by a dynamically loaded
+plugin chosen at runtime via the <b>AuthType</b> keyword in the SLURM
+configuration file. Currently available authentication types include
 <a href="http://www.theether.org/authd/">authd</a>,
 <a href="ftp://ftp.llnl.gov/pub/linux/munge/">munge</a>, and none.
 The default authentication infrastructure is "none". This permits any user to execute
 any job as another user. This may be fine for testing purposes, but certainly not for
 production use. <b>Configure some AuthType value other than "none" if you want any security.</b>
-We recommend the use of munge unless you are experience with authd.</p>
+We recommend the use of Munge unless you are experienced with authd.
+</p>
+<p>While SLURM itself does not rely upon synchronized clocks on all nodes
+of a cluster for proper operation, its underlying authentication mechanism
+may have this requirement. For instance, if SLURM is using the
+auth/munge plugin for communications, the clocks on all nodes will need to
+be synchronized.
+</p>
+<h4>MPI support</h4>
 <p>Quadrics MPI works directly with SLURM on systems having Quadrics interconnects.
 For non-Quadrics interconnect systems, <a href="http://www.lam-mpi.org/">LAM/MPI</a>
 is the preferred MPI infrastructure. LAM/MPI uses the command <i>lamboot</i> to
@@ -87,6 +145,7 @@ initiate job-specific daemons on each node using SLURM's <span class="commandlin
 command. This places all MPI processes in a process-tree under the control of
 the <b>slurmd</b> daemon. LAM/MPI version 7.1 or higher contains support for
 SLURM.</p>
+<h4>Scheduler support</h4>
 <p>SLURM's default scheduler is FIFO (First-In First-Out). A backfill scheduler
 plugin is also available. Backfill scheduling will initiate a lower-priority job
 if doing so does not delay the expected initiation time of higher priority jobs;
@@ -96,43 +155,23 @@ scheduling algorithms to control SLURM's workload. Motivated users can even deve
 their own scheduler plugin if so desired. </p>
 <p>SLURM uses the syslog function to record events. It uses a range of importance
 levels for these messages. Be certain that your system's syslog functionality
-is operational. </p>
-<p>There is no necessity for synchronized clocks on the nodes. Events occur either
-in real-time or based upon message traffic. However, synchronized clocks will
-permit easier analysis of SLURM logs from multiple nodes.</p>
-<p class="footer"><a href="#top">top</a></p>
+is operational.
+</p>
+<h4>Corefile format</h4>
 <p>SLURM is designed to support generating a variety of core file formats for
-application codes that fail (see the <i>--core</i> option of the <i>srun</i>n
-command). Of particular interest, LLNL has developed a light-weight core file
-library to log traceback information. We expect to make this library available
-to others at some point in the future.</p>
-
-<h3>Building and Installing</h3>
-<p>Basic instructions to build and install SLURM are shown below. See the INSTALL
-file for more details. </p>
-<ol>
-<li><span class="commandline">cd</span> to the directory containing the SLURM
-source and type <i>.</i><span class="commandline">/configure</span> with appropriate
-options.</li>
-<li>Type <span class="commandline">make</span> to compile SLURM.</li>
-<li> Type <span class="commandline">make install</span> to install the programs,
-documentation, libaries, header files, etc.</li>
-</ol>
-<p>The most commonly used arguments to the <span class="commandline">configure</span>
-command include: </p>
-<p style="margin-left:.2in"><span class="commandline">--enable-debug</span><br>
-Enable debugging of individual modules.</p>
-<p style="margin-left:.2in"><span class="commandline">--prefix=<i>PREFIX</i></span><br>
-</i>
-Install architecture-independent files in PREFIX; default value is /usr/local.</p>
-<p style="margin-left:.2in"><span class="commandline">--sysconfdir=<i>DIR</i></span><br>
-</i>
-Specify location of SLURM configuration file.</p>
-<p style="margin-left:.2in"><span class="commandline">--with-totalview</span><br>
-Compile with support for the TotalView debugger
-(see <a href="http://www.etnus.com/">http://www.etnus.com</a>).
-The kernel patch in <b>etc/ptrace.patch</b> may also be required.</p>
+application codes that fail (see the <i>--core</i> option of the <i>srun</i>
+command). At present, SLURM supports only a locally developed lightweight
+corefile library, which has not yet been released to the public. It is
+expected that this library will be available in the near future.
+</p>
+<h4>Parallel debugger support</h4>
+<p>SLURM exports information for parallel debuggers using the specification
+detailed <a href="http://www-unix.mcs.anl.gov/mpi/mpi-debug/mpich-attach.txt">here</a>.
+This interface can be used by any parallel debugger (notably TotalView),
+and support for it is unconditionally compiled into SLURM code.
+</p>
 <p class="footer"><a href="#top">top</a></p>
+
 <h3>Configuration</h3>
 <p>The SLURM configuration file includes a wide variety of parameters. This
 configuration file must be available on each node of the cluster. A full
--
GitLab
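
For readers following the build and authentication text added by this patch, the steps amount to the shell session sketched below. This is a minimal illustration, not part of the patch itself; the source directory, install prefix, sysconfdir path, and the exact AuthType value shown are assumptions chosen for the example rather than values taken from the patch.

    # Build and install SLURM from source, per the new "Building and
    # Installing" section (paths below are illustrative assumptions).
    cd /path/to/slurm-source            # directory containing the SLURM source
    ./configure --enable-debug \
                --prefix=/usr/local \
                --sysconfdir=/etc/slurm # assumed location for the SLURM config file
    make                                # compile SLURM
    make install                        # install programs, libraries, headers, docs

    # In the SLURM configuration file, select an authentication plugin other
    # than "none"; with Munge installed this would look something like
    # (assumed keyword syntax):
    #   AuthType=auth/munge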