<!--#include virtual="header.txt"-->

<h1>Quick Start Administrator Guide</h1>
<h2>Overview</h2>
Please see the <a href="quickstart.html">Quick Start User Guide</a> for a general 
overview. 
 
<h2>Building and Installing</h2>

<p>Instructions to build and install SLURM manually are shown below. 
See the README and INSTALL files in the source distribution for more details.
</p>
<ol>
<li><span class="commandline">gunzip</span> the distributed tar-ball and 
<span class="commandline">untar</span> the files.</li>
<li><span class="commandline">cd</span> to the directory containing the SLURM 
source and type <i>.</i><span class="commandline">/configure</span> with appropriate 
options.</li>
<li>Type <span class="commandline">make</span> to compile SLURM.</li>
<li> Type <span class="commandline">make install</span> to install the programs, 
documentation, libraries, header files, etc. (a complete example follows below).</li>
</ol>
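<p>For example, a complete build might look like the following (the version 
number and installation paths are illustrative; substitute your own):</p>
<pre>
# unpack the distribution (substitute the actual version number)
gunzip slurm-0.6.0-1.tgz
tar -xf slurm-0.6.0-1.tar
# cd to the directory created by tar
cd slurm-0.6.0-1
# configure, compile and install
./configure --prefix=/usr/local --sysconfdir=/etc/slurm
make
make install
</pre>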
<p>The most commonly used arguments to the <span class="commandline">configure</span> 
command include: </p>
<p style="margin-left:.2in"><span class="commandline">--enable-debug</span><br>
Enable additional debugging logic within SLURM.</p>
<p style="margin-left:.2in"><span class="commandline">--prefix=<i>PREFIX</i></span><br>
</i> 
Install architecture-independent files in PREFIX; default value is /usr/local.</p>
<p style="margin-left:.2in"><span class="commandline">--sysconfdir=<i>DIR</i></span><br>
</i> 
Specify location of SLURM configuration file. </p>

<p>If required libraries or header files are in non-standard locations, 
set the CFLAGS and LDFLAGS environment variables accordingly.
Type <span class="commandline">./configure --help</span> for a more complete 
description of options.
Optional SLURM plugins will be built automatically when the
<span class="commandline">configure</span> script detects that their 
build requirements are present. The build dependencies for the various plugins
are described below.
</p>
<ul>
<li> <b>Munge</b> The auth/munge plugin will be built if Chris Dunlap's Munge
                  library is installed. </li>
<li> <b>Authd</b> The auth/authd plugin will be built and installed if 
                  the libauth library and its dependency libe are installed. 
		  </li>
<li> <b>Federation</b> The switch/federation plugin will be built and installed
                  if the IBM Federation switch library is installed. </li>
<li> <b>QsNet</b> support in the form of the switch/elan plugin requires
                  that the qsnetlibs package (from Quadrics) be installed along
                  with its development counterpart (i.e. the qsnetheaders
                  package). The switch/elan plugin also requires the
                  presence of the libelanhosts library and the /etc/elanhosts
                  configuration file. (See the elanhosts(5) man page in that
                  package for more details.) Define the nodes in the SLURM
                  configuration file <i>slurm.conf</i> in the same order as
                  they are defined in the <i>elanhosts</i> configuration file so
                  that node allocations can be made in a way that optimizes
                  job performance. We highly recommend assigning each node
                  a numeric suffix equal to its Elan address for ease of
                  administration and because the Elan driver does not seem
                  to function otherwise.
                  For example, /etc/elanhosts would contain two lines of this sort:<br>
                  eip  [0-15]  linux[0-15]<br>
                  eth  [0-15]  linux[0-15]<br>
                  for sixteen nodes with a prefix of &quot;linux&quot; and
                  numeric suffixes between 0 and 15.  Finally, the
                  &quot;ptrack&quot; kernel patch is required for process
                  tracking.</li>
</ul>
<p>Please see the <a href="download.html">Download</a> page for references to
required software to build these plugins.</p>
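<p>If one of these packages is installed in a non-standard location, point the
<span class="commandline">configure</span> script at it. A sketch, assuming 
Munge were installed under the hypothetical prefix /opt/munge:</p>
<pre>
# Munge installed outside the standard system prefixes (illustrative path)
./configure --prefix=/usr/local --sysconfdir=/etc/slurm \
            --with-munge=/opt/munge
# alternatively, set CFLAGS and LDFLAGS to locate headers and libraries
</pre>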

<p>To build RPMs directly, copy the distributed tar-ball into the directory
<b>/usr/src/redhat/SOURCES</b> and execute a command of this sort (substitute
the appropriate SLURM version number):<br>
<span class="commandline">rpmbuild -ta slurm-0.5.0-1.tgz</span> or <br>
<span class="commandline">rpmbuild -ta slurm-0.6.0-1.tar.bz2</span>.</p>

<p>You can control some aspects of the RPM build with a <i>.rpmmacros</i>
file in your home directory. <b>Special macro definitions will likely 
only be required if files are installed in unconventional locations.</b>
Some macro definitions that may be used in building SLURM include:
<dl>
<dt>_enable_debug
<dd>Specify if debugging logic within SLURM is to be enabled
<dt>_prefix
<dd>Pathname of directory to contain the SLURM files
<dt>_sysconfdir
<dd>Pathname of directory containing the slurm.conf configuration file
<dt>with_munge
<dd>Specifies munge (authentication library) installation location
<dt>with_proctrack
<dd>Specifies AIX process tracking kernel extension header file location
<dt>with_ssl
<dd>Specifies SSL library installation location
</dl>
To build SLURM on our AIX system, the following .rpmmacros file is used:
<pre>
# .rpmmacros
# For AIX at LLNL
# Override some RPM macros from /usr/lib/rpm/macros
# Set other SLURM-specific macros for unconventional file locations
#
%_enable_debug     "--with-debug"
%_prefix           /admin/llnl
%_sysconfdir       %{_prefix}/etc/slurm
%with_munge        "--with-munge=/admin/llnl"
%with_proctrack    "--with-proctrack=/admin/llnl/include"
%with_ssl          "--with-ssl=/opt/freeware"
</pre></p>

<p class="footer"><a href="#top">top</a></p>

<h2>Daemons</h2>
<p><b>slurmctld</b> is sometimes called the &quot;controller&quot; daemon. It 
orchestrates SLURM activities, including queuing of job, monitoring node state, 
and allocating resources (nodes) to jobs. There is an optional backup controller 
that automatically assumes control in the event the primary controller fails. 
The primary controller resumes control whenever it is restored to service. The 
controller saves its state to disk whenever there is a change.
This state can be recovered by the controller at startup time.
State changes are saved so that jobs and other state can be preserved when 
controller moves (to or from backup controller) or is restarted.</p>

<p>We recommend that you create a Unix user <i>slurm</i> for use by 
<b>slurmctld</b>. This user name will also be specified as the 
<b>SlurmUser</b> in the slurm.conf configuration file.
Note that files and directories used by <b>slurmctld</b> will need to be 
readable or writable by the user <b>SlurmUser</b> (the slurm configuration 
files must be readable; the log file directory and state save directory 
must be writable).</p>
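<p>One way to create such an account on a typical Linux system is shown below 
(the home directory and shell are illustrative site choices, not requirements):</p>
<pre>
# create an unprivileged system account for slurmctld
useradd -r -d /var/lib/slurm -s /bin/false slurm
</pre>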

<p>The <b>slurmd</b> daemon executes on every compute node. It resembles a remote 
shell daemon, exporting control of the node to SLURM. Because slurmd initiates and manages 
user jobs, it must execute as the user root.</p>

<p><b>slurmctld</b> and/or <b>slurmd</b> should be initiated at node startup time 
per the SLURM configuration.
A file <b>etc/init.d/slurm</b> is provided for this purpose. 
This script accepts commands <b>start</b>, <b>startclean</b> (ignores 
all saved state), <b>restart</b>, and <b>stop</b>.</p>
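<p>Assuming the script is installed as <i>/etc/init.d/slurm</i>, the daemons 
on a node can be managed like this:</p>
<pre>
/etc/init.d/slurm start        # start slurmctld and/or slurmd per slurm.conf
/etc/init.d/slurm startclean   # start, ignoring all previously saved state
/etc/init.d/slurm restart      # restart the daemons
/etc/init.d/slurm stop         # stop the daemons
</pre>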

<h2>Infrastructure</h2>
<h3>Authentication of SLURM communications</h3>
<p>All communications between SLURM components are authenticated. The 
authentication infrastructure is provided by a dynamically loaded
plugin chosen at runtime via the <b>AuthType</b> keyword in the SLURM 
configuration file.  Currently available authentication types include
<a href="http://www.theether.org/authd/">authd</a>, 
<a href="ftp://ftp.llnl.gov/pub/linux/munge/">munge</a>, and none.
The default authentication infrastructure is "none". This permits any user to execute 
any job as another user. This may be fine for testing purposes, but certainly not for production 
use. <b>Configure some AuthType value other than "none" if you want any security.</b>
We recommend the use of Munge unless you are experienced with authd.
</p>
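<p>For example, to select Munge-based authentication, set the following in 
slurm.conf (as in the sample configuration later in this document):</p>
<pre>
AuthType=auth/munge
</pre>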
<p>While SLURM itself does not rely upon synchronized clocks on all nodes
of a cluster for proper operation, its underlying authentication mechanism 
may have this requirement. For instance, if SLURM is making use of the
auth/munge plugin for communication, the clocks on all nodes will need to 
be synchronized. </p>

<h3>MPI support</h3>
<p>Quadrics MPI works directly with SLURM on systems having Quadrics 
interconnects and is the preferred version of MPI for those systems.
Set the <b>MpiDefault=none</b> configuration parameter in slurm.conf.</p>

<p>For <a href="http://www.myricom.com/">Myrinet</a> systems, MPICH-GM
is preferred. In order to use MPICH-GM, set <b>MpiDefault=mpichgm</b> and 
<b>ProctrackType=proctrack/linuxproc</b> configuration parameters in 
slurm.conf.</p>
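<p>The corresponding slurm.conf entries for such a Myrinet system would be:</p>
<pre>
MpiDefault=mpichgm
ProctrackType=proctrack/linuxproc
</pre>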

<p>HP customers would be well served by using 
<a href="http://www.hp.com/go/mpi">HP-MPI</a>.</p>

<p>A good open-source MPI for use with SLURM is 
<a href="http://www.lam-mpi.org/">LAM MPI</a>. LAM MPI uses the command 
<i>lamboot</i> to initiate job-specific daemons on each node using SLURM's 
<span class="commandline">srun</span> 
command. This places all MPI processes in a process-tree under the control of 
the <b>slurmd</b> daemon. LAM/MPI version 7.1 or higher contains support for 
SLURM. 
Set the <b>MpiDefault=none</b> configuration parameter in slurm.conf.
LAM MPI will explicitly set the mpi plugin type to "lam" on the  
<span class="commandline">srun</span> execute line as needed.</p>

<p>Another good open-source MPI for use with SLURM is
<a href="http://www.open-mpi.org/">Open MPI</a>. Open MPI initiates its 
processes using SLURM's <span class="commandline">srun</span> 
command. 
Set the <b>MpiDefault=none</b> configuration parameter in slurm.conf.
Open MPI will explicitly set the mpi plugin type to "lam" on the
<span class="commandline">srun</span> execute line as needed.</p>

<p>Note that the ordering of tasks within a job's allocation matches that of 
nodes in the slurm.conf configuration file. SLURM presently lacks the ability 
to arbitrarily order tasks across nodes.</p> 

<h3>Scheduler support</h3>
<p>The scheduler used by SLURM is controlled by the <b>SchedulerType</b> configuration 
parameter, which determines the order in which pending jobs are initiated.
SLURM's default scheduler is FIFO (First-In First-Out). A backfill scheduler 
plugin is also available. Backfill scheduling will initiate a lower-priority job 
if doing so does not delay the expected initiation time of higher priority jobs; 
essentially using smaller jobs to fill holes in the resource allocation plan. 
SLURM also supports a plugin for use of 
<a href="http://www.clusterresources.com/pages/products/maui-cluster-scheduler.php">
The Maui Scheduler</a> or 
<a href="http://www.clusterresources.com/pages/products/moab-cluster-suite.php">
Moab Cluster Suite</a> which offer sophisticated scheduling algorithms. 
Motivated users can even develop their own scheduler plugin if so desired. </p>
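<p>For example, to enable the backfill scheduler, set the following in 
slurm.conf (as in the sample configuration later in this document):</p>
<pre>
SchedulerType=sched/backfill
</pre>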

<h3>Node selection</h3>
<p>The node selection mechanism used by SLURM is controlled by the 
<b>SelectType</b> configuration parameter. 
If you want to execute multiple jobs per node, but apportion the processors, 
memory and other resources, the <i>cons_res</i> (consumable resources) 
plugin is recommended.
If you tend to dedicate entire nodes to jobs, the <i>linear</i> plugin 
is recommended.
For more information, please see 
<a href="cons_res.html">Consumable Resources in SLURM</a>. 
For BlueGene systems, the <i>bluegene</i> plugin is required (it is topology 
aware and interacts with the BlueGene bridge API).</p>
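<p>For example, either of the following slurm.conf entries could be used, 
depending on whether nodes are to be shared or dedicated:</p>
<pre>
# share nodes, treating processors, memory, etc. as consumable resources
SelectType=select/cons_res
# or dedicate whole nodes to each job
#SelectType=select/linear
</pre>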

<h3>Logging</h3>
<p>SLURM uses the syslog function to record events. It uses a range of importance
levels for these messages. Be certain that your system's syslog functionality
is operational. </p>

<h3>Corefile format</h3>
<p>SLURM is designed to support generating a variety of core file formats for 
application codes that fail (see the <i>--core</i> option of the <i>srun</i>
command).  As of now, SLURM only supports a locally developed lightweight
corefile library which has not yet been released to the public. It is 
expected that this library will be available in the near future. </p>

<h3>Parallel debugger support</h3>
<p>SLURM exports information for parallel debuggers using the specification
detailed  <a href=http://www-unix.mcs.anl.gov/mpi/mpi-debug/mpich-attach.txt>here</a>.
This is meant to be exploited by any parallel debugger (notably, TotalView),
and support is unconditionally compiled into SLURM code. 
</p>
<p>We use a patched version of TotalView that looks for a "totalview_jobid" 
symbol in <b>srun</b> that it then uses (configurably) to perform a bulk 
launch of the <b>tvdsvr</b> daemons via a subsequent <b>srun</b>. Otherwise
it is difficult to get TotalView to use <b>srun</b> for a bulk launch, since 
<b>srun</b> will be unable to determine for which job it is launching tasks.
</p>
<p>Another solution would be to run TotalView within an existing <b>srun</b>
<i>--allocate</i> session. Then the TotalView bulk launch command to <b>srun</b>
could be set to ensure only a single task per node. This functions properly 
because the SLURM_JOBID environment variable is set in the allocation shell 
environment.
</p>

<h3>Compute node access</h3>
<p>SLURM does not by itself limit access to allocated compute nodes, 
but it does provide mechanisms to accomplish this. 
There is a Pluggable Authentication Module (PAM) for restricting access 
to compute nodes available for download. 
When installed, the SLURM PAM module will prevent users from logging 
into any node that has not been assigned to that user.
On job termination, any processes initiated by the user outside of 
SLURM's control may be killed using an <i>Epilog</i> script configured 
in <i>slurm.conf</i>.
An example of such a script is included as <i>etc/slurm.epilog.clean</i>. 
Without these mechanisms any user can login to any compute node, 
even those allocated to other users.</p>
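<p>For example, assuming the packaged epilog script were installed under the 
hypothetical path /etc/slurm, slurm.conf might contain:</p>
<pre>
Epilog=/etc/slurm/slurm.epilog.clean
</pre>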

<p class="footer"><a href="#top">top</a></p>

<h2>Configuration</h2>
<p>The SLURM configuration file includes a wide variety of parameters. 
This configuration file must be available on each node of the cluster. A full
description of the parameters is included in the <i>slurm.conf</i> man page. Rather than 
duplicate that information, a minimal sample configuration file is shown below. 
Your slurm.conf file should define at least the configuration parameters defined 
in this sample and likely additional ones. Any text 
following a &quot;#&quot; is considered a comment. The keywords in the file are 
not case sensitive, although the argument typically is (e.g., &quot;SlurmUser=slurm&quot; 
might be specified as &quot;slurmuser=slurm&quot;). The control machine, like 
all other machine specifications, can include both the host name and the name 
used for communications. In this case, the host's name is &quot;mcri&quot; and 
the name &quot;emcri&quot; is used for communications. 
In this case &quot;emcri&quot; is the private management network interface 
for the host &quot;mcri&quot;. Port numbers to be used for 
communications are specified as well as various timer values.</p>

<p>A description of the nodes and their grouping into partitions is required. 
A simple node range expression may optionally be used to specify
ranges of nodes to avoid building a configuration file with large
numbers of entries. The node range expression can contain one
pair of square brackets with a sequence of comma separated
numbers and/or ranges of numbers separated by a &quot;-&quot;
(e.g. &quot;linux[0-64,128]&quot;, or &quot;lx[15,18,32-33]&quot;).
On BlueGene systems only, the square brackets should contain
pairs of three digit numbers separated by a &quot;x&quot;.
These numbers indicate the boundaries of a rectangular prism
(e.g. &quot;bgl[000x144,400x544]&quot;).
See our <a href="bluegene.html">Blue Gene User and Administrator Guide</a>
for more details.
Presently the numeric range must be the last characters in the
node name (e.g. &quot;unit[0-31]rack1&quot; is invalid).</p>

<p>Node names can have up to three name specifications: 
<b>NodeName</b> is the name used by all SLURM tools when referring to the node,
<b>NodeAddr</b> is the name or IP address SLURM uses to communicate with the node, and 
<b>NodeHostname</b> is the name returned by the command <i>/bin/hostname -s</i>.
Only <b>NodeName</b> is required (the others default to the same name), 
although supporting all three parameters provides complete control over 
naming and addressing the nodes.  See the <i>slurm.conf</i> man page for 
details on all configuration parameters.</p>
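<p>For example, a node might be described as follows (all names and the 
address are hypothetical): SLURM tools refer to it as &quot;tux001&quot;, 
communications use the address 10.1.1.1, and <i>hostname -s</i> on the node 
returns &quot;n001&quot;:</p>
<pre>
NodeName=tux001 NodeAddr=10.1.1.1 NodeHostname=n001 Procs=2 State=UNKNOWN
</pre>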

<p>Nodes can be in more than one partition and each partition can have different 
constraints (permitted users, time limits, job size limits, etc.).
Each partition can thus be considered a separate queue.
Partition and node specifications use node range expressions to identify 
nodes in a concise fashion. This configuration file defines a 1154-node cluster 
for SLURM, but it might be used for a much larger cluster by just changing a few 
node range expressions. Specify the minimum processor count (Procs), real memory 
space (RealMemory, megabytes), and temporary disk space (TmpDisk, megabytes) that 
a node should have to be considered available for use. Any node lacking these 
minimum configuration values will be considered DOWN and not scheduled.
Note that a more extensive sample configuration file is provided in
<b>etc/slurm.conf.example</b>. We also have a web-based 
<a href="configurator.html">configuration tool</a> which can 
be used to build a simple configuration file.</p>
<pre>
# 
# Sample /etc/slurm.conf for mcr.llnl.gov
#
ControlMachine=mcri   ControlAddr=emcri
BackupMachine=mcrj    BackupAddr=emcrj 
#
AuthType=auth/munge
Epilog=/usr/local/slurm/etc/epilog
FastSchedule=1
JobCompLoc=/var/tmp/jette/slurm.job.log
JobCompType=jobcomp/filetxt
JobCredentialPrivateKey=/usr/local/etc/slurm.key
JobCredentialPublicCertificate=/usr/local/etc/slurm.cert
PluginDir=/usr/local/slurm/lib/slurm
Prolog=/usr/local/slurm/etc/prolog
SchedulerType=sched/backfill
SelectType=select/linear
SlurmUser=slurm
SlurmctldPort=7002
SlurmctldTimeout=300
SlurmdPort=7003
SlurmdSpoolDir=/var/tmp/slurmd.spool
SlurmdTimeout=300
StateSaveLocation=/tmp/slurm.state
SwitchType=switch/elan
TreeWidth=50
#
# Node Configurations
#
NodeName=DEFAULT Procs=2 RealMemory=2000 TmpDisk=64000 State=UNKNOWN
NodeName=mcr[0-1151] NodeAddr=emcr[0-1151]
#
# Partition Configurations
#
PartitionName=DEFAULT State=UP    
PartitionName=pdebug Nodes=mcr[0-191] MaxTime=30 MaxNodes=32 Default=YES
PartitionName=pbatch Nodes=mcr[192-1151]
</pre> 
<h2>Security</h2>
<p>You should create unique job credential keys for your site
using the program <a href="http://www.openssl.org/">openssl</a>. 
<b>You must use openssl and not ssh-keygen to construct these keys.</b>
An example of how to do this is shown below. Specify file names that 
match the values of <b>JobCredentialPrivateKey</b> and 
<b>JobCredentialPublicCertificate</b> in your configuration file. 
The <b>JobCredentialPrivateKey</b> file must be readable only by <b>SlurmUser</b>. 
The <b>JobCredentialPublicCertificate</b> file must be readable by all users. 
Both files must be available on all nodes in the cluster. 
These keys are used by <i>slurmctld</i> to construct a job credential, 
which is sent to <i>srun</i> and then forwarded to <i>slurmd</i> to 
initiate job steps.</p>

<p class="commandline" style="margin-left:.2in">openssl genrsa -out /usr/local/etc/slurm.key 
1024<br>
openssl rsa -in /usr/local/etc/slurm.key -pubout -out /usr/local/etc/slurm.cert 
</p>
<p>SLURM does not use reserved ports to authenticate communication between 
components, but relies upon an external entity to determine the user who 
initiated a request.  
You must specify one &quot;auth&quot; plugin for this purpose. 
Currently, only three 
authentication plugins are supported: <b>auth/none</b>, <b>auth/authd</b>, and 
<b>auth/munge</b>. The auth/none plugin is built and used by default, but either 
Brent Chun's <a href="http://www.theether.org/authd/">authd</a>, or Chris Dunlap's 
<a href="ftp://ftp.llnl.gov/pub/linux/munge/">munge</a> should be installed in order to 
get properly authenticated communications. 
Unless you are experienced with authd, we recommend the use of munge.
The configure script in the top-level directory of this distribution will determine 
which authentication plugins may be built. The configuration file specifies which 
of the available plugins will be utilized. </p>

<p>A PAM module (Pluggable Authentication Module) is available for SLURM that 
can prevent a user from accessing a node to which he has not been allocated, 
if that mode of operation is desired.</p>
<p class="footer"><a href="#top">top</a></p>

<h2>Starting the Daemons</h2>
<p>For testing purposes you may want to start by just running slurmctld and slurmd 
on one node. By default, they execute in the background. Use the <span class="commandline">-D</span> 
option to run each daemon in the foreground; logging will then be directed 
to your terminal. The <span class="commandline">-v</span> option will log events 
in more detail, with more v's increasing the level of detail (e.g. <span class="commandline">-vvvvvv</span>). 
You can use one window to execute <span class="commandline">slurmctld -D -vvvvvv</span> 
and a second window to execute <span class="commandline">slurmd -D -vvvvv</span>.
You may see errors such as "Connection refused" or "Node X not responding" 
while one daemon is operative and the other is being started, but the 
daemons can be started in any order and proper communications will be 
established once both daemons complete initialization. 
You can use a third window to execute commands such as 
<span class="commandline">srun -N1 /bin/hostname</span> to confirm 
functionality.</p>

<p>Another important option for the daemons is <span class="commandline">-c</span> 
to clear previous state information. Without the <span class="commandline">-c</span> 
option, the daemons will restore any previously saved state information: node 
state, job state, etc. With the <span class="commandline">-c</span> option all 
previously running jobs will be purged and node state will be restored to the 
values specified in the configuration file. This means that a node configured 
down manually using the <span class="commandline">scontrol</span> command will 
be returned to service unless also noted as being down in the configuration file. 
In practice, SLURM is almost always restarted with preservation of its saved state.</p>
<p>A thorough battery of tests written in the &quot;expect&quot; language is also 
available. </p>
<p class="footer"><a href="#top">top</a></p>

<h2>Administration Examples</h2>
<p><span class="commandline">scontrol</span> can be used to print all system information 
and modify most of it. Only a few examples are shown below. Please see the scontrol 
man page for full details. The commands and options are all case insensitive.</p>
<p>Print detailed state of all jobs in the system.</p>
<pre>
adev0: scontrol
scontrol: show job
JobId=475 UserId=bob(6885) Name=sleep JobState=COMPLETED
   Priority=4294901286 Partition=batch BatchFlag=0
   AllocNode:Sid=adevi:21432 TimeLimit=UNLIMITED
   StartTime=03/19-12:53:41 EndTime=03/19-12:53:59
   NodeList=adev8 NodeListIndecies=-1
   ReqProcs=0 MinNodes=0 Shared=0 Contiguous=0
   MinProcs=0 MinMemory=0 Features=(null) MinTmpDisk=0
   ReqNodeList=(null) ReqNodeListIndecies=-1

JobId=476 UserId=bob(6885) Name=sleep JobState=RUNNING
   Priority=4294901285 Partition=batch BatchFlag=0
   AllocNode:Sid=adevi:21432 TimeLimit=UNLIMITED
   StartTime=03/19-12:54:01 EndTime=NONE
   NodeList=adev8 NodeListIndecies=8,8,-1
   ReqProcs=0 MinNodes=0 Shared=0 Contiguous=0
   MinProcs=0 MinMemory=0 Features=(null) MinTmpDisk=0
   ReqNodeList=(null) ReqNodeListIndecies=-1
</pre> <p>Print the detailed state of job 477 and change its priority to 
zero. A priority of zero prevents a job from being initiated (it is held in &quot;pending&quot; 
state).</p>
<pre>
adev0: scontrol
scontrol: show job 477
JobId=477 UserId=bob(6885) Name=sleep JobState=PENDING
   Priority=4294901286 Partition=batch BatchFlag=0
   <i>more data removed....</i>
scontrol: update JobId=477 Priority=0
</pre> 
<p class="footer"><a href="#top">top</a></p>
<p>Print the state of node adev13 and drain it. To drain a node specify a new 
state of DRAIN, DRAINED, or DRAINING. SLURM will automatically set it to the appropriate 
value of either DRAINING or DRAINED depending on whether the node is allocated 
or not. Return it to service later.</p>
<pre>
adev0: scontrol
scontrol: show node adev13
NodeName=adev13 State=ALLOCATED CPUs=2 RealMemory=3448 TmpDisk=32000
   Weight=16 Partition=debug Features=(null) 
scontrol: update NodeName=adev13 State=DRAIN
scontrol: show node adev13
NodeName=adev13 State=DRAINING CPUs=2 RealMemory=3448 TmpDisk=32000
   Weight=16 Partition=debug Features=(null) 
scontrol: quit
<i>Later</i>
adev0: scontrol 
scontrol: show node adev13
NodeName=adev13 State=DRAINED CPUs=2 RealMemory=3448 TmpDisk=32000
   Weight=16 Partition=debug Features=(null) 
scontrol: update NodeName=adev13 State=IDLE
</pre> <p>Reconfigure all SLURM daemons on all nodes. This should 
be done after changing the SLURM configuration file.</p>
<pre>
adev0: scontrol reconfig
</pre> <p>Print the current SLURM configuration. This also reports if the 
primary and secondary controllers (slurmctld daemons) are responding. To just 
see the state of the controllers, use the command <span class="commandline">ping</span>.</p>
<pre>
adev0: scontrol show config
Configuration data as of 03/19-13:04:12
AuthType          = auth/munge
BackupAddr        = eadevj
BackupController  = adevj
ControlAddr       = eadevi
ControlMachine    = adevi
Epilog            = (null)
FastSchedule      = 1
FirstJobId        = 1
InactiveLimit     = 0
JobCompLoc        = /var/tmp/jette/slurm.job.log
JobCompType       = jobcomp/filetxt
JobCredPrivateKey = /etc/slurm/slurm.key
JobCredPublicKey  = /etc/slurm/slurm.cert
KillWait          = 30
MaxJobCnt         = 2000
MinJobAge         = 300
PluginDir         = /usr/lib/slurm
Prolog            = (null)
ReturnToService   = 1
SchedulerAuth     = (null)
SchedulerPort     = 65534
SchedulerType     = sched/backfill
SlurmUser         = slurm(97)
SlurmctldDebug    = 4
SlurmctldLogFile  = /tmp/slurmctld.log
SlurmctldPidFile  = /tmp/slurmctld.pid
SlurmctldPort     = 7002 
SlurmctldTimeout  = 300
SlurmdDebug       = 65534
SlurmdLogFile     = /tmp/slurmd.log
SlurmdPidFile     = /tmp/slurmd.pid
SlurmdPort        = 7003
SlurmdSpoolDir    = /tmp/slurmd
SlurmdTimeout     = 300
TreeWidth         = 50
JobAcctLogFile    = /tmp/jobacct.log
JobAcctFrequncy   = 5
JobAcctType       = jobacct/linux
SLURM_CONFIG_FILE = /etc/slurm/slurm.conf
StateSaveLocation = /usr/local/tmp/slurm/adev
SwitchType        = switch/elan
TmpFS             = /tmp
WaitTime          = 0

Slurmctld(primary/backup) at adevi/adevj are UP/UP
</pre> <p>Shutdown all SLURM daemons on all nodes.</p>
<pre>
adev0: scontrol shutdown
</pre> <p class="footer"><a href="#top">top</a></p>

<h2>Testing</h2>
<p>An extensive test suite is available within the SLURM distribution 
in <i>testsuite/expect</i>. 
There are about 250 tests which will execute on the order of 2000 jobs 
and 4000 job steps. 
Depending upon your system configuration and performance, this test 
suite will take roughly 40 minutes to complete.
The file <i>testsuite/expect/globals</i> contains default paths and
procedures for all of the individual tests.  You will need to edit this
file to specify where SLURM and other tools are installed.
Set your working directory to <i>testsuite/expect</i> before 
starting these tests.
Tests may be executed individually by name (e.g.  <i>test1.1</i>) 
or the full test suite may be executed with the single command 
<i>regression</i>.
See <i>testsuite/expect/README</i> for more information.</p>
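<p>A typical invocation, after editing <i>testsuite/expect/globals</i> for the 
local installation, might be:</p>
<pre>
cd testsuite/expect
./test1.1       # run a single test
./regression    # or run the entire test suite
</pre>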

<h2>Upgrades</h2>
<p>When upgrading to a new major or minor release of SLURM (e.g. 1.1.x to 1.2.x) 
all running and pending jobs will be purged due to changes in state save 
information. It is possible to develop software to translate state information 
between versions, but we do not normally expect to do so.
When upgrading to a new micro release of SLURM (e.g. 1.2.1 to 1.2.2) all
running and pending jobs will be preserved. Just install a new version of
SLURM and restart the daemons.
An exception to this is that jobs may be lost when installing new pre-release 
versions (e.g. 1.3.0-pre1 to 1.3.0-pre2). We'll try to note these cases 
in the NEWS file.</p>

<p class="footer"><a href="#top">top</a></p>

<p style="text-align:center;">Last modified 29 September 2006</p>

<!--#include virtual="footer.txt"-->