Commit fd8a5cc5 authored by Moe Jette

more updates for slurm v1.3

parent 6a599954
......@@ -74,13 +74,35 @@ The default value of PMI_TIME is 500 and this is the number of
microseconds allotted to transmit each key-pair.
We have executed up to 16,000 tasks with a value of PMI_TIME=4000.</p>
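PMI_TIME is set in the environment before the tasks are launched. A minimal sketch based on the values above (the application name is a placeholder):

```shell
# Allot 4000 microseconds for transmitting each PMI key-pair,
# appropriate for launches of many thousands of tasks.
export PMI_TIME=4000
# Then launch as usual, e.g. (placeholder application name):
#   srun -n 16000 ./my_mpi_app
```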
<h2>Limits</h2>
<p>The individual slurmd daemons on compute nodes will initiate messages
to the slurmctld daemon only when they start up or when the epilog
completes for a job. When a job that was allocated a large number of nodes
completes, it can cause a very large number of messages to be sent
by the slurmd daemons on these nodes to the slurmctld daemon all at
the same time. To spread this message traffic out over time
and avoid message loss, the <i>EpilogMsgTime</i> parameter may be
used. Note that even if messages are lost, they will be retransmitted,
but this will result in a delay for reallocating resources to new jobs.</p>
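In slurm.conf this is a single parameter; a sketch (the value shown is illustrative, not a recommendation, and should be tuned to the cluster):

```
# Spread epilog-completion messages from a job's nodes over time
# (value is illustrative)
EpilogMsgTime=2000
```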
<h2>Other</h2>
<p>SLURM uses hierarchical communications between the slurmd daemons
in order to increase parallelism and improve performance. The
<i>TreeWidth</i> configuration parameter controls the fanout of messages.
The default value is 50, meaning each slurmd daemon can communicate
with up to 50 other slurmd daemons and over 2500 nodes can be contacted
with two message hops.
The default value will work well for most clusters.
Optimal system performance can typically be achieved if <i>TreeWidth</i>
is set to the square root of the number of nodes in the cluster for
systems having no more than 2500 nodes or the cube root for larger
systems.</p>
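For example, on a hypothetical 1024-node cluster the square-root rule above gives:

```shell
# Suggested TreeWidth: square root of the node count,
# for clusters of no more than 2500 nodes.
NODES=1024
TREE_WIDTH=$(awk -v n="$NODES" 'BEGIN { printf "%d", sqrt(n) }')
echo "$TREE_WIDTH"   # prints 32
```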
<p>The srun command automatically increases its open file limit to
the hard limit in order to process all of the standard input and output
connections to the launched tasks. It is recommended that you set the
open file hard limit to 8192 across the cluster.</p>
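The hard limit can be inspected from the shell and raised through the standard PAM limits mechanism; a sketch following the 8192 recommendation above:

```shell
# Query the current hard limit on open file descriptors
LIMIT=$(ulimit -Hn)
echo "open file hard limit: $LIMIT"
# To raise it cluster-wide, an /etc/security/limits.conf entry
# such as the following could be distributed to every node:
#   *   hard   nofile   8192
```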
<p style="text-align:center;">Last modified 29 January 2008</p>
<p style="text-align:center;">Last modified 11 March 2008</p>
<!--#include virtual="footer.txt"-->
......@@ -7,7 +7,7 @@ SLURM source can be downloaded from <br>
https://sourceforge.net/projects/slurm/</a><br>
There is also a Debian package named <i>slurm-llnl</i> available at <br>
<a href="http://www.debian.org/">http://www.debian.org/</a><br>
The latest stable release of SLURM is version 1.2.</p>
The latest stable release of SLURM is version 1.3.</p>
<p> Other software available for download includes
<ul>
......@@ -26,12 +26,19 @@ The latest stable release is version 1.4.</p>
<h1>Related Software</h1>
<ul>
<li><b>OpenSSL</b> is recommended for secure communications between SLURM
components. Download it from
<a href="http://www.openssl.org/">http://www.openssl.org/</a>.
</li>
<li>Digital signatures (Crypto plugin) are used to ensure messages are not altered.</li>
<ul>
<li><b>OpenSSL</b><br>
OpenSSL is recommended for generation of digital signatures.
Download it from <a href="http://www.openssl.org/">http://www.openssl.org/</a>.</li>
<li><b>Munge</b><br>
Munge can be used as an alternative to OpenSSL.
Munge is available under the GNU General Public License, but is slower than OpenSSL
for the generation of digital signatures. Munge is available from
<a href="http://home.gna.org/munge/">http://home.gna.org/munge/</a>.</li>
</ul>
<li>Authentication plugins</li>
<li>Authentication plugins identify the user originating a message.</li>
<ul>
<li><b>Munge</b><br>
In order to compile the "auth/munge" authentication plugin for SLURM, you will need
......@@ -68,7 +75,7 @@ https://sourceforge.net/projects/slurm/</a>.
<li><a href="http://www.quadrics.com/">Quadrics MPI</a></li>
</ul>
<li>Schedulers</li>
<li>Schedulers offering greater control over the workload</li>
<ul>
<li><a href="http://www.platform.com/">Load Sharing Facility (LSF)</a></li>
<li><a href="http://www.clusterresources.com/pages/products/maui-cluster-scheduler.php">
......@@ -77,6 +84,13 @@ Maui Scheduler</a></li>
Moab Cluster Suite</a></li>
</ul>
<li>Database tools available for managing accounting data</li>
<ul>
<li><a href="http://www.clusterresources.com/pages/products/gold-allocation-manager.php">Gold</a></li>
<li><a href="http://www.mysql.com/">MySQL</a></li>
<li><a href="http://www.postgresql.org/">PostgreSQL</a></li>
</ul>
<li>Task Affinity plugins</li>
<ul>
<li><a href="http://www.open-mpi.org/software/plpa/">
......@@ -85,6 +99,6 @@ Portable Linux Processor Affinity (PLPA)</a></li>
</ul>
<p style="text-align:center;">Last modified 6 December 2007</p>
<p style="text-align:center;">Last modified 11 March 2008</p>
<!--#include virtual="footer.txt"-->
......@@ -6,7 +6,7 @@
<ul>
<li><a href="#11">SLURM Version 1.1, May 2006</a></li>
<li><a href="#12">SLURM Version 1.2, February 2007</a></li>
<li><a href="#13">SLURM Version 1.3, Winter 2007</a></li>
<li><a href="#13">SLURM Version 1.3, March 2008</a></li>
<li><a href="#14">SLURM Version 1.4 and beyond</a></li>
</ul>
......@@ -65,16 +65,18 @@ task launch directly from the <i>srun</i> command.</li>
</ul>
<h2><a name="13">Major Updates in SLURM Version 1.3</a></h2>
<p>SLURM Version 1.3 is scheduled for release in the Winter of 2007.
<p>SLURM Version 1.3 was released in March 2008.
Major enhancements include:
<ul>
<li>Job accounting and completion data stored in a database
<li>Job accounting and completion data can be stored in a database
(MySQL, PGSQL or simple text file).</li>
<li>SlurmDBD (Slurm Database Daemon) introduced to provide secure
database support across multiple clusters.</li>
<li>Gang scheduler plugin added (time-slicing of parallel jobs
without an external scheduler).</li>
<li>Cryptography logic moved to a separate plugin with the
option of using OpenSSL (default) or Munge (GPL).</li>
<li>Improved scheduling of multiple job steps within a job's allocation.</li>
<li>Gang scheduling of jobs (time-slicing of parallel jobs
without an external scheduler).</li>
<li>Support for job specification of node features with node counts.</li>
<li><i>srun</i>'s --alloc, --attach, and --batch options removed (use
<i>salloc</i>, <i>sattach</i> or <i>sbatch</i> commands instead).</li>
......@@ -95,6 +97,6 @@ to coordinate activities. Future development plans include:
and refresh.</li>
</ul>
<p style="text-align:center;">Last modified 27 November 2007</p>
<p style="text-align:center;">Last modified 11 March 2008</p>
<!--#include virtual="footer.txt"-->
......@@ -56,7 +56,7 @@ building block approach. These plugins presently include:
<li><a href="jobacct_storageplugins.html">Job Accounting Storage</a>:
text file (default if jobacct_gather != none),
<a href="http://www.clusterresources.com/pages/products/gold-allocation-manager.ph">Gold</a>
<a href="http://www.clusterresources.com/pages/products/gold-allocation-manager.php">Gold</a>
MySQL, PGSQL, SlurmDBD (Slurm Database Daemon) or none</li>
<li><a href="jobcompplugins.html">Job completion logging</a>:
......
......@@ -134,7 +134,7 @@ Some macro definitions that may be used in building SLURM include:
# .rpmmacros
# For AIX at LLNL
# Override some RPM macros from /usr/lib/rpm/macros
# Set other SLURM-specific macros for unconventional file locations
# Set SLURM-specific macros for unconventional file locations
#
%_enable_debug "--with-debug"
%_prefix /admin/llnl
......@@ -208,17 +208,24 @@ For more information, see <a href="quickstart.html#mpi">MPI</a>.
<h3>Scheduler support</h3>
<p>The scheduler used by SLURM is controlled by the <b>SchedType</b> configuration
parameter. This is meant to control the relative importance of pending jobs.
SLURM's default scheduler is FIFO (First-In First-Out). A backfill scheduler
plugin is also available. Backfill scheduling will initiate a lower-priority job
parameter. This is meant to control the relative importance of pending jobs, and
several options are available.
SLURM's default scheduler is FIFO (First-In First-Out).
SLURM also offers a backfill scheduling plugin.
Backfill scheduling will initiate a lower-priority job
if doing so does not delay the expected initiation time of higher priority jobs;
essentially using smaller jobs to fill holes in the resource allocation plan.
Effective backfill scheduling does require users to specify job time limits.
SLURM offers a gang scheduler plugin, which provides time slicing of parallel
jobs sharing the same nodes.
SLURM also supports a plugin for use of
<a href="http://www.clusterresources.com/pages/products/maui-cluster-scheduler.php">
The Maui Scheduler</a> or
<a href="http://www.clusterresources.com/pages/products/moab-cluster-suite.php">
Moab Cluster Suite</a> which offer sophisticated scheduling algorithms.
Motivated users can even develop their own scheduler plugin if so desired. </p>
Moab Cluster Suite</a> which offer sophisticated scheduling algorithms.
For more information about these options see
<a href="gang_scheduling.html">Gang Scheduling</a> and
<a href="cons_res_share.html">Sharing Consumable Resources</a>.</p>
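The plugin is selected with a single slurm.conf line; a sketch matching the backfill option described above (the sample configuration later on this page uses the same value):

```
# Select the backfill scheduler plugin in slurm.conf
SchedulerType=sched/backfill
```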
<h3>Node selection</h3>
<p>The node selection mechanism used by SLURM is controlled by the
......@@ -238,6 +245,12 @@ aware and interacts with the BlueGene bridge API).</p>
levels for these messages. Be certain that your system's syslog functionality
is operational. </p>
<h3>Accounting</h3>
<p>SLURM supports accounting records being written to a simple text file,
directly to a database (MySQL or PostgreSQL), or to a daemon securely
managing accounting data for multiple clusters. For more information
see <a href="accounting.html">Accounting</a>.</p>
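As a sketch, storing records directly in a MySQL database might be configured as follows (the database name is a placeholder):

```
# slurm.conf: write accounting records to a MySQL database
AccountingStorageType=accounting_storage/mysql
AccountingStorageLoc=slurm_acct_db
```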
<h3>Corefile format</h3>
<p>SLURM is designed to support generating a variety of core file formats for
application codes that fail (see the <i>--core</i> option of the <i>srun</i>
......@@ -341,7 +354,8 @@ minimum configuration values will be considered DOWN and not scheduled.
Note that a more extensive sample configuration file is provided in
<b>etc/slurm.conf.example</b>. We also have a web-based
<a href="configurator.html">configuration tool</a> which can
be used to build a simple configuration file.</p>
be used to build a simple configuration file, which can then be
manually edited for more complex configurations.</p>
<pre>
#
# Sample /etc/slurm.conf for mcr.llnl.gov
......@@ -545,50 +559,12 @@ Configuration data as of 03/19-13:04:12
AuthType = auth/munge
BackupAddr = eadevj
BackupController = adevj
BOOT_TIME = 01/10-09:19:21
CacheGroups = 0
CheckpointType = checkpoint/none
ControlAddr = eadevi
ControlMachine = adevi
DatabaseType = database/flatfile
DatabaseHost = (null)
DatabasePort = (null)
DatabaseUser = (null)
Epilog = (null)
FastSchedule = 1
FirstJobId = 1
InactiveLimit = 0
JobCompLoc = /var/tmp/jette/slurm.job.log
JobCompType = jobcomp/filetxt
JobCredPrivateKey = /etc/slurm/slurm.key
JobCredPublicKey = /etc/slurm/slurm.cert
KillWait = 30
MaxJobCnt = 2000
MinJobAge = 300
PluginDir = /usr/lib/slurm
Prolog = (null)
ReturnToService = 1
SchedulerAuth = (null)
SchedulerPort = 65534
SchedulerType = sched/backfill
SlurmUser = slurm(97)
SlurmctldDebug = 4
SlurmctldLogFile = /tmp/slurmctld.log
SlurmctldPidFile = /tmp/slurmctld.pid
SlurmctldPort = 7002
SlurmctldTimeout = 300
SlurmdDebug = 65534
SlurmdLogFile = /tmp/slurmd.log
SlurmdPidFile = /tmp/slurmd.pid
SlurmdPort = 7003
SlurmdSpoolDir = /tmp/slurmd
SlurmdTimeout = 300
TreeWidth = 50
JobAcctStorageType = jobacct_storage/filetxt
JobAcctStorageLoc = /tmp/jobacct.log
JobAcctGatherFrequency = 5
JobAcctGatherType = jobacct_gather/linux
SLURM_CONFIG_FILE = /etc/slurm/slurm.conf
StateSaveLocation = /usr/local/tmp/slurm/adev
SwitchType = switch/elan
TmpFS = /tmp
...
WaitTime = 0
Slurmctld(primary/backup) at adevi/adevj are UP/UP
......@@ -601,9 +577,9 @@ adev0: scontrol shutdown
<p>An extensive test suite is available within the SLURM distribution
in <i>testsuite/expect</i>.
There are about 250 tests which will execute on the order of 2000 jobs
and 4000 job steps.
and 5000 job steps.
Depending upon your system configuration and performance, this test
suite will take roughly 40 minutes to complete.
suite will take roughly 80 minutes to complete.
The file <i>testsuite/expect/globals</i> contains default paths and
procedures for all of the individual tests. You will need to edit this
file to specify where SLURM and other tools are installed.
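Assuming the regression driver script shipped in that directory (a sketch; the driver name may differ across releases), the full suite is run from the source tree:

```
# From the top of the SLURM source tree, after editing globals:
cd testsuite/expect
./regression
```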
......@@ -628,6 +604,6 @@ in the NEWS file.
</pre> <p class="footer"><a href="#top">top</a></p>
<p style="text-align:center;">Last modified 1 October 2007</p>
<p style="text-align:center;">Last modified 11 March 2008</p>
<!--#include virtual="footer.txt"-->
......@@ -80,19 +80,19 @@ will be written to the file specified by the
\fBAccountingStorageLoc\fR parameter.
The value "accounting_storage/gold" indicates that account records
will be written to Gold
(http://www.clusterresources.com/pages/products/gold-allocation-manager.ph),
(http://www.clusterresources.com/pages/products/gold-allocation-manager.php),
which maintains its own database.
The value "accounting_storage/mysql" indicates that accounting records
should be written to a mysql database specified by the
should be written to a MySQL database specified by the
\fBAccountingStorageLoc\fR parameter.
The default value is "accounting_storage/none", which means that
account records are not maintained.
The value "accounting_storage/pgsql" indicates that accounting records
should be written to a postresql database specified by the
should be written to a PostgreSQL database specified by the
\fBAccountingStorageLoc\fR parameter.
The value "accounting_storage/slurmdbd" indicates that accounting records
will be written to SlurmDbd, which maintains its own database. See
"man slurmdbd" for more information.
will be written to SlurmDBD, which manages an underlying MySQL or
PostgreSQL database. See "man slurmdbd" for more information.
Also see \fBDefaultStorageType\fR.
.TP
......