Commit ee2e218a authored by Tim Wickberg

docs - remove maui and moab integration guides

They're full of dead links, and the plugins are deprecated
(announced at SLUG16).
parent 5a176620
@@ -53,12 +53,10 @@ generated_html = \
 fair_tree.html \
 mail.html \
 man_index.html \
-maui.html \
 mc_support.html \
 mcs.html \
 mcs_plugins.html \
 meetings.html \
-moab.html \
 mpi_guide.html \
 mpiplugins.html \
 multi_cluster.html \
@@ -507,12 +507,10 @@ generated_html = \
 fair_tree.html \
 mail.html \
 man_index.html \
-maui.html \
 mc_support.html \
 mcs.html \
 mcs_plugins.html \
 meetings.html \
-moab.html \
 mpi_guide.html \
 mpiplugins.html \
 multi_cluster.html \
@@ -81,8 +81,6 @@ may be found in the <a href="http://slurm.schedmd.com/archive/">archive</a>.
 </ul>
 <li>External Schedulers</li>
 <ul>
-<li><a href="maui.html">Maui Scheduler Integration Guide</a></li>
-<li><a href="moab.html">Moab Cluster Suite Integration Guide</a></li>
 <li><a href="http://docs.hp.com/en/5991-4847/ch09s02.html">Submitting Jobs through LSF</a></li>
 </ul>
 <li>Specific Systems</li>
<!--#include virtual="header.txt"-->
<h1>Maui Scheduler Integration Guide</h1>
<h2>Overview</h2>
<p>Maui configuration is quite complicated and is really beyond the scope
of any documents we could supply with Slurm.
The best resource for Maui configuration information is the
online documentation at Cluster Resources Inc.:
<a href="http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml">
http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml</a>.
<p>Maui uses Slurm commands and a wiki interface to communicate. See the
<a href="http://www.clusterresources.com/products/mwm/docs/wiki/wikiinterface.shtml">
Wiki Interface Specification</a> and
<a href="http://www.clusterresources.com/products/mwm/docs/wiki/socket.shtml">
Wiki Socket Protocol Description</a> for more information.</p>
<h2>Configuration</h2>
<p>First, download the Maui scheduler kit from their web site
<a href="http://www.clusterresources.com/pages/products/maui-cluster-scheduler.php">
http://www.clusterresources.com/pages/products/maui-cluster-scheduler.php</a>.
Note: maui-3.2.6p9 has been validated with Slurm; other versions
should also work properly.
We anticipate that the Maui Scheduler will be upgraded to utilize a more
extensive interface to Slurm in early 2007.
The newer Maui Scheduler will be able to utilize a more full-featured
interface to Slurm as described in the
<a href="moab.html">Moab Cluster Suite Integration Guide</a>.
This guide will be updated at that time.</p>
<h3>Maui configuration</h3>
<p>Make sure that Slurm is installed and running before building Maui.
Then build Maui from its source distribution. This is a two step process:</p>
<ol>
<li>./configure --with-key=42 --with-wiki
<li>gmake
</ol>
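<p>Expanded into a shell session, the build might look like the sketch below;
the tarball name is an assumption based on the version mentioned above:</p>
<pre>
# Assumed tarball name; adjust to match the version actually downloaded
tar -xzf maui-3.2.6p9.tar.gz
cd maui-3.2.6p9
./configure --with-key=42 --with-wiki
gmake
</pre>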
<p>The key of 42 is arbitrary. You can use any value, but it must
be a number no larger than 4,294,967,295 (2^32 - 1), and the same
value must be specified as a Slurm configuration parameter described below.
Maui developers have assured us the authentication key will eventually be
set in a configuration file rather than at build time.</p>
<p>Update the Maui configuration file <i>maui.cfg</i> (copy the file
maui-3.2.6p9/maui.cfg.dist to maui.cfg). Add the following configuration
parameters to maui.cfg:</p>
<pre>
RMCFG[host] TYPE=WIKI
RMPORT 7321 # selected port
RMHOST host
RMAUTHTYPE[host] CHECKSUM
</pre>
<p><i>host</i> is the hostname where the Slurm controller is running.
This must match the value of <i>ControlMachine</i> configured in
slurm.conf. Note that <i>localhost</i> doesn't work. If you run Maui
and Slurm on the same machine, you must specify the actual host name.
The above example uses a TCP port number of 7321 for
communications between Slurm and Maui, but you can pick any port that
is available and accessible. You can also set a polling interval with</p>
<pre>
RMPOLLINTERVAL 00:00:20
</pre>
<p>It may be desired to have Maui poll Slurm quite often --
in this case every 20 seconds.
Note that a job submitted to an idle cluster will not be initiated until
the Maui daemon polls Slurm and decides to make it run, so the value of
RMPOLLINTERVAL should be set to a value appropriate for your site
considering both the desired system responsiveness and the overhead of
executing Maui daemons too frequently.</p>
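<p>Putting the pieces above together, and assuming a hypothetical controller
host named <i>mgmt1</i>, the host name must agree between the two configuration
files as sketched below (the host name and polling interval are illustrative):</p>
<pre>
# slurm.conf (actual controller host name, not "localhost")
ControlMachine=mgmt1

# maui.cfg (must reference the same host name)
RMCFG[mgmt1]     TYPE=WIKI
RMHOST           mgmt1
RMPOLLINTERVAL   00:00:20
</pre>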
<p>In order for Maui to be able to access your Slurm partition(s), you will
need to define in maui.cfg a partition with the same name as each Slurm
partition. For example, if nodes "linux[0-3]" are in Slurm partition
"PartA", slurm.conf includes a line of this sort:</p>
<pre>
PartitionName=PartA Default=yes Nodes=linux[0-3]
</pre>
<p>Then add the corresponding lines to maui.cfg:</p>
<pre>
PARTITIONMODE ON
NODECFG[linux0] PARTITION=PartA
NODECFG[linux1] PARTITION=PartA
NODECFG[linux2] PARTITION=PartA
NODECFG[linux3] PARTITION=PartA
</pre>
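<p>If additional Slurm partitions exist, the same pattern is repeated for
each one; the partition and node names below are illustrative assumptions:</p>
<pre>
# slurm.conf
PartitionName=PartB Nodes=linux[4-7]

# maui.cfg
NODECFG[linux4] PARTITION=PartB
NODECFG[linux5] PARTITION=PartB
NODECFG[linux6] PARTITION=PartB
NODECFG[linux7] PARTITION=PartB
</pre>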
<p>Set the following environment variables and path:
<pre>
set path=(/root/MAUI/maui-3.2.6p9/bin $path)
setenv MAUIHOMEDIR /root/MAUI/maui-3.2.6p9
</pre>
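<p>The settings above use csh syntax; for a Bourne-style shell the equivalent
would be (same installation path assumed):</p>
<pre>
export PATH=/root/MAUI/maui-3.2.6p9/bin:$PATH
export MAUIHOMEDIR=/root/MAUI/maui-3.2.6p9
</pre>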
<p class="footer"><a href="#top">top</a></p>
<h3>Slurm configuration</h3>
<p>Set the slurm.conf scheduler parameters as follows:</p>
<pre>
SchedulerType=sched/wiki
SchedulerPort=7321
SchedulerAuth=42 (for Slurm version 1.1 and earlier only)
</pre>
<p>In this case, "SchedulerAuth" has been set to 42, which was the
authentication key specified when Maui was configured above.
Just make sure the numbers match.</p>
<p>For Slurm version 1.2 or higher, the authentication key
is stored in a file specific to the wiki-plugin named
<i>wiki.conf</i>.
This file should be protected from reading by users.
It only needs to be readable by <i>SlurmUser</i> (as configured
in <i>slurm.conf</i>) and only needs to exist on computers
where the <i>slurmctld</i> daemon executes.
More information about wiki.conf is available in
a man page distributed with Slurm, although that
includes a description of keywords presently only
supported by the sched/wiki2 plugin for use with the
Moab Scheduler.</p>
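<p>For example, the file can be restricted to <i>SlurmUser</i> with ordinary
file permissions; the path and user name below are assumptions for illustration:</p>
<pre>
# Assuming SlurmUser=slurm and a configuration directory of /etc/slurm
chown slurm /etc/slurm/wiki.conf
chmod 600 /etc/slurm/wiki.conf
</pre>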
<p>Slurm has some internal scheduling capabilities which are not compatible
with Maui.
<ol>
<li>Do not configure Slurm to use the "priority/multifactor" plugin
as it would set job priorities which conflict with those set by Maui.</li>
<li>Do not use Slurm's <a href="reservations.html">reservation</a>
mechanism, but use that offered by Maui.</li>
<li>Do not use Slurm's <a href="resource_limits.html">resource limits</a>
as those may conflict with those managed by Maui.</li>
</ol></p>
<p>The wiki.conf keywords currently supported by Maui include:</p>
<p><b>AuthKey</b> is a DES-based encryption key used to sign
communications between Slurm and Maui or Moab.
Use of this key is essential to ensure that a user cannot
build his own program to cancel other users' jobs in
Slurm.
It should be a 32-bit unsigned integer and must match
the encryption key in Maui (<i>--with-key</i> on the
configure line) or Moab (<i>KEY</i> parameter in the
<i>moab-private.cfg</i> file).
Note that Slurm's wiki plugin does not include a mechanism
to submit new jobs, so even without this key nobody could
run jobs as another user.</p>
<p><b>ExcludePartitions</b> is used to identify partitions
whose jobs are to be scheduled directly by Slurm rather
than Maui.
These jobs will be scheduled on a First-Come-First-Served
basis.
This may provide faster response times than Maui scheduling.
Maui will account for and report the jobs, but their initiation
will be outside of Maui's control.
Note that Maui controls for resource reservation, fair share
scheduling, etc. will not apply to the initiation of these jobs.
If more than one partition is to be scheduled directly by
Slurm, use a comma separator between their names.</p>
<p><b>HidePartitionJobs</b> identifies partitions whose jobs are not
to be reported to Maui.
These jobs will not be accounted for or otherwise visible to Maui.
Any partitions listed here must also be listed in <b>ExcludePartitions</b>.
If more than one partition is to have its jobs hidden, use a comma
separator between their names.</p>
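<p>For example, to have Slurm directly schedule two partitions and hide only
one of them from Maui (the partition names are illustrative):</p>
<pre>
ExcludePartitions=debug,test
HidePartitionJobs=test
</pre>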
<p>Here is a sample <i>wiki.conf</i> file:</p>
<pre>
# wiki.conf
# Slurm's wiki plugin configuration file
#
# Matches Maui's --with-key configuration parameter
AuthKey=42
#
# Slurm to directly schedule "debug" partition
# and hide the jobs from Maui
ExcludePartitions=debug
HidePartitionJobs=debug
</pre>
</p>
<p class="footer"><a href="#top">top</a></p>
<p style="text-align:center;">Last modified 15 April 2015</p>
<!--#include virtual="footer.txt"-->
<!--#include virtual="header.txt"-->
<h1>Moab Cluster Suite Integration Guide</h1>
<h2>Overview</h2>
<p>Moab Cluster Suite configuration is quite complicated and is
beyond the scope of any documents we could supply with Slurm.
The best resource for Moab configuration information is the
online documentation at Cluster Resources Inc.:
<a href="http://www.clusterresources.com/products/mwm/docs/slurmintegration.shtml">
http://www.clusterresources.com/products/mwm/docs/slurmintegration.shtml</a>.</p>
<p>Moab uses Slurm commands and a wiki interface to communicate. See the
<a href="http://www.clusterresources.com/products/mwm/docs/wiki/wikiinterface.shtml">
Wiki Interface Specification</a> and
<a href="http://www.clusterresources.com/products/mwm/docs/wiki/socket.shtml">
Wiki Socket Protocol Description</a> for more information.</p>
<p>Somewhat more current information about Slurm's implementation of the
wiki interface was developed by Michal Novotny (Masaryk University, Czech Republic)
and can be found <a href="http://www.fi.muni.cz/~xnovot19/wiki2.html">here</a>.</p>
<h2>Configuration</h2>
<p>First, download the Moab scheduler kit from their web site
<a href="http://www.clusterresources.com/pages/products/moab-cluster-suite.php">
http://www.clusterresources.com/pages/products/moab-cluster-suite.php</a>.<br>
<b>Note:</b> Use Moab version 5.0.0 or higher and Slurm version 1.1.28
or higher.</p>
<h3>Slurm configuration</h3>
<h4>slurm.conf</h4>
<p>Set the <i>slurm.conf</i> scheduler parameters as follows:</p>
<pre>
SchedulerType=sched/wiki2
SchedulerPort=7321
</pre>
<p>Running multiple jobs per node can be accomplished in two different
ways.
The <i>SelectType=select/cons_res</i> parameter can be used to let
Slurm allocate the individual processors, memory, and other
consumable resources.
Alternately, <i>SelectType=select/linear</i> or
<i>SelectType=select/bluegene</i> can be used with the
<i>OverSubscribe=yes</i> or <i>OverSubscribe=force</i> parameter in
partition configuration specifications.</p>
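<p>A minimal sketch of the two approaches in <i>slurm.conf</i> follows; the
partition and node names are assumptions for illustration:</p>
<pre>
# Option 1: Slurm allocates individual processors and memory
SelectType=select/cons_res

# Option 2: whole-node allocation with oversubscription per partition
SelectType=select/linear
PartitionName=batch Nodes=tux[0-15] OverSubscribe=force
</pre>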
<p>The default value of <i>SchedulerPort</i> is 7321.</p>
<p>Slurm has some internal scheduling capabilities which are not compatible
with Moab.
<ol>
<li>Do not configure Slurm to use the "priority/multifactor" plugin
as it would set job priorities which conflict with those set by Moab.</li>
<li>Do not use Slurm's <a href="reservations.html">reservation</a>
mechanism, but use that offered by Moab.</li>
<li>Do not use Slurm's <a href="resource_limits.html">resource limits</a>
as those may conflict with those managed by Moab.</li>
</ol></p>
<h4>Slurm commands</h4>
<p> Note that the <i>srun --immediate</i> option is not compatible
with Moab.
All jobs must wait for Moab to schedule them rather than being
scheduled immediately by Slurm.</p>
<a name="wiki.conf"><h4>wiki.conf</h4></a>
<p>Slurm's wiki configuration is stored in a file
specific to the wiki-plugin named <i>wiki.conf</i>.
This file should be protected from reading by users.
It only needs to be readable by <i>SlurmUser</i> (as configured
in <i>slurm.conf</i>) and only needs to exist on computers
where the <i>slurmctld</i> daemon executes.
More information about wiki.conf is available in
a man page distributed with Slurm.</p>
<p>The currently supported wiki.conf keywords include:</p>
<p><b>AuthKey</b> is a DES-based encryption key used to sign
communications between Slurm and Maui or Moab.
Use of this key is essential to ensure that a user cannot
build his own program to cancel other users' jobs in
Slurm.
It should be a 32-bit unsigned integer and must match
the encryption key in Maui (<i>--with-key</i> on the
configure line) or Moab (<i>KEY</i> parameter in the
<i>moab-private.cfg</i> file).
Note that Slurm's wiki plugin does not include a mechanism
to submit new jobs, so even without this key, nobody can
run jobs as another user.</p>
<p><b>EPort</b> is an event notification port in Moab.
When a job is submitted to or terminates in Slurm,
Moab is sent a message on this port to begin an attempt
to schedule the computer.
This numeric value should match <i>EPORT</i> configured
in the <i>moab.cfg</i> file.</p>
<p><b>EHost</b> is the event notification host for Moab.
This identifies the computer on which the Moab daemon
executes and which should be notified of events.
By default EHost will be identical in value to the
ControlAddr configured in slurm.conf.</p>
<p><b>EHostBackup</b> is the event notification backup host for Moab.
Names the computer on which the backup Moab server executes.
It is used in establishing a communications path for event notification.
By default EHostBackup will be identical in value to the
BackupAddr configured in slurm.conf.</p>
<p><b>ExcludePartitions</b> is used to identify partitions
whose jobs are to be scheduled directly by Slurm rather
than Moab.
This only affects jobs which are submitted using Slurm
commands (i.e. srun, salloc or sbatch, NOT msub from Moab).
These jobs will be scheduled on a First-Come-First-Served
basis.
This may provide faster response times than Moab scheduling.
Moab will account for and report the jobs, but their initiation
will be outside of Moab's control.
Note that Moab controls for resource reservation, fair share
scheduling, etc. will not apply to the initiation of these jobs.
If more than one partition is to be scheduled directly by
Slurm, use a comma separator between their names.</p>
<p><b>HidePartitionJobs</b> identifies partitions whose jobs are not
to be reported to Moab.
These jobs will not be accounted for or otherwise visible to Moab.
Any partitions listed here must also be listed in <b>ExcludePartitions</b>.
If more than one partition is to have its jobs hidden, use a comma
separator between their names.</p>
<p><b>HostFormat</b> controls the format of job task lists built
by Slurm and reported to Moab.
The default value is "0", for which each host name is listed
individually, once per processor (e.g. "tux0:tux0:tux1:tux1:...").
A value of "1" use. Slurm hostlist expressions with processor
counts (e.g. "tux[0-16]*2").
This is currently experimental.
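<p>For example, to select the compact hostlist form, add the following to
<i>wiki.conf</i>:</p>
<pre>
# Report task lists as Slurm hostlist expressions with processor counts
HostFormat=1
</pre>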
<p><b>JobAggregationTime</b> is used to avoid notifying Moab
of large numbers of events occurring about the same time.
If an event occurs within this number of seconds since Moab was
last notified of an event, another notification is not sent.
This should be an integer number of seconds.
The default value is 10 seconds.
The value should match <i>JOBAGGREGATIONTIME</i> configured
in the <i>moab.cfg</i> file.</p>
<p><b>JobPriority</b> controls the scheduling of newly arriving jobs
in Slurm. Possible values are "hold" and "run" with "hold" being the
default. When <i>JobPriority=hold</i>, Slurm places all newly arriving
jobs in a HELD state (priority = 0) and lets Moab decide when and
where to run the jobs. When <i>JobPriority=run</i>, Slurm controls
when and where to run jobs.
<b>Note:</b> The "run" option implementation has yet to be completed.
Once the "run" option is available, Moab will be able to modify the
priorities of pending jobs to re-order the job queue.</p>
<h4>Sample <i>wiki.conf</i> file</h4>
<pre>
# wiki.conf
# Slurm's wiki plugin configuration file
#
# Matches KEY in moab-private.cfg
AuthKey=123456789
#
# Slurm to directly schedule "debug" partition
# and hide the jobs from Moab
ExcludePartitions=debug
HidePartitionJobs=debug
#
# Have Moab control job scheduling
JobPriority=hold
#
# Moab event notification port, matches EPORT in moab.cfg
EPort=15017
# Moab event notification host, where the Moab daemon runs
#EHost=tux0
#
# Moab event notification throttle,
# matches JOBAGGREGATIONTIME in moab.cfg (seconds)
JobAggregationTime=15
</pre>
</p>
<h3>Moab Configuration</h3>
<p>Moab has support for Slurm's WIKI interface by default.
Specify this interface in the <i>moab.cfg</i> file as follows:</p>
<pre>
SCHEDCFG[base] MODE=NORMAL
RMCFG[slurm] TYPE=WIKI:Slurm AUTHTYPE=CHECKSUM
</pre>
<p>In <i>moab-private.cfg</i> specify the private key as follows:</p>
<pre>
CLIENTCFG[RM:slurm] KEY=123456789
</pre>
<p>Ensure that this file is protected from viewing by users.</p>
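<p>As with <i>wiki.conf</i>, ordinary file permissions are sufficient for
this; the path below is an assumption for illustration:</p>
<pre>
chmod 600 /opt/moab/etc/moab-private.cfg
</pre>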
<h3>Job Submission</h3>
<p>Jobs can either be submitted to Moab or directly to Slurm.
Moab's <i>msub</i> command has a <i>--slurm</i> option that can
be placed at the <b>end</b> of the command line; any options that
follow it will be passed to Slurm. This can be used to invoke Slurm
options which are not directly supported by Moab (e.g.
system images to boot, task distribution specification across
sockets, cores, and hyperthreads, etc.).
For example:
<pre>
msub my.script -l walltime=600,nodes=2 \
--slurm --linux-image=/bgl/linux_image2
</pre>
<h3>User Environment</h3>
<p>When a user submits a job to Moab, that job could potentially
execute on a variety of computers, so it is typically necessary
that the user's environment on the execution host be loaded.
Moab relies upon Slurm to perform this action, using the
<i>--get-user-env</i> option for the salloc, sbatch and srun commands.
The Slurm command then executes, as user root, a command of this
sort:</p>
<pre>
/bin/su - &lt;user&gt; -c \
"/bin/echo BEGIN; /bin/env; /bin/echo FINI"
</pre>
<p> For typical batch jobs, the job transfer from Moab to
Slurm is performed using <i>sbatch</i> and occurs instantaneously.
The environment is loaded by a Slurm daemon (slurmd) when the
batch job begins execution.
For interactive jobs (<i>msub -I ...</i>), the job transfer
from Moab to Slurm cannot be completed until the environment
variables are loaded, during which time the Moab daemon is
completely non-responsive.
To ensure that Moab remains operational, Slurm will abort the above
command within a configurable period of time and look for a cache
file with the user's environment and use that if found.
Otherwise an error is reported to Moab.
The time permitted for loading the current environment
before searching for a cache file is configurable using
the <i>GetEnvTimeout</i> parameter in Slurm's configuration
file, slurm.conf. A value of zero results in immediately
using the cache file. The default value is 2 seconds.</p>
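<p>For example, to allow more time for the environment to load before falling
back to the cache file (the value is illustrative):</p>
<pre>
# slurm.conf: wait up to 10 seconds for the user environment,
# then fall back to the cache file (0 = use the cache file immediately)
GetEnvTimeout=10
</pre>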
<p>We have provided a simple program that can be used to build
cache files for users. The program can be found in the Slurm
distribution at <i>contribs/env_cache_builder.c</i>.
This program can support a longer timeout than Moab, but
will report errors for users for whom the environment file
cannot be automatically built (typically due to the user's
"dot" files spawning another shell, so the desired command
never executes).
For such users, you can manually build a cache file.
You may want to execute this program periodically to capture
information for new users or changes in existing users'
environment.
A sample execution is shown below.
Run this on the same host as the Moab daemon and execute it as user root.</p>
<pre>
bash-3.00# make -f /dev/null env_cache_builder
cc env_cache_builder.c -o env_cache_builder
bash-3.00# ./env_cache_builder
Building user environment cache files for Moab/Slurm.
This will take a while.
Processed 100 users...
***ERROR: Failed to get current user environment variables for alice
***ERROR: Failed to get current user environment variables for brian
Processed 200 users...
Processed 300 users...
***ERROR: Failed to get current user environment variables for christine
***ERROR: Failed to get current user environment variables for david
Some user environments could not be loaded.
Manually run 'env' for those 4 users.
Write the output to a file with the same name as the user in the
/usr/local/tmp/slurm/atlas/env_cache directory
</pre>
<p class="footer"><a href="#top">top</a></p>
<p style="text-align:center;">Last modified 31 March 2016</p>
<!--#include virtual="footer.txt"-->