Commit f4744b81 authored by Christopher J. Morrone

Update FAQ multi-slurmd note to point to the info in the programmers guide.

parent 8638f6b0
......@@ -42,7 +42,7 @@ controller?</a></li>
<li><a href="#multi_slurm">Can multiple SLURM systems be run in
parallel for testing purposes?</a></li>
<li><a href="#multi_slurmd">Can multiple slurmd daemons be run
+on the compute node(s) to emulate a larger cluster?</a></li>
on the compute node(s) to emulate a larger cluster?</a></li>
</ol>
<h2>For Users</h2>
......@@ -399,18 +399,22 @@ file).</li>
<li>Build and install SLURM in the usual manner.</li>
<li>In <i>slurm.conf</i> define the desired node names (arbitrary
names used only by SLURM) as <i>NodeName</i> along with the actual
address of the physical node in <i>NodeAddr</i>. Note that multiple
<i>NodeName</i> values can be mapped to a single <i>NodeAddr</i>.</li>
<li>Note that when more than one slurmd is started per node, they will
use communication ports that are higher than the configured value of
<i>SlurmdPort</i>. (The first slurmd uses <i>SlurmdPort</i>, then
second uses <i>SlurmdPort</i>+1, etc.)</li>
address of the physical node in <i>NodeHostname</i>. Multiple
<i>NodeName</i> values can be mapped to a single
<i>NodeHostname</i>. Note that each <i>NodeName</i> on a single
physical node needs to be configured to use a different port number. You
will also want to use the "%n" symbol in slurmd related path options in
slurm.conf. </li>
<li>When starting the <i>slurmd</i> daemon, include the <i>NodeName</i>
of the node that it is supposed to serve on the execute line.</li>
</ol>
It is strongly recommended that SLURM version 1.2 or higher be used
for this due to it's improved support for multiple slurmd daemons.
See the
<a href="programmer_guide.shtml#multiple_slurmd_support">Programmers Guide</a>
for more details about configuring multiple slurmd support.
<p class="footer"><a href="#top">top</a></p>
<p style="text-align:center;">Last modified 11 October 2006</p>
......
......@@ -179,32 +179,44 @@ executes. Initiate one <span class="commandline">slurmd</span> and one
simultaneous job steps to avoid overloading the
<span class="commandline">slurmd</span> daemon executing them all.</p>
<h3>Multiple slurmd support</h3>
<h3><a name="multiple_slurmd_support">Multiple slurmd support</a></h3>
<p>It is possible to run multiple slurmd daemons on a single node, each using
a different port number and NodeName alias. This is very useful for testing
networking and protocol changes, or anytime you want to simulate a larger
cluster than you really have. The author uses this on his desktop to simulate
multiple nodes. However, multiple slurmd mode should not be used in
production, because not all slurm functions are working under this mode (e.g.
many switch plugins will not work, srun reattachs won't work, etc.).</p>
multiple nodes. However, it is important to note that not all slurm functions
will work with multiple slurmd support enabled (e.g. many switch plugins will
not work, it is best to use switch/none).</p>
<p>Multiple support is enabled at configure-time with the
"--enable-multiple-slurmd" parameter. This enables a new parameter in the
slurm.conf file on the NodeName line, "Port=<port number>", and adds a new
command line parameters to slurmd, "-N" and "-P".</p>
command line parameter to slurmd, "-N".</p>
<p>Each slurmd needs to have its own NodeName, and its own TCP port number. Here
is an example of the NodeName lines for running three slurmd daemons on each
of ten nodes:</p>
<pre>
NodeName=foo[1-10] NodeAddr=host[1-10] Port=17001
NodeName=foo[11-20] NodeAddr=host[1-10] Port=17002
NodeName=foo[21-30] NodeAddr=host[1-10] Port=17003
NodeName=foo[1-10] NodeHostname=host[1-10] Port=17001
NodeName=foo[11-20] NodeHostname=host[1-10] Port=17002
NodeName=foo[21-30] NodeHostname=host[1-10] Port=17003
</pre>
<p>
It is then up to you to start the slurmd daemons with the proper NodeNames.
It is likely that you will also want to use the "%n" symbol in any slurmd
related paths in the slurm.conf file, for instance SlurmdLogFile,
SlurmdPidFile, and especially SlurmdSpoolDir. Each slurmd replaces the "%n"
with its own NodeName. Here is an example:</p>
<pre>
SlurmdLogFile=/var/log/slurm/slurmd.%n.log
SlurmdPidFile=/var/run/slurmd.%n.pid
SlurmdSpoolDir=/var/spool/slurmd.%n
</pre>
<p>
It is up to you to start each slurmd daemon with the proper NodeName.
For example, to start the slurmd daemons for host1 from the
above slurm.conf example:</p>
......
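To make the NodeName/NodeHostname/Port mapping in the hunk above concrete, here is a minimal Python sketch. It is not part of the commit and not SLURM code. It assumes that the bracketed ranges pair up positionally (foo1 with host1, foo11 with host1, and so on), which is what the "three slurmd daemons on each of ten nodes" description implies, and it handles only the simple `name[lo-hi]` form. The helper names `expand`, `parse_node_lines`, and `spool_dir` are hypothetical.

```python
import re


def expand(expr):
    """Expand a simple range such as 'foo[1-10]' into ['foo1', ..., 'foo10'].

    Assumption: only the single 'prefix[lo-hi]' form is handled here; real
    SLURM hostlist expressions are richer than this.
    """
    m = re.fullmatch(r"(\w+)\[(\d+)-(\d+)\]", expr)
    if m is None:
        return [expr]
    prefix, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
    return [f"{prefix}{i}" for i in range(lo, hi + 1)]


def parse_node_lines(lines):
    """Return {hostname: [(node_name, port), ...]} for the given NodeName lines."""
    by_host = {}
    for line in lines:
        fields = dict(tok.split("=", 1) for tok in line.split())
        names = expand(fields["NodeName"])
        hosts = expand(fields["NodeHostname"])
        port = int(fields["Port"])
        # Assumed positional pairing: i-th NodeName runs on i-th NodeHostname.
        for name, host in zip(names, hosts):
            by_host.setdefault(host, []).append((name, port))
    return by_host


def spool_dir(template, node_name):
    """Mimic the '%n' substitution described for slurmd-related paths."""
    return template.replace("%n", node_name)


# The three NodeName lines from the new version of the programmers guide.
CONF = [
    "NodeName=foo[1-10] NodeHostname=host[1-10] Port=17001",
    "NodeName=foo[11-20] NodeHostname=host[1-10] Port=17002",
    "NodeName=foo[21-30] NodeHostname=host[1-10] Port=17003",
]

nodes = parse_node_lines(CONF)
for name, port in nodes["host1"]:
    print(name, port, spool_dir("/var/spool/slurmd.%n", name))
```

Under these assumptions, host1 carries the NodeNames foo1, foo11, and foo21 on ports 17001, 17002, and 17003, each with its own spool directory, which is why each slurmd has to be started with its own NodeName.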