Commit 572717e8 authored by Moe Jette's avatar Moe Jette

Document sbcast command,

correct pointer to MPICH2 site, 
use different prompt within srun --allocate for clarity.
parent 6bbc8775
@@ -17,7 +17,8 @@ work.</p>
<h2>Architecture</h2>
<p>As depicted in Figure 1, SLURM consists of a <b>slurmd</b> daemon running on
each compute node, a central <b>slurmctld</b> daemon running on a management node
-(with optional fail-over twin), and six utility programs: <b>srun</b>, <b>scancel</b>,
+(with optional fail-over twin), and seven utility programs: <b>srun</b>,
+<b>sbcast</b>, <b>scancel</b>,
<b>sinfo</b>, <b>smap</b>, <b>squeue</b>, and <b>scontrol</b>.
All of the commands can run anywhere in the cluster.</p>
@@ -60,19 +61,29 @@ specific node characteristics (so much memory, disk space, certain required features,
etc.). Besides securing a resource allocation, <span class="commandline">srun</span>
is used to initiate job steps. These job steps can execute sequentially or in
parallel on independent or shared nodes within the job's node allocation.</p>
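<p>A minimal illustration (node and task counts are arbitrary): a single
invocation both creates an allocation and launches a job step within it.</p>
<pre>
$ srun -N2 -n4 hostname   # allocate two nodes and run four tasks of "hostname"
</pre>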
+<p><span class="commandline"><b>sbcast</b></span> is used to transfer a file
+from local disk to local disk on the nodes allocated to a job. This can be
+useful on diskless compute nodes, or can provide improved performance
+relative to a shared file system.</p>
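<p>A sketch of typical use, run from within a job allocation (file names
are illustrative):</p>
<pre>
$ sbcast my.prog /tmp/my.prog   # copy ./my.prog to /tmp on every allocated node
$ srun /tmp/my.prog             # execute the local copy on each node
</pre>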
<p><span class="commandline"><b>scancel</b></span> is used to cancel a pending
or running job or job step. It can also be used to send an arbitrary signal to
all processes associated with a running job or job step.</p>
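<p>For example (the job id is illustrative):</p>
<pre>
$ scancel 1234                # cancel job 1234
$ scancel --signal=USR1 1234  # send SIGUSR1 to all processes of job 1234
</pre>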
<p><span class="commandline"><b>scontrol</b></span> is the administrative tool
used to view and/or modify SLURM state. Note that many <span class="commandline">scontrol</span>
commands can only be executed as user root.</p>
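<p>For example (the job id and node name are illustrative):</p>
<pre>
$ scontrol show job 1234                       # display all state of job 1234
$ scontrol update NodeName=adev5 State=DRAIN   # root only: drain a node
</pre>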
<p><span class="commandline"><b>sinfo</b></span> reports the state of partitions
and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting
options.</p>
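<p>For example:</p>
<pre>
$ sinfo                 # summarize all partitions and node states
$ sinfo --states=idle   # report only nodes that are currently idle
</pre>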
<p><span class="commandline"><b>squeue</b></span> reports the state of jobs or
job steps. It has a wide variety of filtering, sorting, and formatting options.
By default, it reports the running jobs in priority order and then the pending
jobs in priority order.</p>
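<p>For example (the user name is illustrative):</p>
<pre>
$ squeue                  # running jobs, then pending jobs, in priority order
$ squeue -u alice -t PD   # only pending jobs belonging to user "alice"
</pre>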
<p><span class="commandline"><b>smap</b></span> reports state information for
jobs, partitions, and nodes managed by SLURM, but graphically displays the
information to reflect network topology.</p>
@@ -166,8 +177,8 @@ SLURM to allocate resources for the job and then mpirun to initiate the
tasks. For example:
<pre>
$ srun -n4 -A # allocates four processors and spawns shell for job
-$ mpirun -np 4 a.out
-$ exit # exits shell spawned by initial srun command
+&gt; mpirun -np 4 a.out
+&gt; exit # exits shell spawned by initial srun command
</pre>
Note that any direct use of <span class="commandline">srun</span>
will only launch one task per node when the LAM/MPI plugin is used.
@@ -195,14 +206,14 @@ Do not directly execute the <span class="commandline">srun</span> command
to launch LAM/MPI tasks. For example:
<pre>
$ srun -n16 -A # allocates 16 processors and spawns shell for job
-$ lamboot
-$ mpirun -np 16 foo args
+&gt; lamboot
+&gt; mpirun -np 16 foo args
1234 foo running on adev0 (o)
2345 foo running on adev1
etc.
-$ lamclean
-$ lamhalt
-$ exit # exits shell spawned by initial srun command
+&gt; lamclean
+&gt; lamhalt
+&gt; exit # exits shell spawned by initial srun command
</pre>
Note that any direct use of <span class="commandline">srun</span>
will only launch one task per node when the LAM/MPI plugin is configured
@@ -220,7 +231,7 @@ option to launch jobs. For example:
$MPI_ROOT/bin/mpirun -TCP -srun -N8 ./a.out
</pre></p>
-<p><a href="http:://www-unix.mcs.anl.gov/mpi/mpich2/"><b>MPICH2</b></a> jobs
+<p><a href="http://www-unix.mcs.anl.gov/mpi/mpich2/"><b>MPICH2</b></a> jobs
are launched using the <b>srun</b> command. Just link your program with
SLURM's implementation of the PMI library so that tasks can communicate
host and port information at startup. For example:
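<p>A hedged sketch, assuming SLURM's PMI library (<b>-lpmi</b>) is in the
default library search path; the program name and task count are
illustrative:</p>
<pre>
$ mpicc -o my.prog my.prog.c -lpmi   # link against SLURM's PMI library
$ srun -n4 my.prog                   # srun supplies host/port info via PMI
</pre>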
@@ -251,6 +262,6 @@ the base partition count.
See <a href="bluegene.html">BlueGene User and Administrator Guide</a>
for more information.</p>
-<p style="text-align:center;">Last modified 11 April 2006</p>
+<p style="text-align:center;">Last modified 24 May 2006</p>
<!--#include virtual="footer.txt"-->