Commit 572717e8 authored by Moe Jette

Document sbcast command,

correct pointer to MPICH2 site, 
use different prompt within srun --allocate for clarity.
parent 6bbc8775
@@ -17,7 +17,8 @@ work.</p>
<h2>Architecture</h2>
<p>As depicted in Figure 1, SLURM consists of a <b>slurmd</b> daemon running on
each compute node, a central <b>slurmctld</b> daemon running on a management node
(with optional fail-over twin), and seven utility programs: <b>srun</b>,
<b>sbcast</b>, <b>scancel</b>,
<b>sinfo</b>, <b>smap</b>, <b>squeue</b>, and <b>scontrol</b>.
All of the commands can run anywhere in the cluster.</p>
@@ -60,19 +61,29 @@ specific node characteristics (so much memory, disk space, certain required feat
etc.). Besides securing a resource allocation, <span class="commandline">srun</span>
is used to initiate job steps. These job steps can execute sequentially or in
parallel on independent or shared nodes within the job's node allocation.</p>
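<p>For example, a minimal sketch (the program name <b>a.out</b> and the task
count are illustrative placeholders):</p>
<pre>
$ srun -n4 a.out   # allocate four processors and run a.out as a four-task job step
</pre>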
<p><span class="commandline"><b>sbcast</b></span> is used to transfer a file
from local disk to local disk on the nodes allocated to a job. This can be
used to effectively use diskless compute nodes or provide improved performance
relative to a shared file system.</p>
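<p>For example, a sketch of staging a program within an existing allocation
(the file names are illustrative placeholders):</p>
<pre>
$ sbcast my_program /tmp/my_program   # copy my_program to /tmp on every allocated node
</pre>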
<p><span class="commandline"><b>scancel</b></span> is used to cancel a pending <p><span class="commandline"><b>scancel</b></span> is used to cancel a pending
or running job or job step. It can also be used to send an arbitrary signal to or running job or job step. It can also be used to send an arbitrary signal to
all processes associated with a running job or job step.</p> all processes associated with a running job or job step.</p>
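<p>For example, assuming a hypothetical job ID of 1234:</p>
<pre>
$ scancel 1234                 # cancel job 1234
$ scancel --signal=USR1 1234   # send SIGUSR1 to all processes of job 1234
</pre>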
<p><span class="commandline"><b>scontrol</b></span> is the administrative tool <p><span class="commandline"><b>scontrol</b></span> is the administrative tool
used to view and/or modify SLURM state. Note that many <span class="commandline">scontrol</span> used to view and/or modify SLURM state. Note that many <span class="commandline">scontrol</span>
commands can only be executed as user root.</p> commands can only be executed as user root.</p>
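<p>For example, a sketch assuming a hypothetical job ID (the update form
typically requires root):</p>
<pre>
$ scontrol show job 1234                    # display the full state of job 1234
$ scontrol update JobId=1234 TimeLimit=60   # set the job's time limit to 60 minutes
</pre>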
<p><span class="commandline"><b>sinfo</b></span> reports the state of partitions <p><span class="commandline"><b>sinfo</b></span> reports the state of partitions
and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting
options.</p> options.</p>
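<p>For example, a brief sketch:</p>
<pre>
$ sinfo                # summarize the state of all partitions and nodes
$ sinfo --state=idle   # list only idle nodes
</pre>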
<p><span class="commandline"><b>squeue</b></span> reports the state of jobs or <p><span class="commandline"><b>squeue</b></span> reports the state of jobs or
job steps. It has a wide variety of filtering, sorting, and formatting options. job steps. It has a wide variety of filtering, sorting, and formatting options.
By default, it reports the running jobs in priority order and then the pending By default, it reports the running jobs in priority order and then the pending
jobs in priority order.</p> jobs in priority order.</p>
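<p>For example, assuming a hypothetical user name of alice:</p>
<pre>
$ squeue                  # running jobs, then pending jobs, each in priority order
$ squeue -u alice -t PD   # pending jobs belonging to user alice
</pre>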
<p><span class="commandline"><b>smap</b></span> reports state information for <p><span class="commandline"><b>smap</b></span> reports state information for
jobs, partitions, and nodes managed by SLURM, but graphically displays the jobs, partitions, and nodes managed by SLURM, but graphically displays the
information to reflect network topology.</p> information to reflect network topology.</p>
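<p>For example, to start the interactive display:</p>
<pre>
$ smap   # curses-based display of jobs, partitions, and nodes
</pre>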
@@ -166,8 +177,8 @@ SLURM to allocate resources for the job and then mpirun to initiate the
tasks. For example:
<pre>
$ srun -n4 -A   # allocates four processors and spawns shell for job
&gt; mpirun -np 4 a.out
&gt; exit          # exits shell spawned by initial srun command
</pre>
Note that any direct use of <span class="commandline">srun</span>
will only launch one task per node when the LAM/MPI plugin is used.
@@ -195,14 +206,14 @@ Do not directly execute the <span class="commandline">srun</span> command
to launch LAM/MPI tasks. For example:
<pre>
$ srun -n16 -A   # allocates 16 processors and spawns shell for job
&gt; lamboot
&gt; mpirun -np 16 foo args
1234 foo running on adev0 (o)
2345 foo running on adev1
etc.
&gt; lamclean
&gt; lamhalt
&gt; exit           # exits shell spawned by initial srun command
</pre>
Note that any direct use of <span class="commandline">srun</span>
will only launch one task per node when the LAM/MPI plugin is configured
@@ -220,7 +231,7 @@ option to launch jobs. For example:
$MPI_ROOT/bin/mpirun -TCP -srun -N8 ./a.out
</pre></p>
<p><a href="http://www-unix.mcs.anl.gov/mpi/mpich2/"><b>MPICH2</b></a> jobs
are launched using the <b>srun</b> command. Just link your program with
SLURM's implementation of the PMI library so that tasks can communicate
host and port information at startup. For example:
@@ -251,6 +262,6 @@ the base partition count.
See <a href="bluegene.html">BlueGene User and Administrator Guide</a>
for more information.</p>
<p style="text-align:center;">Last modified 24 May 2006</p>
<!--#include virtual="footer.txt"-->