tud-zih-energy/Slurm, commit 572717e8, authored 18 years ago by Moe Jette
Document sbcast command,
correct pointer to MPICH2 site, use different prompt within srun --allocate for clarity.
Parent: 6bbc8775
1 changed file: doc/html/quickstart.shtml (+21, −10)
@@ -17,7 +17,8 @@ work.</p>
<h2>Architecture</h2>
<p>As depicted in Figure 1, SLURM consists of a <b>slurmd</b> daemon running on
each compute node, a central <b>slurmctld</b> daemon running on a management node
-(with optional fail-over twin), and six utility programs: <b>srun</b>, <b>scancel</b>,
+(with optional fail-over twin), and seven utility programs: <b>srun</b>,
+<b>sbcast</b>, <b>scancel</b>,
<b>sinfo</b>, <b>srun</b>, <b>smap</b>, <b>squeue</b>, and <b>scontrol</b>.
All of the commands can run anywhere in the cluster.</p>
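As a hedged illustration of that layout, two commands that can be run from any node to confirm the daemons are reachable (the exact output varies by site and SLURM version):

$ scontrol ping     # verify that the slurmctld daemon (and its fail-over twin, if configured) responds
$ sinfo             # summarize the node and partition state collected from the slurmd daemons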
@@ -60,19 +61,29 @@ specific node characteristics (so much memory, disk space, certain required feat
etc.). Besides securing a resource allocation, <span class="commandline">srun</span>
is used to initiate job steps. These job steps can execute sequentially or in
parallel on independent or shared nodes within the job's node allocation.</p>
+<p><span class="commandline"><b>sbcast</b></span> is used to transfer a file
+from local disk to local disk on the nodes allocated to a job. This can be
+used to effectively use diskless compute nodes or provide improved performance
+relative to a shared file system.</p>
<p><span class="commandline"><b>scancel</b></span> is used to cancel a pending
or running job or job step. It can also be used to send an arbitrary signal to
all processes associated with a running job or job step.</p>
<p><span class="commandline"><b>scontrol</b></span> is the administrative tool
used to view and/or modify SLURM state. Note that many <span class="commandline">scontrol</span>
commands can only be executed as user root.</p>
<p><span class="commandline"><b>sinfo</b></span> reports the state of partitions
and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting
options.</p>
<p><span class="commandline"><b>squeue</b></span> reports the state of jobs or
job steps. It has a wide variety of filtering, sorting, and formatting options.
By default, it reports the running jobs in priority order and then the pending
jobs in priority order.</p>
<p><span class="commandline"><b>smap</b></span> reports state information for
jobs, partitions, and nodes managed by SLURM, but graphically displays the
information to reflect network topology.</p>
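As a hedged sketch tying the new sbcast description to the surrounding commands (the program name, node count, and job id below are made-up examples, not taken from this commit):

$ srun -n8 -A                         # allocate eight processors and spawn a shell for the job
> sbcast my_program /tmp/my_program   # copy the executable to local disk on every allocated node
> srun /tmp/my_program                # run a job step from the per-node local copies
> exit                                # release the allocation
$ squeue                              # running jobs, then pending jobs, in priority order
$ scancel 1234                        # cancel (or signal) the job with id 1234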
@@ -166,8 +177,8 @@ SLURM to allocate resources for the job and then mpirun to initiate the
tasks. For example:
<pre>
$ srun -n4 -A # allocates four processors and spawns shell for job
-$ mpirun -np 4 a.out
-$ exit              # exits shell spawned by initial srun command
+> mpirun -np 4 a.out
+> exit              # exits shell spawned by initial srun command
</pre>
Note that any direct use of <span class="commandline">srun</span>
will only launch one task per node when the LAM/MPI plugin is used.
@@ -195,14 +206,14 @@ Do not directly execute the <span class="commandline">srun</span> command
to launch LAM/MPI tasks. For example:
<pre>
$ srun -n16 -A # allocates 16 processors and spawns shell for job
-$ lamboot
-$ mpirun -np 16 foo args
+> lamboot
+> mpirun -np 16 foo args
1234 foo running on adev0 (o)
2345 foo running on adev1
etc.
-$ lamclean
-$ lamhalt
-$ exit              # exits shell spawned by initial srun command
+> lamclean
+> lamhalt
+> exit              # exits shell spawned by initial srun command
</pre>
Note that any direct use of <span class="commandline">srun</span>
will only launch one task per node when the LAM/MPI plugin is configured
@@ -220,7 +231,7 @@ option to launch jobs. For example:
$MPI_ROOT/bin/mpirun -TCP -srun -N8 ./a.out
</pre></p>
-<p><a href="http:://www-unix.mcs.anl.gov/mpi/mpich2/"><b>MPICH2</b></a> jobs
+<p><a href="http://www-unix.mcs.anl.gov/mpi/mpich2/"><b>MPICH2</b></a> jobs
are launched using the <b>srun</b> command. Just link your program with
SLURM's implementation of the PMI library so that tasks can communication
host and port information at startup. For example:
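As a hedged sketch of that link-and-launch pattern (the library path, program name, and task count are assumptions, not taken from this file):

$ mpicc -o my_app my_app.c -L/usr/lib64 -lpmi   # link against SLURM's PMI library; the library path is site-specific
$ srun -n16 my_app                              # srun starts the tasks, which use PMI to exchange host and port information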
@@ -251,6 +262,6 @@ the base partition count.
See <a href="bluegene.html">BlueGene User and Administrator Guide</a>
for more information.</p>
-<p style="text-align:center;">Last modified 11 April 2006</p>
+<p style="text-align:center;">Last modified 24 May 2006</p>
<!--#include virtual="footer.txt"-->