tud-zih-energy / Slurm / Commits

Commit 6a71b399, authored 19 years ago by Moe Jette
Parent: 726de8c8

    Update and general clean-up.

Showing 1 changed file: doc/html/quickstart.html (+37 additions, −17 deletions)
@@ -9,7 +9,7 @@
 <meta http-equiv="keywords" content="Simple Linux Utility for Resource Management, SLURM, resource management,
 Linux clusters, high-performance computing, Livermore Computing">
 <meta name="LLNLRandR" content="UCRL-WEB-213976">
-<meta name="LLNLRandRdate" content="20 November 2005">
+<meta name="LLNLRandRdate" content="6 December 2005">
 <meta name="distribution" content="global">
 <meta name="description" content="Simple Linux Utility for Resource Management">
 <meta name="copyright"
@@ -53,6 +53,7 @@ structure:Laboratories and Other Field Facilities">
 <a href="quickstart_admin.html" class="nav">Guide</a></p></td>
 <td><img src="spacer.gif" width="10" height="1" alt=""></td>
 <td valign="top"><h2>Quick Start User Guide</h2>
 <h3>Overview</h3>
 <p>The Simple Linux Utility for Resource Management (SLURM) is an open source,
 fault-tolerant, and highly scalable cluster management and job scheduling system
@@ -64,20 +65,24 @@ can perform work. Second, it provides a framework for starting, executing, and
 monitoring work (normally a parallel job) on the set of allocated nodes. Finally,
 it arbitrates conflicting requests for resources by managing a queue of pending
 work.</p>
 <h3>Architecture</h3>
 <p>As depicted in Figure 1, SLURM consists of a <b>slurmd</b> daemon running on
 each compute node, a central <b>slurmctld</b> daemon running on a management node
-(with optional fail-over twin), and five command line utilities: <b>srun</b>,
-<b>scancel</b>, <b>sinfo</b>, <b>squeue</b>, and <b>scontrol</b>, which can run
+(with optional fail-over twin), and six utility programs: <b>srun</b>,
+<b>scancel</b>, <b>sinfo</b>, <b>smap</b>, <b>squeue</b>, and <b>scontrol</b>.
+All of the commands can run
 anywhere in the cluster.</p>
-<p><img src="arch.gif" width="552" height="432">
+<p><img src="arch.gif" width="600">
 <p><b>Figure 1. SLURM components</b></p>
 <p>The entities managed by these SLURM daemons, shown in Figure 2, include <b>nodes</b>,
 the compute resource in SLURM, <b>partitions</b>, which group nodes into logical
-disjoint sets, <b>jobs</b>, or allocations of resources assigned to a user for
+sets, <b>jobs</b>, or allocations of resources assigned to a user for
 a specified amount of time, and <b>job steps</b>, which are sets of (possibly
-parallel) tasks within a job. Priority-ordered jobs are allocated nodes within
-a partition until the resources (nodes) within that partition are exhausted. Once
+parallel) tasks within a job.
+The partitions can be considered job queues, each of which has an assortment of
+constraints such as job size limit, job time limit, users permitted to use it, etc.
+Priority-ordered jobs are allocated nodes within a partition until the resources
+(nodes, processors, memory, etc.) within that partition are exhausted. Once
 a job is assigned a set of nodes, the user is able to initiate parallel work in
 the form of job steps in any configuration within the allocation. For instance,
 a single job step may be started that utilizes all nodes allocated to the job,
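The job/job-step model added in this hunk can be made concrete with a small job script. A hypothetical sketch, reusing the my.sleeper script name and the -b, -N, and -l flags that appear elsewhere in this guide (the commands inside the script are assumptions):

```shell
#!/bin/sh
# my.sleeper -- hypothetical job script; submit with: srun -N4 -b my.sleeper
# Each srun below launches a job step inside the four-node allocation.
srun -l /bin/hostname     # step 0: one task on every allocated node
srun -N1 /bin/sleep 30    # step 1: confined to a single node of the allocation
```

Both steps run within the nodes already assigned to the job, so neither waits in the partition queue again.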
@@ -85,6 +90,7 @@ or several job steps may independently use a portion of the allocation.</p>
 <p><img src="entities.gif" width="291" height="218">
 <p><b>Figure 2. SLURM entities</b></p>
 <p class="footer"><a href="#top">top</a></p>
 <h3>Commands</h3>
 <p>Man pages exist for all SLURM daemons, commands, and API functions. The command
 option <span class="commandline">--help</span> also provides a brief summary of
@@ -111,7 +117,11 @@ options.</p>
 job steps. It has a wide variety of filtering, sorting, and formatting options.
 By default, it reports the running jobs in priority order and then the pending
 jobs in priority order.</p>
+<p><span class="commandline"><b>smap</b></span> reports state information for
+jobs, partitions, and nodes managed by SLURM, but graphically displays the
+information to reflect network topology.</p>
 <p class="footer"><a href="#top">top</a></p>
 <h3>Examples</h3>
 <p>Execute <span class="commandline">/bin/hostname</span> on four nodes (<span class="commandline">-N4</span>).
 Include task numbers on the output (<span class="commandline">-l</span>). The
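The squeue filtering described in this hunk can also be done with ordinary text tools. A minimal sketch that pulls running jobs out of a captured listing, using the sample squeue output from this guide's own example session (the job data is illustrative, not live output):

```shell
# Sample squeue output, copied from the guide's example session.
squeue_output='JobId Partition Name     User  St TimeLimit Prio Nodes
  473 batch     my.sleep jette R  UNLIMITED 0.99 adev9'

# Print the JobId of every running (St == "R") job.
running=$(printf '%s\n' "$squeue_output" | awk 'NR > 1 && $5 == "R" { print $1 }')
echo "$running"    # prints: 473
```

The same awk pattern works on live `squeue` output piped in directly, since the column order matches the header shown above.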
@@ -162,20 +172,24 @@ adev9
 1: /home/jette
 2: /home/jette
 3: /home/jette
 </pre>
 <p>Submit a job, get its status, and cancel it.</p>
 <pre>
 adev0: srun -b my.sleeper
 srun: jobid 473 submitted
 adev0: squeue
-  JobId Partition Name     User  St TimeLim   Prio Nodes
-    473 batch     my.sleep jette R  UNLIMIT   0.99 adev9
+  JobId Partition Name     User  St TimeLimit Prio Nodes
+    473 batch     my.sleep jette R  UNLIMITED 0.99 adev9
 adev0: scancel 473
 adev0: squeue
-  JobId Partition Name     User  St TimeLim   Prio Nodes
+  JobId Partition Name     User  St TimeLimit Prio Nodes
 </pre>
 <p>Get the SLURM partition and node status.</p>
 <pre>
 adev0: sinfo
 PARTITION NODES STATE CPUS MEMORY TMP_DISK NODES
@@ -183,7 +197,9 @@ PARTITION NODES STATE CPUS MEMORY TMP_DISK NODES
 debug     8 IDLE 2 3448      82306 adev[0-7]
 batch     1 DOWN 2 3448      82306 adev8
           7 IDLE 2 3448-3458 82306 adev[9-15]
 </pre>
 <p class="footer"><a href="#top">top</a></p>
 <h3>MPI</h3>
 <p>MPI use depends upon the type of MPI being used.
 Instructions for using several varieties of MPI with SLURM are
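The sinfo listing in the hunk above can be post-processed the same way. A minimal sketch that totals idle nodes from the sample output; note that the continuation row carries no PARTITION value, so the script locates the STATE field by value rather than by fixed column position:

```shell
# Sample sinfo output, copied from the listing above.
sinfo_output='PARTITION NODES STATE CPUS MEMORY    TMP_DISK NODES
debug     8     IDLE  2    3448      82306    adev[0-7]
batch     1     DOWN  2    3448      82306    adev8
          7     IDLE  2    3448-3458 82306    adev[9-15]'

# Sum the node count that immediately precedes each IDLE state field.
idle=$(printf '%s\n' "$sinfo_output" |
  awk 'NR > 1 { for (i = 2; i <= NF; i++) if ($i == "IDLE") n += $(i - 1) } END { print n }')
echo "$idle"    # prints: 15
```

Scanning fields by value keeps the count correct for both the named debug row (8 idle) and the unnamed continuation row (7 idle).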
@@ -226,7 +242,7 @@ $MPI_ROOT/bin/mpirun -TCP -srun -N8 ./a.out
 <p><a href="http:://www-unix.mcs.anl.gov/mpi/mpich2/">MPICH2</a> jobs
 are launched using the <b>srun</b> command. Just link your program with
-SLURM's implemenation of the PMI library so that tasks can communication
+SLURM's implementation of the PMI library so that tasks can communication
 host and port information at startup. For example:
 <pre>
 $ mpicc -lXlinker "-lpmi" ...
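The PMI link step above can be expanded into a full build-and-launch sequence. A hypothetical sketch: the program name and task count are made up, and the linker pass-through flag is written as the compiler's `-Xlinker` (the guide's `-lXlinker` reads like a typo for it), so treat the exact flags as assumptions:

```shell
# Build an MPICH2 program against SLURM's PMI library so that tasks
# can exchange host/port information at startup (flags are assumptions).
mpicc -o my_mpi_prog my_mpi_prog.c -Xlinker "-lpmi"

# Launch eight tasks; srun supplies the PMI environment to each task.
srun -n8 ./my_mpi_prog
```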
@@ -249,7 +265,11 @@ the script to SLURM using <span class="commandline">srun</span>
 command with the <b>--batch</b> option. For example:
 <pre>
 srun -N2 --batch my.script
-</pre></p>
+</pre>
+Note that the node count specified with the <i>-N</i> option indicates
+the base partition count.
+See <a href="bluegene.html">BlueGene User and Administrator Guide</a>
+for more information.</p>
 </td>
 </tr>
@@ -257,7 +277,7 @@ srun -N2 --batch my.script
 <td colspan="3"><hr>
 <p>For information about this page, contact <a href="mailto:slurm-dev@lists.llnl.gov">slurm-dev@lists.llnl.gov</a>.</p>
 <p><a href="http://www.llnl.gov/"><img align=middle src="lll.gif" width="32" height="32" border="0"></a></p>
 <p class="footer">UCRL-WEB-213976<br>
-Last modified 20 November 2005</p></td>
+Last modified 6 December 2005</p></td>
 </tr>
 </table>
 </td>