Commit 773b7121
authored 12 years ago by Morris Jette
Minor updates to SLURM/Cray guide, mostly typos
parent 48bf06d8
Showing 1 changed file with 39 additions and 28 deletions:

doc/html/cray.shtml (+39, −28)

@@ -2,15 +2,23 @@
 <h1>SLURM User and Administrator Guide for Cray Systems</h1>
-<h2>User Guide</h2>
+<ul>
+<li><a href="#user_guide">User Guide</a></li>
+<li><a href="#admin_guide">Administrator Guide</a></li>
+<li><a href="http://www.cray.com">Cray</a></li>
+</ul>
+<HR SIZE=4>
+<h2><a name="user_guide">User Guide</a></h2>
 <p>This document describes the unique features of SLURM on Cray computers.
 You should be familiar with the SLURM's mode of operation on Linux clusters
 before studying the differences in Cray system operation described in this
 document.</p>
-<p>Since version 2.3 SLURM is designed to operate as a job scheduler over Cray's
-Application Level Placement Scheduler (ALPS).
+<p>Since version 2.3, SLURM is designed to operate as a job scheduler over
+Cray's Application Level Placement Scheduler (ALPS).
 Use SLURM's <i>sbatch</i> or <i>salloc</i> commands to create a resource
 allocation in ALPS.
 Then use ALPS' <i>aprun</i> command to launch parallel jobs within the resource
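
As an illustration of the workflow described in the paragraph above, an allocation is first created with SLURM and the application is then launched inside it with ALPS' aprun. A minimal sketch (the node/task counts and the application name ./my_app are placeholders, not taken from the guide):

  login: $ salloc -N 2
  login: $ aprun -n 2 ./my_app
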
@@ -33,7 +41,7 @@ and the job steps not being visible, all other SLURM commands will operate
 as expected. Note that in order to build and install the aprun wrapper
 described above, execute "configure" with the <i>--with-srun2aprun</i>
 option or add <i>%_with_srun2aprun 1</i> to your <i>~/.rpmmacros</i>
-file. This option is set with rpms from Cray.</p>
+file. This option is set with RPMs from Cray.</p>
 <h3>Node naming and node geometry on Cray XT/XE systems</h3>
 <p>SLURM node names will be of the form "nid#####" where "#####" is a five-digit sequence number.
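
For reference, the two ways of enabling the srun-to-aprun wrapper mentioned above would look roughly like this at build time (a sketch, run from the top of the SLURM source tree):

  login: $ ./configure --with-srun2aprun
  login: $ echo '%_with_srun2aprun 1' >> ~/.rpmmacros    # alternative, when building RPMs
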
@@ -166,16 +174,18 @@ zero compute node job requests.</p>
 SLURM partitions explicitly configured with <b>MinNodes=0</b> (the default
 minimum node count for a partition is one compute node).</p>
-<h2>Administrator Guide</h2>
+<HR SIZE=4>
+<h2><a name="admin_guide">Administrator Guide</a></h2>
 <h3>Install supporting RPMs</h3>
 <p>The build requires a few -devel RPMs listed below. You can obtain these from
 SuSe/Novell.
 <ul>
-<li>CLE 2.x uses SuSe SLES 10 packages (rpms may be on the normal isos)</li>
-<li>CLE 3.x uses Suse SLES 11 packages (rpms are on the SDK isos, there
-are two SDK iso files for SDK)</li>
+<li>CLE 2.x uses SuSe SLES 10 packages (RPMs may be on the normal ISOs)</li>
+<li>CLE 3.x uses Suse SLES 11 packages (RPMs are on the SDK ISOs, there
+are two SDK ISO files for SDK)</li>
 </ul></p>
 <p>You can check by logging onto the boot node and running</p>
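
Checking for these packages on the boot node might look like the following (a sketch; xtopview and rpm -qa are the commands the guide itself uses, the grep pattern is only illustrative):

  boot: # xtopview
  default: # rpm -qa | grep -E 'expat|MySQL'
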
@@ -188,7 +198,7 @@ default: # rpm -qa
 <ul>
 <li>expat-2.0.xxx</li>
 <li>libexpat-devel-2.0.xxx</li>
-<li>cray-MySQL-devel-enterprise-5.0.64 (this should be on the Cray iso)</li>
+<li>cray-MySQL-devel-enterprise-5.0.64 (this should be on the Cray ISO)</li>
 </ul>
 <p>For example, loading MySQL can be done like this:</p>
@@ -218,16 +228,16 @@ somewhat from that described in the
 MUNGE Installation Guide</a>.</p>
 <p>Munge is the authentication daemon and needed by SLURM. You can get
-munge rpms from Cray use the below method to install and test. The
-Cray munge rpm installs munge in /opt/munge.</p>
-<p>If needed copy the rpms over to the boot node</p>
+Munge RPMs from Cray. Use the below method to install and test it. The
+Cray Munge RPM installs Munge in /opt/munge.</p>
+<p>If needed copy the RPMs over to the boot node</p>
 <pre>
 login: # scp munge-*.rpm root@boot:/rr/current/software
 </pre>
-<p>Install the rpms on the boot node. While this process creates a
-munge key it can't be use in /etc/munge directory. So we make a
+<p>Install the RPMs on the boot node. While this process creates a
+Munge key, it can't use the /etc/munge directory. So we make a
 /opt/munge/key directory instead and create a key there.</p>
 <pre>
 boot: # xtopview
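
Creating the key under /opt/munge/key could be done with the dd-based method from the MUNGE documentation, sketched below; the exact commands in the collapsed portion of the guide may differ:

  default: # mkdir -p /opt/munge/key
  default: # dd if=/dev/urandom bs=1 count=1024 > /opt/munge/key/munge.key
  default: # chmod 0400 /opt/munge/key/munge.key
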
@@ -259,7 +269,7 @@ sdb: # /etc/init.d/munge start
 </pre>
-<p>Start the munge daemon and test it.</p>
+<p>Start the Munge daemon and test it.</p>
 <pre>
 login: # export PATH=/opt/munge/bin:$PATH
 login: # munge -n
@@ -267,8 +277,8 @@ MUNGE:AwQDAAAEy341MRViY+LacxYlz+mchKk5NUAGrYLqKRUvYkrR+MJzHTgzSm1JALqJcunWGDU6k3
 login: # munge -n | unmunge
 </pre>
-<p>When done, verify network connectivity by executing (You have to
-have the munged started on the other-login-host as well):
+<p>When done, verify network connectivity by executing the following (the
+Munged daemon must be started on the other-login-host as well):
 <ul>
 <li><i>munge -n | ssh other-login-host /opt/slurm/munge/bin/unmunge</i></li>
 </ul>
@@ -306,9 +316,9 @@ node/31:# emacs -nw /etc/pam.d/common-session
 <p>SLURM can be built and installed as on any other computer as described
 <a href="quickstart_admin.html">Quick Start Administrator Guide</a>.
-You can also get current SLURM rpms from Cray. An installation
-process for the rpms is described below. The
-Cray SLURM rpms install in /opt/slurm.</p>
+You can also get current SLURM RPMs from Cray. An installation
+process for the RPMs is described below. The
+Cray SLURM RPMs install in /opt/slurm.</p>
 <p><b>NOTE:</b> By default neither the <i>salloc</i> command or <i>srun</i>
 command wrapper can be executed as a background process. This is done for two
@@ -323,12 +333,13 @@ using terminal foreground process group IDs</li>
 </ol>
 <p>You can optionally enable <i>salloc</i> and <i>srun</i> to execute as
 background processes by using the configure option
-<i>"--enable-salloc-background"</i> (.rpmmacros option <i>"%_with_salloc_backgroud 1"</i>, however doing will result in failed
+<i>"--enable-salloc-background"</i> (or the .rpmmacros option
+<i>"%_with_salloc_background 1"</i>), however doing will result in failed
 resource allocations
 (<i>error: Failed to allocate resources: Requested reservation is in use</i>)
 if not executed sequentially and
-increase the likelyhood of orphaned processes. Specifically request
-this version when requesting rpms from Cray as this is not on by default.</p>
+increase the likelihood of orphaned processes. Specifically request
+this version when requesting RPMs from Cray as this is not on by default.</p>
 <!-- Example:
 Modify srun script or ask user to execute "/usr/bin/setsid"
 before salloc or srun command -->
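
If background execution is wanted anyway when building the RPMs yourself, the corresponding .rpmmacros entry would be (a sketch; as the text notes, Cray's stock RPMs do not enable this):

  login: $ echo '%_with_salloc_background 1' >> ~/.rpmmacros
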
@@ -336,12 +347,12 @@ this version when requesting rpms from Cray as this is not on by default.</p>
 salloc spawns zsh, zsh spawns bash, etc.
 when salloc terminates, bash becomes a child of init -->
-<p>If needed copy the rpms over to the boot node.</p>
+<p>If needed copy the RPMs over to the boot node.</p>
 <pre>
 login: # scp slurm-*.rpm root@boot:/rr/current/software
 </pre>
-<p>Install the rpms on the boot node.</p>
+<p>Install the RPMs on the boot node.</p>
 <pre>
 boot: # xtopview
 default: # rpm -ivh /software/slurm-*.x86_64.rpm
@@ -362,7 +373,7 @@ configured, but will be set by SLURM using data from ALPS.
 <i>smap</i> and <i>sview</i> commands.
 <i>NodeHostName</i> will be set to the node's component label.
 The format of the component label is "c#-#c#s#n#" where the "#" fields
-represent in order: cabinet, row, cate, blade or slot, and node.
+represent in order: cabinet, row, cage, blade or slot, and node.
 For example "c0-1c2s5n3" is cabinet 0, row 1, cage 3, slot 5 and node 3.</p>
 <p>The <i>slurmd</i> daemons will not execute on the compute nodes, but will
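
To see how a SLURM node name maps to its component label once the daemons are up, something like the following should work (a sketch; nid00003 is a placeholder node name):

  login: $ scontrol show node nid00003 | grep NodeHostName
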
@@ -569,7 +580,7 @@ script will insure that higher limits are possible. Copy the file
 <i>contribs/cray/etc_sysconfig_slurm</i> into <i>/etc/sysconfig/slurm</i>
 for these limits to take effect. This script is executed from
 <i>/etc/init.d/slurm</i>, which is typically executed to start the SLURM
-daemons. An excerpt of <i>contribs/cray/etc_sysconfig_slurm</i>is shown
+daemons. An excerpt of <i>contribs/cray/etc_sysconfig_slurm</i> is shown
 below.</p>
 <pre>
@@ -631,6 +642,6 @@ allocation.</p>
 <p class="footer"><a href="#top">top</a></p>
-<p style="text-align:center;">Last modified 17 Sept 2012</p></td>
+<p style="text-align:center;">Last modified 17 September 2012</p></td>
 <!--#include virtual="footer.txt"-->