tud-zih-energy / Slurm
Commit a5e40f9b authored 15 years ago by Don Lipari
Minor addition to quickstart_admin.shtml
parent 1dbb2d17
1 changed file: doc/html/quickstart_admin.shtml (34 additions, 11 deletions)
@@ -150,15 +150,20 @@ Some macro definitions that may be used in building SLURM include:
<p class="footer"><a href="#top">top</a></p>
<h2>Daemons</h2>
<p><b>slurmctld</b> is sometimes called the "controller" daemon. It
orchestrates SLURM activities, including queuing of job, monitoring node state,
and allocating resources (nodes) to jobs. There is an optional backup controller
that automatically assumes control in the event the primary controller fails.
The primary controller resumes control whenever it is restored to service. The
controller saves its state to disk whenever there is a change.
This state can be recovered by the controller at startup time.
State changes are saved so that jobs and other state can be preserved when
controller moves (to or from backup controller) or is restarted.</p>
<p><b>slurmctld</b> is sometimes called the "controller"
daemon. It orchestrates SLURM activities, including queuing of jobs,
monitoring node states, and allocating resources to jobs. There is an
optional backup controller that automatically assumes control in the
event the primary controller fails (see the <a href="#HA">High
Availability</a> section below). The primary controller resumes
control whenever it is restored to service. The controller saves its
state to disk whenever there is a change in state (see
"StateSaveLocation" in <a href="#Config">Configuration</a>
section below). This state can be recovered by the controller at
startup time. State changes are saved so that jobs and other state
information can be preserved when the controller moves (to or from a
backup controller) or is restarted.</p>
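The save-on-change and recover-at-startup behavior described above can be illustrated with a small Python sketch. It is not how slurmctld is implemented: the class `ControllerState`, its methods, and the file name `controller_state.json` are hypothetical, and a temporary directory stands in for StateSaveLocation. Each change is written to a temporary file that is then renamed over the previous state file, so a crash mid-write does not leave a partial state file behind.

```python
import json
import os
import tempfile


class ControllerState:
    """Hypothetical toy controller state persisted to a directory.

    The directory plays the role of StateSaveLocation; the file name is made up.
    """

    def __init__(self, state_dir: str) -> None:
        self.state_dir = state_dir
        self.path = os.path.join(state_dir, "controller_state.json")
        self.jobs: dict[str, str] = {}   # job id -> job state
        self.nodes: dict[str, str] = {}  # node name -> node state

    def _save(self) -> None:
        # Write the full state to a temporary file in the same directory, then
        # atomically rename it over the previous file with os.replace().
        fd, tmp = tempfile.mkstemp(dir=self.state_dir)
        with os.fdopen(fd, "w") as f:
            json.dump({"jobs": self.jobs, "nodes": self.nodes}, f)
        os.replace(tmp, self.path)

    def set_job(self, job_id: str, state: str) -> None:
        self.jobs[job_id] = state
        self._save()  # save whenever there is a change in state

    def set_node(self, node: str, state: str) -> None:
        self.nodes[node] = state
        self._save()

    @classmethod
    def recover(cls, state_dir: str) -> "ControllerState":
        # At startup, reload the last saved state if a state file exists.
        ctl = cls(state_dir)
        if os.path.exists(ctl.path):
            with open(ctl.path) as f:
                saved = json.load(f)
            ctl.jobs, ctl.nodes = saved["jobs"], saved["nodes"]
        return ctl


with tempfile.TemporaryDirectory() as d:
    ctl = ControllerState(d)
    ctl.set_node("node01", "IDLE")
    ctl.set_job("1001", "RUNNING")
    restarted = ControllerState.recover(d)  # simulate a controller restart
    print(restarted.jobs)  # {'1001': 'RUNNING'}
```

Writing to a temporary file and renaming it with `os.replace` is one common way to make each save atomic on a single file system; the actual on-disk format used by Slurm is not shown here.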
<p>We recommend that you create a Unix user <i>slurm</i> for use by
<b>slurmctld</b>. This user name will also be specified using the
@@ -186,6 +191,24 @@ A file <b>etc/init.d/slurm</b> is provided for this purpose.
This script accepts commands <b>start</b>, <b>startclean</b> (ignores
all saved state), <b>restart</b>, and <b>stop</b>.</p>
<h3><a name="HA"></a>High Availability</h3>
<p>A backup controller can be configured (see
"BackupController" in the <a
href="#Config">Configuration</a> section below) to take over for the
primary slurmctld if it ever fails. The backup controller should be
hosted on a node different from the node hosting the slurmctld.
However, both hosts should mount a common file system containing the
state information (see "StateSaveLocation" in the <a
href="#Config">Configuration</a> section below).</p>
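As an illustration of the two requirements above (a backup on a different host, and a common state directory), the Python sketch below parses simple `Key=Value` lines and reports problems. It is a hypothetical helper, not a Slurm tool: it assumes the primary host is named by a `ControlMachine` parameter (not mentioned in this section), ignores most of Slurm's configuration syntax, and cannot verify that StateSaveLocation is actually on a shared file system.

```python
def parse_conf(text: str) -> dict[str, str]:
    """Parse simple Key=Value lines, skipping blank lines and # comments."""
    conf: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def check_ha(conf: dict[str, str]) -> list[str]:
    """Return a list of problems with a primary/backup controller setup."""
    problems: list[str] = []
    primary = conf.get("ControlMachine")  # assumed name of the primary host parameter
    backup = conf.get("BackupController")
    if not backup:
        problems.append("no BackupController configured")
    elif backup == primary:
        problems.append("BackupController must be a different host than ControlMachine")
    if not conf.get("StateSaveLocation"):
        problems.append("StateSaveLocation is not set")
    return problems


example = """
ControlMachine=ctl1
BackupController=ctl2
StateSaveLocation=/shared/slurm/state   # hypothetical path on a common file system
"""
print(check_ha(parse_conf(example)))  # []
```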
<p>The backup controller detects when the primary fails and takes over
for it. When the primary returns to service, it notifies the backup.
The backup then saves state and returns to backup mode. The primary
reads the saved state and resumes normal operation. Other than a
brief period of non-responsiveness, the transition back and forth
should go undetected.</p>
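The failover sequence described above can be modeled as a small, single-process Python simulation. The names `SharedStateDir`, `Controller`, `backup_check`, and `primary_returns` are hypothetical; a shared object stands in for the common file system, and failure detection is reduced to a boolean rather than real network checks.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStateDir:
    """Stand-in for the common file system both controller hosts mount."""
    saved: dict = field(default_factory=dict)


@dataclass
class Controller:
    name: str
    shared: SharedStateDir
    active: bool = False
    state: dict = field(default_factory=dict)

    def save_state(self) -> None:
        self.shared.saved = dict(self.state)  # write a copy to the shared location

    def load_state(self) -> None:
        self.state = dict(self.shared.saved)  # read the last saved copy


def backup_check(backup: Controller, primary_responding: bool) -> None:
    """Backup detects a failed primary and takes over from the last saved state."""
    if not primary_responding and not backup.active:
        backup.load_state()
        backup.active = True


def primary_returns(primary: Controller, backup: Controller) -> None:
    """Primary notifies the backup; backup saves state and steps down; primary resumes."""
    if backup.active:
        backup.save_state()
        backup.active = False
    primary.load_state()
    primary.active = True


shared = SharedStateDir()
primary = Controller("ctl1", shared, active=True)
backup = Controller("ctl2", shared)

primary.state["job 1001"] = "RUNNING"
primary.save_state()                  # primary saves its state on a change

primary.active = False                # primary fails
backup_check(backup, primary_responding=False)
backup.state["job 1002"] = "PENDING"  # backup keeps operating

primary_returns(primary, backup)
print(primary.active, backup.active)  # True False
print(primary.state)                  # {'job 1001': 'RUNNING', 'job 1002': 'PENDING'}
```

Because both controllers read and write the same saved state, work accepted by the backup during the outage is still visible to the primary after it resumes, which is why the shared state location matters.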
<h2>Infrastructure</h2>
<h3>User and Group Identification</h3>
<p>There must be a uniform user and group name space (including
@@ -326,7 +349,7 @@ even those allocated to other users.</p>
<p class="footer"><a href="#top">top</a></p>
<h2>Configuration</h2>
<h2>
<a name="Config"></a>
Configuration</h2>
<p>The SLURM configuration file includes a wide variety of parameters.
This configuration file must be available on each node of the cluster and
must have consistent contents. A full
@@ -641,6 +664,6 @@ Contents of major releases are also described in the RELEASE_NOTES file.
</pre> <p class="footer"><a href="#top">top</a></p>
<p style="text-align:center;">Last modified
28 March
2009</p>
<p style="text-align:center;">Last modified
1 December
2009</p>
<!--#include virtual="footer.txt"-->