Commit 1e578c90 authored by Brian Christiansen

Merge remote-tracking branch 'origin/slurm-14.03' into slurm-14.11

Conflicts:
	doc/man/man1/scontrol.1
parents b4decbfa 014021ca
@@ -31,7 +31,7 @@ cpu consumption, and memory use from a jobacct_gather plugin.
 Data from other sources may be added in the future.</p>
 <p>The data is collected into a file on a shared file system for each step on
-each allocated node of a job and then merged into a HDF5 file.
+each allocated node of a job and then merged into an HDF5 file.
 Individual files on a shared file system were chosen because it is possible
 that the data is voluminous, so solutions that pass data to the Slurm control
 daemon via RPC may not scale to very large clusters or jobs with
@@ -39,7 +39,7 @@ many allocated nodes.</p>
 <p>A separate <a href="acct_gather_profile_plugins.html">
 SLURM Profile Accounting Plugin API (AcctGatherProfileType)</a> documents how
-write other Profile Accounting plugins.</P>
+to write other Profile Accounting plugins.</P>
 <a id="Administration"></a>
 <h2>Administration</h2>
@@ -57,13 +57,13 @@ option in the acct_gather.conf file. The directory will be created by
 Slurm if it doesn't exist. Each user will have
 their own directory created in the ProfileHDF5Dir which contains
 the HDF5 files. All the directories and files are created by the
-SlurmdUser which is usually root. The user specific directories as well
-as the files inside are chowned to the user running the job so they
+SlurmdUser which is usually root. The user specific directories, as well
+as the files inside, are chowned to the user running the job so they
 can access the files. Since user root is usually creating these
 files/directories, a root squashed file system will not work for
 the ProfileHDF5Dir.</p>
-<p>Each user that creates a profile will have a subdirector to the profile
+<p>Each user that creates a profile will have a subdirectory in the profile
 directory that has read/write permission only for the user.</p>
 </span>
 </div>
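
For context on the directory handling described above, a minimal configuration sketch follows; the path is a placeholder and the surrounding values are illustrative assumptions, not requirements:

    # slurm.conf (sketch): enable the HDF5 profile plugin
    AcctGatherProfileType=acct_gather_profile/hdf5
    JobAcctGatherType=jobacct_gather/linux
    JobAcctGatherFrequency=30

    # acct_gather.conf (sketch): shared, non-root-squashed directory;
    # Slurm creates it and the per-user subdirectories if missing
    ProfileHDF5Dir=/app/slurm/profile_data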
@@ -85,14 +85,14 @@ This sets the sampling frequency for data types:
 </div>
 </div>
 <div style="margin-left: 20px;">
-<h4>act_gather.conf parameters</h4>
+<h4>acct_gather.conf parameters</h4>
 <div style="margin-left: 20px;">
 <p>These parameters are directly used by the HDF5 Profile Plugin.</p>
 <dl>
 <dt><b>ProfileHDF5Dir</b> = &lt;path&gt;</dt>
 <p>
 This parameter is the path to the shared folder into which the
-acct_gather_profile plugin will write detailed data as a HDF5 file.
+acct_gather_profile plugin will write detailed data as an HDF5 file.
 The directory is assumed to be on a file system shared by the controller and
 all compute nodes. This is a required parameter.<p>
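
As a usage sketch tying the sampling settings above to a job: profiling is requested per job with the --profile option of sbatch/srun; the script and binary names here are hypothetical:

    # Collect task data (CPU, memory, I/O) for a batch job
    sbatch --profile=task my_job.sh
    # Collect all supported data types for a two-node interactive step
    srun --profile=all -N2 ./my_app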
@@ -207,7 +207,7 @@ to be attached to groups to store application defined properties.</p>
 <p>There are commodity programs, notably
 <a href="http://www.hdfgroup.org/hdf-java-html/hdfview/index.html">
-HDFView</a> for viewing and manipulating these files.
+HDFView</a>, for viewing and manipulating these files.
 <p>Below is a screen shot from HDFView expanding the job tree and showing the
 attributes for a specific task.</p>
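
On headless nodes, the standard HDF5 command-line tools can serve as an alternative to HDFView for a quick look; a sketch, where the file path is a hypothetical example of a per-user profile file:

    # List the object tree of a profile file (h5dump ships with HDF5)
    h5dump -n /app/slurm/profile_data/alice/job_1234.h5
    # Show only header/metadata information, without dataset contents
    h5dump -H /app/slurm/profile_data/alice/job_1234.h5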
......
@@ -288,7 +288,7 @@ The job_list argument is a comma separated list of job IDs.
 Requeue a running, suspended or finished SLURM batch job into pending state;
 moreover, the job is put in held state (priority zero).
 The job_list argument is a comma separated list of job IDs.
-A held job can be release using scontrol to reset its priority (e.g.
+A held job can be released using scontrol to reset its priority (e.g.
 "scontrol release <job_id>"). The command accepts the following option:
 .RS
 .TP 12
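
A brief usage sketch of the requeuehold/release cycle described above (job IDs are examples):

    # Requeue two batch jobs and hold them in pending state (priority zero)
    scontrol requeuehold 1234,1235
    # Release one of them so the scheduler may consider it again
    scontrol release 1234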
......
@@ -64,7 +64,7 @@ details.
 \fBsacct\fR(1), \fBsacctmgr\fR(1), \fBsalloc\fR(1), \fBsattach\fR(1),
 \fBsbatch\fR(1), \fBsbcast\fR(1), \fBscancel\fR(1), \fBscontrol\fR(1),
 \fBsinfo\fR(1), \fBsmap\fR(1), \fBsqueue\fR(1), \fBsreport\fR(1),
-\fBsrun\fR(1), \fBsshare\fR(1), \fBsstate\fR(1), \fBstrigger\fR(1),
+\fBsrun\fR(1), \fBsshare\fR(1), \fBsstat\fR(1), \fBstrigger\fR(1),
 \fBsview\fR(1),
 \fBbluegene.conf\fR(5), \fBslurm.conf\fR(5), \fBslurmdbd.conf\fR(5),
 \fBwiki.conf\fR(5),
......
@@ -1686,13 +1686,13 @@ enable user login, etc. By default there is no prolog. Any configured script
 is expected to complete execution quickly (in less time than
 \fBMessageTimeout\fR).
 If the prolog fails (returns a non\-zero exit code), this will result in the
-node being set to a DRAIN state and the job requeued to executed on another node.
+node being set to a DRAIN state and the job being requeued in a held state.
 See \fBProlog and Epilog Scripts\fR for more information.
 .TP
 \fBPrologFlags\fR
 Flags to control the Prolog behavior. By default no flags are set.
-Currently the only option defined is:
+Currently the options are:
 .RS
 .TP 6
 \fBAlloc\fR
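
As a configuration sketch for the Prolog behavior discussed in this hunk (the script path is a placeholder): with the Alloc flag, the Prolog is invoked at job allocation rather than at first task launch.

    # slurm.conf (sketch): run a prolog script at allocation time
    Prolog=/etc/slurm/prolog.sh
    PrologFlags=Alloc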
@@ -1839,7 +1839,7 @@ NOTE: This configuration option does not apply to IBM BlueGene systems.
 .TP
 \fBReconfigFlags\fR
 Flags to control various actions that may be taken when an "scontrol
-reconfig" command is issued. Currently the only option defined is:
+reconfig" command is issued. Currently the options are:
 .RS
 .TP 17
 \fBKeepPartInfo\fR
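
A configuration sketch for the KeepPartInfo flag shown above:

    # slurm.conf (sketch): keep partition state set via "scontrol update"
    # when "scontrol reconfig" re-reads the configuration
    ReconfigFlags=KeepPartInfo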
@@ -4023,7 +4023,7 @@ node being set to a DRAIN state.
 If the EpilogSlurmctld fails (returns a non\-zero exit code), this will only
 be logged.
 If the Prolog fails (returns a non\-zero exit code), this will result in the
-node being set to a DRAIN state and the job requeued to executed on another node.
+node being set to a DRAIN state and the job being requeued in a held state.
 If the PrologSlurmctld fails (returns a non\-zero exit code), this will result
 in the job being requeued to execute on another node if possible. Only batch jobs
 can be requeued. Interactive jobs (salloc and srun) will be cancelled if the
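
Given the DRAIN behavior described above, an administrator would typically locate and recover affected nodes along these lines (the node name is hypothetical):

    # List drained/down nodes together with the reason Slurm recorded
    sinfo -R
    # Return the node to service once the Prolog failure is resolved
    scontrol update NodeName=node001 State=RESUME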
......