Commit 4f14cf7e, authored 16 years ago by Danny Auble
added a better rc from step_launch and finished accounting
Parent: 62506dd5
Showing 2 changed files with 24 additions and 17 deletions
doc/html/accounting.shtml: 19 additions, 17 deletions
src/api/step_launch.c: 5 additions, 0 deletions
--- a/doc/html/accounting.shtml
+++ b/doc/html/accounting.shtml
@@ -233,7 +233,8 @@ components. A value of "auth/munge" is recommended.</li>
 <li><b>DbdHost</b>:
 The name of the machine where the Slurm Database Daemon is executed.
 This should be a node name without the full domain name (e.g. "lx0001").
-This defaults to <i>localhost</i>.</li>
+This defaults to <i>localhost</i> but should be supplied to avoid a
+warning message.</li>
 <li><b>DbdPort</b>:
 The port number that the Slurm Database Daemon (slurmdbd) listens
@@ -251,19 +252,21 @@ The default value is none (performs logging via syslog).</li>
 Identifies the places in which to look for SLURM plugins.
 This is a colon-separated list of directories, like the PATH
 environment variable.
-The default value is "/usr/local/lib/slurm".</li>
+The default value is the prefix given at configure time + "/lib/slurm".</li>
 <li><b>SlurmUser</b>:
 The name of the user that the <i>slurmctld</i> daemon executes as.
 This user must exist on the machine executing the Slurm Database Daemon
 and have the same user ID as the hosts on which <i>slurmctld</i> execute.
 For security purposes, a user other than "root" is recommended.
-The default value is "root". </li>
+The default value is "root". This name should also be the same slurm
+user on all clusters reporting to the DBD.</li>
 <li><b>StorageHost</b>:
 Define the name of the host the database is running where we are going
 to store the data.
-Ideally this should be the host on which slurmdbd executes.</li>
+Ideally this should be the host on which slurmdbd executes. But could
+be a different machine.</li>
 <li><b>StorageLoc</b>:
 Specifies the name of the database where accounting
@@ -302,13 +305,15 @@ with to store the job accounting data.</li>
 <h2>Tools</h2>
-<p>There are two tools available to work with accounting data,
-<b>sacct</b> and <b>sacctmgr</b>.
-Both of these tools will get or set data through the SlurmDBD daemon.
+<p>There are a few tools available to work with accounting data,
+<b>sacct</b>, <b>sacctmgr</b>, and <b>sreport</b>.
+These tools all get or set data through the SlurmDBD daemon.<br>
 Sacct is used to generate accounting report for both running and
-completed jobs.
+completed jobs.<br>
 Sacctmgr is used to manage associations in the database:
-add or remove clusters, add or remove users, etc.
+add or remove clusters, add or remove users, etc.<br>
+Sreport is used to generate various reports on usage collected over a
+given time period.<br>
 See the man pages for each command for more information.</p>
 <p>Web interfaces with graphical output is currently under
@@ -511,14 +516,11 @@ execute line:</p>
 <pre>
 sacctmgr remove user where default=test
 </pre>
+Note: In most cases when removing entities the record of their
+existence is still kept around only marked deleted. If an entity has
+existed for less than 1 day the entity will be removed completely.
+This is for the case of typos and such.
-<h2>Node State Information</h2>
-<p>Node state information is also recorded in the database.
-Whenever a node goes DOWN or becomes DRAINED that event is
-logged along with the node's <i>Reason</i> field.
-This can be used to generate various reports.
-<p style="text-align: center;">Last modified 25 March 2008</p>
+<p style="text-align: center;">Last modified 27 June 2008</p>
 </ul></body></html>
--- a/src/api/step_launch.c
+++ b/src/api/step_launch.c
@@ -1011,6 +1011,7 @@ static int _launch_tasks(slurm_step_ctx_t *ctx,
 	ListIterator ret_itr;
 	ret_data_info_t *ret_data = NULL;
 	int rc = SLURM_SUCCESS;
+	int tot_rc = SLURM_SUCCESS;
 	debug("Entering _launch_tasks");
 	if (ctx->verbose_level) {
@@ -1048,6 +1049,7 @@ static int _launch_tasks(slurm_step_ctx_t *ctx,
 			error("Task launch failed on node %s: %m",
 			      ret_data->node_name);
 			rc = SLURM_ERROR;
+			tot_rc = rc;
 		} else {
 #if 0 /* only for debugging, might want to make this a callback */
 			errno = ret_data->err;
@@ -1058,6 +1060,9 @@ static int _launch_tasks(slurm_step_ctx_t *ctx,
 	}
 	list_iterator_destroy(ret_itr);
 	list_destroy(ret_list);
+	if (tot_rc != SLURM_SUCCESS)
+		return tot_rc;
 	return rc;
 }
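The step_launch.c change above appears to latch a failure in tot_rc so that a later node's result cannot hide an earlier error. The following is a minimal standalone sketch of that pattern, not the Slurm implementation: aggregate_rc and its array argument are hypothetical, and the two return-code macros are local stand-ins defined here so the example compiles on its own.

```c
#include <stddef.h>

/* Local stand-ins for the return codes named in the diff (assumed values). */
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1

/*
 * Hypothetical helper, not part of Slurm: walk a list of per-node
 * return codes and report failure if any node failed, even when the
 * nodes that follow it succeed.
 */
int aggregate_rc(const int *node_rc, size_t count)
{
	int rc = SLURM_SUCCESS;     /* result of the most recent node */
	int tot_rc = SLURM_SUCCESS; /* remembers the latest failure seen */
	size_t i;

	for (i = 0; i < count; i++) {
		rc = node_rc[i];
		if (rc != SLURM_SUCCESS)
			tot_rc = rc; /* keep the error across iterations */
	}

	/* Prefer the remembered failure over the last node's result. */
	if (tot_rc != SLURM_SUCCESS)
		return tot_rc;
	return rc;
}
```

Without the second variable, a sequence such as success, error, success would return the final success, because rc is overwritten on every iteration.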