Commit 8b73640c authored 20 years ago by Moe Jette
Add description of inactive job purging.
parent
7cdcf154
doc/html/faq.html (20 additions, 2 deletions)
@@ -15,8 +15,8 @@ Linux clusters, high-performance computing, Livermore Computing">
 <meta name="copyright" content="This document is copyrighted U.S.
 Department of Energy under Contract W-7405-Eng-48">
-<meta name="Author" content="Moe Jette">
+<meta name="Author" content="Morris Jette">
-<meta name="email" content="jette@llnl.gov">
+<meta name="email" content="jette1@llnl.gov">
 <meta name="Classification" content="DOE:DOE Web sites via organizational
 structure:Laboratories and Other Field Facilities">
@@ -58,6 +58,7 @@ structure:Laboratories and Other Field Facilities">
 <li><a href="#pending">Why is my job not running?</a></li>
 <li><a href="#sharing">Why does the srun --overcommit option not permit multiple jobs
 to run on nodes?</a></li>
+<li><a href="#purge">Why is my job killed prematurely?</a></li>
 </ol>
 <p><a name="comp"><b>1. Why is my job/node in "completing" state?</b></a><br>
 When a job is terminating, both the job and its nodes enter the state "completing."
@@ -125,6 +126,23 @@ four tasks to use.
 of srun's <b>--shared</b> option in conjunction with the <b>Shared</b> parameter
 in SLURM's partition configuration. See the man pages for srun and slurm.conf for
 more information.
+<p><a name="purge"><b>5. Why is my job killed prematurely?</b></a><br>
+SLURM has a job purging mechanism to remove inactive jobs (resource allocations)
+before they reach their time limit, which could be infinite.
+This inactivity time limit is configurable by the system administrator.
+You can check its value with the command
+<blockquote>
+<p><span class="commandline">scontrol show config | grep InactiveLimit</span></p>
+</blockquote>
+The value of InactiveLimit is in seconds.
+A zero value indicates that job purging is disabled.
+A job is considered inactive if it has no active job steps or if the srun
+command creating the job is not responding.
+In the case of a batch job, the srun command terminates after the job script
+is submitted.
+Therefore batch job pre- and post-processing is limited to the InactiveLimit.
+Contact your system administrator if you believe the InactiveLimit value
+should be changed.
 </td>
 </tr>
 <tr>
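The new FAQ entry says that InactiveLimit is reported in seconds and that a zero value disables job purging. A minimal sketch of how that rule could be applied to the output of the scontrol command shown in the entry follows. The function names are hypothetical, and the sample line assumes the output looks like "InactiveLimit = 120 sec"; the exact layout may differ between SLURM versions.

```python
import re


def parse_inactive_limit(config_text):
    """Return the InactiveLimit value in seconds, or None if it is not listed.

    Assumes a line of the form 'InactiveLimit = <integer> [sec]'.
    """
    match = re.search(r"^\s*InactiveLimit\s*=\s*(\d+)", config_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def describe_inactive_limit(seconds):
    """Interpret the value as the FAQ describes: zero means purging is disabled."""
    if seconds is None:
        return "InactiveLimit not reported"
    if seconds == 0:
        return "job purging is disabled"
    return f"inactive jobs are purged after {seconds} seconds"


# Hypothetical sample of what the grep in the FAQ entry might print.
sample = "InactiveLimit           = 120 sec\n"
print(describe_inactive_limit(parse_inactive_limit(sample)))
# prints: inactive jobs are purged after 120 seconds
```

In practice the text would come from running the scontrol command on a cluster node rather than from a hard-coded sample.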