tud-zih-energy / Slurm
Commit 82bbae66, authored 3 years ago by Ben Roberts, committed by Danny Auble 3 years ago
Docs - Clarify different OOM behavior for cgroups vs polling
Bug 11318
parent 68ee435c
Changes
Showing 2 changed files with 12 additions and 3 deletions:
doc/man/man5/cgroup.conf.5: 7 additions, 1 deletion
doc/man/man5/slurm.conf.5: 5 additions, 2 deletions
doc/man/man5/cgroup.conf.5 (+7 −1)
@@ -116,7 +116,13 @@ which case the job's RAM limit will be set to its swap space limit if
 \fBConstrainSwapSpace\fR is set to "yes".
 Also see \fBAllowedSwapSpace\fR, \fBAllowedRAMSpace\fR and
 \fBConstrainSwapSpace\fR.
-NOTE: When enabled, ConstrainRAMSpace can lead to a noticeable decline in
+\fBNOTE\fR: When using \fBConstrainRAMSpace\fR, if a process tries to consume
+more memory than is available, the step that process is running in will be
+killed. This differs from the behavior when using \fBOverMemoryKill\fR,
+where just the offending process will be killed.
+\fBNOTE\fR: When enabled, ConstrainRAMSpace can lead to a noticeable decline in
 per-node job throughout. Sites with high-throughput requirements should
 carefully weigh the tradeoff between per-node throughput, versus potential
 problems that can arise from unconstrained memory usage on the node. See
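A minimal configuration sketch of the cgroup-based enforcement that the new NOTE describes. It assumes the task/cgroup plugin is available on the compute nodes; the memory limit itself comes from the job's memory request, and with this setup exceeding it affects the whole step rather than a single process.

```
# slurm.conf: run tasks inside cgroups
TaskPlugin=task/cgroup

# cgroup.conf: enforce the RAM limit through the step's cgroup;
# per the NOTE, exceeding it kills the whole step, not just one process
ConstrainRAMSpace=yes
```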
doc/man/man5/slurm.conf.5 (+5 −2)
@@ -1207,8 +1207,11 @@ allocation may affect other processes and/or machine health.
 task/cgroup as a TaskPlugin and making use of ConstrainRAMSpace=yes in the
 cgroup.conf instead of using this JobAcctGather mechanism for memory
 enforcement. With OverMemoryKill, memory limit is applied against each process
-individually and is not applied to the step as a whole as it is with
-ConstrainRAMSpace=yes. Using JobAcctGather is polling based and there is a
+individually and is not applied to the step as a whole. This means that when
+jobs have a process that consumes too much memory, the process will be killed
+but the step will continue to run. When using cgroups with
+ConstrainRAMSpace=yes, a process that consumes too much memory will result in
+the job step being killed. Using JobAcctGather is polling based and there is a
 delay before a job is killed, which could lead to system Out of Memory events.
 .RE
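For contrast, a sketch of the polling-based alternative the text describes. It assumes the jobacct_gather/linux plugin; the 30-second sampling interval is an illustrative value, not a recommendation. Memory that grows between two samples is only seen at the next sample, which is the delay the text warns about.

```
# slurm.conf: polling-based memory enforcement via JobAcctGather
JobAcctGatherType=jobacct_gather/linux
# task sampling interval in seconds (illustrative value); usage that
# grows between samples is only noticed at the next sample
JobAcctGatherFrequency=task=30
# per the text above, kill only the offending process; the step keeps running
JobAcctGatherParams=OverMemoryKill
```

A shorter interval narrows the window before enforcement, at the cost of more frequent sampling on the node.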