diff --git a/doc/man/man5/cgroup.conf.5 b/doc/man/man5/cgroup.conf.5
index 2bb2846e4528f9d03e5dc7baadf3671eb918a125..f9d30e367dcb0f31fac10774d73bf96d04de5bfe 100644
--- a/doc/man/man5/cgroup.conf.5
+++ b/doc/man/man5/cgroup.conf.5
@@ -116,7 +116,13 @@ which case the job's RAM limit will be set to its swap space limit if
 \fBConstrainSwapSpace\fR is set to "yes".
 Also see \fBAllowedSwapSpace\fR, \fBAllowedRAMSpace\fR and
 \fBConstrainSwapSpace\fR.
-NOTE: When enabled, ConstrainRAMSpace can lead to a noticeable decline in
+
+\fBNOTE\fR: When using \fBConstrainRAMSpace\fR, if a process tries to consume
+more memory than is available, the step that process is running in will be
+killed. This differs from the behavior when using \fBOverMemoryKill\fR,
+where just the offending process will be killed.
+
+\fBNOTE\fR: When enabled, ConstrainRAMSpace can lead to a noticeable decline in
 per-node job throughout. Sites with high-throughput requirements should
 carefully weigh the tradeoff between per-node throughput, versus potential
 problems that can arise from unconstrained memory usage on the node. See
diff --git a/doc/man/man5/slurm.conf.5 b/doc/man/man5/slurm.conf.5
index 29dc090caccb355c97c78a0b4249ecc2fe4bd78b..0c8a846f0222d7dd8f1627b4c4c8ac107d69f701 100644
--- a/doc/man/man5/slurm.conf.5
+++ b/doc/man/man5/slurm.conf.5
@@ -1207,8 +1207,11 @@ allocation may affect other processes and/or machine health.
 task/cgroup as a TaskPlugin and making use of ConstrainRAMSpace=yes in the
 cgroup.conf instead of using this JobAcctGather mechanism for memory
 enforcement. With OverMemoryKill, memory limit is applied against each process
-individually and is not applied to the step as a whole as it is with
-ConstrainRAMSpace=yes. Using JobAcctGather is polling based and there is a
+individually and is not applied to the step as a whole. This means that when
+jobs have a process that consumes too much memory, the process will be killed
+but the step will continue to run. When using cgroups with
+ConstrainRAMSpace=yes, a process that consumes too much memory will result in
+the job step being killed. Using JobAcctGather is polling based and there is a
 delay before a job is killed, which could lead to system Out of Memory events.
 .RE