Commit 9f15e67a authored by Danny Auble

Merge pull request #61 from ryanbcox/slurm-2.6

use mem and memsw failcnt, check for existence

Thanks Ryan.  I'll let you know how it goes.
parents 51862f56 68a96025
@@ -28,10 +28,20 @@ the cgroup.</li>
 <li>additional state objects specific to each subsystem.</li>
 </ul>
 </ul>
-<p><b>NOTE:</b> There can be a serious performance problem with memory cgroups
+<h2>General Usage Notes</h2>
+<ul>
+<li>There can be a serious performance problem with memory cgroups
 on conventional multi-socket, multi-core nodes in kernels prior to 2.6.38 due
 to contention between processors for a spinlock. This problem seems to have
-been completely fixed in the 2.6.38 kernel.</p>
+been completely fixed in the 2.6.38 kernel.</li>
+<li>Debian and derivatives (e.g. Ubuntu) usually exclude the memory and memsw
+(swap) cgroups by default. To include them, add the following parameters to
+the kernel command line: <pre>cgroup_enable=memory swapaccount=1</pre>
+This can usually be placed in /etc/default/grub inside the
+<i>GRUB_CMDLINE_LINUX</i> variable. A command such as <i>update-grub</i> must
+be run after updating the file.
+</ul>
 <h2>Use of Cgroups in SLURM</h2>
 <p>SLURM provides cgroup versions of a number of plugins.</p>
@@ -174,6 +184,6 @@ the following example.</li>
 </ul>
 <p class="footer"><a href="#top">top</a></p>
-<p style="text-align:center;">Last modified 26 October 2012</p>
+<p style="text-align:center;">Last modified 8 November 2013</p>
 <!--#include virtual="footer.txt"-->
@@ -143,6 +143,16 @@ permission to use. The default value is "/etc/slurm/cgroup_allowed_devices_file.
 the file accepts one device per line and it permits lines like /dev/sda* or /dev/cpu/*/*.
 See also an example of this file in etc/allowed_devices_file.conf.example.
+.SH "DISTRIBUTION-SPECIFIC NOTES"
+.LP
+Debian and derivatives (e.g. Ubuntu) usually exclude the memory and memsw (swap)
+cgroups by default. To include them, add the following parameters to the kernel
+command line: \fBcgroup_enable=memory swapaccount=1\fR
+.LP
+This can usually be placed in /etc/default/grub inside the
+\fBGRUB_CMDLINE_LINUX\fR variable. A command such as update-grub must be run
+after updating the file.
 .SH "EXAMPLE"
 .LP
......
@@ -451,6 +451,22 @@ extern int task_cgroup_memory_attach_task(slurmd_job_t *job)
     return fstatus;
 }
+/* return 1 if failcnt file exists and is > 0 */
+int failcnt_non_zero(xcgroup_t* cg, char* param)
+{
+    int fstatus = XCGROUP_ERROR;
+    uint64_t value;
+    fstatus = xcgroup_get_uint64_param(cg,
+                                       param,
+                                       &value);
+    if(fstatus != XCGROUP_SUCCESS) {
+        debug2("unable to read '%s' from '%s'", param, cg->path);
+        return 0;
+    }
+    else
+        return value > 0;
+}
 extern int task_cgroup_memory_check_oom(slurmd_job_t *job)
 {
     xcgroup_t memory_cg;
@@ -463,20 +479,26 @@ extern int task_cgroup_memory_check_oom(slurmd_job_t *job)
          * for a step and vice versa...
          * can't tell which is which so we'll treat
          * them the same */
-        xcgroup_get_uint64_param(&step_memory_cg,
-                                 "memory.memsw.failcnt",
-                                 &memory_memsw_failcnt);
-        if (memory_memsw_failcnt > 0)
+        if(failcnt_non_zero(&step_memory_cg,
+                            "memory.memsw.failcnt"))
             error("Exceeded step memory limit at some "
-                  "point. Step may have been partially "
-                  "swapped out to disk.");
-        xcgroup_get_uint64_param(&job_memory_cg,
-                                 "memory.memsw.failcnt",
-                                 &memory_memsw_failcnt);
-        if (memory_memsw_failcnt > 0)
+                  "point. oom-killer likely killed a "
+                  "process.");
+        else if(failcnt_non_zero(&step_memory_cg,
+                                 "memory.failcnt"))
+            error("Exceeded step memory limit at some "
+                  "point. oom-killer likely "
+                  "killed a process.");
+        if(failcnt_non_zero(&job_memory_cg,
+                            "memory.memsw.failcnt"))
             error("Exceeded job memory limit at some "
-                  "point. Job may have been partially "
-                  "swapped out to disk.");
+                  "point. oom-killer likely killed a "
+                  "process.");
+        else if(failcnt_non_zero(&job_memory_cg,
+                                 "memory.failcnt"))
+            error("Exceeded job memory limit at some "
+                  "point. oom-killer likely "
+                  "killed a process.");
         xcgroup_unlock(&memory_cg);
     } else
         error("task/cgroup task_cgroup_memory_check_oom: "
......