From b28bb581fa2730cc4054e9ca4352f31b5eb921f7 Mon Sep 17 00:00:00 2001 From: Martin Perry <Martin.Perry@Bull.com> Date: Mon, 11 Jun 2012 12:10:09 -0700 Subject: [PATCH] Initial patch adding cgroup web page from Martin Perry, Bull --- doc/html/Makefile.am | 1 + doc/html/Makefile.in | 1 + doc/html/cgroups.shtml | 172 +++++++++++++++++++++++++++++++++++ doc/html/documentation.shtml | 1 + doc/man/man5/cgroup.conf.5 | 164 ++++----------------------------- 5 files changed, 193 insertions(+), 146 deletions(-) create mode 100644 doc/html/cgroups.shtml diff --git a/doc/html/Makefile.am b/doc/html/Makefile.am index 059c66bc111..0e9f9bcaa14 100644 --- a/doc/html/Makefile.am +++ b/doc/html/Makefile.am @@ -8,6 +8,7 @@ generated_html = \ authplugins.html \ big_sys.html \ bluegene.html \ + cgroups.html \ checkpoint_blcr.html \ checkpoint_plugins.html \ cons_res.html \ diff --git a/doc/html/Makefile.in b/doc/html/Makefile.in index dc8df4f1107..bd85344f24d 100644 --- a/doc/html/Makefile.in +++ b/doc/html/Makefile.in @@ -324,6 +324,7 @@ generated_html = \ authplugins.html \ big_sys.html \ bluegene.html \ + cgroups.html \ checkpoint_blcr.html \ checkpoint_plugins.html \ cons_res.html \ diff --git a/doc/html/cgroups.shtml b/doc/html/cgroups.shtml new file mode 100644 index 00000000000..af372753cdb --- /dev/null +++ b/doc/html/cgroups.shtml @@ -0,0 +1,172 @@ +<!--#include virtual="header.txt"--> + +<h1>Cgroups Guide</h1> +<h2>Cgroups Overview</h2> +For a comprehensive description of Linux Control Groups (cgroups) see the +<a href="http://www.kernel.org/doc/Documentation/cgroups/cgroups.txt"> +cgroups documentation</A> at kernel.org. 
Detailed knowledge of cgroups is not +required to use cgroups in SLURM, but a basic understanding of the +following features of cgroups is helpful: +<ul> +<li><b>Cgroup</b> - a container for a set of processes subject to common +controls or monitoring, implemented as a directory and a set of files +(state objects) in the cgroup +virtual filesystem.</li> +<li><b>Subsystem</b> - a module, typically a resource controller, that applies +a set of parameters to the cgroups in a hierarchy.</li> +<li><b>Hierarchy</b> - a set of cgroups organized in a tree structure, with one +or more associated subsystems.</li> +<li><b>State Objects</b> - pseudofiles that represent the state of a cgroup or +apply controls to a cgroup: +<ul> +<li><i>tasks</i> - identifies the processes (PIDs) in the cgroup.</li> +<li><i>release_agent</i> - specifies the location of the script or program to +be called when the cgroup becomes empty.</li> +<li><i>notify_on_release</i> - controls whether the release_agent is called for +the cgroup.</li> +<li>additional state objects specific to each subsystem.</li> +</ul></li> +</ul> +<br> +<h2>Use of Cgroups in SLURM</h2> +SLURM provides cgroup versions of a number of plugins. Currently, there are +cgroup versions of the proctrack (process tracking), task (task management) and +jobacct_gather (job accounting statistics) plugins. The cgroup versions of +these plugins may be configured instead of the standard versions. The cgroup +plugins can provide a number of benefits over the +standard plugins, as described below. +<br><br> +<h2>SLURM Cgroups Configuration Overview</h2> +There are several sets of configuration options for SLURM cgroups: +<ul> +<li><a href="slurm.conf.html">slurm.conf</a> provides options to enable the +cgroup plugins. 
Each plugin may be enabled or disabled independently +of the others.</li> +<li><a href="cgroup.conf.html">cgroup.conf</a> provides general options that +are common to all cgroup plugins, plus additional options that apply only to +specific plugins.</li> +<li>Additional configuration is required to enable automatic removal of SLURM +cgroups when they are no longer in use. +See <a href="#cleanup">Cleanup of SLURM Cgroups</a> below for details.</li> +</ul> +<a name="available"></a> +<br> +<h2>Currently Available Cgroup Plugins</h2> +<h3>proctrack/cgroup plugin</h3> +The proctrack/cgroup plugin is an alternative to the proctrack/linux plugin +for process tracking and suspend/resume capability. proctrack/cgroup provides +more reliable tracking and control than proctrack/linux. proctrack/cgroup uses +the freezer subsystem. +<p> +To enable this plugin, configure the following option in slurm.conf: +<pre>ProctrackType=proctrack/cgroup</pre> +</p> +There are no specific options for this plugin in cgroup.conf, but the general +options apply. See the <a href="cgroup.conf.html">cgroup.conf</a> man page for +details. +<h3>task/cgroup plugin</h3> +The task/cgroup plugin is an alternative to the task/affinity plugin for task +management. task/cgroup provides the following features: +<ul> +<li>The ability to confine jobs and steps to their allocated cpuset.</li> +<li>The ability to bind tasks to sockets, cores and threads within their step's +allocated cpuset on a node. +<ul> +<li>Supports block and cyclic distribution of allocated cpus to tasks for +binding.</li> +</ul></li> +<li>The ability to confine jobs and steps to specific memory resources.</li> +<li>The ability to confine jobs to their allocated set of generic resources +(gres devices).</li> +</ul> +The task/cgroup plugin uses the cpuset, memory and devices subsystems. 
+<p> +To enable this plugin, configure the following option in slurm.conf: +<pre>TaskPlugin=task/cgroup</pre> +</p> +There are many specific options for this plugin in cgroup.conf. The general +options also apply. See the <a href="cgroup.conf.html">cgroup.conf</a> man page +for details. +<h3>jobacct_gather/cgroup plugin</h3> +The jobacct_gather/cgroup plugin is an alternative to the jobacct_gather/linux +plugin for the collection of accounting statistics for jobs, steps and tasks. +The cgroup plugin may provide improved performance over jobacct_gather/linux. +jobacct_gather/cgroup uses the cpuacct and memory subsystems. Note: the cpu and +memory statistics collected by this plugin do not represent the same resources +as the cpu and memory statistics collected by the jobacct_gather/linux plugin +(sourced from /proc stat). At present, jobacct_gather/cgroup should be +considered experimental. +<p> +To enable this plugin, configure the following option in slurm.conf: +<pre>JobacctGatherType=jobacct_gather/cgroup</pre> +</p> +There are no specific options for this plugin in cgroup.conf, but the general +options apply. See the <a href="cgroup.conf.html">cgroup.conf</a> man page for +details. +<br><br> +<h2>Organization of SLURM Cgroups</h2> +SLURM cgroups are organized as follows. A base directory (mount point) is +created at /cgroup, or as configured by the <i>CgroupMountpoint</I> option in +<a href="cgroup.conf.html">cgroup.conf</a>. All cgroup +hierarchies are created below this base directory. A separate hierarchy is +created for each cgroup subsystem in use. The name of the root cgroup in each +hierarchy is the subsystem name. A cgroup named <i>slurm</i> is created below +the root cgroup in each hierarchy. Below each <i>slurm</i> cgroup, cgroups for +SLURM users, jobs, steps and tasks are created dynamically as needed. The names +of these cgroups consist of a prefix identifying the SLURM entity (user, job, +step or task), followed by the relevant numeric id. 
The following example shows +the path of the task cgroup in the cpuset hierarchy for taskid#2 of stepid#0 of +jobid#123 for userid#100, using the default base directory (/cgroup): +<p><pre>/cgroup/cpuset/slurm/uid_100/job_123/step_0/task_2</pre></p> +Note that this structure applies to a specific compute node. Jobs that use more +than one node will have a cgroup structure on each node. +<a name="cleanup"></a> +<br><br> +<h2>Cleanup of SLURM Cgroups</h2> +Linux provides a mechanism for the automatic removal of a cgroup when its +state changes from non-empty to empty. A cgroup is empty when no processes are +attached to it and it has no child cgroups. The SLURM cgroups implementation +allows this mechanism to be used to automatically remove the relevant SLURM +cgroups when tasks, steps and jobs terminate. To enable this automatic removal +feature, follow these steps: +<ul> +<li>If desired, configure the location of the SLURM Cgroup release agent +directory. This is done using the <i>CgroupReleaseAgentDir</i> option in +<a href="cgroup.conf.html">cgroup.conf</a>. +The default location is /etc/slurm/cgroup.</li> +<br> +<pre> + [sulu] (slurm) etc> cat cgroup.conf | grep CgroupReleaseAgentDir + CgroupReleaseAgentDir="/etc/slurm/cgroup" +</pre> +<li>Create the common release agent file. This file should be named +<i>release_common</i>. An example script for this file is provided in the +SLURM delivery at etc/cgroup.release_common.example. The example script will +automatically remove user, job, step and task cgroups as they become empty. The +file must have execute permission for root.</li><br> +<li>Create release agent files for each cgroup subsystem to be used by SLURM. +This depends on which cgroup plugins are enabled. For example, the +proctrack/cgroup plugin uses the <i>freezer</i> subsystem. See +<a href="#available">Currently Available Cgroup Plugins</a> above to find out +which subsystems are used by each plugin. 
The name of each release agent file +must be of the form <i>release_&lt;subsystem name&gt;</i>. These files should +be created as symbolic links to the common release agent file, +<i>release_common</i>. The files must have execute permission for root. See +the following example.</li> +<br> +<pre> + [sulu] (slurm) etc> ls -al /etc/slurm/cgroup + total 12 + drwxr-xr-x 2 root root 4096 2010-04-23 14:55 . + drwxr-xr-x 4 root root 4096 2010-07-22 14:48 .. + -rwxrwxrwx 1 root root 234 2010-04-23 14:52 release_common + lrwxrwxrwx 1 root root 32 2010-04-23 11:04 release_cpuset -> /etc/slurm/cgroup/release_common + lrwxrwxrwx 1 root root 32 2010-04-23 11:03 release_freezer -> /etc/slurm/cgroup/release_common + +</pre> +</ul> +<p class="footer"><a href="#top">top</a></p> + +<p style="text-align:center;">Last modified 6 June 2012</p> + +<!--#include virtual="footer.txt"--> diff --git a/doc/html/documentation.shtml b/doc/html/documentation.shtml index b42463c32da..116333dfcd7 100644 --- a/doc/html/documentation.shtml +++ b/doc/html/documentation.shtml @@ -26,6 +26,7 @@ Also see <a href="publications.html">Publications and Presentations</a>. 
<ul> <li><a href="quickstart_admin.html">Quick Start Administrator Guide</a></li> <li><a href="accounting.html">Accounting</a></li> +<li><a href="cgroups.html">Cgroups Guide</a></li> <li><a href="configurator.html">Configuration Tool (Full version)</a></li> <li><a href="configurator.easy.html">Configuration Tool (Simplified version)</a></li> <li><a href="cpu_management.html">CPU Management User and Administrator Guide</a></li> diff --git a/doc/man/man5/cgroup.conf.5 b/doc/man/man5/cgroup.conf.5 index 5a06a0e7653..c5807b680d5 100644 --- a/doc/man/man5/cgroup.conf.5 +++ b/doc/man/man5/cgroup.conf.5 @@ -1,4 +1,4 @@ -.TH "cgroup.conf" "5" "December 2010" "cgroup.conf 2.2" \ +.TH "cgroup.conf" "5" "June 2012" "cgroup.conf 2.2" \ "Slurm cgroup configuration file" .SH "NAME" @@ -6,7 +6,7 @@ cgroup.conf \- Slurm configuration file for the cgroup support .SH "DESCRIPTION" -\fBcgroup.conf\fP is an ASCII file which defines parameters used by +\fBcgroup.conf\fP is an ASCII file which defines parameters used by Slurm's Linux cgroup related plugins. The file location can be modified at system build time using the DEFAULT_SLURM_CONF parameter or at execution time by setting the SLURM_CONF @@ -20,9 +20,10 @@ The size of each line in the file is limited to 1024 characters. Changes to the configuration file take effect upon restart of SLURM daemons, daemon receipt of the SIGHUP signal, or execution of the command "scontrol reconfigure" unless otherwise noted. + .LP -Two cgroup plugins are currently available in SLURM. The first -one is a proctrack plugin, the second one a task plugin. +For general Slurm Cgroups information, see the Cgroups Guide at +<http://www.schedmd.com/slurmdocs/cgroups.html>. .LP The following cgroup.conf parameters are defined to control the general behavior @@ -38,98 +39,33 @@ one per subsystem. The default \fIPATH\fR is /cgroup. 
\fBCgroupAutomount\fR=<yes|no> Slurm cgroup plugins require valid and functional cgroup subsystem to be mounted under /cgroup/<subsystem_name>. -When launched, plugins check their subsystem availability. If not available, -the plugin launch fails unless CgroupAutomount is set to yes. In that case, the +When launched, plugins check their subsystem availability. If not available, +the plugin launch fails unless CgroupAutomount is set to yes. In that case, the plugin will first try to mount the required subsystems. .TP \fBCgroupReleaseAgentDir\fR=<path_to_release_agent_directory> -Used to tune the cgroup system behavior. This parameter identifies the location -of the directory containing Slurm cgroup release_agent files. A release_agent file -is required for each mounted subsystem. The release_agent file name must have the -following format: release_<subsystem_name>. For instance, the release_agent file -for the cpuset subsystem must be named release_cpuset. See also CLEANUP OF -CGROUPS below. - -.SH "PROCTRACK/CGROUP PLUGIN" - -Slurm \fBproctrack/cgroup\fP plugin is used to track processes using the -freezer control group subsystem. It creates a hierarchical set of -directories for each step, putting the step tasks into the leaf. -.LP -This directory structure is like the following: -.br -/cgroup/freezer/uid_%uid/job_%jobid/step_%stepid -.LP -Slurm cgroup proctrack plugin is enabled with the following parameter -in slurm.conf: -.br -ProctrackType=proctrack/cgroup - -.LP -No particular cgroup.conf parameter is defined to control the behavior -of this particular plugin. - -.SH "JOBACCT_GATHER/CGROUP PLUGIN" - -Slurm \fBjobacct_gather/cgroup\fP plugin is an experimental plugin -that uses cgroups to generate accounting statistics instead of the linux -/proc table. The plugin creates a hierarchical set of -directories for each task. 
-.LP -This directory structure is like the following: -.br -/cgroup/cpuacct/uid_%uid/job_%jobid/step_%stepid/task_%taskid -.LP -Slurm cgroup jobacct_gather plugin is enabled with the following parameter -in slurm.conf: -.br -JobAcctGatherType=jobacct_gather/cgroup - -.LP -No particular cgroup.conf parameter is defined to control the behavior -of this particular plugin. +Used to tune the cgroup system behavior. This parameter identifies the location +of the directory containing Slurm cgroup release_agent files. .SH "TASK/CGROUP PLUGIN" -.LP -Slurm \fBtask/cgroup\fP plugin is used to enforce allocated resources -constraints, thus avoiding tasks to use unallocated resources. It currently -only uses cpuset subsystem but could use memory and devices subsystems in a -near future too. - -.LP -It creates a hierarchical set of directories for each task and subsystem. -The directory structure is like the following: -.br -/cgroup/%subsys/uid_%uid/job_%jobid/step_%stepid/task_%taskid - -.LP -Slurm cgroup task plugin is enabled with the following parameter -in slurm.conf: -.br -TaskPlugin=task/cgroup - .LP The following cgroup.conf parameters are defined to control the behavior of this particular plugin: .TP \fBConstrainCores\fR=<yes|no> -If configured to "yes" then constrain allowed cores to the subset of +If configured to "yes" then constrain allowed cores to the subset of allocated resources. It uses the cpuset subsystem. The default value is "no". + .TP \fBTaskAffinity\fR=<yes|no> -If configured to "yes" then set a default task affinity to bind each step +If configured to "yes" then set a default task affinity to bind each step task to a subset of the allocated cores using \fBsched_setaffinity\fP. The default value is "no". 
-.LP -The following cgroup.conf parameters could be defined to control the behavior -of this particular plugin in a next version where memory and devices support -would be added : - .TP \fBAllowedRAMSpace\fR=<number> Constrain the job cgroup RAM to this percentage of the allocated memory. @@ -190,13 +126,13 @@ The default value is "no". .TP \fBAllowedDevicesFile\fR=<path_to_allowed_devices_file> -If the ConstrainDevices field is set to "yes" then this file has to be used to declare -the devices that need to be allowed by default for all the jobs. The current implementation +If the ConstrainDevices field is set to "yes" then this file has to be used to declare +the devices that need to be allowed by default for all the jobs. The current implementation of cgroup devices subsystem works as a whitelist of entries, which means that in order to isolate the access of a job upon particular devices we need to allow the access on all -the devices, supported by default and then deny on those that the job does not have the -permission to use. The default value is "/etc/slurm/cgroup_allowed_devices_file.conf". The syntax of -the file accepts one device per line and it permits lines like /dev/sda* or /dev/cpu/*/*. +the devices, supported by default and then deny on those that the job does not have the +permission to use. The default value is "/etc/slurm/cgroup_allowed_devices_file.conf". The syntax of +the file accepts one device per line and it permits lines like /dev/sda* or /dev/cpu/*/*. See also an example of this file in etc/allowed_devices_file.conf.example. @@ -217,72 +153,8 @@ ConstrainCores=yes .br # -.SH "NOTES" -.LP -Only one instance of a cgroup subsystem is valid at a time in the kernel. -If you try to mount another cgroup hierarchy that uses the same cpuset -subsystem it will fail. -However you can mount another cgroup hierarchy for a different cpuset -subsystem. 
- -.SH CLEANUP OF CGROUPS -.LP -To allow cgroups to be removed automatically when they are no longer in use -the notify_on_release flag is set in each cgroup when the cgroup is -instantiated. The release_agent file for each subsystem is set up when the -subsystem is mounted. The name of each release_agent file is -release_<subsystem name>. The directory is specified via the -CgroupReleaseAgentDir parameter in cgroup.conf. A simple release agent -mechanism to remove slurm cgroups when they become empty may be set up by -creating the release agent files for each required subsystem as symbolic -links to a common release agent script, as shown in the example below: - -[sulu] (slurm) etc> cat cgroup.conf | grep CgroupReleaseAgentDir -.br -CgroupReleaseAgentDir="/etc/slurm/cgroup" -.br - -[sulu] (slurm) etc> ls \-al /etc/slurm/cgroup -.br -total 12 -.br -drwxr-xr-x 2 root root 4096 2010-04-23 14:55 . -.br -drwxr-xr-x 4 root root 4096 2010-07-22 14:48 .. -.br -\-rwxrwxrwx 1 root root 234 2010-04-23 14:52 release_common -.br -lrwxrwxrwx 1 root root 32 2010-04-23 11:04 release_cpuset -> /etc/slurm/cgroup/release_common -.br -lrwxrwxrwx 1 root root 32 2010-04-23 11:03 release_freezer -> /etc/slurm/cgroup/release_common - -[sulu] (slurm) etc> cat /etc/slurm/cgroup/release_common -.br -#!/bin/bash -.br -base_path=/cgroup -.br -progname=$(basename $0) -.br -subsystem=${progname##*_} -.br -.br -rmcg=${base_path}/${subsystem}$@ -.br -uidcg=${rmcg%/job*} -.br -if [[ \-d ${base_path}/${subsystem} ]] -.br -then -.br - flock \-x ${uidcg} \-c "rmdir ${rmcg}" -.br -fi -.br -[sulu] (slurm) etc> - .SH "COPYING" -Copyright (C) 2010 Lawrence Livermore National Security. +Copyright (C) 2010-2012 Lawrence Livermore National Security. Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER). CODE\-OCEC\-09\-009. All rights reserved. .LP -- GitLab
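
Note on the "Cleanup of SLURM Cgroups" steps added in doc/html/cgroups.shtml: the release agent layout they describe (one release_common script plus one release_<subsystem name> symbolic link per subsystem) could be set up roughly as sketched below. This is only an illustration under stated assumptions: link_release_agents is a hypothetical helper, not something shipped with SLURM, and the demo runs in a scratch directory with a stand-in release_common rather than the example script from etc/cgroup.release_common.example.

```shell
#!/bin/sh
# Sketch only: link per-subsystem release agents to a common script.
# link_release_agents is a hypothetical helper, not part of SLURM.
link_release_agents() {
    agent_dir=$1
    shift
    # The common release agent must already exist and be executable.
    [ -x "$agent_dir/release_common" ] || return 1
    for subsystem in "$@"; do
        # Each agent file is named release_<subsystem name>.
        ln -sf "$agent_dir/release_common" "$agent_dir/release_$subsystem"
    done
}

# Demo in a scratch directory so nothing under /etc is touched.
demo_dir=$(mktemp -d)
# Stand-in for the real release agent script (assumption for the demo).
printf '#!/bin/sh\nexit 0\n' > "$demo_dir/release_common"
chmod 755 "$demo_dir/release_common"

# freezer and cpuset match the directory listing shown in the guide.
link_release_agents "$demo_dir" freezer cpuset
ls -l "$demo_dir"
```

On a real node the first argument would be the directory named by CgroupReleaseAgentDir, and the subsystem list would follow from the enabled plugins (for example, freezer for proctrack/cgroup).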