diff --git a/doc/html/gres.shtml b/doc/html/gres.shtml
index b2d30566a0ec7a54ea72b0b8785a589a22f8a001..08ae20ed4116db50fc200db0257d6394be4657b1 100644
--- a/doc/html/gres.shtml
+++ b/doc/html/gres.shtml
@@ -25,124 +25,43 @@ Intel® Many Integrated Core (MIC) processors.</P>
 
 <P>Slurm supports no generic resources in the default configuration.
 One must explicitly specify which resources are to be managed in the
 <I>slurm.conf</I> configuration file. The configuration parameters of
-interest are:</P>
-
-<UL>
-<LI><B>GresTypes</B> a comma delimited list of generic resources to be
-managed (e.g. <I>GresTypes=gpu,mps</I>). This name may be that of an
-optional plugin providing additional control over the resources.</LI>
-<LI><B>Gres</B> the generic resource configuration details in the format<br>
-<name>[:<type>][:no_consume]:<number>[K|M|G]<br>
-The first field is the resource name, which matches the GresType configuration
-parameter name.
-The optional type field might be used to identify a model of that generic
-resource.
-A generic resource can also be specified as non-consumable (i.e. multiple
-jobs can use the same generic resource) with the optional field ":no_consume".
-The final field must specify a generic resource count.
-A suffix of "K", "M" or "G" may be used to multiply the count by 1024,
-1048576 or 1073741824 respectively.
-By default a node has no generic resources.</LI>
-</UL>
+interest are <B>GresTypes</B> and <B>Gres</B>.
+</P>
+
+<P>
+For more details, see <a href="slurm.conf.html#OPT_GresTypes">GresTypes</a> and
+<a href="slurm.conf.html#OPT_Gres_1">Gres</a> in the <I>slurm.conf</I> man page.
+</P>
 
 <P>Note that the GRES specification for each node works in the same
 fashion as the other resources managed. Depending upon the value of the
 <I>FastSchedule</I> parameter, nodes which are found to have fewer resources
 than configured will be placed in a DOWN state.</P>
 
-<P>Sample slurm.conf file:</P>
+<P>Snippet from an example <I>slurm.conf</I> file:</P>
 <PRE>
-# Configure support for our four GPUs
+# Configure support for our four GPUs (with MPS), plus bandwidth
 GresTypes=gpu,mps,bandwidth
 NodeName=tux[0-7] Gres=gpu:tesla:2,gpu:kepler:2,mps:400,bandwidth:lustre:no_consume:4G
 </PRE>
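
With a configuration like the snippet above in place, jobs then request the
resources at submit time. A minimal sketch, assuming the "tesla" and "kepler"
type names from the example and a hypothetical application "my_app":

    # Request two Kepler GPUs on one node (illustrative)
    srun -N1 --gres=gpu:kepler:2 ./my_app
    # Request 100 MPS shares, i.e. a fraction of one GPU's capacity
    srun -N1 --gres=mps:100 ./my_app
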
 
 <P>Each compute node with generic resources typically contain a <I>gres.conf</I>
 file describing which resources are available on the node, their count,
-associated device files and cores which should be used with those resources.
-In the case of GPUs, and the <I>gres.conf</I> AutoDetect variable
-contains <I>nvml</I> and that nvml lib is installed and was present during Slurm
-configuration, their configuration will be automatically gathered using the NVML
-library.
+associated device files and cores which should be used with those resources.</P>
+
+<P>In the case of GPUs, if AutoDetect=nvml is set in <I>gres.conf</I> and the NVML
+library is installed on the node and was present during Slurm
+configuration, the missing configuration details will be automatically gathered
+using the NVML library.
 Configuration information about all other generic resource must explicitly be
 described in the <I>gres.conf</I> file.
-The configuration parameters available are:</P>
-
-<UL>
-<LI><B>Name</B> name of a generic resource (must match <B>GresTypes</B>
-values in <I>slurm.conf</I>).</LI>
-
-<LI><B>Count</B> Number of resources of this type available on this node.
-The default value is set to the number of <B>File</B> values specified (if any),
-otherwise the default value is one.
-A suffix of "K", "M" or "G" may be used
-to multiply the number by 1024, 1048576 or 1073741824 respectively
-(e.g. "Count=10G"). Note that Count is a 32-bit field and the maximum value
-is 4,294,967,295.</LI>
-
-<LI><B>Cores</B>
-Specify the first thread CPU index numbers for the specific cores which can
-use this resource.
-For example, it may be strongly preferable
-to use specific cores with specific devices (e.g. on a NUMA
-architecture). Multiple cores may be specified using a comma
-delimited list or a range may be specified using a "-" separator
-(e.g. "0,1,2,3" or "0-3").
-<B>If specified, then only the identified cores can be allocated with each generic
-resource; an attempt to use other cores will not be honored.</B>
-If not specified, then any core can be used with the resources, which also
-increases the speed of Slurm's scheduling algorithm.
-If any core can be effectively used with the resources, then do not specify the
-Cores option for improved speed in the Slurm scheduling logic.
-
-<B>NOTE:</B> If your cores contain multiple threads only list the first thread
-of each core. The logic is such that it uses core instead of thread scheduling
-per GRES. Also note that since Slurm must be able to perform resource
-management on heterogeneous clusters having various core ID numbering schemes,
-an abstract index will be used instead of the physical core index. That
-abstract id may not correspond to your physical core number.
-Basically Slurm starts numbering from 0 to n, being 0 the id of the first
-processing unit (core or thread if HT is enabled) on the first socket,
-first core and maybe first thread, and then continuing sequentially to the
-next thread, core, and socket. The numbering generally coincides with the
-processing unit logical number (PU L#) seen in lstopo output.
-
-<LI><B>File</B> Fully qualified pathname of the device files associated with a
-resource.
-The name can include a numeric range suffix to be interpreted by Slurm
-(e.g. <I>File=/dev/nvidia[0-3]</I>).
-This field is generally required if enforcement of generic resource
-allocations are to be supported (i.e. prevents a user from making
-use of resources allocated to a different user).
-Enforcement of the file allocation relies upon Linux Control Groups (cgroups)
-and Slurm's task/cgroup plugin, which will place the allocated files into
-the job's cgroup and prevent use of other files.
-Please see Slurm's <a href="cgroups.html">Cgroups Guide</a> for more
-information.<br>
-Except in the case of MPS support, if <B>File</B> is specified then <B>Count</B>
-must be either set to the number of file names specified or not set (the
-default value is the number of files specified).
-In the case of MPS support, each GPU would be identifed by name using the
-<B>File</B> parameter and <B>Count</B> would specify the number of MPS entries
-that would correspond to that GPU; typically 100 or some multiple of 100.
-<br>
-NOTE: If you specify the <B>File</B> parameter for a resource on some node,
-the option must be specified on all nodes and Slurm will track the assignment
-of each specific resource on each node. Otherwise Slurm will only track a
-count of allocated resources rather than the state of each individual device
-file.
-<br>
-NOTE: Drain a node before changing the count of records with <B>File</B>
-parameters (i.e. if you want to add or remove GPUs from a node's configuration).
-Failure to do so will result in any job using those GRES being aborted.</LI>
-
-<LI><B>Type</B> Optionally specify the device type.
-For example, this might
-be used to identify a specific model of GPU, which users can then specify
-in their job request.
-If <B>Type</B> is specified, then <B>Count</B> is limited in size (currently 1024).
-</LI>
-</UL>
-
-<P>Sample gres.conf file:</P>
+</P>
+
+<P>
+To view available <I>gres.conf</I> configuration parameters, see the
+<a href="gres.conf.html">gres.conf man page</a>.
+</P>
+
+<P>Example <I>gres.conf</I> file:</P>
 <PRE>
 # Configure support for four GPUs (with MPS), plus bandwidth
 AutoDetect=nvml
@@ -448,6 +367,6 @@ to a physical device</pre>
 explicitly defined in the offload pragmas.</P>
 
 <!-------------------------------------------------------------------------->
-<p style="text-align: center;">Last modified 11 March 2019</p>
+<p style="text-align: center;">Last modified 4 April 2019</p>
 
 <!--#include virtual="footer.txt"-->
diff --git a/doc/man/man5/gres.conf.5 b/doc/man/man5/gres.conf.5
index 8f21a1eb8c1ed0bb3d56c5b6db85a0bd62f3e011..f02a3863f2c964698fc7eb6a32d9caaeb548ee59 100644
--- a/doc/man/man5/gres.conf.5
+++ b/doc/man/man5/gres.conf.5
@@ -1,4 +1,4 @@
-.TH "gres.conf" "5" "Slurm Configuration File" "March 2019" "Slurm Configuration File"
+.TH "gres.conf" "5" "Slurm Configuration File" "April 2019" "Slurm Configuration File"
 
 .SH "NAME"
 gres.conf \- Slurm configuration file for Generic RESource (GRES) management.
 
@@ -11,8 +11,7 @@ resources, then a gres.conf file should be included on each compute node.
 The file location can be modified at system build time using the
 DEFAULT_SLURM_CONF parameter or at execution time by setting the SLURM_CONF
 environment variable. The file will always be located in the
-same directory as the \fBslurm.conf\fP file. If generic resource counts are
-set by the GRES plugin function node_config_load(), this file may be optional.
+same directory as the \fBslurm.conf\fP file.
 
 .LP
 If the GRES information in the slurm.conf file fully describes those resources
@@ -32,7 +31,7 @@ Slurm daemons, daemon receipt of the SIGHUP signal, or execution of the command
 "scontrol reconfigure" unless otherwise noted.
 
 .LP
-CUDA Multi-Process Service (MPS) provides a mechanism where GPUs can be
+CUDA Multi\-Process Service (MPS) provides a mechanism where GPUs can be
 shared by multiple jobs, where each job is allocated some percentage of the
 GPU's resources.
 GPUs to be made available for MPS must be identified in the \fBslurm.conf\fP
@@ -48,17 +47,23 @@ not be available as a gres/mps.
 Likewise, once a GPU has been allocated as a gres/mps resource it will not
 be available as a gres/gpu.
 
+
 .LP
 \fBNOTE:\fP Slurm support for gres/mps requires the use of the select/cons_tres
 plugin.
 
+.LP
+For more information on GRES scheduling, see
+\fIhttps://slurm.schedmd.com/gres.html\fR.
+
 .LP
 The overall configuration parameters available include:
 
 .TP
 \fBAutoDetect\fR
 Comma separated list of the types of GRES to auto detect. Valid options are
-'nvml'. This is needed to use any outside system to configure GRES.
+.\" Escape `'` at the beginning of a line
+\'nvml'. This is needed to use any outside system to configure GRES.
 
 .TP
 \fBCount\fR
@@ -66,6 +71,7 @@ Number of resources of this type available on this node.
 The default value is set to the number of \fBFile\fR values specified (if any),
 otherwise the default value is one.
 A suffix of "K", "M", "G", "T" or "P" may be used to multiply the number by
 1024, 1048576, 1073741824, etc. respectively.
+For example: "Count=10G".
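
As a concrete sketch of the Count rules above (node and device paths are
hypothetical), a gres.conf fragment might read:

    # Count given explicitly, with a suffix multiplier (4G = 4 x 1073741824)
    Name=bandwidth Type=lustre Count=4G
    # Count left unset: it defaults to the number of File values, here 2
    Name=gpu File=/dev/nvidia[0-1]
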
 
 .TP
 \fBCores\fR
@@ -77,21 +83,23 @@ While Slurm can track and assign resources at the CPU or thread level, its
 scheduling algorithms used to co\-allocate GRES devices with CPUs operates at
 a socket or NUMA level.
 Therefore it is not possible to preferentially assign GRES with different
-specific CPUs on the sane NUMA or socket and this option should be used to
-identify all cores on some socket
+specific CPUs on the same NUMA node or socket and this option should be used to
+identify all cores on some socket.
+
 Multiple cores may be specified using a comma
 delimited list or a range may be specified using a "\-" separator
 (e.g. "0,1,2,3" or "0\-3").
-If the \fBCores\fR configuration option is specified and a job is submitted
-with the \fB\-\-gres-flags=enforce\-binding\fR option then only the identified
-cores can be allocated with each generic resource; which will tend to improve
-performance of jobs, but slow the allocation of resources to them.
+If a job specifies \fB\-\-gres\-flags=enforce\-binding\fR, then only the
+identified cores can be allocated with each generic resource. This will tend to
+improve performance of jobs, but delay the allocation of resources to them.
 If specified and a job is \fInot\fR submitted with the
-\fB\-\-gres-flags=enforce\-binding\fR option the identified cores will be
-preferred for scheduled with each generic resource.
+\fB\-\-gres\-flags=enforce\-binding\fR option the identified cores will be
+preferred for scheduling with each generic resource.
 
-If \fB\-\-gres-flags=disable\-binding\fR is specified, then any core can be
+If \fB\-\-gres\-flags=disable\-binding\fR is specified, then any core can be
 used with the resources, which also increases the speed of Slurm's
 scheduling algorithm but can degrade the application performance.
-The \fB\-\-gres-flags=disable\-binding\fR option is currently required to use
+The \fB\-\-gres\-flags=disable\-binding\fR option is currently required to use
 more CPUs than are bound to a GRES (i.e. if a GPU is bound to the CPUs on one
 socket, but resources on more than one socket are required to run the job).
 If any core can be effectively used with the resources, then do not specify the
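
To make the Cores guidance concrete, a sketch for a hypothetical two-socket
node (core numbers use the abstract index described above):

    # Each GPU is usable from the cores of the socket it is attached to;
    # list only the first thread of each core (hypothetical layout)
    Name=gpu File=/dev/nvidia0 Cores=0-7
    Name=gpu File=/dev/nvidia1 Cores=8-15

A job wanting strict binding would then be submitted with, for example,
"srun --gres=gpu:1 --gres-flags=enforce-binding ./my_app".
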
@@ -119,26 +127,26 @@ in "lstopo \-l" command output.
 
 .TP
 \fBFile\fR
 Fully qualified pathname of the device files associated with a resource.
-The file name parsing logic includes support for simple regular expressions as
-shown in the example.
+The name can include a numeric range suffix to be interpreted by Slurm
+(e.g. \fIFile=/dev/nvidia[0\-3]\fR).
+
 This field is generally required if enforcement of generic resource
-allocations is to be supported (i.e. prevents a users from making
+allocations is to be supported (i.e. prevents users from making
 use of resources allocated to a different user).
+Enforcement of the file allocation relies upon Linux Control Groups (cgroups)
+and Slurm's task/cgroup plugin, which will place the allocated files into
+the job's cgroup and prevent use of other files.
+Please see Slurm's Cgroups Guide for more
+information: \fIhttps://slurm.schedmd.com/cgroups.html\fR.
+
 If \fBFile\fR is specified then \fBCount\fR must be either set to the number
 of file names specified or not set (the default value is the number of files
 specified).
-Slurm must track the utilization of each individual device If device file
-names are specified, which involves more overhead than just tracking the
-device counts.
-Use the \fBFile\fR parameter only if the \fBCount\fR is not sufficient for
-tracking purposes.
-
-Except in the case of MPS support, if \fBFile\fR is specified then \fBCount\fR
-must be either set to the number of file names specified or not set (the
-default value is the number of files specified).
-In the case of MPS support, each GPU would be identifed by name using the
-\fBFile\fR parameter and \fBCount\fR would specify the number of MPS entries
-that would correspond to that GPU; typically 100 or some multiple of 100.
+The exception to this is MPS. For MPS, each GPU would be identified by device
+file using the \fBFile\fR parameter and \fBCount\fR would specify the number of
+MPS entries that would correspond to that GPU (typically 100 or some multiple of
+100).
 
 NOTE: If you specify the \fBFile\fR parameter for a resource on some node,
 the option must be specified on all nodes and Slurm will track the assignment
 of each specific resource on each node. Otherwise Slurm will only track a
@@ -163,8 +171,8 @@ If specified, then this line can only contain a single GRES device (i.e. can
 only contain a single file via \fBFile\fR).
 
-This is an optional value and is typically automatically determined
-autodetecting the NVIDIA NVML library.
+This is an optional value and is usually automatically determined if
+\fBAutoDetect\fR is enabled.
 A typical use case would be to identify GPUs having NVLink connectivity.
 Note that for GPUs, the minor number assigned by the OS and used in the device
 file (i.e. the X in \fI/dev/nvidiaX\fR) is not necessarily the same as the
@@ -174,14 +182,18 @@ ID and then numbering them starting from the smallest bus ID.
 
 .TP
 \fBName\fR
 Name of the generic resource. Any desired name may be used.
+The name must match a value in \fBGresTypes\fR in \fIslurm.conf\fR.
 Each generic resource has an optional plugin which can provide
-resource\-specific options.
+resource\-specific functionality.
 Generic resources that currently include an optional plugin are:
 .RS
 .TP
 \fBgpu\fR
 Graphics Processing Unit
 .TP
+\fBmps\fR
+CUDA Multi\-Process Service (MPS)
+.TP
 \fBnic\fR
 Network Interface Card
 .TP
@@ -199,8 +211,9 @@ the example below.
 
 .TP
 \fBType\fR
-An arbitrary string identifying the type of device.
-For example, a particular model of GPU.
+An optional arbitrary string identifying the type of device.
+For example, this might be used to identify a specific model of GPU, which users
+can then specify in a job request.
 If \fBType\fR is specified, then \fBCount\fR is limited in size (currently 1024).
 
 .SH "EXAMPLES"
@@ -272,7 +285,7 @@ NodeName=tux[4\-15] Name=gpu File=/dev/nvidia[0\-3]
 .br
 # Slurm's Generic Resource (GRES) configuration file
 .br
-# Use NVML to gather GPU configuration infomation
+# Use NVML to gather GPU configuration information
 .br
 # Information about all other GRES gathered from slurm.conf
 .br
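
A sketch of the MPS exception described under File above (device paths
hypothetical): each MPS line names a single GPU via File and Count gives that
GPU's pool of MPS entries:

    Name=gpu File=/dev/nvidia[0-1]
    # 100 MPS entries backed by each GPU (illustrative counts)
    Name=mps Count=100 File=/dev/nvidia0
    Name=mps Count=100 File=/dev/nvidia1
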
diff --git a/doc/man/man5/slurm.conf.5 b/doc/man/man5/slurm.conf.5
index 159f5e0ae017405e02acae6f85734ec8e33348b7..146e93b6c506bab1d70c81191aa70e6a93e242e8 100644
--- a/doc/man/man5/slurm.conf.5
+++ b/doc/man/man5/slurm.conf.5
@@ -1,4 +1,4 @@
-.TH "slurm.conf" "5" "Slurm Configuration File" "March 2019" "Slurm Configuration File"
+.TH "slurm.conf" "5" "Slurm Configuration File" "April 2019" "Slurm Configuration File"
 
 .SH "NAME"
 slurm.conf \- Slurm configuration file
 
@@ -884,8 +884,9 @@ The default value is 2 seconds.
 
 .TP
 \fBGresTypes\fR
-A comma delimited list of generic resources to be managed.
-These generic resources may have an associated plugin available to provide
+A comma delimited list of generic resources to be managed (e.g.
+\fIGresTypes=gpu,mps\fR).
+These resources may have an associated GRES plugin of the same name providing
 additional functionality.
 No generic resources are managed by default.
 Ensure this parameter is consistent across all nodes in the cluster for