Commit 4a59ecea authored by Morris Jette

Add some documentation about gres/mic support

parent 4ee9c786
...@@ -33,6 +33,7 @@ HIGHLIGHTS ...@@ -33,6 +33,7 @@ HIGHLIGHTS
- Added srun option "--cpu-freq" to enable user control over the job's CPU - Added srun option "--cpu-freq" to enable user control over the job's CPU
frequency and thus its power consumption. frequency and thus its power consumption.
- Added priority/multifactor2 plugin supporting ticket based shares. - Added priority/multifactor2 plugin supporting ticket based shares.
- Added gres/mic plugin supporting Intel Many Integrated Core (MIC) processors.
CONFIGURATION FILE CHANGES (see "man slurm.conf" for details) CONFIGURATION FILE CHANGES (see "man slurm.conf" for details)
============================================================= =============================================================
......
...@@ -2,10 +2,9 @@ ...@@ -2,10 +2,9 @@
<h1>Generic Resource (GRES) Scheduling</h1> <h1>Generic Resource (GRES) Scheduling</h1>
<P>Beginning in SLURM version 2.2 generic resource (Gres) scheduling is <P>Generic resource (GRES) scheduling is supported through a flexible plugin
supported through a flexible plugin mechanism. Support is initially provided mechanism. Support is currently provided for Graphics Processing Units (GPUs)
for Graphics Processing Units (GPUs), although support for any resources is and Intel&reg; Many Integrated Core (MIC) processors.</P>
possible.</P>
<!--------------------------------------------------------------------------> <!-------------------------------------------------------------------------->
<h2>Configuration</h2> <h2>Configuration</h2>
...@@ -17,10 +16,10 @@ interest are:</P> ...@@ -17,10 +16,10 @@ interest are:</P>
<UL> <UL>
<LI><B>GresTypes</B> a comma delimited list of generic resources to be <LI><B>GresTypes</B> a comma delimited list of generic resources to be
managed (e.g. <I>GresTypes=gpu,nic</I>). This name may be that of an managed (e.g. <I>GresTypes=gpu,mic</I>). This name may be that of an
optional plugin providing additional control over the resources.</LI> optional plugin providing additional control over the resources.</LI>
<LI><B>Gres</B> the specific generic resource and their count associated with <LI><B>Gres</B> the specific generic resource and their count associated with
each node (e.g. <I>NodeName=linux[0-999] Gres=gpu:8,nic:2</I>).</LI> each node (e.g. <I>NodeName=linux[0-999] Gres=gpu:1,mic:2</I>).</LI>
</UL> </UL>
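To make the name:count form of the Gres value concrete, here is a minimal Python sketch that splits a string such as gpu:1,mic:2 into a mapping. The function name parse_gres is hypothetical and this is not SLURM's own parser. It assumes every entry carries an explicit count, as in the examples above.

```python
def parse_gres(spec):
    """Split a Gres-style string such as "gpu:1,mic:2" into {name: count}.

    Illustrative only (not SLURM code): assumes every entry has the
    form name:count with a non-negative integer count.
    """
    resources = {}
    for entry in spec.split(","):
        # partition() returns (name, ":", count); count is "" if ":" is absent
        name, _, count = entry.strip().partition(":")
        if not name or not count.isdigit():
            raise ValueError(f"expected name:count, got {entry!r}")
        resources[name] = int(count)
    return resources


if __name__ == "__main__":
    print(parse_gres("gpu:1,mic:2"))
```

A mapping like this makes it easy to check, for instance, that every resource name used on a node line also appears in GresTypes.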
<P>Note that the Gres specification for each node works in the same fashion <P>Note that the Gres specification for each node works in the same fashion
...@@ -53,7 +52,7 @@ Multiple CPUs may be specified using a comma delimited list or a range may be ...@@ -53,7 +52,7 @@ Multiple CPUs may be specified using a comma delimited list or a range may be
specified using a "-" separator (e.g. "0,1,2,3" or "0-3"). specified using a "-" separator (e.g. "0,1,2,3" or "0-3").
If not specified, then any CPU can be used with the resources. If not specified, then any CPU can be used with the resources.
If any CPU can be used with the resources, then do not specify the If any CPU can be used with the resources, then do not specify the
<B>CPUs</B> option for improved speed in the SLURM scheduling logic. CPUs option for improved speed in the SLURM scheduling logic.
<LI><B>File</B> Fully qualified pathname of the device files associated with a <LI><B>File</B> Fully qualified pathname of the device files associated with a
resource. resource.
...@@ -62,10 +61,10 @@ The name can include a numberic range suffix to be interpretted by SLURM ...@@ -62,10 +61,10 @@ The name can include a numberic range suffix to be interpretted by SLURM
This field is generally required if enforcement of generic resource This field is generally required if enforcement of generic resource
allocations is to be supported (i.e. prevents users from making allocations is to be supported (i.e. prevents users from making
use of resources allocated to a different user). use of resources allocated to a different user).
If <B>File</B> is specified then <B>Count</B> must be either set to the number If File is specified then Count must be either set to the number
of file names specified or not set (the default value is the number of files of file names specified or not set (the default value is the number of files
specified). specified).
NOTE: If you specify the <B>File</B> parameter for a resource on some node, NOTE: If you specify the File parameter for a resource on some node,
the option must be specified on all nodes and SLURM will track the assignment the option must be specified on all nodes and SLURM will track the assignment
of each specific resource on each node. Otherwise SLURM will only track a of each specific resource on each node. Otherwise SLURM will only track a
count of allocated resources rather than the state of each individual device count of allocated resources rather than the state of each individual device
...@@ -150,7 +149,28 @@ JobStep=1234.2 CUDA_VISIBLE_DEVICES=3 ...@@ -150,7 +149,28 @@ JobStep=1234.2 CUDA_VISIBLE_DEVICES=3
<P>NOTE: Be sure to specify the <I>File</I> parameters in the <I>gres.conf</I> <P>NOTE: Be sure to specify the <I>File</I> parameters in the <I>gres.conf</I>
file and ensure they are in increasing numeric order.</P> file and ensure they are in increasing numeric order.</P>
<!--------------------------------------------------------------------------> <!-------------------------------------------------------------------------->
<h2>MIC Management</h2>
<P>SLURM can be used to provide resource management for systems with the
Intel&reg; Many Integrated Core (MIC) processor.
SLURM sets an OFFLOAD_DEVICES environment variable, which controls the
selection of MICs available to a job step.
The OFFLOAD_DEVICES environment variable is used by both Intel
LEO (Language Extensions for Offload) and the MKL (Math Kernel Library)
automatic offload.
(This is very similar to how the CUDA_VISIBLE_DEVICES environment variable is
used to control which GPUs can be used by CUDA&trade; software.)
If no MICs are reserved via GRES, the OFFLOAD_DEVICES variable is set to
-1. This causes the code to ignore the offload directives and run MKL
routines on the CPU, so the job still runs, but only on the CPU. It
also produces a somewhat cryptic warning:</P>
<pre>offload warning: OFFLOAD_DEVICES device number -1 does not correspond
to a physical device</pre>
<P>Offloading is automatically scaled across all allocated devices (e.g. if
--gres=mic:2 is specified, all offloads use two MICs) unless a device is
explicitly defined in the offload pragmas.</P>
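As a rough illustration of the behaviour described above, the following Python sketch shows how a job script might inspect OFFLOAD_DEVICES before deciding whether to expect offload. The helper name parse_offload_devices is hypothetical, and the comma-separated list format for multiple devices is an assumption made by analogy with CUDA_VISIBLE_DEVICES. Only the -1 value for "no MICs reserved" comes from the text.

```python
import os


def parse_offload_devices(value):
    """Return the MIC device indices named in an OFFLOAD_DEVICES-style string.

    Assumption (not confirmed by the documentation): multiple devices are
    given as a comma-separated list of integers. A value of -1, or an
    unset/empty variable, is treated as "no MICs available".
    """
    if value is None or not value.strip():
        return []
    indices = [int(token) for token in value.split(",") if token.strip()]
    if -1 in indices:
        return []
    return indices


if __name__ == "__main__":
    devices = parse_offload_devices(os.environ.get("OFFLOAD_DEVICES"))
    if devices:
        print("MICs available to this job step:", devices)
    else:
        print("No MICs reserved; offload code will run on the CPU")
```

Checking the variable up front lets a script report the CPU-only fallback in plain terms rather than relying on the offload warning.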
<!-------------------------------------------------------------------------->
<p style="text-align: center;">Last modified 2 July 2012</p> <p style="text-align: center;">Last modified 25 October 2012</p>
</body></html> </body></html>
...@@ -15,6 +15,7 @@ ...@@ -15,6 +15,7 @@
<p>SLURM Version 2.5 was released in November 2012. <p>SLURM Version 2.5 was released in November 2012.
Major enhancements include: Major enhancements include:
<ul> <ul>
<li>Support for Intel&reg; Many Integrated Core (MIC) processors.</li>
<li>User control over CPU frequency of each job step.</li> <li>User control over CPU frequency of each job step.</li>
<li>Recording power usage information for each job.</li> <li>Recording power usage information for each job.</li>
<li>Advanced reservation of cores rather than whole nodes.</li> <li>Advanced reservation of cores rather than whole nodes.</li>
...@@ -84,6 +85,6 @@ trojan library, then that library will be used by the SLURM daemon with ...@@ -84,6 +85,6 @@ trojan library, then that library will be used by the SLURM daemon with
unpredictable results. This was fixed in SLURM version 2.1.14.</li> unpredictable results. This was fixed in SLURM version 2.1.14.</li>
</ul> </ul>
<p style="text-align:center;">Last modified 2 October 2012</p> <p style="text-align:center;">Last modified 5 October 2012</p>
<!--#include virtual="footer.txt"--> <!--#include virtual="footer.txt"-->
...@@ -73,6 +73,9 @@ Graphics Processing Unit ...@@ -73,6 +73,9 @@ Graphics Processing Unit
.TP .TP
\fBnic\fR \fBnic\fR
Network Interface Card Network Interface Card
.TP
\fBmic\fR
Intel Many Integrated Core (MIC) processor
.RE .RE
......