From 1bd73cafaedba4d685809114dbfa128d0507c96b Mon Sep 17 00:00:00 2001 From: Tim Wickberg <tim@schedmd.com> Date: Wed, 28 Sep 2016 09:43:36 -0400 Subject: [PATCH] Mention removal of BG/L+P support, start pruning documentation. --- doc/html/bluegene.shtml | 90 +++++------------------------------- doc/html/documentation.shtml | 2 +- doc/html/mpi_guide.shtml | 18 ++------ doc/html/select_design.shtml | 9 ++-- doc/html/selectplugins.shtml | 4 +- 5 files changed, 22 insertions(+), 101 deletions(-) diff --git a/doc/html/bluegene.shtml b/doc/html/bluegene.shtml index a1330daf49a..2fe69273444 100644 --- a/doc/html/bluegene.shtml +++ b/doc/html/bluegene.shtml @@ -1,11 +1,15 @@ <!--#include virtual="header.txt"--> -<h1>BlueGene User and Administrator Guide</h1> +<h1>BlueGene/Q User and Administrator Guide</h1> + +<p><b>Beginning with the 17.02 release only BlueGene/Q systems are supported +by Slurm. Support for the BlueGene/L and BlueGene/P systems has been removed. +</b></p> <h2>Overview</h2> <p>This document describes the unique features of Slurm on the -<a href="http://www.research.ibm.com/bluegene/">IBM BlueGene</a> systems. +<a href="http://www.research.ibm.com/bluegene/">IBM BlueGene/Q</a> systems. You should be familiar with Slurm's mode of operation on Linux clusters before studying the relatively few differences in BlueGene operation described in this document.</p> @@ -13,8 +17,7 @@ described in this document.</p> <p>BlueGene systems have several unique features making for a few differences in how Slurm operates there. BlueGene systems consists of one or more <i>base partitions</i> or -<i>midplanes</i> connected in a three-dimensional - XYZ - (BlueGene/L -and BlueGene/P systems) or four-dimensional - AXYZ - (BlueGene/Q) torus. +<i>midplanes</i> connected in a four-dimensional - AXYZ - torus. 
Each <i>midplane</i> typically includes 512 <i>c-nodes</i> or compute nodes each containing two or more cores; one core is typically designed primarily for managing communications while the @@ -28,7 +31,7 @@ for every <i>midplane</i> on the system. </p> <p>Internally Slurm treats each <i>midplane</i> as one node with a processor count equal to the number of cores on the midplane, which -keeps the number of entities being managed b. Slurm more +keeps the number of entities being managed by Slurm more reasonable.</p> <p>All BlueGene systems can sub-allocate a <i>midplane</i> @@ -73,7 +76,7 @@ have, for instance, the 4th nodeboard populated instead of the 1st. services except support for job steps, which is detailed later. The sstat command is not supported on BlueGene systems.</p> -<p>Seven job submission options are available exclusively on BlueGene systems:</p> +<p>Four job submission options are available exclusively on BlueGene/Q systems:</p> <table> <tr VALIGN=TOP><td><i>--geometry</i></td><td>Specify job size in each dimension, (i.e. 1x4x4 = 16 nodes)</td></tr> @@ -84,18 +87,8 @@ The sstat command is not supported on BlueGene systems.</p> specify a different conn-type for each dimension, TTMT would give you Torus in all dimensions except the Y dimension, where it would be Mesh.</td></tr> -<tr VALIGN=TOP><td><i>--blrts-image</i></td><td>(BlueGene/L systems only) - Specify alternative blrts image for bluegene block. Default if not set.</td></tr> -<tr VALIGN=TOP><td><i>--cnload-image</i></td><td>(BlueGene/P systems only) Specify - alternative c-node image for bluegene block. Default if not set.</td></tr> -<tr VALIGN=TOP><td><i>--ioload-image</i></td><td>(BlueGene/P systems only) Specify - alternative io image for bluegene block. Default if not set.</td></tr> -<tr VALIGN=TOP><td><i>--linux-image</i></td><td>(BlueGene/L systems only) - Specify alternative linux image for bluegene block. 
Default if not set.</td></tr> <tr VALIGN=TOP><td><i>--mloader-image</i></td><td>Specify alternative mloader image for bluegene block. Default if not set.</td></tr> -<tr VALIGN=TOP><td><i>--ramdisk-image</i></td><td>(BlueGene/L or P systems only) - Specify alternative ramdisk image for bluegene block. Default if not set.</td></tr> </table> <p>The <i>--nodes</i> option with a minimum and (optionally) maximum node count @@ -118,29 +111,6 @@ leading dashes are required when listing the <i>runjob</i> options, e.g., <i>srun --launcher-opts='--mapping TEDCBA'</i>. See the <i>runjob</i> man page for the list of available options.</p> -<h3>Task Launch on BlueGene/L and BlueGene/P only</h3> - -<p>Slurm performs resource allocation for the job, but initiation of tasks is -performed using the <i>mpirun</i> command. Slurm has no concept of a job step -on BlueGene/L or BlueGene/P systems. -To reiterate: <u><i>salloc</i> or <i>sbatch</i> are used to create a job allocation, but -<i>mpirun</i> is used to launch the parallel tasks.</u> -The script that you submit to Slurm can contain multiple invocations of mpirun -as well as any desired commands for pre- and post-processing. -The mpirun command will get its <i>block</i> information from the -<i>MPIRUN_PARTITION</i> environment variable as set by Slurm. A sample script -is shown below.</p> -<pre> -#!/bin/bash -# pre-processing -date -# processing -mpirun -exec /home/user/prog -cwd /home/user -args 123 -mpirun -exec /home/user/prog -cwd /home/user -args 124 -# post-processing -date -</pre> - <h3><a name="naming">Naming Conventions</a></h3> <p>The naming of midplanes includes a numeric suffix representing the its coordinates with a zero origin. The suffix contains three digits on BlueGene/L @@ -368,18 +338,6 @@ Slurm will support this in the future when the underlying system allows it. 
<p class="footer"><a href="#top">top</a></p> -<h2>System Administration for all BlueGene/L Systems</h2> - -<p>Building a BlueGene compatible system is dependent upon the -<i>configure</i> program locating some expected files. -In particular for a BlueGene/L system, the configure script searches -for <i>libdb2.so</i> in the -directories <i>/bgl/BlueLight/ppcfloor/bglsys</i>, <i>/opt/IBM/db2/V8.1</i> -<i>/home/bgdb2cli/sqllib</i> and <i>/u/bgdb2cli/sqllib</i>. If your -DB2 library file is in a different location, use the configure -option <i>--with-db2-dir=PATH</i> to specify the parent directory. -This option does not apply to any other BlueGene arch.</p> - <h2>System Administration for all BlueGene Systems</h2> <p>The <i>slurmctld</i> daemon should execute on the system's service node. @@ -747,19 +705,6 @@ are case insensitive.</p> systems is <i>MloaderImage</i>. Alternate images may be specified as described above for all BlueGene system types.</p> -<p>On BlueGene/L and BlueGene/P systems DB2 database access is required by -the <i>slurmctld</i> daemon. All other Slurm daemons and commands -interact with DB2 using remote procedure calls, which are processed -by <i>slurmctld</i>. -DB2 access is dependent upon the environment variable -<i>BRIDGE_CONFIG_FILE</i>. -Make sure this is set appropriate before initiating the -<i>slurmctld</i> daemon. -If desired, this environment variable and any other logic -can be executed through the script <i>/etc/sysconfig/slurm</i>, -which is automatically executed by <i>/etc/init.d/slurm</i> -prior to initiating the Slurm daemons.</p> - <p>When <i>slurmctld</i> is initially started on an idle system, the blocks already defined in MMCS are read using the Bridge APIs. If these blocks do not correspond to those defined in the <i>bluegene.conf</i> @@ -767,9 +712,6 @@ file, the old blocks with a prefix of "RMP" are destroyed and new ones created. When a job is scheduled, the appropriate block is identified, its user set, and it is booted. 
-On BlueGene/L and BlueGene/P systems Node use (virtual or coprocessor) -is set from the mpirun command line. -Slurm has nothing to do with setting the node use. Subsequent jobs use this same block without rebooting by changing the associated user field. The only time blocks should be freed and rebooted, in normal operation, @@ -833,17 +775,9 @@ daemon serving as a front-end to those midplanes is not responding (on non-BlueGene systems, the <i>slurmd</i> actually does run on the compute nodes, so the message is more meaningful there). </p> -<p>Note that you can emulate a BlueGene/L system on stand-alone Linux +<p>Note that you can emulate a BlueGene/Q system on a stand-alone Linux system. -Run <i>configure</i> with the <i>--enable-bgl-emulation</i> option. -This will define "HAVE_BG", "HAVE_BGL", "HAVE_BG_L_P", and -"HAVE_FRONT_END" in the config.h file. -You can also emulate a BlueGene/P system with -the <i>--enable-bgp-emulation</i> option. -This will define "HAVE_BG", "HAVE_BGP", "HAVE_BG_L_P", and -"HAVE_FRONT_END" in the config.h file. -You can also emulate a BlueGene/Q system using -the <i>--enable-bgq-emulation</i> option. +Run <i>configure</i> with the <i>--enable-bgq-emulation</i> option. This will define "HAVE_BG", "HAVE_BGQ", and "HAVE_FRONT_END" in the config.h file. Then execute <i>make</i> normally. @@ -855,6 +789,6 @@ scheduling logic, etc. </p> <p class="footer"><a href="#top">top</a></p> -<p style="text-align:center;">Last modified 31 March 2016</p> +<p style="text-align:center;">Last modified 28 September 2016</p> <!--#include virtual="footer.txt"--> diff --git a/doc/html/documentation.shtml b/doc/html/documentation.shtml index 65fee61776a..f9c76fec5bd 100644 --- a/doc/html/documentation.shtml +++ b/doc/html/documentation.shtml @@ -27,7 +27,7 @@ may be found in the <a href="http://slurm.schedmd.com/archive/">archive</a>.
<li><a href="job_exit_code.html">Job Exit Codes</a></li> <li>Specific Systems</li> <ul> -<li><a href="bluegene.html">Blue Gene User and Administrator Guide</a></li> +<li><a href="bluegene.html">BlueGene/Q User and Administrator Guide</a></li> <li><a href="cray.html">Cray User and Administrator Guide with Native Slurm</a></li> <li><a href="cray_alps.html">Cray User and Administrator Guide with ALPS</a></li> <li><a href="ibm-pe.html">IBM Parallel Environment User and Administrator Guide</a></li> diff --git a/doc/html/mpi_guide.shtml b/doc/html/mpi_guide.shtml index a58696b7768..edd47fbb28e 100644 --- a/doc/html/mpi_guide.shtml +++ b/doc/html/mpi_guide.shtml @@ -410,8 +410,8 @@ $ srun -n16 --mpi=pmi2 a.out <h2><a name="bluegene_mpi" href="http://www.research.ibm.com/bluegene/"><b>BlueGene MPI</b></a></h2> -<p>All IBM BlueGene Systems rely upon Slurm to create a job's resource -allocation, but the task launch mechanism differs by system type.</p> +<p>IBM BlueGene/Q Systems rely upon Slurm to create a job's resource +allocation.</p> <h3>BlueGene/Q</h3> <p>The BlueGene/Q systems support the ability to allocate different portions of @@ -424,19 +424,7 @@ The srun command creates a job step allocation which is linked to IBM's <span class="commandline">runjob</span> libraries which will launch the tasks within the allocated resources.</p> -<h3>BlueGene/L and BlueGene/P</h3> -<p>BlueGene/L and P MPI relies upon the native -<span class="commandline">mpirun</span> command to launch tasks. -Build a job script containing one or more invocations of the -<span class="commandline">mpirun</span> command. Then submit -the script to Slurm using <span class="commandline">sbatch</span>. -For example:</p> -<pre> -$ sbatch -N512 my.script -</pre> -<p>Note that the node count specified with the <i>-N</i> option indicates -the base partition count. 
-See <a href="bluegene.html">BlueGene User and Administrator Guide</a> +<p>See <a href="bluegene.html">BlueGene/Q User and Administrator Guide</a> for more information.</p> <hr size=4 width="100%"> diff --git a/doc/html/select_design.shtml b/doc/html/select_design.shtml index 8325a4496d7..85e57be4104 100644 --- a/doc/html/select_design.shtml +++ b/doc/html/select_design.shtml @@ -10,7 +10,7 @@ The select plugin is aware of the systems topology, based upon data structures established by the topology plugn. It can also over-subscribe resources to support gang scheduling (time slicing of parallel jobs), if so configured. The select plugin is also capable of communicating with an external entity -to perform these actions (the select/bluegene plugin used on an IBM BlueGene +to perform these actions (the select/bluegene plugin used on an IBM BlueGene/Q and the select/cray plugin used with Cray ALPS/BASIL software are two examples). Other architectures would rely upon either the select/linear or select/cons_res plugin. The select/linear plugin allocates whole nodes to jobs @@ -66,7 +66,7 @@ jobs, expanding/shrinking job allocations, un/packing job state information, un/packing node state information, etc. The operation of those functions is relatively straightforward and not detailed here.</p> -<h2>Operation on IBM BlueGene Systems</h2> +<h2>Operation on IBM BlueGene/Q Systems</h2> <p>On IBM BlueGene systems, Slurm's <i>slurmd</i> daemon executes on the front-end nodes rather than the compute nodes and IBM provides a Bridge API @@ -74,9 +74,8 @@ to manage compute nodes and jobs. The IBM BlueGene systems also have very specific topology rules for what resources can be allocated to a job. Slurm's interface to IBM's Bridge API and the topology rules are found within the select/bluegene plugin and very little BlueGene-specific logic in Slurm is -found outside of that plugin. 
Note that the select/bluegene plugin is used for -BlueGene/L, BlueGene/P and BlueGene/Q systems with select portions of the -code conditionally compiled depending upon the system type.</p> +found outside of that plugin. Note that the select/bluegene plugin is required +for BlueGene/Q systems.</p> <h2>Operation on Cray Systems</h2> diff --git a/doc/html/selectplugins.shtml b/doc/html/selectplugins.shtml index 5381727b14a..029a112843e 100644 --- a/doc/html/selectplugins.shtml +++ b/doc/html/selectplugins.shtml @@ -16,10 +16,10 @@ specifications:</p> The major type must be "select." The minor type can be any recognizable abbreviation for the type of node selection algorithm. We recommend, for example:</p> <ul> -<li><b>bluegene</b>—<a href="http://www.research.ibm.com/bluegene/">IBM Blue Gene</a> +<li><b>bluegene</b>—<a href="http://www.research.ibm.com/bluegene/">IBM BlueGene/Q</a> node selector. Note that this plugin not only selects the nodes for a job, but performs some initialization and termination functions for the job. Use this plugin for -BlueGene/L, BlueGene/P and BlueGene/Q systems.</li> +BlueGene/Q systems.</li> <li><b>cons_res</b>—A plugin that can allocate individual processors, memory, etc. within nodes. This plugin is recommended for systems with many non-parallel programs sharing nodes. For more information see -- GitLab