From 6f5d8d42730a6a841616a292b22d6237029072c0 Mon Sep 17 00:00:00 2001 From: Moe Jette <jette1@llnl.gov> Date: Tue, 7 Sep 2004 16:50:17 +0000 Subject: [PATCH] Add Blue Gene Users Guide document. --- NEWS | 1 + doc/Makefile.am | 1 + doc/html/bluegene.html | 183 ++++++++++++++++++++++++++++++++++++ doc/html/documentation.html | 9 +- 4 files changed, 191 insertions(+), 3 deletions(-) create mode 100644 doc/html/bluegene.html diff --git a/NEWS b/NEWS index 2a934280fd5..6106d9f7cf7 100644 --- a/NEWS +++ b/NEWS @@ -7,6 +7,7 @@ documents those changes that are of interest to users and admins. Blue Gene/L -- Create new allocation as needed for debugger in case old allocation has been purged + -- Add Blue Gene User Guide to html documents * Changes in SLURM 0.4.0-pre2 ============================= diff --git a/doc/Makefile.am b/doc/Makefile.am index ae41fbbfd5c..fd8219f88fe 100644 --- a/doc/Makefile.am +++ b/doc/Makefile.am @@ -7,6 +7,7 @@ htmldir = ${prefix}/share/doc/@PACKAGE@-@VERSION@/html html_DATA = \ html/arch.gif \ html/authplugins.html \ + html/bluegene.html \ html/coding_style.pdf \ html/documentation.html \ html/download.html \ diff --git a/doc/html/bluegene.html b/doc/html/bluegene.html new file mode 100644 index 00000000000..493122efbb8 --- /dev/null +++ b/doc/html/bluegene.html @@ -0,0 +1,183 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" + "http://www.w3.org/TR/REC-html40/loose.dtd"> + +<html> + +<head> +<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> +<meta http-equiv="Pragma" content="no-cache"> +<meta http-equiv="keywords" content="Simple Linux Utility for Resource Management, SLURM, resource management, +Linux clusters, high-performance computing, Livermore Computing"> +<meta name="LLNLRandR" content="UCRL-WEB-204324"> +<meta name="LLNLRandRdate" content="6 September 2004"> +<meta name="distribution" content="global"> +<meta name="description" content="Simple Linux Utility for Resource Management"> +<meta 
name="copyright" +content="This document is copyrighted U.S. +Department of Energy under Contract W-7405-Eng-48"> +<meta name="Author" content="Morris Jette"> +<meta name="email" content="jette1@llnl.gov"> +<meta name="Classification" +content="DOE:DOE Web sites via organizational +structure:Laboratories and Other Field Facilities"> +<title>Simple Linux Utility for Resource Management:Blue Gene User Guide</title> +<link href="slurmstyles.css" rel="stylesheet" type="text/css"> +</head> + +<body bgcolor="#000000" text="#000000" leftmargin="0" topmargin="0"> +<table width="770" border="0" cellspacing="0" cellpadding="0"> +<tr> +<td><img src="slurm_banner.jpg" width="770" height="145" usemap="#Map" border="0" alt="Simple Linux Utility for Resource Management"></td> +</tr> +</table> +<table width="770" border="0" cellspacing="0" cellpadding="3" bgcolor="#FFFFFF"> +<tr> +<td width="100%"> +<table width="760" border="0" cellspacing="0" cellpadding="4" align="right"> +<tr> +<td valign="top" bgcolor="#000000"><p><img src="spacer.gif" width="110" height="1" alt=""></p> +<p><a href="slurm.html" class="nav" align="center">Home</a></p> +<p><span class="whitetext">About</span><br> +<a href="overview.html" class="nav">Overview</a><br> +<a href="news.html" class="nav">What's New</a><br> +<a href="publications.html" class="nav">Publications</a><br> +<a href="team.html" class="nav">SLURM Team</a></p> +<p><span class="whitetext">Using</span><br> +<a href="documentation.html" class="nav">Documentation</a><br> +<a href="faq.html" class="nav">FAQ</a><br> +<a href="help.html" class="nav">Getting Help</a></p> +<p><span class="whitetext">Installing</span><br> +<a href="platforms.html" class="nav">Platforms</a><br> +<a href="download.html" class="nav">Download</a><br> +<a href="quickstart_admin.html" class="nav">Guide</a></p></td> +<td><img src="spacer.gif" width="10" height="1" alt=""></td> +<td valign="top"><h2>Blue Gene User Guide</h2> + +<h3>Overview</h3> + +<p>This document describes 
the unique features of SLURM on the +<a href="http://www.research.ibm.com/bluegene">IBM Blue Gene</a> systems. +You should be familiar with SLURM's mode of operation on Linux clusters +before studying the differences in operation described in this document. +Users familiar with SLURM will find that there are relatively few Blue Gene +specific differences.</p> + +<p>Blue Gene systems have several unique features making for a few +differences in how SLURM operates there. +The basic unit of resource allocation is a <i>base partition</i>. +The <i>base partitions</i> are connected in a three-dimensional torus. +Each <i>base partition</i> includes 512 <i>c-nodes</i>, each containing two processors: +one designed primarily for computations and the other primarily for managing communications. +SLURM considers each <i>base partition</i> as one node with 1024 processors. +The <i>c-nodes</i> can execute only one process and thus are unable to execute both +the user's jobs and SLURM's <i>slurmd</i> daemon. +Thus the <i>slurmd</i> daemon executes on one of the Blue Gene <i>Front End Nodes</i>. +This <i>slurmd</i> daemon provides (almost) all of the normal SLURM services +for every <i>base partition</i> on the system. </p> + +<h3>User Tools</h3> + +<p>The normal set of SLURM user tools: srun, scancel, sinfo, squeue and scontrol +provide all of the expected services except support for job steps. +SLURM performs resource allocation for the job, but initiation of job steps is performed +using the <i>mpirun</i> command and daemons provided with the Blue Gene system. +Three new srun options are available: --geometry (specify job size in each dimension), +--rotate (permit rotation of geometry), and --connect (specify connection type of +mesh or torus). See the srun man pages for details. </p> + +<a name="naming"> +<p>The naming of nodes includes a three-digit suffix representing the base partition's +location in the X, Y and Z dimensions with a zero origin.
+For example, "bgl012" represents the base partition whose location is at X=0, Y=1 and Z=2. +Since jobs must be allocated consecutive nodes in all three dimensions, we have developed +an abbreviated format for describing the nodes in one of these three-dimensional blocks. +The node's prefix is followed by the end-points of the block enclosed in square brackets. +For example, "bgl[620x731]" is used to represent the eight nodes enclosed in a block +with endpoints bgl620 and bgl731 (bgl620, bgl621, bgl630, bgl631, bgl720, bgl721, +bgl730 and bgl731).</p></a> + +<p>One new tool provided is <i>smap</i>. +Smap is aware of system topology and provides a map of what nodes are allocated +to jobs, partitions, etc. +A sample of smap output is provided below showing the location of five jobs. +Note the format of the list of nodes allocated to each job. +Also note that idle (unassigned) base partitions are indicated by a period.</p> + +<pre> + a a a a b b d d Key JobId User Nodes NodeList + a a a a b b d d a 12345 joseph 64 bgl[000x333] + a a a a b b c c b 12346 chris 16 bgl[420x533] +a a a a b b c c c 12350 danny 8 bgl[620x731] + d 12356 dan 16 bgl[603x733] + a a a a b b d d e 12378 joseph 4 bgl[610x711] + a a a a b b d d + a a a a b b c c +a a a a b b c c + + a a a a . . d d + a a a a . . d d + a a a a . . e e +a a a a . . e e + + a a a a . . d d + a a a a . . d d + a a a a . . . . +a a a a . . . . +</pre> + +<p class="footer"><a href="#top">top</a></p> + +<h3>System Administration</h3> + +<p>Building a Blue Gene compatible system is dependent upon the <i>configure</i> +program locating some expected files. You should see "#define HAVE_BGL 1" in +the "config.h" file before making SLURM.</p> + +<p>The slurmctld daemon should execute on the system's service node with +an optional backup daemon on one of the front end nodes. +One slurmd daemon should be configured to execute on one of the front end nodes.
+That one slurmd daemon represents the communications channel for every base partition. +You can use the scontrol command to drain individual nodes as desired and +return them to service. </p> + +<p>The slurm.conf (configuration) file needs to have the value of InactiveLimit +set to zero or not specified (it defaults to a value of zero). +This is because there are no job steps and we don't want to purge jobs prematurely. +The value of SelectType must be set to "select/bluegene" in order to have +node selection performed by a plugin aware of the system's topology +and interfaces. </p> + +<p>SLURM node and partition descriptions should make use of the +<a href="#naming">naming</a> conventions described above. For example, +"NodeName=bgl[000x733] NodeAddr=frontend0 Procs=1024". +Note that the NodeAddr value for all 128 base partitions is the name +of the front end node executing the slurmd daemon.</p> + +<p>While users are unable to initiate SLURM job steps on Blue Gene systems, +this restriction does not apply to user root or SlurmUser. +Be advised that the one slurmd supporting all nodes is unable to manage a +large number of job steps, so this ability should be used only to verify normal +SLURM operation. +If large numbers of job steps are initiated by slurmd, expect the daemon to +fail due to lack of memory.
</p> + +<p class="footer"><a href="#top">top</a></p></td> + +</tr> +<tr> +<td colspan="3"><hr> <p>For information about this page, contact <a href="mailto:slurm-dev@lists.llnl.gov">slurm-dev@lists.llnl.gov</a>.</p> +<p><a href="http://www.llnl.gov/"><img align=middle src="lll.gif" width="32" height="32" border="0"></a></p> +<p class="footer">UCRL-WEB-204324<br> +Last modified 6 September 2004</p></td> +</tr> +</table> +</td> + </tr> +</table> +<map name="Map"> +<area shape="rect" coords="616,4,762,97" href="../"> +<area shape="rect" coords="330,1,468,11" href="http://www.llnl.gov/disclaimer.html"> +<area shape="rect" coords="11,23,213,115" href="slurm.html"> +</map> +</body> +</html> diff --git a/doc/html/documentation.html b/doc/html/documentation.html index 35d055e88be..0b8f029c0be 100644 --- a/doc/html/documentation.html +++ b/doc/html/documentation.html @@ -9,7 +9,7 @@ <meta http-equiv="keywords" content="Simple Linux Utility for Resource Management, SLURM, resource management, Linux clusters, high-performance computing, Livermore Computing"> <meta name="LLNLRandR" content="UCRL-WEB-204324"> -<meta name="LLNLRandRdate" content="24 August 2004"> +<meta name="LLNLRandRdate" content="6 September 2004"> <meta name="distribution" content="global"> <meta name="description" content="Simple Linux Utility for Resource Management"> <meta name="copyright" @@ -54,6 +54,9 @@ structure:Laboratories and Other Field Facilities"> <td valign="top"><h2> Documentation</h2> <p>The SLURM <a href="quickstart.html">Quick Start User Guide</a> provides basic introductory information for all SLURM users.</p> +<p>Users and administrators of the +<a href="http://www.research.ibm.com/bluegene">IBM Blue Gene</a> should also +reference the <a href="bluegene.html">Blue Gene User Guide</a>.</p> <p>The following documents have been written to provide guidance and information for SLURM administrators and developers.</p> <ul> @@ -65,14 +68,14 @@ for SLURM administrators and developers.</p> <li><a 
href="selectplugins.html">Node Selection Plugin Programmer Guide</a></li> <li><a href="schedplugins.html">Scheduler Plugin Programmer Guide</a></li> <li><a href="switchplugins.html">Switch (Interconnect) Plugin Programmer Guide</a></li> -<li><a href="maui.html">Maui Scheduler Inegration Guide</a></li> +<li><a href="maui.html">Maui Scheduler Integration Guide</a></li> </ul></td> </tr> <tr> <td colspan="3"><hr> <p>For information about this page, contact <a href="mailto:slurm-dev@lists.llnl.gov">slurm-dev@lists.llnl.gov</a>.</p> <p><a href="http://www.llnl.gov/"><img align=middle src="lll.gif" width="32" height="32" border="0"></a></p> <p class="footer">UCRL-WEB-204324<br> -Last modified 24 August 2004</p></td> +Last modified 6 September 2004</p></td> </tr> </table> </td> -- GitLab