From 358dbbf179b4536e95514c0d7c05fa5e4f90073c Mon Sep 17 00:00:00 2001
From: Moe Jette <jette1@llnl.gov>
Date: Mon, 9 Feb 2009 21:53:11 +0000
Subject: [PATCH] update some web pages for slurm v2.0
---
 doc/html/job_priority.shtml | 10 ++---
 doc/html/news.shtml | 85 +++++++++----------------------
 doc/html/overview.shtml | 5 ++-
 doc/html/platforms.shtml | 8 +++-
 doc/html/slurm.shtml | 52 +++++++++++++++--------
 5 files changed, 70 insertions(+), 90 deletions(-)
diff --git a/doc/html/job_priority.shtml b/doc/html/job_priority.shtml
index 2c18e60db60..44b5e0ddc38 100644
--- a/doc/html/job_priority.shtml
+++ b/doc/html/job_priority.shtml
@@ -54,7 +54,7 @@ Job_priority =
<!-------------------------------------------------------------------------->
<a name=age>
-<h2>Age Factor</h2>
+<h2>Age Factor</h2></a>
<P> The age factor represents the length of time a job has been sitting in the queue and eligible to run. In general, the longer a job waits in the queue, the larger its age factor grows. However, the age factor for a dependent job will not change while it waits for the job it depends on to complete. Also, the age factor of a queued job whose node or time limits exceed the cluster's current limits will not change.</P>
@@ -62,25 +62,25 @@ Job_priority =
<!-------------------------------------------------------------------------->
<a name=jobsize>
-<h2>Job Size Factor</h2>
+<h2>Job Size Factor</h2></a>
<P> The job size factor correlates to the number of nodes the job has requested. This factor can be configured to favor larger jobs or smaller jobs based on the state of the <i>PriorityFavorSmall</i> boolean in the slurm configuration file. When <i>PriorityFavorSmall</i> is 0, the larger the job, the greater its job size factor will be. A job that requests all the nodes on the machine will get a job size factor of 1.0. When the <i>PriorityFavorSmall</i> Boolean is 1, the single node job will receive the 1.0 job size factor.</P>
<!-------------------------------------------------------------------------->
<a name=partition>
-<h2>Partition Factor</h2>
+<h2>Partition Factor</h2></a>
<P> Each node partition can be assigned a factor from 0.0 to 1.0. The higher the number, the greater the job priority will be for jobs that are slated to run in this partition.</P>
<!-------------------------------------------------------------------------->
<a name=qos>
-<h2>Quality of Service (QOS) Factor</h2>
+<h2>Quality of Service (QOS) Factor</h2></a>
<P> Each QOS can be assigned a factor from 0.0 to 1.0. The higher the number, the greater the job priority will be for jobs that request this QOS.</P>
<!-------------------------------------------------------------------------->
<a name=fairshare>
-<h2>Fair-share Factor</h2>
+<h2>Fair-share Factor</h2></a>
<!-------------------------------------------------------------------------->
<p style="text-align:center;">Last modified 9 February 2009</p>
diff --git a/doc/html/news.shtml b/doc/html/news.shtml
index ca9497bed0f..74e1bffd7ac 100644
--- a/doc/html/news.shtml
+++ b/doc/html/news.shtml
@@ -4,65 +4,9 @@
<h2>Index</h2>
<ul>
-<li><a href="#11">SLURM Version 1.1, May 2006</a></li>
-<li><a href="#12">SLURM Version 1.2, February 2007</a></li>
<li><a href="#13">SLURM Version 1.3, March 2008</a></li>
-<li><a href="#14">SLURM Version 1.4, May 2009</a></li>
-<li><a href="#15">SLURM Version 1.5 and beyond</a></li>
-</ul>
-
-<h2><a name="11">Major Updates in SLURM Version 1.1</a></h2>
-<p>SLURM Version 1.1 became available in May 2006.
-Major enhancements include:
-<ul>
-<li>Communications enhancements, validated up to 16,384 node clusters.</li>
-<li>File broadcast support (new <i>sbcast</i> command).</li>
-<li>Support for distinct executables and arguments by task ID
-(see <i>srun --multi-prog</i> option).</li>
-<li>Support for binding tasks to the memory on a processor.</li>
-<li>The configuration parameter <i>HeartbeatInterval</i> is defunct.
-Half the values of configuration parameters <i>SlurmdTimeout</i> and
-<i>SlurmctldTimeout</i> are used as the communication frequency for
-the slurmctld and slurmd daemons respectively.</li>
-<li>Support for PAM to control resource limits by user on each
-compute node used. See <i>UsePAM</i> configuration parameter.</li>
-<li>Support added for <i>xcpu</i> job launch.</li>
-<li>Add support for 1/16 midplane BlueGene blocks.</li>
-<li>Add support for overlapping BlueGene blocks.</li>
-<li>Add support for dynamic BlueGene block creation on demand.</li>
-<li>BlueGene node count specifications are now c-node counts
-rather than base partition counts.</li>
-</ul>
-
-<h2><a name="12">Major Updates in SLURM Version 1.2</a></h2>
-<p>SLURM Version 1.2 became available in February 2007.
-Major enhancements include:
-<ul>
-<li>More complete support for resource management down to the core level
-on a node.</li>
-<li>Treat memory as a consumable resource on a compute node.</li>
-<li>New graphical user interface provided, <i>sview</i>.</li>
-<li>Added support for OS X.</li>
-<li>Permit batch jobs to be requeued.</li>
-<li>Expanded support of Moab and Maui schedulers.</li>
-<li><i>Srun</i> command augmented by new commands for each operation:
-<i>salloc</i>, <i>sbatch</i>, and <i>sattach</i>.</li>
-<li>Sched/wiki plugin (for Moab and Maui Schedulers) rewritten to
-provide vastly improved integration.</li>
-<li>BlueGene plugin permits use of different boot images per job
-specification.</li>
-<li>Event trigger mechanism added with new tool <i>strigger</i>.</li>
-<li>Added support for task binding to CPUs or memory via <i>cpuset</i>
-mechanism.</li>
-<li>Added support for configurable
-<a href="power_save.html">power savings</a> on idle nodes.</li>
-<li>Support for MPICH-MX, MPICH1/shmem and MPICH1/p4 added with
-task launch directly from the <i>srun</i> command.</li>
-<li>Wrappers available for common Torque/PBS commands
-(<i>psub</i>, <i>pstat</i>, and <i>pbsnodes</i>).</li>
-<li>Support for <a href="http://www-unix.globus.org/">Globus</a>
-(using Torque/PBS command wrappers).</li>
-<li>Wrapper available for <i>mpiexec</i> command.</li>
+<li><a href="#20">SLURM Version 2.0, May 2009</a></li>
+<li><a href="#21">SLURM Version 2.1 and beyond</a></li>
</ul>
<h2><a name="13">Major Updates in SLURM Version 1.3</a></h2>
@@ -85,34 +29,47 @@ option of using OpenSSL (default) or Munge (GPL).</li>
spawned tasks.</li>
<li>Support added for a much richer job dependency specification
including testing of exit codes and multiple dependencies.</li>
+<li>Support added for BlueGene/P systems and HTC (High Throughput
+Computing) mode.</li>
</ul>
-<h2><a name="14">Major Updates in SLURM Version 1.4</a></h2>
-<p>SLURM Version 1.4 is scheduled for released in May 2009.
+<h2><a name="20">Major Updates in SLURM Version 2.0</a></h2>
+<p>SLURM Version 2.0 is scheduled for release in May 2009.
Major enhancements include:
<ul>
+<li>Sophisticated scheduling algorithms are available in a new plugin. Jobs
+can be prioritized based upon their age, size and/or fair-share resource
+allocation using hierarchical bank accounts.</li>
+<li>An assortment of resource limits can be imposed upon individual users
+and/or hierarchical bank accounts such as maximum job time limit, maximum
+job size, and maximum number of running jobs.</li>
+<li>Advanced reservations can be made to ensure resources will be available
+when needed.</li>
<li>Idle nodes can now be completely powered down when idle and automatically
restarted when there is work available.</li>
<li>Jobs in higher priority partitions (queues) can automatically preempt jobs
-in lower priority queues. The preempted jobs will automatically resume execution
-upon completion of the higher priority job.</li>
+in lower priority queues. The preempted jobs will automatically resume
+execution upon completion of the higher priority job.</li>
<li>Specific cores are allocated to jobs and job steps in order to effectively
preempt or gang schedule jobs.</li>
<li>A new configuration parameter, <i>PrologSlurmctld</i>, can be used to
support the booting of different operating systems for each job.</li>
</ul>
-<h2><a name="15">Major Updates in SLURM Version 1.5 and beyond</a></h2>
+<h2><a name="21">Major Updates in SLURM Version 2.1 and beyond</a></h2>
<p> Detailed plans for release dates and contents of future SLURM releases
have not been finalized. Anyone desiring to perform SLURM development should
notify <a href="mailto:slurm-dev@lists.llnl.gov">slurm-dev@lists.llnl.gov</a>
to coordinate activities. Future development plans include:
<ul>
+<li>Optimized resource allocation based upon network topology (e.g.
+hierarchical switches).</li>
+<li>Support for BlueGene/Q systems.</li>
<li>Permit resource allocations (jobs) to change size.</li>
<li>Add Kerberos credential support including credential forwarding
and refresh.</li>
</ul>
-<p style="text-align:center;">Last modified 13 November 2008</p>
+<p style="text-align:center;">Last modified 9 February 2009</p>
<!--#include virtual="footer.txt"-->
diff --git a/doc/html/overview.shtml b/doc/html/overview.shtml
index ce6d4743451..6e857e76c21 100644
--- a/doc/html/overview.shtml
+++ b/doc/html/overview.shtml
@@ -85,6 +85,9 @@ FIFO (First In First Out, default), backfill, gang (time-slicing for parallel jo
The Maui Scheduler</a>, and
<a href="http://www.clusterresources.com/pages/products/moab-cluster-suite.php">
Moab Cluster Suite</a>.
+There is also a <a href="job_priority.html">job prioritization</a> plugin
+available for use with the FIFO, backfill and gang schedulers. Jobs can be
+prioritized by age, size, fair-share allocation, etc.</li>
<li><a href="switchplugins.html">Switch or interconnect</a>:
<a href="http://www.quadrics.com/">Quadrics</a>
@@ -170,6 +173,6 @@ PartitionName=DEFAULT MaxTime=UNLIMITED MaxNodes=4096
PartitionName=batch Nodes=lx[0041-9999]
</pre>
-<p style="text-align:center;">Last modified 13 November 2008</p>
+<p style="text-align:center;">Last modified 9 February 2009</p>
<!--#include virtual="footer.txt"-->
diff --git a/doc/html/platforms.shtml b/doc/html/platforms.shtml
index f4ea38d3aa9..ee1a60a543f 100644
--- a/doc/html/platforms.shtml
+++ b/doc/html/platforms.shtml
@@ -13,6 +13,9 @@ distributions using i386, ia64, and x86_64 architectures.</li>
<ul>
<li><b>BlueGene</b>—SLURM support for IBM's BlueGene/L and BlueGene/P
systems has been thoroughly tested.</li>
+<li><b>Cray XT</b>—Much of the infrastructure to support a Cray XT
+system is currently in SLURM. The interface to ALPS/BASIL remains to be done.
+Please contact us if you would be interested in this work.</li>
<li><b>Ethernet</b>—Ethernet requires no special support from SLURM
and has been thoroughly tested.</li>
<li><b>IBM Federation</b>—SLURM support for IBM's Federation Switch
has been thoroughly tested.</li>
@@ -21,10 +24,11 @@ has been thoroughly tested.</li>
<li><b>Myrinet</b>—Myrinet, MPICH-GM and MPICH-MX are supported.</li>
<li><b>Quadrics Elan</b>—SLURM support for Quadrics Elan 3 and Elan 4
switches is available in all versions of SLURM and has been thoroughly
tested.</li>
-<li><b>Sun Constellation</b>—Three-dimensional torus interconnect.</li>
+<li><b>Sun Constellation</b>—Resource allocation has been optimized
+for the three-dimensional torus interconnect.</li>
<li><b>Other</b>—SLURM ports to other systems will be gratefully
accepted.</li>
</ul>
-<p style="text-align:center;">Last modified 22 December 2008</p>
+<p style="text-align:center;">Last modified 9 February 2009</p>
<!--#include virtual="footer.txt"-->
diff --git a/doc/html/slurm.shtml b/doc/html/slurm.shtml
index ff07abbea13..59704a034ea 100644
--- a/doc/html/slurm.shtml
+++ b/doc/html/slurm.shtml
@@ -1,32 +1,48 @@
<!--#include virtual="header.txt"-->
<h1>SLURM: A Highly Scalable Resource Manager</h1>
-<p>SLURM is an open-source resource manager designed for Linux clusters of all
-sizes. It provides three key functions. First it allocates exclusive and/or non-exclusive
-access to resources (computer nodes) to users for some duration of time so they
-can perform work. Second, it provides a framework for starting, executing, and
-monitoring work (typically a parallel job) on a set of allocated nodes. Finally,
-it arbitrates contention for resources by managing a queue of pending work. </p>
-<p>SLURM is not a sophisticated batch system, but it does provide an Applications
-Programming Interface (API) for integration with external schedulers such as
+<p>SLURM is an open-source resource manager designed for Linux clusters of
+all sizes.
+It provides three key functions.
+First it allocates exclusive and/or non-exclusive access to resources
+(computer nodes) to users for some duration of time so they can perform work.
+Second, it provides a framework for starting, executing, and monitoring work
+(typically a parallel job) on a set of allocated nodes.
+Finally, it arbitrates contention for resources by managing a queue of
+pending work. </p>
+
+<p>SLURM's design is very modular with dozens of optional plugins.
+In its simplest configuration, it can be installed and configured in a
+couple of minutes (see <a href="http://www.linux-mag.com/id/7239/1/">
+Caos NSA and Perceus: All-in-one Cluster Software Stack</a>
+by Jeffrey B. Layton).
+More complex configurations rely upon a
+<a href="http://www.mysql.com/">MySQL</a> database for archiving
+<a href="accounting.html">accounting</a> records, managing
+<a href="resource_limits.html">resource limits</a> by user or bank account,
+or supporting sophisticated <a href="job_priority.html">job prioritization</a>
+algorithms.
+SLURM also provides an Applications Programming Interface (API) for
+integration with external schedulers such as
<a href="http://www.clusterresources.com/pages/products/maui-cluster-scheduler.php">
-The Maui Scheduler</a> and
+The Maui Scheduler</a> or
<a href="http://www.clusterresources.com/pages/products/moab-cluster-suite.php">
-Moab Cluster Suite</a>.
-While other resource managers do exist, SLURM is unique in several respects: +Moab Cluster Suite</a>.</p> + +<p>While other resource managers do exist, SLURM is unique in several +respects: <ul> <li>Its source code is freely available under the <a href="http://www.gnu.org/licenses/gpl.html">GNU General Public License</a>.</li> <li>It is designed to operate in a heterogeneous cluster with up to 65,536 nodes.</li> -<li>It is portable; written in C with a GNU autoconf configuration engine. While -initially written for Linux, other UNIX-like operating systems should be easy -porting targets. A plugin mechanism exists to support various interconnects, authentication -mechanisms, schedulers, etc.</li> +<li>It is portable; written in C with a GNU autoconf configuration engine. +While initially written for Linux, other UNIX-like operating systems should +be easy porting targets.</li> <li>SLURM is highly tolerant of system failures, including failure of the node executing its control functions.</li> -<li>It is simple enough for the motivated end user to understand its source and -add functionality.</li> +<li>A plugin mechanism exists to support various interconnects, authentication +mechanisms, schedulers, etc. These plugins are documented and simple enough for the motivated end user to understand the source and add functionality.</li> </ul></p> <p>SLURM provides resource management on about 1000 computers worldwide, @@ -49,6 +65,6 @@ with 10,240 PowerPC processors and a Myrinet switch</li> <a href="http://www.clusterresources.com">Cluster Resources</a> and <a href="http://www.sicortex.com">SiCortex</a>.</p> -<p style="text-align:center;">Last modified 29 November 2007</p> +<p style="text-align:center;">Last modified 9 February 2009</p> <!--#include virtual="footer.txt"--> -- GitLab
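A note on the job priority pages touched above: the job_priority.shtml hunks only retouch the factor headings, and the combining formula behind the "Job_priority =" context line is not shown in this patch. The page describes each factor (age, job size, partition, QOS, fair-share) as a value normalized to the 0.0 to 1.0 range, which suggests a weighted sum. The Python below is a minimal sketch of that idea only; the weight names and values are invented for illustration and are not SLURM configuration parameters.

    # Hypothetical multi-factor priority sketch based on the factors described
    # in doc/html/job_priority.shtml. All weights are illustrative assumptions.
    WEIGHTS = {
        "age": 1000,         # time the job has waited while eligible to run
        "job_size": 2000,    # requested node count, normalized to 0.0-1.0
        "partition": 500,    # per-partition factor from 0.0 to 1.0
        "qos": 500,          # per-QOS factor from 0.0 to 1.0
        "fair_share": 4000,  # fair-share factor from hierarchical bank accounts
    }

    def job_priority(factors):
        """Combine normalized 0.0-1.0 factors into one integer priority."""
        for name, value in factors.items():
            if not 0.0 <= value <= 1.0:
                raise ValueError("factor %s out of range: %s" % (name, value))
        return round(sum(WEIGHTS[name] * value
                         for name, value in factors.items()))

    # Example: an old single-node job (job size factor 1.0 when
    # PriorityFavorSmall is 1) whose bank account has light recent usage.
    print(job_priority({"age": 0.8, "job_size": 1.0, "partition": 0.9,
                        "qos": 0.5, "fair_share": 0.7}))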