Commit 08fccd83 authored by Moe Jette

Minor word-smithing.

parent a7f04397
\begin{abstract}
A new cluster resource management system called
Simple Linux Utility Resource Management (SLURM) is described
in this paper. SLURM, initially developed for large
Linux clusters at the Lawrence Livermore National Laboratory (LLNL),
is a simple cluster manager that can scale to thousands of processors.
SLURM is designed to be flexible and fault-tolerant and can be ported to
@@ -86,7 +86,7 @@ previously saved state information,
notifies the controller that it is active, waits for work,
executes the work, returns status, and waits for more work.
Since it initiates jobs for other users, it must run with root privilege.
%It also asynchronously exchanges node and job status information with {\tt slurmctld}.
The only job information it has at any given time pertains to its
currently executing jobs.
The \slurmd\ performs five major tasks.
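
The request-handling loop just described can be sketched as follows.
This is an illustrative sketch only; every identifier in it is a
hypothetical stub, not part of SLURM's actual source or internal API.

\begin{verbatim}
/* Hedged sketch of a slurmd-style request loop.
 * All names here are hypothetical stubs, not SLURM code. */
#include <stdbool.h>
#include <stdio.h>

typedef struct { int job_id; } request_t;

/* Stub: read the node's configuration (hypothetical). */
static void read_config(void)       { printf("config read\n"); }
/* Stub: tell the controller this node is up (hypothetical). */
static void notify_controller(void) { printf("node registered\n"); }

/* Stub: pretend two work requests arrive, then none. */
static bool wait_for_work(request_t *req)
{
    static int remaining = 2;
    if (remaining-- <= 0)
        return false;
    req->job_id = 100 + remaining;
    return true;
}

/* Stub: run the job and report its status (hypothetical). */
static void execute_and_report(const request_t *req)
{
    printf("job %d done\n", req->job_id);
}

int main(void)
{
    request_t req;

    read_config();                 /* read configuration        */
    notify_controller();           /* announce availability     */
    while (wait_for_work(&req))    /* wait for work ...         */
        execute_and_report(&req);  /* ... run it, return status */
    return 0;
}
\end{verbatim}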
@@ -124,7 +124,7 @@ Most SLURM state information is maintained by the controller, {\tt slurmctld}.
The \slurmctld\ is multi-threaded with independent read and write locks
for the various data structures to enhance scalability.
When \slurmctld\ starts, it reads the SLURM configuration file.
It can also read additional state information
from a checkpoint file generated by a previous execution of {\tt slurmctld}.
Full controller state information is written to
disk periodically, with incremental changes written to disk immediately
@@ -5,9 +5,9 @@ and portable cluster resource management system.
The contribution of this work is that we have provided an immediately available
and open-source tool that virtually anybody can use to efficiently manage clusters of
different sizes and architectures.
%We expect SLURM to begin production use on LLNL Linux clusters
%starting in March 2003 and be available for distribution shortly
%thereafter.
Looking ahead, we anticipate adding support for additional
operating systems.
@@ -8,16 +8,17 @@ The continuous decrease in the price of the COTS parts in conjunction with
the good scalability of the cluster architecture has now made it feasible to economically
build large-scale clusters with thousands of processors~\cite{MCRWeb,PCRWeb}.
An essential component needed to harness such a computer is a
resource management system.
A resource management system (or resource manager) performs such crucial tasks as
scheduling user jobs, monitoring machine and job status, launching user applications, and
managing machine configuration.
An ideal resource manager should be simple, efficient, scalable, fault-tolerant,
and portable.
Unfortunately, there are no open-source resource management systems currently available
which satisfy these requirements.
A survey~\cite{Jette02} has revealed that many existing resource managers have poor
scalability and fault tolerance, rendering them unsuitable for large clusters having
thousands of processors~\cite{LoadLevelerWeb,LoadLevelerManual}.
While some proprietary cluster managers are suitable for large clusters,
they are typically designed for particular computer systems and/or
@@ -30,7 +31,7 @@ even though the scheduler does not necessarily meet the need of organization tha
Clear separation of the cluster management functionality from scheduling policy is desired.
This observation led us to set out to design a simple, highly scalable, and
portable resource management system.
The result of this effort is Simple Linux Utility Resource Management
(SLURM\footnote{A tip of the hat to Matt Groening and creators of {\em Futurama},
where Slurm is the most popular carbonated beverage in the universe.}).
@@ -94,10 +95,9 @@ deterministic.
\end{itemize}
The main contribution of our work is that we have provided a readily available
tool that anybody can use to efficiently manage clusters of different sizes and architectures.
SLURM is highly scalable\footnote{It was observed that it took less than five seconds for SLURM to launch a 1900-task job over 950 nodes on a recently installed cluster at Lawrence Livermore National Laboratory.}.
SLURM can be easily ported to any cluster system with minimal effort using its plugin
capability, and can be used with any meta-batch scheduler or a Grid resource broker~\cite{Gridbook}
through its well-defined interfaces.
@@ -24,7 +24,7 @@ high priority queue for smaller "interactive" jobs. Signal to daemons
causes the current log file to be closed, renamed with a
time-stamp, and a new log file created.
Although PBS is portable and has a broad user base, it has significant drawbacks.
PBS is single threaded and hence exhibits poor performance on large clusters.
This is particularly problematic when a compute node in the system fails:
PBS tries to contact the down node while other activities must wait.
@@ -83,17 +83,17 @@ PBS also has a weak mechanism for starting and cleaning up parallel jobs.
\subsection{Quadrics RMS}
Quadrics RMS\cite{Quadrics02}
(Resource Management System) is for
Unix systems having Quadrics Elan interconnects.
RMS functionality and performance are excellent.
Its major limitation is the requirement for a Quadrics interconnect.
The proprietary code and cost may also pose difficulties under some
circumstances.
\subsection*{Maui Scheduler}
Maui Scheduler~\cite{Maui} is an advanced reservation HPC batch scheduler
for use with SP, O2K, and UNIX/Linux clusters.
It is widely used to extend the functionality of PBS and LoadLeveler,
which Maui requires to perform parallel job initiation and management.
@@ -128,7 +128,7 @@ overuse of a computer where not authorized.
% not authorized.
%\end{itemize}
DPCS supports only a
limited number of computer systems: IBM RS/6000 and SP, Linux,
Sun Solaris, and Compaq Alpha.
Like the Maui Scheduler, DPCS requires an underlying infrastructure for
@@ -140,10 +140,10 @@ LoadLeveler~\cite{LoadLevelerManual,LoadLevelerWeb}
is a proprietary batch system and parallel job manager by
IBM. LoadLeveler supports few non-IBM systems. Its native
scheduling software is very primitive, and other software, such as the
Maui Scheduler or DPCS, is required for reasonable performance.
LoadLeveler has a simple and very flexible queue and job class structure
operating in "matrix" fashion.
The biggest problem of LoadLeveler is its poor scalability.
It typically requires 20 minutes to execute even a trivial 500-node, 8000-task job
on the IBM SP computers at LLNL.
%In addition, all jobs must be initiated through the LoadLeveler, and a special version of
@@ -184,7 +184,7 @@ architectures, it has sophisticated scheduling software including
fair-share, backfill, consumable resources, job preemption, and a
very flexible queue structure.
It also provides good status information on nodes and LSF daemons.
While LSF is quite powerful, it is not open-source and can be costly on
larger clusters.
%The LSF share many of its shortcomings with the LoadLeveler: job initiation only
%through LSF, requirement of a special MPI library, etc.
@@ -227,17 +227,17 @@ their requirements for a broker to perform matches. The checkpoint
mechanism is used to relocate work on demand (when the "owner" of a
desktop machine wants to resume work).
%
%\subsection*{Linux PAGG Process Aggregates}
%
%PAGG~\cite{PAGG}
%consists of modifications to the linux kernel that allows
%developers to implement Process AGGregates as loadable kernel modules.
%A process aggregate is defined as a collection of processes that are
%all members of the same set. A set would be implemented as a container
%for the member processes. For instance, process sessions and groups
%could have been implemented as process aggregates.
%
\subsection*{Beowulf Distributed Process Space (BPROC)}
@@ -250,7 +250,9 @@ processes started with this mechanism appear in the process table
of the front end machine in a cluster. This allows remote process
management using the normal UNIX process control facilities. Signals
are transparently forwarded to remote processes and exit status is
received using the usual wait() mechanisms. This tight coupling of
a cluster's nodes is convenient, but high scalability can be difficult
to achieve.
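
For reference, the fragment below shows the ordinary single-machine
pattern that BPROC extends across nodes: signal a child process and
harvest its exit status with waitpid(). This is plain UNIX code for
illustration; nothing BPROC-specific appears in it.

\begin{verbatim}
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0) {              /* child: idle until signaled */
        pause();
        _exit(0);
    }
    sleep(1);                    /* crude: let child reach pause() */
    kill(pid, SIGTERM);          /* under BPROC, such signals are
                                    forwarded to remote processes */
    int status;
    waitpid(pid, &status, 0);    /* the usual wait() mechanism */
    if (WIFSIGNALED(status))
        printf("child ended by signal %d\n", WTERMSIG(status));
    return 0;
}
\end{verbatim}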
%\subsection{xcat}
%
@@ -269,18 +271,18 @@ received using the usual wait() mechanisms.
%NQS\footnote{http://umbc7.umbc.edu/nqs/nqsmain.html},
%the Network Queueing System, is a serial batch system.
%
%\subsection*{LAM / MPI}
%
%Local Area Multicomputer (LAM)~\cite{LAM}
%is an MPI programming environment and development system for heterogeneous
%computers on a network.
%With LAM, a dedicated cluster or an existing network
%computing infrastructure can act as one parallel computer solving
%one problem. LAM features extensive debugging support in the
%application development cycle and peak performance for production
%applications. LAM features a full implementation of the MPI
%communication standard.
%
%\subsection{MPICH}
%
%MPICH\footnote{http://www-unix.mcs.anl.gov/mpi/mpich/}