Commit df4f591d
Authored 22 years ago by Moe Jette
Parent: 08ff8fba

Minor updates to report.tex. Updates to WWW pointers in project.bib.
Major enhancements to jsspp.tex.
Showing 3 changed files with 129 additions and 33 deletions:

  doc/common/project.bib    15 additions, 1 deletion
  doc/jsspp/jsspp.tex       83 additions, 0 deletions
  doc/pubdesign/report.tex  31 additions, 32 deletions
doc/common/project.bib  +15 −1

@@ -36,7 +36,7 @@
 @MISC{GPL2002,
-AUTHOR = "The GNU General Public License",
+AUTHOR = "The GNU Licenses",
 TITLE = "http://www.gnu.org/licenses/licenses.html",
 }

@@ -62,6 +62,13 @@
 YEAR = 2002,
 }
+@MISC{LL2002,
+AUTHOR = "LoadLeveler documentation",
+TITLE = "http://www-1.ibm.com/servers/eserver/pseries/library/sp\_books/loadleveler.html"
+}
 @MISC{Maui2002,

@@ -69,6 +76,13 @@
 TITLE = "http://mauischeduler.sourceforge.net/",
 }
+@MISC{Quadrics2002,
+AUTHOR = "Quadrics Resource Management System",
+TITLE = "http://www.quadrics.com/",
+}
 @CONFERENCE{STORM2001,
doc/jsspp/jsspp.tex  +83 −0
@@ -145,6 +145,89 @@ and directs operations.
 Compute nodes simply run a \slurmd\ daemon (similar to a remote shell
 daemon) to export control to SLURM.
+
+\subsection{What SLURM is Not}
+
+SLURM is not a comprehensive cluster administration or monitoring package.
+While SLURM knows the state of its compute nodes, it makes no attempt to put
+this information to use in other ways, such as with a general purpose event
+logging mechanism or a back-end database for recording historical state.
+It is expected that SLURM will be deployed in a cluster with other
+tools performing these functions.
+
+SLURM is not a meta-batch system like Globus \cite{Globus2002}
+or DPCS (Distributed Production Control System) \cite{DPCS2002}.
+SLURM supports resource management across a single cluster.
+
+SLURM is not a sophisticated batch system.
+In fact, it was expressly designed to provide high-performance
+parallel job management while leaving scheduling decisions to an
+external entity.
+
+\section{Architecture}
+
+\subsection{Node Management}
+
+\subsection{Partition Management}
+
+\subsection{Job Management}
+
+\subsection{Scheduling Infrastructure}
+
+Scheduling parallel computers is a very complex matter.
+Several good public domain schedulers exist, the most
+popular being the Maui Scheduler \cite{Jackson2001,Maui2002}.
+The scheduler used at our site, DPCS \cite{DPCS2002}, is quite
+sophisticated and has over 150,000 lines of code.
+We felt no need to address scheduling issues within SLURM, but
+have instead developed a resource manager with a rich set of
+application programming interfaces (APIs) and the flexibility
+to satisfy the needs of others working on scheduling issues.
+When jobs are submitted to SLURM, they are assigned an initial
+scheduling priority through a plug-in library function, and SLURM
+maintains a priority-ordered queue of pending jobs.
+SLURM also provides the infrastructure a meta-scheduler needs
+for gang scheduling, namely an API to explicitly preempt and
+later resume a job.
+
+\section{Results}
+
+\begin{figure}[htb]
+\centerline{\epsfig{file=figures/times.eps}}
+\caption{Time to execute /bin/hostname with various node counts}
+\label{timing}
+\end{figure}
+
+We were able to perform some SLURM tests on a 1000-node cluster in
+November 2002. Some development was still underway at that time and
+tuning had not been performed. The results for executing the program
+/bin/hostname with two tasks per node and various node counts are shown
+in Figure~\ref{timing}. We found SLURM performance to be comparable
+to the Quadrics Resource Management System (RMS) \cite{Quadrics2002}
+for all job sizes and about 80 times faster than IBM
+LoadLeveler \cite{LL2002} at small job sizes.
+(While not shown on this chart, LoadLeveler requires 1200 seconds to
+launch an 8000-task job on 500 nodes.)
+
+\section{Future plans}
+
+We expect SLURM to begin production use on LLNL Linux clusters
+starting in March 2003 and be available for distribution shortly
+thereafter.
+Looking ahead, we anticipate moving the interconnect topology
+and API functions into plug-in modules and adding support for
+additional systems.
+We plan to add support for additional operating systems
+(IA64 and x86-64) and interconnects (InfiniBand, Myrinet, and
+the IBM Blue Gene \cite{BlueGene2002} system\footnote{Blue Gene
+has a different interconnect than any supported by SLURM and
+a 3-D topology with restrictive allocation constraints.}).
+We plan to add support for suspending and resuming jobs, which
+provides the infrastructure needed to support gang scheduling.
+We also plan to support changing the node count associated
+with running jobs (as needed for MPI2).
+
+\bibliographystyle{plain}
+\bibliography{project}
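Editor's note: the Scheduling Infrastructure text added above says jobs receive an initial scheduling priority from a plug-in library function, after which SLURM keeps a priority-ordered queue of pending jobs. A minimal sketch of that shape in C, assuming an invented job record and callback name (this is not SLURM's actual plug-in interface, which the diff does not show):

#include <stdio.h>
#include <time.h>

/* Hypothetical job record; the fields are illustrative, not SLURM's. */
struct pending_job {
    unsigned int job_id;
    unsigned int node_count;
    time_t submit_time;
    unsigned int priority;   /* filled in by the plug-in at submit time */
};

/*
 * A toy initial-priority callback of the kind the paper describes: the
 * resource manager calls it once per submitted job, then keeps the
 * pending queue ordered by the returned value.  This policy merely
 * favors smaller jobs; a real site policy would live in the plug-in.
 */
static unsigned int set_initial_priority(const struct pending_job *job)
{
    const unsigned int base = 1000000;
    unsigned int penalty = job->node_count * 10;
    return (penalty < base) ? base - penalty : 1;
}

int main(void)
{
    struct pending_job job = { 42, 16, 0, 0 };
    job.submit_time = time(NULL);
    job.priority = set_initial_priority(&job);
    printf("job %u: %u nodes -> priority %u\n",
           job.job_id, job.node_count, job.priority);
    return 0;
}

The point of the plug-in boundary is the one the paper makes: SLURM only orders the queue by the returned value, while all scheduling policy stays behind the callback, replaceable by an external entity.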
doc/pubdesign/report.tex  +31 −32
@@ -128,6 +128,13 @@ programming interface (API).
 \subsection{What SLURM is Not}
+SLURM is not a comprehensive cluster administration or monitoring package.
+While SLURM knows the state of its compute nodes, it makes no attempt to put
+this information to use in other ways, such as with a general purpose event
+logging mechanism or a back-end database for recording historical state.
+It is expected that SLURM will be deployed in a cluster with other
+tools performing these functions.
 SLURM is not a sophisticated batch system. Its default scheduler
 implements First-In First-Out (FIFO) and is not
 intended to directly implement complex site policy.
@@ -140,26 +147,15 @@ Multiple jobs may be allocated the same nodes
 if the administrator has configured nodes for shared access and/or
 the job has requested shared resources for improved responsiveness.
 SLURM does not directly perform gang scheduling (time-slicing
-of parallel jobs). However it does does provide the infrastructure
-for a meta-scheduler to perform gang scheduling, namely an API
-to explicit preempt and later resume a job.
-An external scheduler may submit, signal, hold,
+of parallel jobs). An external scheduler may submit, signal, hold,
 reorder and terminate jobs via the API.
-SLURM is not a meta-batch system like Globus \cite{Globus2002}
-or DPCS (Distributed Production Control System) \cite{DPCS2002}.
-SLURM supports resource management across a single cluster.
-SLURM is not a comprehensive cluster administration or monitoring package.
-While SLURM knows the state of its compute nodes, it makes no attempt to put
-this information to use in other ways, such as with a general purpose event
-logging mechanism or a back-end database for recording historical state.
-It is expected that SLURM will be deployed in a cluster with other
-tools performing these functions.
 \subsection{Architecture}
 \begin{figure}[tb]
 \centerline{\epsfig{file=figures/arch.eps}}
 \caption{SLURM Architecture}
@@ -1385,32 +1381,35 @@ the application's tasks.
 \label{timing}
 \end{figure}
-We were able to perform some SLURM tests on a 1000 node cluster in November
-2002. Some development was still underway at that time and tuning had not been
-performed. The results for executing the program /bin/hostname on two tasks
-per node and various node counts is show in Figure~\ref{timing}.
-We found SLURM performance to be comparable to RMS for all job
-sizes and about 100 times faster than LoadLeveler at small job sizes.
+We were able to perform some SLURM tests on a 1000-node cluster in
+November 2002. Some development was still underway at that time and
+tuning had not been performed. The results for executing the program
+/bin/hostname with two tasks per node and various node counts are shown
+in Figure~\ref{timing}. We found SLURM performance to be comparable
+to the Quadrics Resource Management System (RMS) \cite{Quadrics2002}
+for all job sizes and about 80 times faster than IBM
+LoadLeveler \cite{LL2002} at small job sizes.
+(While not shown on this chart, LoadLeveler requires 1200 seconds to
+launch an 8000-task job on 500 nodes.)
 \section{Future plans}
-As of January 2003, some work remains before we feel ready to
-distribute SLURM for general use. Work needed at that time was
-primarily in scalability, fault-tolerance, and testing.
 We expect SLURM to begin production use on LLNL Linux clusters
-in February 2003.
-While SLURM does properly support job and partition time limits,
-the resource utilization information by job (CPU cycles used,
-memory used, etc.) is still under development.
-Looking ahead, we anticipate porting SLURM to the
-IBM Blue Gene \cite{BlueGene2002} in the fall of 2003.
-Blue Gene has a different interconnect than any supported
-by SLURM.
-It also has a 3-D topography with restrictive allocation constraints.
+starting in March 2003 and be available for distribution shortly
+thereafter.
+Looking ahead, we anticipate moving the interconnect topology
+and API functions into plug-in modules and adding support for
+additional systems.
+We plan to add support for additional operating systems
+(IA64 and x86-64) and interconnects (InfiniBand, Myrinet, and
+the IBM Blue Gene \cite{BlueGene2002} system\footnote{Blue Gene
+has a different interconnect than any supported by SLURM and
+a 3-D topology with restrictive allocation constraints.}).
+We plan to add support for suspending and resuming jobs, which
+provides the infrastructure needed to support gang scheduling.
+We also plan to support changing the node count associated
+with running jobs (as needed for MPI2).
 \section{Acknowledgments}
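Editor's note: both papers describe the same building block for gang scheduling, an API through which an external meta-scheduler can explicitly preempt a running job and later resume it. A minimal sketch of that control flow, with invented stand-in calls (rm_preempt and rm_resume are hypothetical, not the SLURM API):

#include <stdio.h>
#include <stdbool.h>

/*
 * Stand-ins for the resource manager's preempt/resume entry points.
 * Invented for illustration; the real API is not part of this diff.
 */
static bool rm_preempt(unsigned int job_id)
{
    printf("preempt job %u\n", job_id);
    return true;
}

static bool rm_resume(unsigned int job_id)
{
    printf("resume  job %u\n", job_id);
    return true;
}

/*
 * One gang-scheduling rotation over jobs time-slicing the same nodes:
 * suspend the currently running job, then let the next one in the ring
 * run for its quantum.  The meta-scheduler, not SLURM, drives this loop.
 */
static void rotate(const unsigned int jobs[], int n, int running_idx)
{
    int next = (running_idx + 1) % n;
    if (rm_preempt(jobs[running_idx]))
        rm_resume(jobs[next]);
}

int main(void)
{
    unsigned int gang[] = { 101, 102, 103 };
    rotate(gang, 3, 0);   /* 101 yields to 102 */
    rotate(gang, 3, 1);   /* 102 yields to 103 */
    return 0;
}

This split matches the division of labor the diff emphasizes: SLURM supplies only the preempt/resume mechanism, while the time-slicing policy lives entirely in the external scheduler.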