Commit 045700f0
authored 20 years ago by Moe Jette
Major update. Framework of paper largely in place.
parent 9281a027
Showing 1 changed file: doc/bgl.report/report.tex with 98 additions and 16 deletions
...
@@ -310,26 +310,89 @@ identifies the directory in which to find the plugin.
\section{Blue Gene/L Specific Resource Management Issues}
Several issues needed to be addressed for SLURM to support BGL:
pseudo-nodes representing the base partitions, topology,
\slurmd\
executing only on the front-end node,
BGL wiring issues and use of the BGL-specific APIs.
BGL wiring issues are extensive and addressed in a separate section.
Since a BGL base partition is the minimum allocation unit for a job,
it was natural to consider each one as an independent SLURM node.
This meant SLURM would manage a very reasonable 128 nodes
rather than tens of thousands of individual c-nodes.
The
\slurmd\
daemon was designed to execute on each SLURM
node to monitor the status of that node, launch job steps, etc.
Unfortunately BGL prohibited the execution of SLURM daemons within
the base partitions on any of the c-nodes.
SLURM was forced to execute one
\slurmd\
for the entire BGL system
on a front-end node.
In addition, the typical Unix mechanisms used to interact with a
compute host do not function with BGL base partitions.
This issue was addressed by adding a SLURM parameter to
indicate when it is running with a front-end node, in which case
there is assumed to be a single
\slurmd\
for the entire system.
SLURM was originally designed to address a one-dimensional topology
and this impacted a variety of areas from naming conventions to
node selection.
SLURM provides resource management on several Linux clusters
exceeding 1000 nodes and it is impractical to display or otherwise
work with hundreds of individual node names.
SLURM addresses this by using regular expressions to indicate
ranges of node names.
For example, "linux[0-1023]" was used to represent 1024 nodes
with names having a prefix of "linux" and a numeric suffix ranging
from "0" to "1023".
The most reasonable way to name the BGL nodes seemed to be
using a three digit suffix, but rather than indicate a monotonically
increasing number, each digit would represent the base partition's
location in the X, Y and Z dimensions (the value of X ranges
from 0 to 7, Y from 0 to 3, and Z from 0 to 3 on the LLNL system).
For example, "bgl012" would represent the base partition at
the position X=0, Y=1 and Z=2.
Since BGL resources naturally tend to be rectangular prisms in
shape, we modified the regular expression to indicate the two
extreme base partition locations.
The name prefix is always "bgl".
Within the brackets one lists the base partition with the smallest
X, Y and Z coordinates followed by an "x" followed by the base
partition with the highest X, Y and Z coordinates.
For example, "bgl[200x311]" represents the following eight base
partitions: bgl200, bgl201, bgl210, bgl211, bgl300, bgl301, bgl310
and bgl311.
Note that this method cannot accommodate blocks of base
partitions that wrap over the torus boundaries particularly well,
although a regular expression of this sort is supported:
"bgl[000x011,700x711]".
The node selection functionality is another topology aware
SLURM component.
Rather than embedding BGL-specific logic into a multitude of
locations, all of this logic was put into a single plugin.
The pre-existing node selection logic was put into a plugin
supporting typical Linux clusters with node names based
upon a one-dimensional array.
The BGL-specific plugin not only selects nodes for pending jobs
based upon BGL topography, but issues the BGL-specific APIs
to monitor the system health (draining nodes with any failure
mode) and perform initialization and termination sequences for the job.
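A rough sketch of this plugin arrangement follows; it is not SLURM's
real node selection interface, and every identifier in it
({\tt node\_select\_ops\_t}, {\tt select\_linear\_ops},
{\tt select\_bluegene\_ops}) is invented purely for illustration:
\begin{verbatim}
/* Illustration only, not SLURM's actual plugin API: the controller
 * works against one fixed set of node-selection operations and loads
 * exactly one implementation, keeping BGL-specific logic out of the
 * core code. */
#include <stdio.h>

typedef struct {
    const char *name;
    int (*select_nodes)(int requested_nodes);  /* 0 on success */
} node_select_ops_t;

/* "linear": one-dimensional node array for typical Linux clusters. */
static int linear_select(int n)
{
    printf("linear: selecting %d nodes\n", n);
    return 0;
}

/* "bluegene": topology-aware selection; the real plugin would also
 * call the BGL APIs for health monitoring and job setup/teardown. */
static int bgl_select(int n)
{
    printf("bluegene: selecting %d base partitions\n", n);
    return 0;
}

static const node_select_ops_t select_linear_ops   = { "linear",   linear_select };
static const node_select_ops_t select_bluegene_ops = { "bluegene", bgl_select };

int main(void)
{
    const node_select_ops_t *ops = &select_bluegene_ops;  /* per config */
    (void) select_linear_ops;
    return ops->select_nodes(8);
}
\end{verbatim}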
BGL's topology requirement necessitated the addition of several
\srun\
options:
{\em --geometry} to specify the dimensions required by the job,
{\em --no-rotate} to indicate if the geometry specification could rotate
in three dimensions,
{\em --comm-type} to indicate the communications type being mesh or torus,
and {\em --node-use} to specify if the second processor on a c-node should
be used to execute the user application or be used for communications.
While
\srun\
accepts these new options on all computer systems,
the node selection plugin logic is used to manage this data in an
opaque data type.
Since these new data types are unused on non-BGL systems, the
functions to manage them perform no work.
Other computers with other topology requirements will be able to
take advantage of this plugin infrastructure as well with minimal
effort.
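A minimal sketch of that opaque-type arrangement follows, assuming a
hypothetical {\tt HAVE\_BGL} build flag; the structure and function
names are invented for illustration rather than taken from the SLURM
sources:
\begin{verbatim}
/* Sketch only: BGL-specific job options travel through generic SLURM
 * code as an opaque type; on non-BGL builds the management functions
 * exist but perform no work.  HAVE_BGL and all names are illustrative. */
#include <stdlib.h>

typedef struct select_jobinfo select_jobinfo_t;  /* opaque to callers */

#ifdef HAVE_BGL
struct select_jobinfo {
    int geometry[3];  /* requested X, Y, Z dimensions (--geometry)  */
    int rotate;       /* may the geometry be rotated? (--no-rotate) */
    int conn_type;    /* mesh or torus (--comm-type)                */
    int node_use;     /* second c-node processor use (--node-use)   */
};

select_jobinfo_t *select_jobinfo_alloc(void)
{
    return calloc(1, sizeof(struct select_jobinfo));
}

void select_jobinfo_free(select_jobinfo_t *info)
{
    free(info);
}
#else
/* Non-BGL systems: same interface, no work performed. */
select_jobinfo_t *select_jobinfo_alloc(void) { return NULL; }
void select_jobinfo_free(select_jobinfo_t *info) { (void) info; }
#endif

int main(void)
{
    select_jobinfo_t *info = select_jobinfo_alloc();
    select_jobinfo_free(info);
    return 0;
}
\end{verbatim}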
In order to provide users with a clear view of the BGL topology, a new
tool was developed.
...
@@ -354,17 +417,36 @@ Table ~\ref{smap_out}.
\end{center}
\end{table}
Rather than modifying SLURM to initiate and manage the parallel
tasks for BGL jobs, we decided to utilize existing software from IBM.
This eliminated a multitude of software integration issues.
SLURM will manage resources, select resources for the job,
set an environment variable BGL\_PARTITION\_ID, and spawn
a script.
The job will initiate its parallel tasks through the use of
{\em mpirun}.
{\em mpirun} uses BGL-specific APIs to launch and manage the
tasks.
An additional benefit of this architecture is that the single
\slurmd\
for the entire system is relieved of job step management, which
could involve a significant amount of overhead for a computer
of BGL's size.
We disabled SLURM's job step support for normal users to
mitigate the possible impact of users inadvertently attempting
to initiate job steps through SLURM.
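For example, any process started under the job's script could recover
the assigned partition from its environment as in the sketch below;
only the {\tt BGL\_PARTITION\_ID} variable name comes from the text
above, the rest is illustrative:
\begin{verbatim}
/* Sketch: read the partition identifier that SLURM exports to the
 * job's environment before spawning the user's script. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *part = getenv("BGL_PARTITION_ID");
    if (part == NULL) {
        fprintf(stderr, "not running inside a SLURM BGL allocation\n");
        return 1;
    }
    /* IBM's mpirun would then be directed at this partition. */
    printf("allocated BGL partition: %s\n", part);
    return 0;
}
\end{verbatim}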
\section{Blue Gene/L Network Wiring Issues}
TBD
Static partitioning
\section{Results}
TBD
\section{Future Plans}
TBD
Dynamic partitioning
\raggedright
% make the bibliography
...