tud-zih-energy / Slurm
Commit f658ec73
Authored 19 years ago by Moe Jette
Minor changes to formatting and fix some spelling errors.
Parent: 6a71b399
Showing 1 changed file with 17 additions and 17 deletions
doc/html/bluegene.html
@@ -148,12 +148,12 @@ Values in the X dimension increase to the right.
 Values in the Z dimension increase down and toward the left.</p>
 <pre>
-a a a a b b d d ID JOBID PARTITION BGL_BLOCK USER NAME ST TIME NODES NODELIST
-a a a a b b d d a 12345 batch RMP0 joseph tst1 R 43:12 64 bg[000x333]
-a a a a b b c c b 12346 debug RMP1 chris sim3 R 12:34 16 bg[420x533]
-a a a a b b c c c 12350 debug RMP2 danny job3 R 0:12 8 bg[622x733]
-d 12356 debug RMP3 dan colu R 18:05 16 bg[600x731]
-a a a a b b d d e 12378 debug RMP4 joseph asx4 R 0:34 4 bg[612x713]
+a a a a b b d d ID JOBID PARTITION BGL_BLOCK USER NAME ST TIME NODES NODELIST
+a a a a b b d d a 12345 batch RMP0 joseph tst1 R 43:12 64 bg[000x333]
+a a a a b b c c b 12346 debug RMP1 chris sim3 R 12:34 16 bg[420x533]
+a a a a b b c c c 12350 debug RMP2 danny job3 R 0:12 8 bg[622x733]
+d 12356 debug RMP3 dan colu R 18:05 16 bg[600x731]
+a a a a b b d d e 12378 debug RMP4 joseph asx4 R 0:34 4 bg[612x713]
 a a a a b b d d
 a a a a b b c c
 a a a a b b c c
@@ -189,7 +189,7 @@ before draining the associated nodes and aborting the job.</p>
 <p>The job will continue to be in a RUNNING state until the bgjob has
 completed and the bgblock ownership is changed.
-The time for completing a bgjob has freqently been on the order of
+The time for completing a bgjob has frequently been on the order of
 five minutes.
 In summary, your job may appear in SLURM as RUNNING for 15 minutes
 before the script actually begins to 5 minutes after it completes.
@@ -206,8 +206,8 @@ keys scroll the window containing the text information.</p>
 <h3>System Administration</h3>
 <p>As of IBM's REV 2 driver SLURM must be built in 64bit mod.
-This can be done by specifying <i>CFLAGS=-m64 CXX="g++ -m64"</i>. Both CFLAGS
-and CXX must be set for slurm to compile correctly.
+This can be done by specifying <b>CFLAGS=-m64 CXX="g++ -m64"</b>.
+Both CFLAGS and CXX must be set for SLURM to compile correctly.
 <p>Building a Blue Gene compatible system is dependent upon the
 <i>configure</i> program locating some expected files.
 In particular, the configure script searches for <i>libdb2.so</i> in the
@@ -237,7 +237,7 @@ row/rack/midplane data.</p>
 to configure and build two sets of files for installation.
 One set will be for the Service Node (SN), which has direct access to the BG Bridge APIs.
 The second set will be for the Front End Nodes (FEN), whick lack access to the
-Bridge APIs and interact with using Remote Proceedure Calls to the slurmctld daemon.
+Bridge APIs and interact with using Remote Procedure Calls to the slurmctld daemon.
 You should see "#define HAVE_BG 1" and "#define HAVE_FRONT_END 1" in the "config.h"
 file for both the SN and FEN builds.
 You should also see "#define HAVE_BG_FILES 1" in config.h on the SN before
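The config.h check described in the hunk above could be scripted rather than done by eye. The following is only a minimal sketch: the function names and the sample text are hypothetical, and it assumes the three macros appear as plain "#define NAME 1" lines, as quoted in the documentation text.

```python
import re


def defined_macros(config_text):
    """Return {name: value} for each '#define NAME VALUE' line in config_text."""
    # [ \t] instead of \s so a value-less '#define FOO' cannot swallow the next line.
    pattern = re.compile(r"^[ \t]*#[ \t]*define[ \t]+(\w+)[ \t]+(\S+)", re.MULTILINE)
    return {name: value for name, value in pattern.findall(config_text)}


def missing_bluegene_macros(config_text, service_node):
    """List the macros the documentation expects to be 1 but that are not."""
    required = ["HAVE_BG", "HAVE_FRONT_END"]
    if service_node:
        # Per the text, HAVE_BG_FILES is expected in config.h on the SN.
        required.append("HAVE_BG_FILES")
    macros = defined_macros(config_text)
    return [name for name in required if macros.get(name) != "1"]


# Hypothetical config.h content, for illustration only.
sample = "#define HAVE_BG 1\n#define HAVE_FRONT_END 1\n"
print(missing_bluegene_macros(sample, service_node=False))  # []
print(missing_bluegene_macros(sample, service_node=True))   # ['HAVE_BG_FILES']
```

In practice the text passed in would be read from the config.h generated by each build (SN and FEN).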
@@ -297,7 +297,7 @@ etc.). Sample prolog and epilog scripts follow. </p>
 with each other's scheduling, backfill scheduling is not presently meaningful.
 SLURM's builtin scheduler on Blue Gene will sort pending jobs and then attempt
 to schedule all of them in priority order.
-This essentailly functions as if there is a separate queue for each job size.
+This essentially functions as if there is a separate queue for each job size.
 Note that SLURM does support different partitions with an assortment of
 different scheduling parameters.
 For example, SLURM can have defined a partition for full system jobs that
@@ -314,7 +314,7 @@ the scontrol reconfig command. </p>
 "NodeName=bg[000x733] NodeAddr=frontend0 NodeHostname=frontend0 Procs=1024".
 Based on the prefix you give to the noderange in the NodeName= variable
 the bgl blocks will be named by such. Thus this can be anything you want, but
-needs to be consitant throughout the slurm.conf file.
+needs to be consistent throughout the slurm.conf file.
 Note that the values of both NodeAddr and NodeHostname for all
 128 base partitions is the name of the front end node executing
 the slurmd daemon.
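The node range notation in this hunk can be illustrated with a small expansion routine. This is a sketch under an assumption: that the three digits on each side of the "x" are the X, Y and Z coordinates of two opposite corners of a rectangular block. That reading is inferred from the text, where bg[000x733] covers 128 base partitions (8 x 4 x 4), and from the NODES column of the squeue sample earlier in the file; the function name is hypothetical and is not part of SLURM.

```python
import itertools
import re


def expand_bg_range(noderange):
    """Expand 'prefix[XYZxXYZ]' into one name per base partition in the box."""
    match = re.fullmatch(r"(\w+)\[(\d{3})x(\d{3})\]", noderange)
    if match is None:
        raise ValueError(f"unsupported node range: {noderange!r}")
    prefix, start, end = match.groups()
    # One inclusive coordinate range per dimension (X, Y, Z).
    axes = [range(int(lo), int(hi) + 1) for lo, hi in zip(start, end)]
    return [f"{prefix}{x}{y}{z}" for x, y, z in itertools.product(*axes)]


print(len(expand_bg_range("bg[000x733]")))  # 128
print(expand_bg_range("bg[612x713]"))       # ['bg612', 'bg613', 'bg712', 'bg713']
```

Under the same assumption, the other ranges in the squeue sample expand to the node counts shown there (for example, bg[000x333] gives 64 names and bg[420x533] gives 16).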
@@ -365,7 +365,7 @@ both be cold-started (e.g. <b>/etc/init.d/slurm startclean</b>).
 If you which to modify the Image and Numpsets values for existing
 bgblocks, either modify them manually or destroy the bgblocks
 and let SLURM recreate them.
-Note that in addition to the bgblocks defined in blugene.conf, an
+Note that in addition to the bgblocks defined in bluegene.conf, an
 additional bgblock is created containing all resources defined
 all of the other defined bgblocks.
 If you modify the bgblocks, it is recommended that you restart
@@ -394,7 +394,7 @@ bgblocks. A sample <i>bluegene.conf</i> file is shown below.
 # Bridge API logs.
 # BridgeAPIVerbose: How verbose the BG Bridge API logs should be
 # 0: Log only error and warning messages
-# 1: Log level 0 and information messasges
+# 1: Log level 0 and information messages
 # 2: Log level 1 and basic debug messages
 # 3: Log level 2 and more debug message
 # 4: Log all messages
@@ -470,7 +470,7 @@ prior to initiating the SLURM daemons.</p>
 <p>At some time in the future, we expect SLURM to support <i>dynamic
 partitioning</i> in which Blue Gene job partitions are created and destroyed
-as needed to accomodate the workload.
+as needed to accommodate the workload.
 At that time the <i>bluegene.conf</i> configuration file will become obsolete.
 Dynamic partition does involve substantial overhead including the
 rebooting of c-nodes and I/O nodes.</p>
@@ -523,7 +523,7 @@ block on request.
 apply to Blue Gene systems.
 One can start the <b>slurmctld</b> and <b>slurmd</b> in the foreground
 with extensive debugging to establish basic functionality.
-Once runnning in production, the configured <b>SlurmctldLog</b> and
+Once running in production, the configured <b>SlurmctldLog</b> and
 <b>SlurmdLog</b> files will provide historical system information.
 On Blue Gene systems, there is also a <b>BridgeAPILogFile</b> defined
 in <b>bluegene.conf</b> which can be configured to contain detailed
@@ -532,7 +532,7 @@ information about every Bridge API call issued.</p>
 <p>Note that slurmcltld log messages of the sort
 <i>Nodes bg[000x133] not responding</i> are indicative of the slurmd
 daemon serving as a front-end to those nodes is not responding (on
-non-Blue Gene systems, the slurmd actaully does run on the compute
+non-Blue Gene systems, the slurmd actually does run on the compute
 nodes, so the message is more meaningful there).</p>
 <p class="footer"><a href="#top">top</a></p></td>