Commit a9a828ab, authored 13 years ago by Morris Jette
BlueGene/Q web page update
Major update to BlueGene web page specifically to include BlueGene/Q information.
Parent: bb8477b3
Changes: doc/html/bluegene.shtml (1 changed file, 117 additions, 97 deletions)
@@ -12,16 +12,17 @@ described in this document.</p>
 <p>BlueGene systems have several unique features making for a few
 differences in how SLURM operates there.
-The BlueGene system consists of one or more <i>base partitions</i> or
-<i>midplanes</i> connected in a three-dimensional torus.
-Each <i>base partition</i> consists of 512 <i>c-nodes</i> each containing two
-or more cores;
-one designed primarily for managing communications while the others are used
-primarily for computations.
-The <i>c-nodes</i> can execute only one process and thus are unable to execute
-both the user's jobs and SLURM's <i>slurmd</i> daemon.
-Thus the <i>slurmd</i> daemons executes on one or more of the BlueGene <i>Front
-End Nodes</i>.
+BlueGene systems consist of one or more <i>base partitions</i> or
+<i>midplanes</i> connected in a three-dimensional (BlueGene/L and BlueGene/P
+systems) or five-dimensional (BlueGene/Q) torus.
+Each <i>base partition</i> typically includes 512 <i>c-nodes</i> or compute
+nodes, each containing two or more cores;
+one core is typically designed primarily for managing communications while the
+other cores are used primarily for computations.
+Each <i>c-node</i> can execute only one process and thus is unable to execute
+both the user's application and SLURM's <i>slurmd</i> daemon.
+Thus the <i>slurmd</i> daemon(s) execute on one or more of the BlueGene
+<i>Front End Nodes</i>.
 The <i>slurmd</i> daemons provide (almost) all of the normal SLURM services
 for every <i>base partition</i> on the system. </p>
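
For reference, the front end nodes mentioned in this hunk are the hosts where slurmd runs; they are listed in slurm.conf with FrontendName entries. The line below mirrors the sample configuration that appears later in this diff (host names are site specific):

    FrontendName=frontend[00-01] State=UNKNOWN
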
@@ -29,21 +30,21 @@ for every <i>base partition</i> on the system. </p>
 a processor count equal to the number of cores on the base partition, which
 keeps the number of entities being managed by SLURM more reasonable.
 Since the current BlueGene software can sub-allocate a <i>base partition</i>
-into blocks of 32 and/or 128 <i>c-nodes</i>, more than one user job can execute
-on each <i>base partition</i> (subject to system administrator configuration).
+into smaller blocks, more than one user job can execute on each <i>base
+partition</i> (subject to system administrator configuration). In the case of
+BlueGene/Q systems, more than one user job can also execute in each block.
 To effectively utilize this environment, SLURM tools present the user with
 the view that each <i>c-node</i> is a separate node, so allocation requests
-and status information use <i>c-node</i> counts (this is a new feature in
-SLURM version 1.1).
+and status information use <i>c-node</i> counts.
 Since the <i>c-node</i> count can be very large, the suffix "k" can be used
-to represent multiples of 1024 (e.g. "2k" is equivalent to "2048").</p>
+to represent multiples of 1024 or "m" for multiples of 1,048,576 (1024 x 1024).
+For example, "2k" is equivalent to "2048".</p>
 <h2>User Tools</h2>
-<p>The normal set of SLURM user tools: sbatch, scancel, sinfo, squeue, and scontrol
-provide all of the expected services except support for job steps.
-SLURM performs resource allocation for the job, but initiation of tasks is performed
-using the <i>mpirun</i> command. SLURM has no concept of a job step on BlueGene.
+<p>The normal set of SLURM user tools: sbatch, scancel, sinfo, squeue, and
+scontrol provide all of the expected services except support for job steps,
+which is detailed later.
 Seven new sbatch options are available:
 <i>--geometry</i> (specify job size in each dimension),
 <i>--no-rotate</i> (disable rotation of geometry),
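
To illustrate the c-node counting and the options above, a hypothetical submission requesting 2048 c-nodes (2k) could look like the following; the script name is invented and the geometry is assumed to be given in midplanes (4 midplanes x 512 c-nodes = 2k):

    sbatch --nodes=2k --geometry=1x2x2 --no-rotate my_bg_job.sh
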
@@ -56,25 +57,25 @@ Seven new sbatch options are available:
 <i>--ramdisk-image</i> (specify alternative ramdisk image for bluegene block. Default if not set, BGL only.)
 The <i>--nodes</i> option with a minimum and (optionally) maximum node count continues
 to be available.
 Note that this is a c-node count.</p>
-<p>To reiterate: sbatch is used to submit a job script,
-but mpirun is used to launch the parallel tasks.
-Note that a SLURM batch job's default stdout and stderr file names are generated
-using the SLURM job ID.
-When the SLURM control daemon is restarted, SLURM job ID values can be repeated,
-therefore it is recommended that batch jobs explicitly specify unique names for
-stdout and stderr files using the srun options <i>--output</i> and <i>--error</i>
-respectively.
-While the salloc command may be used to create an interactive SLURM job,
-it will be the responsibility of the user to insure that the <i>bgblock</i>
-is ready for use before initiating any mpirun commands.
-SLURM will assume this responsibility for batch jobs.
-The script that you submit to SLURM can contain multiple invocations of mpirun as
-well as any desired commands for pre- and post-processing.
+<h3>Task Launch on BlueGene/Q only</h3>
+<p>Use SLURM's srun command to launch tasks (srun is a wrapper for IBM's
+<i>runjob</i> command).
+SLURM job step information, including accounting, functions as expected.</p>
+<h3>Task Launch on BlueGene/L and BlueGene/P only</h3>
+<p>SLURM performs resource allocation for the job, but initiation of tasks is
+performed using the <i>mpirun</i> command. SLURM has no concept of a job step
+on BlueGene/L or BlueGene/P systems.
+To reiterate: salloc or sbatch are used to create a job allocation, but
+<i>mpirun</i> is used to launch the parallel tasks.
+The script that you submit to SLURM can contain multiple invocations of mpirun
+as well as any desired commands for pre- and post-processing.
 The mpirun command will get its <i>bgblock</i> information from the
 <i>MPIRUN_PARTITION</i> as set by SLURM. A sample script is shown below.
+</p>
 <pre>
 #!/bin/bash
 # pre-processing
@@ -84,45 +85,47 @@ mpirun -exec /home/user/prog -cwd /home/user -args 123
 mpirun -exec /home/user/prog -cwd /home/user -args 124
 # post-processing
 date
 </pre>
-</p>
 <h3><a name="naming">Naming Conventions</a></h3>
-<p>The naming of base partitions includes a three-digit suffix representing the its
-coordinates in the X, Y and Z dimensions with a zero origin.
-For example, "bg012" represents the base partition whose coordinate is at X=0, Y=1 and Z=2. In a system
-configured with <i>small blocks</i> (any block less than a full base partition) there will be divisions
-into the base partition notation. For example, if there were 64 psets in the
-configuration, bg012[0-15] represents
-the first quarter or first 16 ionodes of a midplane. In BlueGene/L
-this would be 128 c-node block. To represent the first nodecard in the
-second quarter or ionodes 16-19 the notation would be bg012[16-19], or
-a 32 c-node block.
-Since jobs must allocate consecutive base partitions in all three dimensions, we have developed
-an abbreviated format for describing the base partitions in one of these three-dimensional blocks.
-The base partition has a prefix determined from the system which is followed by the end-points
-of the block enclosed in square-brackets and separated by an "x".
-For example, "bg[620x731]" is used to represent the eight base partitions enclosed in a block
-with end-points and bg620 and bg731 (bg620, bg621, bg630, bg631, bg720, bg721,
-bg730 and bg731).</p></a>
-<p>
-<b>IMPORTANT:</b> SLURM version 1.2 or higher can handle a bluegene system of
-sizes up to 36x36x36. To try to keep with the 'three-digit suffix
-representing the its coordinates in the X, Y and Z dimensions with a
-zero origin', we now support A-Z as valid numbers. This makes it so
-the prefix <b>must always be lower case</b>, and any letters in the
-three-digit suffix <b> must always be upper case</b>. This schema
-should be used in your slurm.conf file and in your bluegene.conf file
-if you put a prefix there even though it is not necessary there. This
-schema should also be used to specify midplanes or locations in
-configure mode of smap.
+<p>The naming of base partitions includes a numeric suffix representing its
+coordinates with a zero origin. The suffix contains three digits on BlueGene/L
+and BlueGene/P systems, while four digits are required for BlueGene/Q
+systems. For example, "bgp012" represents the base partition whose coordinate
+is at X=0, Y=1 and Z=2.
+SLURM uses an abbreviated format for describing base partitions in which the
+end-points of the block enclosed are in square-brackets and separated by an "x".
+For example, "bgp[620x731]" is used to represent the eight base partitions
+enclosed in a block with end-points of bgp620 and bgp731 (bgp620, bgp621,
+bgp630, bgp631, bgp720, bgp721, bgp730 and bgp731).</p>
+<p><b>IMPORTANT:</b> SLURM can support up to 36 elements in each
+BlueGene dimension by supporting "A-Z" as valid numbers. SLURM requires the
+prefix to be lower case and any letters in the suffix must always be upper
+case. This schema must be used in both the slurm.conf and bluegene.conf
+configuration files when specifying midplane/node names (the prefix is
+optional). This schema should also be used to specify midplanes or locations
+in configure mode of smap:
 <br>
 valid: bgl[000xC44], bgl000, bglZZZ
 <br>
 invalid: BGL[000xC44], BglC00, bglb00, Bglzzz
 </p>
+<p>In a system configured with <i>small blocks</i> (any block less
+than a full base partition) there will be divisions in the base partition
+notation. On BlueGene/L and BlueGene/P systems, the base partition name may
+be followed by a square bracket enclosing ID numbers of the IO nodes associated
+with the block. For example, if there are 64 psets in a BlueGene/L
+configuration, "bgl012[0-15]" represents the first quarter or first 16 IO nodes
+of a midplane. In BlueGene/L this would be a 128 c-node block. To represent
+the first nodecard in the second quarter or IO nodes 16-19, the notation would
+be "bgl012[16-19]", or a 32 c-node block. On BlueGene/Q systems, the specific
+c-nodes would be identified in square brackets using their five digit
+coordinates. For example "bgq0123[00000x11111]" would represent the 32 c-nodes
+in midplane "bgq0123" having coordinates (within that midplane) from zero to
+one in each of the five dimensions.</p>
 <p>Two topology-aware graphical user interfaces are provided: <i>smap</i> and
 <i>sview</i> (<i>sview</i> provides more viewing and configuring options).
 See each command's man page for details.
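
As a hypothetical illustration of the abbreviated notation above, a block of BlueGene/P midplanes could be listed in slurm.conf like this (mirroring the NodeName entry shown later in this diff; the range and CPU count are invented):

    NodeName=bgp[000x133] CPUs=1024 State=UNKNOWN   # bgp000 through bgp133: 2 x 4 x 4 = 32 midplanes
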
@@ -235,9 +238,9 @@ distributed among the slurmd daemons to balance the workload.
 You can use the scontrol command to drain individual compute nodes as desired
 and return them to service.</p>
 <p>The <i>slurm.conf</i> (configuration) file needs to have the value of
 <i>InactiveLimit</i>
 set to zero or not specified (it defaults to a value of zero).
-This is because there are no job steps and
+This is because if there are no job steps,
 we don't want to purge jobs prematurely.
 The value of <i>SelectType</i> must be set to "select/bluegene" in order to have
 node selection performed using a system aware of the system's topography
 and interfaces.
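
A minimal slurm.conf fragment consistent with this hunk would contain the two settings below; all other parameters are site specific and omitted here:

    InactiveLimit=0              # or leave unset; it defaults to zero
    SelectType=select/bluegene
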
@@ -327,8 +330,8 @@ FrontendName=frontend[00-01] State=UNKNOWN
 NodeName=bg[000x733] CPUs=1024 State=UNKNOWN
 </pre>
-<p>While users are unable to initiate SLURM job steps on BlueGene systems,
+<p>While users are unable to initiate SLURM job steps on BlueGene/L or
+BlueGene/P systems,
 this restriction does not apply to user root or <i>SlurmUser</i>.
 Be advised that the slurmd daemon is unable to manage a large number of job
 steps, so this ability should be used only to verify normal SLURM operation.
 If large numbers of job steps are initiated by slurmd, expect the daemon to
@@ -347,7 +350,7 @@ System administrators should use the <i>smap</i> tool to build appropriate
 configuration file for static partitioning.
 Note that <i>smap -Dc</i> can be run without the SLURM daemons
 active to establish the initial configuration.
-Note that the defined bgblocks may not overlap (except for the
+Note that the bgblocks defined using smap may not overlap (except for the
 full-system bgblock, which is implicitly created).
 See the smap man page for more information.</p>
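
For reference, the configure mode mentioned above is entered with the command below; it can be run before the SLURM daemons are started and is used to build the initial static bgblock layout for bluegene.conf:

    smap -Dc
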
@@ -405,12 +408,12 @@ for this mode.</p>
 (i.e. "<i>scontrol update BlockName=RMP0 state=error</i>").
 This will end any job on the block and set the state of the block to ERROR
 making it so no job will run on the block. To set it back to a usable
-state set the state to free (i.e.
+state, set the state to free (i.e.
 "<i>scontrol update BlockName=RMP0 state=free</i>").
 <p>Alternatively, if only part of a base partition needs to be put
 into an error state which isn't already in a block of the size you
-need, you can set a set of io nodes into an error state with scontrol,
+need, you can set a collection of IO nodes into an error state using scontrol
 (i.e. "<i>scontrol update subbpname=bg000[0-3] state=error</i>").
 This will end any job on the nodes listed, create a block there, and set
 the state of the block to ERROR making it so no job will run on the
@@ -432,14 +435,19 @@ file (i.e. <i>BasePartitionNodeCnt=512</i> and <i>NodeCardNodeCnt=32</i>).</p>
 <p>Note that the <i>Numpsets</i> values defined in
 <i>bluegene.conf</i> is used only when SLURM creates bgblocks this
-determines if the system is IO rich or not. For most bluegene/L
+determines if the system is IO rich or not. For most BlueGene/L
 systems this value is either 8 (for IO poor systems) or 64 (for IO rich
-systems).
-<p>The <i>Images</i> can change during job start based on input from
-the user.
+systems).</p>
+<p>The <i>Images</i> file specifications identify which images are used when
+booting a bgblock and the valid images are different for each BlueGene system
+type (e.g. L, P and Q). Their values can change during job allocation based on
+input from the user.
 If you change the bgblock layout, then slurmctld and slurmd should
-both be cold-started (e.g. <b>/etc/init.d/slurm startclean</b>).
-If you wish to modify the <i>Numpsets</i> values
+both be cold-started (without preserving any state information,
+"/etc/init.d/slurm startclean").</p>
+<p>If you wish to modify the <i>Numpsets</i> values
 for existing bgblocks, either modify them manually or destroy the bgblocks
 and let SLURM recreate them.
 Note that in addition to the bgblocks defined in <i>bluegene.conf</i>, an
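
As a hedged illustration of the Numpsets guidance above, an IO-rich BlueGene/L system might carry the following bluegene.conf line (a value of 8 would indicate an IO-poor system):

    Numpsets=64
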
@@ -450,7 +458,7 @@ bgblocks.
 A sample <i>bluegene.conf</i> file is shown below.
 <pre>
 ###############################################################################
-# Global specifications for BlueGene system
+# Global specifications for a BlueGene/L system
 #
 # BlrtsImage: BlrtsImage used for creation of all bgblocks.
 # LinuxImage: LinuxImage used for creation of all bgblocks.
@@ -552,7 +560,7 @@ BridgeAPIVerbose=0
 # volume = 1x1x1 = 1
 BPs=[000x000] Type=TORUS # 1x1x1 = 1 midplane
 BPs=[001x001] Type=SMALL 32CNBlocks=4 128CNBlocks=3 # 1x1x1 = 4-Nodecard sized
-# cnode blocks 3-Base
+# c-node blocks 3-Base
 # Partition Quarter sized
 # c-node blocks
@@ -560,8 +568,8 @@ BPs=[001x001] Type=SMALL 32CNBlocks=4 128CNBlocks=3 # 1x1x1 = 4-Nodecard sized
 <p>The above <i>bluegene.conf</i> file defines multiple bgblocks to be
 created in a single midplane (see the "SMALL" option).
-Using this mechanism, up to 32 independent jobs each consisting of 1
-32 cnodes can be executed
+Using this mechanism, up to 32 independent jobs each consisting of
+32 c-nodes can be executed
 simultaneously on a one-rack BlueGene system.
 If defining bgblocks of <i>Type=SMALL</i>, the SLURM partition
 containing them as defined in <i>slurm.conf</i> must have the
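
(For the arithmetic behind the 32-job figure: a one-rack BlueGene/L system holds two midplanes of 512 c-nodes, i.e. 1024 c-nodes, and 1024 / 32 = 32 blocks of 32 c-nodes.)
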
@@ -573,9 +581,10 @@ scheduler performance.
 As in all SLURM configuration files, parameters and values
 are case insensitive.</p>
-<p> With a BlueGene/P system the image names are different. The
-correct image names are CnloadImage, MloaderImage, and IoloadImage.
-You can also use alternate images just the same as described above.
+<p>The valid image names on a BlueGene/P system are CnloadImage, MloaderImage,
+and IoloadImage. The only image name on BlueGene/Q systems is MloaderImage.
+Alternate images may be specified as described above for all BlueGene system
+types.</p>
 <p>One more thing is required to support SLURM interactions with
 the DB2 database (at least as of the time this was written).
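
A hypothetical bluegene.conf fragment for a BlueGene/P system using these image names might look like the following; the paths are placeholders only:

    CnloadImage=/path/to/default/cnload
    MloaderImage=/path/to/default/mloader
    IoloadImage=/path/to/default/ioload
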
@@ -622,9 +631,9 @@ repeated reboots and the likely failure of user jobs.
 A system administrator should address the problem before returning
 the base partitions to service.</p>
-<p>If you cold-start slurmctld (<b>/etc/init.d/slurm startclean</b>
-or <b>slurmctld -c</b>) it is recommended that you also cold-start
-the slurmd at the same time.
+<p>If the slurmctld daemon is cold-started (<b>/etc/init.d/slurm startclean</b>
+or <b>slurmctld -c</b>) it is recommended that the slurmd daemon(s) be
+cold-started at the same time.
 Failure to do so may result in errors being reported by both slurmd
 and slurmctld due to bgblocks that previously existed being deleted.</p>
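
In practice that means running the cold-start command below on the host running slurmctld and on every front end node running slurmd at roughly the same time:

    /etc/init.d/slurm startclean
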
@@ -635,11 +644,18 @@ Run <i>sfree --help</i> for more information.</p>
 <h4>Resource Reservations</h4>
+<p>SLURM's advance reservation mechanism can accept a node count specification
+as input rather than identification of specific nodes/midplanes. In that case,
+SLURM may reserve nodes/midplanes which may not be formed into an appropriate
+bgblock. Work is planned for SLURM version 2.4 to remedy this problem. Until
+that time, identifying the specific nodes/midplanes to be included in an
+advanced reservation may be necessary.</p>
 <p>SLURM's advance reservation mechanism is designed to reserve resources
 at the level of whole nodes, which on a BlueGene systems would represent
 whole midplanes. In order to support advanced reservations with a finer
-grained resolution, you can configure one license per cnode on the system
-and reserve cnodes instead of entire midplanes. Note that reserved licenses
+grained resolution, you can configure one license per c-node on the system
+and reserve c-nodes instead of entire midplanes. Note that reserved licenses
 are treated somewhat differently than reserved nodes. When nodes are reserved
 then jobs using that reservation can use only those nodes. Reserved licenses
 can only be used by jobs associated with that reservation, but licenses not
@@ -649,11 +665,11 @@ explicitly reserved are available to any job.</p>
 "<i>Licenses=cnode*512</i>". Then create an advanced reservation with a
 command like this:<br>
 "<i>scontrol create reservation licenses="cnode*32" starttime=now duration=30:00 users=joe</i>".<br>
-Jobs run in this reservation will then have <b>at least</b> 32 cnodes
+Jobs run in this reservation will then have <b>at least</b> 32 c-nodes
 available for their use, but could use more given an appropriate workload.</p>
 <p>There is also a job_submit/cnode plugin available for use that will
-automatically set a job's license specification to match its cnode request
+automatically set a job's license specification to match its c-node request
 (i.e. a command like<br>
 "<i>sbatch -N32 my.sh</i>" would automatically be translated to<br>
 "<i>sbatch -N32 --licenses=cnode*32 my.sh</i>" by the slurmctld daemon.
@@ -685,18 +701,22 @@ Run <b>configure</b> with the <b>--enable-bgl-emulation</b> option.
 This will define "HAVE_BG", "HAVE_BGL", and "HAVE_FRONT_END" in the
 config.h file.
 You can also emulate a BlueGene/P system with
 the <b>--enable-bgp-emulation</b> option.
 This will define "HAVE_BG", "HAVE_BGP", and "HAVE_FRONT_END" in the
 config.h file.
+You can also emulate a BlueGene/Q system using
+the <b>--enable-bgq-emulation</b> option.
+This will define "HAVE_BG", "HAVE_BGQ", and "HAVE_FRONT_END" in the
+config.h file.
 Then execute <b>make</b> normally.
 These variables will build the code as if it were running
 on an actual BlueGene computer, but avoid making calls to the
-Bridge libary (that is controlled by the variable "HAVE_BG_FILES",
+Bridge library (that is controlled by the variable "HAVE_BG_FILES",
 which is left undefined). You can use this to test configurations,
 scheduling logic, etc. </p>
 <p class="footer"><a href="#top">top</a></p>
-<p style="text-align:center;">Last modified 9 March 2011</p>
+<p style="text-align:center;">Last modified 16 August 2011</p>
 <!--#include virtual="footer.txt"-->
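
Putting the emulation options above together, a BlueGene/Q emulation build from the top of the SLURM source tree would look roughly like this:

    ./configure --enable-bgq-emulation
    make
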