Commit e63fd033 authored by Moe Jette

major update for slurm v1.3

parent 548700be
...@@ -28,6 +28,74 @@ configuration file. ...@@ -28,6 +28,74 @@ configuration file.
.LP .LP
The overall configuration parameters available include: The overall configuration parameters available include:
.TP
\fBAccountingStorageEnforce\fR
If set to a non-zero value and the user, partition, account association is not
defined for a job in the accounting database, then prevent the job from being
executed.
The default value is zero.
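The check described above can be sketched as follows. The sketch assumes the accounting database is modeled as an in-memory set of (user, account, partition) associations. Association and admit_job are hypothetical names used only for illustration and are not part of SLURM or its API.

```python
from typing import NamedTuple


class Association(NamedTuple):
    """Hypothetical model of one user/account/partition association."""
    user: str
    account: str
    partition: str


def admit_job(enforce: int, known: set[Association],
              user: str, account: str, partition: str) -> bool:
    """Return True if a job may run under the given AccountingStorageEnforce."""
    if enforce == 0:
        # Default of zero: associations are not checked.
        return True
    # Non-zero: the job's association must already be defined.
    return Association(user, account, partition) in known


if __name__ == "__main__":
    db = {Association("alice", "physics", "debug")}
    print(admit_job(1, db, "alice", "physics", "debug"))  # True
    print(admit_job(1, db, "bob", "physics", "debug"))    # False
    print(admit_job(0, db, "bob", "physics", "debug"))    # True
```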
.TP
\fBAccountingStorageHost\fR
Define the name of the host where the database used to store the
accounting data is running.
Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStorageHost\fR.
.TP
\fBAccountingStorageLoc\fR
Specifies the location of the file or database where accounting
records are written.
Also see \fBDefaultStorageLoc\fR.
.TP
\fBAccountingStoragePass\fR
Define the password used to gain access to the database to store the
accounting data.
Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStoragePass\fR.
.TP
\fBAccountingStoragePort\fR
Define the port on which the database server used to store the
accounting data is listening.
Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStoragePort\fR.
.TP
\fBAccountingStorageType\fR
Define the accounting storage mechanism type.
Acceptable values at present include
"accounting_storage/filetxt", "accounting_storage/gold",
"accounting_storage/mysql", "accounting_storage/none",
"accounting_storage/pgsql", and "accounting_storage/slurmdbd".
The value "accounting_storage/filetxt" indicates that accounting records
will be written to the file specified by the
\fBAccountingStorageLoc\fR parameter.
The value "accounting_storage/gold" indicates that accounting records
will be written to Gold
(http://www.clusterresources.com/pages/products/gold-allocation-manager.ph),
which maintains its own database.
The value "accounting_storage/mysql" indicates that accounting records
should be written to a MySQL database specified by the
\fBAccountingStorageLoc\fR parameter.
The default value is "accounting_storage/none", which means that
accounting records are not maintained.
The value "accounting_storage/pgsql" indicates that accounting records
should be written to a PostgreSQL database specified by the
\fBAccountingStorageLoc\fR parameter.
The value "accounting_storage/slurmdbd" indicates that accounting records
will be written to SlurmDbd, which maintains its own database. See
"man slurmdbd" for more information.
Also see \fBDefaultStorageType\fR.
.TP
\fBAccountingStorageUser\fR
Define the name of the user used to connect to the database that
stores the accounting data.
Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStorageUser\fR.
.TP .TP
\fBAuthType\fR \fBAuthType\fR
Define the authentication method for communications between SLURM Define the authentication method for communications between SLURM
...@@ -39,8 +107,9 @@ communication messages is not verified. ...@@ -39,8 +107,9 @@ communication messages is not verified.
This may be fine for testing purposes, but This may be fine for testing purposes, but
\fBdo not use "auth/none" if you desire any security\fR. \fBdo not use "auth/none" if you desire any security\fR.
"auth/authd" indicates that Brett Chun's authd is to be used (see "auth/authd" indicates that Brett Chun's authd is to be used (see
"http://www.theether.org/authd/" for more information). "http://www.theether.org/authd/" for more information. Note that
"auth/munge" indicates that Chris Dunlap's munge is to be used authd is no longer actively supported).
"auth/munge" indicates that LLNL's MUNGE is to be used
(this is the best supported authentication mechanism for SLURM, (this is the best supported authentication mechanism for SLURM,
see "http://home.gna.org/munge/" for more information). see "http://home.gna.org/munge/" for more information).
All SLURM daemons and commands must be terminated prior to changing All SLURM daemons and commands must be terminated prior to changing
...@@ -87,6 +156,12 @@ Acceptable values at present include ...@@ -87,6 +156,12 @@ Acceptable values at present include
"checkpoint/none". "checkpoint/none".
The default value is "checkpoint/none". The default value is "checkpoint/none".
.TP
\fBClusterName\fR
The name by which this SLURM managed cluster is known for accounting
purposes. This is needed to distinguish between accounting data from
multiple clusters being recorded in a single database.
.TP .TP
\fBControlAddr\fR \fBControlAddr\fR
Name that \fBControlMachine\fR should be referred to in Name that \fBControlMachine\fR should be referred to in
...@@ -113,17 +188,69 @@ to take effect. ...@@ -113,17 +188,69 @@ to take effect.
Acceptable values at present include "crypto/munge" and "crypto/openssl". Acceptable values at present include "crypto/munge" and "crypto/openssl".
OpenSSL offers the best performance and is available with an OpenSSL offers the best performance and is available with an
Apache style open source license. Apache style open source license.
Munge is a little slower, but is availble under the Gnu General Public Munge is a little slower, but is available under the Gnu General Public
License (GPL). License (GPL).
The default value is "crypto/openssl". The default value is "crypto/openssl".
.TP .TP
\fBDefMemPerTask\fR \fBDefMemPerTask\fR
Default real memory size availble per task in MegaBytes. Default real memory size available per task in MegaBytes.
Used to avoid over\-subscribing memory and causing paging. Used to avoid over\-subscribing memory and causing paging.
Also see \fBMaxMemPerTask\fR. Also see \fBMaxMemPerTask\fR.
The default value is 0 (unlimited). The default value is 0 (unlimited).
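A minimal sketch of how a per-task memory default can prevent over-subscription, assuming that a job giving no per-task memory request receives DefMemPerTask and that the total is compared against the node's RealMemory. The function name is hypothetical, and the sketch ignores memory already used by other jobs on the node.

```python
from typing import Optional


def tasks_fit(real_memory_mb: int, tasks: int,
              requested_mb_per_task: Optional[int] = None,
              def_mem_per_task: int = 0) -> bool:
    """Check whether `tasks` tasks fit in a node's RealMemory.

    A per-task value of 0 means unlimited, so no memory check is made.
    """
    per_task = (def_mem_per_task if requested_mb_per_task is None
                else requested_mb_per_task)
    if per_task == 0:
        return True
    return tasks * per_task <= real_memory_mb


if __name__ == "__main__":
    print(tasks_fit(2048, 4, def_mem_per_task=512))  # True: 4 * 512 = 2048
    print(tasks_fit(2048, 5, def_mem_per_task=512))  # False: 2560 > 2048
    print(tasks_fit(2048, 100))                      # True: 0 means unlimited
```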
.TP
\fBDefaultStorageHost\fR
Define the name of the host where the database used to store the
accounting and job completion data is running.
Only used for database type storage plugins, ignored otherwise.
Also see \fBAccountingStorageHost\fR and \fBJobCompHost\fR.
.TP
\fBDefaultStorageLoc\fR
Specifies the location of the file or database where accounting
and job completion records are written.
Also see \fBAccountingStorageLoc\fR and \fBJobCompLoc\fR.
.TP
\fBDefaultStoragePass\fR
Define the password used to gain access to the database to store the
accounting and job completion data.
Only used for database type storage plugins, ignored otherwise.
Also see \fBAccountingStoragePass\fR and \fBJobCompPass\fR.
.TP
\fBDefaultStoragePort\fR
Define the port on which the database server used to store the
accounting and job completion data is listening.
Only used for database type storage plugins, ignored otherwise.
Also see \fBAccountingStoragePort\fR and \fBJobCompPort\fR.
.TP
\fBDefaultStorageType\fR
Define the accounting and job completion storage mechanism type.
Acceptable values at present include
"filetxt", "gold", "mysql", "none", "pgsql", and "slurmdbd".
The value "filetxt" indicates that records will be written to a file.
The value "gold" indicates that records will be written to Gold
(http://www.clusterresources.com/pages/products/gold-allocation-manager.ph),
which maintains its own database.
The value "mysql" indicates that records will be written to
a MySQL database.
The default value is "none", which means that records are not maintained.
The value "pgsql" indicates that records will be written to a PostgreSQL
database.
The value "slurmdbd" indicates that records will be written to SlurmDbd,
which maintains its own database. See "man slurmdbd" for more information.
Also see \fBAccountingStorageType\fR and \fBJobCompType\fR.
.TP
\fBDefaultStorageUser\fR
Define the name of the user used to connect to the database that
stores the accounting and job completion data.
Only used for database type storage plugins, ignored otherwise.
Also see \fBAccountingStorageUser\fR and \fBJobCompUser\fR.
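The relationship between the DefaultStorage* parameters and the more specific AccountingStorage* and JobComp* parameters can be sketched as below. The sketch assumes that an explicitly set specific parameter wins, that the DefaultStorage* value is used otherwise, and that a short DefaultStorageType such as "mysql" maps to "accounting_storage/mysql" or "jobcomp/mysql". Both parse_conf and resolve are hypothetical helpers. The parser is deliberately simplified (it treats every "#" as the start of a comment) and does not validate plugin names, so "gold", which has no jobcomp plugin in the lists above, is not rejected.

```python
from typing import Optional

# Parameter-name prefixes and plugin-type prefixes from the entries above.
PREFIX = {"accounting": "AccountingStorage", "jobcomp": "JobComp"}
PLUGIN = {"accounting": "accounting_storage/", "jobcomp": "jobcomp/"}


def parse_conf(text: str) -> dict[str, str]:
    """Tiny slurm.conf reader: '#' starts a comment, Key=Value tokens per line."""
    conf: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]
        for token in line.split():
            key, sep, value = token.partition("=")
            if sep:
                conf[key] = value
    return conf


def resolve(conf: dict[str, str], family: str, field: str) -> Optional[str]:
    """Effective Host/Loc/Pass/Port/Type/User for 'accounting' or 'jobcomp'."""
    explicit = conf.get(PREFIX[family] + field)
    if explicit is not None:
        return explicit  # a specific parameter overrides the default
    default = conf.get("DefaultStorage" + field)
    if default is not None and field == "Type":
        # DefaultStorageType uses short names such as "mysql".
        return PLUGIN[family] + default
    return default


if __name__ == "__main__":
    sample = """
DefaultStorageType=mysql
DefaultStorageHost=db1   # shared database host
JobCompType=jobcomp/filetxt
JobCompLoc=/var/log/slurm/jobcomp
"""
    conf = parse_conf(sample)
    print(resolve(conf, "accounting", "Type"))  # accounting_storage/mysql
    print(resolve(conf, "jobcomp", "Type"))     # jobcomp/filetxt
    print(resolve(conf, "jobcomp", "Host"))     # db1
```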
.TP .TP
\fBEpilog\fR \fBEpilog\fR
Fully qualified pathname of a script to execute as user root on every Fully qualified pathname of a script to execute as user root on every
...@@ -158,7 +285,7 @@ Consider which value you want to be used for scheduling purposes. ...@@ -158,7 +285,7 @@ Consider which value you want to be used for scheduling purposes.
\fB1\fR (default) \fB1\fR (default)
Consider the configuration of each node to be that specified in the Consider the configuration of each node to be that specified in the
configuration file and any node with less configuration file and any node with less
than the configured resouces will be set DOWN. than the configured resources will be set DOWN.
.TP .TP
\fB0\fR \fB0\fR
Base scheduling decisions upon the actual configuration of Base scheduling decisions upon the actual configuration of
...@@ -243,52 +370,37 @@ A value of zero disables real the periodic job sampling and provides accounting ...@@ -243,52 +370,37 @@ A value of zero disables real the periodic job sampling and provides accounting
information only on job termination (reducing SLURM interference with the job). information only on job termination (reducing SLURM interference with the job).
.TP .TP
\fBJobAcctStorageType\fR \fBJobCompHost\fR
Define the job accounting storage mechanism type. Define the name of the host where the database is running and used
Acceptable values at present include "jobacct_storage/none", "jobacct_storage/filetxt", to store the job completion data.
"jobacct_storage/mysql", "jobacct_storage/pgsql", and "jobacct_storage/script".
The default value is "jobacct_storage/none", which means that job
accounting isn't recorded for the system.
The value "jobacct_storage/filetxt" indicates that a record of the job should be
written to a text file specified by the \fBJobAcctStorageLoc\fR parameter.
The value "jobacct_storage/mysql" indicates that a record of the job should be
written to a mysql database specified by the \fBJobAcctStorageLoc\fR parameter.
The value "jobacct_storage/pgsql" indicates that a record of the job should be
written to a postresql database specified by the \fBJobAcctStorageLoc\fR parameter.
.TP
\fBJobAcctStorageLoc\fR
Define the location where job accounting logs are to be written either
a filename or a database name.
.TP
\fBJobAcctStorageHost\fR
Define the name of the host the database is running where we are going
to store the job accounting data.
Only used for database type storage plugins, ignored otherwise. Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStorageHost\fR.
.TP .TP
\fBJobAcctStoragePort\fR \fBJobCompLoc\fR
Define the port the database server is listening on where we are going The interpretation of this value depends upon the logging mechanism
to store the job accounting data. specified by the \fBJobCompType\fR parameter: either a filename or a
Only used for database type storage plugins, ignored otherwise. database name.
Also see \fBDefaultStorageLoc\fR.
.TP .TP
\fBJobAcctStorageUser\fR \fBJobCompPass\fR
Define the name of the user we are going to connect to the database Define the password used to gain access to the database to store the job completion data.
with to store the job accounting data.
Only used for database type storage plugins, ignored otherwise. Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStoragePass\fR.
.TP .TP
\fBJobAcctStoragePass\fR \fBJobCompPort\fR
Define the password used to gain access to the database to store the job accounting data. Define the port on which the database server used to store
the job completion data is listening.
Only used for database type storage plugins, ignored otherwise. Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStoragePort\fR.
.TP .TP
\fBJobCompType\fR \fBJobCompType\fR
Define the job completion logging mechanism type. Define the job completion logging mechanism type.
Acceptable values at present include "jobcomp/none", "jobcomp/filetxt", Acceptable values at present include "jobcomp/none", "jobcomp/filetxt",
"jobcomp/mysql", "jobcomp/pgsql", and "jobcomp/script". "jobcomp/mysql", "jobcomp/pgsql", "jobcomp/script", and "jobcomp/slurmdbd".
The default value is "jobcomp/none", which means that upon job completion The default value is "jobcomp/none", which means that upon job completion
the record of the job is purged from the system. the record of the job is purged from the system.
The value "jobcomp/filetxt" indicates that a record of the job should be The value "jobcomp/filetxt" indicates that a record of the job should be
...@@ -300,35 +412,17 @@ written to a postgresql database specified by the \fBJobCompLoc\fR parameter. ...@@ -300,35 +412,17 @@ written to a postgresql database specified by the \fBJobCompLoc\fR parameter.
The value "jobcomp/script" indicates that a script specified by the The value "jobcomp/script" indicates that a script specified by the
\fBJobCompLoc\fR parameter is to be executed with environment variables \fBJobCompLoc\fR parameter is to be executed with environment variables
indicating the job information. indicating the job information.
The value "jobcomp/slurmdbd" indicates that job completion records
.TP will be written to SlurmDbd, which maintains its own database. See
\fBJobCompLoc\fR "man slurmdbd" for more information.
The interpretation of this value depends upon the logging mechanism Also see \fBDefaultStorageType\fR.
specified by the \fBJobCompType\fR parameter either a filename or a
database name.
.TP
\fBJobCompHost\fR
Define the name of the host the database is running where we are going
to store the job completion data.
Only used for database type storage plugins, ignored otherwise.
.TP
\fBJobCompPort\fR
Define the port the database server is listening on where we are going
to store the job completion data.
Only used for database type storage plugins, ignored otherwise.
.TP .TP
\fBJobCompUser\fR \fBJobCompUser\fR
Define the name of the user used to connect to the database Define the name of the user used to connect to the database
that stores the job completion data. that stores the job completion data.
Only used for database type storage plugins, ignored otherwise. Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStorageUser\fR.
.TP
\fBJobCompPass\fR
Define the password used to gain access to the database to store the job completion data.
Only used for database type storage plugins, ignored otherwise.
.TP .TP
\fBJobCredentialPrivateKey\fR \fBJobCredentialPrivateKey\fR
...@@ -361,17 +455,12 @@ Use the \fBsbatch\fR \fI\-\-no\-requeue\fR or \fI\-\-requeue\fR ...@@ -361,17 +455,12 @@ Use the \fBsbatch\fR \fI\-\-no\-requeue\fR or \fI\-\-requeue\fR
option to change the default behavior for individual jobs. option to change the default behavior for individual jobs.
The default value is 1. The default value is 1.
.TP
\fBKillTree\fR
This option is mapped to "ProctrackType=proctrack/linuxproc".
It will be removed from a future release.
.TP .TP
\fBKillWait\fR \fBKillWait\fR
The interval, in seconds, given to a job's processes between the The interval, in seconds, given to a job's processes between the
SIGTERM and SIGKILL signals upon reaching its time limit. SIGTERM and SIGKILL signals upon reaching its time limit.
If the job fails to terminate gracefully If the job fails to terminate gracefully
in the interval specified, it will be forcably terminated. in the interval specified, it will be forcibly terminated.
The default value is 30 seconds. The default value is 30 seconds.
May not exceed 65533. May not exceed 65533.
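The SIGTERM-then-SIGKILL sequence can be illustrated for a single child process with Python's standard subprocess and signal modules (POSIX only). This is a sketch of the general pattern, assuming one process rather than all of a job's processes; stop_process is a hypothetical name and does not reflect how slurmd itself is implemented. The lower bound of 0 on the grace period is also an assumption.

```python
import signal
import subprocess
import sys


def stop_process(proc: subprocess.Popen, kill_wait: float = 30) -> str:
    """Send SIGTERM, allow kill_wait seconds, then SIGKILL if still running.

    Returns "SIGTERM" if the process exited within the grace period,
    otherwise "SIGKILL".
    """
    if not 0 <= kill_wait <= 65533:
        raise ValueError("KillWait must be between 0 and 65533 seconds")
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=kill_wait)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX systems
        proc.wait()
        return "SIGKILL"


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_process(child, kill_wait=5))
```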
...@@ -465,7 +554,7 @@ NOTE: "proctrack/linuxproc" is not compatible with "switch/elan." ...@@ -465,7 +554,7 @@ NOTE: "proctrack/linuxproc" is not compatible with "switch/elan."
Acceptable values at present include: Acceptable values at present include:
.RS .RS
.TP .TP
\fBproctrack/aix\fR which uses an AIX kernel extenstion and is \fBproctrack/aix\fR which uses an AIX kernel extension and is
the default for AIX systems the default for AIX systems
.TP .TP
\fBproctrack/linuxproc\fR which uses linux process tree using \fBproctrack/linuxproc\fR which uses linux process tree using
...@@ -665,7 +754,7 @@ Acceptable values include ...@@ -665,7 +754,7 @@ Acceptable values include
.TP .TP
\fBselect/linear\fR \fBselect/linear\fR
for allocation of entire nodes assuming a for allocation of entire nodes assuming a
one\-dimentional array of nodes in which sequentially ordered one\-dimensional array of nodes in which sequentially ordered
nodes are preferable. nodes are preferable.
This is the default value for non\-BlueGene systems. This is the default value for non\-BlueGene systems.
.TP .TP
...@@ -677,7 +766,7 @@ partitions by using the \fIShared=Exclusive\fR option. ...@@ -677,7 +766,7 @@ partitions by using the \fIShared=Exclusive\fR option.
See the partition \fBShared\fR parameter for more information. See the partition \fBShared\fR parameter for more information.
.TP .TP
\fBselect/bluegene\fR \fBselect/bluegene\fR
for a three\-dimentional BlueGene system. for a three\-dimensional BlueGene system.
The default value is "select/bluegene" for BlueGene systems. The default value is "select/bluegene" for BlueGene systems.
.RE .RE
...@@ -695,7 +784,7 @@ The following values are supported for \fBSelectType=select/cons_res\fR: ...@@ -695,7 +784,7 @@ The following values are supported for \fBSelectType=select/cons_res\fR:
\fBCR_CPU\fR \fBCR_CPU\fR
CPUs are consumable resources. CPUs are consumable resources.
There is no notion of sockets, cores or threads. There is no notion of sockets, cores or threads.
On a multi\-core system, each core will be consided a CPU. On a multi\-core system, each core will be considered a CPU.
On a multi\-core and hyperthreaded system, each thread will be On a multi\-core and hyperthreaded system, each thread will be
considered a CPU. considered a CPU.
On single\-core systems, each CPU will be considered a CPU. On single\-core systems, each CPU will be considered a CPU.
...@@ -816,18 +905,6 @@ will take responsibility for monitoring the state of each compute node ...@@ -816,18 +905,6 @@ will take responsibility for monitoring the state of each compute node
and its \fBslurmd\fR daemon. and its \fBslurmd\fR daemon.
The value may not exceed 65533. The value may not exceed 65533.
.TP
\fBStateSaveLocation\fR
Fully qualified pathname of a directory into which the SLURM controller,
\fBslurmctld\fR, saves its state (e.g. "/usr/local/slurm/checkpoint").
SLURM state will saved here to recover from system failures.
\fBSlurmUser\fR must be able to create files in this directory.
If you have a \fBBackupController\fR configured, this location should be
readable and writable by both systems.
The default value is "/tmp".
If any slurm daemons terminate abnormally, their core files will also be written
into this directory.
.TP .TP
\fBSlurmDbdAddr\fR \fBSlurmDbdAddr\fR
Name that the Slurm DBD (Data Base Daemon) should be referred to Name that the Slurm DBD (Data Base Daemon) should be referred to
...@@ -845,6 +922,9 @@ The interpretation of this option is specific to the configured \fBAuthType\fR. ...@@ -845,6 +922,9 @@ The interpretation of this option is specific to the configured \fBAuthType\fR.
In the case of \fIauth/munge\fR, this can be configured to use a Munge daemon In the case of \fIauth/munge\fR, this can be configured to use a Munge daemon
specifically configured to provide authentication between clusters while the specifically configured to provide authentication between clusters while the
default Munge daemon provides authentication within a cluster. default Munge daemon provides authentication within a cluster.
In that case, \fBSlurmDbdAuthInfo\fR should specify the named socket to be used
for communications with the alternate Munge daemon (e.g.
"/var/run/munge/global.socket.2").
The default value is NULL, which results in the default authentication The default value is NULL, which results in the default authentication
mechanism being used. mechanism being used.
...@@ -868,12 +948,24 @@ launch of a job step. The command line arguments for the executable will ...@@ -868,12 +948,24 @@ launch of a job step. The command line arguments for the executable will
be the command and arguments of the job step. This configuration parameter be the command and arguments of the job step. This configuration parameter
may be overridden by srun's \fB\-\-prolog\fR parameter. may be overridden by srun's \fB\-\-prolog\fR parameter.
.TP
\fBStateSaveLocation\fR
Fully qualified pathname of a directory into which the SLURM controller,
\fBslurmctld\fR, saves its state (e.g. "/usr/local/slurm/checkpoint").
SLURM state will be saved here to recover from system failures.
\fBSlurmUser\fR must be able to create files in this directory.
If you have a \fBBackupController\fR configured, this location should be
readable and writable by both systems.
The default value is "/tmp".
If any slurm daemons terminate abnormally, their core files will also be written
into this directory.
.TP .TP
\fBSuspendExcNodes\fR \fBSuspendExcNodes\fR
Specifies the nodes which are to not be placed in power save mode, even Specifies the nodes which are to not be placed in power save mode, even
if the node remains idle for an extended period of time. if the node remains idle for an extended period of time.
Use SLURM's hostlist expression to identify nodes. Use SLURM's hostlist expression to identify nodes.
By default no nodes are exclueded. By default no nodes are excluded.
Related configuration options include \fBResumeProgram\fR, \fBResumeRate\fR, Related configuration options include \fBResumeProgram\fR, \fBResumeRate\fR,
\fBSuspendProgram\fR, \fBSuspendRate\fR, \fBSuspendTime\fR and \fBSuspendProgram\fR, \fBSuspendRate\fR, \fBSuspendTime\fR and
\fBSuspendExcParts\fR. \fBSuspendExcParts\fR.
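SuspendExcNodes takes a SLURM hostlist expression. The sketch below expands a simplified form of that syntax (comma-separated names, each with at most one bracketed range list such as "dev[9-17]" or "lx[01-02,07]") and removes the excluded nodes from a list of idle nodes. expand_hostlist and suspend_candidates are hypothetical helpers, and SLURM's own hostlist parser accepts more forms than this.

```python
import re

# prefix[ranges]suffix, with exactly one bracketed range list
_RANGE = re.compile(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)")


def _split_top_level(expr: str) -> list[str]:
    """Split on commas that are not inside [...]."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(expr):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(expr[start:i])
            start = i + 1
    parts.append(expr[start:])
    return [p for p in parts if p]


def expand_hostlist(expr: str) -> list[str]:
    """Expand a simple hostlist such as 'dev[9-11],lx[01-02,07],alpha'."""
    hosts: list[str] = []
    for item in _split_top_level(expr):
        m = _RANGE.fullmatch(item)
        if m is None:
            hosts.append(item)  # plain host name
            continue
        prefix, body, suffix = m.groups()
        for part in body.split(","):
            lo, sep, hi = part.partition("-")
            if not sep:
                hosts.append(prefix + lo + suffix)
                continue
            width = len(lo)  # keep zero padding such as '01'
            for n in range(int(lo), int(hi) + 1):
                hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts


def suspend_candidates(idle: list[str], suspend_exc_nodes: str = "") -> list[str]:
    """Idle nodes that are not excluded from power save mode."""
    excluded = set(expand_hostlist(suspend_exc_nodes))
    return [node for node in idle if node not in excluded]


if __name__ == "__main__":
    print(expand_hostlist("lx[01-02,07],adev0"))
    print(suspend_candidates(["dev9", "dev10", "dev12"], "dev[9-10]"))
```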
...@@ -883,7 +975,7 @@ Related configuration options include \fBResumeProgram\fR, \fBResumeRate\fR, ...@@ -883,7 +975,7 @@ Related configuration options include \fBResumeProgram\fR, \fBResumeRate\fR,
Specifies the partitions whose nodes are to not be placed in power save Specifies the partitions whose nodes are to not be placed in power save
mode, even if the node remains idle for an extended period of time. mode, even if the node remains idle for an extended period of time.
Multiple partitions can be identified and separated by commas. Multiple partitions can be identified and separated by commas.
By default no nodes are exclueded. By default no nodes are excluded.
Related configuration options include \fBResumeProgram\fR, \fBResumeRate\fR, Related configuration options include \fBResumeProgram\fR, \fBResumeRate\fR,
\fBSuspendProgram\fR, \fBSuspendRate\fR, \fBSuspendTime\fR and \fBSuspendProgram\fR, \fBSuspendRate\fR, \fBSuspendTime\fR and
\fBSuspendExcNodes\fR. \fBSuspendExcNodes\fR.
...@@ -1020,7 +1112,7 @@ The default value is 50, meaning each slurmd daemon can communicate ...@@ -1020,7 +1112,7 @@ The default value is 50, meaning each slurmd daemon can communicate
with up to 50 other slurmd daemons and over 2500 nodes can be contacted with up to 50 other slurmd daemons and over 2500 nodes can be contacted
with two message hops. with two message hops.
The default value will work well for most clusters. The default value will work well for most clusters.
Optimaly system performance can typically be achieved if \fBTreeWidth\fR Optimal system performance can typically be achieved if \fBTreeWidth\fR
is set to the square root of the number of nodes in the cluster for is set to the square root of the number of nodes in the cluster for
systems having no more than 2500 nodes or the cube root for larger systems having no more than 2500 nodes or the cube root for larger
systems. systems.
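The rule of thumb above can be written down directly. Rounding up to a whole number is an assumption made here, since the text only says to use the square root or the cube root; suggested_tree_width is a hypothetical helper. Integer arithmetic is used so that exact cubes such as 27000 are not mis-rounded by floating point.

```python
import math


def _ceil_cbrt(n: int) -> int:
    """Smallest integer r with r**3 >= n (corrects float rounding)."""
    r = round(n ** (1 / 3))
    while r ** 3 < n:
        r += 1
    while r > 0 and (r - 1) ** 3 >= n:
        r -= 1
    return r


def suggested_tree_width(nodes: int) -> int:
    """Square root up to 2500 nodes, cube root above, rounded up."""
    if nodes < 1:
        raise ValueError("nodes must be positive")
    if nodes <= 2500:
        return math.isqrt(nodes - 1) + 1  # ceil(sqrt(nodes))
    return _ceil_cbrt(nodes)


if __name__ == "__main__":
    for n in (100, 2500, 10000):
        print(n, suggested_tree_width(n))
```

For example, a 100-node cluster gives 10, 2500 nodes gives 50 (the default, since 50 * 50 = 2500 nodes are reachable in two hops), and a 10000-node cluster gives 22, because the cube root of 10000 is about 21.5.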
...@@ -1037,7 +1129,7 @@ processes. The program will be run as the same user as the slurmd (usually ...@@ -1037,7 +1129,7 @@ processes. The program will be run as the same user as the slurmd (usually
.TP .TP
\fBUnkillableStepTimeout\fR \fBUnkillableStepTimeout\fR
The length of time, in seconds, that SLURM will wait before deciding that The length of time, in seconds, that SLURM will wait before deciding that
processes in a job step are unkillable (after they have been signalled with processes in a job step are unkillable (after they have been signaled with
SIGKILL). The default timeout value is 60 seconds. SIGKILL). The default timeout value is 60 seconds.
.TP .TP
...@@ -1118,7 +1210,7 @@ in a DOWN, DRAIN or FAILING state without altering permanent ...@@ -1118,7 +1210,7 @@ in a DOWN, DRAIN or FAILING state without altering permanent
configuration information. configuration information.
A job step's tasks are allocated to nodes in order the nodes appear A job step's tasks are allocated to nodes in order the nodes appear
in the configuration file. There is presently no capability within in the configuration file. There is presently no capability within
SLURM to arbitarily order a job step's tasks. SLURM to arbitrarily order a job step's tasks.
.LP .LP
Multiple node names may be comma separated (e.g. "alpha,beta,gamma") Multiple node names may be comma separated (e.g. "alpha,beta,gamma")
and/or a simple node range expression may optionally be used to and/or a simple node range expression may optionally be used to
...@@ -1145,7 +1237,7 @@ The node configuration specified the following information: ...@@ -1145,7 +1237,7 @@ The node configuration specified the following information:
Name that SLURM uses to refer to a node (or base partition for Name that SLURM uses to refer to a node (or base partition for
BlueGene systems). BlueGene systems).
Typically this would be the string that "/bin/hostname \-s" Typically this would be the string that "/bin/hostname \-s"
returns, however it may be an arbitary string if returns, however it may be an arbitrary string if
\fBNodeHostname\fR is specified. \fBNodeHostname\fR is specified.
If the \fBNodeName\fR is "DEFAULT", the values specified If the \fBNodeName\fR is "DEFAULT", the values specified
with that record will apply to subsequent node specifications with that record will apply to subsequent node specifications
...@@ -1180,6 +1272,15 @@ they must exactly match the entries in the \fBNodeName\fR ...@@ -1180,6 +1272,15 @@ they must exactly match the entries in the \fBNodeName\fR
By default, the \fBNodeAddr\fR will be identical in value to By default, the \fBNodeAddr\fR will be identical in value to
\fBNodeName\fR. \fBNodeName\fR.
.TP
\fBCoresPerSocket\fR
Number of cores in a single physical processor socket (e.g. "2").
The CoresPerSocket value describes physical cores, not the
logical number of processors per socket.
\fBNOTE\fR: If you have multi\-core processors, you will likely
need to specify this parameter in order to optimize scheduling.
The default value is 1.
.TP .TP
\fBFeature\fR \fBFeature\fR
A comma delimited list of arbitrary strings indicative of some A comma delimited list of arbitrary strings indicative of some
...@@ -1190,11 +1291,6 @@ If desired a feature may contain a numeric component indicating, ...@@ -1190,11 +1291,6 @@ If desired a feature may contain a numeric component indicating,
for example, processor speed. for example, processor speed.
By default a node has no features. By default a node has no features.
.TP
\fBRealMemory\fR
Size of real memory on the node in MegaBytes (e.g. "2048").
The default value is 1.
.TP .TP
\fBProcs\fR \fBProcs\fR
Number of logical processors on the node (e.g. "2"). Number of logical processors on the node (e.g. "2").
...@@ -1203,26 +1299,8 @@ If Procs is omitted, it will be inferred from ...@@ -1203,26 +1299,8 @@ If Procs is omitted, it will be inferred from
The default value is 1. The default value is 1.
.TP .TP
\fBSockets\fR \fBRealMemory\fR
Number of physical processor sockets/chips on the node (e.g. "2"). Size of real memory on the node in MegaBytes (e.g. "2048").
If Sockets is omitted, it will be inferred from
\fBProcs\fR, \fBCoresPerSocket\fR, and \fBThreadsPerCore\fR.
\fBNOTE\fR: If you have multi\-core processors, you will likely
need to specify these parameters.
The default value is 1.
.TP
\fBCoresPerSocket\fR
Number of cores in a single physical processor socket (e.g. "2").
The CoresPerSocket value describes physical cores, not the
logical number of processors per socket.
\fBNOTE\fR: If you have multi\-core processors, you will likely
need to specify this parameter.
The default value is 1.
.TP
\fBThreadsPerCore\fR
Number of logical threads in a single physical core (e.g. "2").
The default value is 1. The default value is 1.
.TP .TP
...@@ -1231,6 +1309,15 @@ Identifies the reason for a node being in state "DOWN", "DRAINED" ...@@ -1231,6 +1309,15 @@ Identifies the reason for a node being in state "DOWN", "DRAINED"
"DRAINING", "FAIL" or "FAILING". "DRAINING", "FAIL" or "FAILING".
Use quotes to enclose a reason having more than one word. Use quotes to enclose a reason having more than one word.
.TP
\fBSockets\fR
Number of physical processor sockets/chips on the node (e.g. "2").
If Sockets is omitted, it will be inferred from
\fBProcs\fR, \fBCoresPerSocket\fR, and \fBThreadsPerCore\fR.
\fBNOTE\fR: If you have multi\-core processors, you will likely
need to specify these parameters.
The default value is 1.
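Inferring Sockets from Procs, CoresPerSocket and ThreadsPerCore amounts to a division. The sketch below also shows the reverse product, which is the number of logical CPUs that CR_CPU treats as CPUs on a multi-core, hyperthreaded node. Integer division and the minimum of one socket are assumptions, since the text does not say how SLURM rounds, and both function names are hypothetical.

```python
def infer_sockets(procs: int, cores_per_socket: int = 1,
                  threads_per_core: int = 1) -> int:
    """Sockets = Procs / (CoresPerSocket * ThreadsPerCore), at least 1."""
    per_socket = cores_per_socket * threads_per_core
    if per_socket < 1:
        raise ValueError("CoresPerSocket and ThreadsPerCore must be >= 1")
    return max(1, procs // per_socket)


def logical_cpus(sockets: int, cores_per_socket: int = 1,
                 threads_per_core: int = 1) -> int:
    """Every hardware thread counts as one logical processor."""
    return sockets * cores_per_socket * threads_per_core


if __name__ == "__main__":
    # A node with Procs=8, CoresPerSocket=2, ThreadsPerCore=2.
    print(infer_sockets(8, 2, 2))  # 2
    print(logical_cpus(2, 2, 2))   # 8
```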
.TP .TP
\fBState\fR \fBState\fR
State of the node with respect to the initiation of user jobs. State of the node with respect to the initiation of user jobs.
...@@ -1247,7 +1334,12 @@ to any new jobs. ...@@ -1247,7 +1334,12 @@ to any new jobs.
but will be established when the \fBslurmd\fR daemon on that node but will be established when the \fBslurmd\fR daemon on that node
registers. registers.
The default value is "UNKNOWN". The default value is "UNKNOWN".
Also see the \fBDownNodes\fR paramter below. Also see the \fBDownNodes\fR parameter below.
.TP
\fBThreadsPerCore\fR
Number of logical threads in a single physical core (e.g. "2").
The default value is 1.
.TP .TP
\fBTmpDisk\fR \fBTmpDisk\fR
...@@ -1339,7 +1431,7 @@ Jobs executed as user root can use any partition without regard to ...@@ -1339,7 +1431,7 @@ Jobs executed as user root can use any partition without regard to
the value of AllowGroups. the value of AllowGroups.
If user root attempts to execute a job as another user (e.g. using If user root attempts to execute a job as another user (e.g. using
srun's \-\-uid option), this other user must be in one of groups srun's \-\-uid option), this other user must be in one of groups
identified by AllowGroups for the job to succesfully execute. identified by AllowGroups for the job to successfully execute.
The default value is "ALL". The default value is "ALL".
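A sketch of the AllowGroups rule as described above. It assumes AllowGroups holds a comma-separated list of group names and that the caller already knows which groups the job's user belongs to (a real implementation would look this up, for example with Python's grp module). may_use_partition is a hypothetical name.

```python
def may_use_partition(job_user: str, user_groups: set[str],
                      allow_groups: str = "ALL") -> bool:
    """Decide whether a job running as job_user may use the partition.

    job_user is the user the job will execute as, so a job submitted by
    root on behalf of another user (e.g. srun --uid) is checked against
    that other user's groups.
    """
    if job_user == "root" or allow_groups == "ALL":
        return True
    allowed = {g.strip() for g in allow_groups.split(",") if g.strip()}
    return not allowed.isdisjoint(user_groups)


if __name__ == "__main__":
    print(may_use_partition("alice", {"users"}, "admin,staff"))  # False
    print(may_use_partition("bob", {"staff"}, "admin,staff"))    # True
```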
.TP .TP
...@@ -1357,17 +1449,6 @@ APIs or commands. ...@@ -1357,17 +1449,6 @@ APIs or commands.
Possible values are "YES" and "NO". Possible values are "YES" and "NO".
The default value is "NO". The default value is "NO".
.TP
\fBRootOnly\fR
Specifies if only user ID zero (i.e. user \fIroot\fR) may allocate resources
in this partition. User root may allocate resources for any other user,
but the request must be initiated by user root.
This option can be useful for a partition to be managed by some
external entity (e.g. a higher\-level job manager) and prevents
users from directly using those resources.
Possible values are "YES" and "NO".
The default value is "NO".
.TP .TP
\fBMaxNodes\fR \fBMaxNodes\fR
Maximum count of nodes (or base partitions for BlueGene systems) which Maximum count of nodes (or base partitions for BlueGene systems) which
...@@ -1419,6 +1500,18 @@ Note that a partition's priority takes precedence over a job's ...@@ -1419,6 +1500,18 @@ Note that a partition's priority takes precedence over a job's
priority. priority.
The value may not exceed 65533. The value may not exceed 65533.
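Because a partition's priority takes precedence over a job's priority, the dispatch order of pending jobs behaves like a sort on a two-part key. The sketch assumes larger values mean higher priority and ignores resource availability and scheduler plugins such as backfill; PendingJob and dispatch_order are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PendingJob:
    job_id: int
    partition_priority: int  # Priority of the job's partition
    job_priority: int        # the job's own priority


def dispatch_order(jobs: list[PendingJob]) -> list[int]:
    """Order by partition priority first; job priority only breaks ties."""
    ordered = sorted(jobs, key=lambda j: (-j.partition_priority, -j.job_priority))
    return [j.job_id for j in ordered]


if __name__ == "__main__":
    queue = [PendingJob(1, 1, 900), PendingJob(2, 5, 10), PendingJob(3, 5, 50)]
    print(dispatch_order(queue))  # [3, 2, 1]
```

Job 1 has by far the highest job priority but is still ordered last, because its partition has the lowest priority.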
.TP
\fBRootOnly\fR
Specifies if only user ID zero (i.e. user \fIroot\fR) may allocate resources
in this partition. User root may allocate resources for any other user,
but the request must be initiated by user root.
This option can be useful for a partition to be managed by some
external entity (e.g. a higher\-level job manager) and prevents
users from directly using those resources.
Possible values are "YES" and "NO".
The default value is "NO".
.TP .TP
\fBShared\fR \fBShared\fR
Controls the ability of the partition to execute more than one job at a Controls the ability of the partition to execute more than one job at a
...@@ -1502,7 +1595,7 @@ BackupAddr=edev1 ...@@ -1502,7 +1595,7 @@ BackupAddr=edev1
.br .br
# #
.br .br
AuthType=auth/authd AuthType=auth/munge
.br .br
Epilog=/usr/local/slurm/epilog Epilog=/usr/local/slurm/epilog
.br .br
...@@ -1512,21 +1605,11 @@ FastSchedule=1 ...@@ -1512,21 +1605,11 @@ FastSchedule=1
.br .br
FirstJobId=65536 FirstJobId=65536
.br .br
HeartbeatInterval=60
.br
InactiveLimit=120 InactiveLimit=120
.br .br
JobCompType=jobcomp/mysql JobCompType=jobcomp/filetxt
.br
JobCompLoc=slurm_jobcomp_db
.br
JobCompHost=localhost
.br
JobCompPort=1234
.br
JobCompUser=mysql
.br .br
JobCompPass=secret? JobCompLoc=/var/log/slurm/jobcomp
.br .br
KillWait=30 KillWait=30
.br .br
...@@ -1538,13 +1621,11 @@ PluginDir=/usr/local/lib:/usr/local/slurm/lib ...@@ -1538,13 +1621,11 @@ PluginDir=/usr/local/lib:/usr/local/slurm/lib
.br .br
ReturnToService=0 ReturnToService=0
.br .br
SchedulerType=sched/wiki SchedulerType=sched/backfill
.br .br
SchedulerPort=7004 SlurmctldLogFile=/var/log/slurm/slurmctld.log
.br .br
SlurmctldLogFile=/var/log/slurmctld.log SlurmdLogFile=/var/log/slurm/slurmd.log
.br
SlurmdLogFile=/var/log/slurmd.log
.br .br
SlurmctldPort=7002 SlurmctldPort=7002
.br .br
...@@ -1564,14 +1645,6 @@ JobCredentialPrivateKey=/usr/local/slurm/private.key ...@@ -1564,14 +1645,6 @@ JobCredentialPrivateKey=/usr/local/slurm/private.key
.br .br
JobCredentialPublicCertificate=/usr/local/slurm/public.cert JobCredentialPublicCertificate=/usr/local/slurm/public.cert
.br .br
JobAcctGatherType=jobacct/linux
.br
JobAccGatherFrequency=30
.br
JobAcctStorageType=jobacct_storage/filetxt
.br
JobAcctStorageLoc=/var/log/slurm_accounting.log
.br
# #
.br .br
# Node Configurations # Node Configurations
...@@ -1604,6 +1677,7 @@ PartitionName=long Nodes=dev[9\-17] MaxTime=120 AllowGroups=admin ...@@ -1604,6 +1677,7 @@ PartitionName=long Nodes=dev[9\-17] MaxTime=120 AllowGroups=admin
.SH "COPYING" .SH "COPYING"
Copyright (C) 2002\-2007 The Regents of the University of California. Copyright (C) 2002\-2007 The Regents of the University of California.
Copyright (C) 2008 Lawrence Livermore National Security.
Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER). Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
UCRL\-CODE\-226842. UCRL\-CODE\-226842.
.LP .LP
...@@ -1623,8 +1697,8 @@ details. ...@@ -1623,8 +1697,8 @@ details.
/etc/slurm.conf /etc/slurm.conf
.SH "SEE ALSO" .SH "SEE ALSO"
.LP .LP
\fBbluegene.conf\fR(5), \fBbluegene.conf\fR(5), \fBgethostbyname\fR(3),
\fBgetrlimit\fR(2), \fBgetrlimit\fR(2), \fBgroup\fR(5), \fBhostname\fR(1),
\fBgethostbyname\fR(3), \fBgroup\fR(5), \fBhostname\fR(1), \fBscontrol\fR(1), \fBslurmctld\fR(8), \fBslurmd\fR(8),
\fBscontrol\fR(1), \fBslurmctld\fR(8), \fBslurmd\fR(8), \fBspank(8)\fR, \fBslurmdbd\fR(8), \fBslurmdbd.conf\fR(5), \fBspank(8)\fR,
\fBsyslog\fR(2), \fBwiki.conf\fR(5) \fBsyslog\fR(2), \fBwiki.conf\fR(5)