.TH "slurm.conf" "5" "December 2010" "slurm.conf 2.3" "Slurm configuration file"
.SH "NAME"
slurm.conf \- Slurm configuration file
.SH "DESCRIPTION"
\fB/etc/slurm.conf\fP is an ASCII file which describes general SLURM
configuration information, the nodes to be managed, information about
how those nodes are grouped into partitions, and various scheduling
parameters associated with those partitions. This file should be
consistent across all nodes in the cluster.
You can use the \fBSLURM_CONF\fR environment variable to override the built\-in
location of this file. The SLURM daemons also allow you to override
both the built\-in and environment\-provided location using the "\-f"
option on the command line.
.LP
Note that while SLURM daemons create log files and other files as needed,
they treat the lack of parent directories as a fatal error.
This prevents the daemons from running if critical file systems are
not mounted and will minimize the risk of cold\-starting (starting
without preserving jobs).
.LP
The contents of the file are case insensitive except for the names of nodes
and partitions. Any text following a "#" in the configuration file is treated
as a comment through the end of that line.
The size of each line in the file is limited to 1024 characters.
Changes to the configuration file take effect upon restart of
SLURM daemons, daemon receipt of the SIGHUP signal, or execution
of the command "scontrol reconfigure" unless otherwise noted.
If a line begins with the word "Include" followed by whitespace
and then a file name, that file will be included inline with the current
configuration file.
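For example, a site that keeps its node definitions in a separate file
might include that file with a line such as the following (the path is
hypothetical and shown only as an illustration):
.nf
Include /etc/slurm/nodes.conf
.fi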
.LP
Note on file permissions:
.LP
The \fIslurm.conf\fR file must be readable by all users of SLURM, since it
is used by many of the SLURM commands.  Other files that are defined
in the \fIslurm.conf\fR file, such as log files and job accounting files,
may need to be created/owned by the "SlurmUser" uid to be successfully
accessed.  Use the "chown" and "chmod" commands to set the ownership
and permissions appropriately.
See the section \fBFILE AND DIRECTORY PERMISSIONS\fR for information
about the various files and directories used by SLURM.
The overall configuration parameters available include:
.TP
\fBAccountingStorageBackupHost\fR
The name of the backup machine hosting the accounting storage database.
If used with the accounting_storage/slurmdbd plugin, this is where the backup
slurmdbd would be running.
Only used for database type storage plugins, ignored otherwise.
.TP
\fBAccountingStorageEnforce\fR
This controls what level of enforcement you want on associations when new
jobs are submitted.  Valid options are any combination of \fIassociations\fR, \fIlimits\fR,
and \fIwckeys\fR, or \fIall\fR for all things.  If limits is set associations is implied.
If wckeys is set both limits and associations are implied along with
TrackWckey being set.  By enforcing Associations no new job is allowed to run
unless a corresponding association exists in the system.  If limits are
enforced users can be limited by association to how many nodes or how long
jobs can run or other limits.  With wckeys enforced jobs will not be scheduled
unless a valid workload characterization key is specified.  This value may not
be reset via "scontrol reconfig". It only takes effect upon restart
of the slurmctld daemon.
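As an illustration, the following line enforces both associations and
limits. Since setting \fIlimits\fR implies \fIassociations\fR, listing
\fIlimits\fR alone would have the same effect:
.nf
AccountingStorageEnforce=associations,limits
.fi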
.TP
\fBAccountingStorageHost\fR
The name of the machine hosting the accounting storage database.
Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStorageHost\fR.

.TP
\fBAccountingStorageLoc\fR
The fully qualified file name where accounting records are written
when the \fBAccountingStorageType\fR is "accounting_storage/filetxt"
or else the name of the database where accounting records are stored when the
\fBAccountingStorageType\fR is a database.
Also see \fBDefaultStorageLoc\fR.

.TP
\fBAccountingStoragePass\fR
The password used to gain access to the database to store the
accounting data.  Only used for database type storage plugins, ignored
otherwise.  In the case of SLURM DBD (Database Daemon) with MUNGE
authentication this can be configured to use a MUNGE daemon
specifically configured to provide authentication between clusters
while the default MUNGE daemon provides authentication within a
cluster.  In that case, \fBAccountingStoragePass\fR should specify the
named port to be used for communications with the alternate MUNGE
daemon (e.g.  "/var/run/munge/global.socket.2"). The default value is
NULL.  Also see \fBDefaultStoragePass\fR.

.TP
\fBAccountingStoragePort\fR
The listening port of the accounting storage database server.
Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStoragePort\fR.

.TP
\fBAccountingStorageType\fR
The accounting storage mechanism type.  Acceptable values at
present include "accounting_storage/filetxt",
"accounting_storage/mysql", "accounting_storage/none",
"accounting_storage/pgsql", and "accounting_storage/slurmdbd".  The
"accounting_storage/filetxt" value indicates that accounting records
will be written to the file specified by the
\fBAccountingStorageLoc\fR parameter.  The "accounting_storage/mysql"
value indicates that accounting records will be written to a MySQL
database specified by the \fBAccountingStorageLoc\fR parameter.  The
"accounting_storage/pgsql" value indicates that accounting records
will be written to a PostgreSQL database specified by the
\fBAccountingStorageLoc\fR parameter.  The
"accounting_storage/slurmdbd" value indicates that accounting records
will be written to the SLURM DBD, which manages an underlying MySQL or
PostgreSQL database. See "man slurmdbd" for more information.  The
default value is "accounting_storage/none" and indicates that account
records are not maintained.  Note: the PostgreSQL plugin is not
complete and should not be used if associations are required.  It
will, however, work with basic accounting of jobs and job steps.  If
you are interested in completing it, please email slurm-dev@lists.llnl.gov.  Also
see \fBDefaultStorageType\fR.
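A minimal sketch of sending accounting records to the SLURM DBD follows.
The host names "dbhost" and "dbhost2" are hypothetical and stand for the
machines running the primary and backup slurmdbd daemons:
.nf
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=dbhost
AccountingStorageBackupHost=dbhost2
.fi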

.TP
\fBAccountingStorageUser\fR
The user account for accessing the accounting storage database.
Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStorageUser\fR.

.TP
\fBAuthType\fR
The authentication method for communications between SLURM
components.
Acceptable values at present include "auth/none", "auth/authd",
and "auth/munge".
The default value is "auth/munge".
"auth/none" includes the UID in each communication, but it is not verified.
This may be fine for testing purposes, but
\fBdo not use "auth/none" if you desire any security\fR.
"auth/authd" indicates that Brett Chun's authd is to be used (see
"http://www.theether.org/authd/" for more information. Note that
authd is no longer actively supported).
"auth/munge" indicates that LLNL's MUNGE is to be used
(this is the best supported authentication mechanism for SLURM,
see "http://munge.googlecode.com/" for more information).
All SLURM daemons and commands must be terminated prior to changing
the value of \fBAuthType\fR and later restarted (SLURM jobs can be
preserved).
.TP
\fBBackupAddr\fR
The name that \fBBackupController\fR should be referred to in
establishing a communications path. This name will
be used as an argument to the gethostbyname() function for
identification. For example, "elx0000" might be used to designate
the Ethernet address for node "lx0000".
By default the \fBBackupAddr\fR will be identical in value to
\fBBackupController\fR.
.TP
\fBBackupController\fR
The name of the machine where SLURM control functions are to be
executed in the event that \fBControlMachine\fR fails. This node
may also be used as a compute server if so desired. It will come into service
as a controller only upon the failure of ControlMachine and will revert
to a "standby" mode when the ControlMachine becomes available once again.
This should be a node name without the full domain name.   I.e., the hostname
returned by the \fIgethostname()\fR function cut at the first dot (e.g. use
"tux001" rather than "tux001.my.com").
While not essential, it is recommended that you specify a backup controller.
See the \fBRELOCATING CONTROLLERS\fR section if you change this.

.TP
\fBBatchStartTimeout\fR
The maximum time (in seconds) that a batch job is permitted for
launching before being considered missing and releasing the
allocation. The default value is 10 (seconds). Larger values may be
required if more time is required to execute the \fBProlog\fR, load
user environment variables (for Moab spawned jobs), or if the slurmd
daemon gets paged from memory.
.TP
\fBCacheGroups\fR
If set to 1, the slurmd daemon will cache /etc/group entries.
This can improve performance for highly parallel jobs if NIS servers
are used and unable to respond very quickly.
The default value is 0 to disable caching group data.
.TP
\fBCheckpointType\fR
The system\-initiated checkpoint method to be used for user jobs.
The slurmctld daemon must be restarted for a change in \fBCheckpointType\fR
to take effect.
Supported values presently include:
.RS
.TP 18
\fBcheckpoint/aix\fR
for AIX systems only
.TP
\fBcheckpoint/blcr\fR
Berkeley Lab Checkpoint Restart (BLCR)
.TP
\fBcheckpoint/none\fR
no checkpoint support (default)
.TP
\fBcheckpoint/ompi\fR
OpenMPI (version 1.3 or higher)
.TP
\fBcheckpoint/xlch\fR
XLCH (requires that SlurmUser be root)
.RE
.TP
\fBClusterName\fR
The name by which this SLURM managed cluster is known in the
accounting database.  This is needed to distinguish accounting records
when multiple clusters report to the same database.
.TP
\fBCompleteWait\fR
The time, in seconds, given for a job to remain in COMPLETING state
before any additional jobs are scheduled.
If set to zero, pending jobs will be started as soon as possible.
Since a COMPLETING job's resources are released for use by other
jobs as soon as the \fBEpilog\fR completes on each individual node,
this can result in very fragmented resource allocations.
To provide jobs with the minimum response time, a value of zero is
recommended (no waiting).
To minimize fragmentation of resources, a value equal to \fBKillWait\fR
plus two is recommended.
In that case, setting \fBKillWait\fR to a small value may be beneficial.
The default value of \fBCompleteWait\fR is zero seconds.
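For example, with \fBKillWait\fR at its default of 30 seconds, the
fragmentation\-minimizing recommendation above (\fBKillWait\fR plus two)
works out to:
.nf
KillWait=30
CompleteWait=32
.fi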
.TP
\fBControlAddr\fR
The name that \fBControlMachine\fR should be referred to in
establishing a communications path. This name will
be used as an argument to the gethostbyname() function for
identification. For example, "elx0000" might be used to designate
the Ethernet address for node "lx0000".
By default the \fBControlAddr\fR will be identical in value to
\fBControlMachine\fR.
.TP
\fBControlMachine\fR
The short hostname of the machine where SLURM control functions are
executed (i.e. the name returned by the command "hostname \-s", use
"tux001" rather than "tux001.my.com").
This value must be specified.
In order to support some high availability architectures, multiple
hostnames may be listed with comma separators and one \fBControlAddr\fR
must be specified. The high availability system must ensure that the
slurmctld daemon is running on only one of these hosts at a time.
See the \fBRELOCATING CONTROLLERS\fR section if you change this.
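As an illustration only, a primary and backup controller pair might be
configured as follows. The host names are hypothetical; the "e" prefix
follows the convention used above for naming Ethernet addresses:
.nf
ControlMachine=tux001
ControlAddr=etux001
BackupController=tux002
BackupAddr=etux002
.fi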
.TP
\fBCryptoType\fR
The cryptographic signature tool to be used in the creation of
job step credentials.
The slurmctld daemon must be restarted for a change in \fBCryptoType\fR
to take effect.
Acceptable values at present include "crypto/munge" and "crypto/openssl".
The default value is "crypto/munge".
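Written out explicitly, a configuration that relies on MUNGE for both
authentication and job step credentials (the default for both
parameters) would read:
.nf
AuthType=auth/munge
CryptoType=crypto/munge
.fi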
.TP
\fBDebugFlags\fR
Defines specific subsystems which should provide more detailed event logging.
Multiple subsystems can be specified with comma separators.
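For example, to enable more detailed logging for both the backfill
scheduler and the resource selection plugin:
.nf
DebugFlags=Backfill,SelectType
.fi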
Valid subsystems available today (with more to come) include:
.RS
.TP 17
\fBBackfill\fR
Backfill scheduler details
.TP
\fBBGBlockAlgo\fR
BlueGene block selection, more details
.TP
\fBBGBlockPick\fR
BlueGene block selection for jobs
.TP
\fBBGBlockWires\fR
BlueGene block wiring (switch state details)
.TP
\fBCPU_Bind\fR
CPU binding details for jobs and steps
.TP
\fBFrontEnd\fR
Front end node details
.TP
\fBNO_CONF_HASH\fR
Do not log when the slurm.conf file differs between SLURM daemons
.TP
\fBReservation\fR
Advanced reservations
.TP
\fBSelectType\fR
Resource selection plugin
.TP
\fBSteps\fR
Slurmctld resource allocation for job steps
.TP
\fBTriggers\fR
Slurmctld triggers
.TP
\fBWiki\fR
Sched/wiki and wiki2 communications
.RE
.TP
\fBDefMemPerCPU\fR
Default real memory size available per allocated CPU in MegaBytes.
Used to avoid over\-subscribing memory and causing paging.
\fBDefMemPerCPU\fR would generally be used if individual processors
are allocated to jobs (\fBSelectType=select/cons_res\fR).
The default value is 0 (unlimited).
Also see \fBDefMemPerNode\fR and \fBMaxMemPerCPU\fR.
\fBDefMemPerCPU\fR and \fBDefMemPerNode\fR are mutually exclusive.
NOTE: Enforcement of memory limits currently requires enabling of
accounting, which samples memory use on a periodic basis (data need
not be stored, just collected).
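A sketch of per\-CPU memory settings under the consumable resources
plugin follows. The values (in MegaBytes) are illustrative assumptions,
not recommendations, and job accounting is enabled because of the NOTE
above:
.nf
SelectType=select/cons_res
DefMemPerCPU=1024
MaxMemPerCPU=2048
JobAcctGatherType=jobacct_gather/linux
.fi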

.TP
\fBDefMemPerNode\fR
Default real memory size available per allocated node in MegaBytes.
Used to avoid over\-subscribing memory and causing paging.
\fBDefMemPerNode\fR would generally be used if whole nodes
are allocated to jobs (\fBSelectType=select/linear\fR) and
resources are shared (\fBShared=yes\fR or \fBShared=force\fR).
The default value is 0 (unlimited).
Also see \fBDefMemPerCPU\fR and \fBMaxMemPerNode\fR.
\fBDefMemPerCPU\fR and \fBDefMemPerNode\fR are mutually exclusive.
NOTE: Enforcement of memory limits currently requires enabling of
accounting, which samples memory use on a periodic basis (data need
not be stored, just collected).
.TP
\fBDefaultStorageHost\fR
The default name of the machine hosting the accounting storage and
job completion databases.
Only used for database type storage plugins and when the
\fBAccountingStorageHost\fR and \fBJobCompHost\fR have not been
defined.

.TP
\fBDefaultStorageLoc\fR
The fully qualified file name where accounting records and/or job
completion records are written when the \fBDefaultStorageType\fR is
"filetxt" or the name of the database where accounting records and/or job
completion records are stored when the \fBDefaultStorageType\fR is a
database.
Also see \fBAccountingStorageLoc\fR and \fBJobCompLoc\fR.

.TP
\fBDefaultStoragePass\fR
The password used to gain access to the database to store the
accounting and job completion data.
Only used for database type storage plugins, ignored otherwise.
Also see \fBAccountingStoragePass\fR and \fBJobCompPass\fR.

.TP
\fBDefaultStoragePort\fR
The listening port of the accounting storage and/or job completion
database server.
Only used for database type storage plugins, ignored otherwise.
Also see \fBAccountingStoragePort\fR and \fBJobCompPort\fR.

.TP
\fBDefaultStorageType\fR
The accounting and job completion storage mechanism type.  Acceptable
values at present include "filetxt", "mysql", "none", "pgsql", and
"slurmdbd".  The value "filetxt" indicates that records will be
written to a file.  The value "mysql" indicates that accounting
records will be written to a MySQL database.  The default value is
"none", which means that records are not maintained.  The value
"pgsql" indicates that records will be written to a PostgreSQL
database.  The value "slurmdbd" indicates that records will be written
to the SLURM DBD, which maintains its own database. See "man slurmdbd"
for more information.
Also see \fBAccountingStorageType\fR and \fBJobCompType\fR.

.TP
\fBDefaultStorageUser\fR
The user account for accessing the accounting storage and/or job
completion database.
Only used for database type storage plugins, ignored otherwise.
Also see \fBAccountingStorageUser\fR and \fBJobCompUser\fR.

.TP
\fBDisableRootJobs\fR
If set to "YES" then user root will be prevented from running any jobs.
The default value is "NO", meaning user root will be able to execute jobs.
\fBDisableRootJobs\fR may also be set by partition.
.TP
\fBEnforcePartLimits\fR
If set to "YES" then jobs which exceed a partition's size and/or time limits
will be rejected at submission time. If set to "NO" then the job will be
accepted and remain queued until the partition limits are altered.
The default value is "NO".

.TP
\fBEpilog\fR
Fully qualified pathname of a script to execute as user root on every
node when a user's job completes (e.g. "/usr/local/slurm/epilog"). This may
be used to purge files, disable user login, etc.
By default there is no epilog.
See \fBProlog and Epilog Scripts\fR for more information.
.TP
\fBEpilogMsgTime\fR
The number of microseconds that the slurmctld daemon requires to process
an epilog completion message from the slurmd daemons. This parameter can
be used to prevent a burst of epilog completion messages from being sent
at the same time which should help prevent lost messages and improve
throughput for large jobs.
The default value is 2000 microseconds.
For a 1000 node job, this spreads the epilog completion messages out over
two seconds.

.TP
\fBEpilogSlurmctld\fR
Fully qualified pathname of a program for the slurmctld to execute
upon termination of a job allocation (e.g.
"/usr/local/slurm/epilog_controller").
The program executes as SlurmUser, which gives it permission to drain
nodes and requeue the job if a failure occurs or cancel the job if appropriate.
The program can be used to reboot nodes or perform other work to prepare
resources for use.
See \fBProlog and Epilog Scripts\fR for more information.
.TP
\fBFastSchedule\fR
Controls how a node's configuration specifications in slurm.conf are used.
If the number of node configuration entries in the configuration file
is significantly lower than the number of nodes, setting FastSchedule to
1 will permit much faster scheduling decisions to be made.
(The scheduler can just check the values in a few configuration records
instead of possibly thousands of node records.)
Note that on systems with hyper\-threading, the processor count
reported by the node will be twice the actual processor count.
Consider which value you want to be used for scheduling purposes.
.RS
.TP 5
\fB1\fR (default)
Consider the configuration of each node to be that specified in the
slurm.conf configuration file and any node with less than the
configured resources will be set DOWN.
.TP
\fB0\fR
Base scheduling decisions upon the actual configuration of each individual
node except that the node's processor count in SLURM's configuration must
match the actual hardware configuration if \fBSchedulerType=sched/gang\fR
or \fBSelectType=select/cons_res\fR are configured (both of those plugins
maintain resource allocation information using bitmaps for the cores in the
system and must remain static, while the node's memory and disk space can
be established later).
.TP
\fB2\fR
Consider the configuration of each node to be that specified in the
slurm.conf configuration file and any node with less than the
configured resources will \fBnot\fR be set DOWN.
This can be useful for testing purposes.
.RE

.TP
\fBFirstJobId\fR
The job id to be used for the first job submitted to SLURM without a
specific requested value. Job id values generated will be incremented by 1
for each subsequent job. This may be used to provide a meta\-scheduler
with a job id space which is disjoint from the interactive jobs.
The default value is 1.
.TP
\fBGetEnvTimeout\fR
Used for Moab scheduled jobs only. Controls how long, in seconds, a job
should wait for the user's environment to load before attempting to
load it from a cache file. Applies when the srun or sbatch
\fI\-\-get\-user\-env\fR option is used. If set to 0 then always load
the user's environment from the cache file.
The default value is 2 seconds.
.TP
\fBGresTypes\fR
A comma delimited list of generic resources to be managed.
These generic resources may have an associated plugin available to provide
additional functionality.
No generic resources are managed by default.
Ensure this parameter is consistent across all nodes in the cluster for
proper operation.
The slurmctld daemon must be restarted for changes to this parameter to become
effective.
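For example, a site managing GPUs under the hypothetical generic
resource name "gpu" might set:
.nf
GresTypes=gpu
.fi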
.TP
\fBGroupUpdateForce\fR
If set to a non\-zero value, then information about which users are members
of groups allowed to use a partition will be updated periodically, even when
there have been no changes to the /etc/group file.
Otherwise group member information will be updated periodically only after the
/etc/group file is updated.
The default value is 0.
Also see the \fBGroupUpdateTime\fR parameter.

.TP
\fBGroupUpdateTime\fR
Controls how frequently information about which users are members of groups
allowed to use a partition will be updated.
The time interval is given in seconds with a default value of 600 seconds and
a maximum value of 4095 seconds.
A value of zero will prevent periodic updating of group membership information.
Also see the \fBGroupUpdateForce\fR parameter.

.TP
\fBHealthCheckInterval\fR
The interval in seconds between executions of \fBHealthCheckProgram\fR.
The default value is zero, which disables execution.

.TP
\fBHealthCheckProgram\fR
Fully qualified pathname of a script to execute as user root periodically
on all compute nodes that are not in the NOT_RESPONDING state. This may be
used to verify the node is fully operational and DRAIN the node or send email
if a problem is detected.
Any action to be taken must be explicitly performed by the program
(e.g. execute
"scontrol update NodeName=foo State=drain Reason=tmp_file_system_full").
The interval is controlled using the \fBHealthCheckInterval\fR parameter.
Note that the \fBHealthCheckProgram\fR will be executed at the same time
on all nodes to minimize its impact upon parallel programs.
This program will be killed if it does not terminate normally within
60 seconds.
By default, no program will be executed.
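A sketch of running a site health check every five minutes follows. The
script path is hypothetical, and the script itself is responsible for
any action, such as draining the node with "scontrol update" as
described above:
.nf
HealthCheckInterval=300
HealthCheckProgram=/usr/local/sbin/node_health_check
.fi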

.TP
\fBInactiveLimit\fR
The interval, in seconds, after which a non\-responsive job allocation
command (e.g. \fBsrun\fR or \fBsalloc\fR) will result in the job being
terminated. If the node on which the command is executed fails or the
command abnormally terminates, this will terminate its job allocation.
This option has no effect upon batch jobs.
When setting a value, take into consideration that a debugger using \fBsrun\fR
to launch an application may leave the \fBsrun\fR command in a stopped state
for extended periods of time.
This limit is ignored for jobs running in partitions with the
\fBRootOnly\fR flag set (the scheduler running as root will be
responsible for the job).
The default value is unlimited (zero) and may not exceed 65533 seconds.
.TP
\fBJobAcctGatherType\fR
The job accounting mechanism type.
Acceptable values at present include "jobacct_gather/aix" (for AIX operating
system), "jobacct_gather/linux" (for Linux operating system) and "jobacct_gather/none"
(no accounting data collected).
The default value is "jobacct_gather/none".
In order to use the \fBsstat\fR tool, "jobacct_gather/aix" or "jobacct_gather/linux"
must be configured.
.TP
\fBJobAcctGatherFrequency\fR
The job accounting sampling interval.
For jobacct_gather/none this parameter is ignored.
For jobacct_gather/aix and jobacct_gather/linux the parameter is a number of
seconds between sampling job state.
The default value is 30 seconds.
A value of zero disables the periodic job sampling and provides accounting
information only on job termination (reducing SLURM interference with the job).
Smaller (non\-zero) values have a greater impact upon job performance, but
a value of 30 seconds is not likely to be noticeable for applications having
less than 10,000 tasks.
Users can override this value on a per job basis using the \fB\-\-acctg\-freq\fR
option when submitting the job.
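For example, the following settings (illustrative values) collect
accounting data on Linux every 30 seconds, which also makes the
\fBsstat\fR tool usable:
.nf
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
.fi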
.TP
\fBJobCheckpointDir\fR
Specifies the default directory for storing or reading job checkpoint
information. The data stored here is only a few thousand bytes per job
and includes information needed to resubmit the job request, not the job's
memory image. The directory must be readable and writable by
\fBSlurmUser\fR, but not writable by regular users. The job memory images
may be in a different location as specified by \fB\-\-checkpoint\-dir\fR
option at job submit time or scontrol's \fBImageDir\fR option.
.TP
\fBJobCompHost\fR
The name of the machine hosting the job completion database.
Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStorageHost\fR.
.TP
\fBJobCompLoc\fR
The fully qualified file name where job completion records are written
when the \fBJobCompType\fR is "jobcomp/filetxt" or the database where
job completion records are stored when the \fBJobCompType\fR is a
database.
Also see \fBDefaultStorageLoc\fR.
.TP
\fBJobCompPass\fR
The password used to gain access to the database to store the job
completion data.
Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStoragePass\fR.
.TP
\fBJobCompPort\fR
The listening port of the job completion database server.
Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStoragePort\fR.
.TP
\fBJobCompType\fR
The job completion logging mechanism type.
Acceptable values at present include "jobcomp/none", "jobcomp/filetxt",
"jobcomp/mysql", "jobcomp/pgsql", and "jobcomp/script".
The default value is "jobcomp/none", which means that upon job completion
the record of the job is purged from the system.  If using the accounting
infrastructure this plugin may not be of interest since the information
captured is redundant.
The value "jobcomp/filetxt" indicates that a record of the job should be
written to a text file specified by the \fBJobCompLoc\fR parameter.
The value "jobcomp/mysql" indicates that a record of the job should be
written to a MySQL database specified by the \fBJobCompLoc\fR parameter.
The value "jobcomp/pgsql" indicates that a record of the job should be
written to a PostgreSQL database specified by the \fBJobCompLoc\fR parameter.
The value "jobcomp/script" indicates that a script specified by the
\fBJobCompLoc\fR parameter is to be executed with environment variables
indicating the job information.
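For example, to write job completion records to a text file (the path
shown is hypothetical):
.nf
JobCompType=jobcomp/filetxt
JobCompLoc=/var/log/slurm/job_completions
.fi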
.TP
\fBJobCompUser\fR
The user account for accessing the job completion database.
Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStorageUser\fR.
.TP
\fBJobCredentialPrivateKey\fR
Fully qualified pathname of a file containing a private key used for
authentication by SLURM daemons.
This parameter is ignored if \fBCryptoType=crypto/munge\fR.
.TP
\fBJobCredentialPublicCertificate\fR
Fully qualified pathname of a file containing a public key used for
authentication by SLURM daemons.
This parameter is ignored if \fBCryptoType=crypto/munge\fR.
.TP
\fBJobFileAppend\fR
This option controls what to do if a job's output or error file
exists when the job is started.
If \fBJobFileAppend\fR is set to a value of 1, then append to
the existing file.
By default, any existing file is truncated.

.TP
\fBJobRequeue\fR
This option controls what to do by default after a node failure.
If \fBJobRequeue\fR is set to a value of 1, then any batch job running
on the failed node will be requeued for execution on different nodes.
If \fBJobRequeue\fR is set to a value of 0, then any job running
on the failed node will be terminated.
Use the \fBsbatch\fR \fI\-\-no\-requeue\fR or \fI\-\-requeue\fR
option to change the default behavior for individual jobs.
The default value is 1.

.TP
\fBJobSubmitPlugins\fR
A comma delimited list of job submission plugins to be used.
The specified plugins will be executed in the order listed.
These are intended to be site\-specific plugins which can be used to set
default job parameters and/or logging events.
Sample plugins available in the distribution include "defaults", "logging",
"lua", and "partition".
See the SLURM code in "src/plugins/job_submit" and modify the code to satisfy
your needs.
No job submission plugins are used by default.
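For example, the following line (shown only as an illustration) runs
the sample "logging" plugin and then the sample "partition" plugin for
each submitted job, in that order:
.nf
JobSubmitPlugins=logging,partition
.fi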

.TP
\fBKillOnBadExit\fR
If set to 1, the job will be terminated immediately when one of its
processes crashes or aborts. With the default value of 0, if one of
the processes crashes or aborts, the other processes will continue
to run.
.TP
\fBKillWait\fR
The interval, in seconds, given to a job's processes between the
SIGTERM and SIGKILL signals upon reaching its time limit.
If the job fails to terminate gracefully in the interval specified,
it will be forcibly terminated.
The default value is 30 seconds.
The value may not exceed 65533.
.TP
\fBLicenses\fR
Specification of licenses (or other resources available on all
nodes of the cluster) which can be allocated to jobs.
License names can optionally be followed by an asterisk
and a count; the default count is one.
Multiple license names should be comma separated (e.g.
"Licenses=foo*4,bar").
Note that SLURM prevents jobs from being scheduled if their
required license specification is not available.
SLURM does not prevent jobs from using licenses that are
not explicitly listed in the job submission specification.
.TP
\fBMailProg\fR
Fully qualified pathname to the program used to send email per user request.
The default value is "/bin/mail".

.TP
\fBMaxJobCount\fR
The maximum number of jobs SLURM can have in its active database
at one time. Set the values of \fBMaxJobCount\fR and \fBMinJobAge\fR
to ensure the slurmctld daemon does not exhaust its memory or other
resources. Once this limit is reached, requests to submit additional
jobs will fail. The default value is 10000 jobs. This value may not
be reset via "scontrol reconfig". It only takes effect upon restart
of the slurmctld daemon.
.TP
\fBMaxMemPerCPU\fR
Maximum real memory size available per allocated CPU in MegaBytes.
Used to avoid over\-subscribing memory and causing paging.
\fBMaxMemPerCPU\fR would generally be used if individual processors
are allocated to jobs (\fBSelectType=select/cons_res\fR).
The default value is 0 (unlimited).
Also see \fBDefMemPerCPU\fR and \fBMaxMemPerNode\fR.
\fBMaxMemPerCPU\fR and \fBMaxMemPerNode\fR are mutually exclusive.
NOTE: Enforcement of memory limits currently requires enabling of
accounting, which samples memory use on a periodic basis (data need
not be stored, just collected).

.TP
\fBMaxMemPerNode\fR
Maximum real memory size available per allocated node in MegaBytes.
Used to avoid over\-subscribing memory and causing paging.
\fBMaxMemPerNode\fR would generally be used if whole nodes
are allocated to jobs (\fBSelectType=select/linear\fR) and
resources are shared (\fBShared=yes\fR or \fBShared=force\fR).
The default value is 0 (unlimited).
Also see \fBDefMemPerNode\fR and \fBMaxMemPerCPU\fR.
\fBMaxMemPerCPU\fR and \fBMaxMemPerNode\fR are mutually exclusive.
NOTE: Enforcement of memory limits currently requires enabling of
accounting, which samples memory use on a periodic basis (data need
not be stored, just collected).
.TP
\fBMaxTasksPerNode\fR
Maximum number of tasks SLURM will allow a job step to spawn
on a single node. The default \fBMaxTasksPerNode\fR is 128.

.TP
\fBMessageTimeout\fR
Time permitted for a round\-trip communication to complete
in seconds. Default value is 10 seconds. For systems with
shared nodes, the slurmd daemon could be paged out and
necessitate higher values.
.TP
\fBMinJobAge\fR
The minimum age of a completed job before its record is purged from
SLURM's active database. Set the values of \fBMaxJobCount\fR and
\fBMinJobAge\fR to ensure the slurmctld daemon does not exhaust
its memory or other resources. The default value is 300 seconds.
A value of zero prevents any job record purging.
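As an illustration, a high\-throughput site might raise the job limit
while keeping completed job records for the default five minutes. The
job count shown is an assumption, not a tuned recommendation:
.nf
MaxJobCount=20000
MinJobAge=300
.fi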
.TP
\fBMpiDefault\fR
Identifies the default type of MPI to be used.
The srun command may override this configuration parameter in any case.
Currently supported values include:
\fBlam\fR,
\fBmpich1_p4\fR,
\fBmpich1_shmem\fR,
\fBmpichgm\fR,
\fBmvapich\fR,
\fBnone\fR (default, which works for many other versions of MPI) and
\fBopenmpi\fR.
More information about MPI use is available here
<https://computing.llnl.gov/linux/slurm/mpi_guide.html>.
.TP
\fBMpiParams\fR
MPI parameters.
Currently used only to identify the ports used by OpenMPI.
The input format is "ports=12000\-12999", which identifies a range of
communication ports to be used.
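.IP
A sketch, assuming OpenMPI is the site's usual MPI and that ports 12000
through 12999 (a hypothetical range of 1000 ports) are free for its use:
.nf
MpiDefault=openmpi
MpiParams=ports=12000\-12999
.fi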
.TP
\fBOverTimeLimit\fR
Number of minutes by which a job can exceed its time limit before
being canceled.
The configured job time limit is treated as a \fIsoft\fR limit.
Adding \fBOverTimeLimit\fR to the \fIsoft\fR limit provides a \fIhard\fR
limit, at which point the job is canceled.
This is particularly useful for backfill scheduling, which bases its
decisions upon each job's soft time limit.
The default value is zero.
May not exceed 65533 minutes.
A value of "UNLIMITED" is also supported.
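.IP
A worked example with an illustrative value: with the setting below, a
job submitted with a 60 minute time limit is scheduled (including by
backfill) against its 60 minute soft limit, but is not canceled until it
has run for 60 + 10 = 70 minutes.
.nf
OverTimeLimit=10
.fi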
.TP
\fBPluginDir\fR
Identifies the places in which to look for SLURM plugins.
This is a colon\-separated list of directories, like the PATH
environment variable.
The default value is "/usr/local/lib/slurm".
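.IP
For example, to look for plugins in a site\-specific directory in addition
to the default location (the second directory below is a hypothetical
placeholder):
.nf
PluginDir=/usr/local/lib/slurm:/opt/site/lib/slurm
.fi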
.TP
\fBPlugStackConfig\fR
Location of the config file for SLURM stackable plugins that use
the Stackable Plugin Architecture for Node job (K)control (SPANK).
This provides support for a highly configurable set of plugins to
be called before and/or after execution of each task spawned as
part of a user's job step.  Default location is "plugstack.conf"
in the same directory as the system slurm.conf. For more information
on SPANK plugins, see the \fBspank\fR(8) manual.
.TP
\fBPreemptMode\fR
Enables gang scheduling and/or controls the mechanism used to preempt
jobs.  When the \fBPreemptType\fR parameter is set to enable
preemption, the \fBPreemptMode\fR selects the mechanism used to
preempt the lower priority jobs.  The \fBGANG\fR option is used to
enable gang scheduling independent of whether preemption is enabled
(the \fBPreemptType\fR setting).  The \fBGANG\fR option can be
specified in addition to a \fBPreemptMode\fR setting, with the two
options comma\-separated.  The \fBSUSPEND\fR option requires that gang
scheduling be enabled (i.e. "PreemptMode=SUSPEND,GANG").
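.IP
An illustrative sketch, assuming preemption based on partition priority
with preempted jobs suspended and gang scheduled.
The partition names, node names and \fBPriority\fR values are hypothetical,
and other partition options a real site may need are omitted.
.nf
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG
PartitionName=low  Nodes=tux[0\-15] Priority=1
PartitionName=high Nodes=tux[0\-15] Priority=10
.fi
Here jobs in partition "high" may preempt (suspend) jobs in partition "low".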
.RS
.TP 12
\fBOFF\fR
is the default value and disables job preemption and gang scheduling.
This is the only option compatible with \fBSchedulerType=sched/wiki\fR
or \fBSchedulerType=sched/wiki2\fR (used by Maui and Moab respectively,
which provide their own job preemption functionality).
.TP
\fBCANCEL\fR
always cancels the job.
.TP
\fBCHECKPOINT\fR
preempts jobs by checkpointing them (if possible) or canceling them.
.TP
\fBGANG\fR
enables gang scheduling (time slicing) of jobs in the same partition.
.TP
\fBREQUEUE\fR
preempts jobs by requeuing them (if possible) or canceling them.
.TP
\fBSUSPEND\fR
preempts jobs by suspending them.
A suspended job will resume execution once the high priority job
that preempted it completes.
The \fBSUSPEND\fR option may only be used with the \fBGANG\fR option
(the gang scheduler module performs the job resume operation).
.RE
.TP
\fBPreemptType\fR
This specifies the plugin used to identify which jobs can be
preempted in order to start a pending job.
.RS
.TP
\fBpreempt/none\fR
Job preemption is disabled.
This is the default.
.TP
\fBpreempt/partition_prio\fR
Job preemption is based upon partition priority.
Jobs in higher priority partitions (queues) may preempt jobs from lower
priority partitions.
.TP
\fBpreempt/qos\fR
Job preemption rules are specified by Quality Of Service (QOS) specifications
in the SLURM database.
This is not compatible with \fBPreemptMode=OFF\fR or \fBPreemptMode=SUSPEND\fR
(i.e. preempted jobs must be removed from the resources).
.RE
.TP
\fBPriorityDecayHalfLife\fR
This controls how long prior resource use is considered when determining
how over\- or under\-serviced an association (user, bank account and
cluster) is, for the purpose of computing job priority.
If set to 0, no decay will be applied.
This is helpful if you want to enforce hard time limits per association.
If set to 0, \fBPriorityUsageResetPeriod\fR must be set to some interval.
Applicable only if PriorityType=priority/multifactor.
The unit is a time string (i.e. min, hr:min:00, days\-hr:min:00,
or days\-hr).  The default value is 7\-0 (7 days).
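.IP
A minimal sketch with an illustrative value: under the usual half\-life
interpretation, the fragment below makes usage recorded 14 days ago count
half as much as current usage, and usage from 28 days ago a quarter as much.
.nf
PriorityType=priority/multifactor
PriorityDecayHalfLife=14\-0
.fi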
.TP
\fBPriorityCalcPeriod\fR
The period of time, in minutes, at which the half\-life decay will be
recalculated.
Applicable only if PriorityType=priority/multifactor.
The default value is 5 (minutes).

.TP
\fBPriorityFavorSmall\fR
Specifies that small jobs should be given preferential scheduling priority.
Applicable only if PriorityType=priority/multifactor.
Supported values are "YES" and "NO".  The default value is "NO".

.TP
\fBPriorityMaxAge\fR
Specifies the job age which will be given the maximum age factor in computing
priority. For example, a value of 30 minutes would result in all jobs over
30 minutes old getting the same age\-based priority.
Applicable only if PriorityType=priority/multifactor.
The unit is a time string (i.e. min, hr:min:00, days\-hr:min:00,
or days\-hr).  The default value is 7\-0 (7 days).
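.IP
For example, with the illustrative setting below, every job that has been
pending for one day or more receives the maximum age factor.
Assuming the age factor grows linearly up to that point, a job pending
for 12 hours would receive about half of it.
.nf
PriorityMaxAge=1\-0
.fi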
.TP
\fBPriorityUsageResetPeriod\fR
At this interval the usage of associations will be reset to 0.  This is used
to enforce hard limits on time usage per association.  If
\fBPriorityDecayHalfLife\fR is set to 0, no decay will happen and this is the
only way to reset the usage accumulated by running jobs.  By default this is
turned off, and using the \fBPriorityDecayHalfLife\fR option instead is advised,
since hard limits can leave nothing running on your cluster once associations
exhaust their allotted time; however, if your site is set up to allow only
certain amounts of time on the system, this is the way to enforce that.
Applicable only if PriorityType=priority/multifactor.
.RS
.TP 12
\fBNONE\fR
Never clear historic usage. The default value.
.TP
\fBNOW\fR
Clear the historic usage now.
Executed at startup and reconfiguration time.
.TP
\fBDAILY\fR
Cleared every day at midnight.
.TP
\fBWEEKLY\fR
Cleared every week on Sunday at time 00:00.
.TP
\fBMONTHLY\fR
Cleared on the first day of each month at time 00:00.
.TP
\fBQUARTERLY\fR
Cleared on the first day of each quarter at time 00:00.
.TP
\fBYEARLY\fR
Cleared on the first day of each year at time 00:00.
.RE
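.IP
A sketch of the hard\-limit arrangement described above, using
illustrative values: decay is disabled, so accumulated usage is only
cleared at the start of each month.
.nf
PriorityType=priority/multifactor
PriorityDecayHalfLife=0
PriorityUsageResetPeriod=MONTHLY
.fi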

.TP
\fBPriorityType\fR
This specifies the plugin to be used in establishing a job's scheduling
priority. Supported values are "priority/basic" (jobs are prioritized
by order of arrival, also suitable for sched/wiki and sched/wiki2) and
"priority/multifactor" (jobs are prioritized based upon size, age,
fair\-share of allocation, etc.).
The default value is "priority/basic".

.TP
\fBPriorityWeightAge\fR
An integer value that sets the degree to which the queue wait time
component contributes to the job's priority.
Applicable only if PriorityType=priority/multifactor.
The default value is 0.

.TP
\fBPriorityWeightFairshare\fR
An integer value that sets the degree to which the fair-share
component contributes to the job's priority.
Applicable only if PriorityType=priority/multifactor.
The default value is 0.

.TP
\fBPriorityWeightJobSize\fR
An integer value that sets the degree to which the job size
component contributes to the job's priority.
Applicable only if PriorityType=priority/multifactor.
The default value is 0.
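.IP
As a sketch of how the weights interact (assuming the weighted\-sum form
used by the multifactor plugin): each factor, a value between 0.0 and 1.0,
is multiplied by its weight and the products are summed to form the job
priority.
The fragment below makes fair\-share the dominant term; the weight values
are hypothetical, and weights not shown (described below) keep their
default of 0.
.nf
PriorityType=priority/multifactor
PriorityWeightAge=1000
PriorityWeightFairshare=10000
PriorityWeightJobSize=1000
.fi
For example, a job with an age factor of 0.5, a fair\-share factor of 0.2
and a job size factor of 0.1 would receive a priority of about
1000*0.5 + 10000*0.2 + 1000*0.1 = 500 + 2000 + 100 = 2600.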

.TP
\fBPriorityWeightPartition\fR