diff --git a/doc/man/man5/slurm.conf.5 b/doc/man/man5/slurm.conf.5
new file mode 100644
index 0000000000000000000000000000000000000000..ec892436b85da7cd9886c2a1328cc95d7aaf078c
--- /dev/null
+++ b/doc/man/man5/slurm.conf.5
@@ -0,0 +1,394 @@
+.TH "slurm.conf" "5" "October 2002" "Morris Jette" "Slurm configuration file"
+.SH "NAME"
+slurm.conf \- Slurm configuration file
+.SH "DESCRIPTION"
+\fB/etc/slurm.conf\fP is an ASCII file which describes general Slurm configuration
+information, the nodes to be managed, information about how those nodes are
+grouped into partitions, and various scheduling parameters associated with
+those partitions. The file location can be modified at system build time using
+the DEFAULT_SLURM_CONF parameter.
+.LP
+The contents of the file are case insensitive except for the names of nodes
+and partitions. Any text following a "#" in the configuration file is treated
+as a comment through the end of that line.
+The size of each line in the file is limited to 1024 characters.
+
+.LP
+The overall configuration parameters available include the following;
+a minimal example appears after this list.
+.TP
+\fBBackupController\fR
+The name of the machine where SLURM control functions are to be
+executed in the event that ControlMachine fails. This node
+may also be used as a compute server if so desired. It will come into service
+as a controller only upon the failure of ControlMachine and will revert
+to a "standby" mode when the ControlMachine becomes available once again.
+This should be a node name without the full domain name (e.g. "lx0002").
+While not essential, it is highly recommended that you specify a backup controller.
+.TP
+\fBControlMachine\fR
+The name of the machine where SLURM control functions are executed.
+This should be a node name without the full domain name (e.g. "lx0001").
+This value must be specified.
+.TP
+\fBEpilog\fR
+Fully qualified pathname of a program to execute as user root on every
+node when a user's job completes (e.g. "/usr/local/slurm/epilog"). This may
+be used to purge files, disable user login, etc. By default there is no epilog.
+.TP
+\fBFastSchedule\fR
+If set to 1, then consider the configuration of each node to be that
+specified in the configuration file. If set to 0, then base scheduling
+decisions upon the actual configuration of each individual node. If the
+number of node configuration entries in the configuration file is significantly
+lower than the number of nodes, setting FastSchedule to 1 will permit
+much faster scheduling decisions to be made. The default value is 1.
+.TP
+\fBFirstJobId\fR
+The job id to be used for the first job submitted to SLURM without a
+specifically requested value. Job id values generated will be incremented by 1
+for each subsequent job. This may be used to provide a meta-scheduler
+with a job id space which is disjoint from that used for interactive jobs.
+The default value is 1.
+.TP
+\fBHashBase\fR
+If the node names include a sequence number, this value defines the
+base to be used in building a hash table based upon node name. Values of 8
+and 10 are recognized for octal and decimal sequence numbers respectively.
+The value of zero is also recognized for node names lacking a sequence number.
+The use of node names containing a numeric suffix will provide faster
+operation for larger clusters. The default value is 10.
+.TP
+\fBHeartbeatInterval\fR
+The interval, in seconds, at which the SLURM controller tests the
+status of other daemons. The default value is 30 seconds.
+.TP
+\fBInactiveLimit\fR
+The interval, in seconds, a job is permitted to be inactive (with
+no active job steps) before it is terminated. This permits forgotten
+jobs to be purged in a timely fashion without waiting for their time
+limits to be reached. The default value is unlimited (zero).
+.TP
+\fBJobCredentialPrivateKey\fR
+Fully qualified pathname of a file containing a private key used for
+authentication by Slurm daemons.
+.TP
+\fBJobCredentialPublicCertificate\fR
+Fully qualified pathname of a file containing a public key used for
+authentication by Slurm daemons.
+.TP
+\fBKillWait\fR
+The interval, in seconds, given to a job's processes between the
+SIGTERM and SIGKILL signals upon reaching its time limit.
+If the job fails to terminate gracefully
+in the interval specified, it will be forcibly terminated. The default
+value is 30 seconds.
+.TP
+\fBPrioritize\fR
+Fully qualified pathname of a program to execute in order to establish
+the initial priority of a newly submitted job. By default there is no
+prioritization program and each job gets a priority lower than that of
+any existing jobs.
+.TP
+\fBProlog\fR
+Fully qualified pathname of a program to execute as user root on every
+node when a user's job begins execution (e.g. "/usr/local/slurm/prolog").
+This may be used to purge files, enable user login, etc. By default there
+is no prolog.
+.TP
+\fBReturnToService\fR
+If set to 1, then a DOWN node will become available for use
+upon registration. The default value is 0, which
+means that a node will remain DOWN until a system administrator explicitly
+makes it available for use.
+.TP
+\fBSlurmctldPort\fR
+The port number that the SLURM controller, \fBslurmctld\fR, listens
+to for work. The default value is SLURMCTLD_PORT as established at system
+build time.
+.TP
+\fBSlurmctldTimeout\fR
+The interval, in seconds, that the backup controller waits for the
+primary controller to respond before assuming control. The default value
+is 300 seconds.
+.TP
+\fBSlurmdPort\fR
+The port number that the SLURM compute node daemon, \fBslurmd\fR, listens
+to for work. The default value is SLURMD_PORT as established at system
+build time.
+.TP
+\fBSlurmdTimeout\fR
+The interval, in seconds, that the SLURM controller waits for \fBslurmd\fR
+to respond before configuring that node's state to DOWN. The default value
+is 300 seconds.
+.TP
+\fBStateSaveLocation\fR
+Fully qualified pathname of a directory into which the SLURM controller,
+\fBslurmctld\fR, saves its state (e.g. "/usr/local/slurm/checkpoint"). SLURM
+state will be saved here to recover from system failures. The default value is "/tmp".
+If any SLURM daemons terminate abnormally, their core files will also be written
+into this directory.
+.TP
+\fBTmpFS\fR
+Fully qualified pathname of the file system available to user jobs for
+temporary storage. This parameter is used in establishing a node's \fBTmpDisk\fR
+space. The default value is "/tmp".
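+.LP
+As a minimal sketch (the host names here are hypothetical), a working
+configuration need only identify the control machine, although naming a
+backup controller as well is recommended:
+.br
+ControlMachine=lx0001 BackupController=lx0002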
+.LP
+The configuration of nodes (or machines) to be managed by Slurm is
+also specified in \fB/etc/slurm.conf\fR.
+Only the NodeName must be supplied in the configuration file.
+All other node configuration information is optional.
+It is advisable to establish baseline node configurations,
+especially if the cluster is heterogeneous.
+Nodes which register to the system with less than the configured resources
+(e.g. too little memory) will be placed in the "DOWN" state to
+avoid scheduling jobs on them.
+Establishing baseline configurations will also speed SLURM's
+scheduling process by permitting it to compare job requirements
+against these (relatively few) configuration parameters and
+possibly avoid having to check job requirements
+against every individual node's configuration.
+The resources checked at node registration time are: Procs,
+RealMemory and TmpDisk.
+While baseline values for each of these can be established
+in the configuration file, the actual values upon node
+registration are recorded and these actual values may be
+used for scheduling purposes (depending upon the value of
+\fBFastSchedule\fR in the configuration file).
+.LP
+Default values can be specified with a record in which
+"NodeName" is "DEFAULT".
+The default entry values will apply only to lines following it in the
+configuration file and the default values can be reset multiple times
+in the configuration file with multiple entries where "NodeName=DEFAULT".
+The "NodeName=" specification must be placed on every line
+describing the configuration of nodes.
+In fact, it is generally possible and desirable to define the
+configurations of all nodes in only a few lines,
+as shown in the sketch following this list.
+This convention permits significant optimization in the scheduling
+of larger clusters.
+In order to support the concept of jobs requiring consecutive nodes
+on some architectures,
+node specifications should be placed in this file in consecutive order.
+The node configuration specifies the following information:
+.TP
+\fBNodeName\fR
+Name of a node as returned by hostname (e.g. "lx0012").
+A simple regular expression may optionally
+be used to specify ranges
+of nodes to avoid building a configuration file with large numbers
+of entries. The regular expression can contain one
+pair of square brackets with a sequence of comma separated
+numbers and/or ranges of numbers separated by a "-"
+(e.g. "linux[0-64,128]", or "lx[15,18,32-33]").
+If the NodeName is "DEFAULT", the values specified
+with that record will apply to subsequent node specifications
+unless explicitly set to other values in that node record or
+replaced with a different set of default values.
+For architectures in which the node order is significant,
+nodes will be considered consecutive in the order defined.
+For example, if the configuration for NodeName=charlie immediately
+follows the configuration for NodeName=baker they will be
+considered adjacent in the computer.
+.TP
+\fBFeature\fR
+A comma delimited list of arbitrary strings indicative of some
+characteristic associated with the node.
+There is no value associated with a feature at this time; a node
+either has a feature or it does not.
+If desired, a feature may contain a numeric component indicating,
+for example, processor speed.
+By default a node has no features.
+.TP
+\fBRealMemory\fR
+Size of real memory on the node in MegaBytes (e.g. "2048").
+The default value is 1.
+.TP
+\fBProcs\fR
+Number of processors on the node (e.g. "2").
+The default value is 1.
+.TP
+\fBState\fR
+State of the node with respect to the initiation of user jobs.
+Acceptable values are "BUSY", "DOWN", "DRAINED", "DRAINING", "IDLE",
+and "UNKNOWN". "BUSY" indicates the node has been allocated work
+and should not be used in the configuration file.
+"DOWN" indicates the node failed and is unavailable to be allocated work.
+"DRAINED" indicates the node was configured unavailable to be
+allocated work and is presently not performing any work.
+"DRAINING" indicates the node is unavailable to be allocated new
+work, but is completing the processing of a job.
+"IDLE" indicates the node is available to be allocated work, but
+has none at present.
+"UNKNOWN" indicates the node's state is undefined, but will be
+established when the \fBslurmd\fR daemon on that node registers.
+The default value is "UNKNOWN".
+.TP
+\fBTmpDisk\fR
+Total size of temporary disk storage in \fBTmpFS\fR in MegaBytes
+(e.g. "16384"). \fBTmpFS\fR (for "Temporary File System")
+identifies the location which jobs should use for temporary storage.
+Note this does not indicate the amount of free
+space available to the user on the node, only the total file
+system size. The system administrator should ensure this file
+system is purged as needed so that user jobs have access to
+most of this space.
+The Prolog and/or Epilog programs (specified in the configuration file)
+might be used to ensure the file system is kept clean.
+The default value is 1.
+.TP
+\fBWeight\fR
+The priority of the node for scheduling purposes.
+All things being equal, jobs will be allocated the nodes with
+the lowest weight which satisfies their requirements.
+For example, a heterogeneous collection of nodes might
+be placed into a single partition for greater system
+utilization, responsiveness and capability. It would be
+preferable to allocate smaller memory nodes rather than larger
+memory nodes if either will satisfy a job's requirements.
+The units of weight are arbitrary, but larger weights
+should be assigned to nodes with more processors, memory,
+disk space, higher processor speed, etc.
+Weight is an integer value with a default value of 1.
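+.LP
+For example (a sketch only; the node names and values are arbitrary),
+default values can be established once and then selectively overridden
+for ranges of nodes:
+.br
+NodeName=DEFAULT Procs=2 RealMemory=2000 TmpDisk=64000 State=UNKNOWN
+.br
+NodeName=lx[0-15] Weight=4
+.br
+NodeName=lx[16-31] RealMemory=4000 Weight=16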
+"DRAINING" indicates the node is unavailable to be allocated new +work, but is completing the processing of a job. +"IDLE" indicates the node available to be allocated work, but +has none at present +"UNKNOWN" indicates the node's state is undefined, but will be +established when the \fBslurmd\fR daemon on that node registers. +The default value is "UNKNOWN". +.TP +\fBTmpDisk\fR +Total size of temporary disk storage in \fBTmpFS\fR in MegaBytes +(e.g. "16384"). \fBTmpFS\fR (for "Temporary File System") +identifies the location which jobs should use for temporary storage. +Note this does not indicate the amount of free +space available to the user on the node, only the total file +system size. The system administration should insure this file +system is purged as needed so that user jobs have access to +most of this space. +The Prolog and/or Epilog programs (specified in the configuration file) +might be used to insure the file system is kept clean. +The default value is 1. +.TP +\fBWeight\fR +The priority of the node for scheduling purposes. +All things being equal, jobs will be allocated the nodes with +the lowest weight which satisfies their requirements. +For example, a heterogeneous collection of nodes might +be placed into a single partition for greater system +utilization, responsiveness and capability. It would be +preferable to allocate smaller memory nodes rather than larger +memory nodes if either will satisfy a job's requirements. +The units of weight are arbitrary, but larger weights +should be assigned to nodes with more processors, memory, +disk space, higher processor speed, etc. +Weight is an integer value with a default value of 1. +.LP +The partition configuration permits you to establish different job +limits or access controls for various groups (or partitions) of nodes. +Nodes may be in only one partition. Jobs are allocated resources +within a single partition. The partition configuration +file contains the following information: +.TP +\fBAllowGroups\fR +Comma separated list of group IDs which may use the partition. +If at least one group associated with the user submitting the +job is in AllowGroups, he will be permitted to use this partition. +The default value is "ALL". +.TP +\fBDefault\fR +If this keyword is set, jobs submitted without a partition +specification will utilize this partition. +Possible values are "YES" and "NO". +The default value is "NO". +.TP +\fBRootOnly\fR +Specifies if only user ID zero (or user <i>root</i> may +initiate jobs in this partition. +Possible values are "YES" and "NO". +The default value is "NO". +.TP +\fBMaxNodes\fR +Maximum count of nodes which may be allocated to any single job, +The default value is "UNLIMITED", which is represented internally as -1. +.TP +\fBMaxTime\fR +Maximum wall-time limit for any job in minutes. The default +value is "UNLIMITED", which is represented internally as -1. +.TP +\fBNodes\fR +Comma separated list of nodes which are associated with this +partition. Node names may be specified using the +regular expression syntax described above. A blank list of nodes +(i.e. "Nodes= ") can be used if one wants a partition to exist, +but have no resources (possibly on a temporary basis). +.TP +\fBPartitionName\fR +Name by which the partition may be referenced (e.g. "Interactive"). +This name can be specified by users when submitting jobs. +.TP +\fBShared\fR +Ability of the partition to execute more than one job at a +time on each node. 
+.SH "EXAMPLE"
+.eo
+#
+.br
+# Sample /etc/slurm.conf for dev[0-25].llnl.gov
+.br
+# Author: John Doe
+.br
+# Date: 11/06/2001
+.br
+#
+.br
+ControlMachine=dev0 BackupController=dev1
+.br
+Epilog=/usr/local/slurm/epilog Prolog=/usr/local/slurm/prolog
+.br
+FastSchedule=1
+.br
+FirstJobId=65536
+.br
+HashBase=10
+.br
+HeartbeatInterval=60
+.br
+InactiveLimit=120
+.br
+KillWait=30
+.br
+Prioritize=/usr/local/maui/priority
+.br
+ReturnToService=0
+.br
+SlurmctldPort=7002 SlurmdPort=7003
+.br
+SlurmctldTimeout=300 SlurmdTimeout=300
+.br
+StateSaveLocation=/usr/local/slurm/slurm.state
+.br
+TmpFS=/tmp
+.br
+JobCredentialPrivateKey=/usr/local/slurm/private.key
+.br
+JobCredentialPublicCertificate=/usr/local/slurm/public.cert
+.br
+#
+.br
+# Node Configurations
+.br
+#
+.br
+NodeName=DEFAULT Procs=2 RealMemory=2000 TmpDisk=64000 State=UNKNOWN
+.br
+NodeName=dev[0-25] Weight=16
+.br
+#
+.br
+# Partition Configurations
+.br
+#
+.br
+PartitionName=DEFAULT MaxTime=30 MaxNodes=10
+.br
+PartitionName=debug Nodes=dev[0-8,18-25] State=UP Default=YES
+.br
+PartitionName=batch Nodes=dev[9-17] State=UP
+.ec
+.SH "COPYING"
+Copyright (C) 2002 The Regents of the University of California.
+Produced at Lawrence Livermore National Laboratory (cf. DISCLAIMER).
+UCRL-CODE-2002-040.
+.LP
+This file is part of SLURM, a resource management program.
+For details, see <http://www.llnl.gov/linux/slurm/>.
+.LP
+SLURM is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 2 of the License, or (at your option)
+any later version.
+.LP
+SLURM is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
+details.
+.SH "FILES"
+/etc/slurm.conf
+.SH "SEE ALSO"
+.LP
+\fBscontrol\fR(1), \fBslurmctld\fR(8), \fBslurmd\fR(8)