\." $Id$
.\"
.TH SRUN "1" "September 2005" "srun 0.7" "slurm components"
.SH "NAME"
srun \- run parallel jobs
.SH SYNOPSIS
.B srun
[\fIOPTIONS\fR...] \fIexecutable \fR[\fIargs\fR...]
.br
.B srun
\-\-batch [\fIOPTIONS\fR...] job_script
.br
.B srun
\-\-allocate [\fIOPTIONS\fR...] [job_script]
.br
.B srun
\-\-attach=jobid
.SH DESCRIPTION
Allocate resources and optionally initiate parallel jobs on
clusters managed by SLURM.
.TP
parallel run options
.TP
\fB\-n\fR, \fB\-\-ntasks\fR=\fIntasks\fR
Specify the number of processes to run. Request that \fBsrun\fR
allocate \fIntasks\fR processes.  The default is one process per
node, but note that the \fB\-c\fR parameter will change this default.
.TP
\fB\-c\fR, \fB\-\-cpus\-per\-task\fR=\fIncpus\fR
Request that \fIncpus\fR be allocated \fBper process\fR. This may be
useful if the job is multithreaded and requires more than one cpu
per task for optimal performance. The default is one cpu per process.
If \fB\-c\fR is specified without \fB\-n\fR, as many
tasks will be allocated per node as possible while satisfying
the \fB\-c\fR restriction. (See \fBBUGS\fR below.)
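For example, the following illustrative command (\fBa.out\fR stands
for the user's program) requests eight tasks with two cpus each,
for which SLURM will allocate enough nodes to supply sixteen cpus:
.nf

> srun \-n8 \-c2 a.out

.fi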
.TP
\fB\-N\fR, \fB\-\-nodes\fR=\fIminnodes\fR[\-\fImaxnodes\fR]
Request that a minimum of \fIminnodes\fR nodes be allocated to this job.
The scheduler may decide to launch the job on more than \fIminnodes\fR nodes.
A limit on the maximum node count may be specified with \fImaxnodes\fR
(e.g. "\-\-nodes=2\-4").  The minimum and maximum node count may be the
same to specify a specific number of nodes (e.g. "\-\-nodes=2\-2" will ask
for two and ONLY two nodes).  The partition's node 
limits supersede those of the job. If a job's node limits are completely 
outside of the range permitted for its associated partition, the job 
will be left in a PENDING state. Note that the environment 
variable \fBSLURM_NNODES\fR will be set to the count of nodes actually 
allocated to the job. See the \fBENVIRONMENT VARIABLES \fR section 
for more information.  If \fB\-N\fR is not specified, the default
behavior is to allocate enough nodes to satisfy the requirements of
the \fB\-n\fR and \fB\-c\fR options.
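For example, an illustrative request for at least two and at most
four nodes to run eight tasks:
.nf

> srun \-\-nodes=2\-4 \-n8 hostname

.fi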
.TP
\fB\-r\fR, \fB\-\-relative\fR=\fIn\fR
Run a job step relative to node \fIn\fR of the current allocation. 
This option may be used to spread several job steps out among the
nodes of the current job. If \fB-r\fR is used, the current job
step will begin at node \fIn\fR of the allocated nodelist, where
the first node is considered node 0.  The \fB\-r\fR option is not 
permitted along with \fB\-w\fR or \fB\-x\fR, and will be silently
ignored when not running within a prior allocation (i.e. when
SLURM_JOBID is not set). The default for \fIn\fR is 0. If the 
value of \fB\-\-nodes\fR exceeds the number of nodes identified 
with the \fB\-\-relative\fR option, a warning message will be 
printed and the \fB\-\-relative\fR option will take precedence.
.TP
\fB\-p\fR, \fB\-\-partition\fR=\fIpartition\fR
Request resources from partition "\fIpartition\fR". Partitions
are created by the slurm administrator, who also identifies one
of those partitions as the default.
.TP
\fB\-P\fR, \fB\-\-dependency\fR=\fIjobid\fR
Defer initiation of this job until the specified jobid
has completed execution.  Many jobs can share the same 
dependency and these jobs may belong to different users.
The value may be changed after job submission using the 
\fBscontrol\fR command.
.TP
\fB\-\-begin\fR=\fItime\fR
Defer initiation of this job until the specified time.
It accepts times of the form \fIHH:MM:SS\fR to run a job at 
a specific time of day (seconds are optional).
(If that time is already past, the next day is assumed.) 
You may also specify \fImidnight\fR, \fInoon\fR, or 
\fIteatime\fR (4pm) and you can have a time-of-day suffixed 
with \fIAM\fR or \fIPM\fR for running in the morning or the evening.  
You can also say what day the job will be run, by giving 
a date in the form \fImonth-name\fR day with an optional year,
or giving a date of the form \fIMMDDYY\fR or \fIMM/DD/YY\fR 
or \fIDD.MM.YY\fR. You can also 
give times like \fInow + count time-units\fR, where the time-units
can be \fIminutes\fR, \fIhours\fR, \fIdays\fR, or \fIweeks\fR 
and you can tell SLURM to run the job today with the keyword
\fItoday\fR and to run the job tomorrow with the keyword
\fItomorrow\fR.
The value may be changed after job submission using the
\fBscontrol\fR command.
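For example, each of the following illustrative invocations defers
the same job using one of the accepted time formats:
.nf

> srun \-\-begin=16:00 \-n1 a.out
> srun \-\-begin=now+2hours \-n1 a.out
> srun \-\-begin=tomorrow \-n1 a.out

.fi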
.TP
\fB\-U\fR, \fB\-\-account\fR=\fIaccount\fR
Change resource use by this job to the specified account.
The \fIaccount\fR is an arbitrary string. The account may
be changed after job submission using the \fBscontrol\fR
command.
.TP
\fB\-t\fR, \fB\-\-time\fR=\fIminutes\fR
Establish a time limit to terminate the job after the specified number of 
minutes. If the job's time limit exceeds the partition's time limit, the 
job will be left in a PENDING state. The default value is the partition's 
time limit. When the time limit is reached, the job's processes are sent 
SIGTERM followed by SIGKILL. The interval between signals is specified by 
the SLURM configuration parameter \fBKillWait\fR. A time limit of 0 minutes
indicates that an infinite time limit should be used.
.TP
\fB\-D\fR, \fB\-\-chdir\fR=\fIpath\fR
have the remote processes do a chdir to \fIpath\fR before beginning
execution. The default is to chdir to the current working directory
of the \fBsrun\fR process.
.TP
\fB\-I\fR, \fB\-\-immediate\fR
exit if resources are not immediately
available. By default, \fB\-\-immediate\fR is off, and
.B srun
will block until resources become available.
.TP
\fB\-k\fR, \fB\-\-no\-kill\fR
Do not automatically terminate a job if one of the nodes it has been 
allocated fails.  This option is only recognized on a job allocation, 
not for the submission of individual job steps. 
The job will assume all responsibilities for fault\-tolerance. The 
active job step (MPI job) will almost certainly suffer a fatal error, 
but subsequent job steps may be run if this option is specified. The
default action is to terminate the job upon node failure. Note that
\fB\-\-batch\fR jobs will be re\-queued if a node fails while the job
is being initiated.
.TP
\fB\-K\fR, \fB\-\-kill\-on\-bad\-exit\fR
Terminate a job if any task exits with a non-zero exit code.
.TP
\fB\-s\fR, \fB\-\-share\fR
The job can share nodes with other running jobs. This may result in faster job 
initiation and higher system utilization, but lower application performance.
.TP
\fB\-O\fR, \fB\-\-overcommit\fR
overcommit resources. Normally,
.B srun
will not allocate more than one process per cpu. By specifying
\fB\-\-overcommit\fR you are explicitly allowing more than one process
per cpu. However no more than \fBMAX_TASKS_PER_NODE\fR tasks are 
permitted to execute per node.
./"NOTE: Do not document feature until user release mechanism is available.
./".TP
./"-H, --hold
./"Specify the job is to be submitted in a held state (priority of zero).
./"A held job can now be released using scontrol to reset its priority.
.TP
\fB\-T\fR, \fB\-\-threads\fR=\fInthreads\fR
Request that 
.B srun
use \fInthreads\fR to initiate and control the parallel job. The 
default value is the smaller of 10 or the number of nodes allocated.
.TP
\fB\-l\fR, \fB\-\-label\fR
prepend task number to lines of stdout/err. Normally, stdout and stderr
from remote tasks are line-buffered directly to the stdout and stderr of
\fBsrun\fR. The \fB\-\-label\fR option will prepend lines of output with
the remote task id.
.TP
\fB-u\fR, \fB\-\-unbuffered\fR
do not line buffer stdout from remote tasks. This option cannot be used
with \fI\-\-label\fR. 
.TP
\fB\-m\fR, \fB\-\-distribution\fR=(\fIblock\fR|\fIcyclic\fR)
Specify an alternate distribution method for remote processes.
.RS
.TP
.B block
The block method of distribution will allocate processes in-order to
the cpus on a node. If the number of processes exceeds the number of 
cpus on all of the nodes in the allocation then all nodes will be 
utilized. For example, consider an allocation of three nodes each with 
two cpus. A four\-process block distribution request will distribute 
those processes to the nodes with processes one and two on the first 
node, process three on the second node, and process four on the third node.  
Block distribution is the default behavior if the number of tasks 
exceeds the number of nodes requested.
.TP
.B cyclic
The cyclic method distributes processes in a round-robin fashion across
the allocated nodes. That is, process one will be allocated to the first
node, process two to the second, and so on. This is the default behavior
if the number of tasks is no larger than the number of nodes requested.
.RE
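For example, with an illustrative allocation of two nodes having two
cpus each, the two methods place four tasks as follows:
.nf

> srun \-N2 \-n4 \-m block \-l hostname
0: dev0
1: dev0
2: dev1
3: dev1

> srun \-N2 \-n4 \-m cyclic \-l hostname
0: dev0
1: dev1
2: dev0
3: dev1

.fi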
.TP
\fB\-J\fR, \fB\-\-job\-name\fR=\fIjobname\fR
Specify a name for the job. The specified name will appear along with
the job id number when querying running jobs on the system. The default
is the supplied \fBexecutable\fR program's name.
.TP
\fB\-\-mpi\fR=\fImpi_type\fR
Identify the type of MPI to be used. May result in unique initiation 
procedures.
.RS
.TP
.B list
Lists available MPI types to choose from.
.TP
.B lam
Initiates one 'lamd' process per node and establishes necessary
environment variables for LAM/MPI.
.TP
.B mpich\-gm
For use with Myrinet.
.TP
.B mvapich
For use with Infiniband.
.TP
.B none
No special MPI processing. This is the default and works with 
many other versions of MPI.
.RE
.TP
\fB\-\-jobid\fR=\fIid\fR
Initiate a job step under an already allocated job with job id \fIid\fR.
Using this option will cause \fBsrun\fR to behave exactly as if the
SLURM_JOBID environment variable was set.
.TP
\fB\-o\fR, \fB\-\-output\fR=\fImode\fR
Specify the mode for stdout redirection. By default in interactive mode,
.B srun
collects stdout from all tasks and line buffers this output to
the attached terminal. With \fB\-\-output\fR stdout may be redirected
to a file, to one file per task, or to /dev/null. See section 
\fBIO Redirection\fR below for the various forms of \fImode\fR.
If the specified file already exists, it will be overwritten.
.TP
\fB\-i\fR, \fB\-\-input\fR=\fImode\fR
Specify how stdin is to be redirected. By default,
.B srun
redirects stdin from the terminal to all tasks. See \fBIO Redirection\fR
below for more options.
.TP
\fB\-e\fR, \fB\-\-error\fR=\fImode\fR
Specify how stderr is to be redirected. By default in interactive mode,
.B srun
redirects stderr to the same file as stdout, if one is specified. The
\fB\-\-error\fR option is provided to allow stdout and stderr to be
redirected to different locations.
See \fBIO Redirection\fR below for more options.
If the specified file already exists, it will be overwritten.
.TP
\fB\-b\fR, \fB\-\-batch\fR
Submit in "batch mode." \fBsrun\fR will make a copy of the \fIexecutable\fR 
file (a script) and submit the request for execution when resources are 
available. \fBsrun\fR will terminate after the request has been submitted. 
The \fIexecutable\fR file will run on the first node allocated to the 
job and must contain \fBsrun\fR commands to initiate parallel tasks.
stdin will be redirected from /dev/null, stdout and stderr will be
redirected to a file (default is \fIjobname\fR.out or \fIjobid\fR.out in
current working directory, see \fB\-o\fR for other IO options).
Note that if the slurm daemons are cold-started, jobid values will be 
reused. Plan accordingly to avoid over-writing output and error files. 
\fIexecutable\fR must be specified with a fully qualified 
pathname, or else its pathname is taken relative to the current working 
directory. The search path will not be used to locate the file. 
\fIexecutable\fR will be interpreted by the user's default shell unless 
the file begins with "#!" followed by the fully qualified pathname of a 
valid shell.
Note that batch jobs will be re\-queued if a node fails while it is being 
initiated. 

Srun command line options can also be embedded in the script by prefacing 
each option with #SLURM. Multiple options may appear on one line or 
across multiple lines, e.g.

.br 
#SLURM -N 2 -n 2
.br 
#SLURM --mpi=lam
.br

This runs the script on 2 nodes with 2 tasks, using the lam MPI type.  
All command line options may be set inside the script, with the 
exception of the mode options (running a batch script implies 
batch mode).
.br
Options on the command line take precedence over options in the batch 
script, which in turn take precedence over existing environment variables.
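For example, a minimal batch script using embedded options (the option
values here are illustrative) and its submission:
.nf

> cat batch.sh
#!/bin/sh
#SLURM -N 2 -n 2
#SLURM --mpi=lam
srun \-l hostname

> srun \-b batch.sh

.fi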
.TP
\fB\-v\fR, \fB\-\-verbose\fR
verbose operation. Multiple \fB-v\fR's will further increase the verbosity of
\fBsrun\fR. By default only errors will be displayed.
.TP
\fB\-d\fR, \fB\-\-slurmd-debug\fR=\fIlevel\fR
Specify a debug level for slurmd(8). \fIlevel\fR may be an integer value
between 0 [quiet, only errors are displayed] and 4 [verbose operation]. 
The slurmd debug information is copied onto the stderr of
the job. By default only errors are displayed. 
.TP
\fB\-W\fR, \fB\-\-wait\fR=\fIseconds\fR
Specify how long to wait after the first task terminates before terminating
all remaining tasks. A value of 0 indicates an unlimited wait (a warning will
be issued after 60 seconds). The default value is set by the WaitTime
parameter in the slurm configuration file (see \fBslurm.conf(5)\fR). This
option can be useful to ensure that a job is terminated in a timely fashion
in the event that one or more tasks terminate prematurely.
.TP
\fB\-q\fR, \fB\-\-quit-on-interrupt\fR
Quit immediately on a single SIGINT (Ctrl-C). Use of this option
disables the status feature normally available when \fBsrun\fR receives 
a single Ctrl-C and causes \fBsrun\fR to instead immediately terminate the
running job. 
.TP
\fB\-X\fR, \fB\-\-disable-status\fR
Disable the display of task status when srun receives a single SIGINT
(Ctrl-C). Instead immediately forward the SIGINT to the running job.
A second Ctrl-C within one second will forcibly terminate the job and
\fBsrun\fR will immediately exit. May also be set via the environment
variable SLURM_DISABLE_STATUS.
.TP
\fB\-Q\fR, \fB\-\-quiet\fR
Quiet operation. Suppress informational messages. Errors will still
be displayed.
.TP
\fB\-\-uid\fR=\fIuser\fR
Attempt to submit and/or run a job as \fIuser\fR instead of the
invoking user id. The invoking user's credentials will be used
to check access permissions for the target partition. User root
may use this option to run jobs as a normal user in a RootOnly
partition for example. If run as root, \fBsrun\fR will drop
its permissions to the uid specified after node allocation is
successful. \fIuser\fR may be the user name or numerical user ID.
.TP
\fB\-\-gid\fR=\fIgroup\fR
If \fBsrun\fR is run as root, and the \fB\-\-gid\fR option is used, 
submit the job with \fIgroup\fR's group access permissions.  \fIgroup\fR 
may be the group name or the numerical group ID.
.TP
\fB\-\-core\fR=\fItype\fR
Adjust corefile format for parallel job. If possible, srun will set
up the environment for the job such that a corefile format other than
full core dumps is enabled. If run with type = "list", srun will
print a list of supported corefile format types to stdout and exit.
.TP
\fB\-\-propagate\fR[=\fIrlimits\fR]
Allows users to specify which of the modifiable (soft) resource limits
to propagate to the compute nodes and apply to their jobs.  If
\fIrlimits\fR is not specified, then all resource limits will be
propagated.
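For example, assuming the resource limit names accepted by the
\fBPropagateResourceLimits\fR parameter in \fBslurm.conf\fR (such as
CORE and NOFILE):
.nf

> srun \-\-propagate=CORE,NOFILE \-n4 a.out

.fi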
.TP
\fB\-\-prolog\fR=\fIexecutable\fR
\fBsrun\fR will run \fIexecutable\fR just before launching the job step.
The command line arguments for \fIexecutable\fR will be the command
and arguments of the job step.  If \fIexecutable\fR is "none", then
no prolog will be run.  This parameter overrides the SrunProlog
parameter in slurm.conf.
.TP
\fB\-\-epilog\fR=\fIexecutable\fR
\fBsrun\fR will run \fIexecutable\fR just after the job step completes.
The command line arguments for \fIexecutable\fR will be the command
and arguments of the job step.  If \fIexecutable\fR is "none", then
no epilog will be run.  This parameter overrides the SrunEpilog
parameter in slurm.conf.
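For example, with illustrative script paths:
.nf

> srun \-\-prolog=/usr/local/bin/step_setup \-\-epilog=/usr/local/bin/step_cleanup \-n4 a.out

.fi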
.PP
Allocate options:
.TP
\fB\-A\fR, \fB\-\-allocate\fR
allocate resources and spawn a shell. When \fB\-\-allocate\fR is specified to
\fBsrun\fR, no remote tasks are started. Instead a subshell is started that 
has access to the allocated resources. Multiple jobs can then be run on the 
same cpus from within this subshell. See \fBAllocate Mode\fR below.
.TP
\fB\-\-no\-shell\fR
immediately exit after allocating resources instead of spawning a
shell when used with the \fB\-A\fR, \fB\-\-allocate\fR option.
.PP
Attach to running job:
.TP
\fB\-a\fR, \fB\-\-attach\fR=\fIid\fR
This option will attach \fBsrun\fR
to a running job with job id = \fIid\fR. Provided that the calling user
has access to that running job, stdout and stderr will be redirected to the
current session and signals received by
.B srun
will be forwarded to the remote processes.
.TP
\fB\-j\fR, \fB\-\-join\fR
Join with running job. This will duplicate stdout/stderr to the calling
\fBsrun\fR. stdin and signals will not be propagated to the job.
\fB\-\-join\fR is only allowed with \fB\-\-attach\fR.
.TP
\fB\-s\fR, \fB\-\-steal\fR
Steal the connection to the running job. This will close any open
sessions with the specified job and allow stdin and signals to be propagated.
\fB\-\-steal\fR is only allowed with \fB\-\-attach\fR.
.PP
Constraint Options. The following options all put constraints on the nodes
that may be considered for the job:
.TP
\fB\-\-mincpus\fR=\fIn\fR
Specify minimum number of cpus per node.
.TP
\fB\-\-mem\fR=\fIMB\fR
Specify a minimum amount of real memory.
.TP
\fB\-\-tmp\fR=\fIMB\fR
Specify a minimum amount of temporary disk space.
.TP
\fB\-C\fR, \fB\-\-constraint\fR=\fIlist\fR
Specify a list of constraints. 
The constraints are features that have been assigned to the nodes by 
the slurm administrator. 
The \fIlist\fR of constraints may include multiple features separated 
by commas, in which case all nodes must have all listed features 
(i.e. the features are ANDed together). 
Alternately the features may be separated by a vertical bar, '|', 
in which case all nodes must have at least one of the listed 
features (i.e. the features are ORed together). 
If no nodes have the requested features, then the job will be rejected 
by the slurm job manager.
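For example, assuming the administrator has assigned features named
"fast" and "big" to some nodes:
.nf

> srun \-C fast,big \-n8 a.out
> srun \-C "fast|big" \-n8 a.out

.fi
The first form requires nodes having both features; the second accepts
nodes having either feature.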
.TP
\fB\-\-contiguous\fR
Demand a contiguous range of nodes. The default is "yes". Specify
\fB\-\-contiguous\fR=\fIno\fR if a contiguous range of nodes is not a constraint.
.TP
\fB\-w\fR, \fB\-\-nodelist\fR=\fIhost1,host2,...\fR or \fIfilename\fR
Request a specific list of hosts. The job will contain \fIat least\fR
these hosts. The list may be specified as a comma-separated list of
hosts, a range of hosts (host[1-5,7,...] for example), or a filename.
The host list will be assumed to be a filename if it contains a "/"
character.
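For example, the following illustrative forms request the same five
hosts, first as a host range and then via a file (a list containing
a "/" is treated as a filename):
.nf

> srun \-w dev[0\-3],dev5 \-n5 hostname
> srun \-w ./hosts.txt \-n5 hostname

.fi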
.TP
\fB\-x\fR, \fB\-\-exclude\fR=\fIhost1,host2,...\fR or \fIfilename\fR
Request that a specific list of hosts not be included in the resources 
allocated to this job. The host list will be assumed to be a filename 
if it contains a "/" character.

.PP
The following options support AIX systems, but may be applicable to 
other systems as well. Since POE is used to launch tasks, these 
options are not normally used.
.TP
\fB\-\-network\fR=\fItype\fR
Specify the communication protocol to be used. 

.PP
The following options support Blue Gene systems, but may be 
applicable to other systems as well.
.TP
\fB\-g\fR, \fB\-\-geometry\fR=\fIXxYxZ\fR
Specify the geometry requirements for the job. The three numbers 
represent the required geometry giving dimensions in the X, Y and 
Z directions. For example "\-\-geometry=2x3x4", specifies a block 
of nodes having 2 x 3 x 4 = 24 nodes (actually base partitions on 
Blue Gene).
.TP
\fB\-\-conn\-type\fR=\fItype\fR
Require the partition connection type to be of a certain type.  
On Blue Gene the acceptable values of \fItype\fR are MESH, TORUS and NAV.  
If NAV, or if not set, SLURM will try to fit a TORUS, else a MESH.
.TP
\fB\-R\fR, \fB\-\-no-rotate\fR
Disables rotation of the job's requested geometry in order to fit an 
appropriate partition.
By default the specified geometry can rotate in three dimensions.

.PP
Help options
.TP
\fB\-\-help\fR
Show this help message
.TP
\fB\-\-usage\fR
Display brief usage message
.PP
Other options
.TP
\fB\-V\fR, \fB\-\-version\fR
output version information and exit
.PP
Unless the \fB\-a\fR (\fB\-\-attach\fR) or \fB-A\fR (\fB\-\-allocate\fR)
options are specified (see \fBAllocate mode\fR and \fBAttaching to jobs\fR
below),
.B srun
will submit the job request to the slurm job controller, then initiate all
processes on the remote nodes. If the request cannot be met immediately,
.B srun
will block until the resources are free to run the job. If the
\fB\-I\fR (\fB\-\-immediate\fR) option is specified
.B srun
will terminate if resources are not immediately available.
.PP
When initiating remote processes
.B srun
will propagate the current working directory, unless
\fB\-\-chdir\fR=\fIpath\fR is specified, in which case \fIpath\fR will
become the working directory for the remote processes.
.PP
The \fB\-n\fR, \fB\-c\fR, and \fB\-N\fR options control how CPUs and
nodes will be allocated to the job. When specifying only the number
of processes to run with \fB-n\fR, a default of one CPU per process
is allocated. By specifying the number of CPUs required per task (\fB-c\fR),
more than one CPU may be allocated per process. If the number of nodes
is specified with \fB-N\fR,
.B srun
will attempt to allocate \fIat least\fR the number of nodes specified.
.PP
Combinations of the above three options may be used to change how
processes are distributed across nodes and cpus. For instance, by specifying
both the number of processes and number of nodes on which to run, the
number of processes per node is implied. However, if the number of CPUs
per process is more important, then both the number of processes (\fB\-n\fR)
and the number of CPUs per process (\fB\-c\fR) should be specified.
.PP
.B srun
will refuse to  allocate more than one process per CPU unless
\fB\-\-overcommit\fR (\fB\-O\fR) is also specified.
.PP
.B srun
will attempt to meet the above specifications "at a minimum." That is,
if 16 nodes are requested for 32 processes, and some nodes do not have
2 CPUs, the allocation of nodes will be increased in order to meet the
demand for CPUs. In other words, a \fIminimum\fR of 16 nodes are being
requested. However, if 16 nodes are requested for 15 processes,
.B srun
will consider this an error, as 15 processes cannot run across 16 nodes.
.PP
.B "IO Redirection"
.PP
By default stdout and stderr will be redirected from all tasks to the
stdout and stderr of \fBsrun\fR, and stdin will be redirected from the
standard input of \fBsrun\fR to all remote tasks. This behavior may be
changed with the
\fB\-\-output\fR, \fB\-\-error\fR, and \fB\-\-input\fR 
(\fB\-o\fR, \fB\-e\fR, \fB\-i\fR) options. Valid format specifications 
for these options are
.TP 10
\fBall\fR
stdout and stderr are redirected from all tasks to srun.
stdin is broadcast to all remote tasks.
(This is the default behavior.)
.TP
\fBnone\fR
stdout and stderr are not received from any task. 
stdin is not sent to any task (stdin is closed).
.TP
\fItaskid\fR
stdout and/or stderr are redirected from only the task with relative
id equal to \fItaskid\fR, where 0 <= \fItaskid\fR < \fIntasks\fR,
where \fIntasks\fR is the total number of tasks in the current job step.
stdin is redirected from the stdin of
.B srun
to this same task.
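For example, an illustrative run that collects stdout from task 0 only:
.nf

> srun \-n4 \-\-output=0 \-l hostname
0: dev0

.fi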
.TP
\fIfilename\fR
.B srun
will redirect stdout and/or stderr to the named file from all tasks.
stdin will be redirected from the named file and broadcast to all
tasks in the job.  If the job is submitted in batch mode using the
.B -b
or
.B --batch
option, \fIfilename\fR refers to a path on each of the nodes on which
the job runs.  Otherwise \fIfilename\fR refers to a path on the host
that runs \fBsrun\fR.  Depending on the cluster's file system layout,
this may result in the output appearing in different places depending
on whether the job is run in batch mode.
.TP
format string
.B srun 
allows for a format string to be used to generate the named IO file 
described above. The following list of format specifiers may be
used in the format string to generate a filename that will be
unique to a given jobid, stepid, node, or task. In each case, 
the appropriate number of files are opened and associated with
the corresponding tasks.
.RS 10
.TP
%J
jobid.stepid of the running job. (e.g. "128.0")
.TP
%j 
jobid of the running job. 
.TP
%s
stepid of the running job.
.TP
%N
short hostname. This will create a separate IO file per node.
.TP
%n
Node identifier relative to current job (e.g. "0" is the first node of
the running job). This will create a separate IO file per node.
.TP
%t
task identifier (rank) relative to current job. This will create a
separate IO file per task.
.PP
A number placed between the percent character and format specifier may be
used to zero-pad the result in the IO filename. This number is ignored if 
the format specifier corresponds to non-numeric data (%N for example).

Some examples of how the format string may be used for a 4 task job step
with a Job ID of 128 and step id of 0 are included below:
.TP 15
job%J.out
job128.0.out
.TP
job%4j.out
job0128.out
.TP
job%j-%2t.out 
job128-00.out, job128-01.out, ...
.PP
.RS -10
.PP
.B "Allocate Mode"
.PP
When the allocate option is specified (\fB\-A\fR, \fB\-\-allocate\fR)
\fBsrun\fR will not initiate any remote processes after acquiring
resources. Instead, \fBsrun\fR will spawn a subshell which has access
to the acquired resources. Subsequent instances of \fBsrun\fR from within
this subshell will then run on these resources.
.PP
If the name of a script is specified on the
commandline with \fB\-\-allocate\fR, the spawned shell will run the
specified script. Resources allocated in this way will only be freed
when the subshell terminates.
.PP
.B "Attaching to a running job"
.PP
Use of the \fB-a\fR \fIjobid\fR (or \fB\-\-attach\fR) option allows
\fBsrun\fR to reattach to a running job, receiving stdout and stderr
from the job and forwarding signals to the job, just as if the current
session of \fBsrun\fR had started the job. (stdin, however, cannot
be forwarded to the job).
.PP
There are two ways to reattach to a running job. The default method
is to attach to the current job read-only. In this case, 
stdout and stderr are duplicated to the attaching \fBsrun\fR, but
signals are not forwarded to the remote processes (A single 
Ctrl-C will detach this read-only \fBsrun\fR from the job). If
the \fB-j\fR (\fB\-\-join\fR) option is also specified, 
\fBsrun\fR "joins" the running job, is able to forward signals,
and for the most part acts much like the \fBsrun\fR process that
initiated the job. 
.PP 
Attaching to running batch jobs is also supported, if the batch 
job is being managed by SLURM (That is, a script submitted with
\fBsrun \-b\fR). The stdout and stderr from the \fIbatch script\fR
will then be copied to the attaching \fBsrun\fR, and if \fB-j\fR
is also specified, signals will be sent to the batch script.
This feature provides a good method for determining the status 
of a running \fBsrun\fR within a batch script. For example, 
consider attaching to a running batch job with jobid 483:
.br

.br
> srun --join --attach 483
.br

.br
After pressing Ctrl-C twice within one second, SIGINT is forwarded
to the batch job script, and the running srun reports its status:
.br

.br
attach[483]: interrupt (one more within 1 sec to abort)
.br
attach[483]: sending Ctrl-C to job
.br
srun: interrupt (one more within 1 sec to abort)
.br
srun: task[0-15]: running
.br

.br
showing that all 16 tasks in the current job step are running.
.PP
Node and CPU selection options do not make sense when specifying 
\fB\-\-attach\fR, and it is an error to use \fB-n\fR, \fB-c\fR, 
or \fB-N\fR in attach mode.
.PP
.SH "ENVIRONMENT VARIABLES"
.PP
Some
.B srun
options may be set via environment variables. These environment
variables, along with their corresponding options, are listed below.
(Note: commandline options will always override these settings)
.TP 20
\fBSLURM_CONF\fR
The location of the SLURM configuration file.
.TP
\fBSLURM_ACCOUNT\fR
\fB\-U, \-\-account\fR=\fIaccount\fR
.TP 20
\fBSLURM_CPUS_PER_TASK\fR
\fB\-c, \-\-cpus\-per\-task\fR=\fIn\fR
.TP
\fBSLURM_CONN_TYPE\fR
\fB\-\-conn\-type\fR=(\fImesh|nav|torus\fR)
.TP
\fBSLURM_CORE_FORMAT\fR
\fB\-\-core\fR=\fIformat\fR
.TP
\fBSLURM_DEBUG\fR
\fB\-v, \-\-verbose\fR
.TP
\fBSLURMD_DEBUG\fR
\fB\-d, \-\-slurmd-debug\fR
.TP
\fBSLURM_DISTRIBUTION\fR
\fB\-m, \-\-distribution\fR=(\fIblock|cyclic\fR)
.TP
\fBSLURM_GEOMETRY\fR
\fB\-g, \-\-geometry\fR=\fIX,Y,Z\fR
.TP
\fBSLURM_LABELIO\fR
\fB\-l, \-\-label\fR
.TP
\fBSLURM_NNODES\fR
\fB\-N, \-\-nodes\fR=(\fIn|min-max\fR)
.TP
\fBSLURM_NO_ROTATE\fR
\fB\-\-no\-rotate\fR
.TP
\fBSLURM_NODE_USE\fR
\fB\-\-node\-use\fR=(\fIcoprocessor|virtual\fR)
.TP
\fBSLURM_NPROCS\fR
\fB\-n, \-\-ntasks\fR=\fIn\fR
.TP
\fBSLURM_OVERCOMMIT\fR
\fB\-O, \-\-overcommit\fR
.TP
\fBSLURM_PARTITION\fR
\fB\-p, \-\-partition\fR=\fIpartition\fR
.TP
\fBSLURM_REMOTE_CWD\fR
\fB\-D, \-\-chdir\fR=\fIpath\fR
.TP
\fBSLURM_STDERRMODE\fR
\fB\-e, \-\-error\fR=\fImode\fR
.TP
\fBSLURM_STDINMODE\fR
\fB\-i, \-\-input\fR=\fImode\fR
.TP
\fBSLURM_STDOUTMODE\fR
\fB\-o, \-\-output\fR=\fImode\fR
.TP
\fBSLURM_TIMELIMIT\fR
\fB\-t, \-\-time\fR=\fIminutes\fR
.TP
\fBSLURM_WAIT\fR
\fB\-W, \-\-wait\fR=\fIseconds\fR
.TP
\fBSLURM_DISABLE_STATUS\fR
\fB\-X, \-\-disable-status\fR
.PP
Additionally,
.B srun
will set some environment variables in the environment of the
executing tasks on the remote compute nodes. These environment variables
are:
.TP 20
\fBSLURM_CPUS_ON_NODE\fR
Count of processors available to the job on this node
.TP
\fBSLURM_JOBID\fR
Job id of the executing job
.TP
\fBSLURM_LAUNCH_NODE_IPADDR\fR
IP address of the node from which the task launch was 
initiated (where the srun command ran from)
.TP
\fBSLURM_NNODES\fR
Total number of nodes in the job's resource allocation
.TP
\fBSLURM_NODEID\fR
The relative node ID of the current node
.TP
\fBSLURM_NODELIST\fR
List of nodes allocated to the job
.TP
\fBSLURM_NPROCS\fR
Total number of processes in the current job
.TP
\fBSLURM_PROCID\fR
The MPI rank (or relative process ID) of the current process
.TP
\fBSLURM_TASKS_PER_NODE\fR
Number of tasks to be initiated on each node. Values are 
comma separated and in the same order as SLURM_NODELIST.
If two or more consecutive nodes are to have the same task 
count, that count is followed by "(x#)" where "#" is the 
repetition count. For example, "SLURM_TASKS_PER_NODE=2(x3),1"
indicates that the first three nodes will each execute two 
tasks and the fourth node will execute one task.
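.PP
For example, a job step can inspect these variables directly. The
output below is illustrative, for three tasks on two two\-cpu nodes:
.nf

> srun \-N2 \-n3 \-l printenv SLURM_TASKS_PER_NODE
0: 2,1
1: 2,1
2: 2,1

.fi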

.SH "SIGNALS AND ESCAPE SEQUENCES"
Signals sent to the \fBsrun\fR command are automatically forwarded to 
the tasks it is controlling with a few exceptions. The escape sequence
\fB<control-c>\fR will report the state of all tasks associated with 
the \fBsrun\fR command. If \fB<control-c>\fR is entered twice within 
one second, then the associated SIGINT signal will be sent to all tasks.
If a third \fB<control-c>\fR is received, the job will be forcefully
terminated without waiting for remote tasks to exit.

The escape sequence \fB<control-z>\fR is presently ignored. Our intent 
is for this to put the \fBsrun\fR command into a mode where various 
special actions may be invoked.

.SH "MPI SUPPORT"
LAM/MPI version 7.0.4 or higher is well integrated with SLURM. 
The \fBlamboot\fR command acquires a SLURM resource allocation 
and uses the \fBsrun\fR command to launch its \fBlamd\fR daemons on 
each allocated node.  See \fIhttp://www.lam\-mpi.org/\fR for more 
information.

On computers with a Quadrics interconnect, \fBsrun\fR directly supports
the Quadrics version of MPI without modification. Applications built
using the Quadrics MPI library will communicate directly over the
Quadrics interconnect without any special \fBsrun\fR options.

Users may also use MPICH on any computer where that is available. 
The \fBmpirun\fR command may need to be provided with information 
on its command line identifying the resources to be used. The 
installer of the MPICH software may configure it to perform these 
steps automatically. At worst, you must specify two parameters:
.TP
\fB\-np SLURM_NPROCS\fR
number of processors to run on
.TP
\fB\-machinefile <machinefile>\fR
list of computers on which to execute. This list can be constructed by 
executing the command \fBsrun /bin/hostname\fR and writing its standard 
output to the desired file. Execute \fBmpirun \-\-help\fR for more options.

.SH "EXAMPLES"
This simple example demonstrates the execution of the command \fBhostname\fR
in eight tasks. At least eight processors will be allocated to the job 
(the same as the task count) on however many nodes are required to satisfy 
the request. The output of each task will be preceded by its task number.
(The machine "dev" in the example below has a total of two CPUs per node)

.nf

> srun \-n8 \-l hostname
0: dev0
1: dev0
2: dev1
3: dev1
4: dev2
5: dev2
6: dev3
7: dev3

.fi
.PP
This example demonstrates how one might submit a script for later 
execution (batch mode). The script will be initiated when resources 
are available and no higher priority job is pending for the same 
partition. Four nodes will be allocated, with one task per node 
implied. Note that the script itself executes on only one node. 
For the script to utilize all allocated nodes, it must execute the 
\fBsrun\fR command or an MPI program.

.nf

> cat test.sh
#!/bin/sh
date
srun \-l hostname

> srun \-N4 \-b test.sh
srun: jobid 42 submitted

.fi
.PP
The output of test.sh would be found in the default output file
"slurm-42.out."
.PP
The srun \fB-r\fR option is used within a job script
to run two job steps on disjoint nodes in the following
example. The script is run using allocate mode instead
of as a batch job in this case.

.nf

> cat test.sh
#!/bin/sh
echo $SLURM_NODELIST
srun -lN2 -r2 hostname
srun -lN2 hostname

> srun -A -N4 test.sh
dev[7-10]
0: dev9
1: dev10
0: dev7
1: dev8

.fi
.PP
The following script runs two job steps in parallel 
within an allocated set of nodes. 

.nf
> cat test.sh
#!/bin/bash
srun -lN2 -n4 -r 2 sleep 60 &
srun -lN2 -r 0 sleep 60 &
sleep 1
squeue
squeue -s
wait

> srun -A -N4 test.sh
  JOBID PARTITION     NAME     USER  ST      TIME  NODES NODELIST
  65641     batch  test.sh   grondo   R      0:01      4 dev[7-10]

STEPID     PARTITION     USER      TIME NODELIST
65641.0        batch   grondo      0:01 dev[7-8]
65641.1        batch   grondo      0:01 dev[9-10]

.fi
.PP
This example demonstrates how one executes a simple MPICH job.
We use \fBsrun\fR to build a list of machines (nodes) to be used by 
\fBmpirun\fR in its required format. A sample command line and 
the script to be executed follow.

.nf

> cat test.sh
#!/bin/sh
MACHINEFILE="nodes.$SLURM_JOBID"

# Generate Machinefile for mpich such that hosts are in the same
#  order as if run via srun
#
srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE

# Run using generated Machine file:
mpirun -np $SLURM_NPROCS -machinefile $MACHINEFILE mpi-app

rm $MACHINEFILE

> srun -AN2 -n4 test.sh

.fi 

.SH "BUGS"
If the number of processors per node allocated to a job is not evenly 
divisible by the value of \fB\-\-cpus\-per\-task\fR, tasks may be initiated 
on nodes lacking a sufficient number of processors for the desired parallelism. 

For example, if we are running a job on a cluster comprised of quad-processor
nodes, and we run the following:

.nf
> srun -n 4 -c 3 -l hostname
0: quad0
1: quad0
2: quad1
3: quad2
.fi
.PP
The desired outcome for \-c 3 with quad-processor nodes is each process
running on its own node, but this is not the result.
\fBsrun\fR assumes that the job requires 3 * 4 = 12 processors, and requests
only that many from slurmctld.  slurmctld can satisfy a request for 12
processors with only three nodes in this example, and that is all that the
job receives.  Unfortunately, the \-c 3 parameter is not honored.
.PP
The \fB\-\-nodes\fR and \fB\-\-mincpus\fR options may be helpful in 
preventing this problem. For instance, to achieve the desired allocation 
in the above example:
.nf
> srun -N 4 -n 4 -c 3 --mincpus 3 -l hostname
0: quad0
1: quad1
2: quad2
3: quad3
.fi

.SH "SEE ALSO"
\fBscancel\fR(1), \fBscontrol\fR(1), \fBsqueue\fR(1), \fBslurm.conf\fR(5)