.TH "srun" "1" "SLURM 14.11" "November 2014" "SLURM Commands"
.SH "NAME"
srun \- Run parallel jobs
.SH "SYNOPSIS"
\fBsrun\fR [\fIOPTIONS\fR...] \fIexecutable \fR[\fIargs\fR...]
.SH "DESCRIPTION"
Run a parallel job on a cluster managed by SLURM. If necessary, srun will
first create a resource allocation in which to run the parallel job.
The following document describes the influence of various options on the
allocation of CPUs to jobs and tasks.
.br
http://slurm.schedmd.com/cpu_management.html
.SH "OPTIONS"
.LP
.TP
\fB\-A\fR, \fB\-\-account\fR=<\fIaccount\fR>
Charge resources used by this job to specified account.
The \fIaccount\fR is an arbitrary string. The account name may
be changed after job submission using the \fBscontrol\fR
command.
.TP
\fB\-\-acctg\-freq\fR
Define the job accounting and profiling sampling intervals.
This can be used to override the \fIJobAcctGatherFrequency\fR parameter in SLURM's
configuration file, \fIslurm.conf\fR.
The supported format is as follows:
.RS
.TP 12
\fB\-\-acctg\-freq=\fR\fI<datatype>\fR\fB=\fR\fI<interval>\fR
where \fI<datatype>\fR=\fI<interval>\fR specifies the task sampling
interval for the jobacct_gather plugin or a
sampling interval for a profiling type by the
acct_gather_profile plugin. Multiple,
comma-separated \fI<datatype>\fR=\fI<interval>\fR intervals
may be specified. Supported datatypes are as follows:
.RS
.TP
\fBtask=\fI<interval>\fR
where \fI<interval>\fR is the task sampling interval in seconds
for the jobacct_gather plugins and for task
profiling by the acct_gather_profile plugin.
NOTE: This frequency is used to monitor memory usage. If memory limits
are enforced, the highest frequency a user can request is the one configured in
the slurm.conf file, and the user cannot disable sampling (=0) either.
.TP
\fBenergy=\fI<interval>\fR
where \fI<interval>\fR is the sampling interval in seconds
for energy profiling using the acct_gather_energy plugin.
.TP
\fBnetwork=\fI<interval>\fR
where \fI<interval>\fR is the sampling interval in seconds
for InfiniBand profiling using the acct_gather_infiniband
plugin.
.TP
\fBfilesystem=\fI<interval>\fR
where \fI<interval>\fR is the sampling interval in seconds
for filesystem profiling using the acct_gather_filesystem
plugin.
.TP
.RE
.RE
.br
The default value for the task sampling interval
is 30. The default value for all other intervals is 0.
An interval of 0 disables sampling of the specified type.
If the task sampling interval is 0, accounting
information is collected only at job termination (reducing SLURM
interference with the job).
.br
.br
Smaller (non\-zero) values have a greater impact upon job performance,
but a value of 30 seconds is not likely to be noticeable for
applications having less than 10,000 tasks.
.RE
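.PP
As a rough illustration of the format described above, the following Python
sketch splits an \fB\-\-acctg\-freq\fR value into per\-datatype intervals.
It is a hypothetical helper and not part of SLURM: the function name
parse_acctg_freq and the DEFAULTS table are assumptions made for this example,
and the default intervals are the ones documented above.
.nf

```python
# Hypothetical helper, not part of SLURM: parse an --acctg-freq value.
# Defaults follow the text above: task sampling is 30 seconds, all others 0.
DEFAULTS = {"task": 30, "energy": 0, "network": 0, "filesystem": 0}


def parse_acctg_freq(value):
    """Return the sampling interval in seconds for every datatype."""
    intervals = dict(DEFAULTS)
    for item in value.split(","):
        datatype, sep, interval = item.partition("=")
        if not sep or datatype not in DEFAULTS:
            raise ValueError("unsupported datatype: " + item)
        intervals[datatype] = int(interval)  # 0 disables that sampling type
    return intervals
```
.fi
.PP
Under these assumptions, parse_acctg_freq("task=60,energy=10") yields a task
interval of 60 seconds and an energy interval of 10 seconds, with network and
filesystem sampling left disabled.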
.TP
\fB\-B\fR, \fB\-\-extra\-node\-info\fR=<\fIsockets\fR[:\fIcores\fR[:\fIthreads\fR]]>
Request a specific allocation of resources with details as to the
number and type of computational resources within a cluster:
number of sockets (or physical processors) per node,
cores per socket, and threads per core.
The total amount of resources being requested is the product of all of
the terms.
Each value specified is considered a minimum.
An asterisk (*) can be used as a placeholder indicating that all available
resources of that type are to be utilized.
As with nodes, the individual levels can also be specified in separate
options if desired:
.nf
\fB\-\-sockets\-per\-node\fR=<\fIsockets\fR>
\fB\-\-cores\-per\-socket\fR=<\fIcores\fR>
\fB\-\-threads\-per\-core\fR=<\fIthreads\fR>
.fi
If the task/affinity plugin is enabled, then specifying an allocation in this
manner also sets a default \fB\-\-cpu_bind\fR option of \fIthreads\fR
if the \fB\-B\fR option specifies a thread count, otherwise an option of
\fIcores\fR if a core count is specified, otherwise an option of \fIsockets\fR.
If SelectType is configured to select/cons_res, it must have a parameter of
CR_Core, CR_Core_Memory, CR_Socket, or CR_Socket_Memory for this option
to be honored.
This option is not supported on BlueGene systems (select/bluegene plugin
is configured).
If not specified, \fBscontrol show job\fR will display 'ReqS:C:T=*:*:*'.
.TP
\fB\-\-begin\fR=<\fItime\fR>
Defer initiation of this job until the specified time.
It accepts times of the form \fIHH:MM:SS\fR to run a job at
a specific time of day (seconds are optional).
(If that time is already past, the next day is assumed.)
You may also specify \fImidnight\fR, \fInoon\fR, \fIfika\fR (3 PM) or
\fIteatime\fR (4 PM) and you can have a time\-of\-day suffixed
with \fIAM\fR or \fIPM\fR for running in the morning or the evening.
You can also say what day the job will be run, by specifying
a date of the form \fIMMDDYY\fR, \fIMM/DD/YY\fR, or
\fIYYYY\-MM\-DD\fR. Combine date and time using the following
format: \fIYYYY\-MM\-DD[THH:MM[:SS]]\fR. You can also
give times like \fInow + count time\-units\fR, where the time\-units
can be \fIseconds\fR (default), \fIminutes\fR, \fIhours\fR,
\fIdays\fR, or \fIweeks\fR. You can tell SLURM to run
the job today with the keyword \fItoday\fR and to run the
job tomorrow with the keyword \fItomorrow\fR.
The value may be changed after job submission using the
\fBscontrol\fR command.
For example:
.nf
\-\-begin=16:00
\-\-begin=now+1hour
\-\-begin=now+60 (seconds by default)
\-\-begin=2010\-01\-20T12:34:00
.fi
.RS
.PP
Notes on date/time specifications:
\- Although the 'seconds' field of the HH:MM:SS time specification is
allowed by the code, note that the poll time of the SLURM scheduler
is not precise enough to guarantee dispatch of the job on the exact
second. The job will be eligible to start on the next poll
following the specified time. The exact poll interval depends on the
SLURM scheduler (e.g., 60 seconds with the default sched/builtin).
\- If no time (HH:MM:SS) is specified, the default is (00:00:00).
\- If a date is specified without a year (e.g., MM/DD) then the current
year is assumed, unless the combination of MM/DD and HH:MM:SS has
already passed for that year, in which case the next year is used.
.RE
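.PP
The rules for two of the simpler \fB\-\-begin\fR forms can be sketched in Python
as follows. This is only an illustrative model, not SLURM's actual parser: the
function name resolve_begin and the UNITS table are assumptions, and only the
\fIHH:MM[:SS]\fR and \fInow+count[unit]\fR forms (written without spaces, as in
the examples above) are handled.
.nf

```python
from datetime import datetime, timedelta

# Seconds per supported time unit; seconds are the default unit.
UNITS = {"seconds": 1, "minutes": 60, "hours": 3600,
         "days": 86400, "weeks": 604800}


def resolve_begin(spec, now):
    """Resolve a small subset of --begin values relative to the time now."""
    if spec.startswith("now+"):
        rest = spec[4:]
        digits = ""
        while rest and rest[0].isdigit():
            digits, rest = digits + rest[0], rest[1:]
        unit = rest or "seconds"
        if not unit.endswith("s"):
            unit += "s"  # accept "hour" as well as "hours"
        return now + timedelta(seconds=int(digits) * UNITS[unit])
    parts = [int(p) for p in spec.split(":")]
    parts += [0] * (3 - len(parts))  # the seconds field is optional
    when = now.replace(hour=parts[0], minute=parts[1],
                       second=parts[2], microsecond=0)
    if when < now:
        when += timedelta(days=1)  # time already past: next day is assumed
    return when
```
.fi
.PP
For instance, with a current time of 17:00, the value "16:00" resolves to 16:00
on the following day, while "now+60" resolves to one minute from now.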
.TP
\fB\-\-checkpoint\fR=<\fItime\fR>
Specifies the interval between creating checkpoints of the job step.
By default, the job step will have no checkpoints created.
Acceptable time formats include "minutes", "minutes:seconds",
"hours:minutes:seconds", "days\-hours", "days\-hours:minutes" and
"days\-hours:minutes:seconds".
.TP
\fB\-\-checkpoint\-dir\fR=<\fIdirectory\fR>
Specifies the directory into which the job or job step's checkpoint should
be written (used by the checkpoint/blcr and checkpoint/xlch plugins only).
The default value is the current working directory.
Checkpoint files will be of the form "<job_id>.ckpt" for jobs
and "<job_id>.<step_id>.ckpt" for job steps.
.TP
\fB\-\-comment\fR=<\fIstring\fR>
An arbitrary comment.
.TP
\fB\-C\fR, \fB\-\-constraint\fR=<\fIlist\fR>
Nodes can have \fBfeatures\fR assigned to them by the SLURM administrator.
Users can specify which of these \fBfeatures\fR are required by their job
using the constraint option.
Only nodes having features matching the job constraints will be used to
satisfy the request.
Multiple constraints may be specified with AND, OR, matching OR,
resource counts, etc.
Supported \fBconstraint\fR options include:
.PD 1
.RS
.TP
\fBSingle Name\fR
Only nodes which have the specified feature will be used.
For example, \fB\-\-constraint="intel"\fR
.TP
\fBNode Count\fR
A request can specify the number of nodes needed with some feature
by appending an asterisk and count after the feature name.
For example "\fB\-\-nodes=16 \-\-constraint=graphics*4 ..."\fR
indicates that the job requires 16 nodes and that at least four of those
nodes must have the feature "graphics."
.TP
\fBAND\fR
Only nodes with all of the specified features will be used.
The ampersand is used for an AND operator.
For example, \fB\-\-constraint="intel&gpu"\fR
.TP
\fBOR\fR
Only nodes with at least one of the specified features will be used.
The vertical bar is used for an OR operator.
For example, \fB\-\-constraint="intel|amd"\fR
.TP
\fBMatching OR\fR
If only one of a set of possible options should be used for all allocated
nodes, then use the OR operator and enclose the options within square brackets.
For example: "\fB\-\-constraint=[rack1|rack2|rack3|rack4]"\fR might
be used to specify that all nodes must be allocated on a single rack of
the cluster, but any of those four racks can be used.
.TP
\fBMultiple Counts\fR
Specific counts of multiple resources may be specified by using the AND
operator and enclosing the options within square brackets.
For example: "\fB\-\-constraint=[rack1*2&rack2*4]"\fR might
be used to specify that two nodes must be allocated from nodes with the feature
of "rack1" and four nodes must be allocated from nodes with the feature
"rack2".
.RE
\fBWARNING\fR: When srun is executed from within salloc or sbatch,
the constraint value can only contain a single feature name. None of the
other operators are currently supported for job steps.
.TP
\fB\-\-contiguous\fR
If set, then the allocated nodes must form a contiguous set.
Not honored with the \fBtopology/tree\fR or \fBtopology/3d_torus\fR
plugins, both of which can modify the node ordering.
Not honored for a job step's allocation.
.TP
\fB\-\-cores\-per\-socket\fR=<\fIcores\fR>
Restrict node selection to nodes with at least the specified number of
cores per socket. See additional information under \fB\-B\fR option
above when task/affinity plugin is enabled.
.TP
\fB\-\-cpu_bind\fR=[{\fIquiet,verbose\fR},]\fItype\fR
Bind tasks to CPUs.
Used only when the task/affinity or task/cgroup plugin is enabled.
The configuration parameter \fBTaskPluginParam\fR may override these options.
For example, if \fBTaskPluginParam\fR is configured to bind to cores,
your job will not be able to bind tasks to sockets.
NOTE: To have SLURM always report on the selected CPU binding for all
commands executed in a shell, you can enable verbose mode by setting
the SLURM_CPU_BIND environment variable value to "verbose".
The following informational environment variables are set when \fB\-\-cpu_bind\fR
is in use:
.nf
	SLURM_CPU_BIND_VERBOSE
	SLURM_CPU_BIND_TYPE
	SLURM_CPU_BIND_LIST
.fi
See the \fBENVIRONMENT VARIABLES\fR section for a more detailed description
of the individual SLURM_CPU_BIND variables. These variables are available
only if the task/affinity plugin is configured.
When using \fB\-\-cpus\-per\-task\fR to run multithreaded tasks, be aware that
CPU binding is inherited from the parent of the process. This means that
the multithreaded task should either specify or clear the CPU binding
itself to avoid having all threads of the multithreaded task use the same
mask/CPU as the parent. Alternatively, fat masks (masks which specify more
than one allowed CPU) could be used for the tasks in order to provide
multiple CPUs for the multithreaded tasks.
By default, a job step has access to every CPU allocated to the job.
To ensure that distinct CPUs are allocated to each job step, use the
\fB\-\-exclusive\fR option.
If the job step allocation includes an allocation with a number of
sockets, cores, or threads equal to the number of tasks times cpus\-per\-task,
then the tasks will by default be bound to the appropriate resources (auto
binding). Disable this mode of operation by explicitly setting
Note that a job step can be allocated different numbers of CPUs on each node
or be allocated CPUs not starting at location zero. Therefore one of the
options which automatically generate the task binding is recommended.
Explicitly specified masks or bindings are only honored when the job step
has been allocated every available CPU on the node.
Binding a task to a NUMA locality domain means to bind the task to the set of
CPUs that belong to the NUMA locality domain or "NUMA node".
If NUMA locality domain options are used on systems with no NUMA support, then
each socket is considered a locality domain.
Supported options include:
.PD 1
.RS
.TP
.B q[uiet]
Quietly bind before task runs (default)
.TP
.B v[erbose]
Verbosely report binding before task runs
.TP
.B no[ne]
Do not bind tasks to CPUs (default unless auto binding is applied)
.TP
.B rank
Automatically bind by task rank.
The lowest numbered task on each node is bound to socket (or core or thread) zero, etc.
Not supported unless the entire node is allocated to the job.
.TP
.B map_cpu:<list>
Bind by mapping CPU IDs to tasks as specified
where <list> is <cpuid1>,<cpuid2>,...<cpuidN>.
The mapping is specified for a node and identical mapping is applied to the
tasks on every node (i.e. the lowest task ID on each node is mapped to the
CPU IDs are interpreted as decimal values unless they are preceded
with '0x' in which case they are interpreted as hexadecimal values.
Not supported unless the entire node is allocated to the job.
.TP
.B mask_cpu:<list>
Bind by setting CPU masks on tasks as specified
where <list> is <mask1>,<mask2>,...<maskN>.
The mapping is specified for a node and identical mapping is applied to the
tasks on every node (i.e. the lowest task ID on each node is mapped to the
CPU masks are \fBalways\fR interpreted as hexadecimal values but can be
preceded with an optional '0x'. Not supported unless the entire node is
allocated to the job.
.TP
.B rank_ldom
Bind to a NUMA locality domain by rank. Not supported unless the entire
node is allocated to the job.
.TP
.B map_ldom:<list>
Bind by mapping NUMA locality domain IDs to tasks as specified where
<list> is <ldom1>,<ldom2>,...<ldomN>.
The locality domain IDs are interpreted as decimal values unless they are
preceded with '0x' in which case they are interpreted as hexadecimal values.
Not supported unless the entire node is allocated to the job.
.TP
.B mask_ldom:<list>
Bind by setting NUMA locality domain masks on tasks as specified
where <list> is <mask1>,<mask2>,...<maskN>.
NUMA locality domain masks are \fBalways\fR interpreted as hexadecimal
values but can be preceded with an optional '0x'.
Not supported unless the entire node is allocated to the job.
.TP
.B sockets
Automatically generate masks binding tasks to sockets.
Only the CPUs on the socket which have been allocated to the job will be used.
If the number of tasks differs from the number of allocated sockets
this can result in sub\-optimal binding.
.TP
.B cores
Automatically generate masks binding tasks to cores.
If the number of tasks differs from the number of allocated cores
this can result in sub\-optimal binding.
.TP
.B threads
Automatically generate masks binding tasks to threads.
If the number of tasks differs from the number of allocated threads
this can result in sub\-optimal binding.
.TP
.B ldoms
Automatically generate masks binding tasks to NUMA locality domains.
If the number of tasks differs from the number of allocated locality domains
this can result in sub\-optimal binding.
.TP
.B boards
Automatically generate masks binding tasks to boards.
If the number of tasks differs from the number of allocated boards
this can result in sub\-optimal binding. This option is supported by the
task/cgroup plugin only.
.TP
.B help
Show help message for cpu_bind
.RE
.TP
\fB\-\-cpu\-freq\fR=<\fIrequested frequency in kilohertz\fR>
Request that the job step initiated by this srun command be run at the
requested frequency if possible, on the CPUs selected for the step on
the compute node(s).
Acceptable values at present include:
.RS
.TP 14
\fBLow\fR
the lowest available frequency
.TP
\fBHigh\fR
the highest available frequency
.TP
\fBHighM1\fR
(high minus one) will select the next highest available frequency
.TP
\fBMedium\fR
attempts to set a frequency in the middle of the available range
.TP
\fBConservative\fR
attempts to use the Conservative CPU governor
.TP
\fBOnDemand\fR
attempts to use the OnDemand CPU governor (the default value)
.TP
\fBPerformance\fR
attempts to use the Performance CPU governor
.TP
\fBPowerSave\fR
attempts to use the PowerSave CPU governor
.RE
The following informational environment variable is set in the job
step when the \fB\-\-cpu\-freq\fR option is requested.
.nf
SLURM_CPU_FREQ_REQ
.fi
This environment variable can also be used to supply the value for the
CPU frequency request if it is set when the 'srun' command is issued.
The \fB\-\-cpu\-freq\fR on the command line will override the
environment variable value. See the \fBENVIRONMENT VARIABLES\fR
section for a description of the SLURM_CPU_FREQ_REQ variable.
\fBNOTE\fR: This parameter is treated as a request, not a requirement.
If the job step's node does not support setting the CPU frequency, or
the requested value is outside the bounds of the legal frequencies, an
error is logged, but the job step is allowed to continue.
\fBNOTE\fR: Setting the frequency for just the CPUs of the job step
implies that the tasks are confined to those CPUs. If task
confinement (i.e., TaskPlugin=task/affinity or
TaskPlugin=task/cgroup with the "ConstrainCores" option) is not
configured, this parameter is ignored.
\fBNOTE\fR: When the step completes, the frequency and governor of each
selected CPU is reset to the configured \fBCpuFreqDef\fR value with a
default value of the OnDemand CPU governor.
\fBNOTE\fR: Submitting jobs with the \fB\-\-cpu\-freq\fR option
and linuxproc as the ProctrackType can cause jobs to run too quickly, before
accounting is able to poll for job information. As a result, not all of the
accounting information will be present.
.TP
\fB\-c\fR, \fB\-\-cpus\-per\-task\fR=<\fIncpus\fR>
Request that \fIncpus\fR be allocated \fBper process\fR. This may be
useful if the job is multithreaded and requires more than one CPU
per task for optimal performance. The default is one CPU per process.
If \fB\-c\fR is specified without \fB\-n\fR, as many
tasks will be allocated per node as possible while satisfying
the \fB\-c\fR restriction. For instance on a cluster with 8 CPUs
per node, a job request for 4 nodes and 3 CPUs per task may be
allocated 3 or 6 CPUs per node (1 or 2 tasks per node) depending
upon resource consumption by other jobs. Such a job may be
unable to execute more than a total of 4 tasks.
This option may also be useful to spawn tasks without allocating
resources to the job step from the job's allocation when running
multiple job steps with the \fB\-\-exclusive\fR option.
\fBWARNING\fR: There are configurations and options interpreted differently by
job and job step requests which can result in inconsistencies for this option.
For example \fIsrun \-c2 \-\-threads\-per\-core=1 prog\fR may allocate two
cores for the job, but if each of those cores contains two threads, the job
allocation will include four CPUs. The job step allocation will then launch two
threads per CPU for a total of two tasks.
\fBWARNING\fR: When srun is executed from within salloc or sbatch,
there are configurations and options which can result in inconsistent
allocations when \-c has a value greater than \-c on salloc or sbatch.
.TP
\fB\-d\fR, \fB\-\-dependency\fR=<\fIdependency_list\fR>
Defer the start of this job until the specified dependencies have been
satisfied.
<\fIdependency_list\fR> is of the form
<\fItype:job_id[:job_id][,type:job_id[:job_id]]\fR>.
Many jobs can share the same dependency and these jobs may even belong to
different users. The value may be changed after job submission using the
scontrol command.
.PD
.RS
.TP
\fBafter:job_id[:jobid...]\fR
This job can begin execution after the specified jobs have begun
execution.
.TP
\fBafterany:job_id[:jobid...]\fR
This job can begin execution after the specified jobs have terminated.
.TP
\fBafternotok:job_id[:jobid...]\fR
This job can begin execution after the specified jobs have terminated
in some failed state (non-zero exit code, node failure, timed out, etc).
.TP
\fBafterok:job_id[:jobid...]\fR
This job can begin execution after the specified jobs have successfully
executed (ran to completion with an exit code of zero).
.TP
\fBexpand:job_id\fR
Resources allocated to this job should be used to expand the specified job.
The job to expand must share the same QOS (Quality of Service) and partition.
Gang scheduling of resources in the partition is also not supported.
.TP
\fBsingleton\fR
This job can begin execution after any previously launched jobs
sharing the same job name and user have terminated.
.RE
.TP
\fB\-D\fR, \fB\-\-chdir\fR=<\fIpath\fR>
Have the remote processes do a chdir to \fIpath\fR before beginning
execution. The default is to chdir to the current working directory
of the \fBsrun\fR process. The path can be specified as a full path or
a path relative to the directory where the command is executed.
.TP
\fB\-e\fR, \fB\-\-error\fR=<\fImode\fR>
Specify how stderr is to be redirected. By default in interactive mode,
\fBsrun\fR redirects stderr to the same file as stdout, if one is specified. The
\fB\-\-error\fR option is provided to allow stdout and stderr to be
redirected to different locations.
See \fBIO Redirection\fR below for more options.
If the specified file already exists, it will be overwritten.
.TP
\fB\-E\fR, \fB\-\-preserve-env\fR
Pass the current values of environment variables SLURM_NNODES and
SLURM_NTASKS through to the \fIexecutable\fR, rather than computing them
from command line parameters.
.TP
\fB\-\-epilog\fR=<\fIexecutable\fR>
\fBsrun\fR will run \fIexecutable\fR just after the job step completes.
The command line arguments for \fIexecutable\fR will be the command
and arguments of the job step. If \fIexecutable\fR is "none", then
no srun epilog will be run. This parameter overrides the SrunEpilog
parameter in slurm.conf. This parameter is completely independent from
the Epilog parameter in slurm.conf.
.TP
\fB\-\-exclusive\fR
This option has two slightly different meanings for job and job step
allocations.
When used to initiate a job, the job allocation cannot share nodes with
other running jobs.
The default shared/exclusive behavior depends on system configuration and the
partition's \fBShared\fR option takes precedence over the job's option.
This option can also be used when initiating more than one job step within
an existing resource allocation, where you want separate processors to
be dedicated to each job step. If sufficient processors are not
available to initiate the job step, it will be deferred. This can
be thought of as providing a mechanism for resource management to the job
within its allocation.
The exclusive allocation of CPUs only applies to job steps explicitly invoked
with the \fB\-\-exclusive\fR option.
For example, a job might be allocated one node with four CPUs and a remote
shell invoked on the allocated node. If that shell is not invoked with the
\fB\-\-exclusive\fR option, then it may create a job step with four tasks
using the \fB\-\-exclusive\fR option and not conflict with the remote shell's
resource allocation.
Use the \fB\-\-exclusive\fR option to invoke every job step to ensure distinct
resources for each step.
Note that all CPUs allocated to a job are available
to each job step unless the \fB\-\-exclusive\fR option is used plus
task affinity is configured. Since resource management is provided by
processor, the \fB\-\-ntasks\fR option must be specified, but the
following options should NOT be specified:
\fB\-\-relative\fR, \fB\-\-distribution\fR=\fIarbitrary\fR.
See \fBEXAMPLE\fR below.
.TP
\fB\-\-export\fR=<\fIenvironment variables | NONE\fR>
Identify which environment variables are propagated to the launched application.
Multiple environment variable names should be comma separated.
Environment variable names may be specified to propagate the current
value of those variables (e.g. "\-\-export=EDITOR") or specific values
for the variables may be exported (e.g. "\-\-export=EDITOR=/bin/vi")
in addition to the environment variables that would otherwise be set.
By default all environment variables are propagated.
.TP
\fB\-\-gid\fR=<\fIgroup\fR>
If \fBsrun\fR is run as root, and the \fB\-\-gid\fR option is used,
submit the job with \fIgroup\fR's group access permissions. \fIgroup\fR
may be the group name or the numerical group ID.
.TP
\fB\-\-gres\fR=<\fIlist\fR>
Specifies a comma delimited list of generic consumable resources.
The format of each entry on the list is "name[[:type]:count]".
The name is that of the consumable resource.
The count is the number of those resources with a default value of 1.
The specified resources will be allocated to the job on each node.
The available generic consumable resources is configurable by the system
administrator.
A list of available generic consumable resources will be printed and the
command will exit if the option argument is "help".
Examples of use include "\-\-gres=gpu:2,mic:1", "\-\-gres=gpu:kepler:2", and
"\-\-gres=help".
NOTE: By default, a job step is allocated all of the generic resources that
have been allocated to the job. To change the behavior so that each job step is
allocated no generic resources, explicitly set the value of \-\-gres to specify
zero counts for each generic resource OR set "\-\-gres=none" OR set the
SLURM_STEP_GRES environment variable to "none".
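.PP
The entry format "name[[:type]:count]" described above can be modelled with a
short Python sketch. It is a hypothetical helper rather than SLURM code: the
function name parse_gres is an assumption, and the special values "help" and
"none" are deliberately not handled.
.nf

```python
def parse_gres(value):
    """Split a --gres list into (name, type, count) tuples."""
    entries = []
    for item in value.split(","):
        fields = item.split(":")
        if len(fields) == 1:
            # Only a name was given: the count defaults to 1.
            name, gres_type, count = fields[0], None, 1
        elif len(fields) == 2:
            # name:count
            name, gres_type, count = fields[0], None, int(fields[1])
        elif len(fields) == 3:
            # name:type:count
            name, gres_type, count = fields[0], fields[1], int(fields[2])
        else:
            raise ValueError("bad gres entry: " + item)
        entries.append((name, gres_type, count))
    return entries
```
.fi
.PP
With this model, "gpu:kepler:2" is read as two resources named gpu of type
kepler, and a bare "gpu" as a single gpu resource of no particular type.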
.TP
\fB\-H, \-\-hold\fR
Specify the job is to be submitted in a held state (priority of zero).
A held job can now be released using scontrol to reset its priority
(e.g. "\fIscontrol release <job_id>\fR").
.TP
\fB\-h\fR, \fB\-\-help\fR
Display help information and exit.
.TP
\fB\-\-hint\fR=<\fItype\fR>
Bind tasks according to application hints.
.RS
.TP
.B compute_bound
Select settings for compute bound applications:
use all cores in each socket, one thread per core
.TP
.B memory_bound
Select settings for memory bound applications:
use only one core in each socket, one thread per core
.TP
.B [no]multithread
[don't] use extra threads with in-core multi-threading
which can benefit communication intensive applications
.TP
.B help
show this help message
.RE
.TP
\fB\-I\fR, \fB\-\-immediate\fR[=<\fIseconds\fR>]
Exit if resources are not available within the
time period specified.
If no argument is given, resources must be available immediately
for the request to succeed.
By default, \fB\-\-immediate\fR is off, and the command
will block until resources become available. Since this option's
argument is optional, for proper parsing the single letter option
must be followed immediately with the value and not include a
space between them. For example "\-I60" and not "\-I 60".
.TP
\fB\-i\fR, \fB\-\-input\fR=<\fImode\fR>
Specify how stdin is to be redirected. By default,
.B srun
redirects stdin from the terminal to all tasks. See \fBIO Redirection\fR
below for more options.
For OS X, the poll() function does not support stdin, so input from
a terminal is not possible.
.TP
\fB\-J\fR, \fB\-\-job\-name\fR=<\fIjobname\fR>
Specify a name for the job. The specified name will appear along with
the job id number when querying running jobs on the system. The default
is the supplied \fBexecutable\fR program's name. NOTE: This information
may be written to the slurm_jobacct.log file. This file is space delimited
so if a space is used in the \fIjobname\fR name it will cause problems in
properly displaying the contents of the slurm_jobacct.log file when the
\fBsacct\fR command is used.
.TP
\fB\-\-jobid\fR=<\fIjobid\fR>
Initiate a job step under an already allocated job with job id \fIjobid\fR.
Using this option will cause \fBsrun\fR to behave exactly as if the
SLURM_JOB_ID environment variable was set.
.TP
\fB\-K\fR, \fB\-\-kill\-on\-bad\-exit\fR[=0|1]
Controls whether or not to terminate a job if any task exits with a non\-zero
exit code. If this option is not specified, the default action will be based
upon the SLURM configuration parameter of \fBKillOnBadExit\fR. If this option
is specified, it will take precedence over \fBKillOnBadExit\fR. An option
argument of zero will not terminate the job. A non\-zero argument or no
argument will terminate the job.
Note: This option takes precedence over the \fB\-W\fR, \fB\-\-wait\fR option
to terminate the job immediately if a task exits with a non\-zero exit code.
Since this option's argument is optional, for proper parsing the
single letter option must be followed immediately with the value and
not include a space between them. For example "\-K1" and not "\-K 1".
.TP
\fB\-k\fR, \fB\-\-no\-kill\fR
Do not automatically terminate a job if one of the nodes it has been
allocated fails. This option is only recognized on a job allocation,
not for the submission of individual job steps.
The job will assume all responsibilities for fault\-tolerance.
Tasks launched using this option will not be considered terminated
(e.g. \fB\-K\fR, \fB\-\-kill\-on\-bad\-exit\fR and
\fB\-W\fR, \fB\-\-wait\fR options will have no effect upon the job step).
The active job step (MPI job) will likely suffer a fatal error,
but subsequent job steps may be run if this option is specified.
The default action is to terminate the job upon node failure.
.TP
\fB\-\-launch-cmd\fR
Print external launch command instead of running job normally through
SLURM. This option is only valid if using something other than the
\fIlaunch/slurm\fR plugin.
.TP
\fB\-\-launcher\-opts\fR=<\fIoptions\fR>
Options for the external launcher if using something other than the
\fIlaunch/slurm\fR plugin.
.TP
\fB\-l\fR, \fB\-\-label\fR
Prepend task number to lines of stdout/err.
The \fB\-\-label\fR option will prepend lines of output with the remote
.TP
\fB\-L\fR, \fB\-\-licenses\fR=<\fBlicense\fR>
Specification of licenses (or other resources available on all
nodes of the cluster) which must be allocated to this job.
License names can be followed by a colon and count
(the default count is one).
Multiple license names should be comma separated (e.g.
"\-\-licenses=foo:4,bar").
.TP
\fB\-m\fR, \fB\-\-distribution\fR=
<\fIblock\fR|\fIcyclic\fR|\fIarbitrary\fR|\fIplane=<options>\fR[:\fIblock\fR|\fIcyclic\fR]>
Specify alternate distribution methods for remote processes.
This option controls the assignment of tasks to the nodes on which
resources have been allocated, and the distribution of those resources
to tasks for binding (task affinity). The first distribution
method (before the ":") controls the distribution of resources across
nodes. The optional second distribution method (after the ":")
controls the distribution of resources across sockets within a node.
Note that with select/cons_res, the number of cpus allocated on each
socket and node may be different. Refer to
http://slurm.schedmd.com/mc_support.html
for more information on resource allocation, assignment of tasks to
nodes, and binding of tasks to CPUs.
.TP
First distribution method:
.TP
.B block
The block distribution method will distribute tasks to a node such
that consecutive tasks share a node. For example, consider an
allocation of three nodes each with two cpus. A four\-task block
distribution request will distribute those tasks to the nodes with
tasks one and two on the first node, task three on the second node,
and task four on the third node. Block distribution is the default
behavior if the number of tasks exceeds the number of allocated nodes.
.TP
.B cyclic
The cyclic distribution method will distribute tasks to a node such
that consecutive tasks are distributed over consecutive nodes (in a
round\-robin fashion). For example, consider an allocation of three
nodes each with two cpus. A four\-task cyclic distribution request
will distribute those tasks to the nodes with tasks one and four on
the first node, task two on the second node, and task three on the
third node.
Note that when SelectType is select/cons_res, the same number of CPUs
may not be allocated on each node. Task distribution will be
round\-robin among all the nodes with CPUs yet to be assigned to tasks.
Cyclic distribution is the default behavior if the number
of tasks is no larger than the number of allocated nodes.
.TP
.B plane
The tasks are distributed in blocks of a specified size. The options
include a number representing the size of the task block. This is
followed by an optional specification of the task distribution scheme
within a block of tasks and between the blocks of tasks. The number of tasks
distributed to each node is the same as for cyclic distribution, but the
taskids assigned to each node depend on the plane size. For more
details (including examples and diagrams), please see
.br
http://slurm.schedmd.com/mc_support.html
.br
and
.br
http://slurm.schedmd.com/dist_plane.html
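.IP
As a rough illustration of how the plane size changes the task IDs on each
node without changing the per\-node task counts, the Python sketch below deals
task IDs to nodes in blocks of the plane size. It is a model written from the
description above, not taken from SLURM's source; the function name and the
input (a list of per\-node task counts, as a cyclic distribution would
produce) are assumptions, and the pages linked above remain the authoritative
reference.
.nf
```python
def plane_layout(task_counts, plane_size):
    """Deal task IDs to nodes in blocks of plane_size, round-robin."""
    if plane_size < 1:
        raise ValueError("plane size must be at least 1")
    layout = [[] for _ in task_counts]
    total = sum(task_counts)
    task_id = 1  # task IDs start at one, as in the examples in this section
    while task_id <= total:
        for node, limit in enumerate(task_counts):
            # A node never receives more tasks than its assumed count.
            take = min(plane_size, limit - len(layout[node]))
            layout[node].extend(range(task_id, task_id + take))
            task_id += take
    return layout


# Eight tasks over three nodes with per-node counts 3, 3 and 2.
print(plane_layout([3, 3, 2], 2))
print(plane_layout([3, 3, 2], 1))
```
.fi
In this model a plane size of one gives the same task IDs per node as the
cyclic method.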
.TP
.B arbitrary
The arbitrary method of distribution will allocate processes in\-order
as listed in the file designated by the environment variable
SLURM_HOSTFILE. If this variable is set, it will override any
other method specified. If it is not set, the method will default to block.
The hostfile must contain at minimum the number of hosts
requested, listed one per line or comma separated. If specifying a
task count (\fB\-n\fR, \fB\-\-ntasks\fR=<\fInumber\fR>), your tasks
will be laid out on the nodes in the order of the file.
.br
\fBNOTE:\fR The arbitrary distribution option on a job allocation only
controls the nodes to be allocated to the job and not the allocation of
CPUs on those nodes. This option is meant primarily to control a job step's
task layout in an existing job allocation for the srun command.
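.IP
The hostfile format described above (one host per line or comma separated)
can be sketched in Python as follows. This is an illustrative parser only:
the function names are hypothetical, task ranks are numbered from zero here,
and any syntax SLURM accepts beyond what is described above is not handled.
.nf
```python
def parse_hostfile(text):
    """Return host names in file order; entries may be newline or comma separated."""
    hosts = []
    for line in text.splitlines():
        for name in line.split(","):
            name = name.strip()
            if name:
                hosts.append(name)
    return hosts


def arbitrary_layout(hostfile_text, ntasks):
    """Pair task ranks 0..ntasks-1 with hosts in the order listed."""
    hosts = parse_hostfile(hostfile_text)
    if len(hosts) < ntasks:
        raise ValueError("hostfile lists fewer hosts than tasks requested")
    return list(enumerate(hosts[:ntasks]))


example = """node1,node2
node1
"""
print(arbitrary_layout(example, 3))
```
.fi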
.TP
Second distribution method:
.TP
.B block
The block distribution method will distribute tasks to sockets such
that consecutive tasks share a socket.
.TP
.B cyclic
The cyclic distribution method will distribute tasks to sockets such
that consecutive tasks are distributed over consecutive sockets (in a
round\-robin fashion).
.TP
\fB\-\-mail\-type\fR=<\fItype\fR>
Notify user by email when certain event types occur.
Valid \fItype\fR values are BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
BEGIN, END, FAIL and REQUEUE), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent of
time limit), TIME_LIMIT_80 (reached 80 percent of time limit), and TIME_LIMIT_50
(reached 50 percent of time limit).
Multiple \fItype\fR values may be specified in a comma separated list.
The user to be notified is indicated with \fB\-\-mail\-user\fR.
.TP
\fB\-\-mail\-user\fR=<\fIuser\fR>
User to receive email notification of state changes as defined by
\fB\-\-mail\-type\fR.
The default value is the submitting user.
.TP
\fB\-\-mem\fR=<\fIMB\fR>
Specify the real memory required per node in MegaBytes.
Default value is \fBDefMemPerNode\fR and the maximum value is
\fBMaxMemPerNode\fR. If configured, both parameters can be
seen using the \fBscontrol show config\fR command.
This parameter would generally be used if whole nodes
are allocated to jobs (\fBSelectType=select/linear\fR).
Specifying a memory limit of zero for a job step will restrict the job step
to the amount of memory allocated to the job, but not remove any of the job's
memory allocation from being available to other job steps.
Also see \fB\-\-mem\-per\-cpu\fR.
\fB\-\-mem\fR and \fB\-\-mem\-per\-cpu\fR are mutually exclusive.
NOTE: Enforcement of memory limits currently relies upon the task/cgroup plugin
or enabling of accounting, which samples memory use on a periodic basis (data
need not be stored, just collected). In both cases memory use is based upon
the job's Resident Set Size (RSS). A task may exceed the memory limit until
the next periodic accounting sample.
.TP
\fB\-\-mem\-per\-cpu\fR=<\fIMB\fR>
Minimum memory required per allocated CPU in MegaBytes.
Default value is \fBDefMemPerCPU\fR and the maximum value is \fBMaxMemPerCPU\fR
(see exception below). If configured, both parameters can be
seen using the \fBscontrol show config\fR command.
Note that if the job's \fB\-\-mem\-per\-cpu\fR value exceeds the configured
\fBMaxMemPerCPU\fR, then the user's limit will be treated as a memory limit
per task; \fB\-\-mem\-per\-cpu\fR will be reduced to a value no larger than
\fBMaxMemPerCPU\fR; \fB\-\-cpus\-per\-task\fR will be set and the value of
\fB\-\-cpus\-per\-task\fR multiplied by the new \fB\-\-mem\-per\-cpu\fR
value will equal the original \fB\-\-mem\-per\-cpu\fR value specified by
the user.
This parameter would generally be used if individual processors
are allocated to jobs (\fBSelectType=select/cons_res\fR).
If resources are allocated by the core, socket, or whole node, the number
of CPUs allocated to a job may be higher than the task count and the value
of \fB\-\-mem\-per\-cpu\fR should be adjusted accordingly.
Specifying a memory limit of zero for a job step will restrict the job step
to the amount of memory allocated to the job, but not remove any of the job's
memory allocation from being available to other job steps.
Also see \fB\-\-mem\fR.
\fB\-\-mem\fR and \fB\-\-mem\-per\-cpu\fR are mutually exclusive.
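.IP
The adjustment applied when \fB\-\-mem\-per\-cpu\fR exceeds
\fBMaxMemPerCPU\fR can be illustrated with a small Python calculation. This
is a sketch of the arithmetic described above, not SLURM's code: the function
name is hypothetical and the use of ceiling division is an assumption, so for
values that do not divide evenly the product may slightly exceed the original
request rather than equal it.
.nf
```python
def adjust_mem_per_cpu(requested_mb, max_mem_per_cpu_mb, cpus_per_task=1):
    """Return (new mem-per-cpu in MB, new cpus-per-task) for a request."""
    if requested_mb <= max_mem_per_cpu_mb:
        # Within the configured limit: nothing to adjust.
        return requested_mb, cpus_per_task
    # Number of CPUs needed per task so each stays within the limit
    # (ceiling division, an assumption of this sketch).
    ratio = -(-requested_mb // max_mem_per_cpu_mb)
    # Spread the original per-task memory across those CPUs, rounding up.
    new_mem_mb = -(-requested_mb // ratio)
    return new_mem_mb, cpus_per_task * ratio


# A 4096 MB request with a 2048 MB per-CPU limit becomes 2 CPUs of 2048 MB.
print(adjust_mem_per_cpu(4096, 2048))
```
.fi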
.TP
\fB\-\-mem_bind\fR=[{\fIquiet,verbose\fR},]\fItype\fR
Bind tasks to memory. Used only when the task/affinity plugin is enabled
and the NUMA memory functions are available.
\fBNote that the resolution of CPU and memory binding
may differ on some architectures.\fR For example, CPU binding may be performed
at the level of the cores within a processor while memory binding will
be performed at the level of nodes, where the definition of "nodes"
may differ from system to system. \fBThe use of any type other than
"none" or "local" is not recommended.\fR
If you want greater control, try running a simple test code with the
options "\-\-cpu_bind=verbose,none \-\-mem_bind=verbose,none" to determine
the specific configuration.
NOTE: To have SLURM always report on the selected memory binding for
all commands executed in a shell, you can enable verbose mode by
setting the SLURM_MEM_BIND environment variable value to "verbose".
The following informational environment variables are set when
\fB\-\-mem_bind\fR is in use:
.nf
SLURM_MEM_BIND_VERBOSE
SLURM_MEM_BIND_TYPE
SLURM_MEM_BIND_LIST
.fi
See the \fBENVIRONMENT VARIABLES\fR section for a more detailed description
of the individual SLURM_MEM_BIND* variables.
Supported options include:
.RS
.TP
.B q[uiet]
quietly bind before task runs (default)
.TP
.B v[erbose]
verbosely report binding before task runs
.TP
.B no[ne]
don't bind tasks to memory (default)
.TP
.B rank
bind by task rank (not recommended)
.TP
.B local
Use memory local to the processor in use
.TP
.B map_mem:<list>
bind by mapping a node's memory to tasks as specified
where <list> is <cpuid1>,<cpuid2>,...<cpuidN>.
CPU IDs are interpreted as decimal values unless they are preceded
with '0x', in which case they are interpreted as hexadecimal values
(not recommended)
.TP
.B mask_mem:<list>
bind by setting memory masks on tasks as specified
where <list> is <mask1>,<mask2>,...<maskN>.
Memory masks are \fBalways\fR interpreted as hexadecimal values.
Note that masks must be preceded with a '0x' if they don't begin
with [0-9] so they are seen as numerical values by srun.
.TP
.B help
show this help message
.RE
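.IP
The different number formats of the map_mem and mask_mem lists can be
illustrated with a small Python parser. It is only a sketch of the rules
stated above; the function names are hypothetical, and it is more lenient
than srun in that it accepts masks beginning with a letter without the
leading '0x'.
.nf
```python
def parse_map_ids(spec):
    """map_mem list: decimal values unless prefixed with 0x (hexadecimal)."""
    ids = []
    for token in spec.split(","):
        token = token.strip()
        base = 16 if token.lower().startswith("0x") else 10
        ids.append(int(token, base))
    return ids


def parse_masks(spec):
    """mask_mem list: values are always hexadecimal, with or without 0x."""
    return [int(token.strip(), 16) for token in spec.split(",")]


print(parse_map_ids("0,1,0x10"))
print(parse_masks("0x3,10"))
```
.fi
Note that "10" is read as ten in a map_mem list but as sixteen in a mask_mem
list.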
.TP
\fB\-\-mincpus\fR=<\fIn\fR>
Specify a minimum number of logical cpus/processors per node.
.TP
\fB\-\-msg\-timeout\fR=<\fIseconds\fR>
The default value is \fBMessageTimeout\fR in the SLURM configuration file slurm.conf.
Changes to this are typically not recommended, but could be useful to diagnose problems.
.TP
\fB\-\-mpi\fR=<\fImpi_type\fR>
Identify the type of MPI to be used. May result in unique initiation
procedures.
.RS
.TP
.B list
.TP
.B lam
Initiates one 'lamd' process per node and establishes necessary
environment variables for LAM/MPI.
.TP
.B mpich1_shmem
Initiates one process per node and establishes necessary
environment variables for mpich1 shared memory model.
This also works for mvapich built for shared memory.
.TP
.B mpichgm
For use with Myrinet.
.TP
.B mvapich
For use with Infiniband.
.TP
.B openmpi
For use with OpenMPI.
.TP
.B pmi2
To enable PMI2 support. The PMI2 support in Slurm works only if the MPI
implementation supports it, in other words if the MPI has the PMI2
interface implemented. The \-\-mpi=pmi2 will load the library
lib/slurm/mpi_pmi2.so which provides the server side functionality but
the client side must implement PMI2_Init() and the other interface calls.
.TP
.B none
No special MPI processing. This is the default and works with
many other versions of MPI.
.RE
.TP
\fB\-\-multi\-prog\fR
Run a job with different programs and different arguments for
each task. In this case, the executable program specified is
actually a configuration file specifying the executable and
arguments for each task. See \fBMULTIPLE PROGRAM CONFIGURATION\fR
below for details on the configuration file contents.