.TH SCONTROL "1" "September 2005" "scontrol 0.7" "Slurm components"
.SH "NAME"
scontrol \- Used to view and modify Slurm configuration and state.
.SH "SYNOPSIS"
\fBscontrol\fR [\fIOPTIONS\fR...] [\fICOMMAND\fR...]
.SH "DESCRIPTION"
\fBscontrol\fR is used to view or modify Slurm configuration including: job,
job step, node, partition, and overall system configuration. Most of the
commands can only be executed by user root. If an attempt to view or modify
configuration information is made by an unauthorized user, an error message
will be printed and the requested action will not occur. If no command is
entered on the command line, \fBscontrol\fR will operate in an interactive
mode and prompt for input. It will continue prompting for input and executing
commands until explicitly terminated. If a command is entered on the command
line, \fBscontrol\fR will execute that command and terminate. All commands
and options are case-insensitive, although node names and partition names
are case-sensitive (node names "LX" and "lx" are distinct). Commands can
be abbreviated to the extent that the specification is unique.
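As an illustration of the abbreviation rule, the following Python sketch
resolves a typed word to the single command it uniquely abbreviates, ignoring
case. The function \fIresolve_command\fP and its command list (taken from the
COMMANDS section of this page) are hypothetical; they are not part of
scontrol and are not taken from its source.

```python
# Hypothetical sketch of scontrol-style abbreviation: a word is accepted if it
# is a case-insensitive prefix of exactly one command name.
COMMANDS = [
    "abort", "all", "checkpoint", "completing", "delete", "exit", "help",
    "hide", "oneliner", "pidinfo", "ping", "quiet", "quit", "reconfigure",
    "show", "shutdown", "update", "verbose", "version",
]


def resolve_command(word, commands=COMMANDS):
    """Return the unique command that `word` abbreviates, else raise ValueError."""
    prefix = word.lower()
    matches = [c for c in commands if c.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(f"{word!r} matches {len(matches)} commands: {matches}")
    return matches[0]
```

For example, "UPD" resolves to \fIupdate\fP, while "q" is rejected because it
could mean either \fIquiet\fP or \fIquit\fP.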
.SH "OPTIONS"
.TP
\fB\-a\fR, \fB\-\-all\fR
When the \fIshow\fR command is used, display all partitions, their jobs
and job steps. This causes information to be displayed about partitions
that are configured as hidden and partitions that are unavailable to the
user's group.
.TP
\fB\-h\fR, \fB\-\-help\fR
Print a help message describing the usage of scontrol.
.TP
\fB\-\-hide\fR
Do not display information about hidden partitions, their jobs and job steps.
This is the default behavior: neither partitions that are configured as hidden
nor partitions unavailable to the user's group are displayed.
.TP
\fB\-o\fR, \fB\-\-oneliner\fR
Print information one line per record.
.TP
\fB\-q\fR, \fB\-\-quiet\fR
Print no warning or informational messages, only fatal error messages.
.TP
\fB\-v\fR, \fB\-\-verbose\fR
Print detailed event logging. This includes time-stamps on data structures,
record counts, etc.
.TP
\fB\-V\fR, \fB\-\-version\fR
Print version information and exit.
.SH "COMMANDS"
.TP
\fIall\fP
Show all partitions, their jobs and job steps. This causes information to be
displayed about partitions that are configured as hidden and partitions that
are unavailable to the user's group.
.TP
\fIabort\fP
Instruct the Slurm controller to terminate immediately and generate a core file.
.TP
\fIcheckpoint\fP \fICKPT_OP\fP \fIID\fP
Perform a checkpoint activity on the job step(s) with the specified identification.
\fICKPT_OP\fP may be
\fIdisable\fP (disable future checkpoints),
\fIenable\fP (enable future checkpoints),
\fIable\fP (test if presently not disabled, report start time if checkpoint in progress),
\fIcreate\fP (create a checkpoint and continue the job step),
\fIvacate\fP (create a checkpoint and terminate the job step),
\fIerror\fP (report the result for the last checkpoint request, error code and message), or
\fIrestart\fP (restart execution of the previously checkpointed job steps).
\fIID\fP can be used to identify a specific job (e.g. "<job_id>",
which applies to all of its existing steps)
or a specific job step (e.g. "<job_id>.<step_id>").
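The two \fIID\fP forms can be told apart mechanically. This Python sketch
splits an ID into a job ID and an optional step ID, where a missing step
means the operation applies to all existing steps of the job. The helper
\fIparse_job_step_id\fP is hypothetical and not part of scontrol or the
SLURM API.

```python
# Hypothetical helper: split a job ID ("1234") or job step ID ("1234.1").
# The step is None for a bare job ID, meaning "all existing steps".
def parse_job_step_id(text):
    job_part, sep, step_part = text.strip().partition(".")
    # Both parts must be plain non-negative integers; "1234." is rejected.
    if not job_part.isdigit() or (sep and not step_part.isdigit()):
        raise ValueError(f"not a job or job step ID: {text!r}")
    job_id = int(job_part)
    step_id = int(step_part) if sep else None
    return job_id, step_id
```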
.TP
\fIcompleting\fP
Display all jobs in a COMPLETING state along with associated nodes in either a
COMPLETING or DOWN state.
.TP
\fIdelete\fP \fISPECIFICATION\fP
Delete the entry with the specified \fISPECIFICATION\fP.
The only supported \fISPECIFICATION\fP presently is of the form
\fIPartitionName=<name>\fP.
.TP
\fIexit\fP
Terminate the execution of scontrol.
.TP
\fIhelp\fP
Display a description of scontrol options and commands.
.TP
\fIhide\fP
Do not display partition, job or job step information for partitions that are
configured as hidden or partitions that are unavailable to the user's group.
This is the default behavior.
.TP
\fIoneliner\fP
Print information one line per record.
.TP
\fIpidinfo\fP \fIPROC_ID\fP
Print the Slurm job id and scheduled termination time corresponding to the
supplied process id, \fIPROC_ID\fP, on the current node.
.TP
\fIping\fP
Ping the primary and secondary slurmctld daemons and report whether
they are responding.
.TP
\fIquiet\fP
Print no warning or informational messages, only fatal error messages.
.TP
\fIquit\fP
Terminate the execution of scontrol.
.TP
\fIreconfigure\fP
Instruct all Slurm daemons to re-read the configuration file.
This command does not restart the daemons.
This mechanism would be used to modify configuration parameters (Epilog,
Prolog, SlurmctldLogFile, SlurmdLogFile, etc.), to register the physical
addition or removal of nodes from the cluster, or to recognize a change
in a node's configuration, such as the addition of memory or processors.
The Slurm controller (slurmctld) forwards the request to all other daemons
(the slurmd daemon on each compute node). Running jobs continue execution.
Most configuration parameters can be changed by just running this command;
however, the SLURM daemons should be shut down and restarted if any of these
parameters are to be changed: AuthType, BackupAddr, BackupController,
ControlAddr, ControlMach, PluginDir, StateSaveLocation, SlurmctldPort
or SlurmdPort.
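The choice between \fIreconfigure\fP and a full daemon restart can be
expressed as a lookup against the parameter list above. In this Python sketch
the function \fIreload_action\fP is hypothetical, and comparing parameter
names case-insensitively is an assumption of the sketch.

```python
# Parameters that, per this page, require the SLURM daemons to be shut down
# and restarted instead of being picked up by "scontrol reconfigure".
RESTART_REQUIRED = {
    "AuthType", "BackupAddr", "BackupController", "ControlAddr",
    "ControlMach", "PluginDir", "StateSaveLocation", "SlurmctldPort",
    "SlurmdPort",
}


def reload_action(changed_params):
    """Return "restart" if any changed parameter needs a daemon restart,
    otherwise "reconfigure". Names are compared case-insensitively
    (an assumption of this sketch)."""
    required = {p.lower() for p in RESTART_REQUIRED}
    if any(p.lower() in required for p in changed_params):
        return "restart"
    return "reconfigure"
```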
.TP
\fIshow\fP \fIENTITY\fP \fIID\fP
Display the state of the specified entity with the specified identification.
\fIENTITY\fP may be \fIconfig\fP, \fIdaemons\fP, \fIjob\fP, \fInode\fP,
\fIpartition\fP or \fIstep\fP.
\fIID\fP can be used to identify a specific element of the identified
entity: the configuration parameter name, job ID, node name, partition name,
or job step ID for entities \fIconfig\fP, \fIjob\fP, \fInode\fP, \fIpartition\fP,
and \fIstep\fP respectively.
Multiple node names may be specified using simple node range expressions
(e.g. "lx[10-20]"). All other \fIID\fP values must identify a single
element. The job step ID is of the form "job_id.step_id" (e.g. "1234.1").
By default, all elements of the entity type specified are printed.
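To show what a simple node range expression denotes, this Python sketch
expands one into individual node names, keeping any zero padding (as in
"lx[0031-0040]"). The helper \fIexpand_node_range\fP is hypothetical; it
handles a single bracketed group and, as an extension, comma-separated
ranges inside the brackets. It is not the SLURM hostlist implementation.

```python
def expand_node_range(expr):
    """Expand a simple node range expression such as "lx[10-20]" into a
    list of node names. Zero padding is preserved, so "lx[0031-0033]"
    yields lx0031, lx0032 and lx0033. A name without brackets is returned
    unchanged."""
    if "[" not in expr:
        return [expr]
    prefix, _, rest = expr.partition("[")
    body, close, suffix = rest.partition("]")
    if not close:
        raise ValueError(f"unbalanced brackets in {expr!r}")
    names = []
    for piece in body.split(","):
        lo, dash, hi = piece.partition("-")
        if not dash:
            hi = lo  # a single number such as "lx[7]"
        width = len(lo)  # keep leading zeros, e.g. "0031"
        for n in range(int(lo), int(hi) + 1):
            names.append(f"{prefix}{n:0{width}d}{suffix}")
    return names
```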
.TP
\fIshutdown\fP
Instruct all Slurm daemons to save current state and terminate.
The Slurm controller (slurmctld) forwards the request to all other daemons
(the slurmd daemon on each compute node).
.TP
\fIupdate\fP \fISPECIFICATION\fP
Update job, node or partition configuration per the supplied specification.
\fISPECIFICATION\fP is in the same format as the Slurm configuration file
and the output of the \fIshow\fP command described above. It may be desirable
to execute the \fIshow\fP command (described above) on the specific entity
you wish to update, then use cut-and-paste tools to enter updated configuration
values to the \fIupdate\fP command. Note that while most configuration values
can be changed using this command, not all can. In particular, the hardware
configuration of a node or the physical addition or removal of nodes from the
cluster may only be accomplished by editing the Slurm configuration file and
executing the \fIreconfigure\fP command (described above).
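The show-then-update workflow can also be scripted instead of using
cut-and-paste. This Python sketch parses one record of \fIshow\fP output
(whitespace-separated Key=Value pairs) and builds an \fIupdate\fP command
line from it. The helpers \fIparse_record\fP and \fIbuild_update\fP are
hypothetical, and values containing spaces (such as a quoted Reason) are not
handled.

```python
def parse_record(text):
    """Parse one record of "show" output (whitespace-separated Key=Value
    pairs, possibly spread over several lines) into a dict. Values that
    contain spaces are not handled by this sketch."""
    record = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            record[key] = value
    return record


def build_update(record, key_field, changes):
    """Build an "update" command line that identifies the entity by
    `key_field` (e.g. "PartitionName" or "JobId") and sets `changes`."""
    parts = [f"{key_field}={record[key_field]}"]
    parts += [f"{k}={v}" for k, v in changes.items()]
    return "update " + " ".join(parts)
```

Applied to the partition record in the EXAMPLE section with
MaxTime=99 and MaxNodes=4, it produces the same \fIupdate\fP line shown there.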
.TP
\fIverbose\fP
Print detailed event logging. This includes time-stamps on data structures,
record counts, etc.
.TP
\fIversion\fP
Display the version number of scontrol being executed.
.TP
\fI!!\fP
Repeat the last command executed.
.TP
\fBSPECIFICATIONS FOR UPDATE COMMAND, JOBS\fR
.TP
\fIAccount\fP=<account>
Account name to be changed for this job's resource use.
Value may be cleared with blank data value, "Account=".
.TP
\fIContiguous\fP=<yes|no>
Set the job's requirement for contiguous (consecutive) nodes to be allocated.
Possible values are "YES" and "NO".
.TP
\fIDependency\fP=<job_id>
Defer the job's initiation until the specified job_id completes.
Cancel the dependency with a job_id value of "0" (i.e. "Dependency=0").
.TP
\fIFeatures\fP=<features>
Set the job's required node features to the specified value. Multiple values
may be comma separated if all features are required (AND operation) or
separated by "|" if any of the specified features are required (OR operation).
Value may be cleared with blank data value, "Features=".
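The AND and OR forms of \fIFeatures\fP can be read as follows. This Python
sketch checks whether a node's feature list satisfies a Features value; the
function \fInode_satisfies\fP is hypothetical, and it rejects values that mix
"," and "|" because this page does not define how they combine.

```python
def node_satisfies(required, node_features):
    """Return True if a node with `node_features` satisfies a Features
    value: "a,b" requires every listed feature (AND) and "a|b" requires
    at least one (OR). An empty value means no requirement."""
    if not required:
        return True
    if "|" in required and "," in required:
        raise ValueError("mixed AND/OR feature expressions are not handled")
    have = set(node_features)
    if "|" in required:
        return any(f in have for f in required.split("|"))
    return all(f in have for f in required.split(","))
```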
.TP
\fIJobId\fP=<id>
Identify the job to be updated. This specification is required.
.TP
\fIMinMemory\fP=<megabytes>
Set the job's minimum real memory required per node to the specified value.
.TP
\fIMinProcs\fP=<count>
Set the job's minimum number of processors per node to the specified value.
.TP
\fIMinTmpDisk\fP=<megabytes>
Set the job's minimum temporary disk space required per node to the specified value.
.TP
\fIName\fP=<name>
Set the job's name to the specified value.
.TP
\fIPartition\fP=<name>
Set the job's partition to the specified value.
.TP
\fIPriority\fP=<number>
Set the job's priority to the specified value.
.TP
\fIReqNodeList\fP=<nodes>
Set the job's list of required nodes. Multiple node names may be specified using
simple node range expressions (e.g. "lx[10-20]").
Value may be cleared with blank data value, "ReqNodeList=".
.TP
\fIReqNodes\fP=<count>
Set the job's count of required nodes to the specified value.
.TP
\fIReqProcs\fP=<count>
Set the job's count of required processors to the specified value.
.TP
\fIShared\fP=<yes|no>
Set the job's ability to share nodes with other jobs. Possible values are
"YES" and "NO".
.TP
\fIStartTime\fP=<time_spec>
Set the job's earliest initiation time.
It accepts times of the form \fIHH:MM:SS\fR to run a job at
a specific time of day (seconds are optional).
(If that time is already past, the next day is assumed.)
You may also specify \fImidnight\fR, \fInoon\fR, or
\fIteatime\fR (4pm) and you can have a time-of-day suffixed
with \fIAM\fR or \fIPM\fR for running in the morning or the evening.
You can also say what day the job will be run, by giving
a date in the form \fImonth-name\fR day with an optional year,
or giving a date of the form \fIMMDDYY\fR or \fIMM/DD/YY\fR
or \fIDD.MM.YY\fR. You can also
give times like \fInow + count time-units\fR, where the time-units
can be \fIminutes\fR, \fIhours\fR, \fIdays\fR, or \fIweeks\fR
and you can tell SLURM to run the job today with the keyword
\fItoday\fR and to run the job tomorrow with the keyword
\fItomorrow\fR.
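To make the time-of-day rollover and relative offsets concrete, this Python
sketch resolves a subset of the \fIStartTime\fP forms relative to a given
current time: \fIHH:MM\fR[\fI:SS\fR], \fImidnight\fR, \fInoon\fR,
\fIteatime\fR, and \fInow + count time-units\fR. The function
\fIresolve_start_time\fP is hypothetical; dates, AM/PM suffixes and the
\fItoday\fR and \fItomorrow\fR keywords are not handled and raise ValueError.

```python
from datetime import datetime, timedelta

# Seconds per time unit accepted after "now +" (singular or plural).
UNIT_SECONDS = {"minute": 60, "hour": 3600, "day": 86400, "week": 604800}
# Named times of day listed on this page.
NAMED_TIMES = {"midnight": "00:00", "noon": "12:00", "teatime": "16:00"}


def resolve_start_time(spec, now):
    """Resolve a subset of StartTime forms relative to `now`:
    HH:MM[:SS], midnight, noon, teatime and "now + COUNT UNITS".
    Other forms raise ValueError."""
    tokens = spec.strip().lower().split()
    if tokens and tokens[0] == "now":
        if len(tokens) == 1:
            return now
        if len(tokens) != 4 or tokens[1] != "+":
            raise ValueError(f"unsupported offset: {spec!r}")
        unit = tokens[3].rstrip("s")
        if unit not in UNIT_SECONDS:
            raise ValueError(f"unknown time unit: {tokens[3]!r}")
        return now + timedelta(seconds=int(tokens[2]) * UNIT_SECONDS[unit])
    text = " ".join(tokens)
    clock = NAMED_TIMES.get(text, text)
    fields = [int(f) for f in clock.split(":")]
    if len(fields) not in (2, 3):
        raise ValueError(f"unsupported time: {spec!r}")
    hour, minute = fields[0], fields[1]
    second = fields[2] if len(fields) == 3 else 0
    start = now.replace(hour=hour, minute=minute, second=second, microsecond=0)
    # A time of day that has already passed means the same time tomorrow.
    if start < now:
        start += timedelta(days=1)
    return start
```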
.TP
\fITimeLimit\fP=<minutes>
Set the job's time limit to the specified value.
.TP
\fIConnection\fP=<type>
Reset the node connection type.
Possible values on Blue Gene are "MESH", "TORUS" and "NAV"
(mesh else torus).
.TP
\fIGeometry\fP=<geo>
Reset the required job geometry.
On Blue Gene the value should be three digits separated by
"x" or ",". The digits represent the allocation size in
X, Y and Z dimensions (e.g. "2x3x4").
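This Python sketch parses a \fIGeometry\fP value written with either "x" or
"," separators into an (X, Y, Z) tuple. The helper \fIparse_geometry\fP is
hypothetical, and requiring each dimension to be a positive integer is an
assumption of the sketch.

```python
def parse_geometry(geo):
    """Parse a Blue Gene geometry such as "2x3x4" or "2,3,4" into an
    (X, Y, Z) tuple. Each dimension must be a positive integer
    (an assumption of this sketch)."""
    fields = geo.lower().replace(",", "x").split("x")
    if len(fields) != 3 or not all(f.isdigit() and int(f) > 0 for f in fields):
        raise ValueError(f"expected three dimensions, got {geo!r}")
    return tuple(int(f) for f in fields)
```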
.TP
\fINodeUse\fP=<use>
Reset the mode of node use.
Possible values on Blue Gene are "VIRTUAL" and "COPROCESSOR".
.TP
\fIRotate\fP=<yes|no>
Permit the job's geometry to be rotated.
Possible values are "YES" and "NO".
.TP
\fBSPECIFICATIONS FOR UPDATE COMMAND, NODES\fR
.TP
\fINodeName\fP=<name>
Identify the node(s) to be updated. Multiple node names may be specified using
simple node range expressions (e.g. "lx[10-20]"). This specification is required.
.TP
\fIReason\fP=<reason>
Identify the reason the node is in a "DOWN" or "DRAINED" or "DRAINING" state.
Use quotes to enclose a reason having more than one word.
.TP
\fIState\fP=<state>
Identify the state to be assigned to the node. Possible values are "NoResp",
"RESUME", "DOWN", "IDLE", "DRAIN", "DRAINED", "DRAINING", and "ALLOCATED".
To drain a node, specify "DRAIN", "DRAINED", or "DRAINING".
SLURM will automatically set it to the appropriate value of either
"DRAINING" or "DRAINED" depending on whether the node is allocated.
"RESUME" is not an actual node state, but will return a DRAINED, DRAINING,
or DOWN node to service, in either the IDLE or ALLOCATED state as appropriate.
The "NoResp" state will only set the "NoResp" flag for a node without
changing its underlying state.
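The drain and resume rules above amount to a small state mapping. This Python
sketch returns the state a node would end up in after an update; the function
\fIresolve_node_state\fP is hypothetical, and rejecting "RESUME" for a node
that is not DRAINED, DRAINING or DOWN is a choice of the sketch, not
documented SLURM behavior.

```python
def resolve_node_state(requested, current, allocated):
    """Return the node state resulting from State=`requested`, given the
    node's `current` state and whether jobs are running on it (`allocated`)."""
    requested = requested.upper()
    if requested == "NORESP":
        # Only the NoResp flag is set; the underlying state is unchanged.
        return current.upper()
    if requested in ("DRAIN", "DRAINED", "DRAINING"):
        return "DRAINING" if allocated else "DRAINED"
    if requested == "RESUME":
        # Sketch's choice: RESUME only applies to nodes out of service.
        if current.upper() not in ("DRAINED", "DRAINING", "DOWN"):
            raise ValueError(f"RESUME does not apply to a {current} node")
        return "ALLOCATED" if allocated else "IDLE"
    return requested
```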
.TP
\fBSPECIFICATIONS FOR UPDATE AND DELETE COMMANDS, PARTITIONS\fR
.TP
\fIAllowGroups\fP=<name>
Identify the user groups which may use this partition.
Multiple groups may be specified in a comma separated list.
To permit all groups to use the partition specify "AllowGroups=ALL".
.TP
\fIDefault\fP=<yes|no>
Specify if this partition is to be used by jobs which do not explicitly
identify a partition to use. Possible values are "YES" and "NO".
.TP
\fIHidden\fP=<yes|no>
Specify if the partition and its jobs should be hidden from view.
Hidden partitions will by default not be reported by SLURM APIs
or commands.
Possible values are "YES" and "NO".
.TP
\fINodes\fP=<name>
Identify the node(s) to be associated with this partition. Multiple node names
may be specified using simple node range expressions (e.g. "lx[10-20]").
Note that jobs may only be associated with one partition at any time.
Specify a blank data value to remove all nodes from a partition: "Nodes=".
.TP
\fIPartitionName\fP=<name>
Identify the partition to be updated. This specification is required.
.TP
\fIRootOnly\fP=<yes|no>
Specify if only allocation requests initiated by user root will be satisfied.
This can be used to restrict control of the partition to some meta-scheduler.
Possible values are "YES" and "NO".
.TP
\fIShared\fP=<yes|no|force>
Specify if nodes in this partition can be shared by multiple jobs.
Possible values are "YES", "NO" and "FORCE".
.TP
\fIState\fP=<up|down>
Specify if jobs can be allocated nodes in this partition.
Possible values are "UP" and "DOWN".
If a partition allocated nodes to running jobs, those jobs will continue
execution even after the partition's state is set to "DOWN". The jobs
must be explicitly canceled to force their termination.
.TP
\fIMaxNodes\fP=<count>
Set the maximum number of nodes which will be allocated to any single job
in the partition. Specify a number or "INFINITE".
.TP
\fIMinNodes\fP=<count>
Set the minimum number of nodes which will be allocated to any single job
in the partition.
.SH "ENVIRONMENT VARIABLES"
.PP
Some \fBscontrol\fR options may
be set via environment variables. These environment variables,
along with their corresponding options, are listed below. (Note:
Command-line options always override these settings.)
.TP 20
\fBSCONTROL_ALL\fR
\fB\-a, \-\-all\fR
.TP
\fBSLURM_CONF\fR
The location of the SLURM configuration file.
.SH "EXAMPLE"
.eo
.br
# scontrol
.br
scontrol: show part class
.br
PartitionName=class TotalNodes=10 TotalCPUs=20 RootOnly=NO
.br
Default=NO Shared=NO State=UP MaxTime=30 Hidden=NO
.br
MinNodes=1 MaxNodes=2 AllowGroups=students
.br
Nodes=lx[0031-0040] NodeIndices=31,40,-1
.br
scontrol: update PartitionName=class MaxTime=99 MaxNodes=4
.br
scontrol: show job 65539
.br
JobId=65539 UserId=1500 JobState=PENDING TimeLimit=100
.br
Priority=100 Partition=batch Name=job01 NodeList=(null)
.br
StartTime=0 EndTime=0 Shared=0 ReqProcs=1000
.br
ReqNodes=400 Contiguous=1 MinProcs=4 MinMemory=1024
.br
MinTmpDisk=2034 ReqNodeList=lx[3000-3003]
.br
Features=(null) JobScript=/bin/hostname
.br
scontrol: update JobId=65539 TimeLimit=200 Priority=500
.br
scontrol: quit
.ec
.SH "COPYING"
Copyright (C) 2002 The Regents of the University of California.
Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
UCRL-CODE-2002-040.
.LP
This file is part of SLURM, a resource management program.
For details, see <http://www.llnl.gov/linux/slurm/>.
.LP
SLURM is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option)
any later version.
.LP
SLURM is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
details.
.SH "FILES"
.LP
/etc/slurm.conf
.SH "SEE ALSO"
\fBscancel\fR(1), \fBsinfo\fR(1), \fBsqueue\fR(1),
\fBslurm_checkpoint\fR(3),
\fBslurm_delete_partition\fR(3),
\fBslurm_load_ctl_conf\fR(3),
\fBslurm_load_jobs\fR(3), \fBslurm_load_node\fR(3),
\fBslurm_reconfigure\fR(3), \fBslurm_shutdown\fR(3),
\fBslurm_update_job\fR(3), \fBslurm_update_node\fR(3),
\fBslurm_update_partition\fR(3),
\fBslurm.conf\fR(5)