Commit d6de68d7 authored by Skyler Malinowski, committed by Ben Roberts

Docs - Documented UnkillableStepProgram environment variables

UnkillableStepProgram exposes environment variables to the script.
Those variables are now documented.

Bug 11231
parent 943ad500
@@ -4290,16 +4290,15 @@ systems. The value may not exceed 65533.
 If the processes in a job step are determined to be unkillable for a period
 of time specified by the \fBUnkillableStepTimeout\fR variable, the program
 specified by \fBUnkillableStepProgram\fR will be executed.
-This program can be used to take special actions to clean up the unkillable
-processes and/or notify computer administrators.
-The program will be run \fBSlurmdUser\fR (usually "root") on the compute node.
 By default no program is run.
+See section \fBUNKILLABLE STEP PROGRAM SCRIPT\fR for more information.
 .TP
 \fBUnkillableStepTimeout\fR
 The length of time, in seconds, that Slurm will wait before deciding that
 processes in a job step are unkillable (after they have been signaled with
-SIGKILL) and execute \fBUnkillableStepProgram\fR as described above.
+SIGKILL) and execute \fBUnkillableStepProgram\fR.
 The default timeout value is 60 seconds.
 If exceeded, the compute node will be drained to prevent future jobs from being
 scheduled on the node.
@@ -5627,6 +5626,21 @@ User name of the job's owner.
 \fBSLURM_SCRIPT_CONTEXT\fR
 Identifies which epilog or prolog program is currently running.
+.SH "UNKILLABLE STEP PROGRAM SCRIPT"
+This program can be used to take special actions to clean up the unkillable
+processes and/or notify system administrators.
+The program will be run as \fBSlurmdUser\fR (usually "root") on the compute
+node where \fBUnkillableStepTimeout\fR was triggered.
+Information about the unkillable job step is passed to the script using
+environment variables.
+.TP
+\fBSLURM_JOB_ID\fR
+Job ID.
+.TP
+\fBSLURM_STEP_ID\fR
+Job Step ID.
 .SH "NETWORK TOPOLOGY"
 Slurm is able to optimize job allocations to minimize network contention.
 Special Slurm logic is used to optimize allocations on systems with a
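
For illustration only, a program named by UnkillableStepProgram might read the two documented environment variables as in the minimal Python sketch below. The function name `describe_unkillable_step`, the message wording, and the choice to write to standard error are assumptions made for this example and are not part of Slurm; a real site would substitute its own cleanup or notification logic.

```python
#!/usr/bin/env python3
"""Hypothetical UnkillableStepProgram: report which job step could not be killed."""
import os
import sys


def describe_unkillable_step(env):
    """Build a one-line notice from the variables Slurm passes to the script.

    SLURM_JOB_ID and SLURM_STEP_ID are the variables documented in this commit.
    The "unknown" fallback is an assumption so the sketch also runs outside Slurm.
    """
    job_id = env.get("SLURM_JOB_ID", "unknown")
    step_id = env.get("SLURM_STEP_ID", "unknown")
    return f"unkillable step detected: job={job_id} step={step_id}"


def main():
    # Writing to stderr is a placeholder for a site-specific notification.
    print(describe_unkillable_step(os.environ), file=sys.stderr)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Keeping the message construction in a function that takes the environment as an argument lets the script be exercised outside a compute node by passing a plain dictionary.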