From d6de68d7b76d0ed84c4a7493dd047efc0afa4964 Mon Sep 17 00:00:00 2001
From: Skyler Malinowski <malinowski@schedmd.com>
Date: Mon, 5 Apr 2021 14:05:00 -0400
Subject: [PATCH] Docs - Documented UnkillableStepProgram environment variables

UnkillableStepProgram exposes environment variables to the script.
Those variables are now documented.

Bug 11231
---
 doc/man/man5/slurm.conf.5 | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/doc/man/man5/slurm.conf.5 b/doc/man/man5/slurm.conf.5
index c14337dabe6..390c9b2f567 100644
--- a/doc/man/man5/slurm.conf.5
+++ b/doc/man/man5/slurm.conf.5
@@ -4290,16 +4290,15 @@ systems. The value may not exceed 65533.
 If the processes in a job step are determined to be unkillable for a period
 of time specified by the \fBUnkillableStepTimeout\fR variable, the program
 specified by \fBUnkillableStepProgram\fR will be executed.
-This program can be used to take special actions to clean up the unkillable
-processes and/or notify computer administrators.
-The program will be run \fBSlurmdUser\fR (usually "root") on the compute node.
 By default no program is run.
+See section \fBUNKILLABLE STEP PROGRAM SCRIPT\fR for more information.
+
 
 .TP
 \fBUnkillableStepTimeout\fR
 The length of time, in seconds, that Slurm will wait before deciding that
 processes in a job step are unkillable (after they have been signaled with
-SIGKILL) and execute \fBUnkillableStepProgram\fR as described above.
+SIGKILL) and execute \fBUnkillableStepProgram\fR.
 The default timeout value is 60 seconds.
 If exceeded, the compute node will be drained to prevent future jobs from
 being scheduled on the node.
@@ -5627,6 +5626,21 @@ User name of the job's owner.
 \fBSLURM_SCRIPT_CONTEXT\fR
 Identifies which epilog or prolog program is currently running.
 
+.SH "UNKILLABLE STEP PROGRAM SCRIPT"
+This program can be used to take special actions to clean up the unkillable
+processes and/or notify system administrators.
+The program will be run as \fBSlurmdUser\fR (usually "root") on the compute
+node where \fBUnkillableStepTimeout\fR was triggered.
+
+Information about the unkillable job step is passed to the script using
+environment variables.
+.TP
+\fBSLURM_JOB_ID\fR
+Job ID.
+.TP
+\fBSLURM_STEP_ID\fR
+Job Step ID.
+
 .SH "NETWORK TOPOLOGY"
 Slurm is able to optimize job allocations to minimize network contention.
 Special Slurm logic is used to optimize allocations on systems with a
--
GitLab
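For illustration only, a minimal UnkillableStepProgram might read the two variables the new man page section documents, SLURM_JOB_ID and SLURM_STEP_ID, and leave a trace for administrators. The sketch below is a hedged example and is not part of this patch or of Slurm: the helper name `describe_step`, the syslog identifier `unkillable_step`, and the choice to log rather than clean up are all assumptions.

```python
#!/usr/bin/env python3
# Hypothetical UnkillableStepProgram sketch (not part of Slurm).
# It reads the two variables documented in the patch and reports them
# to syslog; a real site would substitute its own cleanup or alerting.
import os
import socket
import syslog


def describe_step(env, host=None):
    """Build a one-line report for an unkillable job step.

    env  -- mapping of environment variables (normally os.environ)
    host -- node name; defaults to this machine's hostname
    """
    # Fall back to "unknown" so a missing variable never aborts the script.
    job_id = env.get("SLURM_JOB_ID", "unknown")
    step_id = env.get("SLURM_STEP_ID", "unknown")
    if host is None:
        host = socket.gethostname()
    # JOBID.STEPID mirrors how Slurm tools usually print step identifiers.
    return f"unkillable step {job_id}.{step_id} on {host}"


def main():
    message = describe_step(os.environ)
    # The program runs as SlurmdUser on the compute node, so a local
    # syslog entry is one simple way to notify administrators.
    syslog.openlog("unkillable_step")
    syslog.syslog(syslog.LOG_ERR, message)


if __name__ == "__main__":
    main()
```

Keeping the message formatting in a pure function that takes the environment as a mapping makes it easy to check without a running slurmd; only `main()` touches the real environment and syslog.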