This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
-- With sched/wiki or sched/wiki2 (Maui or Moab scheduler), ensure that a
   requeued job's priority is reset to zero.
-- BLUEGENE - fix to run steps correctly in a BGL/P emulated system.
-- Fixed issue so that if a network problem between the slurmctld and the DBD
   leaves both up but disconnected, the slurmctld gets registered again with
   the DBD.
-- Fixed issue so that if the DBD connection from the slurmctld goes away
   because of a POLLERR, the dbd_fail callback is called.
-- BLUEGENE - Fix to smap command-line mode display.
-- Change in GRES behavior for job steps: A job step's default generic
resource allocation will be set to that of the job. If a job step's --gres
value is set to "none" then none of the generic resources which have been
allocated to the job will be allocated to the job step.
-- Add srun environment value of SLURM_STEP_GRES to set default --gres value
for a job step.
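   As an illustration only (application and GRES names hypothetical):
      srun --gres=none my_app        # step gets none of the job's GRES
      export SLURM_STEP_GRES=gpu:1   # default --gres for subsequent job steps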
-- Require SchedulerTimeSlice configuration parameter to be at least 5 seconds
to avoid thrashing slurmd daemon.
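   For example, in slurm.conf (value hypothetical, in seconds):
      SchedulerTimeSlice=30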
-- Cray - Fix to make node state in accounting consistent with the state set
   by ALPS.
-- Cray - A node DOWN to ALPS will be marked DOWN to SLURM only after reaching
   SlurmdTimeout. In the interim, the node state will be NO_RESPOND. This
   change makes SLURM's handling of the node DOWN state more consistent with
   ALPS. This change affects only Cray systems.
-- Cray - Fix to work with 4.0.* instead of just 4.0.0.
-- Cray - Modify srun/aprun wrapper to map --exclusive to -F exclusive and
--share to -F share. Note this does not consider the partition's Shared
configuration, so it is an imperfect mapping of options.
-- BLUEGENE - Added notice in the printed configuration to indicate whether
   or not the system is emulated.
-- BLUEGENE - Fix job step scalability issue with large task count.
-- BGQ - Improved c-node selection when asked for a sub-block job that
cannot fit into the available shape.
-- BLUEGENE - Modify "scontrol show step" to show I/O nodes (BGL and BGP) or
c-nodes (BGQ) allocated to each step. Change field name from "Nodes=" to
"BP_List=".
-- Code cleanup on step request to get the correct select_jobinfo.
-- Fixed memory leak when rolling up accounting with down clusters.
-- BGQ - Fix issue so that if the first job step uses the entire block and the
   next parallel step is run on a sub-block, SLURM won't oversubscribe
   c-nodes.
-- Treat duplicate switch name in topology.conf as fatal error. Patch from Rod
   Schultz, Bull.
-- Minor update to documentation describing the AllowGroups option for a
partition in the slurm.conf.
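   For example, in slurm.conf (partition, node and group names hypothetical):
      PartitionName=debug Nodes=tux[0-15] AllowGroups=staff,admin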
-- Fix problem with _job_create() when not using QOS's. It makes
   _job_create() consistent with similar logic in select_nodes().
-- Fix for squeue -t "CONFIGURING" to actually work.
-- CRAY - Add cray.conf parameter of SyncTimeout, maximum time to defer job
scheduling if SLURM node or job state are out of synchronization with ALPS.
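   For example, in cray.conf (value hypothetical, in seconds):
      SyncTimeout=3600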
-- If salloc was run as interactive, with job control, reset the foreground
process group of the terminal to the process group of the parent pid before
exiting. Patch from Don Albert, Bull.
-- BGQ - set up the corner of a sub block correctly based on a relative
position in the block instead of absolute.
-- BGQ - make sure the recently added select_jobinfo of a step launch request
isn't sent to the slurmd where environment variables would be overwritten
incorrectly.
-- NOTE THERE HAVE BEEN NEW FIELDS ADDED TO THE JOB AND PARTITION STATE SAVE
FILES AND RPCS. PENDING AND RUNNING JOBS WILL BE LOST WHEN UPGRADING FROM
EARLIER VERSION 2.3 PRE-RELEASES AND RPCS WILL NOT WORK WITH EARLIER
VERSIONS.
-- select/cray: Add support for Accelerator information including model and
memory options.
-- Cray systems: Add support to suspend/resume the salloc command to ensure
   that aprun does not get initiated when the job is suspended. Processes
   suspended and resumed are determined by using process group ID and parent
   process ID, so some processes may be missed. Since salloc runs as a normal
   user, its ability to identify processes associated with a job is limited.
-- Cray systems: Modify smap and sview to display all nodes even if multiple
nodes exist at each coordinate.
-- Improve efficiency of select/linear plugin with topology/tree plugin
   configured. Patch by Andriy Grytsenko (Massive Solutions Limited).
-- For front-end architectures on which job steps are run (emulated Cray and
BlueGene systems only), fix bug that would free memory still in use.
-- Add squeue support to display a job's license information. Patch by Andy
   Roosen (University of Delaware).
-- Add flag to the select APIs for job suspend/resume indicating if the action
is for gang scheduling or an explicit job suspend/resume by the user. Only
an explicit job suspend/resume will reset the job's priority and make
resources exclusively held by the job available to other jobs.
-- Fix possible invalid memory reference in sched/backfill. Patch by Andriy
Grytsenko (Massive Solutions Limited).
-- Add select_jobinfo to the task launch RPC. Based upon patch by Andriy
Grytsenko (Massive Solutions Limited).
-- Add DefMemPerCPU/Node and MaxMemPerCPU/Node to partition configuration.
This improves flexibility when gang scheduling only specific partitions.
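   For example, in slurm.conf (names and values hypothetical):
      PartitionName=gang Nodes=tux[0-31] DefMemPerCPU=1024 MaxMemPerCPU=2048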
-- Added new enums to print out when a job is held by a QOS instead of an
association limit.
-- Enhancements to sched/backfill performance with select/cons_res plugin.
Patch from Bjørn-Helge Mevik, University of Oslo.
-- Correct job run time reported by smap for suspended jobs.
-- Improve job preemption logic to avoid preempting more jobs than needed.
-- Add contribs/arrayrun tool providing support for job arrays. Contributed by
Bjørn-Helge Mevik, University of Oslo. NOTE: Not currently packaged as RPM
and manual file editing is required.
-- When suspending a job, wait 2 seconds instead of 1 second between sending
   SIGTSTP and SIGSTOP. Some MPI implementations were not stopping within the
   1 second delay.
-- Add support for managing devices based upon Linux cgroup container. Based
upon patch by Yiannis Georgiou, Bull.
-- Fix memory buffering bug if an AllowGroups parameter of a partition has 100
   or more users. Patch by Andriy Grytsenko (Massive Solutions Limited).
-- Fix bug in generic resource tracking of gres associated with specific CPUs.
Resources were being over-allocated.
-- On systems with front-end nodes (IBM BlueGene and Cray) limit batch jobs to
only one CPU of these shared resources.
-- Set SLURM_MEM_PER_CPU or SLURM_MEM_PER_NODE environment variables for both
interactive (salloc) and batch jobs if the job has a memory limit. For Cray
systems also set CRAY_AUTO_APRUN_OPTIONS environment variable with the
memory limit.
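   For example, a batch script could read the limit as follows (value
   hypothetical):
      #!/bin/sh
      #SBATCH --mem-per-cpu=1024
      echo "Memory per CPU: $SLURM_MEM_PER_CPU"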
-- Fix bug in select/cons_res task distribution logic when tasks-per-node=0.
Patch from Rod Schultz, Bull.
-- Restore node configuration information (CPUs, memory, etc.) for powered
   down nodes when the slurmctld daemon restarts rather than waiting for the
   node to be restored to service and getting the information from the node
   (NOTE: Only relevant if FastSchedule=0).
-- For Cray systems with the srun2aprun wrapper, rebuild the srun man page
   identifying the srun options which are valid on that system.
-- BlueGene: Permit users to specify a separate connection type for each
dimension (e.g. "--conn-type=torus,mesh,torus").
-- Add the ability for a user to limit the number of leaf switches in a job's
   allocation using the --switches option of salloc, sbatch and srun. There is
   also a new SchedulerParameters value of max_switch_wait, which a SLURM
   administrator can use to set a maximum job delay and prevent a user job
   from blocking lower priority jobs for too long. Based on work by Rod
   Schultz, Bull.
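   As a sketch (switch count, wait time and script name hypothetical):
      sbatch --switches=2@60 my_script.sh
   and in slurm.conf:
      SchedulerParameters=max_switch_wait=300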
* Changes in SLURM 2.3.0.pre6
=============================
-- NOTE: THERE HAS BEEN A NEW FIELD ADDED TO THE CONFIGURATION RESPONSE RPC
AS SHOWN BY "SCONTROL SHOW CONFIG". THIS FUNCTION WILL ONLY WORK WHEN THE
SERVER AND CLIENT ARE BOTH RUNNING SLURM VERSION 2.3.0.pre6
-- Modify job expansion logic to support licenses, generic resources, and
currently running job steps.
-- Added an rpath if using the --with-munge option of configure.
-- Add support for multiple sets of DEFAULT node, partition, and frontend
   specifications in slurm.conf so that default values can be changed multiple
   times as the configuration file is read.
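   For illustration (node names and values hypothetical):
      NodeName=DEFAULT CPUs=8 RealMemory=16384
      NodeName=tux[0-31]
      NodeName=DEFAULT CPUs=16 RealMemory=32768
      NodeName=tux[32-63]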
-- BLUEGENE - Improved logic to place small blocks in free space before freeing
larger blocks.
-- Add optional argument to srun's --kill-on-bad-exit so that user can set
its value to zero and override a SLURM configuration parameter of
KillOnBadExit.
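   For example (application name hypothetical):
      srun --kill-on-bad-exit=0 my_app   # keep other tasks running on bad exit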
-- Fix bug in GraceTime support for preempted jobs that prevented proper
operation when more than one job was being preempted. Based on patch from
Bill Brophy, Bull.
-- Fix for running sview from a non-bluegene cluster to a bluegene cluster.
Regression from pre5.
-- If job's TMPDIR environment is not set or is not usable, reset to "/tmp".
Patch from Andriy Grytsenko (Massive Solutions Limited).
-- Remove logic for defunct RPC: DBD_GET_JOBS.
-- Propagate DebugFlag changes by scontrol to the plugins.
-- Improve accuracy of REQUEST_JOB_WILL_RUN start time with respect to higher
priority pending jobs.
-- Add -R/--reservation option to squeue command as a job filter.
-- Add scancel support for --clusters option.
-- Note that scontrol and sprio can only support a single cluster at one time.
-- Add support to salloc for a new environment variable SALLOC_KILL_CMD.
-- Add scontrol ability to increment or decrement a job or step time limit.
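   For example (job id and value hypothetical):
      scontrol update JobId=1234 TimeLimit=+30   # add 30 minutes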
-- Add support for SLURM_TIME_FORMAT environment variable to control time
stamp output format. Work by Gerrit Renker, CSCS.
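   For example (format string hypothetical):
      export SLURM_TIME_FORMAT="%a %T"
      squeue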
-- Fix error handling in mvapich plugin that could cause srun to enter an
infinite loop under rare circumstances.
-- Add support for multiple task plugins. Patch from Andriy Grytsenko (Massive
Solutions Limited).
-- Addition of per-user node/cpu limits for QOS's. Patch from Aaron Knister,
UMBC.
-- Fix logic for multiple job resize operations.
-- BLUEGENE - many fixes to make things work correctly on an L/P system.
-- Fix bug in layout of job step with --nodelist option plus node count. Old
code could allocate too few nodes.
* Changes in SLURM 2.3.0.pre5
=============================
-- NOTE: THERE HAS BEEN A NEW FIELD ADDED TO THE JOB STATE FILE. UPGRADES FROM
VERSION 2.3.0-PRE4 WILL RESULT IN LOST JOBS UNLESS THE "orig_dependency"
FIELD IS REMOVED FROM JOB STATE SAVE/RESTORE LOGIC. ON CRAY SYSTEMS A NEW
"confirm_cookie" FIELD WAS ADDED AND HAS THE SAME EFFECT OF DISABLING JOB
STATE RESTORE.
-- BLUEGENE - Improve speed of start up when removing blocks at the beginning.
-- Correct init.d/slurm status to have non-zero exit code if ANY Slurm
   daemon that should be running on the node is not running. Patch from Rod
   Schultz, Bull.
-- Improve accuracy of response to "srun --test-only jobid=#".
-- Fix bug in front-end configurations which reports job_cnt_comp underflow
errors after slurmctld restarts.
-- Eliminate "error from _trigger_slurmctld_event in backup.c" due to lack of
event triggers.
-- Fix logic in BackupController to properly recover front-end node state and
avoid purging active jobs.
-- Added man pages to html pages and the new cpu_management.html page.
Submitted by Martin Perry / Rod Schultz, Bull.