This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in Slurm 2.6.0rc2
-- HDF5 - Fix issue with Ubuntu where HDF5 development headers are
overwritten by the parallel versions, so we need to handle
both cases.
-- ACCT_GATHER - handle suspending correctly for polling threads.
-- Make the list_sort use a merge sort, which is n log n, instead of the n^2
algo we have previously used.
-- Make SLURM_DISTRIBUTION env var hold both types of distribution if
-- Remove hardcoded /usr/local from slurm.spec.
-- Modify slurmctld locking to improve performance under heavy load with
very large numbers of batch job submissions or job cancellations.
-- sstat - Fix issue where, if -j wasn't given, the last argument was not
checked as the job/step id.
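
The list_sort entry above replaces an n^2 sort with a merge sort. The
following is a minimal Python sketch of a top-down merge sort, given only to
illustrate the technique. It is not the Slurm C implementation, and the
function name and "key" parameter are assumptions made for this example.

```python
def merge_sort(items, key=lambda x: x):
    """Return a new sorted list using top-down merge sort (O(n log n))."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], key)
    right = merge_sort(items[mid:], key)

    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left half on ties, which keeps the sort stable.
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these two slices is non-empty.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

Each level of recursion does a linear amount of merging work, and there are
about log n levels, which is where the n log n bound comes from.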
* Changes in Slurm 2.6.0rc1
-- Added helper script for launching symmetric and MIC-only MPI tasks within
SLURM (in contribs/mic/mpirun-mic).
-- Change maximum delay for state save from 2 secs to 5 secs. Make timeout
configurable at build time by defining SAVE_MAX_WAIT.
-- Modify slurmctld data structure locking to interleave read and write
locks rather than always favor write locks over read locks.
-- Added sacct format option of "ALL" to print all fields.
-- Deprecate the SchedulerParameters value of "interval"; use "bf_interval"
instead, as documented.
-- Add acct_gather_profile/hdf5 to profile jobs with hdf5
-- Added MaxCPUsPerNode partition configuration parameter. This can be
especially useful to schedule systems with GPUs.
-- Permit "scontrol reboot_node" for nodes in MAINT reservation.
-- Added "PriorityFlags" value of "SMALL_RELATIVE_TO_TIME". If set, the job's
size component will be based upon not the job size alone, but the job's
size divided by its time limit.
-- Added sbatch option "--ignore-pbs" to ignore "#PBS" options in the batch
-- Rename slurm_step_ctx_params_t field from "mem_per_cpu" to "pn_min_memory".
Job step now accepts memory specification in either per-cpu or per-node
-- Add ability to specify host repetition count in the srun hostfile (e.g.
"host1*2" is equivalent to "host1,host1").
* Changes in Slurm 2.6.0pre3
-- Add milliseconds to default log message header (both RFC 5424 and ISO 8601
time formats). Disable milliseconds logging using the configure
parameter "--disable-log-time-msec". Default time format changes to
ISO 8601 (without time zone information). Specify "--enable-rfc5424time"
to restore the time zone information.
-- Add username (%u) to the filename pattern in the batch script.
-- Added options for front end nodes of AllowGroups, AllowUsers, DenyGroups,
and DenyUsers.
-- Fix sched/backfill logic to initiate jobs with maximum time limit over the
partition limit, but the minimum time limit permits it to start.
-- gres/gpu - Fix for gres.conf file with multiple files on a single line
using a slurm expression (e.g. "File=/dev/nvidia[0-1]").
-- Replaced ipmi.conf with generic acct_gather.conf file for all acct_gather
plugins. For those doing development to use this follow the model set
forth in the acct_gather_energy_ipmi plugin.
-- Added more options to update a step's information
-- Add DebugFlags=ThreadID which will print the thread id of the calling
-- CRAY - Allocate whole node (CPUs) in reservation despite what the
user requests. We have found any srun/aprun afterwards will work on a
subset of resources.
-- Do not purge inactive interactive jobs that lack a port to ping (added
for MR+ operation).
-- Advanced reservations with hostname and core counts now support asymmetric
reservations (e.g. a specific, different core count for each node).
-- Added slurmctld/dynalloc plugin for MapReduce+ support.
-- Added "DynAllocPort" configuration parameter.
-- Added partition parameter of SelectTypeParameters to override system-wide
-- Added cr_type to partition_info data structure.
-- Added allocated memory to node information available (within the existing
select_nodeinfo field of the node_info_t data structure). Added Allocated
Memory to node information displayed by sview and scontrol commands.
-- Make sched/backfill the default scheduling plugin rather than sched/builtin
-- Added support for a job having different priorities in different partitions.
-- Added new SchedulerParameters configuration parameter of "bf_continue"
which permits the backfill scheduler to continue considering jobs for
backfill scheduling after yielding locks even if new jobs have been
submitted. This can result in lower priority jobs being backfill
scheduled instead of newly arrived higher priority jobs, but will permit
more queued jobs to be considered for backfill scheduling.
-- Added support to purge reservation records from accounting.
-- Cray - Add support for Basil 1.3
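
The gres/gpu entry above refers to a gres.conf line naming several device
files with a bracketed range such as "File=/dev/nvidia[0-1]". The Python
sketch below shows how a single numeric range of that shape could be expanded
into individual paths. It is an illustration only: the function name is
hypothetical, and it handles just one simple "[low-high]" range, not the full
expression syntax that Slurm accepts.

```python
import re


def expand_bracket_range(expr):
    """Expand one "[low-high]" numeric range, e.g. "/dev/nvidia[0-1]"."""
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", expr)
    if match is None:
        # No bracketed range present: treat the expression as a single name.
        return [expr]
    prefix, low, high, suffix = match.groups()
    return [f"{prefix}{n}{suffix}" for n in range(int(low), int(high) + 1)]
```

With this sketch, "/dev/nvidia[0-1]" expands to "/dev/nvidia0" and
"/dev/nvidia1", while a name without brackets is returned unchanged.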
* Changes in SLURM 2.6.0pre1
-- Add "state" field to job step information reported by scontrol.
-- Notify srun to retry step creation upon completion of other job steps
rather than polling. This results in much faster throughput for job step
execution with --exclusive option.
-- Added "ResvEpilog" and "ResvProlog" configuration parameters to execute a
program at the beginning and end of each reservation.
-- Added "slurm_load_job_user" function. This is a variation of
"slurm_load_jobs", but accepts a user ID argument, potentially resulting
in substantial performance improvement for "squeue --user=ID"
-- Added "slurm_load_node_single" function. This is a variation of
"slurm_load_nodes", but accepts a node name argument, potentially resulting
in substantial performance improvement for "sinfo --nodes=NAME".
-- Added "HealthCheckNodeState" configuration parameter to identify node states
on which HealthCheckProgram should be executed.
-- Remove sacct --dump --formatted-dump options which were deprecated in
-- Added support for job arrays (phase 1 of effort). See "man sbatch" option
-a/--array for details.
-- Add new AccountStorageEnforce options of 'nojobs' and 'nosteps' which will
allow the use of accounting features like associations, qos and limits but
not keep track of jobs or steps in accounting.
-- Cray - Add new cray.conf parameter of "AlpsEngine" to specify the
communication protocol to be used for ALPS/BASIL.
-- select/cons_res plugin: Correction to CPU allocation count logic for
cores without hyperthreading.
-- Added new SelectTypeParameter value of "CR_ALLOCATE_FULL_SOCKET".
-- Added PriorityFlags value of "TICKET_BASED" and merged priority/multifactor2
plugin into priority/multifactor plugin.
-- Add "KeepAliveTime" configuration parameter controlling how long sockets
used for srun/slurmstepd communications are kept alive after disconnect.
-- Added SLURM_SUBMIT_HOST to salloc, sbatch and srun job environment.
-- Added SLURM_ARRAY_TASK_ID to environment of job array.
-- Added squeue --array/-r option to optimize output for job arrays.
-- Added "SlurmctldPlugstack" configuration parameter for generic stack of
slurmctld daemon plugins.
-- Removed contribs/arrayrun tool. Use native support for job arrays.
-- Modify default installation locations for RPMs to match "make install":
_prefix /usr/local
_slurm_sysconfdir %{_prefix}/etc/slurm
_mandir %{_prefix}/share/man
_infodir %{_prefix}/share/info
-- Add acct_gather_energy/ipmi which works off freeipmi for energy gathering
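
The job array entries above add the SLURM_ARRAY_TASK_ID environment variable
to each task of a job array. As a rough sketch of how a job script might use
it, the Python below selects one work item per array task. The function name
and the list of inputs are hypothetical, and using the task ID directly as a
list index is an assumption made for this example.

```python
import os


def pick_input(inputs, environ=os.environ):
    """Return the input for this array task, or None outside a job array."""
    task_id = environ.get("SLURM_ARRAY_TASK_ID")
    if task_id is None:
        # Not running as part of a job array.
        return None
    # Assumption: array task IDs are used directly as list indexes.
    return inputs[int(task_id)]
```

Under that assumption, a job submitted with "sbatch --array=0-2" would have
its three tasks pick the first, second and third entries of a three-item
list.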
* Changes in Slurm 2.5.8
-- Fix for slurmctld segfault on NULL front-end reason field.
-- Avoid gres step allocation errors when a job shrinks in size due to either
down nodes or explicit resizing. Generated slurmctld errors of this type:
"step_test ... gres_bit_alloc is NULL"
-- Fix bug that would leak memory and over-write the AllowGroups field on
"scontrol reconfig" when AllowNodes is manually changed using scontrol.
-- Get html/man files to install in correct places with rpms.
-- Remove --program-prefix from spec file since it appears to be added by
default and appeared to break other things.
-- Updated the automake min version in to be correct.
* Changes in Slurm 2.5.7
-- Fix for linking to the select/cray plugin to not give warning about
undefined variable.
-- Add missing symbols to the xlator.h
-- Avoid placing pending jobs in AdminHold state due to backfill scheduler
interactions with advanced reservation.
-- Accounting - make average by task not cpu.
-- CRAY - Change logging of transient ALPS errors from error() to debug().
-- POE - Correct logic to support poe option "-euidevice sn_all" and
"-euidevice sn_single".
-- Accounting - Fix minor initialization error.
-- POE - Correct logic to support srun network instances count with POE.
-- POE - With the srun --launch-cmd option, report proper task count when
the --cpus-per-task option is used without the --ntasks option.
-- POE - Fix logic binding tasks to CPUs.
-- sview - Fix race condition where new information could have slipped past
the node tab and we didn't notice.
-- Accounting - Fix an invalid memory read when slurmctld sends data about
start job to slurmdbd.
-- If a prolog or epilog failure occurs, drain the node rather than setting it
down and killing all of its jobs.
-- Priority/multifactor - Avoid underflow in half-life calculation.
-- POE - pack missing variable to allow fanout (more than 32 nodes)
-- Prevent clearing reason field for pending jobs. This bug was introduced in
v2.5.5 (see "Reject job at submit time ...").
-- BGQ - Fix issue with preemption on sub-block jobs where a job would kill
all preemptable jobs on the midplane instead of just the ones it needed to.
-- switch/nrt - Validate dynamic window allocation size.
-- BGQ - When --geo is requested do not impose the default conn_types.
-- RebootNode logic - Defers (rather than forgets) reboot request with job
running on the node within a reservation.
-- switch/nrt - Correct network_id use logic. Correct support for user sn_all
and sn_single options.
-- sched/backfill - Modify logic to reduce overhead under heavy load.
-- Fix job step allocation with --exclusive and --hostlist option.
-- Select/cons_res - Fix bug resulting in error of "cons_res: sync loop not
progressing, holding job #"
-- checkpoint/blcr - Reset max_nodes from zero to NO_VAL on job restart.
-- launch/poe - Fix for hostlist file support with repeated host names.
-- priority/multifactor2 - Prevent possible divide by zero.
-- srun - Don't check for executable if --test-only flag is used.
-- energy - On a single node only use the last task for gathering energy,
since we don't currently track energy usage per task (only per step).
Otherwise we get double the energy.