This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
-- With sched/wiki or sched/wiki2 (Maui or Moab scheduler), ensure that a
   requeued job's priority is reset to zero.
-- BLUEGENE - fix to run steps correctly in a BGL/P emulated system.
-- Fixed issue so that if a network problem between the slurmctld and the DBD
   leaves both up but disconnected, the slurmctld gets registered again with
   the DBD.
-- Fixed issue so that if the DBD connection from the slurmctld goes away
   because of a POLLERR, the dbd_fail callback is called.
-- BLUEGENE - Fix to smap command-line mode display.
-- Change in GRES behavior for job steps: A job step's default generic
resource allocation will be set to that of the job. If a job step's --gres
value is set to "none" then none of the generic resources which have been
allocated to the job will be allocated to the job step.
-- Add srun environment value of SLURM_STEP_GRES to set default --gres value
for a job step.
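   As an illustration only (application and GRES names hypothetical):
      srun --gres=none my_app        # step gets none of the job's GRES
      export SLURM_STEP_GRES=gpu:1   # default --gres for subsequent job steps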
-- Require SchedulerTimeSlice configuration parameter to be at least 5 seconds
to avoid thrashing slurmd daemon.
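   For example, in slurm.conf (value hypothetical, in seconds):
      SchedulerTimeSlice=30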
-- Cray - Fix to make node state in accounting consistent with the state set
   by ALPS.
-- Cray - A node DOWN to ALPS will be marked DOWN to SLURM only after reaching
   SlurmdTimeout. In the interim, the node state will be NO_RESPOND. This
   change makes SLURM's handling of the node DOWN state more consistent with
   ALPS. This change affects only Cray systems.
-- Cray - Fix to work with 4.0.* instead of just 4.0.0.
-- Cray - Modify srun/aprun wrapper to map --exclusive to -F exclusive and
--share to -F share. Note this does not consider the partition's Shared
configuration, so it is an imperfect mapping of options.
-- BLUEGENE - Added notice in the printed configuration to indicate whether
   or not the system is emulated.
-- BLUEGENE - Fix job step scalability issue with large task count.
-- BGQ - Improved c-node selection when asked for a sub-block job that
cannot fit into the available shape.
-- BLUEGENE - Modify "scontrol show step" to show I/O nodes (BGL and BGP) or
c-nodes (BGQ) allocated to each step. Change field name from "Nodes=" to
"BP_List=".
-- Code cleanup on step request to get the correct select_jobinfo.
-- Fixed memory leak when rolling up accounting with down clusters.
-- BGQ - Fix issue so that if the first job step uses the entire block and the
   next parallel step is run on a sub-block, SLURM won't oversubscribe
   c-nodes.
-- Treat duplicate switch name in topology.conf as fatal error. Patch from Rod
   Schultz, Bull.
-- Minor update to documentation describing the AllowGroups option for a
partition in the slurm.conf.
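   For example, in slurm.conf (partition, node and group names hypothetical):
      PartitionName=debug Nodes=tux[0-15] AllowGroups=staff,admin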
-- Fix problem with _job_create() when not using QOS's. It makes
   _job_create() consistent with similar logic in select_nodes().
-- Fix for squeue -t "CONFIGURING" to actually work.
-- CRAY - Add cray.conf parameter of SyncTimeout, maximum time to defer job
scheduling if SLURM node or job state are out of synchronization with ALPS.
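   For example, in cray.conf (value hypothetical, in seconds):
      SyncTimeout=3600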
-- If salloc was run as interactive, with job control, reset the foreground
process group of the terminal to the process group of the parent pid before
exiting. Patch from Don Albert, Bull.
-- BGQ - set up the corner of a sub block correctly based on a relative
position in the block instead of absolute.
-- BGQ - make sure the recently added select_jobinfo of a step launch request
isn't sent to the slurmd where environment variables would be overwritten
incorrectly.
-- NOTE THERE HAVE BEEN NEW FIELDS ADDED TO THE JOB AND PARTITION STATE SAVE
FILES AND RPCS. PENDING AND RUNNING JOBS WILL BE LOST WHEN UPGRADING FROM
EARLIER VERSION 2.3 PRE-RELEASES AND RPCS WILL NOT WORK WITH EARLIER
VERSIONS.
-- select/cray: Add support for Accelerator information including model and
memory options.
-- Cray systems: Add support to suspend/resume the salloc command to ensure
   that aprun does not get initiated when the job is suspended. Processes
   suspended and resumed are determined by using process group ID and parent
   process ID, so some processes may be missed. Since salloc runs as a normal
   user, its ability to identify processes associated with a job is limited.
-- Cray systems: Modify smap and sview to display all nodes even if multiple
nodes exist at each coordinate.
-- Improve efficiency of select/linear plugin with topology/tree plugin
   configured. Patch by Andriy Grytsenko (Massive Solutions Limited).
-- For front-end architectures on which job steps are run (emulated Cray and
BlueGene systems only), fix bug that would free memory still in use.
-- Add squeue support to display a job's license information. Patch by Andy
   Roosen (University of Delaware).
-- Add flag to the select APIs for job suspend/resume indicating if the action
is for gang scheduling or an explicit job suspend/resume by the user. Only
an explicit job suspend/resume will reset the job's priority and make
resources exclusively held by the job available to other jobs.
-- Fix possible invalid memory reference in sched/backfill. Patch by Andriy
Grytsenko (Massive Solutions Limited).
-- Add select_jobinfo to the task launch RPC. Based upon patch by Andriy
Grytsenko (Massive Solutions Limited).
-- Add DefMemPerCPU/Node and MaxMemPerCPU/Node to partition configuration.
This improves flexibility when gang scheduling only specific partitions.
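   For example, in slurm.conf (names and values hypothetical):
      PartitionName=gang Nodes=tux[0-31] DefMemPerCPU=1024 MaxMemPerCPU=2048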
-- Added new enums to print out when a job is held by a QOS instead of an
association limit.
-- Enhancements to sched/backfill performance with select/cons_res plugin.
Patch from Bjørn-Helge Mevik, University of Oslo.
-- Correct job run time reported by smap for suspended jobs.
-- Improve job preemption logic to avoid preempting more jobs than needed.
-- Add contribs/arrayrun tool providing support for job arrays. Contributed by
Bjørn-Helge Mevik, University of Oslo. NOTE: Not currently packaged as RPM
and manual file editing is required.
-- When suspending a job, wait 2 seconds instead of 1 second between sending
   SIGTSTP and SIGSTOP. Some MPI implementations were not stopping within the
   1 second delay.
-- Add support for managing devices based upon Linux cgroup container. Based
upon patch by Yiannis Georgiou, Bull.
-- Fix memory buffering bug if an AllowGroups parameter of a partition has 100
   or more users. Patch by Andriy Grytsenko (Massive Solutions Limited).
-- Fix bug in generic resource tracking of gres associated with specific CPUs.
Resources were being over-allocated.
-- On systems with front-end nodes (IBM BlueGene and Cray) limit batch jobs to
only one CPU of these shared resources.
-- Set SLURM_MEM_PER_CPU or SLURM_MEM_PER_NODE environment variables for both
interactive (salloc) and batch jobs if the job has a memory limit. For Cray
systems also set CRAY_AUTO_APRUN_OPTIONS environment variable with the
memory limit.
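   For example, a batch script could read the limit as follows (value
   hypothetical):
      #!/bin/sh
      #SBATCH --mem-per-cpu=1024
      echo "Memory per CPU: $SLURM_MEM_PER_CPU"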
-- Fix bug in select/cons_res task distribution logic when tasks-per-node=0.
Patch from Rod Schultz, Bull.
-- Restore node configuration information (CPUs, memory, etc.) for powered
   down nodes when the slurmctld daemon restarts rather than waiting for the
   node to be restored to service and getting the information from the node
   (NOTE: Only relevant if FastSchedule=0).
-- For Cray systems with the srun2aprun wrapper, rebuild the srun man page
   identifying the srun options which are valid on that system.
-- BlueGene: Permit users to specify a separate connection type for each
dimension (e.g. "--conn-type=torus,mesh,torus").
-- Add the ability for a user to limit the number of leaf switches in a job's
   allocation using the --switches option of salloc, sbatch and srun. There is
   also a new SchedulerParameters value of max_switch_wait, which a SLURM
   administrator can use to set a maximum job delay and prevent a user job
   from blocking lower priority jobs for too long. Based on work by Rod
   Schultz, Bull.
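   As a sketch (switch count, wait time and script name hypothetical):
      sbatch --switches=2@60 my_script.sh
   and in slurm.conf:
      SchedulerParameters=max_switch_wait=300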
* Changes in SLURM 2.3.0.pre6
=============================
-- NOTE: THERE HAS BEEN A NEW FIELD ADDED TO THE CONFIGURATION RESPONSE RPC
AS SHOWN BY "SCONTROL SHOW CONFIG". THIS FUNCTION WILL ONLY WORK WHEN THE
SERVER AND CLIENT ARE BOTH RUNNING SLURM VERSION 2.3.0.pre6
-- Modify job expansion logic to support licenses, generic resources, and
currently running job steps.
-- Added an rpath if using the --with-munge option of configure.
-- Add support for multiple sets of DEFAULT node, partition, and frontend
   specifications in slurm.conf so that default values can be changed multiple
   times as the configuration file is read.
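   For illustration (node names and values hypothetical):
      NodeName=DEFAULT CPUs=8 RealMemory=16384
      NodeName=tux[0-31]
      NodeName=DEFAULT CPUs=16 RealMemory=32768
      NodeName=tux[32-63]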
-- BLUEGENE - Improved logic to place small blocks in free space before freeing
larger blocks.
-- Add optional argument to srun's --kill-on-bad-exit so that user can set
its value to zero and override a SLURM configuration parameter of
KillOnBadExit.
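   For example (application name hypothetical):
      srun --kill-on-bad-exit=0 my_app   # keep other tasks running on bad exit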
-- Fix bug in GraceTime support for preempted jobs that prevented proper
operation when more than one job was being preempted. Based on patch from
Bill Brophy, Bull.
-- Fix for running sview from a non-bluegene cluster to a bluegene cluster.
Regression from pre5.
-- If job's TMPDIR environment is not set or is not usable, reset to "/tmp".
Patch from Andriy Grytsenko (Massive Solutions Limited).
-- Remove logic for defunct RPC: DBD_GET_JOBS.
-- Propagate DebugFlag changes by scontrol to the plugins.
-- Improve accuracy of REQUEST_JOB_WILL_RUN start time with respect to higher
priority pending jobs.
-- Add -R/--reservation option to squeue command as a job filter.
-- Add scancel support for --clusters option.
-- Note that scontrol and sprio can only support a single cluster at one time.
-- Add support to salloc for a new environment variable SALLOC_KILL_CMD.
-- Add scontrol ability to increment or decrement a job or step time limit.
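   For example (job id and value hypothetical):
      scontrol update JobId=1234 TimeLimit=+30   # add 30 minutes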
-- Add support for SLURM_TIME_FORMAT environment variable to control time
stamp output format. Work by Gerrit Renker, CSCS.
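   For example (format string hypothetical):
      export SLURM_TIME_FORMAT="%a %T"
      squeue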
-- Fix error handling in mvapich plugin that could cause srun to enter an
infinite loop under rare circumstances.
-- Add support for multiple task plugins. Patch from Andriy Grytsenko (Massive
Solutions Limited).
-- Addition of per-user node/cpu limits for QOS's. Patch from Aaron Knister,
UMBC.
-- Fix logic for multiple job resize operations.
-- BLUEGENE - many fixes to make things work correctly on an L/P system.
-- Fix bug in layout of job step with --nodelist option plus node count. Old
code could allocate too few nodes.
* Changes in SLURM 2.3.0.pre5
=============================
-- NOTE: THERE HAS BEEN A NEW FIELD ADDED TO THE JOB STATE FILE. UPGRADES FROM
VERSION 2.3.0-PRE4 WILL RESULT IN LOST JOBS UNLESS THE "orig_dependency"
FIELD IS REMOVED FROM JOB STATE SAVE/RESTORE LOGIC. ON CRAY SYSTEMS A NEW
"confirm_cookie" FIELD WAS ADDED AND HAS THE SAME EFFECT OF DISABLING JOB
STATE RESTORE.
-- BLUEGENE - Improve speed of start up when removing blocks at the beginning.
-- Correct init.d/slurm status to have non-zero exit code if ANY Slurm
   daemon that should be running on the node is not running. Patch from Rod
   Schultz, Bull.
-- Improve accuracy of response to "srun --test-only jobid=#".
-- Fix bug in front-end configurations which reports job_cnt_comp underflow
errors after slurmctld restarts.
-- Eliminate "error from _trigger_slurmctld_event in backup.c" due to lack of
event triggers.
-- Fix logic in BackupController to properly recover front-end node state and
avoid purging active jobs.
-- Added man pages to html pages and the new cpu_management.html page.
Submitted by Martin Perry / Rod Schultz, Bull.