NEWS

This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.

* Changes in Slurm 16.05.10
===========================
 -- Record job state as PREEMPTED instead of TIMEOUT when GraceTime is reached.
 -- task/cgroup - print warnings to stderr when --cpu_bind=verbose is enabled
    and the requested processor affinity cannot be set.
 -- power/cray - Disable power cap get and set operations on DOWN nodes.
 -- Jobs preempted with PreemptMode=REQUEUE were incorrectly recorded as
    REQUEUED in the accounting.
 -- PMIX - Use volatile specifier to avoid flag caching and lock the flag to
    make sure it is protected.
 -- PMIX/PMI2 - Make it possible to use %n or %h in a spool dir.
 -- burst_buffer/cray - Support default pool which is not the first pool
    reported by DataWarp and log in Slurm when pools that are added or removed
    from DataWarp.
 -- Insure job does not start running before node is booted and PrologSlurmctld
    is complete.
 -- Fix minor memory leak in the slurmctld when removing a QOS.

* Changes in Slurm 16.05.9
==========================
 -- Fix parsing of SBCAST_COMPRESS environment variable in sbcast.
 -- Change some debug messages to errors in task/cgroup plugin.
 -- backfill scheduler: Stop trying to determine expected start time for a job
    after 2 seconds of wall time. This can happen if there are many running jobs
    and a pending job can not be started soon.
 -- Improve performance of cr_sort_part_rows() in cons_res plugin.
 -- CRAY - Fix dealock issue when updating accounting in the slurmctld and
    scheduling a Datawarp job.
 -- Correct the job state accounting information for jobs requeued due to burst
    buffer errors.
 -- burst_buffer/cray - Avoid "pre_run" operation if not using buffer (i.e.
    just creating or deleting a persistent burst buffer).
 -- Fix slurm.spec file support for BlueGene builds.
 -- Fix missing TRES read lock in acct_policy_job_runnable_pre_select() code.
 -- Fix debug2 message printing value using wrong array index in
    _qos_job_runnable_post_select().
 -- Prevent job timeout on node power up.
 -- MYSQL - Fix minor memory leak when querying steps and the sql fails.
 -- Make it so sacctmgr accepts column headers like MaxTRESPU and not MaxTRESP.
 -- Only look at SLURM_STEP_KILLED_MSG_NODE_ID on startup, to avoid race
    condition later when looking at a steps env.
 -- Make backfill scheduler behave like regular scheduler in respect to
    'assoc_limit_stop'.
 -- Allow a lower version client command to talk to a higher version contoller
    using the multi-cluster options (e.g. squeue -M<clsuter>).
 -- slurmctld/agent race condition fix: Prevent job launch while PrologSlurmctld
    daemon is running or node boot in progress.
 -- MYSQL - Fix a few other minor memory leaks when uncommon failures occur.
 -- burst_buffer/cray - Fix race condition that could cause multiple batch job
    launch requests resulting in drained nodes.
 -- Correct logic to purge old reservations.
 -- Fix DBD cache restore from previous versions.
 -- Fix to logic for getting expected start time of existing job ID with
    explicit begin time that is in the past.
 -- Clear job's reason of "BeginTime" in a more timely fashion and/or prevents
    them from being stuck in a PENDING state.
 -- Make sure acct policy limits imposed on a job are correct after requeue.

* Changes in Slurm 16.05.8
==========================
 -- Remove StoragePass from being printed out in the slurmdbd log at debug2
    level.
 -- Defer PATH search for task program until launch in slurmstepd.
 -- Modify regression test1.89 to avoid leaving vestigial job. Also reduce
    logging to reduce likelyhood of Expect buffer overflow.
 -- Do not PATH search for mult-prog launches if LaunchParamters=test_exec is
    enabled.
 -- Fix for possible infinite loop in select/cons_res plugin when trying to
    satisfy a job's ntasks_per_core or socket specification.
 -- If job is held for bad constraints make it so once updated the job doesn't
    go into JobAdminHeld.
 -- sched/backfill - Fix logic to reserve resources for jobs that require a
    node reboot (i.e. to change KNL mode) in order to start.
 -- When unpacking a node or front_end record from state and the protocol
    version is lower than the min version, set it to the min.
 -- Remove redundant lookup for part_ptr when updating a reservation's nodes.
 -- Fix memory and file descriptor leaks in slurmd daemon's sbcast logic.
 -- Do not allocate specialized cores to jobs using the --exclusive option.
 -- Cancel interactive job if Prolog failure with "PrologFlags=contain" or
    "PrologFlags=alloc" configured. Send new error prolog failure message to
    the salloc or srun command as needed.
 -- Prevent possible out-of-bounds read in slurmstepd on an invalid #! line.
 -- Fix check for PluginDir within slurmctld to work with multiple directories.
 -- Cancel interactive jobs automatically on communication error to launching
    srun/salloc process.
 -- Fix security issue caused by insecure file path handling triggered by the
    failure of a Prolog script. To exploit this a user needs to anticipate or
    cause the Prolog to fail for their job. CVE-2016-10030.

* Changes in Slurm 16.05.7
==========================
 -- Fix issue in the priority/multifactor plugin where on a slurmctld restart,
    where more time is accounted for than should be allowed.
 -- cray/busrt_buffer - If total_space in a pool decreases, reset used_space
    rather than trying to account for buffer allocations in progress.
 -- cray/busrt_buffer - Fix for double counting of used_space at slurmctld
    startup.
 -- Fix regression in 16.05.6 where if you request multiple cpus per task (-c2)
    and request --ntasks-per-core=1 and only 1 task on the node
    the slurmd would abort on an infinite loop fatal.
 -- cray/busrt_buffer - Internally track both allocated and unusable space.
    The reported UsedSpace in a pool is now the allocated space (previously was
    unusable space). Base available space on whichever value leaves least free
    space.
 -- cray/burst_buffer - Preserve job ID and don't translate to job array ID.
 -- cray/burst_buffer - Update "instance" parsing to match updated dw_wlm_cli
    output.
 -- sched/backfill - Insure we don't try to start a job that was already started
    and requeued by the main scheduling logic.
 -- job_submit/lua - add access to the job features field in job_record.
 -- select/linear plugin modified to better support heterogeneous clusters when
    topology/none is also configured.
 -- Permit cancellation of jobs in configuring state.
 -- acct_gather_energy/rapl - prevent segfault in slurmd from race to gather
    data at slurmd startup.
 -- Integrate node_feature/knl_generic with "hbm" GRES information.
 -- Fix output routines to prevent rounding the TRES values for memory or BB.
 -- switch/cray plugin - fix use after free error.
 -- docs - elaborate on how way to clear TRES limits in sacctmgr.
 -- knl_cray plugin - Avoid abort from backup slurmctld at start time.
 -- cgroup plugins - fix two minor memory leaks.
 -- If a node is booting for some job, don't allocate additional jobs to the
    node until the boot completes.
 -- testsuite - fix job id output in test17.39.
 -- Modify backfill algorithm to improve performance with large numbers of
    running jobs. Group running jobs that end in a "similar" time frame using a
    time window that grows exponentially rather than linearly. After one second
    of wall time, simulate the termination of all remaining running jobs in
    order to respond in a reasonable time frame.
 -- Fix slurm_job_cpus_allocated_str_on_node_id() API call.
 -- sched/backfill plugin: Make malloc match data type (defined as uint32_t and
    allocated as int).
 -- srun - prevent segfault when terminating job step before step has launched.
 -- sacctmgr - prevent segfault when trying to reset usage for an invalid
    account name.
 -- Make the openssl crypto plugin compile with openssl >= 1.1.
 -- Fix SuspendExcNodes and SuspendExcParts on slurmctld reconfiguration.
 -- sbcast - prevent segfault in slurmd due to race condition between file
    transfers from separate jobs using zlib compression
 -- cray/burst_buffer - Increase time to synchronize operations between threads
    from 5 to 60 seconds ("setup" operation time observed over 17 seconds).
 -- node_features/knl_cray - Fix possible race condition when changing node
    state that could result in old KNL mode as an active features.
 -- Make sure if a job can't run because of resources we also check accounting
    limits after the node selection to make sure it doesn't violate those limits
    and if it does change the reason for waiting so we don't reserve resources
    on jobs violating accounting limits.
 -- NRT - Make it so a system running against IBM's PE will work with PE
    version 1.3.
 -- NRT - Make it so protocols pgas and test are allowed to be used.
 -- NRT - Make it so you can have more than 1 protocol listed in MP_MSG_API.
 -- cray/burst_buffer - If slurmctld daemon restarts with pending job and burst
    buffer having unknown file stage-in status, teardown the buffer, defer the
    job, and start stage-in over again.
 -- On state restore in the slurmctld don't overwrite the mem_spec_limit given
    from the slurm.conf when using FastSchedule=0.
 -- Recognize a KNL's proper NUMA count (rather than setting it to the value
    in slurm.conf) when using FastSchedule=0.
 -- Fix parsing in regression test1.92 for some prompts.
 -- sbcast - use slurmd's gid cache rather than a separate lookup.
 -- slurmd - return error if setgroups() call fails in _drop_privileges().
 -- Remove error messages about gres counts changing when a job is resized on
    a slurmctld restart or reconfig, as they aren't really error messages.
 -- Fix possible memory corruption if a job is using GRES and changing size.
 -- jobcomp/elasticsearch - fix printf format for a value on 32-bit builds.
 -- task/cgroup - Change error message if CPU binding can not take place to
    better identify the root cause of the problem.
 -- Fix issue where task/cgroup would not always honor --cpu_bind=threads.
 -- Fix race condition in with getgrouplist() in slurmd that can lead to
    user accounts being granted access to incorrect group memberships during
    job launch.

* Changes in Slurm 16.05.6
==========================
 -- Docs - the correct default value for GroupUpdateForce is 0.
 -- mpi/pmix - improve point to point communication performance.
 -- SlurmDB - include pending jobs in search during 'sacctmgr show runawayjobs'.
 -- Add client side out-of-range checks to --nice flag.
 -- Fix support for sbatch "-W" option, previously eeded to use "--wait".
 -- node_features/knl_cray plugin and capmc_suspend/resume programs modified to
    sleep and retry capmc operations if the Cray State Manager is down. Added
    CapmcRetries configuration parameter to knl_cray.conf.
 -- node_features/knl_cray plugin: Remove any KNL MCDRAM or NUMA features from
    node's configuration if capmc does NOT report the node as being KNL.
 -- node_features/knl_cray plugin: drain any node not reported by
    "capmc node_status" on startup or reconfig.
 -- node_features/knl_cray plugin: Substantially streamline and speed up logic
    to load current node state on reconfigure failure or unexpected node boot.
 -- node_features/knl_cray plugin: Add separate thread to interact with capmc
    in response to unexpected node reboots.
 -- node_features plugin - Add "mode" argument to node_features_p_node_xlate()
    function to fix some bugs updating a node's features using the node update
    RPC.
 -- node_features/knl_cray plugin: If the reconfiguration of nodes for an
    interactive job fails, kill the job (it can't be requeued like a batch job).
 -- Testsuite - Added srun/salloc/sbatch tests with --use-min-nodes option.
 -- Fix typo when an error occurs when discovering pmix version on