NEWS

This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.

* Changes in Slurm 16.05.4
==========================
 -- Fix potential deadlock if running with message aggregation.
 -- Streamline when schedule() is called when running with message aggregation
    on batch script completion.
 -- Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t.
 -- Document that persistent burst buffers cannot be created or destroyed using
    the salloc or srun --bb options.
 -- Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and
    SLURM_JOB_RESERVATION environment variables for the salloc command.
    Document the same environment variables for the salloc, sbatch and srun
    commands in their man pages.
 -- Fix issue where sacctmgr load cluster.cfg wouldn't load associations
    that had a partition in them.
 -- Don't return the extern step from sstat by default.
 -- In sstat print 'extern' instead of 4294967295 for the extern step.
 -- Make advanced reservations work properly with core specialization.
 -- Fix race condition in the account_gather plugin that could result in a job
    being stuck in COMPLETING state.
 -- Regression test fixes if SelectTypePlugin is not managing memory and no
    node memory size is set (defaults to 1 MB per node).
 -- Add missing partition write locks to _slurm_rpc_dump_nodes/node_single to
    prevent a race condition leading to inconsistent sinfo results.
 -- Fix task:CPU binding logic for some processors. This bug was introduced
    in version 16.05.1 to address a KNL binding problem.
 -- Fix two minor memory leaks in slurmctld.
 -- Improve partition-specific limit logging from slurmctld daemon.
 -- Fix incorrect access check when using MaxNodes setting on the partition.
 -- Fix issue with sacctmgr when specifying a list of clusters to query.
 -- Fix issue when calculating future StartTime for a job.
 -- Make EnforcePartLimit support logic work with any ordering of partitions
    in job submit request.
 -- Prevent restoration of wrong CPU governor and frequency when using
    multiple task plugins.
 -- Prevent slurmd abort if hwloc library fails to populate the "children"
    arrays (observed with hwloc version "dev-333-g85ea6e4").
 -- burst_buffer/cray: Add "--groupid" to DataWarp "setup" command.
 -- Fix lustre profiling putting it in the Filesystem dataset instead of the
    Network dataset.
 -- Fix profiling documentation and code to be consistent with Filesystem
    instead of Lustre.
 -- Correct the way watts is calculated in the rapl plugin when using a poll
    frequency other than AcctGatherNodeFreq.
 -- Don't abort step launch if job reaches expected end time while node is
    configuring/booting (NOTE: The job end time will be adjusted after node
    becomes ready for use).
 -- Fix several print routines to respect a custom output delimiter when
    printing NO_VAL or INFINITE.
 -- Correct documented configurations where --ntasks-per-core and
    --ntasks-per-socket are supported.

* Changes in Slurm 16.05.3
==========================
 -- Make it so the extern step uses a reverse tree when cleaning up.
 -- If extern step doesn't get added into the proctrack plugin make sure the
    sleep is killed.
 -- Fix areas where the slurmctld can segfault if an extern step is in the
    system cleaning up on a restart.
 -- Prevent possible incorrect counting of GRES of a given type if a node has
    multiple "types" of a given GRES "name", which could over-subscribe
    GRES of a given type.
 -- Add web links to Slurm Diamond Collectors (from Harvard University) and
    collectd (from EDF).
 -- Add job_submit plugin for the "reboot" field.
 -- Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to
    job_submit/lua plugins.
 -- Send in a -1 for a taskid into spank_task_post_fork for the extern_step.
 -- MYSQL - Slightly better logic if a job completion comes in with an end time
    of 0.
 -- If the task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set
    the soft memory limit to the allocated memory limit (previously no soft
    limit was set).
 -- Document limitations in burst buffer use by the salloc command (possible
    access problems from a login node).
 -- Fix proctrack plugin to only add the pid of a process once
    (regression in 16.05.2).
 -- Fix for sstat to print correct info when requesting jobid.batch as part of
    a comma-separated list.
 -- CRAY - Fix issue if pid has already been added to another job container.
 -- CRAY - Fix add of extern step to AELD.
 -- burst_buffer/cray: avoid batch submit error condition if waiting for stagein.
 -- CRAY - Fix for reporting steps lingering after they are already finished.
 -- Testsuite - fix test1.29 / 17.15 for limits with values above 32-bits.
 -- CRAY - Simplify when a NHC is called on a step that has unkillable
    processes.
 -- CRAY - If trying to kill a step and you have NHC_NO_STEPS set, run NHC
    anyway to attempt to log the backtraces of the potential
    unkillable processes.
 -- Fix gang scheduling and license release logic if single node job killed on
    bad node.
 -- Make scontrol show steps show the extern step correctly.
 -- Do not schedule powered down nodes in FAILED state.
 -- Do not start slurmctld power_save thread until partition information is read
    in order to prevent a race condition that could result in an invalid pointer
    when trying to resolve configured SuspendExcParts.
 -- Add SLURM_PENDING_STEP id so it won't be confused with SLURM_EXTERN_CONT.
 -- Fix for core selection with job --gres-flags=enforce-binding option.
    Previous logic would in some cases allocate a job zero cores, resulting in
    slurmctld abort.
 -- Minimize preempted jobs for configurations with multiple jobs per node.
 -- Improve partition AllowGroups caching. Update the table of UIDs permitted to
    use a partition based upon its AllowGroups configuration parameter as new
    valid UIDs are found rather than looking up that user's group information
    for every job they submit. If the user is not allowed to use the partition,
    then do not check that user's group access again for 5 seconds.
 -- Add routing queue information to Slurm FAQ web page.
 -- Do not select_g_step_finish() a SLURM_PENDING_STEP step, as nothing has
    been allocated for the step yet.
 -- Fixed race condition in PMIx Fence logic.
 -- Prevent slurmctld abort if job is killed or requeued while waiting for
    reboot of its allocated compute nodes.
 -- Treat invalid user ID in AllowUserBoot option of knl.conf file as error
    rather than fatal (log and do not exit).
 -- qsub - When doing the default output files for an array in qsub style
    make them using the master job ID instead of the normal job ID.
 -- Create the extern step while creating the job instead of waiting until the
    end of the job to do it.
 -- Always report a 0 exit code for the extern step instead of being canceled
    or failed based on the signal that would always be killing it.
 -- Fix to allow users to update QOS of pending jobs.
 -- Print correct cluster name in "slurmd -C" output.
 -- CRAY - Fix minor memory leak in switch plugin.
 -- CRAY - Change slurmconfgen_smw.py to skip over disabled nodes.
 -- Fix eligible_time for elasticsearch as well as add queue_wait
    (difference between start of job and when it was eligible).

* Changes in Slurm 16.05.2
==========================
 -- CRAY - Fix issue where the proctrack plugin could hang if the container
    id wasn't able to be made.
 -- Move test for job wait reason value of BurstBufferResources and
    BurstBufferStageIn later in the scheduling logic.
 -- Document which srun options apply to only job, only step, or job and step
    allocations.
 -- Use more compatible function to get thread name (>= 2.6.11).
 -- Fix order of job then step id when noting cleaning flag being set.
 -- Make it so the extern step sends a message with accounting information
    back to the slurmctld.
 -- Make it so the extern step calls the select_g_step_start|finish functions.
 -- Don't print error when extern step is canceled because job is ending.
 -- Handle a few error codes when dealing with the extern step to make sure
    we have the pids added to the system correctly.
 -- Add support for job dependencies with job array expressions. Previous logic
    required listing each task of a job array individually.
 -- Make sure tres_cnt is set before creating a slurmdb_assoc_usage_t.
 -- Prevent backfill scheduler from starting a second "singleton" job if another
    one started during a backfill sleep.
 -- Fix for invalid array pointer when creating an advanced reservation when job
    allocations span heterogeneous nodes (differing core or socket counts).
 -- Fix hostlist_ranged_string_xmalloc_dims to correctly not put brackets on
    hostlists when brackets == 0.
 -- Make sure we don't get brackets when making a range of reserved ports
    for a step.
 -- Change fatal to an error if port ranges aren't correct when reading state
    for steps.

* Changes in Slurm 16.05.1
==========================
 -- Fix __cplusplus macro in spank.h to allow compilation with C++.
 -- Fix compile issue with older glibc < 2.12
 -- Fix for starting batch step with mpi/pmix plugin.
 -- Fix for "scontrol -dd show job" with respect to displaying the specific
    CPUs allocated to a job on each node. Prior logic would only display
    the CPU information for the first node in the job allocation.
 -- Print correct return code on failure to update active node features
    through sview.
 -- Allow QOS timelimit to override partition timelimit when EnforcePartLimits
    is set to all/any.
 -- Make it so qsub will do a "basename" on a wrapped command for the output
    and error files.
 -- Fix issue where slurmd could core when running the ipmi energy plugin.
 -- Documentation - clean up typos.
 -- Add logic so that slurmstepd can be launched under valgrind.
 -- Increase buffer size to read /proc/*/stat files.
 -- Fix for tracking job resource allocation when slurmctld is reconfigured
    while Cray Node Health Check (NHC) is running. Previous logic would fail to
    record the job's allocation then perform release operation upon NHC
    completion, resulting in underflow error messages.
 -- Make "scontrol show daemons" work with long node names.
 -- CRAY - Collect energy using a uint64_t instead of uint32_t.
 -- Fix incorrect if statements when determining if the user has a default
    account or wckey.
 -- Prevent job stuck in configuring state if slurmctld daemon restarted while
    PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation
    as needed.
 -- Correct task affinity support for FreeBSD.
 -- Fix for task affinity on KNL in SNC2/Flat mode.
 -- Recalculate a job's memory allocation after node reboot if job requests all
    of a node's memory and FastSchedule=0 is configured. Intel KNL memory size
    can change on reboot with various MCDRAM modes.
 -- Fix small memory leak when printing HealthCheckNodeState.
 -- Eliminate memory leaks when AuthInfo is configured.
 -- Improve sdiag output description in man page.
 -- Cray/capmc_resume script modifies a node's features (as needed) when the
    reinit (reboot) command is issued rather than waiting for the nodes to change
    to the "on" state.
 -- Correctly print ranges when using step values in job arrays.
 -- Allow from file names / paths over 256 characters when launching steps,