NEWS

This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.

* Changes in Slurm 15.08.10
===========================
 -- Fix issue where if a slurmdbd rollup lasted longer than 1 hour the
    rollup would effectively never run again.
 -- Make error message in the pmi2 code to debug as the issue can be expected
    and retries are done making the error message a little misleading.
 -- Power/cray: Don't specify NID list to Cray APIs. If any of those nodes are
    not in a ready state, the API returned an error for ALL nodes rather than
    valid data for nodes in ready state.
 -- Fix potential divide by zero when tree_width=1.
 -- checkpoint/blcr plugin: Fix memory leak.
 -- If using PrologFlags=contain: Don't launch the extern step if a job is
    cancelled while launching.

* Changes in Slurm 15.08.9
==========================
 -- BurstBuffer/cray - Defer job cancellation or time limit while "pre-run"
    operation in progress to avoid inconsistent state due to multiple calls
    to job termination functions.
 -- Fix issue with resizing jobs and limits not be kept track of correctly.
 -- BGQ - Remove redeclaration of job_read_lock.
 -- BGQ - Tighter locks around structures when nodes/cables change state.
 -- Make it possible to change CPUsPerTask with scontrol.
 -- Make it so scontrol update part qos= will take away a partition QOS from
    a partition.
 -- Fix issue where SocketsPerBoard didn't translate to Sockets when CPUS=
    was also given.
 -- Add note to slurm.conf man page about setting "--cpu_bind=no" as part
    of SallocDefaultCommand if a TaskPlugin is in use.
 -- Set correct reason when a QOS' MaxTresMins is violated.
 -- Insure that a job is completely launched before trying to suspend it.
 -- Remove historical presentations and design notes. Only distribute
    maintained doc/html and doc/man directories.
 -- Remove duplicate xmalloc() in task/cgroup plugin.
 -- Backfill scheduler to validate correct job partition for job submitted to
    multiple partitions.
 -- Force close on exec on first 256 file descriptors when launching a
    slurmstepd to close potential open ones.
 -- Step GRES value changed from type "int" to "int64_t" to support larger
    values.
 -- Fix getting reservations to database when database is down.
 -- Fix issue with sbcast not doing a correct fanout.
 -- Fix issue where steps weren't always getting the gres/tres involved.
 -- Fixed double read lock on getting job's gres/tres.
 -- Fix display for RoutePlugin parameter to display the correct value.
 -- Fix route/topology plugin to prevent segfault in sbcast when in use.
 -- Fix Cray slurmconfgen_smw.py script to use nid as nid, not nic.
 -- Fix Cray NHC spawning on job requeue. Previous logic would leave nodes
    allocated to a requeued job as non-usable on job termination.
 -- burst_buffer/cray plugin: Prevent a requeued job from being restarted while
    file stage-out is still in progress. Previous logic could restart the job
    and not perform a new stage-in.
 -- Fix job array formatting to allow return [0-100:2] display for arrays with
    step functions rather than [0,2,4,6,8,...] .
 -- FreeBSD - replace Linux-specific set_oom_adj to avoid errors in slurmd log.
 -- Add option for TopologyParam=NoInAddrAnyCtld to make the slurmctld listen
    on only one port like TopologyParam=NoInAddrAny does for everything else.
 -- Fix burst buffer plugin to prevent corruption of the CPU TRES data when bb
    is not set as an AccountingStorageTRES type.
 -- Surpress error messages in acct_gather_energy/ipmi plugin after repeated
    failures.
 -- Change burst buffer use completion email message from
    "SLURM Job_id=1360353 Name=tmp Staged Out, StageOut time 00:01:47" to
    "SLURM Job_id=1360353 Name=tmp StageOut/Teardown time 00:01:47"
 -- Generate burst buffer use completion email immediately afer teardown
    completes rather than at job purge time (likely minutes later).
 -- Fix issue when adding a new TRES to AccountingStorageTRES for the first
    time.
 -- Update gang scheduling tables when job manually suspended or resumed. Prior
    logic could mess up job suspend/resume sequencing.
 -- Update gang scheduling data structures when job changes in size.
 -- Associations - prevent hash table corruption if uid initially unset for
    a user, which can cause slurmctld to crash if that user is deleted.
 -- Avoid possibly aborting srun on SIGSTOP while creating the job step due to
    threading bug.
 -- Fix deadlock issue with burst_buffer/cray when a newly created burst
    buffer is found.
 -- burst_buffer/cray: Set environment variables just before starting job rather
    than at job submission time to reflect persistent buffers created or
    modified while the job is pending.
 -- Fix check of per-user qos limits on the initial run by a user.
 -- Fix gang scheduling resource selection bug which could prevent multiple jobs
    from being allocated the same resources. Bug was introduced in 15.08.6.
 -- Don't print the Rgt value of an association from the cache as it isn't
    kept up to date.
 -- burst_buffer/cray - If the pre-run operation fails then don't issue
    duplicate job cancel/requeue unless the job is still in run state. Prevents
    jobs hung in COMPLETING state.
 -- task/cgroup - Fix bug in task binding to CPUs.

* Changes in Slurm 15.08.8
==========================
 -- Backfill scheduling properly synchronized with Cray Node Health Check.
    Prior logic could result in highest priority job getting improperly
    postponed.
 -- Make it so daemons also support TopologyParam=NoInAddrAny.
 -- If scancel is operating on large number of jobs and RPC responses from
    slurmctld daemon are slow then introduce a delay in sending the cancel job
    requests from scancel in order to reduce load on slurmctld.
 -- Remove redundant logic when updating a job's task count.
 -- MySQL - Fix querying jobs with reservations when the id's have rolled.
 -- Perl - Fix use of uninitialized variable in slurm_job_step_get_pids.
 -- Launch batch job requsting --reboot after the boot completes.
 -- Move debug messages like "not the right user" from association manager
    to debug3 when trying to find the correct association.
 -- Fix incorrect logic when querying assoc_mgr information.
 -- Move debug messages to debug3 notifying a gres_bit_alloc was NULL for
    gres types without a file.
 -- Sanity Check Patch to setup variables for RAPL if in a race for it.
 -- GRES - Fix minor typecast issues.
 -- burst_buffer/cray - Increase size of intermediate variable used to store
    buffer byte size read from DW instance from 32 to 64-bits to avoid overflow
    and reporting invalid buffer sizes.
 -- Allow an existing reservation with running jobs to be modified without
    Flags=IGNORE_JOBS.
 -- srun - don't attempt to execve() a directory with a name matching the
    requested command
 -- Do not automatically relocate an advanced reservation for individual cores
    that spans multiple nodes when nodes in that reservation go down (e.g.
    a 1 core reservation on node "tux1" will be moved if node "tux1" goes
    down, but a reservation containing 2 cores on node "tux1" and 3 cores on
    "tux2" will not be moved node "tux1" goes down). Advanced reservations for
    whole nodes will be moved by default for down nodes.
 -- Avoid possible double free of memory (and likely abort) for slurmctld in
    background mode.
 -- contribs/cray/csm/slurmconfgen_smw.py - avoid including repurposed compute
    nodes in configs.
 -- Support AuthInfo in slurmdbd.conf that is different from the value in
    slurm.conf.
 -- Fix build on FreeBSD 10.
 -- Fix hdf5 build on ppc64 by using correct fprintf formatting for types.
 -- Fix cosmetic printing of NO_VALs in scontrol show assoc_mgr.
 -- Fix perl api for newer perl versions.
 -- Fix for jobs requesting cpus-per-task (eg. -c3) that exceed the number of
    cpus on a core.
 -- Remove unneeded perl files from the .spec file.
 -- Flesh out filters for scontrol show assoc_mgr.
 -- Add function to remove assoc_mgr_info_request_t members without freeing
    structure.
 -- Fix build on some non-glibc systems by updating includes.
 -- Add new PowerParameters options of get_timeout and set_timeout. The default
    set_timeout was increased from 5 seconds to 30 seconds. Also re-read current
    power caps periodically or after any failed "set" operation.
 -- Fix slurmdbd segfault when listing users with blank user condition.
 -- Save the ClusterName to a file in SaveStateLocation, and use that to
    verify the state directory belongs to the given cluster at startup to avoid
    corruption from multiple clusters attempting to share a state directory.
 -- MYSQL - Fix issue when rerolling monthly data to work off correct time
    period.  This would only hit you if you rerolled a 15.08 prior to this
    commit.
 -- If FastSchedule=0 is used make sure TRES are set up correctly in accounting.
 -- Fix sreport's truncation of columns with large TRES and not using
    a parsing option.
 -- Make sure count of boards are restored when slurmctld has option -R.
 -- When determine if a job can fit into a TRES time limit after resources
    have been selected set the time limit appropriately if the job didn't
    request one.
 -- Fix inadequate locks when updating a partition's TRES.
 -- Add new assoc_limit_continue flag to SchedulerParameters.
 -- Avoid race in acct_gather_energy_cray if energy requested before available.
 -- MYSQL - Avoid having multiple default accounts when a user is added to
    a new account and making it a default all at once.

* Changes in Slurm 15.08.7
==========================
 -- sched/backfill: If a job can not be started within the configured
    backfill_window, set it's start time to 0 (unknown) rather than the end
    of the backfill_window.
 -- Remove the 1024-character limit on lines in batch scripts.
 -- burst_buffer/cray: Round up swap size by configured granularity.
 -- select/cray: Log repeated aeld reconnects.
 -- task/affinity: Disable core-level task binding if more CPUs required than
    available cores.
 -- Preemption/gang scheduling: If a job is suspended at slurmctld restart or
    reconfiguration time, then leave it suspended rather than resume+suspend.
 -- Don't use lower weight nodes for job allocation when topology/tree used.
 -- BGQ - If a cable goes into error state remove the under lying block on
    a dynamic system and mark the block in error on a static/overlap system.
 -- BGQ - Fix regression in 9cc4ae8add7f where blocks would be deleted on
    static/overlap systems when some hardware issue happens when restarting
    the slurmctld.
 -- Log if CLOUD node configured without a resume/suspend program or suspend
    time.
 -- MYSQL - Better locking around g_qos_count which was previously unprotected.
 -- Correct size of buffer used for jobid2str to avoid truncation.
 -- Fix allocation/distribution of tasks across multiple nodes when
    --hint=nomultithread is requested.
 -- If a reservation's nodes value is "all" then track the current nodes in the
    system, even if those nodes change.
 -- Fix formatting if using "tree" option with sreport.
 -- Make it so sreport prints out a line for non-existent TRES instead of
    error message.
 -- Set job's reason to "Priority" when higher priority job in that partition
    (or reservation) can not start rather than leaving the reason set to
    "Resources".
 -- Fix memory corruption when a new non-generic TRES is added to the
    DBD for the first time.  The corruption is only noticed at shutdown.