NEWS

This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.

* Changes in Slurm 14.11.4
==========================

* Changes in Slurm 14.11.3
==========================
 -- Prevent vestigial job record when cancelling a pending job array record.
 -- Fixed squeue core dump.
 -- Fix job array hash table bug that could result in slurmctld infinite loop
    or invalid memory reference.
 -- In srun honor ntasks_per_node before looking at cpu count when the user
    doesn't request a number of tasks.
 -- Fix ghost job when submitting job after all jobids are exhausted.
 -- MySQL - Enhanced coordinator security checks.
 -- Fix for task/affinity if an admin configures a node for having threads
    but then sets CPUs to only represent the number of cores on the node.
 -- Make it so previous versions of salloc/srun work with newer versions
    of Slurm daemons.
 -- Avoid delay on commit for PMI rank 0 to improve performance with some
    MPI implementations.
 -- auth/munge - Correct logic to read old format AccountingStoragePass.
 -- Reset node "RESERVED" state as appropriate when deleting a maintenance
    reservation.
 -- Prevent a job manually suspended from being resumed by gang scheduler once
    free resources are available.
 -- Prevent invalid job array task ID value if a task is started using gang
    scheduling.
 -- Fixes for clean build on FreeBSD.
 -- Fix documentation bugs in slurm.conf.5. DenyAccount should be DenyAccounts.
 -- For backward compatibility with older versions of OMPI not compiled
    with --with-pmi restore the SLURM_STEP_RESV_PORTS in the job environment.
 -- Update the html documentation describing the integration with openmpi.
 -- Fix sacct when searching by nodelist.
 -- Fix cosmetic info statements when dealing with a job array task instead of
    a normal job.
 -- Fix segfault with job arrays.
 -- Correct the sbatch pbs parser to process -j.
 -- BGQ - Put print statement under a DebugFlag.  This was just an oversight.
 -- BLUEGENE - Remove check that would erroneously remove the CONFIGURING
    flag from a job while the job is waiting for a block to boot.
 -- Fix segfault in slurmstepd when job exceeded memory limit.
 -- Fix race condition that could start a job that is dependent upon a job array
    before all tasks of that job array complete.
 -- PMI2 race condition fix.

* Changes in Slurm 14.11.2
==========================
 -- Fix Centos5 compile errors.
 -- Fix issue with association hash not getting the correct index which
    could result in seg fault.
 -- Fix salloc/sbatch -B segfault.
 -- Avoid huge malloc if GRES configured with "Type" and huge "Count".
 -- Prevent jobs from starting in overlapping reservations if they won't
    finish before a "maint" reservation begins.
 -- When a node gets drained while in the mixed state, display its status as
    draining in sinfo output.
 -- Allow priority/multifactor to work with sched/wiki(2) if all priorities
    have no weight.  This allows for association and QOS decay limits to work.
 -- Fix "squeue --start" to override SQUEUE_FORMAT env variable.
 -- Fix scancel to be able to cancel multiple jobs that are space delimited.
 -- Log Cray MPI job calling exit() without mpi_fini(), but do not treat it as
    a fatal error. This partially reverts logic added in version 14.03.9.
 -- sview - Fix displaying of suspended steps elapsed times.
 -- Increase number of messages that get cached before throwing them away
    when the DBD is down.
 -- Restore GRES functionality with select/linear plugin. It was broken in
    version 14.03.10.
 -- Fix bug with GRES having multiple types that can cause slurmctld abort.
 -- Fix squeue issue with not recognizing "localhost" in --nodelist option.
 -- Make sure the bitstrings for a partition's Allow/DenyQOS are up to date
    when running from cache.
 -- Add smap support for job arrays and larger job ID values.
 -- Fix possible race condition when attempting to use QOS on a system running
    accounting_storage/filetxt.
 -- Fix issue with accounting_storage/filetxt and job arrays not being printed
    correctly.
 -- In proctrack/linuxproc and proctrack/pgid, check the result of strtol()
    for error condition rather than errno, which might have a vestigial error
    code.
 -- Improve information recording for jobs deferred due to advanced
    reservation.
 -- Export eio_new_initial_obj to the plugins and initialize kvs_seq on
    mpi/pmi2 setup to support launching.

* Changes in Slurm 14.11.1
==========================
 -- Get libs correct when doing the xtree/xhash make check.
 -- Update xhash/tree make check to work correctly with current code.
 -- Remove the 'experimental' label from the jobacct_gather/cgroup
    plugin.
 -- Add QOS manipulation examples to the qos.html documentation page.
 -- If 'squeue -w node_name' specifies an unknown host name print
    an error message and return 1.
 -- Fix race condition in job_submit plugin logic that could cause slurmctld to
    deadlock.
 -- Job wait reason of "ReqNodeNotAvail" expanded to identify unavailable nodes
    (e.g. "ReqNodeNotAvail(Unavailable:tux[3-6])").

* Changes in Slurm 14.11.0
==========================
 -- ALPS - Fix issue with core_spec warning.
 -- Allow multiple partitions to be specified in sinfo -p.
 -- Install the service files in /usr/lib/systemd/system.
 -- MYSQL - Add id_array_job and id_resv keys to $CLUSTER_job_table.  THIS
    COULD TAKE A WHILE TO CREATE THE KEYS SO BE PATIENT.
 -- CRAY - Resize bitmaps on a restart if we find we have more blades
    than before.
 -- Add new eio API function for removing unused connections.
 -- ALPS - Fix issue where batch allocations weren't correctly confirmed or
    released.
 -- Define DEFAULT_MAX_TASKS_PER_NODE based on MAX_TASKS_PER_NODE from
    slurm.h as per documentation.
 -- Update the FAQ about relocating slurmctld.
 -- In the memory cgroup enable memory.use_hierarchy in the cgroup root.
 -- Export eio.c functions for use by MPI/PMI2.
 -- Add SLURM_CLUSTER_NAME to job environment.

* Changes in Slurm 14.11.0rc3
=============================
 -- Allow envs to override autotools binaries in autogen.sh.
 -- Added system services files.
 -- If a job pends with DependencyNeverSatisfied, keep it pending even after
    the job which it was depending upon was cleaned.
 -- Let operators (in addition to user root and SlurmUser) see job script for
    other users' jobs.
 -- Perl API modified to return node state of MIXED rather than ALLOCATED if
    only some CPUs allocated.
 -- Double Munge connect retry timeout from 1 to 2 seconds.
 -- sview - Remove unneeded code that was resolved globally in commit
    98e24b0dedc.
 -- Collect and report the accounting of the batch step and its children.
 -- Add configure checks for faccessat and eaccess, and make use of one of
    them if available.
 -- Make configure --enable-developer also set --enable-debug.
 -- Introduce a SchedulerParameters variable kill_invalid_depend. If set,
    jobs pending with an invalid dependency are terminated.
 -- Move spank_user_task() call in slurmstepd after the task_g_pre_launch()
    so that the task affinity information is available to spank.
 -- Make /etc/init.d/slurm script return value 3 when the daemon is
    not running. This is required by Linux Standard Base Core
    Specification 3.1.

* Changes in Slurm 14.11.0rc2
=============================
 -- Logs for jobs which are explicitly requeued will say so rather than saying
    that a node in their allocation failed.
 -- Updated the documentation about the remote licenses served by
    the Slurm database.
 -- Ensure that slurm_spank_exit() is only called once from srun.
 -- Change the signature of net_set_low_water() to use 4 bytes instead of 8.
 -- Export working_cluster_rec in libslurmdb.so as well as move some function
    definitions needed for drmaa.
 -- If using cons_res or serial, cause a fatal error in the plugin instead of
    causing SelectTypeParameters to be magically set to CR_CPU.
 -- Enhance task/affinity auto binding to consider tasks * cpus-per-task.
 -- Fix regression in priority/multifactor which would cause memory
    corruption. Issue is only in rc1.
 -- Add PrivateData value of "cloud". If set, powered down nodes in the cloud
    will be visible.
 -- Sched/backfill - Eliminate clearing start_time of running jobs.
 -- Fix various backwards compatibility issues.
 -- If a batch job fails to launch, requeue it in hold.

* Changes in Slurm 14.11.0rc1
=============================
 -- When using cgroup name the batch step as step_batch instead of
    batch_4294967294.
 -- Changed LEVEL_BASED priority to be "Fair_Tree".
 -- Port to NetBSD.
 -- BGQ - Add cnode based reservations.
 -- Alongside totalview_jobid implement totalview_stepid available
    to sattach.
 -- Add ability to include other files in slurm.conf based upon the ClusterName.
 -- Update strlcpy to latest upstream version.
 -- Add reservation information in the sacct and sreport output.
 -- Add job priority calculation check for overflow and fix memory leak.
 -- Add SchedulerParameters option of pack_serial_at_end to put serial jobs at
    the end of the available nodes rather than using a best fit algorithm.
 -- Allow regular users to view default sinfo output when
    privatedata=reservations is set.
 -- PrivateData=reservation modified to permit users to view the reservations
    which they have access to (rather than preventing them from seeing ANY
    reservation).
 -- job_submit/lua: Fix job_desc set field logic.

* Changes in Slurm 14.11.0pre5
==============================
 -- Fix sbatch --export=ALL, it was treated by srun as a request to explicitly
    export only the environment variable named "ALL".
 -- Improve scheduling of jobs in reservations that overlap other reservations.
 -- Modify sgather to make global file systems easier to configure.
 -- Added sacctmgr reconfig to reread the slurmdbd.conf in the slurmdbd.
 -- Modify scontrol job operations to accept comma delimited list of job IDs.
    Applies to job update, hold, release, suspend, resume, requeue, and
    requeuehold operations.
 -- Refactor job_submit/lua interface. LUA FUNCTIONS NEED TO CHANGE! The