This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.
* Changes in Slurm 17.02.0pre4
==============================
 -- Add support for per-partition OverTimeLimit configuration.
-- Add --mem_bind option of "sort" to run zonesort on KNL nodes at step start.
 -- Add LaunchParameters=mem_sort option to configure running of zonesort.
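    For example, a minimal sketch (program name hypothetical); zonesort can be
    enabled cluster-wide in slurm.conf or per step from the command line:
      LaunchParameters=mem_sort
      $ srun --mem_bind=sort ./a.out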
-- Add "FreeSpace" information for each pool to the "scontrol show burstbuffer"
output. Required changes to the burst_buffer_info_t data structure.
-- Add new node state flag of NODE_STATE_REBOOT for node reboots triggered by
"scontrol reboot" commands. Previous logic re-used NODE_STATE_MAINT flag,
which could lead to inconsistencies. Add "ASAP" option to "scontrol reboot"
command that will drain a node in order to reboot it as soon as possible,
then return it to service.
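    For example (node name hypothetical), to drain a node and reboot it as
    soon as its running jobs complete:
      $ scontrol reboot ASAP nid00001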
-- Allow unit conversion routine to convert 1024M to 1G.
* Changes in Slurm 17.02.0pre3
==============================
-- Add srun host & PID to job step data structures.
-- Avoid creating duplicate pending step records for the same srun command.
-- Rewrite srun's logic for pending steps for better efficiency (fewer RPCs).
-- Added new SchedulerParameters options step_retry_count and step_retry_time
to control scheduling behaviour of job steps waiting for resources.
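    A possible slurm.conf setting (values are illustrative only):
      SchedulerParameters=step_retry_count=10,step_retry_time=60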
-- Optimize resource allocation logic for --spread-job job option.
-- Modify cpu_bind and mem_bind map and mask options to accept a repetition
count to better support large task count. For example:
"mask_mem:0x0f*2,0xf0*2" is equivalent to "mask_mem:0x0f,0x0f,0xf0,0xf0".
 -- Add support for --mem_bind=prefer option to prefer, but not restrict,
    memory use to the identified NUMA node.
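    For example (program name hypothetical):
      $ srun --mem_bind=prefer ./a.out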
-- Add mechanism to constrain kernel memory allocation using cgroups. New
cgroup.conf parameters added: ConstrainKmemSpace, MaxKmemPercent, and
MinKmemSpace.
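    A cgroup.conf sketch (values are illustrative only; MinKmemSpace is a
    size in MB):
      ConstrainKmemSpace=yes
      MaxKmemPercent=100
      MinKmemSpace=30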
 -- Correct invocation of man2html, which previously could cause FreeBSD builds
to hang.
-- MYSQL - Unconditionally remove 'ignore' clause from 'alter ignore'.
-- Modify service files to not start Slurm daemons until after Munge has been
started.
NOTE: If you are not using Munge, but are using the "service" scripts to
start Slurm daemons, then you will need to remove this check from the
etc/slurm*service scripts.
-- Do not process SALLOC_HINT, SBATCH_HINT or SLURM_HINT environment variables
if any of the following salloc, sbatch or srun command line options are
specified: -B, --cpu_bind, --hint, --ntasks-per-core, or --threads-per-core.
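    For example (script name hypothetical), the environment variable below is
    ignored because a conflicting command line option is present:
      $ export SBATCH_HINT=compute_bound
      $ sbatch --hint=multithread job.sh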
-- burst_buffer/cray: Accept new jobs on backup slurmctld daemon without access
to dw_wlm_cli command. No burst buffer actions will take place.
 -- Do not include SLURM_JOB_DERIVED_EC, SLURM_JOB_EXIT_CODE, or
    SLURM_JOB_EXIT_CODE2 in PrologSlurmctld environment (not available yet).
 -- Cray - make the task plugin fatal() if task/cgroup is not listed after
    task/cray in the TaskPlugin settings.
-- Remove separate slurm_blcr package. If Slurm is built with BLCR support,
the files will now be part of the main Slurm packages.
-- Replace sjstat, seff and sjobexit RPM packages with a single "contribs"
package.
-- Remove long since defunct slurmdb-direct scripts.
-- Add SbcastParameters configuration option to control default file
destination directory and compression algorithm.
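    A slurm.conf sketch, assuming DestDir and Compression keywords (directory
    and algorithm are illustrative):
      SbcastParameters=DestDir=/tmp,Compression=lz4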
-- Add new SchedulerParameter (max_array_tasks) to limit the maximum number of
tasks in a job array independently from the maximum task ID (MaxArraySize).
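    A slurm.conf sketch (limits are illustrative only):
      MaxArraySize=100001
      SchedulerParameters=max_array_tasks=1000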
 -- Fix issue where the number of nodes was not properly allocated when sbatch
    or salloc requested fewer tasks (-n) than hosts from the -w hostlist or
    from -N.
* Changes in Slurm 17.02.0pre2
==============================
-- Add new RPC (REQUEST_EVENT_LOG) so that slurmd and slurmstepd can log events
through the slurmctld daemon.
-- Remove sbatch --bb option. That option was never supported.
 -- Automatically cleanup task/cgroup cpuset and devices cgroups after steps are
done.
-- Limit job purge run time to 1 second at a time.
-- The database index for jobs is now 64 bit. If you happen to be close to
4 billion jobs in your database you will want to update your slurmctld at
    the same time as your slurmdbd to prevent rollover of this variable, as
    it was 32 bit in previous versions of Slurm.
-- Optionally lock slurmstepd in memory for performance reasons and to avoid
possible SIGBUS if the daemon is paged out at the time of a Slurm upgrade
(changing plugins). Controlled via new LaunchParameters options of
slurmstepd_memlock and slurmstepd_memlock_all.
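    A slurm.conf sketch enabling one of the new options:
      LaunchParameters=slurmstepd_memlock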
-- Add event trigger on burst buffer errors (see strigger man page,
--burst_buffer option).
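    For example (program path hypothetical), to run a script when a burst
    buffer error occurs:
      $ strigger --set --burst_buffer --program=/usr/sbin/bb_alert.sh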
-- Add job AdminComment field which can only be set by a Slurm administrator.
-- Add salloc, sbatch and srun option of --delay-boot=<time>, which will
temporarily delay booting nodes into the desired state for a job in the
hope of using nodes already in the proper state which will be available at
a later time.
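    For example (script name hypothetical, time value illustrative):
      $ sbatch --delay-boot=30 job.sh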
-- Add job burst_buffer_state and delay_boot fields to scontrol and squeue
output. Also add ability to modify delay_boot from scontrol.
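    A sketch of modifying the new field, assuming the scontrol field is named
    DelayBoot (job ID hypothetical):
      $ scontrol update JobId=1234 DelayBoot=10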
-- Fix for node's available tres array getting filled in with configured gres
model types.
-- Log if job --bb option contains any unrecognized content.
-- Display configured and allocated tres for nodes in scontrol show nodes.
-- Change all memory values (in MB) to uint64_t to accommodate > 2TB per node.
-- Add MailDomain option to qualify email addresses.
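    A slurm.conf sketch (domain hypothetical); mail for user "alice" would
    then be addressed to alice@example.com:
      MailDomain=example.com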
-- Refactor the persistent connections within the federation code to use
the same logic that was found in the slurmdbd. Now both functionalities
share the same code.
-- Remove BlueGene/L and BlueGene/P support.
-- Add "flag" field to launch_tasks_request_msg. Remove the following fields
(moved into flags): multi_prog, task_flags, user_managed_io, pty,
buffered_stdio, and labelio.
-- Add protocol version to slurmd startup communications for slurmstepd to
permit changes in the protocol.
* Changes in Slurm 17.02.0pre1
==============================
 -- burst_buffer/cray - Add support for rounding up the size of a buffer request
if the DataWarp configuration "equalize_fragments" is used.
-- Rename "in" to "input" in slurm_step_io_fds data structure defined in
    slurm.h. This is needed to avoid breaking Python by using one of its
keywords in a Slurm data structure.
-- Remove eligible_time from jobcomp/elasticsearch.
 -- Fix issue where a QOS could not be deleted if no clusters had been added.
-- SlurmDBD - change all timestamps to bigint from int to solve Y2038 problem.
-- Add salloc/sbatch/srun --spread-job to distribute tasks over as many nodes
    as possible. This also treats the --ntasks-per-node option as a maximum
    value.
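    For example (script name hypothetical), to spread 16 tasks over as many
    nodes as the allocation allows:
      $ sbatch --ntasks=16 --spread-job job.sh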
-- Add ConstrainKmemSpace to cgroup.conf, defaulting to yes, to allow
cgroup Kmem enforcement to be disabled while still using ConstrainRAMSpace.
-- Add support for sbatch --bbf option.
-- Add burst buffer support for job arrays. Add new SchedulerParameters option
of bb_array_stage_cnt=# to indicate how many pending tasks of a job array
should be made available for burst buffer resource allocation.
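    A slurm.conf sketch (count is illustrative only):
      SchedulerParameters=bb_array_stage_cnt=10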
-- Fix small memory leak when a job fails to load from state save.
-- Fix invalid read when attempting to delete clusters from db with running
jobs.
-- Fix small memory leak when deleting clusters from db.
-- Add SLURM_ARRAY_TASK_COUNT environment variable. Total number of tasks in a
job array (e.g. "--array=2,4,8" will set SLURM_ARRAY_TASK_COUNT=3).
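    A batch script sketch using the new variable:
      #SBATCH --array=2,4,8
      echo "This array has ${SLURM_ARRAY_TASK_COUNT} tasks"  # prints 3 tasks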
-- Add new sacctmgr commands: "shutdown" (shutdown the server), "list stats"
    (get server statistics), and "clear stats" (clear server statistics).
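    For example:
      $ sacctmgr list stats    # show server statistics
      $ sacctmgr clear stats   # reset server statistics
      $ sacctmgr shutdown      # shut down the slurmdbd server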
-- Restructure job accounting query to use 'id_job in (1, 2, .. )' format
    instead of the logically equivalent 'id_job = 1 || id_job = 2 || ..'.
-- Added start_delay field to jobcomp/elasticsearch.
-- In order to support federated jobs, the MaxJobID configuration parameter
default value has been reduced from 2,147,418,112 to 67,043,328 and its
maximum value is now 67,108,863. Upon upgrading, any pre-existing jobs that
have a job ID above the new range will continue to run and new jobs will get
job IDs in the new range.
-- Added infrastructure for setting up federations in database and establishing
connections between federation clusters.
* Changes in Slurm 16.05.7
==========================
 -- Fix issue in the priority/multifactor plugin where more time than should
    be allowed would be accounted for after a slurmctld restart.
 -- cray/burst_buffer - If total_space in a pool decreases, reset used_space
rather than trying to account for buffer allocations in progress.
 -- cray/burst_buffer - Fix for double counting of used_space at slurmctld
startup.
 -- Fix regression in 16.05.6 where if you requested multiple cpus per task
    (-c2) with --ntasks-per-core=1 and only 1 task on the node, the slurmd
    would hit an infinite loop and abort with a fatal error.
 -- cray/burst_buffer - Internally track both allocated and unusable space.
The reported UsedSpace in a pool is now the allocated space (previously was
unusable space). Base available space on whichever value leaves least free
space.
-- cray/burst_buffer - Preserve job ID and don't translate to job array ID.
-- cray/burst_buffer - Update "instance" parsing to match updated dw_wlm_cli
output.
 -- sched/backfill - Ensure we don't try to start a job that was already started
and requeued by the main scheduling logic.
-- job_submit/lua - add access to the job features field in job_record.
-- select/linear plugin modified to better support heterogeneous clusters when
topology/none is also configured.
-- Permit cancellation of jobs in configuring state.
-- acct_gather_energy/rapl - prevent segfault in slurmd from race to gather
data at slurmd startup.
-- Integrate node_feature/knl_generic with "hbm" GRES information.
-- Fix output routines to prevent rounding the TRES values for memory or BB.
-- switch/cray plugin - fix use after free error.
* Changes in Slurm 16.05.6
==========================
 -- Docs - document the correct default value for GroupUpdateForce, which is 0.
-- mpi/pmix - improve point to point communication performance.
-- SlurmDB - include pending jobs in search during 'sacctmgr show runawayjobs'.
-- Add client side out-of-range checks to --nice flag.
 -- Fix support for sbatch "-W" option, which previously required using
    "--wait" instead.
-- node_features/knl_cray plugin and capmc_suspend/resume programs modified to
sleep and retry capmc operations if the Cray State Manager is down. Added
CapmcRetries configuration parameter to knl_cray.conf.
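    A knl_cray.conf sketch (retry count is illustrative only):
      CapmcRetries=4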
-- node_features/knl_cray plugin: Remove any KNL MCDRAM or NUMA features from
node's configuration if capmc does NOT report the node as being KNL.
-- node_features/knl_cray plugin: drain any node not reported by
"capmc node_status" on startup or reconfig.
-- node_features/knl_cray plugin: Substantially streamline and speed up logic
to load current node state on reconfigure failure or unexpected node boot.
-- node_features/knl_cray plugin: Add separate thread to interact with capmc
in response to unexpected node reboots.
-- node_features plugin - Add "mode" argument to node_features_p_node_xlate()
function to fix some bugs updating a node's features using the node update
RPC.
-- node_features/knl_cray plugin: If the reconfiguration of nodes for an
interactive job fails, kill the job (it can't be requeued like a batch job).
-- Testsuite - Added srun/salloc/sbatch tests with --use-min-nodes option.