NEWS

This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.

* Changes in Slurm 2.5.5
========================
 -- Fix for sacctmgr add qos to handle the 'flags' option.
 -- Export SLURM_ environment variables from sbatch, even if "--export"
    option does not explicitly list them.
 -- If node is in more than one partition, correct counting of allocated CPUs.
 -- If step requests more CPUs than possible in specified node count of job
    allocation then return ESLURM_TOO_MANY_REQUESTED_CPUS rather than
    ESLURM_NODES_BUSY and retrying.
 -- CRAY - Fix SLURM_TASKS_PER_NODE to be set correctly.
 -- Accounting - more checks for strings with a possible `'` in them.
 -- sreport - Fix by adding planned down time to utilization reports.
 -- Do not report an error when sstat identifies job steps terminated during
    its execution, but log using debug type message.
 -- Select/cons_res - Permit node removed from job by going down to be returned
    to service and re-used by another job.
 -- Select/cons_res - Tighter packing of job allocations on sockets.
 -- SlurmDBD - fix to allow user root along with the slurm user to register a
    cluster.
 -- Select/cons_res - Fix for support of consecutive node option.

* Changes in Slurm 2.5.4
========================
 -- Fix bug in PrologSlurmctld use that would block job steps until node
    responds.
 -- CRAY - If a partition has MinNodes=0 and a batch job doesn't request
    nodes, set the allocation to 1 instead of 0, which would prevent the
    allocation from happening.
 -- Better debug when the database is down and using the --cluster option in
    the user commands.
 -- When asking for job states with sacct, default to 'now' instead of midnight
    of the current day.
 -- Fix for handling a test-only job or immediate job that fails while being
    built.
 -- Comment out all of the logic in the job_submit/defaults plugin. The logic
    is only an example and not meant for actual use.
 -- Eliminate configuration file 4096 character line limitation.
 -- More robust logic for tree message forwarding.
 -- BGQ - When cnodes fail in a timeout fashion correctly look up parent
    midplane.
 -- Correct sinfo "%c" (node's CPU count) output value for Bluegene systems.
 -- Backfill - Responsiveness improvements for systems with large numbers of
    jobs (>5000) and using the SchedulerParameters option bf_max_job_user.
 -- slurmstepd: ensure that IO redirection openings from/to files correctly
    handle interruption.
 -- BGQ - Able to handle when midplanes go into Hardware::SoftwareFailure.
 -- GRES - Correct tracking of specific resources used after slurmctld restart.
    Counts would previously go negative as jobs terminate and decrement from
    a base value of zero.
 -- Fix for priority/multifactor2 plugin to not assert when configured with
    --enable-debug.
 -- Select/cons_res - If the job request specified --ntasks-per-socket and the
    allocation unit is cores, then pack the tasks onto the sockets up to the
    specified value.
 -- BGQ - If a cnode goes into an 'error' state and the block containing the
    cnode does not have a job running on it do not resume the block.
 -- BGQ - Handle blocks that don't free themselves in a reasonable time better.
 -- BGQ - Fix for signaling steps when allocation ends before step.
 -- Fix for backfill scheduling logic with job preemption; starts more jobs.
 -- xcgroup - remove bugs with EINTR management in write calls.
 -- jobacct_gather - fix total values so they do not always equal the max
    values.
 -- Fix for handling node registration messages from older versions without
    energy data.
 -- BGQ - Allow user to request full dimensional mesh.
 -- sdiag command - Correction to jobs started value reported.
 -- Prevent slurmctld assert when invalid change to reservation with running
    jobs is made.
 -- BGQ - If signal is NODE_FAIL allow forward even if job is completing
    and timeout in the runjob_mux trying to send in this situation.
 -- BGQ - More robust checking for correct node, task, and ntasks-per-node
    options in srun, and push that logic to salloc and sbatch.
 -- GRES topology bug in core selection logic fixed.
 -- Fix init.d script so that querying status does not return 1 on
    success.
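
    The GRES tracking item above describes counts going negative after a
    slurmctld restart. The Python sketch below is only a toy model of that
    failure mode, not slurmctld's data structures: the GresTracker class and
    its recover() and release() methods are hypothetical names, and rebuilding
    the count from the jobs still running is an assumed way of fixing it.

```python
class GresTracker:
    """Toy count of in-use generic resources (for example GPUs) on a node."""

    def __init__(self):
        self.used = 0  # base value right after a controller restart

    def recover(self, running_job_counts):
        # Assumed fix: rebuild the count from the jobs that are still running.
        self.used = sum(running_job_counts)

    def release(self, count):
        # Called when a job terminates and returns its resources.
        self.used -= count


# Without recovery, the first job to end decrements from zero.
buggy = GresTracker()
buggy.release(2)

# With recovery, the count reflects the running jobs before any job ends.
fixed = GresTracker()
fixed.recover([2, 1])
fixed.release(2)

print(buggy.used, fixed.used)
```

    In this model the unrecovered tracker ends below zero, while the recovered
    one still accounts for the job that keeps running.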

* Changes in SLURM 2.5.3
========================
 -- Gres/gpu plugin - If no GPUs requested, set CUDA_VISIBLE_DEVICES=NoDevFiles.
    This bug was introduced in 2.5.2 for the case where a GPU count was
    configured, but without device files.
 -- task/affinity plugin - Fix bug in CPU masks for some processors.
 -- Modify sacct command to get format from SACCT_FORMAT environment variable.
 -- BGQ - Changed order of library inclusions and fixed incorrect declaration
    to compile correctly on newer compilers.
 -- Fix for not building sview if glib exists on a system but not the gtk libs.
 -- BGQ - Fix for handling a job cleanup on a small block if the job has long
    since left the system.
 -- Fix race condition in job dependency logic which can result in invalid
    memory reference.
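
    As an illustration of the SACCT_FORMAT item above, the Python sketch below
    prepares an environment in which sacct would pick up its output format
    from that variable. The helper name sacct_command and the field list are
    examples chosen here, not part of Slurm; the sketch only builds the
    command and environment and does not run sacct.

```python
import os


def sacct_command(fields):
    """Return (argv, env) for running sacct with a default output format."""
    env = dict(os.environ)                   # copy, leave our own env untouched
    env["SACCT_FORMAT"] = ",".join(fields)   # comma-separated field names
    return ["sacct"], env


argv, env = sacct_command(["jobid", "jobname", "state", "elapsed"])
print(env["SACCT_FORMAT"])

# On a system with Slurm installed this could then be run with, for example:
#   subprocess.run(argv, env=env, check=True)
```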

* Changes in SLURM 2.5.2
========================
 -- Fix advanced reservation recovery logic when upgrading from version 2.4.
 -- BLUEGENE - fix for QOS/Association node limits.
 -- Add missing "safe" flag from print of AccountStorageEnforce option.
 -- Fix logic to optimize GRES topology with respect to allocated CPUs.
 -- Add job_submit/all_partitions plugin to set a job's default partition
    to ALL available partitions in the cluster.
 -- Modify switch/nrt logic to permit build without libnrt.so library.
 -- Handle srun task launch failure without duplicate error messages or abort.
 -- Fix bug in QoS limits enforcement when slurmctld restarts and user not yet
    added to the QOS list.
 -- Fix issue where sjstat and sjobexitmod were installed in 2 different RPMs.
 -- Fix for job request of multiple partitions in which some partitions lack
    nodes with required features.
 -- Permit a job to use a QOS that its user does not have access to if an
    administrator manually set the job's QOS (previously the job would be
    rejected).
 -- Make more variables available to job_submit/lua plugin: slurm.MEM_PER_CPU,
    slurm.NO_VAL, etc.
 -- Fix topology/tree logic when nodes defined in slurm.conf get re-ordered.
 -- In select/cons_res, correct logic to allocate whole sockets to jobs. Work
    by Magnus Jonsson, Umea University.
 -- In select/cons_res, correct logic when job removed from only some nodes.
 -- Avoid apparent kernel bug in 2.6.32 which apparently is solved in
    at least 3.5.0.  This avoids a stack overflow when running jobs on
    more than 120k nodes.
 -- BLUEGENE - If we made a block that isn't runnable because of an
    overlapping block, destroy it correctly.
 -- Switch/nrt - Dynamically load libnrt.so from within the plugin as needed.
    This eliminates the need for libnrt.so on the head node.
 -- BLUEGENE - Fix in reservation logic that could cause abort.

* Changes in SLURM 2.5.1
========================
 -- Correction to hostlist sorting for hostnames that contain two numeric
    components and the first numeric component has various sizes (e.g.
    "rack9blade1" should come before "rack10blade1").
 -- BGQ - Only poll on initialized blocks instead of calling getBlocks on
    each block independently.
 -- Fix of task/affinity plugin logic for Power7 processors having hyper-
    threading disabled (cpu mask has gaps).
 -- Fix of job priority ordering with sched/builtin and priority/multifactor.
    Patch from Chris Read.
 -- CRAY - Fix for setting up the aprun for a large job (+2000 nodes).
 -- Fix for race condition related to compute node boot resulting in node being
    set down with reason of "Node <name> unexpectedly rebooted".
 -- RAPL - Fix for handling errors when opening msr files.
 -- BGQ - Fix for salloc/sbatch to do the correct allocation when asking for
    -N1 -n#.
 -- BGQ - In emulation, allow pretending to run large jobs (>64k nodes).
 -- BLUEGENE - Correct method to update conn_type of a job.
 -- BLUEGENE - Fix issue with preemption when needing to preempt multiple jobs
    to make one job run.
 -- Fixed issue where an srun that died abnormally inside of an allocation
    would also have killed the allocation.
 -- FRONTEND - fixed issue where if a system's nodes weren't defined in the
    slurm.conf with NodeAddr's, signals going to a step could be handled
    incorrectly.
 -- If sched/backfill starts a job with a QOS having NO_RESERVE and no job
    time limit, start it with the partition time limit (or one year if the
    partition has no time limit) rather than NO_VAL (140 year time limit).
 -- Alter hostlist logic to allocate large grid dynamically instead of on
    stack.
 -- Change RPC version checks to support version 2.5 slurmctld with version 2.4
    slurmd daemons.
 -- Correct core reservation logic for use with select/serial plugin.
 -- Exit scontrol command on stdin EOF.
 -- Disable job --exclusive option with select/serial plugin.
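
    The hostlist sorting correction above can be illustrated with a short
    Python sketch that compares each run of digits in a host name as a number
    rather than as text. This is not Slurm's hostlist implementation; the
    function name natural_key is hypothetical and the host names extend the
    example given in the item.

```python
import re


def natural_key(hostname):
    """Split a host name into alternating text and integer parts."""
    # A capturing group makes re.split keep the digit runs, so the result
    # alternates text (even index) and digits (odd index).
    parts = re.split(r"(\d+)", hostname)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


hosts = ["rack10blade1", "rack9blade2", "rack9blade1"]
ordered = sorted(hosts, key=natural_key)
print(ordered)
```

    A plain string sort would place "rack10blade1" first because "1" sorts
    before "9"; with the numeric key, "rack9blade1" comes before
    "rack10blade1".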

* Changes in SLURM 2.5.0
========================
 -- Add DenyOnLimit flag for QOS to deny jobs at submission time if they
    request resources that reach a 'Max' limit.
 -- Permit SlurmUser or operator to change QOS of non-pending jobs (e.g.
    running jobs).
 -- BGQ - move initial poll to beginning of realtime interaction, which will
    also cause it to run if the realtime server ever goes away.

* Changes in SLURM 2.5.0-rc2
============================
 -- Modify sbcast logic to survive slurmd daemon restart while a file
    transmission is in progress.
 -- Add retry logic to munge encode/decode calls. This is needed if the munge
    daemon is under very heavy load (e.g. with 1000 slurmd daemons per compute
    node).
 -- Add launch and acct_gather_energy plugins to RPMs.
 -- Restore support for srun "--mpi=list" option.
 -- CRAY - Introduce step accounting for a Cray.
 -- Modify srun to abandon I/O 60 seconds after the last task ends. Otherwise
    an aborted slurmstepd can cause the srun process to hang indefinitely.
 -- ENERGY - RAPL - alter code to close open files (and only open them once
    where needed).
 -- If the PrologSlurmctld fails, then requeue the job an indefinite number
    of times instead of only one time.
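
    The retry logic added around the munge encode/decode calls follows a
    common pattern, sketched below in Python. It is a generic illustration
    and not the code used in Slurm: the function names, the retry count, the
    delay, and the use of OSError as the transient failure are all
    assumptions made for this example.

```python
import time


def call_with_retry(func, retries=5, delay=0.0):
    """Call func(); on a transient failure, retry up to `retries` more times."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return func()
        except OSError as err:  # stand-in for a busy-daemon failure
            last_error = err
            if attempt < retries:
                time.sleep(delay)  # give the loaded daemon time to recover
    raise last_error


attempts = {"count": 0}


def flaky_encode():
    # Hypothetical stand-in for a credential encode call: fails twice,
    # then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise OSError("daemon busy")
    return "credential"


result = call_with_retry(flaky_encode)
print(result, attempts["count"])
```

    Bounding the number of retries keeps a persistent failure from hanging
    the caller: once the retries are used up, the last error is raised.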

* Changes in SLURM 2.5.0-rc1
============================
 -- Added Prolog and Epilog Guide (web page). Based upon work by Jason Sollom,
    Cray Inc. and used by permission.
 -- Restore gang scheduling functionality. Preemptor was not being scheduled.
    Fix for bugzilla #3.
 -- Add "cpu_load" to node information. Populate CPULOAD in node information
    reported to Moab cluster manager.
 -- Preempt jobs only when insufficient idle resources exist to start job,
    regardless of the node weight.
 -- Added priority/multifactor2 plugin based upon ticket distribution system.
    Work by Janne Blomqvist, Aalto University.