NEWS

This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.

* Changes in SLURM 2.2.0.pre12
==============================
 -- Log if Prolog or Epilog run for longer than MessageTimeout / 2.
 -- Log the RPC number associated with messages from slurmctld that time out.
 -- Fix bug in select/cons_res logic when job allocation includes --overcommit
    and --ntasks-per-node options and the node has fewer CPUs than the count
    specified by --ntasks-per-node.
 -- Fix bug in gang scheduling and job preemption logic so that preempted jobs
    get resumed properly after a slurmctld hot-start.
 -- Fix bug in select/linear handling of gang scheduled jobs that could result
    in run_job_cnt underflow error message.
 -- Fix bug in gang scheduling logic to properly support partitions added
    using the scontrol command.
 -- Fix a segmentation fault in sview where the 'excluded_partitions' field
    was set to NULL, caused by the absence of ~/.slurm/sviewrc.
 -- Rewrote some calls to is_user_any_coord() in src/plugins/accounting_storage
    modules to make use of is_user_any_coord()'s return value.
 -- Add configure option of --with-dimensions=#.
 -- Modify srun ping logic so that srun would only be considered not responsive
    if three ping messages were not responded to. Patch from Hongjia Cao (NUDT).
 -- Preserve a node's ReasonTime field after scontrol reconfig command. Patch
    from Hongjia Cao (NUDT).
 -- Added the authority for users with AdminLevels defined in the SLURM db
    (Operators and Admins) and account coordinators to invoke commands that
    affect jobs, reservations, nodes, etc.
 -- Fix slurmd restart on a completing node with no tasks so that the node
    gets the correct state (completing). Patch from Hongjia Cao (NUDT).
 -- Prevent scontrol setting a node's Reason="". Patch from Hongjia Cao (NUDT).
 -- Add new functions hostlist_ranged_string_malloc,
    hostlist_ranged_string_xmalloc, hostlist_deranged_string_malloc, and
    hostlist_deranged_string_xmalloc which will allocate memory as needed.
 -- Make the slurm commands support both the --cluster and --clusters
    options. Previously, some commands supported one of those options, but
    not the other.
 -- Fix bug when resizing a job that has steps running on some of those nodes.
    Avoid killing the job step on remaining nodes. Patch from Rod Schultz
    (BULL). Also fix bug related to tracking the CPUs allocated to job steps
    on each node after releasing some nodes from the job's allocation.
 -- Applied patch from Rod Schultz / Matthieu Hautreux to keep the Node-to-Host
    cache from becoming corrupted when a hostname cannot be resolved.
 -- Export more symbols in libslurm for job and node state information
    translation (numbers to strings). Patch from Hongjia Cao (NUDT).
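
The srun ping entry above can be illustrated with a minimal sketch. This is
not SLURM's code: the class and constant names are hypothetical, and it
assumes that the three unanswered pings must be consecutive and that any
reply resets the count, which the entry does not state.

```python
# Hypothetical sketch of a "three missed pings" rule; names are illustrative
# and are not taken from the SLURM sources.
MAX_MISSED_PINGS = 3  # threshold described in the NEWS entry


class PingTracker:
    """Tracks unanswered pings for a single srun."""

    def __init__(self, max_missed=MAX_MISSED_PINGS):
        self.max_missed = max_missed
        self.missed = 0

    def record(self, responded):
        """Record one ping result; return True while srun counts as responsive."""
        if responded:
            self.missed = 0  # assumption: a reply resets the miss count
        else:
            self.missed += 1
        return self.missed < self.max_missed


tracker = PingTracker()
for answered in (False, False, False):
    responsive = tracker.record(answered)
# After three unanswered pings in a row, responsive is False.
```

Under these assumptions, a single dropped ping no longer marks srun as not
responsive.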

* Changes in SLURM 2.2.0.pre11
==============================
 -- Permit a regular user to change the partition of a pending job.
 -- Major re-write of the job_submit/lua plugin to pass pointers to available
    partitions and use lua metatables to reference the job and partition fields.
 -- Add support for several new trigger types: SlurmDBD failure/restart,
    Database failure/restart, Slurmctld failure/restart.
 -- Add support for SLURM_CLUSTERS environment variable in the sbatch, sinfo,
    squeue commands.
 -- Modify the sinfo and squeue commands to report state of multiple clusters
    if the --clusters option is used.
 -- Added printf __attribute__ qualifiers to info, debug, ... to help prevent
    bad/incorrect parameters being sent to them.  Original patch from
    Eygene Ryabinkin (Russian Research Centre).
 -- Fix bug in slurmctld job completion logic when nodes allocated to a
    completing job are re-booted. Patch from Hongjia Cao (NUDT).
 -- In slurmctld's node record data structure, rename "hilbert_integer" to
    "node_rank".
 -- Add topology/node_rank plugin to sort nodes based upon rank loaded from
    BASIL on Cray computers.
 -- Fix memory leak in the auth/munge and crypto/munge plugins in the case of
    some failure modes.

* Changes in SLURM 2.2.0.pre10
==============================
 -- Fix issue where, with EnforcePartLimits=yes in slurm.conf, any job
    submitted without a node count was seen to have maxnodes=0, which
    prevented the job from running.
 -- Fix issue so that the gang scheduler performs the correct kill procedure
    when not suspending a job.
 -- Fixed some issues when dealing with jobs from a 2.1 system so they
    survive an upgrade.
 -- In srun, log if --cpu_bind options are specified, but not supported by the
    current system configuration.
 -- Various patches from Hongjia Cao dealing with bugs found in sacctmgr and
    the slurmdbd.
 -- Fix bug when changing the nodes allocated to a running job and some of the
    specified node names are invalid; avoid invalid memory reference.
 -- Fixed filename substitution of %h and %n based on patch from Ralph Bean.
 -- Added better job sorting logic when preempting jobs with qos.
 -- Log the IP address and port number for some communication errors.
 -- Fix bug in select/cons_res that could oversubscribe resources when the
    --cpus_per_task option is used.
 -- In srun, do not implicitly set the job's maximum node count based upon a
    required hostlist.
 -- Avoid running the HealthCheckProgram on non-responding nodes rather than
    DOWN nodes.
 -- Fix bug in handling of poll() functions on OS X (SLURM was ignoring POLLIN
    if POLLHUP flag was set at the same time).
 -- Pulled Cray logic out of common/node_select.c into its own select/cray
    plugin. cons_res is the default. To use linear, add 'Linear' to
    SelectTypeParameters.
 -- Fixed bug where resizing jobs didn't set used limits correctly.
 -- Change sched/backfill default time interval to 30 seconds and defer attempt
    to backfill schedule if slurmctld has more than 5 active RPCs. General
    improvements in logic scalability.
 -- Add SchedulerParameters option of default_sched_depth=# to control how
    many jobs in the queue should be tested for attempted scheduling when a job
    completes or other routine events. Default value is 100 jobs. The full job
    queue is tested on a less frequent basis. This option can dramatically
    improve performance on systems with thousands of queued jobs.
 -- Gres/gpu now sets the CUDA_VISIBLE_DEVICES environment variable to control
    which GPU devices should be used for each job or job step when CUDA
    version 3.1+ is used. NOTE: SLURM's generic resource support is still
    under development.
 -- Modify select/cons_res to pack jobs onto allocated nodes differently and
    minimize system fragmentation. For example on nodes with 8 CPUs each, a
    job needing 10 CPUs will now ideally be allocated 8 CPUs on one node and
    2 CPUs on another node. Previously the job would have ideally been
    allocated 5 CPUs on each node, fragmenting the unused resources more.
 -- Modified the behavior of update_job() in job_mgr.c to return when the first
    error is encountered instead of continuing with more job updates.
 -- Removed all references to the following slurm.conf parameters, all of which
    have been removed or replaced since version 2.0 or earlier: HashBase,
    HeartbeatInterval, JobAcctFrequency, JobAcctLogFile (instead use
    AccountingStorageLoc), JobAcctType, KillTree, MaxMemPerTask, and
    MpichGmDirectSupport.
 -- Fix bug in slurmctld restart logic that improperly reported jobs had
    invalid features: "Job 65537 has invalid feature list: fat".
 -- BLUEGENE - Removed thread pool for destroying blocks.  It turns out the
    memory leak we were concerned about for creating and destroying threads
    in a plugin doesn't exist anymore.  This increases throughput dramatically,
    allowing multiple jobs to start at the same time.
 -- BLUEGENE - Removed thread pool for starting and stopping jobs, for
    reasons similar to those noted above.
 -- BLUEGENE - Handle blocks that never deallocate.
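
The select/cons_res packing change above (10 CPUs on nodes with 8 CPUs each)
can be checked with a small worked example. This is only a sketch of the
arithmetic in that entry, not the plugin's algorithm; both function names are
hypothetical, and it assumes identical, otherwise idle nodes.

```python
# Illustrative arithmetic only; this is not the select/cons_res code.
def pack_cpus(cpus_needed, cpus_per_node):
    """New behavior as described: fill each node before using the next."""
    if cpus_per_node <= 0:
        raise ValueError("cpus_per_node must be positive")
    alloc = []
    remaining = cpus_needed
    while remaining > 0:
        take = min(remaining, cpus_per_node)
        alloc.append(take)
        remaining -= take
    return alloc


def spread_cpus(cpus_needed, cpus_per_node):
    """Previous behavior as described: spread evenly over the nodes needed."""
    if cpus_per_node <= 0:
        raise ValueError("cpus_per_node must be positive")
    if cpus_needed <= 0:
        return []
    nodes = -(-cpus_needed // cpus_per_node)  # ceiling division
    base, extra = divmod(cpus_needed, nodes)
    return [base + 1 if i < extra else base for i in range(nodes)]


packed = pack_cpus(10, 8)    # [8, 2]: leaves 0 and 6 CPUs idle
spread = spread_cpus(10, 8)  # [5, 5]: leaves 3 and 3 CPUs idle
```

With packing, the 6 idle CPUs sit on a single node, so a later 6-CPU job
still fits on one node. With the even spread, no node has more than 3 idle
CPUs.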

* Changes in SLURM 2.2.0.pre9
=============================
 -- sbatch can now submit jobs to multiple clusters and run on the one that
    is available earliest.
 -- Fix bug introduced in pre8 that prevented job dependencies and job
    triggers from working without the --enable-debug configure option.
 -- Replaced slurm_addr with slurm_addr_t.
 -- Replaced slurm_fd with slurm_fd_t.
 -- Skeleton code added for BlueGeneQ.
 -- Jobs can now be submitted to multiple partitions (job queues) and use the
    one permitting earliest start time.
 -- Change slurmdb_coord_table back to acct_coord_table to stay consistent
    with versions < 2.1.
 -- Introduced locking system similar to that in the slurmctld for the
    assoc_mgr.
 -- Added ability to change a user's name in accounting.
 -- Restore squeue support for "%G" format (group id) accidentally removed in
    2.2.0.pre7.
 -- Added preempt_mode option to QOS.
 -- Added a grouping=individual for sreport size reports.
 -- Added remove_qos logic to jobs running under a QOS that was removed.
 -- scancel now exits with a 1 if any job is nonexistent when canceling.
 -- Better handling of select plugins that don't exist on various systems for
    cross cluster communication.  Slurmctld, slurmd, and slurmstepd now only
    load the default select plugin as well.
 -- Better error handling when loading plugins.
 -- Prevent scontrol from aborting if getlogin() returns NULL.
 -- Prevent scontrol segfault when there are hidden nodes.
 -- Prevent srun segfault after task launch failure.
 -- Added job_submit/lua plugin.
 -- Fixed sinfo on a bluegene system to print correctly the output for:
    sinfo -e -o "%9P %6m %.4c %.22F %f"
 -- Add scontrol commands "hold" and "release" to simplify setting a job's
    priority to 0 or 1. Also tests that the job is in pending state.
 -- Increase maximum node list size (for incoming RPC) from 1024 bytes to 64k.
 -- In the backup slurmctld, purge triggers before recovering trigger state to
    avoid duplicate entries.
 -- Fix bug in sacct processing of --fields= option.
 -- Fix bug in checkpoint/blcr for jobs spanning multiple nodes introduced when
    changing some variable names in version 2.2.0.pre5.
 -- Removed the vestigial set_max_cluster_usage() function from the Priority
    Plugin API.
 -- Modify the output of "scontrol show job" for the field ReqS:C:T=. Fields
    not specified by the user will be reported as "*" instead of 65534.
 -- Added DefaultQOS option for an association.
 -- BLUEGENE - Added -B option to the slurmctld to clear created blocks from
    the system on start.
 -- BLUEGENE - Added option to scontrol & sview to recreate existing blocks.
 -- Fixed flags for returning messages to use the correct munge key when going
    cross-cluster.
 -- BLUEGENE - Added option to scontrol & sview to resume blocks in an error
    state instead of just freeing them.
 -- sview patched to allow multiple row selection of jobs. Patch from Dan
    Rusak.
 -- Lower default slurmctld server thread count from 1024 to 256. Some systems
    process threads on a last-in first-out basis and the high thread count was
    causing unexpectedly high delays for some RPCs.
 -- Added to sacctmgr the ability for admins to reset the raw usage of a user
    or account.
 -- Improved the efficiency of a few lines in sacctmgr.
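
The "scontrol show job" change above (ReqS:C:T= showing "*" instead of 65534)
can be sketched as follows. The function and constant names are hypothetical,
and the sketch assumes that 65534 is the placeholder stored for each field the
user did not specify. The real scontrol output code is not reproduced here.

```python
# Hypothetical sketch; names are illustrative and not taken from scontrol.
UNSET_16 = 65534  # value the NEWS entry says was previously printed


def format_req_sct(sockets, cores, threads):
    """Render sockets:cores:threads, showing '*' for unspecified fields."""
    parts = ("*" if v == UNSET_16 else str(v) for v in (sockets, cores, threads))
    return "ReqS:C:T=" + ":".join(parts)


line = format_req_sct(2, UNSET_16, 1)  # only sockets and threads were given
```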

* Changes in SLURM 2.2.0.pre8
=============================
 -- Add DebugFlags parameter of "Backfill" for sched/backfill detailed logging.
 -- Add DebugFlags parameter of "Gang" for detailed logging of gang scheduling
    activities.
 -- Add DebugFlags parameter of "Priority" for detailed logging of priority
    multifactor activities.
 -- Add DebugFlags parameter of "Reservation" for detailed logging of advanced
    reservations.