This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 1.4.0-pre2
=============================
-- Remove srun's --ctrl-comm-ifhn-addr option (for PMI/MPICH2). It is no
longer needed.
-- Modify power save mode so that nodes can be powered off when idle. See
https://computing.llnl.gov/linux/slurm/power_save.html or
"man slurm.conf" (SuspendProgram and related parameters) for more
information; a configuration sketch follows this section's entries.
-- Added configuration parameter PrologSlurmctld, which can be used to boot
nodes into a particular state for each job. See "man slurm.conf" for
details.
-- Add configuration parameter CompleteTime to control how long to wait for
a job's completion before allocating already released resources to pending
jobs. This can be used to reduce fragmentation of resources. See
"man slurm.conf" for details.
-- Make default CryptoType=crypto/munge. OpenSSL is now completely optional.
-- Make default AuthType=auth/munge rather than auth/none.
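An illustrative slurm.conf sketch tying together the power save, PrologSlurmctld
and munge entries above; the program paths and the SuspendTime value are
placeholders, and "man slurm.conf" remains the authoritative reference:
    # Hypothetical slurm.conf excerpt (paths and values are site-specific)
    SuspendProgram=/usr/local/sbin/slurm_suspend   # powers idle nodes off
    ResumeProgram=/usr/local/sbin/slurm_resume     # powers nodes back on when needed
    SuspendTime=600                                # seconds a node must be idle before power off
    PrologSlurmctld=/usr/local/sbin/job_prolog     # run by slurmctld to boot nodes into a job's state
    CryptoType=crypto/munge                        # new default; OpenSSL now optional
    AuthType=auth/munge                            # new default (was auth/none)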
* Changes in SLURM 1.4.0-pre1
=============================
-- Save/restore a job's task_distribution option on slurmctld restart.
NOTE: SLURM must be cold-started on conversion from version 1.3.x.
-- Remove task_mem from job step credential (only job_mem is used now).
-- Remove --task-mem and --job-mem options from salloc, sbatch and srun
(use --mem-per-cpu or --mem instead; see the example after this section's
entries).
-- Remove DefMemPerTask from slurm.conf (use DefMemPerCPU or DefMemPerNode
instead).
-- Modify slurm_step_launch API call. Move launch host from function argument
to element in the data structure slurm_step_launch_params_t, which is
used as a function argument.
-- Add state_reason_string to job state with optional details about why
a job is pending.
-- Make "scontrol show node" output match scontrol input for some fields
("Cores" changed to "CoresPerSocket", etc.).
-- Add support for a new node state "FUTURE" in slurm.conf. These node records
are created in SLURM tables for future use without a reboot of the SLURM
daemons, but are not reported by any SLURM commands or APIs.
* Changes in SLURM 1.3.9
========================
-- Fix jobs being cancelled by ctrl-C to have correct cancelled state in
accounting.
-- Slurmdbd will now cache only user data, making for a faster start up.
-- Improved support for job steps in FRONT_END systems
-- Added support to dump and load association information in the controller
on start up if slurmdbd is unresponsive
-- BLUEGENE - Added support for sched/backfill plugin
-- sched/backfill modified to initiate multiple jobs per cycle.
-- Increase buffer size in srun to hold task list expressions. Critical
for jobs with 16k tasks or more.
-- Added support for eligible jobs and downed nodes to be sent to accounting
from the controller the first time accounting is turned on.
-- Correct srun logic to support --tasks-per-node option without task count.
-- Logic in place to handle multiple versions of RPCs within the slurmdbd.
THE SLURMDBD MUST BE UPGRADED TO THIS VERSION OR IT WILL NOT BE ABLE TO TALK
WITH THE SLURMCTLD. Older versions of the slurmctld will continue to talk to
the new slurmdbd.
-- Add support for new job dependency type: singleton. Only one job from a
given user with a given name will execute with this dependency type; a usage
example follows this section's entries. From Matthieu Hautreux, CEA.
-- Updated contribs/python/hostlist: See "CHANGES" file in that directory
for details. From Kent Engstrom, NSC.
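A usage sketch for the singleton dependency noted above; the job name and
script are hypothetical:
    # Only one "nightly_sync" job from this user runs at a time; later
    # submissions with the same name wait for the earlier one to finish.
    sbatch --job-name=nightly_sync --dependency=singleton sync.sh
    sbatch --job-name=nightly_sync --dependency=singleton sync.sh   # queued until the first completes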
* Changes in SLURM 1.3.8
========================
-- Added PrivateData flags for Users, Usage, and Accounts to Accounting.
If using slurmdbd, set in the slurmdbd.conf file. Otherwise set in the
slurm.conf file. See "man slurm.conf" or "man slurmdbd.conf" for details;
an example follows this section's entries.
-- Reduce frequency of resending job kill RPCs. Helpful in the event of
network problems or down nodes.
-- Fix memory leak caused under heavy load when running with select/cons_res
plus sched/backfill.
-- For salloc, if no local command is specified, execute the user's default
shell.
-- BLUEGENE - When starting a job, blocks that must be freed are first checked
to verify that no other job is running on them. If one is found, the new job
is requeued; no job will be lost.
-- BLUEGENE - Set MPI environment variables from salloc.
-- BLUEGENE - Fix threading issue for overlap mode
-- Reject batch scripts containing DOS linebreaks.
-- BLUEGENE - Added wait for block boot to salloc
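An example of the new accounting PrivateData flags noted above; this line
would go in slurmdbd.conf when slurmdbd is in use, otherwise in slurm.conf:
    # Hide other users' account, usage and user records
    PrivateData=accounts,usage,users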
* Changes in SLURM 1.3.7
========================
-- Add jobid/stepid to MESSAGE_TASK_EXIT to address race condition when
a job step is cancelled, another is started immediately (before the
first one completely terminates) and ports are reused.
NOTE: This change requires that SLURM be updated on all nodes of the
cluster at the same time. There will be no impact upon currently running
jobs (they will ignore the jobid/stepid at the end of the message).
-- Added Python module to process hostlists as used by SLURM. See
contribs/python/hostlist. Supplied by Kent Engstrom, National
Supercomputer Centre, Sweden.
-- Report task termination due to signal (restored functionality present
in slurm v1.2).
-- Remove sbatch test for script size being no larger than 64k bytes.
The current limit is 4GB.
-- Disable FastSchedule=0 use with SchedulerType=sched/gang. Node
configuration must be specified in slurm.conf for gang scheduling now.
-- For sched/wiki and sched/wiki2 (Maui or Moab scheduler) disable the ability
of a non-root user to change a job's comment field (used by Maui/Moab for
storing scheduler state information).
-- For sched/wiki (Maui) add pending job's future start time to the state
info reported to Maui.
-- Improve reliability of job requeue logic on node failure.
-- Add logic to ping non-responsive nodes even if SlurmdTimeout=0. This permits
the node to be returned to use when it starts responding rather than
remaining in a non-usable state.
-- Honor HealthCheckInterval values that are smaller than SlurmdTimeout.
-- For non-responding nodes, log them all on a single line with a hostlist
expression rather than one line per node. The frequency of these log messages
depends upon the SlurmctldDebug value, ranging from every 300 seconds at
SlurmctldDebug<=3 to every 1 second at SlurmctldDebug>=5.
-- If a DOWN node is resumed, set its state to IDLE & NOT_RESPONDING and
ping the node immediately to clear the NOT_RESPONDING flag.
-- Log that a job's time limit is reached, but don't send SIGXCPU.
-- Fixed gid to be set in slurmstepd when run by root
-- Changed getpwent to getpwent_r in the slurmctld and slurmd
-- Increase timeout on most slurmdbd communications to 60 secs (time for
substantial database updates).
-- Treat the srun --begin= option as a failure when given a value of "now"
plus a time unit without a numeric component (e.g. "--begin=now+hours").
-- Eliminate a memory leak associated with notifying srun of allocated
nodes having failed.
-- Add scontrol shutdown option of "slurmctld" to just shut down the
slurmctld daemon and leave the slurmd daemons running (example after this
section's entries).
-- Do not require JobCredentialPrivateKey or JobCredentialPublicCertificate
in slurm.conf if using CryptoType=crypto/munge.
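For example, the new scontrol shutdown option noted above:
    scontrol shutdown slurmctld   # stop only the slurmctld daemon; slurmd daemons keep running
    scontrol shutdown             # previous behavior: stop slurmctld and all slurmd daemons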
* Changes in SLURM 1.3.6
========================
-- Add new function to get information for a single job rather than always
getting information for all jobs. Improved performance of some commands.
NOTE: This new RPC means that the slurmctld daemons should be updated
before or at the same time as the compute nodes in order to process it.
-- In salloc, sbatch, and srun replace --task-mem options with --mem-per-cpu
(--task-mem will continue to be accepted for now, but is not documented).
Replace DefMemPerTask and MaxMemPerTask with DefMemPerCPU, DefMemPerNode,
MaxMemPerCPU and MaxMemPerNode in slurm.conf (old options still accepted
for now, but mapped to "PerCPU" parameters and not documented). Allocate
a job's memory at the same time that processors are allocated based
upon the --mem or --mem-per-cpu option rather than when job steps are
initiated.
-- Altered QOS in accounting to be a list of admin-defined states; an
account or user can now have multiple QOS's. They need to be defined using
'sacctmgr add qos' and are no longer an enum (see the sketch below). If none
are defined, Normal will be the QOS for everything. Right now this is only
for use with MOAB and does nothing outside of that.
-- Added spank_get_item support for field S_STEP_CPUS_PER_TASK.
-- Make corrections in spank_get_item for field S_JOB_NCPUS, which previously
reported the task count rather than the CPU count.
-- Convert configuration parameter PrivateData from on/off flag to have
separate flags for job, partition, and node data. See "man slurm.conf"
for details.
-- Fix bug, failed to load DisableRootJobs configuration parameter.
-- Altered sacctmgr to always return a non-zero exit code on error and send
error messages to stderr.
-- Fix processing of auth/munge authentication key for messages originating
in slurmdbd and sent to slurmctld.
-- If srun is allocating resources (not within sbatch or salloc) and MaxWait
is configured to a non-zero value then wait indefinitely for the resource
allocation rather than aborting the request after MaxWait time.
-- For Moab only: add logic to reap defunct "su" processes that are spawned by
slurmd to load user's environment variables.
-- Added more support for "dumping" account information to a flat file and
reading it in again, to protect data in case something bad happens to the
database.
-- Sacct will now report account names for job steps.
-- For AIX: Remove MP_POERESTART_ENV environment variable, disabling
poerestart command. User must explicitly set MP_POERESTART_ENV before
executing poerestart.
-- Put back notification that a job has been allocated resources when it was
pending.
-- Some updates to man page formatting from Gennaro Oliva, ICAR.
-- Smarter loading of plugins (doesn't stat every file in the plugin dir)
-- In sched/backfill avoid trying to schedule jobs on DOWN or DRAINED nodes.
-- Forward exit_code from step completion to slurmdbd.
-- Add retry logic to socket connect() call from client which can fail
when the slurmctld is under heavy load.
-- Fixed bug so that associations are added correctly.
-- Added support for associations for user root.
-- For Moab, sbatch --get-user-env option processed by slurmd daemon
rather than the sbatch command itself to permit faster response
for Moab.
-- IMPORTANT FIX: This only affects use of select/cons_res when allocating
resources by core or socket, not by CPU (default for SelectTypeParameter).
We were not saving a pending job's task distribution, so after restarting
slurmctld, select/cons_res was over-allocating resources based upon an
invalid task distribution value. Since we can't save the value without
changing the state save file format, we'll just set it to the default
value for now and save it in Slurm v1.4. This may result in a slight
variation on how sockets and cores are allocated to jobs, but at least
resources will not be over-allocated.
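A sketch of defining QOS's with sacctmgr, as noted earlier in this section;
the QOS name is hypothetical:
    sacctmgr add qos expedite     # QOS's are now admin-defined, not a fixed enum
    sacctmgr list qos             # if none are defined, Normal is used for everything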