NEWS

This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and admins.

* Changes in Slurm 14.11.0pre2
==============================
 -- Added AllowSpecResourcesUsage configuration parameter in slurm.conf. This
    allows jobs to use specialized resources on nodes allocated to them if the
    job designates --core-spec=0.
 -- Add new SchedulerParameters option of build_queue_timeout to throttle how
    much time can be consumed building the job queue for scheduling.
 -- Added HealthCheckNodeState option of "cycle" to cycle through the compute
    nodes over the course of HealthCheckInterval rather than running all at
    the same time.
 -- Add job "reboot" option for Linux clusters. This invokes the configured
    RebootProgram to reboot nodes allocated to a job before it begins execution.
 -- Added squeue -O/--Format option that makes all job and step fields available
    for printing.
 -- Improve database slurmctld entry speed dramatically.
 -- Add "CPUs" count to output of "scontrol show step".
 -- Add support for lua5.2
 -- scancel -b signals only the batch step, not any other step or any
    children of the shell script.
 -- MySQL - enforce NO_ENGINE_SUBSTITUTION
 -- Added CpuFreqDef configuration parameter in slurm.conf to specify the
    default CPU frequency and governor to be set at job end.
 -- Added support for job email triggers: TIME_LIMIT, TIME_LIMIT_90 (reached
    90% of time limit), TIME_LIMIT_80 (reached 80% of time limit). Applies to
    salloc, sbatch and srun commands.

* Changes in Slurm 14.11.0pre1
==============================
 -- Modify etc/cgroup.release_common.example to specify the full path to the
    scontrol command. Also find the cgroup mount point by reading cgroup.conf.
 -- Improve qsub wrapper support for passing environment variables.
 -- Modify sdiag to report Slurm RPC traffic by user, type, count and time
    consumed.
 -- In select plugins, stop triggering extra logging based upon the debug flag
    CPU_Bind and use the SelectType debug flag instead.
 -- Added SchedulerParameters options of bf_yield_interval and bf_yield_sleep
    to control how frequently and for how long the backfill scheduler will
    relinquish its locks.
 -- To support larger numbers of jobs when the StateSaveDirectory is on a
    file system that supports a limited number of files in a directory, add a
    subdirectory called "hash.#" based upon the last digit of the job ID.
 -- More gracefully handle missing batch script file. Just kill the job and do
    not drain the compute node.
 -- Add support for allocation of GRES by model type for heterogeneous systems
    (e.g. request a Kepler GPU, a Tesla GPU, or a GPU of any type).
 -- Record and enable display of nodes anticipated to be used for pending jobs.
 -- Modify squeue --start option to print the nodes expected to be used for
    pending jobs (in addition to expected start time, etc.).
 -- Add association hash to the assoc_mgr.
 -- Better logic to handle resized jobs when the DBD is down.
 -- Introduce MemLimitEnforce yes|no in slurm.conf. If set to "no", Slurm will
    not terminate jobs that exceed their requested memory.
 -- Add support for non-consumable generic resources for resources that are
    limited, but can be shared between jobs.
 -- Introduce 5 new job-related Slurm errors in slurm_errno.h to better
    report error conditions.
 -- Modify scontrol to print error message for each array task when updating
    the entire array.
 -- Added gres_drain and gres_used fields to node_info_t.
 -- Added PriorityParameters configuration parameter in slurm.conf.
 -- Introduce automatic job requeue policy based on exit value. See RequeueExit
    and RequeueExitHold descriptions in slurm.conf man page.
 -- Modify slurmd to cache launched job IDs for more responsive job suspend and
    gang scheduling.
 -- Permit job steps full control over cpu_bind options if specialized cores
    are included in the job allocation.
 -- Added ChosLoc configuration parameter to specify the pathname of the
    Chroot OS tool.
 -- Send SIGCONT/SIGTERM when a job is selected for preemption with GraceTime
    configured rather than waiting for GraceTime to be reached before notifying
    the job.
 -- Do not resume a job with specialized cores on a node running another job
    with specialized cores (only one can run at a time).
 -- Add specialized core count to job suspend/resume calls.
 -- task/affinity and task/cgroup - Correct specialized core task binding with
    user supplied invalid CPU mask or map.
 -- Add srun --cpu-freq options to set the CPU governor (OnDemand, Performance,
    PowerSave or UserSpace).
 -- Add support for a job step's CPU governor and/or frequency to be reset on
    suspend/resume (or gang scheduling). The default for an idle CPU will now
    be "ondemand" rather than "userspace" with the lowest frequency (to recover
    from hard slurmd failures and support gang scheduling).
 -- Added PriorityFlags option of Calculate_Running to continue recalculating
    the priority of running jobs.
 -- Replace round-robin front-end node selection with least-loaded algorithm.
 -- CRAY - Improve support of XC30 systems when running natively.
 -- Add new node configuration parameters CoreSpecCount, CPUSpecList and
    MemSpecLimit which support the reservation of resources for system use
    with Linux cgroup.
 -- Add child_forked() function to the slurm_acct_gather_profile plugin to
    close open files, leaving application with no extra open file descriptors.
 -- Cray/ALPS system - Enable backup controller to run outside of the Cray to
    accept new job submissions and most other operations on the pending jobs.
 -- Have sacct print job and task array IDs for job arrays.
 -- Smooth out fanout logic
 -- If <sys/prctl.h> is present, name the major threads in slurmctld (for
    example, the backfill thread: slurmctld_bckfl; the RPC manager:
    slurmctld_rpcmg; etc.). The names can be seen, for example, using top -H.
 -- sview - Better job_array support.
 -- Provide more precise error message when job allocation can not be satisfied
    (e.g. memory, disk, cpu count, etc. rather than just "node configuration
    not available").
 -- Create a new DebugFlags named TraceJobs in slurm.conf to print detailed
    information about jobs in slurmctld. The information includes job IDs,
    state and node count.
 -- When a job dependency can never be satisfied, do not cancel the job but keep
    it pending with reason WAIT_DEP_INVALID (DependencyNeverSatisfied).

* Changes in Slurm 14.03.6
==========================

* Changes in Slurm 14.03.5
==========================
 -- If an srun runs in an exclusive allocation, does not use the entire
    allocation, and CR_PACK_NODES is set, lay out tasks appropriately.
 -- Correct Shared field in job state information seen by scontrol, sview, etc.
 -- Print Slurm error string in scontrol update job and reset the Slurm errno
    before each call to the API.
 -- Fix task/cgroup to handle -mblock:fcyclic correctly
 -- Fix for core-based advanced reservations where the distribution of cores
    across nodes is not even.
 -- Fix issue where association maxnodes wouldn't be evaluated correctly if a
    QOS had a GrpNodes set.
 -- GRES fix with multiple files defined per line in gres.conf.
 -- When a job is requeued make sure accounting marks it as such.
 -- Print the state of requeued job as REQUEUED.
 -- Fix: if a job's partition was taken away from it, do not allow a requeue.
 -- Make sure we lock on the conf when sending slurmd's conf to the slurmstepd.
 -- Fix issue with sacctmgr 'load' not being able to gracefully handle a badly
    formatted file.
 -- sched/backfill: Correct job start time estimate with advanced reservations.
 -- proctrack/cgroup - Add an error message, for debugging, when the step
    freezer path cannot be destroyed.
 -- Added extra indexes to the database for better performance when
    deleting users.
 -- Fix issue with wckeys when tracking wckeys, but not enforcing them,
    you could get multiple '*' wckeys.
 -- Fix bug which could report to squeue the wrong partition for a running job
    that is submitted to multiple partitions.
 -- Report correct CPU count allocated to job when allocated whole node even if
    not using all CPUs.
 -- If job's constraints cannot be satisfied put it in pending state with reason
    BadConstraints and don't remove it.
 -- sched/backfill - If job started with infinite time limit, set its end_time
    one year in the future.
 -- Clear record of a job's gres when requeued.
 -- Clear QOS GrpUsedCPUs when resetting raw usage if QOS is not using any cpus.
 -- Remove log message left over from debugging.
 -- When using CR_PACK_NODES, make --ntasks-per-node work correctly.
 -- Report correct partition associated with a step if the job is submitted to
    multiple partitions.
 -- Fix to allow removing of preemption from a QOS
 -- If the proctrack plugins fail to destroy the job container, print an error
    message and, rather than looping forever, give up after 120 seconds.
 -- Make srun obey the POSIX convention of reporting an exit code of 128 plus
    the signal number when the process is terminated by a signal.
 -- Sanity check for acct_gather_energy/rapl
 -- If the sbatch command specifies the option --signal=B:signum, send the
    signal to the batch script only.
 -- If a task is cancelled and there is no other exit code, send the signal and
    exit code.
 -- Added note about InnoDB storage engine being used with MySQL.
 -- Set the job exit code when the job is signaled and set the log level to
    debug2() when processing an already completed job.
 -- Reset diagnostics time stamp when "sdiag --reset" is called.
 -- Make squeue and scontrol report a job's "shared" value based upon partition
    options rather than reporting "unknown" if the job submission does not use
    the --exclusive or --shared option.
 -- task/cgroup - Fix cpuset binding for batch script.
 -- sched/backfill - Fix anomaly that could result in jobs being scheduled out
    of order.
 -- Expand pseudo-terminal size data structure field sizes from 8 to 16 bits.
 -- Distinguish between two identical error messages.
 -- If using accounting_storage/mysql directly without a DBD, fix issue with
    the start of requeued jobs.
 -- If a job fails because of batch node failure and the job is requeued and an
    epilog complete message comes from that node do not process the batch step
    information since the job has already been requeued because the epilog
    script running isn't guaranteed in this situation.
 -- Change message to note that a NO_VAL return code could have come from node
    failure as well as from an interactive user.
 -- Modify test4.5 to only look at one partition instead of all of them.
 -- Fix sh5util -u to accept username different from the user that runs the
    command.
 -- Corrections to man pages: salloc.1 sbatch.1 srun.1 nonstop.conf.5
    slurm.conf.5.
 -- Restore srun --pty resize ability.
 -- Have "sacctmgr dump cluster" handle users and other entities with special
    characters such as ':' in their names.

* Changes in Slurm 14.03.4
==========================