NEWS

This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.

* Changes in Slurm 14.11.0pre5
==============================
 -- Fix sbatch --export=ALL, which was treated by srun as a request to
    explicitly export only the environment variable named "ALL".
 -- Improve scheduling of jobs in reservations that overlap other reservations.

* Changes in Slurm 14.11.0pre4
==============================
 -- Added job array data structure and removed 64k array size restriction.
 -- Added SchedulerParameters option of bf_max_job_array_resv to control how
    many tasks of a job array should have resources reserved for them.
 -- Added more validity checking of incoming job submit requests.
 -- Added srun --export option to set/export specific environment variables.
 -- Scontrol modified to print separate error messages for job arrays with
    different exit codes on the different tasks of the job array. Applies to
    job suspend and resume operations.
 -- Fix race condition in CPU frequency set with job preemption.
 -- Always call select plugin on step termination, even if the job is also
    complete.
 -- Srun executable names beginning with "." will be resolved based upon the
    working directory and path on the compute node rather than the submit node.
 -- Add node state string suffix of "$" to identify nodes in maintenance
    reservation or scheduled for reboot. This applies to scontrol, sinfo,
    and sview commands.
 -- Enable scontrol to clear a node's scheduled reboot by setting its state
    to "RESUME".
 -- As per the sbatch and srun documentation, when the --signal option is
    used, signal only the steps unless, in the case of a batch job, B is
    specified, in which case signal only the batch script.
 -- Modify AuthInfo configuration parameter to accept credential lifetime
    option.
 -- Modify crypto/munge plugin to use socket and timeout specified in AuthInfo.
 -- If we have a state for a step on completion put that in the database
    instead of guessing off the exit_code.
 -- Added squeue -P/--priority option that can be used to display pending jobs
    in the same order as used by the Slurm scheduler even if jobs are submitted
    to multiple partitions (job is reported once per usable partition).
 -- Improve the pending reason description for various QOS limits. For each
    QOS limit that causes a job to be pending print its specific reason.
    For example, if a job pends because of GrpCpus, the squeue command will
    print QOSGrpCpuLimit as the pending reason.
 -- sched/backfill - Set expected start time of job submitted to multiple
    partitions to the earliest start time on any of the partitions.
 -- Introduce a MAX_BATCH_REQUEUE define that indicates how many times a job
    can be requeued. When the number is reached the job is put on hold
    with reason JobHoldMaxRequeue.
 -- Add sbatch job array option to limit the number of simultaneously running
    tasks from a job array (e.g. "--array=0-15%4").
 -- Implemented a new QOS limit MinCPUs. Users running under a QOS must
    request at least MinCPUs CPUs, otherwise their job will pend.
 -- Introduced a new pending reason WAIT_QOS_MIN_CPUS to reflect the new QOS
    limit.
 -- Job array dependency based upon state is now dependent upon the state of
    the array as a whole (e.g. afterok requires ALL tasks to complete
    successfully, afternotok is true if ANY task does not complete
    successfully, and after requires all tasks to at least be started).
 -- The srun -u/--unbuffered options set the stdout of the task launched
    by srun to be line buffered.
 -- The srun options -l/--label and -u/--unbuffered can now be specified
    together. The previous limitation has been removed.
 -- Provide sacct display of gres accounting information per job.
 -- Change the node status size from uint16_t to uint32_t.
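
The job array dependency rules described in this release can be illustrated
with a short Python sketch. It only mirrors the semantics stated in the entry
and is not Slurm's implementation; the function name and the task state
strings are assumptions made for the example.

```python
def array_dependency_satisfied(dep_type, task_states):
    """Evaluate a dependency against the states of all tasks in one array.

    task_states holds one hypothetical state string per array task:
    "PENDING", "RUNNING", "COMPLETED" (success) or "FAILED".
    """
    if dep_type == "afterok":
        # ALL tasks must have completed successfully.
        return all(state == "COMPLETED" for state in task_states)
    if dep_type == "afternotok":
        # True if ANY task did not complete successfully.
        return any(state == "FAILED" for state in task_states)
    if dep_type == "after":
        # All tasks must at least have been started.
        return all(state != "PENDING" for state in task_states)
    raise ValueError(f"unsupported dependency type: {dep_type}")


if __name__ == "__main__":
    states = ["COMPLETED", "FAILED", "RUNNING"]
    for dep in ("afterok", "afternotok", "after"):
        print(dep, array_dependency_satisfied(dep, states))
```

The array is treated as a single unit, so one failed task is enough to block
an afterok dependency and to satisfy an afternotok dependency.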

* Changes in Slurm 14.11.0pre3
==============================
 -- Move xcpuinfo.[c|h] to the slurmd since it isn't needed anywhere else
    and will avoid the need for all the daemons to link to libhwloc.
 -- Add memory test to job_submit/partition plugin.
 -- Added new internal Slurm functions xmalloc_nz() and xrealloc_nz(), which do
    not initialize the allocated memory to zero for improved performance.
 -- Modify hostlist function to dynamically allocate buffer space for improved
    performance.
 -- In the job_submit plugin: Remove all slurmctld locks prior to job_submit()
    being called for improved performance. If any slurmctld data structures are
    read or modified, add locks directly in the plugin.
 -- Added PriorityFlag LEVEL_BASED described in doc/html/level_based.shtml
 -- If Fairshare=parent is set on an account, that account's children will be
    effectively reparented for fairshare calculations to the first parent of
    their parent that is not Fairshare=parent. Limits remain the same;
    only the fairshare value is affected.

* Changes in Slurm 14.11.0pre2
==============================
 -- Added AllowSpecResourcesUsage configuration parameter in slurm.conf. This
    allows jobs to use specialized resources on nodes allocated to them if the
    job designates --core-spec=0.
 -- Add new SchedulerParameters option of build_queue_timeout to throttle how
    much time can be consumed building the job queue for scheduling.
 -- Added HealthCheckNodeState option of "cycle" to cycle through the compute
    nodes over the course of HealthCheckInterval rather than running all at
    the same time.
 -- Add job "reboot" option for Linux clusters. This invokes the configured
    RebootProgram to reboot nodes allocated to a job before it begins execution.
 -- Added squeue -O/--Format option that makes all job and step fields available
    for printing.
 -- Improve database slurmctld entry speed dramatically.
 -- Add "CPUs" count to output of "scontrol show step".
 -- Add support for lua5.2
 -- scancel -b signals only the batch step, not any other step or any
    children of the shell script.
 -- MySQL - enforce NO_ENGINE_SUBSTITUTION
 -- Added CpuFreqDef configuration parameter in slurm.conf to specify the
    default CPU frequency and governor to be set at job end.
 -- Added support for job email triggers: TIME_LIMIT, TIME_LIMIT_90 (reached
    90% of time limit), TIME_LIMIT_80 (reached 80% of time limit), and
    TIME_LIMIT_50 (reached 50% of time limit). Applies to salloc, sbatch and
    srun commands.
 -- In slurm.conf add the parameter SrunPortRange=min-max. If this is configured
    then srun will use its dynamic ports only from the configured range.
 -- Make debug_flags 64 bit to handle more flags.
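
The time limit email triggers listed in this release can be sketched in
Python as follows. This is an illustration of the thresholds only, not how
Slurm evaluates them; the function name, its arguments and the use of minutes
are assumptions made for the example.

```python
# Trigger names and percentages are taken from the entry on email triggers.
TIME_LIMIT_TRIGGERS = [
    ("TIME_LIMIT_50", 50),
    ("TIME_LIMIT_80", 80),
    ("TIME_LIMIT_90", 90),
    ("TIME_LIMIT", 100),
]


def reached_triggers(elapsed_minutes, limit_minutes):
    """Return the names of all triggers whose threshold has been reached."""
    if limit_minutes <= 0:
        raise ValueError("limit_minutes must be positive")
    # Integer arithmetic avoids floating point rounding at the boundaries.
    return [
        name
        for name, percent in TIME_LIMIT_TRIGGERS
        if elapsed_minutes * 100 >= percent * limit_minutes
    ]


if __name__ == "__main__":
    print(reached_triggers(85, 100))
```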

* Changes in Slurm 14.11.0pre1
==============================
 -- Modify etc/cgroup.release_common.example to specify the full path to the
    scontrol command. Also find cgroup mount point by reading cgroup.conf file.
 -- Improve qsub wrapper support for passing environment variables.
 -- Modify sdiag to report Slurm RPC traffic by user, type, count and time
    consumed.
 -- In select plugins, stop triggering extra logging based upon the debug flag
    CPU_Bind and use SelectType instead.
 -- Added SchedulerParameters options of bf_yield_interval and bf_yield_sleep
    to control how frequently and for how long the backfill scheduler will
    relinquish its locks.
 -- To support larger numbers of jobs when the StateSaveDirectory is on a
    file system that supports a limited number of files in a directory, add a
    subdirectory called "hash.#" based upon the last digit of the job ID.
 -- More gracefully handle missing batch script file. Just kill the job and do
    not drain the compute node.
 -- Add support for allocation of GRES by model type for heterogeneous systems
    (e.g. request a Kepler GPU, a Tesla GPU, or a GPU of any type).
 -- Record and enable display of nodes anticipated to be used for pending jobs.
 -- Modify squeue --start option to print the nodes expected to be used for
    pending jobs (in addition to expected start time, etc.).
 -- Add association hash to the assoc_mgr.
 -- Better logic to handle resized jobs when the DBD is down.
 -- Introduce MemLimitEnforce yes|no in slurm.conf. If set to no, Slurm will
    not terminate jobs that exceed their requested memory.
 -- Add support for non-consumable generic resources for resources that are
    limited, but can be shared between jobs.
 -- Introduce 5 new Slurm errors in slurm_errno.h related to jobs to better
    report error conditions.
 -- Modify scontrol to print error message for each array task when updating
    the entire array.
 -- Added gres_drain and gres_used fields to node_info_t.
 -- Added PriorityParameters configuration parameter in slurm.conf.
 -- Introduce automatic job requeue policy based on exit value. See RequeueExit
    and RequeueExitHold descriptions in slurm.conf man page.
 -- Modify slurmd to cache launched job IDs for more responsive job suspend and
    gang scheduling.
 -- Permit job steps full control over cpu_bind options if specialized cores
    are included in the job allocation.
 -- Added ChosLoc configuration parameter to specify the pathname of the
    Chroot OS tool.
 -- Send SIGCONT/SIGTERM when a job is selected for preemption with GraceTime
    configured rather than waiting for GraceTime to be reached before notifying
    the job.
 -- Do not resume a job with specialized cores on a node running another job
    with specialized cores (only one can run at a time).
 -- Add specialized core count to job suspend/resume calls.
 -- task/affinity and task/cgroup - Correct specialized core task binding with
    user supplied invalid CPU mask or map.
 -- Add srun --cpu-freq options to set the CPU governor (OnDemand, Performance,
    PowerSave or UserSpace).
 -- Add support for a job step's CPU governor and/or frequency to be reset on
    suspend/resume (or gang scheduling). The default for an idle CPU will now
    be "ondemand" rather than "userspace" with the lowest frequency (to recover
    from hard slurmd failures and support gang scheduling).
 -- Added PriorityFlags option of Calculate_Running to continue recalculating
    the priority of running jobs.
 -- Replace round-robin front-end node selection with least-loaded algorithm.
 -- CRAY - Improve support of XC30 systems when running natively.
 -- Add new node configuration parameters CoreSpecCount, CPUSpecList and
    MemSpecLimit which support the reservation of resources for system use
    with Linux cgroup.
 -- Add child_forked() function to the slurm_acct_gather_profile plugin to
    close open files, leaving application with no extra open file descriptors.
 -- Cray/ALPS system - Enable backup controller to run outside of the Cray to
    accept new job submissions and most other operations on the pending jobs.
 -- Have sacct print job and task array IDs for job arrays.
 -- Smooth out fanout logic
 -- If <sys/prctl.h> is present, name major threads in slurmctld (for example,
    the backfill thread is slurmctld_bckfl and the RPC manager is
    slurmctld_rpcmg). The names can be seen, for example, using top -H.
 -- sview - Better job_array support.
 -- Provide more precise error message when job allocation can not be satisfied
    (e.g. memory, disk, cpu count, etc. rather than just "node configuration
    not available").
 -- Create a new DebugFlags option named TraceJobs in slurm.conf to print
    detailed information about jobs in slurmctld. The information includes
    job IDs, state and node count.
 -- When a job dependency can never be satisfied, do not cancel the job but
    keep it pending with reason WAIT_DEP_INVALID (DependencyNeverSatisfied).
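
The "hash.#" subdirectory naming described in this release, based upon the
last digit of the job ID, can be sketched in Python as follows. This is an
illustration only, not Slurm's code; the function name and the example
StateSaveDirectory path are assumptions.

```python
import posixpath


def job_hash_subdir(state_save_dir, job_id):
    """Return the "hash.#" subdirectory for a job, keyed on its last digit."""
    if job_id < 0:
        raise ValueError("job_id must be non-negative")
    last_digit = job_id % 10
    return posixpath.join(state_save_dir, f"hash.{last_digit}")


if __name__ == "__main__":
    # "/var/spool/slurmctld" is a hypothetical StateSaveDirectory.
    print(job_hash_subdir("/var/spool/slurmctld", 12345))
```

Keying on the last digit spreads job state across ten subdirectories, which
reduces the number of entries in any single directory.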

* Changes in Slurm 14.03.8
==========================