NEWS

This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and admins.

* Changes in Slurm 14.11.0pre1
==============================
 -- Modify etc/cgroup.release_common.example to specify the full path to the
    scontrol command. Also find cgroup mount point by reading cgroup.conf file.
 -- Improve qsub wrapper support for passing environment variables.
 -- Modify sdiag to report Slurm RPC traffic by user, type, count and time
    consumed.
 -- In select plugins, stop triggering extra logging based upon the debug flag
    CPU_Bind and use SelectType instead.
 -- Added SchedulerParameters options of bf_yield_interval and bf_yield_sleep
    to control how frequently and for how long the backfill scheduler will
    relinquish its locks.
 -- To support larger numbers of jobs when the StateSaveDirectory is on a
    file system that supports a limited number of files in a directory, add a
    subdirectory called "hash.#" based upon the last digit of the job ID.
 -- More gracefully handle missing batch script file. Just kill the job and do
    not drain the compute node.
 -- Add support for allocation of GRES by model type for heterogeneous systems
    (e.g. request a Kepler GPU, a Tesla GPU, or a GPU of any type).
 -- Record and enable display of nodes anticipated to be used for pending jobs.
 -- Modify squeue --start option to print the nodes expected to be used for
    pending jobs (in addition to expected start time, etc.).
 -- Add association hash to the assoc_mgr.
 -- Better logic to handle resized jobs when the DBD is down.
 -- Introduce MemLimitEnforce yes|no in slurm.conf. If set to "no", Slurm will
    not terminate jobs that exceed their requested memory.
 -- Add support for non-consumable generic resources for resources that are
    limited, but can be shared between jobs.
 -- Introduce 5 new Slurm errors in slurm_errno.h related to jobs to better
    report error conditions.
 -- Modify scontrol to print error message for each array task when updating
    the entire array.
 -- Added gres_drain and gres_used fields to node_info_t.
 -- Added PriorityParameters configuration parameter in slurm.conf.
 -- Introduce automatic job requeue policy based on exit value. See RequeueExit
    and RequeueExitHold descriptions in slurm.conf man page.
 -- Modify slurmd to cache launched job IDs for more responsive job suspend and
    gang scheduling.
 -- Permit job steps full control over cpu_bind options if specialized cores
    are included in the job allocation.
 -- Added ChosLoc configuration parameter to specify the pathname of the
    Chroot OS tool.
 -- Send SIGCONT/SIGTERM when a job is selected for preemption with GraceTime
    configured rather than waiting for GraceTime to be reached before notifying
    the job.
 -- Do not resume a job with specialized cores on a node running another job
    with specialized cores (only one can run at a time).
 -- Add specialized core count to job suspend/resume calls.
 -- task/cgroup - Correct specialized core task binding with user supplied
    invalid CPU mask or map.
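
The "hash.#" entry above spreads job state across ten subdirectories keyed by
the last digit of the job ID. The following is a minimal Python sketch of that
mapping, not the slurmctld implementation: the function name job_state_dir and
the per-job directory name "job.<id>" are assumptions made for illustration.

```python
def job_state_dir(state_save_dir: str, job_id: int) -> str:
    """Return the hashed path that would hold state for job_id (illustrative)."""
    # "hash.#" uses the last decimal digit of the job ID, as the entry describes.
    bucket = job_id % 10
    # The "job.<id>" leaf name is an assumption, not taken from the entry.
    return f"{state_save_dir}/hash.{bucket}/job.{job_id}"


if __name__ == "__main__":
    # The directory below is a placeholder for StateSaveDirectory.
    print(job_state_dir("/var/spool/slurm", 1234))
```

With ten buckets, each directory holds roughly one tenth of the job entries,
which is what helps on file systems that cap the number of files per directory.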

* Changes in Slurm 14.03.4
==========================
 -- Fix issue where QOS is not being enforced but a partition either allows or
    denies them.
 -- CRAY - Make switch/cray default when running on a Cray natively.
 -- CRAY - Make job_container/cncu default when running on a Cray natively.
 -- Disable job time limit change if its preemption is in progress.
 -- Correct logic to properly enforce job preemption GraceTime.
 -- Fix sinfo -R to print each down/drained node once, rather than once per
    partition.
 -- If a job has a non-responding node, retry job step create rather than
    returning with DOWN node error.
 -- Support SLURM_CONF path which does not have "slurm.conf" as the file name.
 -- Fix issue where batch cpuset wasn't looked at correctly in
    jobacct_gather/cgroup.
 -- Correct squeue's job node and CPU counts for requeued jobs.
 -- Correct SelectTypeParameters=CR_LLN with job selection of specific nodes.
 -- Only if ALL of their partitions are hidden will a job be hidden by default.
 -- Run EpilogSlurmctld for a job that is killed during slurmctld
    reconfiguration.

* Changes in Slurm 14.03.3-2
============================
 -- BGQ - Fix issue with uninitialized variable.

* Changes in Slurm 14.03.3
==========================
 -- Correction to default batch output file name. Version 14.03.2 was using
    "slurm_<jobid>_4294967294.out" due to error in job array logic.
 -- In slurm.spec file, replace "Requires cray-MySQL-devel-enterprise" with
    "Requires mysql-devel".

* Changes in Slurm 14.03.2
==========================
 -- Fix race condition if PrologFlags=Alloc,NoHold is used.
 -- Cray - Make NPC only limit running other NPC jobs on shared blades instead
    of limiting non-NPC jobs.
 -- Fix for sbatch #PBS -m (mail) option parsing.
 -- Fix job dependency bug. Jobs dependent upon multiple other jobs may start
    prematurely.
 -- Set "Reason" field for all elements of a job array on short-circuited
    scheduling for job arrays.
 -- Allow -D option of salloc/srun/sbatch to specify relative path.
 -- Added SchedulerParameter of batch_sched_delay to permit many batch jobs
    to be submitted between each scheduling attempt to reduce overhead of
    scheduling logic.
 -- Added job reason of "SchedTimeout" if the scheduler was not able to reach
    the job to attempt scheduling it.
 -- Add job's exit state and exit code to email message.
 -- scontrol hold/release accepts job name option (in addition to job ID).
 -- Better handle attempts to cancel a step that hasn't started yet.
 -- Handle Max/GrpCPU limits better
 -- Add --priority option to salloc, sbatch and srun commands.
 -- Honor partition priorities over job priorities.
 -- Fix sacct -c when using jobcomp/filetxt to read newer variables
 -- Fix segfault of sacct -c if spaces are in the variables.
 -- Release held job only with "scontrol release <jobid>" and not by resetting
    the job's priority. This is needed to support job arrays better.
 -- Correct squeue command not to merge jobs with state pending and completing
    together.
 -- Fix issue where user is requesting --acctg-freq=0 and no memory limits.
 -- Fix issue with GrpCPURunMins if a job's timelimit is altered while the job
    is running.
 -- Temporary fix for handling our typemap for the perl api with newer perl.
 -- Fix controller seg fault caused by allowgroup with a bad group.
 -- Handle node ranges better when dealing with accounting max node limits.
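
The "Honor partition priorities over job priorities" entry above amounts to a
two-level sort: partition priority decides first, and job priority only breaks
ties within a partition priority. The following Python sketch illustrates that
ordering under stated assumptions; the PendingJob type, its field names and
the scheduling_order function are hypothetical and are not Slurm data
structures.

```python
from typing import List, NamedTuple


class PendingJob(NamedTuple):
    # Hypothetical record; field names are for illustration only.
    job_id: int
    partition_priority: int
    job_priority: int


def scheduling_order(jobs: List[PendingJob]) -> List[PendingJob]:
    """Sort so that partition priority dominates job priority."""
    # Negate both keys so that larger priorities come first.
    return sorted(jobs, key=lambda j: (-j.partition_priority, -j.job_priority))


if __name__ == "__main__":
    queue = [
        PendingJob(1, partition_priority=1, job_priority=1000),
        PendingJob(2, partition_priority=10, job_priority=5),
        PendingJob(3, partition_priority=10, job_priority=50),
    ]
    for job in scheduling_order(queue):
        print(job)
```

In this sketch job 1 is considered last despite having the largest job
priority, because its partition has the lowest priority.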

* Changes in Slurm 14.03.1-2
============================
 -- Update configure to set correct version without having to run autogen.sh

* Changes in Slurm 14.03.1
==========================
 -- Add support for job std_in, std_out and std_err fields in Perl API.
 -- Add "Scheduling Configuration Guide" web page.
 -- BGQ - fix check for jobinfo when it is NULL
 -- Do not check cleaning on "pending" steps.
 -- task/cgroup plugin - Fix for building on older hwloc (v1.0.2).
 -- In the PMI implementation by default don't check for duplicate keys.
    Set the SLURM_PMI_KVS_DUP_KEYS if you want the code to check for
    duplicate keys.
 -- Add job submission time to squeue.
 -- Permit user root to propagate resource limits higher than the hard limit
    slurmd has on that compute node (i.e. raise both current and maximum
    limits).
 -- Fix issue with license used count when doing an scontrol reconfig.
 -- Fix the PMI iterator to not report duplicated keys.
 -- Fix issue with sinfo when -o is used without the %P option.
 -- Rather than immediately invoking an execution of the scheduling logic on
    every event type that can enable the execution of a new job, queue its
    execution. This permits faster execution of some operations, such as
    modifying large counts of jobs, by executing the scheduling logic less
    frequently, but still in a timely fashion.
 -- If an environment variable is longer than MAX_ENV_STRLEN, don't set it in
    the job environment; otherwise the exec() fails.
 -- Optimize scontrol hold/release logic for job arrays.
 -- Modify srun to report an exit code of zero rather than nine if some tasks
    exit with a return code of zero and others are killed with SIGKILL. Only an
    exit code of zero did this.
 -- Fix a typo in scontrol man page.
 -- Avoid slurmctld crash getting job info if detail_ptr is NULL.
 -- Fix sacctmgr add user where both defaultaccount and accounts are specified.
 -- Added SchedulerParameters option of max_sched_time to limit how long the
    main scheduling loop can execute for.
 -- Added SchedulerParameters option of sched_interval to control how frequently
    the main scheduling loop will execute.
 -- Move start time of main scheduling loop timeout after locks are acquired.
 -- Add squeue job format option of "%y" to print a job's nice value.
 -- Update scontrol update jobID logic to operate on entire job arrays.
 -- Fix PrologFlags=Alloc to run the prolog on each of the nodes in the
    allocation instead of just the first.
 -- Fix race condition if a step is starting while the slurmd is being
    restarted.
 -- Make sure a job's prolog has run before starting a step.
 -- BGQ - Fix invalid memory read when using DefaultConnType in the
    bluegene.conf
 -- Make sure we send node state to the DBD on clean start of controller.
 -- Fix some sinfo and squeue sorting anomalies due to differences in data
    types.
 -- Only send message back to slurmctld when PrologFlags=Alloc is used on a
    Cray/ALPS system, otherwise use the slurmd to wait on the prolog to gate
    the start of the step.
 -- Remove need to check PrologFlags=Alloc in slurmd since we can tell if prolog
    has run yet or not.
 -- Fix squeue to use a correct macro to check job state.
 -- BGQ - Fix incorrect logic issues if MaxBlockInError=0 in the bluegene.conf.
 -- priority/basic - Ensure job priorities continue to decrease when jobs are
    submitted with the --nice option.
 -- Make PrologFlags=Alloc work on batch scripts.
 -- Make PrologFlags=NoHold (automatically sets PrologFlags=Alloc) not hold in
    salloc/srun, instead wait in the slurmd when a step hits a node and the
    prolog is still running.
 -- Added --cpu-freq=highm1 (high minus one) option.
 -- Expand StdIn/Out/Err string length output by "scontrol show job" from 128
    to 1024 bytes.
 -- squeue %F format will now print the job ID for non-array jobs.
 -- Use quicksort for all priority based job sorting, which improves performance
    significantly with large job counts.
 -- If a job has already been released from a held state ignore successive
    release requests.
 -- Fix srun/salloc/sbatch man pages for the --no-kill option.
 -- Add squeue -L/--licenses option to filter jobs by license names.
 -- Handle abort job on node on front end systems without core dumping.
 -- Fix dependency support for job arrays.
 -- When updating jobs verify the update request is not identical to
    the current settings.
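
The entry above about queueing the scheduling logic, rather than invoking it
on every event, describes a request-coalescing pattern. The following is a
minimal Python sketch of that idea, assuming a single-threaded caller; the
class DeferredScheduler and its method names are hypothetical and do not
correspond to slurmctld functions.

```python
from typing import Callable


class DeferredScheduler:
    """Coalesce many scheduling requests into a single scheduler pass."""

    def __init__(self, schedule_fn: Callable[[], None]) -> None:
        self._schedule_fn = schedule_fn  # the expensive scheduling pass
        self._pending = False            # set once any event asks for a pass
        self.passes = 0                  # number of passes actually executed

    def request(self) -> None:
        """Record that a pass is wanted; cheap, safe to call per event."""
        self._pending = True

    def run_pending(self) -> bool:
        """Run at most one pass if any request arrived since the last one."""
        if not self._pending:
            return False
        self._pending = False
        self._schedule_fn()
        self.passes += 1
        return True


if __name__ == "__main__":
    sched = DeferredScheduler(lambda: None)
    for _ in range(1000):      # e.g. modifying a large count of jobs
        sched.request()
    sched.run_pending()        # one pass serves all queued requests
    print(sched.passes)
```

A real controller would call something like run_pending from a periodic loop,
so the delay between an event and the resulting pass stays bounded.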