NEWS

This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.

* Changes in Slurm 15.08.0pre3
==============================
 -- CRAY - addition of acct_gather_energy/cray plugin.
 -- Add job credential to "Run Prolog" RPC used with a configuration of
    PrologFlags=alloc. This allows the Prolog to be passed identification of
    GPUs allocated to the job.
 -- Add SLURM_JOB_CONSTRAINTS to environment variables available to the Prolog.
 -- Added "--mail-type=stage_out" option to job submission commands to notify
    user when burst buffer stage out is complete.

* Changes in Slurm 15.08.0pre2
==============================
 -- Add the environment variables SLURM_JOB_ACCOUNT, SLURM_JOB_QOS
    and SLURM_JOB_RESERVATION to batch and srun jobs.
 -- Add sview burst buffer display.
 -- Properly enforce partition Shared=YES option. Previously oversubscribing
    resources required gang scheduling to be configured.
 -- Enable per-partition gang scheduling resource resolution (e.g. the partition
    can have SelectTypeParameters=CR_CORE, while the global value is CR_SOCKET).
 -- Make it so a newer version of a slurmstepd can talk to an older srun.
    allocation. Nodes could have been added while waiting for an allocation.
 -- Expanded --cpu-freq parameters to include min-max:governor specifications.
    --cpu-freq now supported on salloc and sbatch.
 -- Add support for optimized job allocations with respect to SGI Hypercube
    topology.
    NOTE: Only supported with select/linear plugin.
    NOTE: The program contribs/sgi/netloc_to_topology can be used to build
    Slurm's topology.conf file.
 -- Remove 64k validation of incoming RPC nodelist size. Validated at 64MB
    when unpacking.
 -- In slurmstepd() add the user's primary group if it is not part of the
    groups sent from the client.
 -- Added BurstBuffer field to advanced reservations.
 -- For advanced reservation, replace flag "License_only" with flag "Any_Nodes".
    It can be used to indicate that an advanced reservation's resources
    (licenses and/or burst buffers) can be used with any compute nodes.
 -- Allow users to specify the srun --resv-ports as 0 in which case no ports
    will be reserved. The default behaviour is to allocate one port per task.
 -- Interpret a partition configuration of "Nodes=ALL" in slurm.conf as
    including all nodes defined in the cluster.
 -- Added new configuration parameters PowerParameters and PowerPlugin.
 -- Added power management plugin infrastructure.
 -- If a job has already exceeded one of its QOS/Accounting limits, do not
    return an error if the user modifies job settings unrelated to QOS.
 -- Added DebugFlags value of "Power".
 -- When caching user ids of AllowGroups use both getgrnam_r() and getgrent_r()
    then remove any duplicate entries.
 -- Remove rpm dependency between slurm-pam and slurm-devel.
 -- Remove support for the XCPU (cluster management) package.
 -- Add Slurmdb::jobs_get() interface to perl api.
 -- Performance improvement when sending data from srun to stepds when
    processing fencing.
 -- Add the ability to specify an arbitrary field separator when running
    sacct -p or sacct -P. The command line option is --separator.
 -- Introduce slurm.conf parameter to use Proportional Set Size (PSS) instead
    of RSS to determine the memory footprint of a job.
    Add a slurm.conf option not to kill jobs that are over their memory limit.
 -- Add job submission command options: --sicp (available for inter-cluster
    dependencies) and --power (specify power management options) to salloc,
    sbatch, and srun commands.
 -- Add DebugFlags option of SICP (inter-cluster option logging).
 -- In order to support inter-cluster job dependencies, the MaxJobID
    configuration parameter default value has been reduced from 4,294,901,760
    to 2,147,418,112 and its maximum value is now 2,147,463,647.
    ANY JOBS WITH A JOB ID ABOVE 2,147,463,647 WILL BE PURGED WHEN SLURM IS
    UPGRADED FROM AN OLDER VERSION!
 -- Add QOS name to the output of a partition in squeue/scontrol/sview/smap.

* Changes in Slurm 15.08.0pre1
==============================
 -- Add sbcast support for file transfer to resources allocated to a job step
    rather than a job allocation.
 -- Change structures with association in them to assoc to save space.
 -- Add support for job dependencies joined with the OR operator (e.g.
    "--depend=afterok:123?afternotok:124").
 -- Add "--bb" (burst buffer specification) option to salloc, sbatch, and srun.
 -- Added configuration parameters BurstBufferParameters and BurstBufferType.
 -- Added burst_buffer plugin infrastructure (needs many more functions).
 -- Make it so when the fanout logic comes across a node that is down we abandon
    the tree to avoid worst case scenarios when the entire branch is down and
    we have to try each serially.
 -- Add better error reporting of invalid partitions at submission time.
 -- Move will-run test for multiple clusters from the sbatch code into the API
    so that it can be used with DRMAA.
 -- If a non-exclusive allocation requests --hint=nomultithread on a
    CR_CORE/SOCKET system lay out tasks correctly.
 -- Avoid including unused CPUs in a job's allocation when cores or sockets are
    allocated.
 -- Added new job state of STOPPED indicating that a job's processes have been
    stopped with a SIGSTOP (using scancel or sview), but the job retains its
    allocated CPUs. Job state returns to RUNNING when SIGCONT is sent (also
    using scancel or sview).
 -- Added EioTimeout parameter to slurm.conf. It is the number of seconds srun
    waits for slurmstepd to close the TCP/IP connection used to relay data
    between the user application and srun when the user application terminates.
 -- Remove slurmctld/dynalloc plugin as the work was never completed, so it is
    not worth the effort of continued support at this time.
 -- Remove DynAllocPort configuration parameter.
 -- Add advance reservation flag of "replace" that causes allocated resources
    to be replaced with idle resources. This keeps the pool of available
    resources at a constant size (to the extent possible).
 -- Added SchedulerParameters option of "bf_busy_nodes". When selecting
    resources for pending jobs to reserve for future execution (i.e. the job
    can not be started immediately), then preferentially select nodes that are
    in use. This will tend to leave currently idle resources available for
    backfilling longer running jobs, but may result in allocations having less
    than optimal network topology. This option is currently only supported by
    the select/cons_res plugin.
 -- Permit "SuspendTime=NONE" as slurm.conf value rather than only a numeric
    value to match "scontrol show config" output.
 -- Add the 'scontrol show cache' command which displays the associations
    in slurmctld.
 -- Test more frequently for node boot completion before starting a job.
    Provides better responsiveness.
 -- Fix PMI2 singleton initialization.
 -- Permit PreemptType=qos and PreemptMode=suspend,gang to be used together.
    A high-priority QOS job will now oversubscribe resources and gang schedule,
    but only if there are insufficient resources for the job to be started
    without preemption. NOTE: With PreemptType=qos, the partition's
    Shared=FORCE:# configuration option will permit one job more per resource
    to be run than specified, but only if started by preemption.
 -- Remove the CR_ALLOCATE_FULL_SOCKET configuration option.  It is now the
    default.
 -- Fix a race condition in PMI2 when fencing counters can be out of sync.
 -- Increase the MAX_PACK_MEM_LEN define to avoid PMI2 failure when fencing
    with a large number of ranks.
 -- Add QOS option to a partition.  This will allow a partition to have
    all the limits a QOS has.  If a limit is set in both the partition QOS and
    the job's QOS, the partition QOS will override the job's QOS unless the
    job's QOS has the PartitionQOS flag set.
 -- The task_dist_states variable has been split into "flags" and "base"
    components. Added SLURM_DIST_PACK_NODES and SLURM_DIST_NO_PACK_NODES values
    to give users greater control over task distribution. The srun --dist
    option has been modified to accept "Pack" and "NoPack" values. These can
    be used to override the CR_PACK_NODE configuration option.

* Changes in Slurm 14.11.5
==========================
 -- Correct the squeue command taking into account that a node can
    have a NULL name if it is not in DNS but still in slurm.conf.
 -- Fix slurmdbd regression which would cause a segfault when a node is set
    down with no reason.
 -- BGQ - Fix issue with job arrays not being handled correctly
    in the runjob_mux plugin.
 -- Print FAIR_TREE, if configured, in "scontrol show config" output for
    PriorityFlags.
 -- Add SLURM_JOB_GPUS environment variable to those available in the Prolog.
 -- Load lua-5.2 library if using lua5.2 for lua job submit plugin.
 -- GRES logic: Prevent bad node_offset due to not preserving no_consume flag.

* Changes in Slurm 14.11.4
==========================
 -- Make sure assoc_mgr locks are initialized correctly.
 -- Correct check of enforcement when filling in an association.
 -- Make sacctmgr print out classification correctly for clusters.
 -- Add array_task_str to the perlapi job info.
 -- Fix for slurmctld abort with GRES types configured and no CPU binding.
 -- Fix for GRES scheduling where count > 1 per topology type (or GRES types).
 -- Make CR_ONE_TASK_PER_CORE work correctly with task/affinity.
 -- job_submit/pbs - Fix possible deadlock.
 -- job_submit/lua - Add "alloc_node" to job information available.
 -- Fix memory leak in mysql accounting when usage rollup happens.
 -- If users specify ALL together with other variables using the
    --export sbatch/srun command line option, propagate the user's
    environment to the execution side.
 -- Fix job array scheduling anomaly that can stop scheduling of valid tasks.
 -- Fix perl api tests for libslurmdb to work correctly.
 -- Remove some misleading logs related to non-consumable GRES.
 -- Allow --ignore-pbs to take effect when read as an #SBATCH argument.
 -- Fix Slurmdb::clusters_get() in perl api so that it returns information.
 -- Fix TaskPluginParam=Cpusets from logging error message about not being able
    to remove cpuset dir which was already removed by the release_agent.
 -- Fix sorting by time left in squeue.
 -- Fix the file name substitution for job stderr when %A, %a, %j and %u
    are specified.
 -- Remove minor warning when compiling slurmstepd.
 -- Fix database resources so new clusters can be added to them after the
    resources have initially been added.
 -- Use the slurm_getpwuid_r wrapper of getpwuid_r to handle possible
    interrupts.
 -- Correct the scontrol man page and command listing which node states can
    be set by the command.
 -- Stop sacct from printing non-existent stat information for
    Front End systems.
 -- Correct srun and acct_gather.conf man pages, mention Filesystem instead
    of Lustre.
 -- When a job using multiple partitions starts, send to slurmdbd only
    the partition in which the job runs.
 -- ALPS - Fix depth for MemoryAllocation in BASIL with CLE 5.2.3.
 -- Fix assoc_mgr hash to deal with users that don't have a uid yet when making
    reservations.
 -- When a job uses multiple partitions, set the environment variable
    SLURM_JOB_PARTITION to be the one in which the job started.
 -- Print spurious message about the absence of cgroup.conf at log level debug2
    instead of info.
 -- Enable CUDA v7.0+ use with a Slurm configuration of TaskPlugin=task/cgroup
    ConstrainDevices=yes (in cgroup.conf). With that configuration
    CUDA_VISIBLE_DEVICES will start at 0 rather than the device number.