This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.

* Changes in Slurm 14.11.5
==========================
 -- Correct the squeue command taking into account that a node can
    have NULL name if it is not in DNS but still in slurm.conf.
 -- Fix slurmdbd regression which would cause a segfault when a node is set
    down with no reason.
* Changes in Slurm 14.11.4
==========================
 -- Make sure assoc_mgr locks are initialized correctly.
 -- Correct check of enforcement when filling in an association.
 -- Make sacctmgr print out classification correctly for clusters.
 -- Add array_task_str to the perlapi job info.
 -- Fix for slurmctld abort with GRES types configured and no CPU binding.
 -- Fix for GRES scheduling where count > 1 per topology type (or GRES types).
 -- Make CR_ONE_TASK_PER_CORE work correctly with task/affinity.
 -- job_submit/pbs - Fix possible deadlock.
 -- job_submit/lua - Add "alloc_node" to job information available.
 -- Fix memory leak in mysql accounting when usage rollup happens.
 -- If users specify ALL together with other variables using the
    --export sbatch/srun command line option, propagate the user's
    environment to the execution side.
 -- Fix job array scheduling anomaly that can stop scheduling of valid tasks.
 -- Fix perl api tests for libslurmdb to work correctly.
 -- Remove some misleading logs related to non-consumable GRES.
 -- Allow --ignore-pbs to take effect when read as an #SBATCH argument.
 -- Fix Slurmdb::clusters_get() in perl api so that it returns information.
 -- Stop TaskPluginParam=Cpusets from logging an error message about not being
    able to remove a cpuset dir which was already removed by the release_agent.
 -- Fix sorting by time left in squeue.
 -- Fix the file name substitution for job stderr when %A, %a, %j and %u
    are specified.
 -- Remove minor warning when compiling slurmstepd.
 -- Fix database resources so that new clusters can be added to them after
    they have initially been added.
 -- Use the slurm_getpwuid_r wrapper of getpwuid_r to handle possible
    interrupts.
 -- Correct the scontrol man page and command listing which node states can
    be set by the command.
 -- Stop sacct from printing non-existent stat information for
    Front End systems.
 -- Correct srun and acct_gather.conf man pages, mention Filesystem instead
    of Lustre.
 -- When a job using multiple partitions starts, send to slurmdbd only
    the partition in which the job runs.
 -- ALPS - Fix depth for MemoryAllocation in BASIL with CLE 5.2.3.
 -- Fix assoc_mgr hash to deal with users that don't have a uid yet when making
    reservations.
 -- When a job uses multiple partitions, set the environment variable
    SLURM_JOB_PARTITION to be the one in which the job started.
 -- Print spurious message about the absence of cgroup.conf at log level debug2
    instead of info.
 -- Enable CUDA v7.0+ use with a Slurm configuration of TaskPlugin=task/cgroup
    ConstrainDevices=yes (in cgroup.conf). With that configuration
    CUDA_VISIBLE_DEVICES will start at 0 rather than the device number.
 -- Fix job array logic that can cause slurmctld to abort.
 -- Report job "shared" field properly in scontrol, squeue, and sview.
 -- If a job is requeued because of RequeueExit or RequeueExitHold, send event
    REQUEUED to slurmdbd.
 -- Fix build if hwloc is in non-standard location.
 -- Fix slurmctld job recovery logic which could cause the last task in a job
    array to be lost.
 -- Fix slurmctld initialization problem which could cause requeue of the last
    task in a job array to fail if executed prior to the slurmctld loading
    the maximum size of a job array into a variable in the job_mgr.c module.
 -- Fix fatal in controller when deleting a user association of a user which
    had been previously removed from the system.
 -- MySQL - If a node state and reason are the same on a node state change
    don't insert a new row in the event table.
 -- Fix issue with "sreport cluster AccountUtilizationByUser" when using
    PrivateData=users.
 -- Fix perlapi tests for libslurm perl module.
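
As an illustration of the file name substitution entry above, the following is
a minimal Python sketch of how the patterns %A, %a, %j and %u could be
expanded. It is not Slurm's implementation: the helper name expand_pattern and
its parameters are hypothetical. It assumes the pattern meanings documented for
sbatch (%A job array master job ID, %a job array task index, %j job ID, %u user
name), and does not handle other specifiers or the "%%" escape.

```python
import re


def expand_pattern(pattern, job_id, user, array_job_id=None, array_task_id=None):
    """Expand %A, %a, %j and %u in a file name pattern (hypothetical helper)."""
    values = {
        "A": array_job_id,   # job array master job ID
        "a": array_task_id,  # job array task index
        "j": job_id,         # job ID
        "u": user,           # user name
    }

    def replace(match):
        value = values[match.group(1)]
        # Leave the specifier untouched when no value is known for it.
        return match.group(0) if value is None else str(value)

    return re.sub(r"%([Aaju])", replace, pattern)


if __name__ == "__main__":
    print(expand_pattern("job_%A_%a_%j_%u.err", job_id=1237, user="alice",
                         array_job_id=1234, array_task_id=3))
```

For a job that is not part of an array, this sketch leaves %A and %a in place
rather than guessing a value.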

* Changes in Slurm 14.11.3
==========================
 -- Prevent vestigial job record when canceling a pending job array record.
 -- Fixed squeue core dump.
 -- Fix job array hash table bug, could result in slurmctld infinite loop or
    invalid memory reference.
 -- In srun honor ntasks_per_node before looking at cpu count when the user
    doesn't request a number of tasks.
 -- Fix ghost job when submitting job after all jobids are exhausted.
 -- MySQL - Enhanced coordinator security checks.
 -- Fix for task/affinity if an admin configures a node for having threads
    but then sets CPUs to only represent the number of cores on the node.
 -- Make it so previous versions of salloc/srun work with newer versions
    of Slurm daemons.
 -- Avoid delay on commit for PMI rank 0 to improve performance with some
    MPI implementations.
 -- auth/munge - Correct logic to read old format AccountingStoragePass.
 -- Reset node "RESERVED" state as appropriate when deleting a maintenance
    reservation.
 -- Prevent a job manually suspended from being resumed by gang scheduler once
    free resources are available.
 -- Prevent invalid job array task ID value if a task is started using gang
    scheduling.
 -- Fixes for clean build on FreeBSD.
 -- Fix documentation bugs in slurm.conf.5. DenyAccount should be DenyAccounts.
 -- For backward compatibility with older versions of OMPI not compiled
    with --with-pmi restore the SLURM_STEP_RESV_PORTS in the job environment.
 -- Update the html documentation describing the integration with openmpi.
 -- Fix sacct when searching by nodelist.
 -- Fix cosmetic info statements when dealing with a job array task instead of
    a normal job.
 -- Fix segfault with job arrays.
 -- Correct the sbatch pbs parser to process -j.
 -- BGQ - Put print statement under a DebugFlag.  This was just an oversight.
 -- BLUEGENE - Remove check that would erroneously remove the CONFIGURING
    flag from a job while the job is waiting for a block to boot.
 -- Fix segfault in slurmstepd when job exceeded memory limit.
 -- Fix race condition that could start a job that is dependent upon a job array
    before all tasks of that job array complete.
 -- PMI2 race condition fix.
* Changes in Slurm 14.11.2
==========================
 -- Fix Centos5 compile errors.
 -- Fix issue with association hash not getting the correct index which
    could result in seg fault.
 -- Fix salloc/sbatch -B segfault.
 -- Avoid huge malloc if GRES configured with "Type" and huge "Count".
 -- Fix jobs from starting in overlapping reservations that won't finish before
    a "maint" reservation begins.
 -- When a node gets drained while in state mixed, display its status as
    draining in sinfo output.
 -- Allow priority/multifactor to work with sched/wiki(2) if all priorities
    have no weight.  This allows for association and QOS decay limits to work.
 -- Fix "squeue --start" to override SQUEUE_FORMAT env variable.
 -- Fix scancel to be able to cancel multiple jobs that are space delimited.
 -- Log Cray MPI job calling exit() without mpi_fini(), but do not treat it as
    a fatal error. This partially reverts logic added in version 14.03.9.
 -- sview - Fix displaying of suspended steps elapsed times.
 -- Increase number of messages that get cached before throwing them away
    when the DBD is down.
 -- Restore GRES functionality with select/linear plugin. It was broken in
    version 14.03.10.
 -- Fix bug with GRES having multiple types that can cause slurmctld abort.
 -- Fix squeue issue with not recognizing "localhost" in --nodelist option.
 -- Make sure the bitstrings for a partition's Allow/DenyQOS are up to date
    when running from cache.
 -- Add smap support for job arrays and larger job ID values.
 -- Fix possible race condition when attempting to use QOS on a system running
    accounting_storage/filetxt.
 -- Fix issue with accounting_storage/filetxt and job arrays not being printed
    correctly.
 -- In proctrack/linuxproc and proctrack/pgid, check the result of strtol()
    for error condition rather than errno, which might have a vestigial error
    code.
 -- Improve information recording for jobs deferred due to advanced
    reservation.
 -- Export eio_new_initial_obj to the plugins and initialize kvs_seq on
    mpi/pmi2 setup to support launching.
* Changes in Slurm 14.11.1
==========================
 -- Get libs correct when doing the xtree/xhash make check.
 -- Update xhash/tree make check to work correctly with current code.
 -- Remove the reference 'experimental' for the jobacct_gather/cgroup
    plugin.
 -- Add QOS manipulation examples to the qos.html documentation page.
 -- If 'squeue -w node_name' specifies an unknown host name print
    an error message and return 1.
 -- Fix race condition in job_submit plugin logic that could cause slurmctld to
    deadlock.
 -- Job wait reason of "ReqNodeNotAvail" expanded to identify unavailable nodes
    (e.g. "ReqNodeNotAvail(Unavailable:tux[3-6])").
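
As an illustration of the expanded "ReqNodeNotAvail" wait reason above, the
following is a minimal Python sketch of how a script might pull the node
expression out of the reason string and expand a simple range. The helper names
are hypothetical, and the reason format is assumed to match the single example
given above. Only one "prefix[lo-hi]" range is handled; full hostlist
expressions are better expanded with "scontrol show hostnames".

```python
import re


def parse_unavailable_nodes(reason):
    """Return the node expression from a ReqNodeNotAvail reason, or None."""
    match = re.fullmatch(r"ReqNodeNotAvail\(Unavailable:(.+)\)", reason)
    return match.group(1) if match else None


def expand_simple_range(expr):
    """Expand one 'prefix[lo-hi]' expression; other forms are returned as-is."""
    match = re.fullmatch(r"(\w+)\[(\d+)-(\d+)\]", expr)
    if not match:
        return [expr]
    prefix, lo, hi = match.group(1), int(match.group(2)), int(match.group(3))
    # Zero-padded ranges (e.g. tux[03-06]) are not preserved by this sketch.
    return [f"{prefix}{i}" for i in range(lo, hi + 1)]


if __name__ == "__main__":
    nodes = parse_unavailable_nodes("ReqNodeNotAvail(Unavailable:tux[3-6])")
    print(expand_simple_range(nodes))
```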
* Changes in Slurm 14.11.0
==========================
 -- ALPS - Fix issue with core_spec warning.
 -- Allow multiple partitions to be specified in sinfo -p.
 -- Install the service files in /usr/lib/systemd/system.
 -- MYSQL - Add id_array_job and id_resv keys to $CLUSTER_job_table.  THIS
    COULD TAKE A WHILE TO CREATE THE KEYS SO BE PATIENT.
 -- CRAY - Resize bitmaps if, on a restart, we find we have more blades
    than before.
 -- Add new eio API function for removing unused connections.
 -- ALPS - Fix issue where batch allocations weren't correctly confirmed or
    released.
 -- Define DEFAULT_MAX_TASKS_PER_NODE based on MAX_TASKS_PER_NODE from
    slurm.h as per documentation.
 -- Update the FAQ about relocating slurmctld.
 -- In the memory cgroup enable memory.use_hierarchy in the cgroup root.
 -- Export eio.c functions for use by MPI/PMI2.
 -- Add SLURM_CLUSTER_NAME to job environment.
* Changes in Slurm 14.11.0rc3
=============================
 -- Allow envs to override autotools binaries in autogen.sh
 -- Added system services files.
 -- If a job pends with DependencyNeverSatisfied, keep it pending even after
    the job which it was depending upon was cleaned.
 -- Let operators (in addition to user root and SlurmUser) see job script for
    other users' jobs.
 -- Perl API modified to return node state of MIXED rather than ALLOCATED if