This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 2.5.0.pre1
=============================
 -- Add new output to "scontrol show configuration" of LicensesUsed. Output is
    "name:used/total"
* Changes in SLURM 2.4.0.rc2
=============================
 -- Cray - Improve support for zero compute node resource allocations.
    The partition used can now be configured with no nodes.
 -- BGQ - make it so srun -i<taskid> works correctly.
 -- Fix parse_uint32/16 to complain if a non-digit is given.
 -- Add SUBMITHOST to job state passed to Moab via sched/wiki2. Patch by Jon
    Bringhurst (LANL).
 -- BGQ - Fix issue when running with AllowSubBlockAllocations=Yes without
    compiling with --enable-debug.

* Changes in SLURM 2.4.0.rc1
=============================
 -- Improve task binding logic by making fuller use of HWLOC library,
    especially with respect to Opteron 6000 series processors. Work contributed
    by Komoto Masahiro.
 -- Add new configuration parameter PriorityFlags, based upon work by
    Carles Fenoy (Barcelona Supercomputer Center).
 -- Modify the step completion RPC between slurmd and slurmstepd in order to
    eliminate a possible deadlock. Based on work by Matthieu Hautreux, CEA.
 -- Change the owner of slurmctld and slurmdbd log files to the appropriate
    user. Without this change the files will be created by and owned by the
    user starting the daemons (likely user root).
 -- Reorganize the slurmstepd logic in order to better support NFS and
    Kerberos credentials via the AUKS plugin. Work by Matthieu Hautreux, CEA.
 -- Fix bug in allocating GRES that are associated with specific CPUs. In some
    cases the code allocated the first available GRES to a job instead of
    allocating GRES accessible to the specific CPUs allocated to the job.
 -- spank: Add callbacks in slurmd: slurm_spank_slurmd_{init,exit}
    and job epilog/prolog: slurm_spank_job_{prolog,epilog}
 -- spank: Add spank_option_getopt() function to the API.
 -- Change resolution of switch wait time from minutes to seconds.
 -- Added CrpCPUMins to the output of sshare -l for those using hard limit
    accounting.  Work contributed by Mark Nelson.
 -- Added mpi/pmi2 plugin for complete support of pmi2 including acquiring
    additional resources for newly launched tasks. Contributed by Hongjia Cao,
    NUDT.
 -- BGQ - fixed issue where, if a user asked for a specific node count and
    more tasks than would fit without overcommit, the request would be allowed
    on more nodes than requested.
 -- Add support for new SchedulerParameters of bf_max_job_user, maximum number
    of jobs to attempt backfilling per user. Work by Bjørn-Helge Mevik,
    University of Oslo.
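    A minimal slurm.conf sketch (the limit of 20 is an arbitrary example
    value):
        SchedulerType=sched/backfill
        SchedulerParameters=bf_max_job_user=20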
 -- BLUEGENE - fixed issue where the MaxNodes limit on a partition only
    limited jobs larger than a midplane.
 -- Added cpu_run_min to the output of sshare --long.  Work contributed by
    Mark Nelson.
 -- BGQ - allow regular users to resolve Rack-Midplane to AXYZ coords.
 -- Add sinfo output format option of "%R" for partition name without "*"
    appended for default partition.
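    For example (partition name is illustrative), where "%P" would print
    "debug*" for the default partition, "%R" prints it without the asterisk:
        $ sinfo -h -o "%R %D"
        debug 16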
 -- Cray - Add support for zero compute node resource allocation to run batch
    script on front-end node with no ALPS reservation. Useful for pre- or post-
    processing.
 -- Support for cyclic distribution of cpus in task/cgroup plugin from Martin
    Perry, Bull.
 -- GrpMEM limit for QOSes and associations added. Patch from Bjørn-Helge
    Mevik, University of Oslo.
 -- Various performance improvements for up to 500% higher throughput depending
    upon configuration. Work supported by the Oak Ridge National Laboratory
    Extreme Scale Systems Center.
 -- Added jobacct_gather/cgroup plugin.  It is not advised to use this in
    production as it isn't currently complete and doesn't provide an equivalent
    substitution for jobacct_gather/linux yet. Work by Martin Perry, Bull.

* Changes in SLURM 2.4.0.pre4
=============================
 -- Add logic to cache GPU file information (bitmap index mapping to device
    file number) in the slurmd daemon and transfer that information to the
    slurmstepd whenever a job step is initiated. This is needed to set the
    appropriate CUDA_VISIBLE_DEVICES environment variable value when the
    devices are not in strict numeric order (e.g. some GPUs are skipped).
    Based upon work by Nicolas Bigaouette.
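    As an illustration (device paths are hypothetical), a node whose usable
    GPUs are /dev/nvidia0 and /dev/nvidia2 might list them in gres.conf as:
        Name=gpu File=/dev/nvidia0
        Name=gpu File=/dev/nvidia2
    The bitmap-index-to-device-file mapping for the skipped device (index 1
    -> /dev/nvidia2) is what is now cached in slurmd and passed to slurmstepd
    so that CUDA_VISIBLE_DEVICES can be set correctly.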
 -- BGQ - Remove ability to make a sub-block with a geometry having one or
    more of its dimensions of length 3.  There is a limitation in the IBM I/O
    subsystem that is problematic with multiple sub-blocks with a dimension
    of length 3, so we disallow creating them.  This means if you ask the
    system for an allocation of 12 c-nodes you will be given 16.  If this is
    ever fixed in BGQ this patch can be removed.
 -- BLUEGENE - Better handling of blocks that go into an error state or
    deallocate while jobs are running on them.
 -- BGQ - fix for handling a mix of steps running at the same time, some of
    which are full allocation jobs and others that are smaller.
 -- BGQ - fix for core dump after running multiple sub-block jobs on static
    blocks.
 -- BGQ - fixed sync issue so that if a job finishes in SLURM but not in mmcs
    until long after the SLURM job has been flushed from the system, the
    block no longer needs to be rebooted to sync the system.
 -- BGQ - In scontrol/sview node counts are now displayed as
    CnodeCount/CnodeErrCount to point out that there are cnodes in an error
    state on the block.  Draining the block and having it reboot when all
    jobs are gone will clear up the cnodes in Software Failure.
 -- Change default SchedulerParameters max_switch_wait field value from 60 to
    300 seconds.
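    A slurm.conf sketch overriding the new default (the value is in seconds
    and 600 is an arbitrary example):
        SchedulerParameters=max_switch_wait=600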
 -- BGQ - catch errors from the kill option of the runjob client.
 -- BLUEGENE - make it so the epilog runs until slurmctld tells it the job is
    gone.  Previously it had a time limit, which has proven not to be the
    right approach.
 -- FRONTEND - fix issue where, if a compute node was in a down state and an
    admin updated the node to idle/resume, the compute node will now go
    instantly to idle instead of idle* (which means not responding).
 -- Fix regression in 2.4.0.pre3 where number of submitted jobs limit wasn't
    being honored for QOS.
 -- Cray - Enable logging of BASIL communications with environment variables.
    Set XML_LOG to enable logging. Set XML_LOG_LOC to specify path to log file
    or "SLURM" to write to SlurmctldLogFile or unset for "slurm_basil_xml.log".
    Patch from Steve Trofinoff, CSCS.
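    A sketch of enabling the logging before starting the daemon (the log
    path is illustrative):
        export XML_LOG=1
        export XML_LOG_LOC=/var/log/slurm/basil_xml.log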
 -- FRONTEND - if a front end unexpectedly reboots kill all jobs but don't
    mark front end node down.
 -- FRONTEND - don't down a front end node if there is an epilog error.
 -- BLUEGENE - if a job has an epilog error don't down the midplane it was
    running on.
 -- BGQ - added new DebugFlag (NoRealTime) for only printing debug from
    state change while the realtime server is running.
 -- Fix multi-cluster mode in sview when starting on a non-bluegene cluster
    and switching to a bluegene cluster.
 -- BLUEGENE - ability to show Rack Midplane name of midplanes in sview and
    scontrol.
* Changes in SLURM 2.4.0.pre3
=============================
 -- Let a job be submitted even if it exceeds a QOS limit. Job will be left
    in a pending state until the QOS limit or job parameters change. Patch by
    Phil Eckert, LLNL.
 -- Add sacct support for the option "--name". Work by Yuri D'Elia, Center for
    Biomedicine, EURAC Research, Italy.
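    For example (the job name is illustrative):
        $ sacct --name=nightly_regression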
 -- BGQ - handle preemption.
 -- Add an srun shepherd process to cancel a job and/or step if the srun
    process is killed abnormally (e.g. SIGKILL).
 -- BGQ - handle deadlock issue when a nodeboard goes into an error state.
 -- BGQ - more thorough handling of blocks with multiple jobs running on them.
 -- Fix man2html process to compile in the build directory instead of the
    source dir.
 -- Behavior of srun --multi-prog modified so that any program arguments
    specified on the command line will be appended to the program arguments
    specified in the program configuration file.
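    A sketch of the new behavior (file contents are illustrative): given a
    configuration file multi.conf containing
        0  echo zero
        1  echo one
    the command "srun -n2 --multi-prog multi.conf EXTRA" now appends EXTRA to
    both programs' argument lists, so the tasks print "zero EXTRA" and
    "one EXTRA".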
 -- Add new command, sdiag, which reports a variety of job scheduling
    statistics. Based upon work by Alejandro Lucero Palau, BSC.
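    Typical usage is simply:
        $ sdiag
    which reports the job scheduling statistics mentioned above (e.g. job
    counts and scheduler cycle times).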
 -- BLUEGENE - Added DefaultConnType to the bluegene.conf file.  This makes it
    so you can specify any connection type you would like (TORUS or MESH) as
    the default in dynamic mode.  Previously it always defaulted to TORUS.
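    A bluegene.conf sketch selecting MESH as the dynamic-mode default
    (illustrative; TORUS remains the value used if the parameter is absent):
        DefaultConnType=MESH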
 -- Made squeue -n and -w options more consistent with salloc, sbatch, srun,
    and scancel. Patch by Don Lipari, LLNL.
 -- Have sacctmgr remove user records when no associations exist for that user.
 -- Several header file changes for clean build with NetBSD. Patches from
    Aleksej Saushev.
 -- Fix for possible deadlock in accounting logic: Avoid calling
    jobacct_gather_g_getinfo() until there is data to read from the socket.
 -- Fix race condition that could generate "job_cnt_comp underflow" errors on
    front-end architectures.
 -- BGQ - Fix issue where a system with missing cables could cause core dump.
* Changes in SLURM 2.4.0.pre2
=============================
 -- CRAY - Add support for GPU memory allocation using SLURM GRES (Generic
    RESource) support. Work by Steve Trofinoff, CSCS.
 -- Add support for job allocations with multiple job constraint counts. For
    example: salloc -C "[rack1*2&rack2*4]" ... will allocate the job 2 nodes
    from rack1 and 4 nodes from rack2. Support for only a single constraint
    name has been added to job step support.
 -- BGQ - Remove old method for marking cnodes down.
 -- BGQ - Remove BGP images from view in sview.
 -- BGQ - print out failed cnodes in scontrol show nodes.
 -- BGQ - Add srun option of "--runjob-opts" to pass options to the runjob
    command.
 -- FRONTEND - handle step launch failure better.
 -- BGQ - Added a mutex to protect the now changing ba_system pointers.
 -- BGQ - added new functionality for sub-block allocations - no preemption
    for this yet though.
 -- Add --name option to squeue to filter output by job name. Patch from Yuri
    D'Elia.
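    For example (the job name is illustrative):
        $ squeue --name=nightly_regression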
 -- BGQ - Added linking to the runjob client library, which gives TotalView
    support for using srun instead of runjob.
 -- Add numeric range checks to scontrol update options. Patch from Phil
    Eckert, LLNL.
 -- Add ReconfigFlags configuration option to control actions of "scontrol
    reconfig". Patch from Don Albert, Bull.
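    A slurm.conf sketch; the flag value shown here is an assumption, so
    consult the slurm.conf man page for the supported values:
        ReconfigFlags=KeepPartInfo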
 -- BGQ - handle reboots with multiple jobs running on a block.
 -- BGQ - Add message handler thread to forward signals to runjob process.
* Changes in SLURM 2.4.0.pre1
=============================
 -- BGQ - use the ba_geo_tables to figure out the blocks instead of the old
    algorithm.  This improves timing in the worst cases and simplifies the code
    greatly.
 -- BLUEGENE - Change output tool labels from BP to Midplane
    (i.e. BP List -> MidplaneList).
 -- BLUEGENE - read MPs and BPs from the bluegene.conf
 -- Modify srun's SIGINT handling logic (two SIGINTs within one second) to be
    based upon a microsecond rather than second timer.
 -- Modify advance reservation to accept multiple specific block sizes rather
    than a single node count.
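    A sketch assuming the sizes are given as a comma-separated list to
    scontrol (reservation name and counts are illustrative):
        $ scontrol update ReservationName=maint NodeCnt=512,1024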
 -- Permit administrator to change a job's QOS to any value without validating