This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 2.1.0-pre2
=============================
 -- Added support for smap to query off node name for display.
 -- Slurmdbd modified to set user ID and group ID to SlurmUser if started as 
    user root.
 -- Configuration parameter ControlMachine changed to accept multiple comma-
    separated hostnames for support of some high-availability architectures.
 -- ALTERED API CALL: slurm_get_job_steps now uses NO_VAL instead of 0 for
    both job and step ID to receive all jobs/steps. Please make adjustments
    to your code.
 -- salloc's --wait=<secs> option deprecated by --immediate=<secs> option to 
    match the srun command.
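The NO_VAL change to slurm_get_job_steps can be pictured with a small Python sketch of the "NO_VAL means all" convention. It is not SLURM code: select_steps is a hypothetical helper, and NO_VAL is assumed to equal the 0xfffffffe value in slurm.h (check your installed header).

```python
# Hypothetical sketch of the "NO_VAL means all" filter convention; not SLURM's code.
NO_VAL = 0xFFFFFFFE  # assumed to match NO_VAL in slurm.h


def select_steps(steps, job_id=NO_VAL, step_id=NO_VAL):
    """Return the (job_id, step_id) pairs matching the filter.

    A filter value of NO_VAL matches every ID, so 0 stays usable as an
    ordinary ID (step IDs within a job start at 0).
    """
    return [
        (j, s)
        for (j, s) in steps
        if (job_id == NO_VAL or j == job_id)
        and (step_id == NO_VAL or s == step_id)
    ]


steps = [(101, 0), (101, 1), (102, 0)]
print(select_steps(steps))          # [(101, 0), (101, 1), (102, 0)]
print(select_steps(steps, 101, 0))  # [(101, 0)]
```

With 0 as the wildcard, a caller could not ask for step 0 of a job on its own; a value outside the normal ID range avoids that ambiguity.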
* Changes in SLURM 2.1.0-pre1
=============================
 -- Slurmd notifies slurmctld of node boot time to better clean up after node
    reboots.
 -- Slurmd sends node registration information repeatedly until it is
    successfully transmitted.
 -- Change job_state in job structure to dedicate 8 bits to state flags. 
    Added macros to get state information (IS_JOB_RUNNING(job_ptr), etc.)
 -- Added macros to get node state information (IS_NODE_DOWN(node_ptr), etc).
 -- Added support for Solaris. Patch from David Hoppner.
 -- Rename "slurm-aix-federation-<version>.rpm" to just 
    "slurm-aix-<version>.rpm" (federation switch plugin may not be present).
 -- Eliminated the redundant squeue output format and sort options of 
    "%o" and "%b". Use "%D" and "%S" formats respectively. Also eliminated 
    "%X" and "%Y" and "%Z" formats. Use "%z" instead.
 -- Added mechanism for SPANK plugins to set environment variables for
    Prolog, Epilog, PrologSlurmctld and EpilogSlurmctld programs using
    the functions spank_get_job_env, spank_set_job_env, and 
    spank_unset_job_env. See "man spank" for more information.
 -- Completed the work begun in 2.0.0 to standardize on using '-Q' as the
    --quiet flag for all the commands.
 -- BLUEGENE - sinfo and sview now display correct cpu counts for partitions
 -- Cleaned up the cons_res plugin.  It now uses a pointer to a part_record
    instead of calling strcmp to find the correct one.
 -- Moved most of the plugin-specific information in src/common/node_select.c
    into the respective plugins.
 -- BLUEGENE - closed some corner cases where a block could have been removed
    while a job was waiting for it to become ready because an underlying
    part of the block was put into an error state.
 -- Modify sbcast logic to prevent a user from moving files to nodes they
    have not been allocated (this would be possible in previous versions
    only by hacking the sbcast code).
 -- Add contribs/sjstat script (Perl tool to report job state information).
    Put into new RPM: sjstat.
 -- Add sched/wiki2 (Moab) JOBMODIFY command support for VARIABLELIST option
    to set supplemental environment variables for pending batch jobs.
 -- BLUEGENE - add support for scontrol show blocks.
 -- Added support for job step time limits.
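The job_state change above splits one integer into a base state and flag bits. A minimal Python sketch of that kind of layout follows, assuming the low 8 bits carry the base state and the next 8 bits carry flags; all constant values and helper names here are hypothetical and do not reproduce slurm.h.

```python
# Hypothetical bit layout: low 8 bits = base state, next 8 bits = flags.
JOB_STATE_BASE = 0x00FF
JOB_STATE_FLAGS = 0xFF00

# Illustrative base states and one flag bit; real values live in slurm.h.
JOB_PENDING, JOB_RUNNING = 0, 1
JOB_COMPLETING = 0x8000


def base_state(job_state):
    """Strip the flag bits, leaving only the base state."""
    return job_state & JOB_STATE_BASE


def state_flags(job_state):
    """Keep only the flag bits."""
    return job_state & JOB_STATE_FLAGS


def is_job_running(job_state):
    # Comparing the raw field to JOB_RUNNING would fail once any flag is set.
    return base_state(job_state) == JOB_RUNNING


def is_job_completing(job_state):
    return bool(job_state & JOB_COMPLETING)


state = JOB_RUNNING | JOB_COMPLETING
print(is_job_running(state), is_job_completing(state), state == JOB_RUNNING)
# True True False
```

Every state test masks first, which is presumably what IS_JOB_RUNNING-style macros hide from callers.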
* Changes in SLURM 2.0.3
========================
 -- Add reservation creation/update flag of Ignore_Jobs to enable the creation
    of a reservation that overlaps jobs expected to still be running when
    the reservation starts. This would be especially useful to reserve all 
    nodes for system maintenance without adjusting time limits of running
    jobs before creating the reservation. Without this flag, nodes allocated
    to jobs expected to be running when the reservation begins cannot be
    placed into a reservation.
 -- In task/affinity plugin, add layer of abstraction to logic translating
    block masks to physical machine masks. Patch from Matthieu Hautreux, CEA.
 -- Fix for setting the node_bitmap in a job to NULL if the job does not 
    start correctly when expected to start.
 -- Fixed bug in srun --pty logic. Output from the task was split up 
    arbitrarily into stdout and stderr streams, and sometimes was printed 
    out of order.
 -- If job requests minimum and maximum node count range with select/cons_res,
    try to satisfy the higher value (formerly only allocated the minimum).
 -- Fix for checking for a non-existent job when querying steps.
 -- For job steps with the --exclusive option, base the initial time in the
    exponential back-off partly upon the process ID for better performance
    with many job steps started at the same time.
 -- Fix for correct step ordering in sview.
 -- Support optional argument to srun and salloc --immediate option. Specify
    timeout value in seconds for job or step to be allocated resources.
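The --exclusive back-off change can be illustrated with a Python sketch in which the first delay is offset by the process ID, so job steps started at the same moment do not retry in lockstep. The function name, millisecond units, and base and cap values are assumptions for illustration, not SLURM's actual parameters.

```python
import os


def backoff_delays(pid, attempts, base_ms=100, max_ms=1000):
    """Return the retry delays (ms) for one job step.

    The first delay is the base plus a PID-derived offset; each later delay
    doubles, capped at max_ms (hypothetical values).
    """
    delay = base_ms + pid % base_ms
    delays = []
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, max_ms)
    return delays


print(backoff_delays(os.getpid(), attempts=4))
```

For example, PID 1234 yields delays of 134, 268, 536 and 1000 ms, while PID 1250 starts at 150 ms, so the two steps' retries drift apart.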
* Changes in SLURM 2.0.2
========================
 -- Fix, don't remove job details when a job is cancelled while pending.
 -- Use the correct type for mktime so garbage isn't returned on 64-bit
    systems for accounting archival.
 -- Better checking in sacctmgr to avoid infinite loops.
 -- Fix minor memory leak in fake_slurm_step_layout_create()
 -- Fix node weight (scheduling priority) calculation for powered down
    nodes. Patch from Hongjia Cao, NUDT.
 -- Fix node suspend/resume rate calculations. Patch from Hongjia Cao, NUDT.
 -- Change calculations using ResumeRate and SuspendRate to provide higher
    resolution.
 -- Log the IP address for incoming messages having an invalid protocol 
    version number.
 -- Fix for sacct to show jobs that start the same second as the sacct
    command is issued.
 -- BLUEGENE - Fix for -n option to work on correct cpu counts for each 
    midplane instead of treating -n as a c-node count.
 -- salloc now sets SLURM_NTASKS_PER_NODE if --ntasks-per-node option is set.
 -- Fix select/linear to properly set a job's count of allocated processors
    (all processors on the allocated nodes).
 -- Fix select/cons_res to allocate proper CPU count when --ntasks-per-node
    option is used without a task count in the job request.
 -- Ensure that a node is not allocated to a job if the node's CPU count is
    less than --ntasks-per-node * --cpus-per-task.
 -- Correct AllocProcs reported by "scontrol show node" when ThreadsPerCore
    is greater than 1 and select/cons_res is used.
 -- Fix scontrol show config for accounting information when values are 
    not set in the slurm.conf.
 -- Added a set of SBATCH_CPU_BIND* and SBATCH_MEM_BIND* env variables to keep
    jobsteps launched from within a batch script from inheriting the CPU and
    memory affinity that was applied to the batch script. Patch from Matthieu
    Hautreux, CEA.
 -- Ignore the extra processors on a node above configured size if either 
    sched/gang or select/cons_res is configured.
 -- Fix bug in tracking memory allocated on a node for select/cons_res plugin.
 -- Fixed a race condition when writing labelled output with a file per task
    or per node, which potentially closed a file before all data was written.
 -- BLUEGENE - Fix so that if a job request spans both less than and more
    than one midplane in size, the connection type is checked appropriately.
 -- Make sched/backfill properly schedule jobs with constraints having node 
    counts. NOTE: Backfill of jobs with constraints having exclusive OR 
    operators is not fully supported.
 -- If srun is cancelled by SIGINT, set the job state to cancelled, not 
    failed.
 -- BLUEGENE - Fix for setting a subbp into an error state when the specified
    subbp isn't the first ionode in a nodecard.
 -- Fix for backfill to not core when checking shared nodes.
 -- Fix for scontrol to not core when hitting just return in interactive mode.
 -- Improve sched/backfill logic with respect to shared nodes (multiple jobs
    per node).
 -- In sched/wiki (Maui interface) add job info fields QOS, RCLASS, DMEM and
    TASKSPERNODE. Patch from Bjorn-Helge Mevik, University of Oslo.
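The rule that a node's CPU count must cover --ntasks-per-node * --cpus-per-task reduces to a one-line predicate. This Python sketch uses a hypothetical {node_name: cpu_count} input and hypothetical helper names; it is not the select plugin's code.

```python
def node_fits(node_cpus, ntasks_per_node, cpus_per_task=1):
    """True if the node has enough CPUs for its share of the job's tasks."""
    return node_cpus >= ntasks_per_node * cpus_per_task


def eligible_nodes(nodes, ntasks_per_node, cpus_per_task=1):
    """Filter a {node_name: cpu_count} mapping (hypothetical input format)."""
    return [
        name
        for name, cpus in nodes.items()
        if node_fits(cpus, ntasks_per_node, cpus_per_task)
    ]


nodes = {"tux0": 4, "tux1": 8, "tux2": 16}
# --ntasks-per-node=4 --cpus-per-task=2 needs 8 CPUs per node.
print(eligible_nodes(nodes, ntasks_per_node=4, cpus_per_task=2))
# ['tux1', 'tux2']
```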
* Changes in SLURM 2.0.1
========================
 -- Fix, truncate time of start and end for job steps in sacct.
 -- Initialize all messages to slurmdbd. Previously uninitialized string could
    cause slurmctld to fail with invalid memory reference.
 -- BLUEGENE - Fix for trying to finish a torus on a block already visited.
    Even though this may be electrically possible, it isn't valid in the
    underlying infrastructure.
 -- Fix, in mysql plugins change mediumints to int to support full 32bit 
    numbers.
 -- Add sinfo node state filtering support for NO_RESPOND, POWER_SAVE, FAIL, 
    MAINT, DRAINED and DRAINING states. The state filter of DRAIN still maps
    to any node in either DRAINED or DRAINING state.
 -- Fix reservation logic when job requests specific nodes that are already
    in some reservation the job can not use.
 -- Fix recomputation of a job's end time when allocated nodes which are
    being powered up. The end time would be set in the past if the job's
    time limit was INFINITE, resulting in it being prematurely terminated.
 -- Permit regular user to change the time limit of his pending jobs up to
    the partition's limit.
 -- Fix "-Q" (quiet) option for salloc and sbatch which was previously 
    ignored.
 -- BLUEGENE - fix for finding odd shaped blocks in dynamic mode.
 -- Fix logic supporting SuspendRate and ResumeRate configuration parameters.
    Previous logic was changing state of one too many nodes per minute.
 -- Save new reservation state file on shutdown (even if no changes).
 -- Fix, when partitions are deleted the sched and select plugins are notified.
 -- Fix for slurmdbd to create wckeyids when they don't exist.
 -- Fix linking problem that prevented checkpoint/aix from working.
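The SuspendRate/ResumeRate fix concerns an off-by-one in a per-minute limit. This Python sketch, with hypothetical names and not taken from SLURM, shows how a strict `<` test stops at the configured rate, whereas `<=` would let one extra node through each minute.

```python
def may_change_state(rate_per_min, changed_this_minute):
    """Allow another node state change only while under the per-minute rate."""
    # Using `<=` here would permit rate_per_min + 1 changes per minute.
    return changed_this_minute < rate_per_min


def nodes_changed(rate_per_min, candidates):
    """Count how many of `candidates` nodes change state within one minute."""
    changed = 0
    for _ in range(candidates):
        if may_change_state(rate_per_min, changed):
            changed += 1
    return changed


print(nodes_changed(rate_per_min=3, candidates=10))  # 3
```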
* Changes in SLURM 2.0.0
========================
 -- Fix for bluegene systems to be able to create 32 node blocks with only 
    16 psets defined in dynamic layout mode.
 -- Improve srun_cr handling of child srun forking. Patch from Hongjia Cao, 
    NUDT.
 -- Configuration parameter ResumeDelay replaced by SuspendTimeout and 
    ResumeTimeout.
 -- BLUEGENE - sview/sinfo now displays correct cnode numbers for drained nodes
    or blocks in error state.
 -- Fix some batch job launch bugs when powering up suspended nodes.
 -- Added option '-T' for sacct to truncate time of start and end and set
    default of --starttime to midnight of current day.
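One reading of the sacct '-T' option is that each job's start and end times are clipped to the query window, with the window starting by default at midnight of the current day. The Python sketch below assumes exactly that; the function names are hypothetical and it does not reproduce sacct's code.

```python
from datetime import datetime


def midnight(now):
    """Default --starttime under this reading: midnight of the day of `now`."""
    return now.replace(hour=0, minute=0, second=0, microsecond=0)


def truncate(start, end, window_start, window_end):
    """Clip a job's [start, end] interval to the query window."""
    return max(start, window_start), min(end, window_end)


now = datetime(2009, 5, 20, 12, 0)
job_start = datetime(2009, 5, 19, 22, 0)  # started before the window opened
job_end = datetime(2009, 5, 20, 1, 30)
print(truncate(job_start, job_end, midnight(now), now))
```

Here the reported start becomes midnight of 2009-05-20, and the end (01:30) is left unchanged because it already falls inside the window.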
* Changes in SLURM 2.0.0-rc2
============================
 -- Change fanout logic to start on calling node instead of first node in 
    message nodelist.
 -- Fix bug so that smap builds properly on Sun Constellation system.
 -- Filter white-space out from node feature specification.
 -- Fixed issue with duration not being honored when updating start time in 
    reservations.
 -- Fix bug in sched/wiki and sched/wiki2 plugins for reporting job resource
    allocation properly when node names are configured out of sort order 
    with more than one numeric suffix (e.g. "tux10-1" is configured after 
    "tux5-1").
 -- Avoid re-use of job_id (if specified at submit time) when the existing
    job is in completing state (possible race condition with Moab).
 -- Added SLURM_DISTRIBUTION to env for salloc.
 -- Add support for "scontrol takeover" command for backup controller to 
    assume control immediately. Patch from Matthieu Hautreux, CEA.
 -- If srun is unable to communicate with the slurmd, tasks are now marked as
    failed with the controller.
 -- Fixed issues with requeued jobs not being accounted for correctly in 
    the accounting.
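The sched/wiki entry above mentions node names such as "tux10-1" configured after "tux5-1". A plain string sort puts "tux10-1" first because the character '1' sorts before '5', while a numeric-aware ("natural") sort puts it second. The Python sketch below only illustrates why configured order and sort order can disagree; natural_key is a hypothetical helper, not the plugin's fix.

```python
import re


def natural_key(name):
    """Split a node name into text and integer chunks for numeric ordering.

    re.split with a capturing group always alternates text, digits, text...,
    so keys for different names compare like types position by position.
    """
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


configured = ["tux5-1", "tux10-1"]
print(sorted(configured))                   # ['tux10-1', 'tux5-1']
print(sorted(configured, key=natural_key))  # ['tux5-1', 'tux10-1']
```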