This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in Slurm 2.6.1
========================
 -- slurmdbd - Allow job derived ec and comments to be modified by non-root
    users.
 -- Fix issue with job name being truncated to 24 chars when sending a mail
    message.
 -- Fix minor issues with spec file, missing files and including files
    erroneously on a bluegene system.
 -- sacct - fix --name and --partition options when using
    accounting_storage/filetxt.
 -- squeue - Remove extra whitespace from default printout.
 -- BGQ - added head ppcfloor as an include dir when building.
 -- BGQ - Better debug messages in runjob_mux plugin.
 -- PMI2 Updated the Makefile.am to build a versioned library.
 -- CRAY - Fix srun --mem_bind=local option with launch/aprun.
 -- PMI2 Corrected buffer size computation in the pmi2_api.c module.
 -- GRES accounting data wrong in database: gres_alloc, gres_req, and gres_used
    fields were empty if the job was not started immediately.
 -- Fix sbatch and srun task count logic when --ntasks-per-node specified,
    but no explicit task count.
 -- Corrected the hdf5 profile user guide and the acct_gather.conf
    documentation.
 -- IPMI - Fix math bug getting new wattage.
 -- Corrected the AcctGatherProfileType documentation in slurm.conf
 -- Corrected the sh5util program to print the header in the csv file 
    only once, set the debug messages at debug() level, make the argument 
    check case insensitive and avoid printing duplicate \n.
 -- If energy values cannot be collected, send a message to the controller
    to drain the node and log an error in the slurmd log file.
 -- Handle complete removal of CPURunMins time at the end of the job instead
    of at multifactor poll.
 -- sview - Add missing debug_flag options.
 -- PGSQL - Notes about Postgres functionality being removed in the next
    version of Slurm.

* Changes in Slurm 2.6.0
========================
 -- Fix it so bluegene and serial systems don't get warnings over new NODEDATA
    enum.
 -- When a job is aborted send a message for any tasks that have completed.
 -- Correction to memory per CPU calculation on system with threads and
    allocating cores or sockets.
 -- Requeue batch job if its node reboots (used to abort the job).
 -- Enlarge maximum size of srun's hostlist file.
 -- IPMI - Fix first poll to get correct consumed_energy for a step.
 -- Correction to job state recovery logic that could result in assert failure.
 -- Record partial step accounting record if allocated nodes fail abnormally.
 -- Accounting - fix issue where PrivateData=jobs or users could potentially
    show information to users that had no associations on the system.
 -- Make PrivateData in slurmdbd.conf case insensitive.
 -- sacct/sstat - Add format option ConsumedEnergyRaw to print full energy
    values.
* Changes in Slurm 2.6.0rc2
===========================
 -- HDF5 - Fix issue with Ubuntu where HDF5 development headers are
    overwritten by the parallel versions, so both cases need to be handled.
 -- ACCT_GATHER - handle suspending correctly for polling threads.
 -- Make SLURM_DISTRIBUTION env var hold both types of distribution if
    specified.
 -- Remove hardcoded /usr/local from slurm.spec.
 -- Modify slurmctld locking to improve performance under heavy load with
    very large numbers of batch job submissions or job cancellations.
 -- sstat - If -j is not given, allow the last argument to be checked as the
    job/step id.
 -- IPMI - fix adjustment on poll when using EnergyIPMICalcAdjustment.
* Changes in Slurm 2.6.0rc1
===========================
 -- Added helper script for launching symmetric and MIC-only MPI tasks within
    SLURM (in contribs/mic/mpirun-mic).
 -- Change maximum delay for state save from 2 secs to 5 secs. Make timeout
    configurable at build time by defining SAVE_MAX_WAIT.
 -- Modify slurmctld data structure locking to interleave read and write
    locks rather than always favor write locks over read locks.
 -- Added sacct format option of "ALL" to print all fields.
 -- Deprecate the SchedulerParameters value of "interval"; use "bf_interval"
    instead, as documented.
 -- Add acct_gather_profile/hdf5 to profile jobs with hdf5
 -- Added MaxCPUsPerNode partition configuration parameter. This can be
    especially useful to schedule systems with GPUs.
 -- Permit "scontrol reboot_node" for nodes in MAINT reservation.
 -- Added "PriorityFlags" value of "SMALL_RELATIVE_TO_TIME". If set, the job's
    size component will be based upon not the job size alone, but the job's
    size divided by its time limit.
 -- Added sbatch option "--ignore-pbs" to ignore "#PBS" options in the batch
    script.
 -- Rename slurm_step_ctx_params_t field from "mem_per_cpu" to "pn_min_memory".
    Job step now accepts memory specification in either per-cpu or per-node
    basis.
 -- Add ability to specify host repetition count in the srun hostfile (e.g.
    "host1*2" is equivalent to "host1,host1").

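The host repetition count added to the srun hostfile in 2.6.0rc1 can be
illustrated with a short Python sketch. This is not Slurm's parser: the
function name expand_hostfile_entries is hypothetical, and the sketch assumes
entries are separated by commas or whitespace and that a missing "*count"
suffix means a count of 1.

```python
def expand_hostfile_entries(text):
    """Expand "name*count" entries into repeated host names.

    Simplified illustration only (assumption): entries are split on commas
    and whitespace, and an entry without "*count" is used once.
    """
    hosts = []
    for entry in text.replace(",", " ").split():
        name, sep, count = entry.partition("*")
        repeat = int(count) if sep else 1
        if repeat < 1:
            raise ValueError(f"invalid repetition count in {entry!r}")
        hosts.extend([name] * repeat)
    return hosts


if __name__ == "__main__":
    # "host1*2" is equivalent to "host1,host1"
    print(",".join(expand_hostfile_entries("host1*2")))  # host1,host1
```
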
* Changes in Slurm 2.6.0pre3
============================
 -- Add milliseconds to default log message header (both RFC 5424 and ISO 8601
    time formats). Disable milliseconds logging using the configure
    parameter "--disable-log-time-msec". Default time format changes to
    ISO 8601 (without time zone information). Specify "--enable-rfc5424time"
    to restore the time zone information.
 -- Add username (%u) to the filename pattern in the batch script.
 -- Added options for front end nodes of AllowGroups, AllowUsers, DenyGroups,
    and DenyUsers.
 -- Fix sched/backfill logic to initiate jobs whose maximum time limit exceeds
    the partition limit but whose minimum time limit permits them to start.
 -- gres/gpu - Fix for gres.conf file with multiple files on a single line
    using a slurm expression (e.g. "File=/dev/nvidia[0-1]").
 -- Replaced ipmi.conf with generic acct_gather.conf file for all acct_gather
    plugins.  Developers using this should follow the model set forth in the
    acct_gather_energy_ipmi plugin.
 -- Added more options to update a step's information
 -- Add DebugFlags=ThreadID which will print the thread id of the calling
    thread.
 -- CRAY - Allocate whole node (CPUs) in reservation despite what the
    user requests.  We have found any srun/aprun afterwards will work on a
    subset of resources.
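
The gres/gpu entry in 2.6.0pre3 mentions a gres.conf expression such as
"File=/dev/nvidia[0-1]". The Python sketch below shows roughly what such an
expression stands for. It is a simplified illustration, not Slurm's own
expression parser: the function name expand_bracket_expression is
hypothetical, only one bracket group is handled, and zero padding is not
preserved.

```python
import re


def expand_bracket_expression(path):
    """Expand one "[a-b,c]" group in a path into individual paths.

    Assumption: at most one bracket group, holding comma-separated
    non-negative integers or "low-high" ranges.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([0-9,\-]+)\]([^\[\]]*)", path)
    if match is None:
        return [path]  # no bracket group: return the path unchanged
    prefix, body, suffix = match.groups()
    expanded = []
    for part in body.split(","):
        low, sep, high = part.partition("-")
        start = int(low)
        end = int(high) if sep else start
        for index in range(start, end + 1):
            expanded.append(f"{prefix}{index}{suffix}")
    return expanded


if __name__ == "__main__":
    print(expand_bracket_expression("/dev/nvidia[0-1]"))
```
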
* Changes in Slurm 2.6.0pre2
============================
 -- Do not purge inactive interactive jobs that lack a port to ping (added
    for MR+ operation).
 -- Advanced reservations with hostname and core counts now support asymmetric
    reservations (e.g. a different core count for each node).
 -- Added slurmctld/dynalloc plugin for MapReduce+ support.
 -- Added "DynAllocPort" configuration parameter.
 -- Added partition parameter of SelectTypeParameters to override system-wide
    value.
 -- Added cr_type to partition_info data structure.
 -- Added allocated memory to node information available (within the existing
    select_nodeinfo field of the node_info_t data structure). Added Allocated
    Memory to node information displayed by sview and scontrol commands.
 -- Make sched/backfill the default scheduling plugin rather than sched/builtin
    (FIFO).
 -- Added support for a job having different priorities in different partitions.
 -- Added new SchedulerParameters configuration parameter of "bf_continue"
    which permits the backfill scheduler to continue considering jobs for
    backfill scheduling after yielding locks even if new jobs have been
    submitted. This can result in lower priority jobs being backfill
    scheduled instead of newly arrived higher priority jobs, but will permit
    more queued jobs to be considered for backfill scheduling.
 -- Added support to purge reservation records from accounting.
 -- Cray - Add support for Basil 1.3
* Changes in SLURM 2.6.0pre1
============================
 -- Add "state" field to job step information reported by scontrol.
 -- Notify srun to retry step creation upon completion of other job steps
    rather than polling. This results in much faster throughput for job step
    execution with --exclusive option.
 -- Added "ResvEpilog" and "ResvProlog" configuration parameters to execute a
    program at the beginning and end of each reservation.
 -- Added "slurm_load_job_user" function. This is a variation of
    "slurm_load_jobs", but accepts a user ID argument, potentially resulting
    in substantial performance improvement for "squeue --user=ID".
 -- Added "slurm_load_node_single" function. This is a variation of
    "slurm_load_nodes", but accepts a node name argument, potentially resulting
    in substantial performance improvement for "sinfo --nodes=NAME".
 -- Added "HealthCheckNodeState" configuration parameter to identify node
    states on which HealthCheckProgram should be executed.
 -- Remove sacct --dump --formatted-dump options which were deprecated in
    2.5.
 -- Added support for job arrays (phase 1 of effort). See "man sbatch" option
    -a/--array for details.
 -- Add new AccountStorageEnforce options of 'nojobs' and 'nosteps' which will
    allow the use of accounting features like associations, qos and limits but
    not keep track of jobs or steps in accounting.
 -- Cray - Add new cray.conf parameter of "AlpsEngine" to specify the
    communication protocol to be used for ALPS/BASIL.
 -- select/cons_res plugin: Correction to CPU allocation count logic for
    cores without hyperthreading.
 -- Added new SelectTypeParameter value of "CR_ALLOCATE_FULL_SOCKET".
 -- Added PriorityFlags value of "TICKET_BASED" and merged priority/multifactor2
    plugin into priority/multifactor plugin.
 -- Add "KeepAliveTime" configuration parameter controlling how long sockets
    used for srun/slurmstepd communications are kept alive after disconnect.
 -- Added SLURM_SUBMIT_HOST to salloc, sbatch and srun job environment.
 -- Added SLURM_ARRAY_TASK_ID to environment of job array.
 -- Added squeue --array/-r option to optimize output for job arrays.
 -- Added "SlurmctldPlugstack" configuration parameter for generic stack of
    slurmctld daemon plugins.
 -- Removed contribs/arrayrun tool. Use native support for job arrays.
 -- Modify default installation locations for RPMs to match "make install":
    _prefix /usr/local
    _slurm_sysconfdir %{_prefix}/etc/slurm
    _mandir %{_prefix}/share/man
    _infodir %{_prefix}/share/info
 -- Add acct_gather_energy/ipmi which works off freeipmi for energy gathering
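
The SLURM_ARRAY_TASK_ID variable added to the job array environment in
2.6.0pre1 lets each array task pick its own piece of work. The Python sketch
below shows one way a task might do that. The function name select_input and
the file names are hypothetical, and the sketch assumes the array indices
start at 0 and match positions in the input list.

```python
import os


def select_input(inputs, environ=os.environ):
    """Return the input assigned to this job array task.

    Assumption: array task IDs run from 0 to len(inputs) - 1.
    """
    task_id = environ.get("SLURM_ARRAY_TASK_ID")
    if task_id is None:
        raise RuntimeError("SLURM_ARRAY_TASK_ID is not set")
    index = int(task_id)
    if not 0 <= index < len(inputs):
        raise IndexError(f"array task id {index} has no matching input")
    return inputs[index]


if __name__ == "__main__":
    # Hypothetical input files, one per array task.
    files = ["part0.dat", "part1.dat", "part2.dat"]
    # Simulate the environment of array task 1 instead of reading os.environ.
    print(select_input(files, {"SLURM_ARRAY_TASK_ID": "1"}))  # part1.dat
```
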
* Changes in Slurm 2.5.8
========================
 -- Fix for slurmctld segfault on NULL front-end reason field.
 -- Avoid gres step allocation errors when a job shrinks in size due to either
    down nodes or explicit resizing. Generated slurmctld errors of this type:
    "step_test ... gres_bit_alloc is NULL"
 -- Fix bug that would leak memory and over-write the AllowGroups field on
    "scontrol reconfig" when AllowNodes is manually changed using scontrol.