This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in Slurm 14.03.0pre6
==============================
 -- Modify slurmstepd to log messages according to the LogTimeFormat
    parameter in slurm.conf.
 -- Ensure that overlapping reservations do not oversubscribe available
    licenses.
 -- Added core specialization logic to select/cons_res plugin.
 -- When using FastSchedule = 1 the nodes with less than configured resources
    are no longer set DOWN; they are set to DRAIN instead.
 -- Modified 'sacctmgr show associations' command to show GrpCPURunMins
    by default.
 -- Replace the hostlist_push() function with a more efficient hostlist_push_host().
 -- Modify the reading of Lustre file system statistics to print more
    information when debugging and when I/O errors occur.
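    Illustrative sketch (not Slurm source code) of the license check described
    above: reservations that overlap in time must not reserve more licenses in
    total than exist. The (start, end, count) tuple layout and both function
    names are assumptions made for this example.

```python
def peak_license_use(reservations):
    """Return the largest number of licenses reserved at any instant.

    Each reservation is a hypothetical (start, end, count) tuple covering
    the half-open time interval [start, end).
    """
    events = []
    for start, end, count in reservations:
        events.append((start, count))   # licenses claimed at start
        events.append((end, -count))    # licenses released at end
    # Tuples sort by time first; at equal times the negative deltas
    # (releases) come before the positive ones (claims).
    events.sort()
    in_use = peak = 0
    for _, delta in events:
        in_use += delta
        peak = max(peak, in_use)
    return peak


def oversubscribed(reservations, total_licenses):
    """True if overlapping reservations exceed the configured license count."""
    return peak_license_use(reservations) > total_licenses
```

    With reservations (0, 10, 3), (5, 15, 4) and (10, 20, 2) the peak is 7
    licenses, so a pool of 6 would be oversubscribed while a pool of 7 is not.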
* Changes in Slurm 14.03.0pre5
==============================
 -- Added squeue format option of "%X" (core specialization count).
 -- Added core specialization web page (just a start for now).
 -- Added the SLURM_ARRAY_JOB_ID and SLURM_ARRAY_TASK_ID environment
    variables to the EpilogSlurmctld environment.
 -- Fix bug in job step allocation failing due to memory limit.
 -- Modify the pbsnodes script to reflect its output on a TORQUE system.
 -- Add ability to clear a node's DRAIN flag using scontrol or sview by setting
    its state to "UNDRAIN". The node's base state (e.g. "DOWN" or "IDLE") will
    not be changed.
 -- Modify the output of 'scontrol show partition' by displaying
    DefMemPerCPU=UNLIMITED and MaxMemPerCPU=UNLIMITED when these limits are
    configured as 0.
 -- mpirun-mic - Major re-write of the command wrapper for Xeon Phi use.
 -- Add new configuration parameter of AuthInfo to specify port used by
    authentication plugin.
 -- Fixed conditional rpm compiling.
 -- Corrected slurmstepd ident name when logging to syslog.
 -- Fixed sh5util loop when there are no node-step files.
 -- Add SLURM_CLUSTER_NAME to environment variables passed to PrologSlurmctld,
    Prolog, EpilogSlurmctld, and Epilog.
 -- Add the idea of running a prolog right when an allocation happens
    instead of when running on the node for the first time.
 -- If user runs 'scontrol reconfig' but hostnames or the host count changes
    the slurmctld throws a fatal error.
 -- gres.conf - Add "NodeName" specification so that a single gres.conf file
    can be used for a heterogeneous cluster.
 -- Add flag to accounting RPC to indicate if job data is packed or not.
 -- After all srun tasks have terminated on a node close the stdout/stderr
    channel with the slurmstepd on that node.
 -- In case of i/o error with slurmstepd log an error message and abort the
    job.
 -- Add --test-only option to sbatch command to validate the script and options.
    The response includes expected start time and resources to be allocated.
* Changes in Slurm 14.03.0pre4
==============================
 -- Remove the ThreadID documentation from slurm.conf. This functionality has
    been obsoleted by the LogTimeFormat.
 -- Sched plugins - rename global and plugin functions names for consistency
    with other plugin types.
 -- BGQ - Added RebootQOSList option to bluegene.conf to allow an implicit
    reboot of a block if only jobs in the list are running on it when cnodes
    go into a failure state.
 -- Correct task count of pending job steps.
 -- Improve limit enforcement for jobs, set RLIMIT_RSS, RLIMIT_AS and/or
    RLIMIT_DATA to enforce memory limit.
 -- Pending job steps will have step_id of INFINITE rather than NO_VAL and
    will be reported as "TBD" by scontrol and squeue commands.
 -- Add logic so PMI_Abort or PMI2_Abort can propagate an exit code.
 -- Added SlurmdPlugstack configuration parameter.
 -- Added PriorityFlag DEPTH_OBLIVIOUS to have the depth of an association
    not affect its priority.
 -- Multi-thread the sinfo command (one thread per partition).
 -- Added sgather tool to gather files from a job's compute nodes into a
    central location.
 -- Added configuration parameter FairShareDampeningFactor to offer a greater
    priority range based upon utilization.
 -- Change MaxArraySize and job's array_task_id from 16-bit to 32-bit field.
    Additional Slurm enhancements are required to support larger job arrays.
 -- Added -S/--core-spec option to salloc, sbatch and srun commands to reserve
    specialized cores for system use. Modify scontrol and sview to get/set
    the new field. No enforcement exists yet for these new options.
    struct job_info / slurm_job_info_t: Added core_spec
    struct job_descriptor / job_desc_msg_t: Added core_spec
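    Illustrative sketch (not Slurm source code) of how a client might render
    the step ID of a pending job step as "TBD", as described above. The
    numeric values given for INFINITE and NO_VAL are assumptions recalled from
    slurm.h and should be checked against the installed header; the function
    name is hypothetical.

```python
# Assumed sentinel values; verify against slurm.h before relying on them.
INFINITE = 0xFFFFFFFF
NO_VAL = 0xFFFFFFFE


def format_step_id(step_id):
    """Return "TBD" for a pending step, otherwise the numeric step ID."""
    if step_id == INFINITE:
        return "TBD"
    return str(step_id)
```

    format_step_id(INFINITE) yields "TBD", while format_step_id(3) yields "3".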
* Changes in Slurm 14.03.0pre3
==============================
 -- Do not set SLURM_NODEID environment variable on front-end systems.
 -- Convert bitmap functions to use int32_t instead of int in data structures
    and function arguments. This is to reliably enable use of bitmaps containing
    up to 4 billion elements. Several data structures containing index values
    were also changed from data type int to int32_t:
    - Struct job_info / slurm_job_info_t: Changed exc_node_inx, node_inx, and
      req_node_inx from type int to type int32_t
    - job_step_info_t: Changed node_inx from type int to type int32_t
    - Struct partition_info / partition_info_t: Changed node_inx from type int
      to type int32_t
    - block_job_info_t: Changed cnode_inx from type int to type int32_t
    - block_info_t: Changed ionode_inx and mp_inx from type int to type int32_t
    - Struct reserve_info / reserve_info_t: Changed node_inx from type int to
      type int32_t
 -- Modify qsub wrapper output to match TORQUE command output: print only the
    job ID rather than "Submitted batch job #".
 -- Change Slurm error string for ESLURM_MISSING_TIME_LIMIT from
    "Missing time limit" to
    "Time limit specification required, but not provided"
 -- Change salloc job_allocate error message header from
    "Failed to allocate resources" to
    "Job submit/allocate failed"
 -- Modify slurmctld message retry logic to support Cray cold-standby SDB.
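    Illustrative sketch (not the actual qsub wrapper) of the output change
    described above: reduce the "Submitted batch job #" message to the bare
    job ID. The function name is hypothetical.

```python
import re

# Matches the message quoted above, with "#" standing for the job number.
_SUBMIT_RE = re.compile(r"^Submitted batch job (\d+)")


def job_id_only(sbatch_output):
    """Extract the job ID from a 'Submitted batch job N' message."""
    match = _SUBMIT_RE.match(sbatch_output.strip())
    if match is None:
        raise ValueError("unrecognized submit message: %r" % sbatch_output)
    return match.group(1)
```

    For the input "Submitted batch job 1234" the function returns "1234".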
* Changes in Slurm 14.03.0pre2
==============================
 -- Added "JobAcctGatherParams" configuration parameter. Value of "NoShare"
    disables accounting for shared memory.
 -- Added fields to "scontrol show job" output: boards_per_node,
    sockets_per_board, ntasks_per_node, ntasks_per_board, ntasks_per_socket,
    ntasks_per_core, and nice.
 -- Add squeue output format options for job command and working directory
    (%o and %Z respectively).
 -- Add stdin/out/err to sview job output.
 -- Add new job_state of JOB_BOOT_FAIL for job terminations due to failure to
    boot its allocated nodes or BlueGene block.
 -- CRAY - Add SelectTypeParameters NHC_NO_STEPS and NHC_NO which will disable
    the node health check script for steps and allocations respectively.
 -- Reservation with CoreCnt: Avoid possible invalid memory reference.
 -- Add new error code for attempt to create a reservation with duplicate name.
 -- Validate that a hostlist file contains text (i.e. not a binary).
 -- switch/generic - propagate switch information from srun down to slurmd and
    slurmstepd.
 -- CRAY - Do not package Slurm's libpmi or libpmi2 libraries. The Cray version
    of those libraries must be used.
 -- Added a new option to the scontrol command to view licenses that are
    configured, in use and available: 'scontrol show licenses'.
 -- MySQL - Made Slurm compatible with MySQL 5.6.
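    Illustrative sketch (not Slurm source code) of a text-versus-binary check
    like the hostlist file validation described above. The heuristic used
    here (printable ASCII plus tab, newline and carriage return) and the
    function name are assumptions; Slurm's actual test may differ.

```python
# Bytes accepted as "text": printable ASCII plus common whitespace.
_TEXT_BYTES = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def looks_like_text(data):
    """Return True if a non-empty bytes object contains only text bytes."""
    if not data:
        return False
    return all(byte in _TEXT_BYTES for byte in data)
```

    A file containing b"node[1-4]\n" passes the check, while one that starts
    with the ELF magic bytes b"\x7fELF" does not.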
* Changes in Slurm 14.03.0pre1
==============================
 -- sview - improve scalability
 -- Add task pointer to the task_post_term() function in task plugins. The
    terminating task's PID is available in task->pid.
 -- Move select/cray to select/alps
 -- Defer sending SIGKILL signal to processes while core dump in progress.
 -- Added JobContainerPlugin configuration parameter and plugin infrastructure.
 -- Added partition configuration parameters AllowAccounts, AllowQOS,
 -- The rpmbuild option for a cray system with ALPS has changed from
    %_with_cray to %_with_cray_alps.
 -- The log file timestamp format can now be selected at runtime via the
    LogTimeFormat configuration option. See the slurm.conf and slurmdbd.conf
 -- Added switch/generic plugin to convey a job's network topology.
 -- BLUEGENE - If block is in 'D' state or has more cnodes in error than
    MaxBlockInError set the job wait reason appropriately.
 -- API use: Generate an error return rather than fatal error and exit if the
    configuration file is absent or invalid. This will permit Slurm APIs to be
    more reliably used by other programs.
 -- Add support for load-based scheduling, allocate jobs to nodes with the
    largest number of available CPUs. Added SelectTypeParameters parameter of
    "CR_LLN" and partition parameter of "LLN=yes|no".
 -- Added job_info() and step_info() functions to the gres plugins to extract
    plugin specific fields from the job's or step's GRES data structure.
 -- Added sbatch --signal option of "B:" to signal the batch shell rather than
    only the spawned job steps.
 -- Added sinfo and squeue format option of "%all" to print all fields available
    for the data type with a vertical bar separating each field.
 -- Add mechanism for job_submit plugin to generate error message for srun,
    salloc or sbatch to stderr. New argument added to job_submit function in
 -- Add StdIn, StdOut, and StdErr paths to job information dumped with
    "scontrol show job".
 -- Permit Slurm administrator to submit a batch job as any user.
 -- Set a job's RLIMIT_AS limit based upon its memory limit and VsizeFactor
    configuration value.
 -- Make jobacct_gather/cgroup work correctly and also make all jobacct_gather
    plugins more maintainable.
 -- Proctrack/pgid - Add support for proctrack_p_plugin_get_pids() function.
 -- Sched/backfill - Change default max_job_bf parameter from 50 to 100.
 -- Added -I|--item-extract option to sh5util to extract data item from series.
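    Illustrative sketch (not Slurm source code) of the RLIMIT_AS calculation
    described above. It assumes VsizeFactor is a percentage of the real memory
    limit, as the slurm.conf documentation describes it, and that a value of 0
    disables the limit; the function name and the megabyte unit are
    assumptions made for this example.

```python
def rlimit_as_bytes(mem_limit_mb, vsize_factor_pct):
    """Return an address-space limit in bytes, or None if disabled.

    mem_limit_mb     -- job's real memory limit in megabytes (assumed unit)
    vsize_factor_pct -- VsizeFactor, interpreted as a percentage
    """
    if vsize_factor_pct == 0:
        return None  # assumed meaning: no virtual memory limit enforced
    mem_limit_bytes = mem_limit_mb * 1024 * 1024
    return mem_limit_bytes * vsize_factor_pct // 100
```

    With a 500 MB memory limit and a factor of 101, the result is 529530880
    bytes, which is 505 MB.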
* Changes in Slurm 2.6.6
========================
 -- sched/backfill - Fix bug that could result in failing to reserve resources
    for high priority jobs.
 -- Correct job RunTime if requeued from suspended state.
 -- Reset job priority from zero (held) on manual resume from suspend state.
 -- If FastSchedule=0 then do not DOWN a node with low memory or disk size.
 -- Remove vestigial note.
 -- Update sshare.1 man page making it consistent with sacctmgr.1.
 -- Increase the PW_BUF_SIZE so the getgrnam_r() can process large user groups.
 -- Do not reset a job's priority when the slurmctld restarts if previously
    set to some specific value.
 -- sview - Fix regression where the Node tab wasn't able to add/remove columns.
* Changes in Slurm 2.6.5
========================
 -- Correction to hostlist parsing bug introduced in v2.6.4 for hostlists with
    more than one numeric range in brackets (e.g. "rack[0-3]_blade[0-63]").
 -- Add notification if using proctrack/cgroup and task/cgroup when an
    out-of-memory (OOM) condition occurs.
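    Illustrative sketch (not Slurm's hostlist parser) of what expanding a
    hostlist with more than one bracketed numeric range involves. It handles
    only simple "[low-high]" ranges, with no comma lists and no zero padding;
    the function name is hypothetical.

```python
import itertools
import re

# One bracketed range such as "[0-63]"; the two groups capture the bounds.
_RANGE_RE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_hostlist(expr):
    """Expand every [low-high] range in expr into individual host names."""
    # With two capture groups, split() yields:
    #   literal, low, high, literal, low, high, ..., literal
    parts = _RANGE_RE.split(expr)
    literals = parts[0::3]
    ranges = [range(int(low), int(high) + 1)
              for low, high in zip(parts[1::3], parts[2::3])]
    hosts = []
    # The first range varies slowest, the last range fastest.
    for combo in itertools.product(*ranges):
        name = literals[0]
        for number, literal in zip(combo, literals[1:]):
            name += str(number) + literal
        hosts.append(name)
    return hosts
```

    For "rack[0-3]_blade[0-63]" this produces 256 names, from "rack0_blade0"
    through "rack3_blade63".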