This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and admins.
* Changes in Slurm 14.03.1
==========================
 -- Add support for job std_in, std_out and std_err fields in Perl API.
 -- Add "Scheduling Configuration Guide" web page.
 -- BGQ - fix check for jobinfo when it is NULL
 -- Do not check cleaning on "pending" steps.
 -- task/cgroup plugin - Fix for building on older hwloc (v1.0.2).
 -- By default, the PMI implementation does not check for duplicate keys.
    Set the SLURM_PMI_KVS_DUP_KEYS environment variable if you want the code to
    check for duplicate keys.
 -- Add job submission time to squeue.
 -- Permit user root to propagate resource limits higher than the hard limit
    that slurmd has on that compute node (i.e. raise both current and maximum
    limits).
 -- Fix issue with license used count when doing an scontrol reconfig.
 -- Fix the PMI iterator to not report duplicated keys.
 -- Fix issue with sinfo when -o is used without the %P option.
 -- Rather than immediately invoking an execution of the scheduling logic on
    every event type that can enable the execution of a new job, queue its
    execution. This permits faster execution of some operations, such as
    modifying large counts of jobs, by executing the scheduling logic less
    frequently, but still in a timely fashion.
 -- If an environment variable's length exceeds MAX_ENV_STRLEN, do not set it
    in the job environment; otherwise the exec() call fails.
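The queued-scheduling change in this release can be pictured as event coalescing. The sketch below is a generic illustration, not slurmctld's code: the class DeferredScheduler, its queue()/tick() methods and the min_interval parameter are all hypothetical. Each event only sets a flag, and a periodic tick runs the expensive scheduling pass at most once per interval, however many events arrived in between.

```python
class DeferredScheduler:
    """Hypothetical sketch: coalesce many scheduling triggers into few runs."""

    def __init__(self, run_scheduler, min_interval):
        self.run_scheduler = run_scheduler  # callable doing the expensive pass
        self.min_interval = min_interval    # seconds between scheduler runs
        self.pending = False                # has any event requested a run?
        self.last_run = float("-inf")       # time of the previous run

    def queue(self):
        """Called on every event that might let a job start; O(1), no scheduling."""
        self.pending = True

    def tick(self, now):
        """Called periodically; run the scheduler once if requested and due."""
        if self.pending and now - self.last_run >= self.min_interval:
            self.pending = False
            self.last_run = now
            self.run_scheduler()
            return True
        return False


if __name__ == "__main__":
    runs = []
    sched = DeferredScheduler(lambda: runs.append("run"), min_interval=1.0)
    for _ in range(1000):   # e.g. modifying a large number of jobs
        sched.queue()
    sched.tick(now=0.0)
    print(len(runs))        # 1: a thousand events, one scheduling pass
```

The trade-off is a bounded delay (at most min_interval) before a newly runnable job is considered, in exchange for not rescanning the queue on every single event.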
* Changes in Slurm 14.03.0
==========================
 -- job_submit/lua: Fix invalid memory reference if script returns error message
    for user.
 -- Add logic to sleep and retry if slurm.conf can't be read.
 -- Reset a node's CpuLoad value at least once each SlurmdTimeout seconds.
 -- Scheduler enhancements for reservations: When a job needs to run in a
    reservation but cannot due to busy resources, do not block all jobs in
    that partition from being scheduled, only the jobs in that reservation.
 -- Export "SLURM*" environment variables from sbatch even if --export=NONE.
 -- When recovering node state, if the Slurm version is 2.6 or 2.5, set the
    protocol version to SLURM_2_5_PROTOCOL_VERSION, which is the minimum
    supported version.
 -- Update the scancel man page documenting the -s option.
 -- Update sacctmgr man page documenting how to modify an account's QOS.
 -- Fix for sjstat which currently does not print >1TB memory values correctly.
 -- Change xmalloc()/xfree() to malloc()/free() in hostlist.c for better
    performance.
 -- Update squeue.1 man page describing the SPECIAL_EXIT state.
 -- Added scontrol option errnumstr to return the error message for a given
    Slurm error number.
 -- If srun is invoked with the --multi-prog option but no task count, then use
    the task count provided in the MPMD configuration file.
 -- Prevent sview abort on some systems when adding or removing columns to the
    display for nodes, jobs, partitions, etc.
 -- Add job array hash table for improved performance.
 -- Make AccountingStorageEnforce=all not include nojobs or nosteps.
 -- Added sacctmgr mod qos set RawUsage=0.
 -- Modify hostlist functions to accept more than two numeric ranges (e.g.
    "row[1-3]rack[0-8]slot[0-63]")
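A hostlist expression with several numeric ranges, such as "row[1-3]rack[0-8]slot[0-63]", expands to the Cartesian product of its bracketed ranges. The sketch below is a simplified, hypothetical expander (expand_hostlist is not a Slurm API): it handles comma lists and zero-padded ranges inside brackets, but not top-level comma-separated host lists or any other hostlist syntax.

```python
import itertools
import re

# Matches one bracketed range such as "[1-3]" or "[01-04,7]".
_BRACKET = re.compile(r"\[([^\]]+)\]")


def _expand_range(spec):
    """Expand '0-3,7' to ['0', '1', '2', '3', '7'], keeping zero padding."""
    values = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            width = len(lo)  # "01-10" -> pad to two digits
            values.extend(str(i).zfill(width) for i in range(int(lo), int(hi) + 1))
        else:
            values.append(part)
    return values


def expand_hostlist(expr):
    """Expand a single host expression containing any number of ranges."""
    # With a capturing group, re.split alternates literal text (even indices)
    # and bracket contents (odd indices).
    pieces = _BRACKET.split(expr)
    choices = [[p] if i % 2 == 0 else _expand_range(p) for i, p in enumerate(pieces)]
    return ["".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    hosts = expand_hostlist("row[1-3]rack[0-8]slot[0-63]")
    print(len(hosts), hosts[0], hosts[-1])  # 1728 row1rack0slot0 row3rack8slot63
```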
* Changes in Slurm 14.03.0rc1
==============================
 -- Fixed typos in srun_cr man page.
 -- Run job scheduling logic immediately when nodes enter service.
 -- Added sbatch '--parsable' option to output only the job id number and the
    cluster name separated by a semicolon. Errors will still be displayed.
 -- Added failure management "slurmctld/nonstop" plugin.
 -- Prevent jobs being killed when a checkpoint plugin is enabled or disabled.
 -- Update the documentation about SLURM_PMI_KVS_NO_DUP_KEYS environment
    variable.
 -- select/cons_res bug fix for range of node counts with --cpus-per-task
    option (e.g. "srun -N2-3 -c2 hostname" would allocate 2 CPUs on the first
    node and 0 CPUs on the second node).
 -- Change reservation flags field from 16 to 32 bits.
 -- Add reservation flag value of "FIRST_CORES".
 -- Added the concept of Resources to the database, providing a framework for
    handling license servers outside of Slurm.
 -- When starting the slurmctld only send past job/node state information to
    accounting if running for the first time (should speed up startup
    dramatically on systems with lots of nodes or lots of jobs).
 -- Compile and run on FreeBSD 8.4.
 -- Make job array expressions more flexible to accept multiple step counts in
    the expression (e.g. "--array=1-10:2,50-60:5,123").
 -- switch/cray - add state save/restore logic tracking allocated ports.
 -- SchedulerParameters - Replace max_job_bf with bf_max_job_start (both will
    work for now).
 -- Add SchedulerParameters options of preempt_reorder_count and
    preempt_strict_order.
 -- Make memory types in acct_gather uint64_t to handle systems with more than
    4TB of memory on them.
 -- BGQ - Add --export=NONE option to srun so that only the SLURM_JOB_ID and
    SLURM_STEP_ID environment variables are set.
 -- Munge plugins - Add sleep between retries if unable to connect to socket.
 -- Added DebugFlags value of "License".
 -- Added --enable-developer option, which adds -Werror when compiling.
 -- Fix for job request with GRES count of zero.
 -- Fix a potential memory leak in hostlist.
 -- Job array dependency logic: Cache results for major performance improvement.
 -- Modify squeue to support filter on job states Special_Exit and Resizing.
 -- Defer purging job record until after EpilogSlurmctld completes.
 -- Add -j option for jobid to sbcast.
 -- Fix handling RPCs from a 14.03 slurmctld to a 2.6 slurmd
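A job array expression with multiple step counts, such as "--array=1-10:2,50-60:5,123", is a comma-separated list of single IDs and ranges with an optional ":step". The sketch below expands such an expression into task IDs; parse_array_expr is a hypothetical helper for illustration and covers only that subset of the syntax.

```python
def parse_array_expr(expr):
    """Expand e.g. '1-10:2,50-60:5,123' into a sorted list of array task IDs."""
    task_ids = set()
    for part in expr.split(","):
        rng, _, step_text = part.partition(":")
        step = int(step_text) if step_text else 1
        if "-" in rng:
            lo, hi = (int(x) for x in rng.split("-", 1))
        else:
            lo = hi = int(rng)
        if step < 1 or lo > hi:
            raise ValueError(f"invalid array range: {part!r}")
        task_ids.update(range(lo, hi + 1, step))
    return sorted(task_ids)


if __name__ == "__main__":
    print(parse_array_expr("1-10:2,50-60:5,123"))
    # [1, 3, 5, 7, 9, 50, 55, 60, 123]
```

Using a set means overlapping ranges (e.g. "1-5,3-7") yield each task ID once.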
* Changes in Slurm 14.03.0pre6
==============================
 -- Modify slurmstepd to log messages according to the LogTimeFormat
    parameter in slurm.conf.
 -- Ensure that overlapping reservations do not oversubscribe available
    licenses.
 -- Added core specialization logic to select/cons_res plugin.
 -- Added whole_node field to job_resources structure and enable gang scheduling
    for jobs with core specialization.
 -- When using FastSchedule=1, nodes with fewer resources than configured are
    no longer set DOWN; they are set to DRAIN instead.
 -- Modified 'sacctmgr show associations' command to show GrpCPURunMins
    by default.
 -- Replace the hostlist_push() function with a more efficient
    hostlist_push_host().
 -- Modify the reading of Lustre file system statistics to print more
    information when debugging and when I/O errors occur.
 -- Add specialized core count field to job credential data.
    NOTE: This changes the communications protocol from other pre-releases of
    version 14.03. All programs must be cancelled and daemons upgraded from
    previous pre-releases of version 14.03. Upgrades from version 2.6 or earlier
    can take place without loss of jobs.
 -- Add version number to node and front-end configuration information visible
    using the scontrol tool.
 -- Add idea of a RESERVED flag for node state so idle resources are marked
    not "idle" when in a reservation.
 -- Added core specialization plugin infrastructure.
 -- Added new job_submit/throttle plugin to control the rate at which a user
    can submit jobs.
 -- CRAY - added network performance counters option.
 -- Allow scontrol suspend/resume to accept jobid in the format jobid_taskid
    to suspend/resume array elements.
 -- In the slurmctld job record, split "shared" variable into "share_res" (share
    resource) and "whole_node" fields.
 -- Fix the format of SLURM_STEP_RESV_PORTS. It was generated incorrectly
    when using the hostlist_push_host function and input surrounded by [].
 -- Modify the srun --slurmd-debug option to accept debug string tags
    (quiet, fatal, error, info, verbose) in addition to numerical values.
 -- Fix the bug where --cpu_bind=map_cpu is interpreted as mask_cpu.
 -- Update the documentation regarding the state of CPU frequencies after
    a step using --cpu-freq completes.
 -- CRAY - Fix issue when a job is requeued and nhc is still running as it is
    being scheduled to run again.  This would erase the previous job info
    that was still needed to clean up the nodes from the previous job run.
    (Bug 526).
 -- Set SLURM_JOB_PARTITION environment variable for all job allocations.
 -- Set SLURM_JOB_PARTITION environment variable for Prolog program.
 -- Added SchedulerParameters option of partition_job_depth to limit scheduling
    logic depth by partition.
 -- Handle the case in which errno is not reset to 0 after calling
    getgrent_r(), which causes the controller to core dump.
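The jobid_taskid form now accepted by scontrol suspend/resume (e.g. "1234_7" for task 7 of array job 1234) can be split as sketched below. parse_job_ref is a hypothetical helper for illustration, not part of Slurm; it accepts only a plain numeric job ID or one numeric task ID after a single underscore.

```python
def parse_job_ref(text):
    """Split 'jobid' or 'jobid_taskid' into (job_id, task_id or None)."""
    job, sep, task = text.partition("_")
    if not job.isdigit() or (sep and not task.isdigit()):
        raise ValueError(f"not a job reference: {text!r}")
    return int(job), (int(task) if sep else None)


if __name__ == "__main__":
    print(parse_job_ref("1234_7"))  # (1234, 7)
    print(parse_job_ref("1234"))    # (1234, None)
```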
* Changes in Slurm 14.03.0pre5
==============================
 -- Added squeue format option of "%X" (core specialization count).
 -- Added core specialization web page (just a start for now).
 -- Added the SLURM_ARRAY_JOB_ID and SLURM_ARRAY_TASK_ID environment variables
    to the EpilogSlurmctld environment.
 -- Fix bug in job step allocation failing due to memory limit.
 -- Modify the pbsnodes script to reflect its output on a TORQUE system.
 -- Add ability to clear a node's DRAIN flag using scontrol or sview by setting
    its state to "UNDRAIN". The node's base state (e.g. "DOWN" or "IDLE") will
    not be changed.
 -- Modify the output of 'scontrol show partition' by displaying
    DefMemPerCPU=UNLIMITED and MaxMemPerCPU=UNLIMITED when these limits are
    configured as 0.
 -- mpirun-mic - Major re-write of the command wrapper for Xeon Phi use.
 -- Add new configuration parameter of AuthInfo to specify port used by
    authentication plugin.
 -- Fixed conditional RPM compiling.
 -- Corrected slurmstepd ident name when logging to syslog.
 -- Fixed sh5util loop when there are no node-step files.
 -- Add SLURM_CLUSTER_NAME to environment variables passed to PrologSlurmctld,
    Prolog, EpilogSlurmctld, and Epilog
 -- Add the idea of running a prolog right when an allocation happens
    instead of when running on the node for the first time.
 -- If a user runs 'scontrol reconfig' but the hostnames or host count change,
    the slurmctld throws a fatal error.
 -- gres.conf - Add "NodeName" specification so that a single gres.conf file
    can be used for a heterogeneous cluster.
 -- Add flag to accounting RPC to indicate if job data is packed or not.
 -- After all srun tasks have terminated on a node close the stdout/stderr
    channel with the slurmstepd on that node.
 -- In case of an I/O error with slurmstepd, log an error message and abort
    the job.
 -- Add --test-only option to sbatch command to validate the script and options.
    The response includes expected start time and resources to be allocated.
* Changes in Slurm 14.03.0pre4
==============================
 -- Remove the ThreadID documentation from slurm.conf. This functionality has
    been made obsolete by the LogTimeFormat parameter.
 -- Sched plugins - rename global and plugin function names for consistency
    with other plugin types.
 -- BGQ - Added RebootQOSList option to bluegene.conf to allow an implicit
    reboot of a block if only jobs in the list are running on it when cnodes
    go into a failure state.