This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.

* Changes in Slurm 17.11.0pre2
==============================
 -- Initial work for heterogeneous job support (complete solution in v17.11):
    * Modified salloc, sbatch and srun commands to parse command line, job
      script and environment variables to recognize requests for heterogeneous
      jobs. Same commands also modified to set environment variables describing
      each component of the heterogeneous job.
    * Modified job allocate, batch job submit and job "will-run" requests to
      pass a list of job specifications and get a list of responses.
    * Modify slurmctld daemon to process a heterogeneous job request and create
      multiple job records as needed.
    * Added new fields to job record: pack_job_id, pack_job_offset and
      pack_job_set (set of job IDs). Added to slurmctld state save/restore
      logic and job information reported.
    * Display new job fields in "scontrol show job" output.
    * Modify squeue command to display heterogeneous job records using "#+#"
      format. The squeue --job=# output lists all components of a heterogeneous
      job.
    * Modify scancel logic to cancel all components of a heterogeneous job with
      a single request/RPC.
    * Configuration parameter DebugFlags value of "HeteroJobs" added.
    * Job requeue and suspend/resume modified to operate on all components of
      a heterogeneous job with a single request/RPC.
    * New web page added to describe heterogeneous jobs.
    * Descriptions of new API added to man pages.
    * Modified email notifications to only operate on the first job component.
    * Purge heterogeneous job records at the same time and not by individual
    * Modified logic for heterogeneous jobs submitted to multiple clusters
      ("--clusters=...") so the job will be routed to the cluster that is
      expected to start all components earliest.
    * Modified srun to create multiple job steps for heterogeneous job
      allocations.
    * Modified launch plugin to accept a pointer to job step options structure
      rather than work from a single/common data structure.
 -- Improve backfill scheduling algorithm with respect to starting jobs as soon
    as possible while avoiding advanced reservations.
 -- Work for heterogeneous job support (complete solution in v17.11):
    * Add pointer to job option structure to job_step_create_allocation()
      function.
    * Parallelize task launch for heterogeneous job allocations (initial work).
    * Make packjobid, packjoboffset, and packjobidset fields available in squeue
      output.
    * Modify smap command to display heterogeneous job records using "#+#"
      format.
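
The "#+#" display format mentioned above can be illustrated with a minimal
Python sketch. It assumes the two numbers are the pack_job_id and
pack_job_offset fields listed in this section; the HetJobComponent type and
format_het_job_id() helper are hypothetical names for illustration and are not
part of Slurm.

```python
from typing import NamedTuple


class HetJobComponent(NamedTuple):
    """Hypothetical stand-in for the new job record fields."""
    job_id: int
    pack_job_id: int      # assumed: ID shared by all components of the job
    pack_job_offset: int  # assumed: zero-based position of this component


def format_het_job_id(component: HetJobComponent) -> str:
    """Render a component in the "#+#" style: pack_job_id, "+", offset."""
    return f"{component.pack_job_id}+{component.pack_job_offset}"


components = [
    HetJobComponent(job_id=1000, pack_job_id=1000, pack_job_offset=0),
    HetJobComponent(job_id=1001, pack_job_id=1000, pack_job_offset=1),
]
labels = [format_het_job_id(c) for c in components]
print(labels)
```

Under these assumptions, each component keeps its own job ID while the
displayed label groups the components under one heterogeneous job.
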
* Changes in Slurm 17.11.0pre1
==============================
 -- Interpret all format options in output/error file to log prolog errors. Prior
    logic only supported "%j" (job ID) option.
 -- Add the configure option --with-shared-libslurm which will link to
    libslurm.so instead of libslurm.o thus reducing the footprint of all the
    binaries.
 -- In switch plugin, added plugin_id symbol to plugins and wrapped
    switch_jobinfo_t with dynamic_plugin_data_t in interface calls in
    order to pass switch information between clusters with different switch
    types.
 -- Switch naming of acct_gather_infiniband to acct_gather_interconnect.
 -- Make it so you can "stack" the interconnect plugins.
 -- Add a last_sched_eval timestamp to record when a job was last evaluated
    by the main scheduler or backfill.
 -- Add scancel "--hurry" option to avoid staging out any burst buffer data.
 -- Simplify the sched plugin interface.
 -- Add new advanced reservation flags of "weekday" (repeat on each weekday;
    Monday through Friday) and "weekend" (repeat on each weekend day; Saturday
    and Sunday).
 -- Add new advanced reservation flag of "flex", which permits jobs requesting
    the reservation to begin prior to the reservation's start time and use
    resources inside or outside of the reservation. A typical use case is to
    prevent jobs not explicitly requesting the reservation from using those
    reserved resources rather than forcing jobs requesting the reservation to
    use those resources in the time frame reserved.
 -- Add NoDecay flag to QOS.
 -- Node "OS" field expanded from "sysname" to "sysname release version" (e.g.
    change from "Linux" to
    "Linux 4.8.0-28-generic #28-Ubuntu SMP Sat Feb 8 09:15:00 UTC 2017").
 -- jobcomp/elasticsearch - Add "job_name" and "wc_key" fields to stored
    information.
 -- jobcomp/filetxt - Add ArrayJobId, ArrayTaskId, ReservationName, Gres,
    Account, QOS, WcKey, Cluster, SubmitTime, EligibleTime, DerivedExitCode and
    ExitCode.
 -- scontrol modified to report core IDs for reservation containing individual
    cores.
 -- MYSQL - Get rid of table join during rollup which speeds up the process
    dramatically on large job/step tables.
 -- Add ability to define features on clusters for directing federated jobs to
    different clusters.
 -- Add new RPC to process multiple federation RPCs in a single communication.
 -- Modify slurm_load_jobs() function to load job information from all clusters
    in a federation.
 -- Add squeue --local and --sibling options to modify filtering of jobs on
    federated clusters.
 -- Add SchedulerParameters option of bf_max_job_user_part to specify the
    maximum number of jobs per user for any single partition. This differs from
    bf_max_job_user in that a separate counter is applied to each partition
    rather than having a single counter per user applied to all partitions.
 -- Modify backfill logic so that bf_max_job_user, bf_max_job_part and
    bf_max_job_user_part options can all be used independently of each other.
 -- Add sprio -p/--partition option to filter jobs by partition name.
 -- Add partition name to job priority factor response message.
 -- Add sprio --local and --sibling options for use in federation of clusters.
 -- Add sprio "%c" format to print cluster name in federation mode.
 -- Modify sinfo logic to provide a unified view of all nodes and partitions
    in a federation, add --local option to only report local state information
    even in a cluster, print cluster name with "%V" format option, and
    optionally sort by cluster name.
 -- If a task in a parallel job fails and it was launched with the
    --kill-on-bad-exit option then terminate the remaining tasks using the
    SIGCONT, SIGTERM and SIGKILL signals rather than just sending SIGKILL.
 -- Include submit_time when doing the sort for job scheduling.
 -- Modify sacct to report all jobs in federation by default. Also add --local
    option.
 -- Modify sacct to accept "--cluster all" option (in addition to the old
    "--cluster -1", which is still accepted).
 -- Modify sreport to report all jobs in federation by default. Also add --local
    option.
 -- sched/backfill: Improve assoc_limit_stop configuration parameter support.
 -- KNL features: Always keep active and available features in the same order:
    first site-specific features, next MCDRAM modes, last NUMA modes.
 -- Changed default ProctrackType to cgroup.
 -- Add "cluster_name" field to node_info_t and partition_info_t data structure.
    It is filled in only when the cluster is part of a federation and
    SHOW_FEDERATION flag used.
 -- Functions slurm_load_node() and slurm_load_partitions() modified to show all
    nodes/partitions in a federation when the SHOW_FEDERATION flag is used.
 -- Add federated views to sview.
 -- Add --federation option to sacct, scontrol, sinfo, sprio, squeue, sreport to
    show a federated view. Will show local view by default.
 -- Add FederationParameters=fed_display slurm.conf option to configure status
    commands to display a federated view by default if the cluster is a member
    of a federation.
 -- Log the down nodes whenever slurmctld restarts.
 -- Report that "CPUs" plus "Boards" in node configuration is invalid only if the
    CPUs value is not equal to the total thread count.
 -- Extend the output of the seff utility to also include the job's wall-clock
    time.
 -- Add bf_max_time to SchedulerParameters.
 -- Add bf_max_job_assoc to SchedulerParameters.
 -- Add new SchedulerParameters option bf_window_linear to control the rate at
    which the backfill test window expands. This can be used on a system with
    a modest number of running jobs (hundreds of jobs) to help prevent expected
    start times of pending jobs from being pushed forward in time. On systems
    with large numbers of running jobs, performance of the backfill scheduler
    will suffer and fewer jobs will be evaluated.
 -- Improve scheduling logic with respect to license use and node reboots.
 -- CRAY - Alter algorithm to come up with the SLURM_ID_HASH.
 -- Implement federated scheduling and federated status outputs.
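
The difference between bf_max_job_user and bf_max_job_user_part noted in this
section (one counter per user versus a separate counter per user and
partition) can be sketched in Python as follows. This is only an illustration
of the counting rule as described above, not Slurm's backfill implementation;
the select_jobs() helper and the (user, partition) job tuples are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Job = Tuple[str, str]  # (user, partition); hypothetical minimal job record


def select_jobs(jobs: Iterable[Job], limit: int, per_partition: bool) -> List[Job]:
    """Keep jobs in queue order until the relevant counter reaches `limit`.

    per_partition=False: one counter per user across all partitions
                         (the bf_max_job_user style).
    per_partition=True:  one counter per (user, partition) pair
                         (the bf_max_job_user_part style).
    """
    counts: Counter = Counter()
    selected: List[Job] = []
    for user, partition in jobs:
        key = (user, partition) if per_partition else user
        if counts[key] < limit:
            counts[key] += 1
            selected.append((user, partition))
    return selected


queue = [
    ("alice", "debug"), ("alice", "debug"), ("alice", "batch"),
    ("bob", "batch"), ("alice", "batch"),
]
per_user = select_jobs(queue, limit=2, per_partition=False)
per_user_part = select_jobs(queue, limit=2, per_partition=True)
print(len(per_user), len(per_user_part))
```

With a limit of 2, the per-user counter stops considering alice's jobs after
her first two, while the per-partition counter allows two of her jobs in each
partition.
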
* Changes in Slurm 17.02.6
==========================
 -- Fix configurator.easy.html to output the SelectTypeParameters line.
* Changes in Slurm 17.02.5
==========================
 -- Prevent segfault if a job was blocked from running by a QOS that is then
    deleted.
 -- Improve selection of jobs to preempt when there are multiple partitions
    with jobs subject to preemption.
 -- Only set kmem limit when ConstrainKmemSpace=yes is set in cgroup.conf.
 -- Fix bug in task/affinity that could result in slurmd fatal error.
 -- Increase number of jobs that are tracked in the slurmd as finishing at one
    time.
 -- Note when a job finishes in the slurmd to avoid a race when launching a
    batch job takes longer than it takes to finish.
 -- Improve slurmd startup on large systems (> 10000 nodes).
 -- Add LaunchParameters option of cray_net_exclusive to control whether all
    jobs on the cluster have exclusive access to their assigned nodes.
 -- Make sure srun inside an allocation gets --ntasks-per-[core|socket]
    set correctly.
 -- Only make the extern step at job creation.
 -- Fix for job step task layout with --cpus-per-task option.
 -- Fix --ntasks-per-core option/environment variable parsing to set
    the requested value, instead of always setting one (srun).
 -- Correct error message when ClusterName in configuration files does not match
    the name in the slurmctld daemon's state save file.
 -- Better checking when a job is finishing to avoid underflow on jobs
    submitted to a QOS/association.
 -- Handle partition QOS submit limits correctly when a job is submitted to
    more than 1 partition or when the partition is changed with scontrol.
 -- Performance boost for when Slurm is dealing with credentials.
 -- Fix race condition which could leave a stepd hung on shutdown.
 -- Add lua support for opensuse.
* Changes in Slurm 17.02.4
==========================
 -- Do not attempt to schedule jobs after changing the power cap if there are
    already many active threads.
 -- Job expansion example in FAQ enhanced to demonstrate operation in
    heterogeneous environments.
 -- Prevent scontrol crash when operating on array and no-array jobs at once.
 -- knl_cray plugin: Log incomplete capmc output for a node.
 -- knl_cray plugin: Change capmc parsing of mcdram_pct from string to number.
 -- Remove log files from test20.12.
 -- When rebooting a node and using the PrologFlags=alloc make sure the
    prolog is run after the reboot.
 -- node_features/knl_generic - If a node is rebooted for a pending job, but
    fails to enter the desired NUMA and/or MCDRAM mode then drain the node and