This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.

* Changes in Slurm 17.11.0pre3
==============================
 -- Added the following jobcomp/script environment variables: CLUSTER,
    DEPENDENCY, DERIVED_EC, EXITCODE, GROUPNAME, QOS, RESERVATION, USERNAME.
    The format of LIMIT (job time limit) has been modified to D-HH:MM:SS.
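    Example jobcomp/script consumer (a minimal sketch; the script path is
    whatever JobCompLoc points at, and the log file name is hypothetical):
        #!/bin/sh
        # Invoked by jobcomp/script at job completion; log the new variables.
        echo "cluster=$CLUSTER user=$USERNAME group=$GROUPNAME qos=$QOS" \
             "exit=$EXITCODE derived=$DERIVED_EC limit=$LIMIT" \
             >> /var/log/jobcomp.log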
 -- Fix QOS usage factor applying to individual TRES run minute usage.
 -- Print numbers using exponential format if required to fit in allocated
    field width. The sacctmgr and sshare commands are impacted.
 -- Make it so a backup DBD doesn't attempt to create database tables and
    relies on the primary to do so.
 -- By default have Slurm dynamically link to libslurm.so instead of static
    linking.  If static linking is desired configure with
    --without-shared-libslurm.
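    Example, restoring the previous static-linking behavior at build time (a
    sketch of the option named above):
        ./configure --without-shared-libslurm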
 -- Change --workdir in sbatch to be --chdir as in all other commands (salloc,
    srun).
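    Example (directory and script names are illustrative):
        # Formerly: sbatch --workdir=/scratch/run1 job.sh
        sbatch --chdir=/scratch/run1 job.sh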
 -- Add WorkDir to the job record in the database.
 -- Make the UsageFactor of a QOS work when the QOS has the NoDecay flag.
 -- Add MaxQueryTimeRange option to slurmdbd.conf to limit accounting query
    ranges when fetching job records.
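    Example slurmdbd.conf entry (a sketch; the 60-day limit is illustrative
    and the value is assumed to use Slurm's usual days-HH:MM:SS time format):
        MaxQueryTimeRange=60-00:00:00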
 -- Add LaunchParameters=batch_step_set_cpu_freq to allow the setting of the cpu
    frequency on the batch step.
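    Example slurm.conf entry (a sketch; the frequency shown with sbatch is an
    illustrative value in kHz):
        LaunchParameters=batch_step_set_cpu_freq
        # With this set, e.g. 'sbatch --cpu-freq=2400000 job.sh' applies the
        # requested frequency to the batch step itself.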
 -- CRAY - Fix applications statically linked to CRAY's PMI.
 -- Fix - Raise an error back to the user when trying to update currently
    unsupported core-based reservations.
 -- Do not print TmpDisk space as part of 'slurmd -C' line.
 -- Fix to test MaxMemPerCPU/Node partition limits when scheduling; previously
    these limits were only checked at submit time.
 -- Work for heterogeneous job support (complete solution in v17.11; an
    illustrative launch follows this list):
    * Set SLURM_PROCID environment variable to reflect global task rank (needed
      by MPI).
    * Set SLURM_NTASKS environment variable to reflect global task count (needed
      by MPI).
    * In srun, if only some steps are allocated and one step allocation fails,
      then delete all allocated steps.
    * Get SPANK plugins working with heterogeneous jobs. The
      spank_init_post_opt() function is executed once per job component.
    * Modify sbcast command and srun's --bcast option to support heterogeneous
      jobs.
    * Set more environment variables for MPI: SLURM_GTIDS and SLURM_NODEID.
    * Prevent a heterogeneous job allocation from including the same nodes in
      multiple components (required by MPI jobs spanning components).
    * Modify step create logic so that all components of a heterogeneous job
      launched by a single srun command have the same step ID value.
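    Illustrative heterogeneous job launch (a sketch; task counts and
    application names are arbitrary):
        srun -n2 ./app_a : -n4 ./app_b
        # One job with two components sharing a single step ID;
        # SLURM_NTASKS=6 and SLURM_PROCID spans 0-5 globally, as MPI needs.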
 -- Modify output of "--mpi=list" to avoid duplicates for version numbers in
    mpi/pmix plugin names.

* Changes in Slurm 17.11.0pre2
==============================
 -- Initial work for heterogeneous job support (complete solution in v17.11):
    * Modified salloc, sbatch and srun commands to parse command line, job
      script and environment variables to recognize requests for heterogeneous
      jobs. Same commands also modified to set environment variables describing
      each component of the heterogeneous job.
    * Modified job allocate, batch job submit and job "will-run" requests to
      pass a list of job specifications and get a list of responses.
    * Modify slurmctld daemon to process a heterogeneous job request and create
      multiple job records as needed.
    * Added new fields to job record: pack_job_id, pack_job_offset and
      pack_job_set (set of job IDs). Added to slurmctld state save/restore
      logic and job information reported.
    * Display new job fields in "scontrol show job" output.
    * Modify squeue command to display heterogeneous job records using "#+#"
      format. The squeue --job=# output lists all components of a heterogeneous
      job.
    * Modify scancel logic to cancel all components of a heterogeneous job with
      a single request/RPC.
    * Configuration parameter DebugFlags value of "HeteroJobs" added.
    * Job requeue and suspend/resume modified to operate on all components of
      a heterogeneous job with a single request/RPC.
    * New web page added to describe heterogeneous jobs.
    * Descriptions of new API added to man pages.
    * Modified email notifications to only operate on the first job component.
    * Purge heterogeneous job records at the same time and not by individual
      job component.
    * Modified logic for heterogeneous jobs submitted to multiple clusters
      ("--clusters=...") so the job will be routed to the cluster that is
      expected to start all components earliest.
    * Modified srun to create multiple job steps for heterogeneous job
      allocations.
    * Modified launch plugin to accept a pointer to job step options structure
      rather than work from a single/common data structure.
 -- Improve backfill scheduling algorithm with respect to starting jobs as soon
    as possible while avoiding advanced reservations.
 -- Add URG as an option to 'scancel --signal'.
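    Example (job ID illustrative):
        scancel --signal=URG 12345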
 -- Check that the buffer returned from slurm_persist_msg_pack() is not NULL.
 -- Modify all daemons to re-open log files on receipt of SIGUSR2 signal. This
    is much lighter-weight than using SIGHUP to re-read the configuration file
    and rebuild various tables.
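    Example, typically run from a logrotate postrotate script (pidof usage is
    illustrative):
        kill -USR2 $(pidof slurmctld)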
 -- Add PrivateData=events configuration parameter.
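    Example slurm.conf entry (the additional values shown are pre-existing
    PrivateData options, included only for illustration):
        PrivateData=events,jobs,reservations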
 -- Work for heterogeneous job support (complete solution in v17.11):
    * Add pointer to job option structure to job_step_create_allocation()
      function used by srun.
    * Parallelize task launch for heterogeneous job allocations (initial work).
    * Make packjobid, packjoboffset, and packjobidset fields available in squeue
      output.
    * Modify smap command to display heterogeneous job records using "#+#"
      format.
    * Add srun --pack-group and --mpi-combine options to control job step
      launch behaviour (not fully implemented).
    * Add pack job component ID to srun --label output (e.g. "P0 1:" for
      job component 0 and task 1).
    * jobcomp/elasticsearch: Add pack_job_id and pack_job_offset fields.
    * sview: Modified to display pack job information.
    * Major re-write of task state container logic to support a list of
      containers rather than one container per srun command.
    * Add some regression tests.
    * Add srun pack job environment variables when performing job allocation.
 -- Set Reason=dependency over Reason=JobArrayTaskLimit for pending jobs.
 -- Add slurm.conf configuration parameters SlurmctldSyslogDebug and
    SlurmdSyslogDebug to control which messages from the slurmctld and slurmd
    daemons get written to syslog.
 -- Add slurmdbd.conf configuration parameter DebugLevelSyslog to control which
    messages from the slurmdbd daemon get written to syslog.
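    Example entries (a sketch; the levels shown are illustrative and assumed
    to take the usual Slurm debug level names):
        # slurm.conf
        SlurmctldSyslogDebug=error
        SlurmdSyslogDebug=info
        # slurmdbd.conf
        DebugLevelSyslog=error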
 -- Fix handling of GroupUpdateForce option.
 -- Work for heterogeneous job support (complete solution in v17.11):
    * Add support to sched/backfill for concurrent allocation of all pack job
      components including support of --time-min option.
    * Defer initiation of a heterogeneous job until all of its components can
      be started at the same time, taking into consideration association and
      QOS limits
      for the job as a whole.
    * Perform limit check on heterogeneous job as a whole at submit time to
      reject jobs that will never be able to run.
    * Add pack_job_id and pack_job_offset to accounting database.
    * Modified sacct and sstat to accept pack job ID specification using
      "#+#" notation (see the example below).
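    Example queries for component 1 of pack job 1234 (IDs illustrative):
        sacct -j 1234+1
        sstat -j 1234+1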
 -- Clear a job's "wait reason" value of "BeginTime" after that time has
    passed. Previously a reason of "BeginTime" could be reported long after the
    job's requested begin time had passed.
 -- Split group_info in slurm_ctl_conf_t into group_force and group_time.
 -- Work for heterogeneous job support (complete solution in v17.11):
    * Fix I/O race condition on step termination for srun launching multiple
      pack job groups.
    * If prolog is running when attempting to signal a step, then return EAGAIN
      and retry rather than simply returning SLURM_ERROR and aborting.
    * Modify launch/slurm plugin to signal all components of a pack job rather
      than just the one (modify to use a list of step context records).
    * Add logic to support srun --mpi-combine option.
    * Set up debugger data structures.
    * Disable cancellation of individual component while the job is pending.
    * Modify scontrol job hold/release and update to operate with heterogeneous
      job id specification (e.g. "scontrol hold 123+4").
    * If srun lacks an application specification for some component, the next
      one specified will be used for the earlier components.

* Changes in Slurm 17.11.0pre1
==============================
 -- Interpret all format options in the output/error file names used to log
    prolog errors. Prior logic only supported the "%j" (job ID) option.
 -- Add the configure option --with-shared-libslurm which will link to
    libslurm.so instead of libslurm.o thus reducing the footprint of all the
    binaries.
 -- In switch plugin, added plugin_id symbol to plugins and wrapped
    switch_jobinfo_t with dynamic_plugin_data_t in interface calls in
    order to pass switch information between clusters with different switch
    types.
 -- Switch naming of acct_gather_infiniband to acct_gather_interconnect.
 -- Make it so you can "stack" the interconnect plugins.
 -- Add a last_sched_eval timestamp to record when a job was last evaluated
    by the main scheduler or backfill.
 -- Add scancel "--hurry" option to avoid staging out any burst buffer data.
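    Example, cancelling a job without staging out its burst buffer data (job
    ID illustrative):
        scancel --hurry 12345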
 -- Simplify the sched plugin interface.
 -- Add new advanced reservation flags of "weekday" (repeat on each weekday;
    Monday through Friday) and "weekend" (repeat on each weekend day; Saturday
    and Sunday).
 -- Add new advanced reservation flag of "flex", which permits jobs requesting
    the reservation to begin prior to the reservation's start time and use
    resources inside or outside of the reservation. A typical use case is to
    prevent jobs not explicitly requesting the reservation from using those
    reserved resources rather than forcing jobs requesting the reservation to
    use those resources in the time frame reserved.
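    Illustrative reservation using the new flags (name, time, size and user
    are arbitrary):
        scontrol create reservation reservationname=weekend_maint users=root \
            starttime=2017-12-02T08:00:00 duration=120 nodecnt=4 \
            flags=weekend,flex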
 -- Add NoDecay flag to QOS.
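    Example (QOS name illustrative):
        sacctmgr modify qos normal set flags=NoDecay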
 -- Node "OS" field expanded from "sysname" to "sysname release version" (e.g.
    change from "Linux" to
    "Linux 4.8.0-28-generic #28-Ubuntu SMP Sat Feb 8 09:15:00 UTC 2017").
 -- jobcomp/elasticsearch - Add "job_name" and "wc_key" fields to stored
    information.
 -- jobcomp/filetxt - Add ArrayJobId, ArrayTaskId, ReservationName, Gres,
    Account, QOS, WcKey, Cluster, SubmitTime, EligibleTime, DerivedExitCode and
    ExitCode.
 -- scontrol modified to report core IDs for reservations containing
    individual cores.
 -- MYSQL - Get rid of table join during rollup which speeds up the process
    dramatically on large job/step tables.
 -- Add ability to define features on clusters for directing federated jobs to
    different clusters.
 -- Add new RPC to process multiple federation RPCs in a single communication.
 -- Modify slurm_load_jobs() function to load job information from all clusters
    in a federation.
 -- Add squeue --local and --sibling options to modify filtering of jobs on
    federated clusters.
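    Example (a sketch):
        squeue --local      # show only jobs on the local cluster
        squeue --sibling    # also list federated sibling job records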
 -- Add SchedulerParameters option of bf_max_job_user_part to specify the
    maximum number of jobs per user for any single partition. This differs from
    bf_max_job_user in that a separate counter is applied to each partition
    rather than having a single counter per user applied to all partitions.
 -- Modify backfill logic so that bf_max_job_user, bf_max_job_part and
    bf_max_job_user_part options can all be used independently of each other.
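    Example slurm.conf entry combining the three limits (values illustrative):
        SchedulerParameters=bf_max_job_user=20,bf_max_job_part=50,bf_max_job_user_part=10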
 -- Add sprio -p/--partition option to filter jobs by partition name.
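    Example (partition name illustrative):
        sprio --partition=debug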