This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.

* Changes in Slurm 17.11.0rc3
==============================
 -- Fix extern step to wait until launched before allowing job to start.
 -- Add missing locks around figuring out TRES when clean starting the
    slurmctld.
 -- Cray modulefile: avoid removing /usr/bin from path on module unload.
 -- Make reoccurring reservations show up in the database.
 -- Adjust related resources (cpus, tasks, gres, mem, etc.) when updating
    NumNodes with scontrol.
 -- Don't initialize MPI plugins for batch or extern steps.
 -- slurm.spec - do not install a slurm.conf file under /etc/ld.so.conf.d.
 -- X11 forwarding - fix keepalive message generation code.
 -- If heterogeneous job step is unable to acquire MPI reserved ports then
    avoid referencing NULL pointer. Retry assigning ports ONLY for
    non-heterogeneous job steps.
 -- If any acct_gather_*_init fails, treat it as fatal instead of logging an
    error and continuing.
 -- launch/slurm plugin - Avoid using global variable for heterogeneous job
    steps, which could corrupt memory.

* Changes in Slurm 17.11.0rc2
==============================
 -- Prevent slurmctld abort with NodeFeatures=knl_cray and non-KNL nodes lacking
    any configured features.
 -- The --cpu_bind and --mem_bind options have been renamed to --cpu-bind
    and --mem-bind for consistency with the rest of Slurm's options. Both
    old and new syntaxes are supported for now.
 -- Add slurmdb_connection_commit to the slurmdb api to commit when needed.
 -- Add the federation APIs to the slurmdb.h file.
 -- Add job functions to the db_api.
 -- Fix sacct to always use the db_api instead of sometimes calling functions
    directly.
 -- Fix sacctmgr to always use the db_api instead of sometimes calling functions
    directly.
 -- Fix sreport to always use the db_api instead of sometimes calling functions
    directly.
 -- Make the uid global in the db_api to minimize calls to getuid().
 -- Add support for HWLOC version 2.0.
 -- Added more validation logic for updates to node features.
 -- Added node_features_p_node_update_valid() function to node_features plugin.
 -- If a job is held due to bad constraints and a node's features change then
    test the job again to see if it can run with the new features.
 -- Added node_features_p_changible_feature() function to node_features plugin.
 -- Avoid rebooting a node if a job's requested feature is not under the control
    of the node_features plugin and is not currently active.
 -- node_features/knl_generic plugin: Do not clear a node's non-KNL features
    specified in slurm.conf.
 -- Added SchedulerParameters configuration option "disable_hetero_steps" to
    disable job steps that span multiple components of a heterogeneous job.
    Disabled by default except with mpi/none plugin. This limitation to be
    removed in Slurm version 18.08.
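As an illustration of the --cpu_bind/--mem_bind rename noted above, a wrapper
script that still builds command lines with the old spellings could translate
them before invoking Slurm. The Python sketch below is only an example: the
helper name modernize_args and the mapping table are hypothetical and not part
of Slurm, and because both syntaxes are accepted for now, no such translation
is required.

```python
# Hypothetical helper for site wrapper scripts; not part of Slurm.
# Maps the old underscore spellings to the new hyphenated ones.
RENAMED_OPTIONS = {
    "--cpu_bind": "--cpu-bind",
    "--mem_bind": "--mem-bind",
}


def modernize_args(args):
    """Return a copy of args with old option spellings replaced."""
    updated = []
    for arg in args:
        # Handle both "--opt=value" and a bare "--opt" followed by its value.
        name, sep, value = arg.partition("=")
        updated.append(RENAMED_OPTIONS.get(name, name) + sep + value)
    return updated


print(modernize_args(["srun", "--cpu_bind=cores", "--mem-bind=local", "./app"]))
# prints ['srun', '--cpu-bind=cores', '--mem-bind=local', './app']
```

Arguments that are not in the mapping pass through unchanged, so the helper
can be applied to a whole command line.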

* Changes in Slurm 17.11.0rc1
==============================
 -- Added the following jobcomp/script environment variables: CLUSTER,
    DEPENDENCY, DERIVED_EC, EXITCODE, GROUPNAME, QOS, RESERVATION, USERNAME.
    The format of LIMIT (job time limit) has been modified to D-HH:MM:SS.
 -- Fix QOS usage factor applying to individual TRES run minute usage.
 -- Print numbers using exponential format if required to fit in allocated
    field width. The sacctmgr and sshare commands are impacted.
 -- Make it so a backup DBD doesn't attempt to create database tables and
    relies on the primary to do so.
 -- By default have Slurm dynamically link to libslurm.so instead of static
    linking.  If static linking is desired configure with
    --without-shared-libslurm.
 -- Change --workdir in sbatch to be --chdir as in all other commands (salloc,
    srun).
 -- Add WorkDir to the job record in the database.
 -- Make the UsageFactor of a QOS work when the QOS has the nodecay flag.
 -- Add MaxQueryTimeRange option to slurmdbd.conf to limit accounting query
    ranges when fetching job records.
 -- Add LaunchParameters=batch_step_set_cpu_freq to allow the setting of the cpu
    frequency on the batch step.
 -- CRAY - Fix statically linked applications to CRAY's PMI.
 -- Fix - Raise an error back to the user when trying to update currently
    unsupported core-based reservations.
 -- Do not print TmpDisk space as part of 'slurmd -C' line.
 -- Fix to test MaxMemPerCPU/Node partition limits when scheduling, previously
    only checked on submit.
 -- Work for heterogeneous job support (complete solution in v17.11):
    * Set SLURM_PROCID environment variable to reflect global task rank (needed
      by MPI).
    * Set SLURM_NTASKS environment variable to reflect global task count (needed
      by MPI).
    * In srun, if only some steps are allocated and one step allocation fails,
      then delete all allocated steps.
    * Get SPANK plugins working with heterogeneous jobs. The
      spank_init_post_opt() function is executed once per job component.
    * Modify sbcast command and srun's --bcast option to support heterogeneous
      jobs.
    * Set more environment variables for MPI: SLURM_GTIDS and SLURM_NODEID.
    * Prevent a heterogeneous job allocation from including the same nodes in
      multiple components (required by MPI jobs spanning components).
    * Modify step create logic so that all components of a heterogeneous job
      launched by a single srun command have the same step ID value.
 -- Modify output of "--mpi=list" to avoid duplicates for version numbers in
    mpi/pmix plugin names.
 -- Allow nodes to be rebooted while in a maintenance reservation.
 -- Show nodes as down even when nodes are in a maintenance reservation.
 -- Harden the slurmctld HA stack to mitigate certain split-brain issues.
 -- Work for heterogeneous job support (complete solution in v17.11):
    * Add burst buffer support.
    * Remove srun's --mpi-combine option (always combined).
    * Add SchedulerParameters configuration option "enable_hetero_steps" to
      enable job steps that span multiple components of a heterogeneous job.
      Disabled by default as most MPI implementations and Slurm configurations
      are not currently supported. Limitation to be removed in Slurm version
      18.08.
    * Synchronize application launch across multiple components with debugger.
    * Modify slurm_kill_job_step() to cancel all components of a heterogeneous
      job step (used by MPI).
    * Set SLURM_JOB_NUM_NODES environment variable as needed by MVAPICH.
    * Base time limit upon the time that the latest job component is available
      (after all nodes in all components booted and ready for use).
 -- Add cluster name to smail tool email header.
 -- Speed up arbitrary distribution algorithm.
 -- Modify "srun --mpi=list" output to match valid option input by removing the
    "mpi/" prefix on each line of output.
 -- Automatically set the reservation's partition for the job if not the
    cluster default.
 -- mpi/pmi2 plugin - vestigial pointer could be referenced at shutdown with
    invalid memory reference resulting.
 -- Fix _is_gres_cnt_zero() to return false for improper input string.
 -- Clean up all pthread_create calls and replace with new slurm_thread_create
    macro.
 -- Removed obsolete MPI plugins. Remaining options are openmpi, pmi2, pmix.
 -- Removed obsolete checkpoint/poe plugin.
 -- Process spank environment variable options before processing spank command
    line options. Spank plugins should be able to handle option callbacks being
    called multiple times.
 -- Add support for specialized cores with task/affinity plugin (previously
    only supported with task/cgroup plugin).
 -- Add "TaskPluginParam=SlurmdOffSpec" option that will prevent the Slurm
    compute node daemons (slurmd and slurmstepd) from executing on specialized
    cores.
 -- CRAY - Make native mode default, use --disable-native-cray to use ALPS
    instead of native Slurm.
 -- Add ability to prevent suspension of some count of nodes in a specified
    range using the SuspendExcNodes configuration parameter.
 -- Add SLURM_WCKEY to PrologSlurmctld and EpilogSlurmctld environment.
 -- Return user response string in response to successful job allocation
    request, not only on failure. Set in LUA using function
    'slurm.user_msg("STRING")'.
 -- Add 'scontrol write batch_script <jobid>' command to retrieve the batch
    script for a given job.
 -- Remove option to display the batch script as part of 'scontrol show job'.
 -- On native Cray system the configured RebootProgram is executed on the
    head node by the slurmctld daemon rather than by the slurmd daemons on the
    compute nodes. The "capmc_resume" program from "contribs/cray" can be used.
 -- Modify "scontrol top" command to accept a comma separated list of job IDs
    as an argument rather than a single job ID.
 -- Add MemorySwappiness value to cgroup.conf.
 -- Add new "billing" TRES which allows jobs to be limited based on the job's
    billable TRES calculated by the job's partition's TRESBillingWeights.
 -- sbatch - force line-buffered output so 'sbatch -W' returns the jobid
    over a piped output immediately.
 -- Regular user use of "scontrol top" command is now disabled. Use the
    configuration parameter "SchedulerParameters=enable_user_top" to enable
    that functionality. The configuration parameter
    "SchedulerParameters=disable_user_top" will be silently ignored.
 -- Add -TALL to sreport.
 -- Removed unused SlurmdPlugstack option and associated framework.
 -- Correct logic for line continuation in srun --multi-prog file.
 -- Add DBD Agent queue size to sdiag output.
 -- Add running job count to sdiag output.
 -- Print unix timestamps next to ASCII timestamps in sdiag output.
 -- In a job allocation spanning KNL and non-KNL nodes and requiring a reboot,
    do not attempt to set default NUMA or MCDRAM modes on non-KNL nodes.
 -- Change default behavior after a reservation ends: pending jobs are now put
    in held state instead of being allowed to run outside of the reservation.
    Added NO_HOLD_JOBS_AFTER_END reservation flag to use old default.
 -- When creating a reservation, validate the CoreCnt specification matches
    the number of nodes listed.
 -- When creating a reservation, correct logic for ignoring job allocations on
    request.
 -- Deprecate BLCR plugin, and do not build by default.
 -- Change sreport report titles from "Use" to "Usage"
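For scripts that consume the jobcomp/script LIMIT variable, the D-HH:MM:SS
layout mentioned above can be reproduced as follows. This is a minimal sketch
under stated assumptions: the input is taken to be a whole number of seconds
and the day field is always printed, neither of which is specified above, and
format_limit is a hypothetical name rather than a Slurm function.

```python
def format_limit(total_seconds):
    """Render a duration as D-HH:MM:SS (assumed layout, see lead-in)."""
    # Peel off days, hours and minutes in turn; the remainder is seconds.
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


print(format_limit(90061))  # prints 1-01:01:01
```

A consumer parsing the value can reverse the same arithmetic after splitting
on "-" and ":".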

* Changes in Slurm 17.11.0pre2
==============================
 -- Initial work for heterogeneous job support (complete solution in v17.11):
    * Modified salloc, sbatch and srun commands to parse command line, job
      script and environment variables to recognize requests for heterogeneous
      jobs. Same commands also modified to set environment variables describing
      each component of the heterogeneous job.
    * Modified job allocate, batch job submit and job "will-run" requests to
      pass a list of job specifications and get a list of responses.
    * Modify slurmctld daemon to process a heterogeneous job request and create
      multiple job records as needed.
    * Added new fields to job record: pack_job_id, pack_job_offset and
      pack_job_set (set of job IDs). Added to slurmctld state save/restore
      logic and job information reported.
    * Display new job fields in "scontrol show job" output.
    * Modify squeue command to display heterogeneous job records using "#+#"
      format. The squeue --job=# output lists all components of a heterogeneous
      job.
    * Modify scancel logic to cancel all components of a heterogeneous job with
      a single request/RPC.
    * Configuration parameter DebugFlags value of "HeteroJobs" added.
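Tools that post-process squeue output may need to split the "#+#" job ID
format described above. The Python sketch below assumes the first number is
the pack_job_id and the second is the pack_job_offset; that mapping is an
interpretation of the entries above, and parse_het_job_id is a hypothetical
name, not a Slurm API.

```python
def parse_het_job_id(text):
    """Split '1234+2' into (1234, 2); a plain '1234' gives (1234, None)."""
    # Assumption: "<pack_job_id>+<pack_job_offset>" for heterogeneous jobs.
    leader, sep, offset = text.partition("+")
    return int(leader), (int(offset) if sep else None)


print(parse_het_job_id("1234+2"))  # prints (1234, 2)
print(parse_het_job_id("1234"))    # prints (1234, None)
```

Returning None for the offset lets callers tell a regular job ID apart from
component 0 of a heterogeneous job.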