    This file describes changes in recent versions of Slurm. It primarily
    documents those changes that are of interest to users and administrators.
    
    * Changes in Slurm 15.08.5
    ==========================
     -- Prevent "scontrol update job" from updating jobs that have already finished.
     -- Show requested TRES in "squeue -O tres" when job is pending.
     -- Backfill scheduler: Test association and QOS node limits before reserving
        resources for pending job.
     -- burst_buffer/cray: If a teardown operation fails, sleep and retry.
     -- Clean up the external pids when using the PrologFlags=Contain feature
        and the job finishes.
     -- burst_buffer/cray: Support file staging when job lacks job-specific buffer
        (i.e. only persistent burst buffers).
     -- Added srun option of --bcast to copy executable file to compute nodes.
     -- Fix for advanced reservation of burst buffer space.
     -- BurstBuffer/cray: Add logic to terminate dw_wlm_cli child processes at
        shutdown.
     -- If a job can't be launched or requeued, then terminate it.
     -- BurstBuffer/cray: Enable clearing of burst buffer string on completed job
        as a means of recovering from a failure mode.
     -- Fix wrong memory free when parsing SrunPortRange=0-0 configuration.
     -- BurstBuffer/cray: Fix job record purging if cancelled from pending state.
     -- BGQ - Handle database throw correctly when syncing users on blocks.
     -- MySQL - Make sure we don't have a NULL string returned when not
        requesting any specific association.
     -- sched/backfill: If max_rpc_cnt is configured and the backlog of RPCs has
        not cleared after yielding locks, then continue to sleep.
     -- Preserve the job dependency description displayed in 'scontrol show job'
        even if the dependee job was terminated and cleaned up, causing the
        dependent to never run because of DependencyNeverSatisfied.
     -- Correct job task count calculation if only node count and ntasks-per-node
        options supplied.
     -- Make sure the association manager converts any string to be lower case
        as all the associations from the database will be lower case.
     -- Sanity check for xcgroup_delete() to verify incoming parameter is valid.
     -- Fix formatting for sacct with variables that switched from uint32_t to
        uint64_t.
     -- Fix a typo in sacct man page.
     -- Set up extern step to track any children of an ssh if it leaves anything
        else behind.
     -- Prevent slurmdbd divide by zero if no associations defined at rollup time.
     -- Multifactor - Add sanity check to make sure pending jobs are handled
        correctly when PriorityFlags=CALCULATE_RUNNING is set.
     -- Add slurmdb_find_tres_count_in_string() to slurm db perl api.
     -- Make lua dlopen() conditional on version found at build.
     -- sched/backfill - Delay backfill scheduler for completing jobs only if
        CompleteWait configuration parameter is set (make code match documentation).
     -- Release a job's allocated licenses only after epilog runs on all nodes
        rather than at start of termination process.
     -- Cray job NHC delayed until after burst buffer released and epilog completes
        on all allocated nodes.
     -- Fix abort of srun if using PrologFlags=NoHold.
     -- Let devices step_extern cgroup inherit attributes of job cgroup.
     -- Add new hook to Task plugin to be able to put adopted processes in the
        step_extern cgroups.
     -- Fix AllowUsers documentation in burst_buffer.conf man page. Usernames are
        comma separated, not colon delimited.
     -- Fix issue with time limit not being set correctly from a QOS when a job
        requests no time limit.
     -- Various CLANG fixes.
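
The task count entry above concerns jobs that supply only a node count and
ntasks-per-node. As a rough illustration of the arithmetic involved (not
Slurm's actual code; the function name implied_task_count is hypothetical
and the multiplication rule is an assumption based on the documented meaning
of the two options), the expected task count can be sketched as:

```python
def implied_task_count(node_count: int, ntasks_per_node: int) -> int:
    """Task count implied when only a node count and tasks-per-node are given.

    Hypothetical helper for illustration; assumes every node runs exactly
    ntasks_per_node tasks, so the total is the product of the two values.
    """
    if node_count < 1 or ntasks_per_node < 1:
        raise ValueError("node_count and ntasks_per_node must be positive")
    return node_count * ntasks_per_node


# A request such as "-N 4 --ntasks-per-node=8" would imply this many tasks.
print(implied_task_count(4, 8))
```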
    
    * Changes in Slurm 15.08.4
    ==========================
     -- Fix typo for the "devices" cgroup subsystem in pam_slurm_adopt.c.
     -- Fix TRES_MAX flag to work correctly.
     -- Improve the systemd startup files.
     -- Added burst_buffer.conf flag parameter of "TeardownFailure" which will
        tear down and remove a burst buffer after failed stage-in or stage-out.
        By default, the buffer will be preserved for analysis and manual teardown.