This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.

* Changes in Slurm 15.08.10
===========================
 -- Fix issue where if a slurmdbd rollup lasted longer than 1 hour the
    rollup would effectively never run again.
 -- Change an error message in the pmi2 code to debug level. The condition is
    expected and retried, so reporting it as an error was misleading.
 -- Power/cray: Don't specify NID list to Cray APIs. If any of those nodes are
    not in a ready state, the API returned an error for ALL nodes rather than
    valid data for nodes in ready state.
 -- Fix potential divide by zero when tree_width=1.
 -- checkpoint/blcr plugin: Fix memory leak.
 -- If using PrologFlags=contain: Don't launch the extern step if a job is
    cancelled while launching.

* Changes in Slurm 15.08.9
==========================
 -- BurstBuffer/cray - Defer job cancellation or time limit while "pre-run"
    operation in progress to avoid inconsistent state due to multiple calls
    to job termination functions.
 -- Fix issue with resizing jobs where limits were not tracked correctly.
 -- BGQ - Remove redeclaration of job_read_lock.
 -- BGQ - Tighter locks around structures when nodes/cables change state.
 -- Make it possible to change CPUsPerTask with scontrol.
 -- Make it so scontrol update part qos= will take away a partition QOS from
    a partition.
 -- Fix issue where SocketsPerBoard didn't translate to Sockets when CPUS=
    was also given.
 -- Add note to slurm.conf man page about setting "--cpu_bind=no" as part
    of SallocDefaultCommand if a TaskPlugin is in use.
 -- Set correct reason when a QOS' MaxTresMins is violated.
 -- Ensure that a job is completely launched before trying to suspend it.
 -- Remove historical presentations and design notes. Only distribute
    maintained doc/html and doc/man directories.
 -- Remove duplicate xmalloc() in task/cgroup plugin.
 -- Backfill scheduler to validate correct job partition for job submitted to
    multiple partitions.
 -- Force close on exec on first 256 file descriptors when launching a
    slurmstepd to close potential open ones.
 -- Step GRES value changed from type "int" to "int64_t" to support larger
    values.
 -- Fix getting reservations to database when database is down.
 -- Fix issue with sbcast not doing a correct fanout.
 -- Fix issue where steps weren't always getting the gres/tres involved.
 -- Fixed double read lock on getting job's gres/tres.
 -- Fix display for RoutePlugin parameter to display the correct value.
 -- Fix route/topology plugin to prevent segfault in sbcast when in use.
 -- Fix Cray slurmconfgen_smw.py script to use nid as nid, not nic.
 -- Fix Cray NHC spawning on job requeue. Previous logic would leave nodes
    allocated to a requeued job as non-usable on job termination.
 -- burst_buffer/cray plugin: Prevent a requeued job from being restarted while
    file stage-out is still in progress. Previous logic could restart the job
    and not perform a new stage-in.
 -- Fix job array formatting to display [0-100:2] for arrays with step
    functions rather than [0,2,4,6,8,...].
 -- FreeBSD - replace Linux-specific set_oom_adj to avoid errors in slurmd log.
 -- Add option for TopologyParam=NoInAddrAnyCtld to make the slurmctld listen
    on only one port like TopologyParam=NoInAddrAny does for everything else.
 -- Fix burst buffer plugin to prevent corruption of the CPU TRES data when bb
    is not set as an AccountingStorageTRES type.
 -- Suppress error messages in acct_gather_energy/ipmi plugin after repeated
    failures.
 -- Change burst buffer use completion email message from
    "SLURM Job_id=1360353 Name=tmp Staged Out, StageOut time 00:01:47" to
    "SLURM Job_id=1360353 Name=tmp StageOut/Teardown time 00:01:47"
 -- Generate burst buffer use completion email immediately after teardown
    completes rather than at job purge time (likely minutes later).
 -- Fix issue when adding a new TRES to AccountingStorageTRES for the first
    time.
 -- Update gang scheduling tables when job manually suspended or resumed. Prior
    logic could mess up job suspend/resume sequencing.
 -- Update gang scheduling data structures when job changes in size.
 -- Associations - prevent hash table corruption if uid initially unset for
    a user, which can cause slurmctld to crash if that user is deleted.
 -- Avoid possibly aborting srun on SIGSTOP while creating the job step due to
    threading bug.
 -- Fix deadlock issue with burst_buffer/cray when a newly created burst
    buffer is found.
 -- burst_buffer/cray: Set environment variables just before starting job rather
    than at job submission time to reflect persistent buffers created or
    modified while the job is pending.
 -- Fix check of per-user qos limits on the initial run by a user.
 -- Fix gang scheduling resource selection bug which could prevent multiple jobs
    from being allocated the same resources. Bug was introduced in 15.08.6.
 -- Don't print the Rgt value of an association from the cache as it isn't
    kept up to date.
 -- burst_buffer/cray - If the pre-run operation fails then don't issue
    duplicate job cancel/requeue unless the job is still in run state. Prevents
    jobs hung in COMPLETING state.
 -- task/cgroup - Fix bug in task binding to CPUs.
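
The job array display change in this release (showing [0-100:2] rather than
[0,2,4,6,8,...]) can be illustrated with a minimal sketch. This is not Slurm's
implementation: the function name format_task_ids is hypothetical, and the
sketch only handles a single uniform step, falling back to a comma list
otherwise.

```python
def format_task_ids(ids):
    """Return a compact "[first-last:step]" string for evenly spaced task IDs.

    Hypothetical helper for illustration only. Falls back to a plain
    comma-separated list when the IDs are not evenly spaced.
    """
    ids = sorted(set(ids))
    comma_list = "[" + ",".join(map(str, ids)) + "]"
    if len(ids) < 3:
        return comma_list
    step = ids[1] - ids[0]
    # Every neighbouring pair must differ by the same step.
    if not all(b - a == step for a, b in zip(ids, ids[1:])):
        return comma_list
    if step == 1:
        return f"[{ids[0]}-{ids[-1]}]"
    return f"[{ids[0]}-{ids[-1]}:{step}]"


print(format_task_ids(range(0, 101, 2)))
```

The same first-last:step notation is what a user supplies when submitting a
stepped array, so displaying it in that form keeps long arrays readable.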

* Changes in Slurm 15.08.8
==========================
 -- Backfill scheduling properly synchronized with Cray Node Health Check.
    Prior logic could result in highest priority job getting improperly
    postponed.
 -- Make it so daemons also support TopologyParam=NoInAddrAny.
 -- If scancel is operating on large number of jobs and RPC responses from
    slurmctld daemon are slow then introduce a delay in sending the cancel job
    requests from scancel in order to reduce load on slurmctld.
 -- Remove redundant logic when updating a job's task count.
 -- MySQL - Fix querying jobs with reservations when the ids have rolled.
 -- Perl - Fix use of uninitialized variable in slurm_job_step_get_pids.
 -- Launch batch job requesting --reboot after the boot completes.
 -- Move debug messages like "not the right user" from association manager
    to debug3 when trying to find the correct association.
 -- Fix incorrect logic when querying assoc_mgr information.
 -- Move debug messages to debug3 notifying a gres_bit_alloc was NULL for
    gres types without a file.
 -- Sanity Check Patch to setup variables for RAPL if in a race for it.
 -- GRES - Fix minor typecast issues.
 -- burst_buffer/cray - Increase size of intermediate variable used to store
    buffer byte size read from DW instance from 32 to 64-bits to avoid overflow
    and reporting invalid buffer sizes.
 -- Allow an existing reservation with running jobs to be modified without
    Flags=IGNORE_JOBS.
 -- srun - don't attempt to execve() a directory with a name matching the
    requested command.
 -- Do not automatically relocate an advanced reservation for individual cores
    that spans multiple nodes when nodes in that reservation go down (e.g.
    a 1 core reservation on node "tux1" will be moved if node "tux1" goes
    down, but a reservation containing 2 cores on node "tux1" and 3 cores on
    "tux2" will not be moved if node "tux1" goes down). Advanced reservations
    for whole nodes will be moved by default for down nodes.
 -- Avoid possible double free of memory (and likely abort) for slurmctld in
    background mode.
 -- contribs/cray/csm/slurmconfgen_smw.py - avoid including repurposed compute
    nodes in configs.
 -- Support AuthInfo in slurmdbd.conf that is different from the value in
    slurm.conf.
 -- Fix build on FreeBSD 10.
 -- Fix hdf5 build on ppc64 by using correct fprintf formatting for types.
 -- Fix cosmetic printing of NO_VALs in scontrol show assoc_mgr.
 -- Fix perl api for newer perl versions.
 -- Fix for jobs requesting cpus-per-task (e.g. -c3) that exceed the number of
    cpus on a core.
 -- Remove unneeded perl files from the .spec file.
 -- Flesh out filters for scontrol show assoc_mgr.
 -- Add function to remove assoc_mgr_info_request_t members without freeing
    structure.
 -- Fix build on some non-glibc systems by updating includes.
 -- Add new PowerParameters options of get_timeout and set_timeout. The default
    set_timeout was increased from 5 seconds to 30 seconds. Also re-read current
    power caps periodically or after any failed "set" operation.
 -- Fix slurmdbd segfault when listing users with blank user condition.
 -- Save the ClusterName to a file in SaveStateLocation, and use that to
    verify the state directory belongs to the given cluster at startup to avoid
    corruption from multiple clusters attempting to share a state directory.
 -- MYSQL - Fix issue when rerolling monthly data to work off correct time
    period.  This would only hit you if you rerolled a 15.08 prior to this
    commit.
 -- If FastSchedule=0 is used make sure TRES are set up correctly in accounting.
 -- Fix sreport's truncation of columns with large TRES when not using
    a parsing option.
 -- Make sure count of boards are restored when slurmctld has option -R.
 -- When determining whether a job can fit into a TRES time limit after
    resources have been selected, set the time limit appropriately if the job
    didn't request one.
 -- Fix inadequate locks when updating a partition's TRES.
 -- Add new assoc_limit_continue flag to SchedulerParameters.
 -- Avoid race in acct_gather_energy_cray if energy requested before available.
 -- MYSQL - Avoid having multiple default accounts when a user is added to
    a new account that is made the default at the same time.
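
The burst_buffer/cray change in this release (widening an intermediate variable
from 32 to 64 bits) addresses a class of bug that a short sketch can
illustrate. The code below is an illustration only, not Slurm code: the helper
store_in_bits is hypothetical, it assumes an unsigned field, and the 6 GiB
buffer size is an arbitrary example value.

```python
def store_in_bits(value, bits):
    """Simulate storing a non-negative integer in an unsigned field `bits` wide."""
    return value & ((1 << bits) - 1)


buffer_bytes = 6 * 1024**3  # 6 GiB, an arbitrary example size

# A 32-bit field silently drops the high bits; a 64-bit field keeps the value.
print(store_in_bits(buffer_bytes, 32))
print(store_in_bits(buffer_bytes, 64))
```

Any byte count of 4 GiB or more wraps in a 32-bit field, which is why sizes
read back through such a variable can look plausible but be wrong.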

* Changes in Slurm 15.08.7
==========================
 -- sched/backfill: If a job can not be started within the configured
    backfill_window, set its start time to 0 (unknown) rather than the end
    of the backfill_window.
 -- Remove the 1024-character limit on lines in batch scripts.
 -- burst_buffer/cray: Round up swap size by configured granularity.
 -- select/cray: Log repeated aeld reconnects.
 -- task/affinity: Disable core-level task binding if more CPUs required than
    available cores.
 -- Preemption/gang scheduling: If a job is suspended at slurmctld restart or
    reconfiguration time, then leave it suspended rather than resume+suspend.
 -- Don't use lower weight nodes for job allocation when topology/tree used.
 -- BGQ - If a cable goes into error state remove the underlying block on
    a dynamic system and mark the block in error on a static/overlap system.
 -- BGQ - Fix regression in 9cc4ae8add7f where blocks would be deleted on
    static/overlap systems when some hardware issue happens when restarting
    the slurmctld.
 -- Log if CLOUD node configured without a resume/suspend program or suspend
    time.
 -- MYSQL - Better locking around g_qos_count which was previously unprotected.
 -- Correct size of buffer used for jobid2str to avoid truncation.
 -- Fix allocation/distribution of tasks across multiple nodes when
    --hint=nomultithread is requested.
 -- If a reservation's nodes value is "all" then track the current nodes in the
    system, even if those nodes change.
 -- Fix formatting if using "tree" option with sreport.
 -- Make it so sreport prints out a line for non-existent TRES instead of
    error message.
 -- Set job's reason to "Priority" when higher priority job in that partition
    (or reservation) can not start rather than leaving the reason set to
    "Resources".
 -- Fix memory corruption when a new non-generic TRES is added to the
    DBD for the first time.  The corruption is only noticed at shutdown.
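
The burst_buffer/cray item in this release about rounding swap size up by the
configured granularity describes a common calculation, sketched below. This is
an assumption about the general technique rather than the plugin's actual
code: the function name round_up_to_granularity and the example values are
hypothetical.

```python
def round_up_to_granularity(size, granularity):
    """Round `size` up to the next multiple of `granularity` (both in the same unit)."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    # Integer ceiling division, then scale back to a multiple of granularity.
    return ((size + granularity - 1) // granularity) * granularity


print(round_up_to_granularity(10, 4))
```

A size that is already a multiple of the granularity is returned unchanged, so
the rounding never allocates a whole extra unit unnecessarily.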