This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.
* Changes in Slurm 14.11.4
==========================
* Changes in Slurm 14.11.3
==========================
-- Prevent vestigial job record when cancelling a pending job array record.
-- Fix job array hash table bug that could result in a slurmctld infinite
loop or invalid memory reference.
-- In srun honor ntasks_per_node before looking at cpu count when the user
doesn't request a number of tasks.
-- Fix ghost job when submitting job after all jobids are exhausted.
-- MySQL - Enhanced coordinator security checks.
-- Fix for task/affinity if an admin configures a node for having threads
but then sets CPUs to only represent the number of cores on the node.
-- Make it so previous versions of salloc/srun work with newer versions
of Slurm daemons.
-- Avoid delay on commit for PMI rank 0 to improve performance with some
MPI implementations.
-- auth/munge - Correct logic to read old format AccountingStoragePass.
-- Reset node "RESERVED" state as appropriate when deleting a maintenance
reservation.
-- Prevent a job manually suspended from being resumed by gang scheduler once
free resources are available.
-- Prevent invalid job array task ID value if a task is started using gang
scheduling.
-- Fix documentation bugs in slurm.conf.5. DenyAccount should be DenyAccounts.
-- For backward compatibility with older versions of OMPI not compiled
with --with-pmi restore the SLURM_STEP_RESV_PORTS in the job environment.
-- Update the html documentation describing the integration with openmpi.
-- Fix sacct when searching by nodelist.
-- Fix cosmetic info statements when dealing with a job array task instead of
a normal job.
-- BGQ - Put print statement under a DebugFlag. This was just an oversight.
-- BLUEGENE - Remove check that would erroneously remove the CONFIGURING
flag from a job while the job is waiting for a block to boot.
-- Fix segfault in slurmstepd when job exceeded memory limit.
-- Fix race condition that could start a job that is dependent upon a job array
before all tasks of that job array complete.
* Changes in Slurm 14.11.2
==========================
-- Fix issue with association hash not getting the correct index which
could result in seg fault.
-- Avoid huge malloc if GRES configured with "Type" and huge "Count".
-- Prevent jobs from starting in overlapping reservations when they won't
finish before a "maint" reservation begins.
-- When a node is drained while in the MIXED state, display its status as
draining in sinfo output.
-- Allow priority/multifactor to work with sched/wiki(2) if all priorities
have no weight. This allows for association and QOS decay limits to work.
-- Fix "squeue --start" to override SQUEUE_FORMAT env variable.
-- Fix scancel to be able to cancel multiple jobs that are space delimited.
-- Log Cray MPI job calling exit() without mpi_fini(), but do not treat it as
a fatal error. This partially reverts logic added in version 14.03.9.
-- sview - Fix displaying of suspended steps elapsed times.
-- Increase number of messages that get cached before throwing them away
when the DBD is down.
-- Restore GRES functionality with select/linear plugin. It was broken in
version 14.03.10.
-- Fix bug with GRES having multiple types that can cause slurmctld abort.
-- Fix squeue issue with not recognizing "localhost" in --nodelist option.
-- Make sure the bitstrings for a partition's Allow/DenyQOS are up to date
when running from cache.
-- Add smap support for job arrays and larger job ID values.
-- Fix possible race condition when attempting to use QOS on a system running
accounting_storage/filetxt.
-- Fix issue with accounting_storage/filetxt and job arrays not being printed
correctly.
-- In proctrack/linuxproc and proctrack/pgid, check the result of strtol()
for error condition rather than errno, which might have a vestigial error
code.
-- Improve information recording for jobs deferred due to advanced
reservation.
-- Export eio_new_initial_obj to the plugins and initialize kvs_seq on
mpi/pmi2 setup to support launching.
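
The strtol() entry above (proctrack/linuxproc and proctrack/pgid) describes
judging a conversion by its result rather than by errno. A minimal sketch of
that idea follows. It is not the actual Slurm change: parse_id() is a
hypothetical helper, and treating negative values and LONG_MAX as errors is an
assumption that suits process or job IDs but not general integer parsing.

```c
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not Slurm code): parse a non-negative decimal ID.
 * Returns 0 and stores the value in *out on success, -1 on failure.
 * Failure is detected from strtol()'s return value and end pointer only;
 * errno is never consulted, since it may hold a stale code left behind
 * by an earlier, unrelated call.
 */
int parse_id(const char *str, long *out)
{
	char *end = NULL;
	long val;

	if (str == NULL || out == NULL)
		return -1;

	val = strtol(str, &end, 10);

	if (end == str)		/* no digits were consumed */
		return -1;
	if (*end != '\0')	/* trailing characters after the number */
		return -1;
	if (val == LONG_MIN || val == LONG_MAX)
		return -1;	/* overflow sentinels (assumed never valid IDs) */
	if (val < 0)		/* assumption: IDs are never negative */
		return -1;

	*out = val;
	return 0;
}
```

A caller would check the return code before using the value, for example
treating a -1 from parse_id(name, &pid) as "skip this entry" when scanning
directory names that are expected to be numeric.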
* Changes in Slurm 14.11.1
==========================
-- Get libs correct when doing the xtree/xhash make check.
-- Update xhash/tree make check to work correctly with current code.
-- Remove the 'experimental' designation from the jobacct_gather/cgroup
plugin.
-- Add QOS manipulation examples to the qos.html documentation page.
-- If 'squeue -w node_name' specifies an unknown host name print
an error message and return 1.
-- Fix race condition in job_submit plugin logic that could cause slurmctld to
deadlock.
-- Job wait reason of "ReqNodeNotAvail" expanded to identify unavailable nodes
(e.g. "ReqNodeNotAvail(Unavailable:tux[3-6])").
* Changes in Slurm 14.11.0
==========================
-- ALPS - Fix issue with core_spec warning.
-- Allow multiple partitions to be specified in sinfo -p.
-- Install the service files in /usr/lib/systemd/system.
-- MYSQL - Add id_array_job and id_resv keys to $CLUSTER_job_table. THIS
COULD TAKE A WHILE TO CREATE THE KEYS SO BE PATIENT.
-- CRAY - Resize bitmaps on a restart if we find we have more blades
than before.
-- Add new eio API function for removing unused connections.
-- ALPS - Fix issue where batch allocations weren't correctly confirmed or
released.
-- Define DEFAULT_MAX_TASKS_PER_NODE based on MAX_TASKS_PER_NODE from
slurm.h as per documentation.
-- Update the FAQ about relocating slurmctld.
-- In the memory cgroup enable memory.use_hierarchy in the cgroup root.
-- Add SLURM_CLUSTER_NAME to job environment.
* Changes in Slurm 14.11.0rc3
=============================
-- Allow envs to override autotools binaries in autogen.sh
-- If a job pends with DependencyNeverSatisfied, keep it pending even after
the job which it was depending upon was cleaned.
-- Let operators (in addition to user root and SlurmUser) see job script for
other users' jobs.
-- Perl API modified to return node state of MIXED rather than ALLOCATED if
only some CPUs allocated.
-- Double Munge connect retry timeout from 1 to 2 seconds.
-- sview - Remove unneeded code that was resolved globally in commit
98e24b0dedc.
-- Collect and report the accounting of the batch step and its children.
-- Add configure checks for faccessat and eaccess, and make use of one of
them if available.
-- Make configure --enable-developer also set --enable-debug
-- Introduce a SchedulerParameters variable kill_invalid_depend, if set
then jobs pending with invalid dependency are going to be terminated.
-- Move spank_user_task() call in slurmstepd after the task_g_pre_launch()
so that the task affinity information is available to spank.
-- Make /etc/init.d/slurm script return value 3 when the daemon is
not running. This is required by Linux Standard Base Core
Specification 3.1
* Changes in Slurm 14.11.0rc2
=============================
-- Logs for jobs which are explicitly requeued will say so rather than saying
that a node in their allocation failed.
-- Updated the documentation about the remote licenses served by
the Slurm database.
-- Ensure that slurm_spank_exit() is only called once from srun.
-- Change the signature of net_set_low_water() to use 4 bytes instead of 8.
-- Export working_cluster_rec in libslurmdb.so as well as move some function
definitions needed for drmaa.
-- If using cons_res or serial, cause a fatal error in the plugin instead of
causing SelectTypeParameters to be magically set to CR_CPU.
-- Enhance task/affinity auto binding to consider tasks * cpus-per-task.
-- Fix regression in priority/multifactor which would cause memory corruption.
Issue is only in rc1.
-- Add PrivateData value of "cloud". If set, powered down nodes in the cloud
will be visible.
-- Sched/backfill - Eliminate clearing start_time of running jobs.
-- Fix various backwards compatibility issues.
-- If a batch job fails to launch, requeue it in hold.
* Changes in Slurm 14.11.0rc1
=============================
-- When using cgroup name the batch step as step_batch instead of
batch_4294967294
-- Changed LEVEL_BASED priority to be "Fair_Tree"
-- BGQ - Add cnode based reservations.
-- Alongside totalview_jobid implement totalview_stepid available
to sattach.
-- Add ability to include other files in slurm.conf based upon the ClusterName.
-- Add reservation information in the sacct and sreport output.
-- Add job priority calculation check for overflow and fix memory leak.
-- Add SchedulerParameters option of pack_serial_at_end to put serial jobs at
the end of the available nodes rather than using a best fit algorithm.
-- Allow regular users to view default sinfo output when
privatedata=reservations is set.
-- PrivateData=reservation modified to permit users to view the reservations
which they have access to (rather than preventing them from seeing ANY
reservation).
-- job_submit/lua: Fix job_desc set field logic
* Changes in Slurm 14.11.0pre5
==============================
-- Fix sbatch --export=ALL, which was treated by srun as a request to
explicitly export only the environment variable named "ALL".
-- Improve scheduling of jobs in reservations that overlap other reservations.
-- Modify sgather to make global file systems easier to configure.
-- Added sacctmgr reconfig to reread the slurmdbd.conf in the slurmdbd.
-- Modify scontrol job operations to accept comma delimited list of job IDs.
Applies to job update, hold, release, suspend, resume, requeue, and
requeuehold operations.
-- Refactor job_submit/lua interface. LUA FUNCTIONS NEED TO CHANGE! The