This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.
* Changes in Slurm 15.08.0rc2
==============================
-- Fix issue with frontend systems (outside ALPS or BlueGene) where srun
wouldn't get the correct protocol version to launch a step.
-- Fix for message aggregation return rpcs where none of the messages are
intended for the head of the tree.
-- Fix segfault in sreport when there was no response from the dbd.
-- ALPS - Fix compile to not link against -ljob and -lexpat with every lib
or binary.
-- Fix testing for CR_Memory when CR_Memory and CR_ONE_TASK_PER_CORE are used
with select/linear.
-- When restarting or reconfiguring the slurmctld, if a job is completing,
handle accounting correctly to avoid meaningless errors about overflow.
-- Add AccountingStorageTRES to scontrol show config
-- MySQL - Fix minor memory leak if a connection ever goes away while using it.
-- Make complete_batch_script RPC work with message aggregation.
-- Do not count slurmctld threads waiting in a "throttle" lock against the
daemon's thread limit as they are not contending for resources.
-- Modify slurmctld outgoing RPC logic to support more parallel tasks (up to
85 RPCs and 256 pthreads; the old logic supported up to 21 RPCs and 256
threads). This change can dramatically improve performance for RPCs
operating on small node counts.
-- Increase total backfill scheduler run time in stats_info_response_msg data
structure from 32 to 64 bits in order to prevent overflow.
-- Add NoInAddrAny option to TopologyParam in slurm.conf, which binds to the
interface returned by gethostname instead of any address on the node. This
avoids RSIP issues on Cray systems and is most likely useful on other
systems as well.
-- Fix memory leak in Slurm::load_jobs perl api call.
-- Added --noconvert sacct option which allows values to be displayed in their
original unit types.
-- Fix spelling of node_rescrs to node_resrcs in Perl API.
-- Fix node state race condition, UNKNOWN->IDLE without configuration info.
-- Cray: Disable LDAP references from slurmstepd on job launch for
improved scalability.
-- Remove srun "read header error" due to application termination race
condition.
-- Optimize sacct queries with additional db indexes.
-- Add SLURM_TOPO_LEN env variable for scontrol show topology.
-- Add free_mem to node information.
-- Fix abort of batch launch if prolog is running, wait for prolog instead.

-- Fix case where job would get the wrong cpu count when using
--ntasks-per-core and --cpus-per-task together.
-- Add TRESBillingWeights to partitions in slurm.conf which allows taking into
consideration any TRES Type when calculating the usage of a job.
-- Add PriorityWeightTRES slurm.conf option to be able to configure priority
factors for TRES types.
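The TRESBillingWeights entry above can be illustrated with a small sketch.
It assumes that billable usage is a plain weighted sum over the TRES types
that carry a weight; that formula, the helper names and the sample values
are assumptions for illustration, not taken from the Slurm source. Unit
suffixes on weights are left out to keep the parsing simple.

```python
def parse_weights(spec):
    """Parse a string such as "CPU=1.0,Mem=0.25,GRES/gpu=2.0" into a dict."""
    weights = {}
    for item in spec.split(","):
        name, _, value = item.partition("=")
        weights[name.strip()] = float(value)
    return weights


def billable_usage(weights, allocated):
    """Weighted sum over weighted TRES types (assumed formula, see lead-in)."""
    return sum(w * allocated.get(name, 0) for name, w in weights.items())


# Hypothetical partition weights and job allocation (Mem in GB here).
weights = parse_weights("CPU=1.0,Mem=0.25,GRES/gpu=2.0")
job = {"CPU": 4, "Mem": 8, "GRES/gpu": 1}
print(billable_usage(weights, job))
```

A TRES type with no configured weight contributes nothing to the total in
this sketch.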
* Changes in Slurm 15.08.0pre6
==============================
-- Add scontrol options to view and modify layouts tables.
-- Add MsgAggregationParams which controls a reverse tree to the slurmctld
which can be used to aggregate messages to the slurmctld into a single
message to reduce communication to the slurmctld. Currently only epilog
complete messages and node registration messages use this logic.
-- Add sacct and squeue options to print trackable resources.
-- Add sacctmgr option to display trackable resources.
-- If an salloc or srun command is executed on a "front-end" configuration,
that job will be assigned a slurmd shepherd daemon on the same host as used
to execute the command when possible rather than a slurmd daemon on an
arbitrary front-end node.
-- Add srun --accel-bind option to control how tasks are bound to GPUs and NIC
Generic RESources (GRES).
-- gres/nic plugin modified to set OMPI_MCA_btl_openib_if_include environment
variable based upon allocated devices (usable with OpenMPI and Mellanox).
-- Make it so info options for srun/salloc/sbatch print with just 1 -v instead
of 4.
-- Add "no_backup_scheduling" SchedulerParameter to prevent jobs from being
scheduled when the backup takes over. Jobs can be submitted, modified and
cancelled while the backup is in control.
-- Enable native Slurm backup controller to reside on an external Cray node
when the "no_backup_scheduling" SchedulerParameter is used.
-- Removed TICKET_BASED fairshare. Consider using the FAIR_TREE algorithm.
-- Disable advanced reservation "REPLACE" option on IBM Bluegene systems.
-- Add support for controlling distribution of tasks across cores, in addition
to existing support for nodes and sockets (e.g. "block", "cyclic" or
"fcyclic" task distribution at 3 levels in the hardware rather than 2).
-- Create db index on <cluster>_assoc_table.acct. Deleting accounts that didn't
have jobs in the job table could take a long time.
-- The performance of Profiling with HDF5 is improved. In addition, internal
structures are changed to make it easier to add new profile types,
particularly energy sensors. sh5util will continue to work with either
format.
-- Add partition information to sshare output if the --partition option
is specified on the sshare command line.
-- Add sreport -T/--tres option to identify Trackable RESources (TRES) to
report.
-- Display job in sacct when single step's cpus are different from the job
allocation.
-- Add association usage information to "scontrol show cache" command output.
-- MPI/MVAPICH plugin now requires Munge for authentication.
-- job_submit/lua: Add default_qos fields. Add job record qos. Add partition
record allow_qos and qos_char fields.
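The "block" and "cyclic" task distributions named in the notes above can be
sketched at a single level of the hardware. This is a simplified
illustration only: it assumes equally sized slots (nodes, sockets or cores)
and ignores per-slot CPU counts and the "fcyclic" variant. The function name
is hypothetical.

```python
def distribute(num_tasks, num_slots, method):
    """Return a list mapping each task ID to a slot index."""
    if method == "cyclic":
        # Round-robin: consecutive tasks land on consecutive slots.
        return [task % num_slots for task in range(num_tasks)]
    if method == "block":
        # Fill one slot with consecutive tasks before moving to the next.
        per_slot = -(-num_tasks // num_slots)  # ceiling division
        return [task // per_slot for task in range(num_tasks)]
    raise ValueError("unknown distribution method: " + method)


print(distribute(6, 3, "block"))
print(distribute(6, 3, "cyclic"))
```

Applying the same choice independently at the node, socket and core levels
gives the three-level behavior the entry describes.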
* Changes in Slurm 15.08.0pre5
==============================
-- Add jobcomp/elasticsearch plugin. Libcurl is required for build. Configure
the server as follows: "JobCompLoc=http://YOUR_ELASTICSEARCH_SERVER:9200".
-- Scancel logic largely rewritten to better support job arrays.
-- Added a slurm.conf parameter PrologEpilogTimeout to control how long
a prolog or epilog can run.
-- Added TRES (Trackable resources) to track Mem, GRES, license, etc
utilization.
-- Add re-entrant versions of glibc time functions (e.g. localtime) to Slurm
in order to eliminate rare deadlock of slurmstepd fork and exec calls.
-- Constrain kernel memory (if available) in cgroups.
-- Add PrologFlags option of "Contain" to create a proctrack container at
job resource allocation time.
-- Disable the OOM Killer in slurmd and slurmstepd's memory cgroup when using
MemSpecLimit.
* Changes in Slurm 15.08.0pre4
==============================
-- Burst_buffer/cray - Convert logic to use new commands/API names (e.g.
"dws_setup" rather than "bbs_setup").
-- Remove the MinJobAge size limitation. It can now exceed 65533 as it
is represented using an unsigned integer.
-- Verify that all plugin version numbers are identical to the component
attempting to load them. Without this verification, the plugin can reference
Slurm functions in the caller which differ (e.g. the underlying function's
arguments could have changed between Slurm versions).
NOTE: All plugins (except SPANK) must be built against the identical
version of Slurm in order to be used by any Slurm command or daemon. This
should eliminate some very difficult to diagnose problems due to use of old
plugins.
-- Increase the MAX_PACK_MEM_LEN define to 1GB to avoid PMI2 failure when
fencing with a large number of ranks.
-- Requests by normal user to reset a job priority (even to lower it) will
result in an error saying to change the job's nice value instead.
-- SPANK naming changes: For environment variables set using the
spank_job_control_setenv() function, the values were available in the
slurm_spank_job_prolog() and slurm_spank_job_epilog() functions using
getenv where the name was given a prefix of "SPANK_". That prefix has
been removed for consistency with the environment variables available in
the Prolog and Epilog scripts.
-- Add "TopologyParam" configuration parameter. Optional value of "dragonfly"
is supported.
-- Optimize resource allocation for systems with dragonfly networks.
-- Add "--thread-spec" option to salloc, sbatch and srun commands. This is
the count of threads reserved for system use per node.
-- job_submit/lua: Enable reading and writing job environment variables.
For example: if (job_desc.environment.LANGUAGE == "en_US") then ...
-- Added two new APIs, slurm_job_cpus_allocated_str_on_node_id()
and slurm_job_cpus_allocated_str_on_node(), to print the IDs of the CPUs
allocated to a job.
-- Specialized memory (a node's MemSpecLimit configuration parameter) is not
available for allocation to jobs.
-- Modify scontrol update job to allow jobid specification without
the = sign. 'scontrol update job=123 ...' and 'scontrol update job 123 ...'
are both valid syntax.
-- Archive a month at a time when there are lots of records to archive.
-- Introduce new sbatch option '--kill-on-invalid-dep=yes|no' which allows
users to specify which behavior they want if a job dependency is not
satisfied.
-- Add Slurmdb::qos_get() interface to perl api.
-- If a job fails to start, set the requeue reason to be:
job requeued in held state.
-- Implemented a new MPI key,value PMIX_RING() exchange algorithm as
an alternative to PMI2.

-- Remove possible deadlocks in the slurmctld when the slurmdbd is busy
archiving/purging.
-- Add DB_ARCHIVE debug flag for filtering out debug messages in the slurmdbd
when the slurmdbd is archiving/purging.
-- Fix some power_save mode issues: Parsing of SuspendTime in slurm.conf was
bad, powered down nodes would get set non-responding if there was an
in-flight message, and permit nodes to be powered down from any state.
-- Initialize variables in consumable resource plugin to prevent core dump.
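The two "scontrol update job" forms accepted in this release ("job=123" and
"job 123") can be handled by one small argument parser. The sketch below is
a hypothetical stand-alone helper, not scontrol's implementation; the
function name and the trailing TimeLimit argument are illustrative
assumptions.

```python
def parse_job_spec(args):
    """Split ['job=123', ...] or ['job', '123', ...] into (job_id, rest)."""
    if not args:
        raise ValueError("missing job specification")
    key, sep, value = args[0].partition("=")
    if key != "job":
        raise ValueError("expected 'job', got " + repr(args[0]))
    if sep:
        # Form with the = sign: job=123
        if not value:
            raise ValueError("missing job id after '='")
        return value, args[1:]
    # Form without the = sign: job 123
    if len(args) < 2:
        raise ValueError("missing job id")
    return args[1], args[2:]


print(parse_job_spec(["job=123", "TimeLimit=30"]))
print(parse_job_spec(["job", "123", "TimeLimit=30"]))
```

The job ID is kept as a string so that the remaining arguments can be
passed on unchanged.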
* Changes in Slurm 15.08.0pre3
==============================
-- CRAY - addition of acct_gather_energy/cray plugin.
-- Add job credential to "Run Prolog" RPC used with a configuration of
PrologFlags=alloc. This allows the Prolog to be passed identification of
GPUs allocated to the job.
-- Add SLURM_JOB_CONSTRAINTS to environment variables available to the Prolog.
-- Added "--mail=stage_out" option to job submission commands to notify user
when burst buffer stage-out is complete.
-- Require a "Reason" when using scontrol to set a node state to DOWN.
-- Mail notifications on job BEGIN, END and FAIL now apply to a job array as a
whole rather than generating individual email messages for each task in the
job array.
-- task/affinity - Fix memory binding to NUMA with cpusets.

-- Display job's estimated NodeCount based off of partition's configured
resources rather than the whole system's.
-- Add AuthInfo option of "cred_expire=#" to specify the lifetime of a job
step credential. The default value was changed from 1200 to 120 seconds.
-- Set the delay time for job requeue to the job credential lifetime (120
seconds by default). This ensures that the prolog runs on every node when a
job is requeued. (This change will slow down launch of re-queued jobs).
-- Remove srun --max-launch-time option. The option has not been functional