This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 2.4.0.pre1
=============================
-- BGQ - use the ba_geo_tables to figure out the blocks instead of the old
algorithm. This improves timing in the worst cases and simplifies the code
greatly.
-- BLUEGENE - Change output tool labels from BP to Midplane
(i.e. BP List -> MidplaneList).
========================
-- BLUEGENE - make sure we only set the jobinfo_select start_loc on a job
when we are on a small block, not a regular one.
-- BGQ - fix issue where the correct amount of memory was not being copied.
-- BLUEGENE - fix clean start if jobs were running when the slurmctld was
shutdown and then the system size changed. This would probably only happen
if you were emulating a system.
-- Fix sview for calling a cray system from a non-cray system to get the
correct geometry of the system.
-- BLUEGENE - fix to correctly import previous version of block state file.
-- BLUEGENE - handle loading better when doing a clean start with static
blocks.
-- Add sinfo format and sort option "%n" for NodeHostName and "%o" for
NodeAddr.
-- If a job is deferred due to partition limits, then re-test those limits
after a partition is modified. Patch from Don Lipari.
-- Fix bug which would crash slurmctld if job's owner (not root) tries to clear
a job's licenses by setting value to "".
-- Cosmetic fix for printing out debug info in the priority plugin.
-- In sview, when switching from a bluegene machine to a regular linux cluster
and vice versa, the node->base partition lists will be displayed if set up
in your .slurm/sviewrc file.
-- BLUEGENE - Fix for creating full system static block on a BGQ system.
-- BLUEGENE - Fix deadlock issue if toggling between Dynamic and Static block
allocation with jobs running on blocks that don't exist in the static
setup.
-- BLUEGENE - Modify code to only give HTC states to BGP systems and not
allow them on Q systems.
-- BLUEGENE - Make it possible for an admin to define multiple dimension
conn_types in a block definition.
-- BGQ - Alter tools to output multiple dimensional conn_type.
-- With sched/wiki or sched/wiki2 (Maui or Moab scheduler), ensure that a
requeued job's priority is reset to zero.
-- BLUEGENE - fix to run steps correctly in a BGL/P emulated system.
-- Fixed issue where, if there was a network issue between the slurmctld and
the DBD in which both remained up but were disconnected, the slurmctld would
get registered again with the DBD.
-- Fixed issue where if the DBD connection from the ctld goes away because of
a POLLERR the dbd_fail callback is called.
-- BLUEGENE - Fix to smap command-line mode display.
-- Change in GRES behavior for job steps: A job step's default generic
resource allocation will be set to that of the job. If a job step's --gres
value is set to "none" then none of the generic resources which have been
allocated to the job will be allocated to the job step.
-- Add srun environment value of SLURM_STEP_GRES to set default --gres value
for a job step.
-- Require SchedulerTimeSlice configuration parameter to be at least 5 seconds
to avoid thrashing slurmd daemon.
-- Cray - Fix to make nodes state in accounting consistent with state set by
ALPS.
-- Cray - A node DOWN to ALPS will be marked DOWN to SLURM only after reaching
SlurmdTimeout. In the interim, the node state will be NO_RESPOND. This
change makes SLURM handling of the node DOWN state more consistent with
ALPS. This change affects only Cray systems.
-- Cray - Fix to work with 4.0.* instead of just 4.0.0
-- Cray - Modify srun/aprun wrapper to map --exclusive to -F exclusive and
--share to -F share. Note this does not consider the partition's Shared
configuration, so it is an imperfect mapping of options.
-- BLUEGENE - Added notice in the print config to tell if you are emulated
or not.
-- BLUEGENE - Fix job step scalability issue with large task count.
-- BGQ - Improved c-node selection when asked for a sub-block job that
cannot fit into the available shape.
-- BLUEGENE - Modify "scontrol show step" to show I/O nodes (BGL and BGP) or
c-nodes (BGQ) allocated to each step. Change field name from "Nodes=" to
"BP_List=".
-- Code cleanup on step request to get the correct select_jobinfo.
-- Memory leak fixed for rolling up accounting with down clusters.
-- BGQ - fix issue where if first job step is the entire block and then the
next parallel step is run on a sub block, SLURM won't oversubscribe cnodes.
-- Treat duplicate switch name in topology.conf as fatal error. Patch from Rod
Schultz, Bull.
-- Minor update to documentation describing the AllowGroups option for a
partition in the slurm.conf.
-- Fix problem with _job_create() when not using qos's. It makes
_job_create() consistent with similar logic in select_nodes().
-- Fix for squeue -t "CONFIGURING" to actually work.
-- CRAY - Add cray.conf parameter of SyncTimeout, maximum time to defer job
scheduling if SLURM node or job state are out of synchronization with ALPS.
-- If salloc was run as interactive, with job control, reset the foreground
process group of the terminal to the process group of the parent pid before
exiting. Patch from Don Albert, Bull.
-- BGQ - set up the corner of a sub block correctly based on a relative
position in the block instead of absolute.
-- BGQ - make sure the recently added select_jobinfo of a step launch request
isn't sent to the slurmd where environment variables would be overwritten
incorrectly.
-- NOTE THERE HAVE BEEN NEW FIELDS ADDED TO THE JOB AND PARTITION STATE SAVE
FILES AND RPCS. PENDING AND RUNNING JOBS WILL BE LOST WHEN UPGRADING FROM
EARLIER VERSION 2.3 PRE-RELEASES AND RPCS WILL NOT WORK WITH EARLIER
VERSIONS.
-- select/cray: Add support for Accelerator information including model and
memory options.
-- Cray systems: Add support to suspend/resume salloc command to ensure that
aprun does not get initiated when the job is suspended. Processes suspended
and resumed are determined by using process group ID and parent process ID,
so some processes may be missed. Since salloc runs as a normal user, its
ability to identify processes associated with a job is limited.
-- Cray systems: Modify smap and sview to display all nodes even if multiple
nodes exist at each coordinate.
-- Improve efficiency of select/linear plugin with topology/tree plugin
configured. Patch by Andriy Grytsenko (Massive Solutions Limited).
-- For front-end architectures on which job steps are run (emulated Cray and
BlueGene systems only), fix bug that would free memory still in use.
-- Add squeue support to display a job's license information. Patch by Andy
Roosen (University of Delaware).
-- Add flag to the select APIs for job suspend/resume indicating if the action
is for gang scheduling or an explicit job suspend/resume by the user. Only
an explicit job suspend/resume will reset the job's priority and make
resources exclusively held by the job available to other jobs.
-- Fix possible invalid memory reference in sched/backfill. Patch by Andriy
Grytsenko (Massive Solutions Limited).
-- Add select_jobinfo to the task launch RPC. Based upon patch by Andriy
Grytsenko (Massive Solutions Limited).
-- Add DefMemPerCPU/Node and MaxMemPerCPU/Node to partition configuration.
This improves flexibility when gang scheduling only specific partitions.
-- Added new enums to print out when a job is held by a QOS instead of an
association limit.
-- Enhancements to sched/backfill performance with select/cons_res plugin.
Patch from Bjørn-Helge Mevik, University of Oslo.
-- Correct job run time reported by smap for suspended jobs.
-- Improve job preemption logic to avoid preempting more jobs than needed.
-- Add contribs/arrayrun tool providing support for job arrays. Contributed by
Bjørn-Helge Mevik, University of Oslo. NOTE: Not currently packaged as RPM
and manual file editing is required.
-- When suspending a job, wait 2 seconds instead of 1 second between sending
SIGTSTP and SIGSTOP. Some MPI implementations were not stopping within the
1 second delay.
-- Add support for managing devices based upon Linux cgroup container. Based
upon patch by Yiannis Georgiou, Bull.
-- Fix memory buffering bug if an AllowGroups parameter of a partition has 100
or more users. Patch by Andriy Grytsenko (Massive Solutions Limited).
-- Fix bug in generic resource tracking of gres associated with specific CPUs.
Resources were being over-allocated.
-- On systems with front-end nodes (IBM BlueGene and Cray) limit batch jobs to
only one CPU of these shared resources.
-- Set SLURM_MEM_PER_CPU or SLURM_MEM_PER_NODE environment variables for both
interactive (salloc) and batch jobs if the job has a memory limit. For Cray
systems also set CRAY_AUTO_APRUN_OPTIONS environment variable with the
memory limit.
-- Fix bug in select/cons_res task distribution logic when tasks-per-node=0.
Patch from Rod Schultz, Bull.
-- Restore node configuration information (CPUs, memory, etc.) for powered
down nodes when slurmctld daemon restarts rather than waiting for the node
to be restored to service and getting the information from the node (NOTE:
Only relevant if FastSchedule=0).
-- For Cray systems with the srun2aprun wrapper, rebuild the srun man page
identifying the srun options which are valid on that system.
-- BlueGene: Permit users to specify a separate connection type for each
dimension (e.g. "--conn-type=torus,mesh,torus").
-- Add the ability for a user to limit the number of leaf switches in a job's
allocation using the --switch option of salloc, sbatch and srun. There is
also a new SchedulerParameters value of max_switch_wait, which a SLURM
administrator can use to set a maximum job delay and prevent a user job
from blocking lower priority jobs for too long. Based on work by Rod
Schultz, Bull.
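
The job step GRES change listed above (a step defaults to the job's generic
resources, "none" gives the step nothing, and SLURM_STEP_GRES supplies a
default --gres value) can be illustrated with a small model. This is only a
sketch of the rules as worded in these notes, not SLURM's implementation. The
function name resolve_step_gres and its arguments are hypothetical, and the
precedence of an explicit --gres over SLURM_STEP_GRES is an assumption.

```python
def resolve_step_gres(job_gres, cli_gres=None, environ=None):
    """Model which GRES string a job step ends up with.

    job_gres -- GRES allocated to the job, e.g. "gpu:2"
    cli_gres -- value of an explicit --gres on the step, or None
    environ  -- mapping standing in for the step's environment
    """
    environ = {} if environ is None else environ
    # Assumed precedence: explicit --gres first, then SLURM_STEP_GRES.
    requested = cli_gres
    if requested is None:
        requested = environ.get("SLURM_STEP_GRES")
    if requested is None:
        # No request at all: the step inherits the job's allocation.
        return job_gres
    if requested == "none":
        # "none" means none of the job's GRES go to this step.
        return ""
    return requested


print(resolve_step_gres("gpu:2"))
print(resolve_step_gres("gpu:2", cli_gres="none"))
print(resolve_step_gres("gpu:2", environ={"SLURM_STEP_GRES": "gpu:1"}))
```

In this model a step launched with no GRES request keeps the job's "gpu:2",
while a step given "none" receives an empty allocation.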
* Changes in SLURM 2.3.0.pre6
=============================
-- NOTE: THERE HAS BEEN A NEW FIELD ADDED TO THE CONFIGURATION RESPONSE RPC
AS SHOWN BY "SCONTROL SHOW CONFIG". THIS FUNCTION WILL ONLY WORK WHEN THE
SERVER AND CLIENT ARE BOTH RUNNING SLURM VERSION 2.3.0.pre6
-- Modify job expansion logic to support licenses, generic resources, and
currently running job steps.
-- Added an rpath if using the --with-munge option of configure.
-- Add support for multiple sets of DEFAULT node, partition, and frontend
specifications in slurm.conf so that default values can be changed multiple
times as the configuration file is read.

-- BLUEGENE - Improved logic to place small blocks in free space before freeing
larger blocks.
-- Add optional argument to srun's --kill-on-bad-exit so that user can set
its value to zero and override a SLURM configuration parameter of
KillOnBadExit.
-- Fix bug in GraceTime support for preempted jobs that prevented proper
operation when more than one job was being preempted. Based on patch from
Bill Brophy, Bull.
-- Fix for running sview from a non-bluegene cluster to a bluegene cluster.
Regression from pre5.
-- If job's TMPDIR environment is not set or is not usable, reset to "/tmp".
Patch from Andriy Grytsenko (Massive Solutions Limited).
-- Remove logic for defunct RPC: DBD_GET_JOBS.
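
The TMPDIR fallback listed above (reset to "/tmp" when the job's TMPDIR is not
set or not usable) can be sketched as follows. These notes do not define
"usable"; the check below assumes it means an existing directory that the
process can write to and search. The helper name effective_tmpdir is
hypothetical and is not part of SLURM.

```python
import os


def effective_tmpdir(environ):
    """Return the TMPDIR a job would run with under the rule above."""
    tmpdir = environ.get("TMPDIR")
    # Assumption: "usable" = existing directory with write and search access.
    if tmpdir and os.path.isdir(tmpdir) and os.access(tmpdir, os.W_OK | os.X_OK):
        return tmpdir
    # Unset, empty, or unusable: fall back to /tmp.
    return "/tmp"


print(effective_tmpdir({}))
print(effective_tmpdir({"TMPDIR": "/no/such/directory"}))
```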