This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 1.4.0-pre7
=============================
-- Bug fix for preemption with select/cons_res when there are no idle nodes.
-- Bug fix for use of srun options --exclusive and --cpus-per-task together
for job step resource allocation (tracking of cpus in use was bad).
-- Added the srun option --preserve-env to pass the current values of
environment variables SLURM_NNODES and SLURM_NPROCS through to the
executable, rather than computing them from commandline parameters.
-- For select/cons_res or sched/gang only: Validate a job's resource
allocation socket and core count on each allocated node. If the node's
configuration has been changed, then abort the job.
-- For select/cons_res or sched/gang only: Disable updating a node's
processor count if FastSchedule=0. Administrators must set a valid
processor count although the memory and disk space configuration can
be loaded from the compute node when it starts.
-- Add configure option "--disable-iso8601" to disable SLURM use of ISO 8601
time format at the time of SLURM build. Default output for all commands
is now ISO 8601 (yyyy-mm-ddThh:mm:ss).
-- Add support for scontrol to explicitly power a node up or down using the
configured SuspendProg and ResumeProg programs.

-- Add select/3d_torus plugin which will support Sun Constellation and
eventually Cray architectures.
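
As an illustration of the ISO 8601 layout named above (yyyy-mm-ddThh:mm:ss),
the following is a minimal Python sketch. It is not SLURM code; the helper
name format_iso8601 is hypothetical and only shows what a timestamp in that
layout looks like.

```python
from datetime import datetime


def format_iso8601(ts):
    """Render a datetime as yyyy-mm-ddThh:mm:ss (hypothetical helper)."""
    # Zero-padded fields, with a literal "T" between the date and the time.
    return ts.strftime("%Y-%m-%dT%H:%M:%S")


print(format_iso8601(datetime(2009, 1, 15, 13, 5, 9)))  # 2009-01-15T13:05:09
```

Fixed-width, zero-padded fields keep such timestamps sortable as plain text.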
* Changes in SLURM 1.4.0-pre6
=============================
-- Fix job preemption when sched/gang and select/linear are configured with
non-sharing partitions.
-- In select/cons_res ensure that required nodes have available resources.
* Changes in SLURM 1.4.0-pre5
=============================
-- Correction in setting of SLURM_CPU_BIND environment variable.
-- Rebuild slurmctld's job select_jobinfo->node_bitmap on restart/reconfigure
of the daemon rather than restoring the bitmap since the nodes in a system
can change (be added or removed).
-- Add configuration option "--with-cpusetdir=PATH" for non-standard
locations.
-- Get new multi-core data structures working on BlueGene systems.
-- Modify PMI_Get_clique_ranks() to return an array of integers rather
than a char * to satisfy PMI standard. Correct logic in
PMI_Get_clique_size() for when srun --overcommit option is used.
-- Fix bug in select/cons_res: allocate a job all of the processors on a
node when the --exclusive option is specified as a job submit option.
-- Add NUMA cpu_bind support to the task affinity plugin. Binds tasks to
a set of CPUs that belong to a NUMA locality domain with the appropriate
--cpu-bind option (ldoms, rank_ldom, map_ldom, and mask_ldom). See
"man srun" for more information.
* Changes in SLURM 1.4.0-pre4
=============================
-- For task/affinity, force jobs to use a particular task binding by setting
the TaskPluginParam configuration parameter rather than slurmd's
SLURM_ENFORCED_CPU_BIND environment variable.
-- Enable full preemption of jobs by partition with select/cons_res
(cons_res_preempt.patch from Chris Holmes, HP).
-- Add configuration parameter DebugFlags to provide detailed logging for
specific subsystems (steps and triggers so far).
-- Pass srun's --no-kill option to slurmctld so that a job step is killed
even if the node where srun executes goes down, unless the --no-kill
option is used. The previous termination logic would fail if srun was
not responding.
-- Transfer a job step's core bitmap from the slurmctld to the slurmd
within the job step credential.
-- Add cpu_bind, cpu_bind_type, mem_bind and mem_bind_type to job allocation
request and job_details structure in slurmctld. Add support for the
--cpu_bind and --mem_bind options to the salloc and sbatch commands.

* Changes in SLURM 1.4.0-pre3
=============================
-- Internal changes: CPUs per node changed from 32-bit to 16-bit size.
Node count fields changed from 16-bit to 32-bit size in some structures.
-- Remove select plugin functions select_p_get_extra_jobinfo(),
select_p_step_begin() and select_p_step_fini().
-- Remove the following slurmctld job structure fields: num_cpu_groups,
cpus_per_node, cpu_count_reps, alloc_lps_cnt, alloc_lps, and used_lps.
Use equivalent fields in new "select_job" structure, which is filled
in by the select plugins.
-- Modify mem_per_task in job step request from 16-bit to 32-bit size.
Use new "select_job" structure for the job step's memory management.
-- Add core_bitmap_job to slurmctld's job step structure to identify

-- Add new configuration option OverTimeLimit to permit jobs to exceed
their (soft) time limit by a configurable amount. Backfill scheduling
will be based upon the soft time limit.

-- Remove select_g_get_job_cores(). That data is now within the slurmctld's
job structure.
* Changes in SLURM 1.4.0-pre2
=============================
-- Remove srun's --ctrl-comm-ifhn-addr option (for PMI/MPICH2). It is no
longer needed.
-- Modify power save mode so that nodes can be powered off when idle. See
https://computing.llnl.gov/linux/slurm/power_save.html or
"man slurm.conf" (SuspendProgram and related parameters) for more
information.
-- Added configuration parameter PrologSlurmctld, which can be used to boot
nodes into a particular state for each job. See "man slurm.conf" for
details.
-- Add configuration parameter CompleteTime to control how long to wait for
a job's completion before allocating already released resources to pending
jobs. This can be used to reduce fragmentation of resources. See
"man slurm.conf" for details.
-- Make default CryptoType=crypto/munge. OpenSSL is now completely optional.
-- Make default AuthType=auth/munge rather than auth/none.
-- Change output format of "sinfo -R" from "%35R %N" to "%50R %N".
* Changes in SLURM 1.4.0-pre1
=============================
-- Save/restore a job's task_distribution option on slurmctld restart.
NOTE: SLURM must be cold-started on conversion from version 1.3.x.
-- Remove task_mem from job step credential (only job_mem is used now).
-- Remove --task-mem and --job-mem options from salloc, sbatch and srun
(use --mem-per-cpu or --mem instead).
-- Remove DefMemPerTask from slurm.conf (use DefMemPerCPU or DefMemPerNode
instead).
-- Modify slurm_step_launch API call. Move launch host from function argument
to element in the data structure slurm_step_launch_params_t, which is
used as a function argument.
-- Add state_reason_string to job state with optional details about why
a job is pending.
-- Make "scontrol show node" output match scontrol input for some fields
("Cores" changed to "CoresPerSocket", etc.).
-- Add support for a new node state "FUTURE" in slurm.conf. These node records
are created in SLURM tables for future use without a reboot of the SLURM
daemons, but are not reported by any SLURM commands or APIs.
* Changes in SLURM 1.3.13
=========================
* Changes in SLURM 1.3.12
=========================
-- Added support for Workload Characteristic Key (WCKey) in accounting. The
WCKey is something that can be used in accounting to group associations
together across clusters or within clusters that are not related. Use
the --wckey option in srun, sbatch or salloc or set the SLURM_WCKEY env
var to have this set. Use sreport with the wckey option to view reports.
THIS CHANGES THE RPC LEVEL IN THE SLURMDBD. YOU MUST UPGRADE YOUR SLURMDBD
BEFORE YOU UPGRADE THE REST OF YOUR CLUSTERS. THE NEW SLURMDBD WILL TALK
TO OLDER VERSIONS OF SLURM FINE.
-- Added configuration parameter BatchStartTimeout to control how long to
allow for a batch job prolog and environment loading (for Moab) to run.
See "man slurm.conf" for details.
-- For a job step, add support for srun's --nodelist and --exclusive options
to be used together.
-- On slurmstepd failure, set node state to DRAIN rather than DOWN.
-- Fix bug in select/cons_res that would incorrectly satisfy a task's
--cpus-per-task specification by allocating the task CPUs on more than
one node.
-- Add support for hostlist expressions containing up to two numeric
expressions (e.g. "rack[0-15]_blade[0-41]").
-- Fix bug in slurmd message forwarding which left file open in the case of
some communication failures.
-- Correction to sinfo node state information on BlueGene systems. DRAIN
state was replaced with ALLOC or IDLE under some situations.
-- For sched/wiki2 (Moab), strip quotes embedded within job names from the
name reported.
-- Fix bug in jobcomp/script that could cause the slurmctld daemon to exit
upon reconfiguration ("scontrol reconfig" or SIGHUP).
-- Fix to sinfo, don't print a node's memory size or tmp_disk space with
suffix of "K" or "M" (thousands or millions of megabytes).
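
The hostlist expressions mentioned above (e.g. "rack[0-15]_blade[0-41]")
expand to one name per combination of the bracketed values. The Python sketch
below illustrates that expansion. It is an assumption-laden illustration, not
SLURM's hostlist implementation: the function name expand_hostlist is
hypothetical, and the sketch does not enforce the limit of two numeric
expressions.

```python
import itertools
import re


def expand_hostlist(expr):
    """Expand bracketed numeric ranges (hypothetical helper, not SLURM code)."""
    # Splitting on a capturing group puts bracket contents at odd indexes
    # and the literal text between them at even indexes.
    parts = re.split(r"\[([^\]]*)\]", expr)
    choices = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            choices.append([part])  # literal text: exactly one choice
            continue
        values = []
        for item in part.split(","):
            lo, _, hi = item.partition("-")
            hi = hi or lo  # a single value such as "7"
            # Keep zero padding when the lower bound is written with it.
            width = len(lo) if lo.startswith("0") else 0
            values.extend(str(n).zfill(width)
                          for n in range(int(lo), int(hi) + 1))
        choices.append(values)
    # One host name per combination of the bracketed values.
    return ["".join(combo) for combo in itertools.product(*choices)]


names = expand_hostlist("rack[0-15]_blade[0-41]")
print(len(names), names[0], names[-1])  # 672 rack0_blade0 rack15_blade41
```

The example expression yields 16 x 42 = 672 names, from rack0_blade0 through
rack15_blade41.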
* Changes in SLURM 1.3.11
=========================
-- Bluegene/P support added (minimally tested, but builds correctly).
-- Fix infinite loop when using accounting_storage/mysql plugin either from
the slurmctld or slurmdbd daemon.
-- Added more thread safety for assoc_mgr in the controller.
-- For sched/wiki2 (Moab), permit clearing of a job's dependencies with the
JOB_MODIFY option "DEPEND=0".
-- Do not set a running or pending job's EndTime when changing its time
limit.
-- Fix bug in use of "include" parameter within the plugstack.conf file.
-- Fix bug in the parsing of negative numeric values in configuration files.
-- Propagate --cpus-per-task parameter from salloc or sbatch input line to
the SLURM_CPUS_PER_TASK environment variable in the spawned shell for
srun to use.
-- Add support for srun --cpus-per-task=0. This can be used to spawn tasks
without allocating resources for the job step from the job's allocation
when running multiple job steps with the --exclusive option.
-- Remove registration messages from saved messages when bringing down cluster.
Without this, a deadlock occurs if the wrong cluster name is given.
-- Correction to build for srun debugger (export symbols).
-- sacct now displays allocations made with salloc that have only one step
more properly.
-- Altered sacctmgr and sreport to look at the complete option before applying
it. Previously only the first significant characters were examined.
-- BLUEGENE - in overlap mode marking a block to error state will now end
jobs on overlapping blocks and free them.
-- Give a batch job 20 minutes to start before considering it missing and
killing it (long delay could result from slurmd being paged out). Changed
the log message from "Master node lost JobId=%u, killing it" to "Batch
JobId=%u missing from master node, killing it".
-- Avoid "Invalid node id" error when a job step within an existing job
allocation specifies a node count which is less than the node count
allocated in order to satisfy the task count specification (e.g.
"srun -n16 -N1 hostname" on allocation of 16 one-CPU nodes).
-- For sched/wiki2 (Moab) disable changing a job's name after it has begun