This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 2.2.0.pre9
=============================

-- sbatch can now submit jobs to multiple clusters and run on the earliest
available.
-- Fix bug introduced in pre8 that prevented job dependencies and job
triggers from working without the --enable-debug configure option.
-- Replaced slurm_addr with slurm_addr_t
-- Skeleton code added for BlueGeneQ.
-- Jobs can now be submitted to multiple partitions (job queues) and use the
one permitting earliest start time.

-- Change slurmdb_coord_table back to acct_coord_table to keep it consistent
with versions < 2.1.
-- Introduced locking system similar to that in the slurmctld for the
assoc_mgr.
-- Added ability to change a user's name in accounting.
-- Restore squeue support for "%G" format (group id) accidentally removed in
2.2.0.pre7.
-- Added preempt_mode option to QOS.
-- Added a grouping=individual for sreport size reports.
-- Added remove_qos logic to jobs running under a QOS that was removed.
-- scancel now exits with a 1 if any job is non-existent when canceling.
-- Better handling of select plugins that don't exist on various systems for
cross cluster communication. Slurmctld, slurmd, and slurmstepd now only
load the default select plugin as well.
-- Prevent scontrol from aborting if getlogin() returns NULL.
-- Prevent scontrol segfault when there are hidden nodes.
-- Prevent srun segfault after task launch failure.
-- Added job_submit/lua plugin.
-- Fixed sinfo on a bluegene system to correctly print the output for:
sinfo -e -o "%9P %6m %.4c %.22F %f"
-- Add scontrol commands "hold" and "release" to simplify setting a job's
priority to 0 or 1. Also tests that the job is in pending state.
-- Increase maximum node list size (for incoming RPC) from 1024 bytes to 64k.
-- In the backup slurmctld, purge triggers before recovering trigger state to
avoid duplicate entries.
-- Fix bug in sacct processing of --fields= option.
-- Fix bug in checkpoint/blcr for jobs spanning multiple nodes introduced when
changing some variable names in version 2.2.0.pre5.
-- Removed the vestigial set_max_cluster_usage() function from the Priority
Plugin API.
-- Modify the output of "scontrol show job" for the field ReqS:C:T=. Fields
not specified by the user will be reported as "*" instead of 65534.
-- Added DefaultQOS option for an association.

-- BLUEGENE - Added -B option to the slurmctld to clear created blocks from
the system on start.
-- BLUEGENE - Added option to scontrol & sview to recreate existing blocks.

-- Fixed flags for returning messages to use the correct munge key when going
cross-cluster.

-- BLUEGENE - Added option to scontrol & sview to resume blocks in an error
state instead of just freeing them.
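As an illustration of the scontrol "hold" and "release" entry above, the
following Python sketch models the two commands as setting a job's priority
to 0 or 1 after checking that the job is pending. The Job class, the state
strings, and the function names are hypothetical and are not part of any
SLURM API.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical stand-in for a job record (not a SLURM structure)."""
    job_id: int
    state: str       # assumed state names, e.g. "PENDING" or "RUNNING"
    priority: int


def _require_pending(job: Job) -> None:
    # Both commands are described as testing that the job is pending.
    if job.state != "PENDING":
        raise ValueError(f"job {job.job_id} is not pending")


def hold(job: Job) -> None:
    """Hold a pending job by setting its priority to 0."""
    _require_pending(job)
    job.priority = 0


def release(job: Job) -> None:
    """Release a pending job by setting its priority to 1."""
    _require_pending(job)
    job.priority = 1
```

Under these assumptions, calling hold() on a running job raises ValueError
instead of changing its priority.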
* Changes in SLURM 2.2.0.pre8
=============================
-- Add DebugFlags parameter of "Backfill" for sched/backfill detailed logging.
-- Add DebugFlags parameter of "Gang" for detailed logging of gang scheduling
activities.
-- Add DebugFlags parameter of "Priority" for detailed logging of priority
multifactor activities.
-- Add DebugFlags parameter of "Reservation" for detailed logging of advanced
reservations.
-- Add run time to mail message upon job termination and queue time for mail
message upon job begin.
-- Add email notification option for job requeue.
-- Generate a fatal error if the srun --relative option is used when not
within an existing job allocation.
-- Modify the meaning of InactiveLimit slightly. It will now cancel the job
allocation created using the salloc or srun command if those commands
cease responding for the InactiveLimit regardless of any running job steps.
This parameter will no longer affect jobs spawned using sbatch.
-- Remove AccountingStoragePass and JobCompPass from configuration RPC and
scontrol show config command output. The use of SlurmDBD is still strongly
recommended as SLURM will have limited database functionality or protection
otherwise.
-- Add sbatch options of --export and SBATCH_EXPORT to control which
environment variables (if any) get propagated to the spawned job. This is
particularly important for jobs that are submitted on one cluster and run
on a different cluster.
-- Fix bug in select/linear when used with gang scheduling and there are
preempted jobs at the time slurmctld restarts that can result in over-
subscribing resources.
-- Added keeping track of the qos a job is running with in accounting.

-- Correctly handle jobs that resize, and report correct stats on a job
after it finishes.

-- Modify gang scheduler so that when SelectTypeParameter=CR_CPUS and task
affinity is enabled, it keeps track of the individual CPUs allocated to jobs
rather than just the count of CPUs allocated (which could overcommit
specific CPUs for running jobs).
-- Modify select/linear plugin data structures to eliminate underflow errors
for the exclusive_cnt and tot_job_cnt variables (previously happened when
slurmctld reconfigured while the job was in completing state).
-- Change slurmd's working directory (and location of core files) to match
that of the slurmctld daemon: the same directory used for log files,
SlurmdLogFile (if specified with an absolute pathname) otherwise the
directory used to save state, SlurmdSpoolDir.
-- Add sattach support for the --pty option.
-- Modify slurmctld communications logic to accept incoming messages on more
than one port for improved scalability.
-- Add SchedulerParameters option of "defer" to avoid trying to schedule a
job at submission time, but to attempt scheduling many jobs at once for
improved performance under heavy load.
-- Correct logic controlling slurmctld thread limit eliminating check of
RLIMIT_STACK.
-- Make slurmctld's trigger logic more robust in the event that job records
get purged before their trigger can be processed (e.g. MinJobAge=1).
-- Add support for users to hold/release their own jobs (submit the job with
srun/sbatch --hold/-H option or use "scontrol update jobid=# priority=0"
to hold and "scontrol update jobid=# priority=1" to release).
-- Added ability for sacct to query jobs by qos and a range of timelimits.
-- Added ability for sstat to query pids of steps running.
-- Support time specification in UTS format with a prefix of "uts" (e.g.
"sbatch --begin=uts458389988 my.script").
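The "uts" time specification above can be sketched in Python as follows.
This assumes that UTS means seconds since the Unix epoch; the function name
parse_uts is hypothetical and the sketch does not reproduce SLURM's own
parsing code.

```python
from datetime import datetime, timezone


def parse_uts(spec: str) -> datetime:
    """Convert a string such as "uts458389988" to a UTC datetime.

    Assumption: the digits after the "uts" prefix are seconds since
    1970-01-01 00:00:00 UTC.
    """
    prefix = "uts"
    if not spec.startswith(prefix):
        raise ValueError(f"missing {prefix!r} prefix: {spec!r}")
    digits = spec[len(prefix):]
    if not digits.isdigit():
        raise ValueError(f"expected digits after {prefix!r}: {spec!r}")
    return datetime.fromtimestamp(int(digits), tz=timezone.utc)
```

For example, parse_uts("uts458389988") yields a timezone-aware datetime
whose timestamp() is 458389988.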
* Changes in SLURM 2.2.0.pre7
=============================
-- Fixed issue with sacctmgr so that querying against a non-existent cluster
works the same way as in 2.1.
-- Added infrastructure to support allocation of generic node resources (gres).
-Modified select/linear and select/cons_res plugins to allocate resources
at the level of a job without oversubscription.
-Get sched/backfill operating with gres allocations.
-Get gres configuration changes (reconfiguration) working.
-Have job steps allocate resources.
-Modified job step credential to include the job's and step's gres
allocation details.
-Integrate with HWLOC library to identify GPUs and NICs configured on each
node.
-- SLURM commands (squeue, sinfo, etc...) can now go cross-cluster on like
linux systems. Cross-cluster for bluegene to linux and such should
-- Added the ability to configure PreemptMode on a per-partition basis.
-- Change slurmctld's default thread limit count to 1024, but adjust that down
as needed based upon the process's resource limits.
-- Removed the non-functional "SystemCPU" and "TotalCPU" reporting fields from
sstat and updated man page
-- Correct location of apbasil command on Cray XT systems.
-- Fixed bug in MinCPU and AveCPU calculations in sstat command
-- Send message to srun when the Prolog takes too long (MessageTimeout) to
complete.
-- Change timeout for socket connect() to be half of configured MessageTimeout.
-- Added high-throughput computing web page with configuration guidance.
-- Use more srun sockets to process incoming PMI (MPICH2) connections for
better scalability.
-- Added DebugFlags for the select/bluegene plugin: DEBUG_FLAG_BG_PICK,
DEBUG_FLAG_BG_WIRES, DEBUG_FLAG_BG_ALGO, and DEBUG_FLAG_BG_ALGO_DEEP.
-- Remove vestigial job record field "kill_on_step_done" (internal to the
slurmctld daemon only).
-- For MPICH2 jobs: Clear PMI state between job steps.
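The thread limit entry above (a default of 1024, adjusted down as needed)
can be pictured with the Python sketch below. Which resource limits
slurmctld consults, and how they translate into a thread count, is not
stated in this file, so the sketch takes already derived ceilings as input.
The function name and the use of None for "unlimited" are assumptions.

```python
DEFAULT_THREAD_LIMIT = 1024  # default stated in the entry above


def effective_thread_limit(caps, default=DEFAULT_THREAD_LIMIT):
    """Return the default limit lowered to the smallest finite cap.

    `caps` is an iterable of thread ceilings derived from the process's
    resource limits (hypothetical input); None means "unlimited".
    """
    finite = [cap for cap in caps if cap is not None]
    return min([default, *finite])
```

With no caps, or only caps above 1024, the default is kept; a cap of 256
lowers the result to 256.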
* Changes in SLURM 2.2.0.pre6
=============================
-- sview - added ability to see database configuration.
-- sview - added ability to add/remove visible tabs.
-- sview - change way grid highlighting takes place on selected objects.
-- Added infrastructure to support allocation of generic node resources.
-Added node configuration parameter of Gres=.
-Added ability to view/modify a node's gres using scontrol, sinfo and sview.
-Added salloc, sbatch and srun --gres option.
-Added ability to view a job or job step's gres using scontrol, squeue and
sview.
-Added new configuration parameter GresPlugins to define plugins used to
manage generic resources.
-Added DebugFlags option of "gres" for detailed debugging of gres actions.
-- Slurmd modified to log slow slurmstepd startup and note possible file system
problem.
-- sview - There is now a .slurm/sviewrc created when running sview.
Defaults are put in there as to how sview looks when first launched.
You can set these by Ctrl-S or Options->Set Default Settings.
-- Add scontrol "wait_job <job_id>" option to wait for nodes to boot as needed.
Useful for batch jobs (in Prolog, PrologSlurmctld or the script) if powering
down idle nodes.
-- Added salloc and sbatch option --wait-for-nodes. If set non-zero, job
initiation will be delayed until all allocated nodes have booted. Salloc
will log the delay with the messages "Waiting for nodes to boot" and "Nodes
are ready for use".

-- The priority/multifactor plugin now takes into consideration size of job
in cpus as well as size in nodes when looking at the job size factor.
Previously only nodes were considered.

-- When using the SlurmDBD, messages waiting to be sent will be combined
and sent in one message.
-- Remove srun's --core option. Move the logic to an optional SPANK plugin
(currently in the contribs directory, but plan to distribute through
http://code.google.com/p/slurm-spank-plugins/).

-- Patch for adding CR_CORE_DEFAULT_DIST_BLOCK as a select option to lay out
jobs using block layout across cores within each node instead of cyclic,
which was previously the default.

-- Accounting - When removing associations if jobs are running, those jobs
must be killed before proceeding. Previously the jobs were killed
automatically, causing user confusion over what is most likely an
admin's mistake.
-- sview - color column keeps reference color when highlighting.
-- Configuration parameter MaxJobCount changed from 16-bit to 32-bit field.
The default MaxJobCount was changed from 5,000 to 10,000.

-- SLURM commands (squeue, sinfo, etc...) can now go cross-cluster on like
linux systems. Cross-cluster for bluegene to linux and such does not
currently work. You can submit jobs with sbatch. Salloc and srun are not
cross-cluster compatible, and given their nature to talk to actual compute
nodes these will likely never be.
-- salloc modified to forward SIGTERM to the spawned program.
-- In sched/wiki2 (for Moab support) - Add GRES and WCKEY fields to MODIFYJOBS
and GETJOBS commands. Add GRES field to GETNODES command.
-- In struct job_descriptor and struct job_info: rename min_sockets to
sockets_per_node, min_cores to cores_per_socket, and min_threads to
threads_per_core (the values are not minimum, but represent the target
values).
-- Fixed bug in clearing a partition's DisableRootJobs value reported by
Hongjia Cao.
-- Purge (or ignore) terminated jobs in a more timely fashion based upon the
MinJobAge configuration parameter. Small values for MinJobAge should improve
responsiveness for high job throughput.
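For code that still refers to the old field names of struct job_descriptor
and struct job_info, the rename listed above can be summarized as a mapping.
The Python sketch below is only an illustration: the dictionary based record
and the helper name rename_fields are hypothetical, while the three name
pairs are taken from the entry above.

```python
# Old field name -> new field name, as listed in the entry above.
FIELD_RENAMES = {
    "min_sockets": "sockets_per_node",
    "min_cores": "cores_per_socket",
    "min_threads": "threads_per_core",
}


def rename_fields(record: dict) -> dict:
    """Return a copy of `record` with old field names replaced.

    Keys that are not in FIELD_RENAMES are copied unchanged.
    """
    return {FIELD_RENAMES.get(key, key): value for key, value in record.items()}
```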
* Changes in SLURM 2.2.0.pre5
=============================
-- Modify commands to accept time format with one or two digit hour value
-- Modify time parsing logic to accept "minute", "hour", "day", and "week" in
addition to the currently accepted "minutes", "hours", etc.
-- Add slurmd option of "-C" to print actual hardware configuration and exit.
-- Pass EnforcePartLimits configuration parameter from slurmctld for user
commands to see the correct value instead of always "NO".
-- Modify partition data structures to replace the default_part,
disable_root_jobs, hidden and root_only fields with a single field called
"flags" populated with the flags PART_FLAG_DEFAULT, PART_FLAG_NO_ROOT,
PART_FLAG_HIDDEN and/or PART_FLAG_ROOT_ONLY. This is a more flexible
solution besides making for smaller data structures.
-- Add node state flag of JOB_RESIZING. This will only exist when a job's
accounting record is being written immediately before or after it changes
size. This permits job accounting records to be written for a job at each
size.
-- Make calls to jobcomp and accounting_storage plugins before and after a job
changes size (with the job state being JOB_RESIZING). All plugins write a
record for the job at each size with intermediate job states being
-- When changing a job size using scontrol, generate a script that can be
executed by the user to reset SLURM environment variables.
-- Modify select/linear and select/cons_res to use resources released by job
resizing.
-- Added to contribs foundation for Perl extension for slurmdb library.
-- Add new configuration parameter JobSubmitPlugins which provides a mechanism
to set default job parameters or perform other site-configurable actions at
job submit time.
-- Better postgres support for accounting, still beta.
-- Speed up job start when using the slurmdbd.

-- Forward step failure reason back to slurmd. Previously, in some cases only
SLURM_FAILURE was returned.
-- Changed squeue to fail when passed invalid -o <output_format> or
-S <sort_list> specifications.
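The partition "flags" change above replaces four separate fields with one
bit field. The Python sketch below shows the general idea. The bit values
are assumptions chosen for illustration and may not match the values in
slurm.h; the class and function names are hypothetical.

```python
from enum import IntFlag


class PartFlag(IntFlag):
    """Illustrative flag bits; the real values in slurm.h may differ."""
    DEFAULT = 0x1    # stands in for PART_FLAG_DEFAULT
    NO_ROOT = 0x2    # stands in for PART_FLAG_NO_ROOT
    HIDDEN = 0x4     # stands in for PART_FLAG_HIDDEN
    ROOT_ONLY = 0x8  # stands in for PART_FLAG_ROOT_ONLY


def pack_flags(default_part: bool, disable_root_jobs: bool,
               hidden: bool, root_only: bool) -> PartFlag:
    """Fold the four former boolean fields into a single flags value."""
    flags = PartFlag(0)
    if default_part:
        flags |= PartFlag.DEFAULT
    if disable_root_jobs:
        flags |= PartFlag.NO_ROOT
    if hidden:
        flags |= PartFlag.HIDDEN
    if root_only:
        flags |= PartFlag.ROOT_ONLY
    return flags
```

A single field of this kind can be tested with a bitwise AND (for example,
flags & PartFlag.HIDDEN), and new flags can be added without changing the
size of the structure.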
* Changes in SLURM 2.2.0.pre4
=============================
-- Add support for a PropagatePrioProcess configuration parameter value of 2
to restrict spawned task nice values to that of the slurmd daemon plus 1.
This ensures that the slurmd daemon always has a higher scheduling
priority than spawned tasks.
-- Add support in slurmctld, slurmd and slurmdbd for option of "-n <value>" to
-- Fixed slurm_load_slurmd_status and slurm_pid2jobid to work correctly when
multiple slurmds are in use.

-- Altered srun to set max_nodes to min_nodes if not set when doing an
allocation, to mimic what salloc and sbatch do. When running a step, if
the max isn't set it remains unset.
-- Applied patch from David Egolf (David.Egolf@Bull.com). Added the ability
to purge/archive accounting data on a day or hour basis. Previously
it was only available on a monthly basis.

-- Add support for maximum node count in job step request.
-- Fix bug in CPU count logic for job step allocation (used count of CPUS per
node rather than CPUs allocated to the job).
-- Add new configuration parameters GroupUpdateForce and GroupUpdateTime.
See "man slurm.conf" for details about how these control when slurmctld
updates its information of which users are in the groups allowed to use
partitions.
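The PropagatePrioProcess value of 2 described at the top of this section
can be read as a floor on the nice value of spawned tasks. The Python sketch
below shows one reading of that rule, not SLURM's actual code; the function
name and arguments are hypothetical. On Unix systems a larger nice value
means a lower scheduling priority.

```python
def restricted_task_nice(requested_nice: int, slurmd_nice: int) -> int:
    """Return the nice value a spawned task would run with.

    Assumed rule: the task's nice value is never allowed below the
    slurmd daemon's nice value plus 1, so slurmd keeps the higher
    scheduling priority.
    """
    return max(requested_nice, slurmd_nice + 1)
```

Under this reading, a task requesting a nice value of -5 while slurmd runs
at 0 would run at 1, and a task requesting 10 would keep 10.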