This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 1.3.0-pre9
=============================
-- Add spank support to sbatch. Note that spank_local_user() will be called
with step_layout=NULL and gid=SLURM_BATCH_SCRIPT and spank_fini() will
be called immediately afterwards.
-- Made configure use mysql_config to find location of mysql database install.
Removed bluegene specific information from the general database tables.
-- Re-write sched/backfill to utilize new will-run logic in the select
plugins. It now supports select/cons_res and all job options (required
nodes, excluded nodes, contiguous, etc.).
-- Modify scheduling logic to better support overlapping partitions.
-- Add --task-mem option and remove --job-mem option from srun, salloc, and
sbatch commands. Enforce step memory limit, if specified and there is
no job memory limit specified (--mem). Also see DefMemPerTask and
MaxMemPerTask in "man slurm.conf". Enforcement is dependent upon job
accounting being enabled with non-zero value for JobAcctGatherFrequency.
* Changes in SLURM 1.3.0-pre8
=============================
-- Modify how strings are packed in the RPCs. Maximum string size
increased from 64KB (16-bit size field) to 4GB (32-bit size field).
-- Fix bug that prevented time value of "INFINITE" from being processed.
-- Added new srun/sbatch option "--open-mode" to control how output/error
files are opened ("t" for truncate, "a" for append).
-- Added checkpoint/xlch plugin for use with XLCH (Hongjia Cao, NUDT).
-- Added srun option --checkpoint-path for use with XLCH (Hongjia Cao, NUDT).
-- Added new srun/salloc/sbatch option "--acctg-freq" for user control over
accounting data collection polling interval.
-- In sched/wiki2 add support for hostlist expression use in GETNODES command
with HostFormat=2 in the wiki.conf file.
-- Added new scontrol option "setdebug" that can change the slurmctld daemon's
debug level at any time (Hongjia Cao, NUDT).
-- Track total suspend time for jobs and steps for accounting purposes.
-- Add version information to partition state file.
-- Added 'will-run' functionality to all of the select plugins (bluegene,
linear, and cons_res) to return node list and time job can start based
on other jobs running.
-- Major restructuring of node selection logic. select/linear now supports
partition max_share parameter and tries to match like size jobs on the
same nodes to improve gang scheduling performance. Also supports treating
memory as consumable resource for job preemption and gang scheduling if
SelectTypeParameter=CR_Memory in slurm.conf.
-- BLUEGENE: Reorganized bluegene plugin for maintainability's sake.
-- Major restructuring of data structures in select/cons_res.
-- Support job, node and partition names of arbitrary size.
-- Fix bug that caused slurmd to hang when using select/linear with
task/affinity.
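As a rough illustration of the RPC string packing change noted in
1.3.0-pre8 above, the sketch below prefixes a string with either a 16-bit
or a 32-bit size field. It is not SLURM's actual pack code: the names
pack_str and unpack_str are hypothetical, and network byte order is assumed.

```python
import struct


def pack_str(data: bytes, wide: bool = True) -> bytes:
    """Prefix data with its length (hypothetical helper, not SLURM's API).

    wide=True uses a 32-bit size field, wide=False a 16-bit one.
    Network byte order is an assumption made for this sketch.
    """
    fmt = "!I" if wide else "!H"
    limit = 0xFFFFFFFF if wide else 0xFFFF
    if len(data) > limit:
        raise ValueError("string too long for size field")
    return struct.pack(fmt, len(data)) + data


def unpack_str(buf: bytes, wide: bool = True):
    """Return (string, remaining bytes) from a length-prefixed buffer."""
    fmt = "!I" if wide else "!H"
    header = struct.calcsize(fmt)
    (length,) = struct.unpack_from(fmt, buf)
    return buf[header:header + length], buf[header + length:]
```

With the 16-bit field, a string longer than 65535 bytes cannot be
represented at all; the 32-bit field removes that ceiling at a cost of two
extra bytes per string.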
* Changes in SLURM 1.3.0-pre7
=============================
-- Fix a bug in the processing of srun's --exclusive option for a job step.
* Changes in SLURM 1.3.0-pre6
=============================
-- Add support for configurable number of jobs to share resources using the
partition Shared parameter in slurm.conf (e.g. "Shared=FORCE:3" for three
jobs to share the resources). From Chris Holmes, HP.
-- Made salloc use api instead of local code for message handling.
* Changes in SLURM 1.3.0-pre5
=============================
-- Add select_g_reconfigure() function to note changes in slurmctld configuration
that can impact node scheduling.
-- scontrol to set/get partition's MaxTime and job's Timelimit in minutes plus
new formats: min:sec, hr:min:sec, days-hr:min:sec, days-hr, etc.
-- scontrol "notify" command added to send message to stdout of srun for
specified job id.
-- For BlueGene, make alpha part of node location specification be case insensitive.
-- Report scheduler-plugin specific configuration information with the
"scontrol show configuration" command on the SCHEDULER_CONF line. This
information is not found in the "slurm.conf" file, but a scheduler plugin
specific configuration (e.g. "wiki.conf").
-- sview partition information reported now includes partition priority.
-- Expand job dependency specification to support concurrent execution,
testing of job exit status and multiple job IDs.
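The time formats listed for scontrol in 1.3.0-pre5 above can be read as in
the following sketch. It is an illustration only, not scontrol's parser:
parse_time_limit is a hypothetical name, the result is returned in seconds
for simplicity, and the handling of days-hr:min (one of the formats covered
by "etc.") is an assumption.

```python
def parse_time_limit(text: str) -> int:
    """Convert a time specification to seconds (hypothetical helper).

    Accepted forms: minutes, min:sec, hr:min:sec, days-hr,
    days-hr:min (assumed) and days-hr:min:sec.
    """
    days = 0
    if "-" in text:
        # A leading "days-" part means the remaining fields start at hours.
        day_part, rest = text.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in rest.split(":")]
        if len(fields) > 3:
            raise ValueError("too many fields: " + text)
        hours, minutes, seconds = fields + [0] * (3 - len(fields))
    else:
        fields = [int(f) for f in text.split(":")]
        if len(fields) == 1:
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:
            hours, minutes, seconds = 0, fields[0], fields[1]
        elif len(fields) == 3:
            hours, minutes, seconds = fields
        else:
            raise ValueError("too many fields: " + text)
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

Note that a bare number is taken as minutes while a two-field value is
min:sec, so "90" and "1:30:00" describe the same limit.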
* Changes in SLURM 1.3.0-pre4
=============================
-- Job step launch in srun is now done from the SLURM API. All further
modifications to job launch should be done there.
-- Add new partition configuration parameter Priority. Add job count to
Shared parameter.
-- Add new configuration parameters DefMemPerTask, MaxMemPerTask, and
SchedulerTimeSlice.
-- In sched/wiki2, return REJMESSAGE with details on why a job was
requeued (e.g. what node failed).
* Changes in SLURM 1.3.0-pre3
=============================
-- Added srun option "--checkpoint=time" for job step to automatically be
checkpointed on a periodic basis.
-- Change behavior of "scancel -s KILL <jobid>" to send SIGKILL to all job
steps rather than cancelling the job. This now matches the behavior of
all other signals. "scancel <jobid>" still cancels the job and all steps.
-- Add support for new job step options --exclusive and --immediate. Permit
job steps to be queued when resources are not available within an existing
job allocation to dedicate the resources to the job step. Useful for
executing simultaneous job steps. Provides resource management both at
the level of jobs and job steps.
-- Add support for feature count in job constraints, for example
srun --nodes=16 --constraint=graphics*4 ...
Based upon work by Kumar Krishna (HP, India).
-- Add multi-core options to salloc and sbatch commands (sbatch.patch and
cleanup.patch from Chris Holmes, HP).
-- In select/cons_res properly release resources allocated to job being
suspended (rmbreak.patch, from Chris Holmes, HP).
-- Removed the database and jobacct plugins, replacing them with
jobacct_storage and jobacct_gather for easier hooks for further expansion
of the jobacct plugin.
* Changes in SLURM 1.3.0-pre2
=============================
-- Added new srun option --pty to start job with pseudo terminal attached
to task 0 (all other tasks have I/O discarded).
-- Disable user specifying jobid when sched/wiki2 configured (needed for
Moab releases until early 2007).
-- Report command, args and working directory for batch jobs with
"scontrol show job".
* Changes in SLURM 1.3.0-pre1
=============================
-- !!! SRUN CHANGES !!!
The srun options -A/--allocate, -b/--batch, and -a/--attach have been
removed! That functionality is now available in the separate commands
salloc, sbatch, and sattach, respectively.
-- Add new node state FAILING plus trigger for when node enters that state.
-- Add new configuration parameter "PrivateData". This can be used to
prevent a user from seeing jobs or job steps belonging to other users.
-- Added configuration parameters for node power save mode: ResumeProgram,
ResumeRate, SuspendExcNodes, SuspendExcParts, SuspendProgram and
SuspendRate.
-- Slurmctld maintains the IP address (rather than hostname) for srun
communications. This fixes some possible network routing issues.
-- Added global database plugin. Job accounting and Job completion are the
first to use it. Follow documentation to add more to the plugin.
-- Removed no-longer-needed jobacct/common/common_slurmctld.c since that is
replaced by the database plugin.
-- Added new configuration parameter: CryptoType.
Moved existing digital signature logic into new plugin: crypto/openssl.
Added new support for crypto/munge (available with GPL license).
* Changes in SLURM 1.2.23
=========================
-- Fix for libpmi to not export unneeded variables like xstr*
-- BLUEGENE - added per partition dynamic block creation
-- fix infinite loop bug in sview when there were multiple partitions
-- Send message to srun command when a job is requeued due to node failure.
Note this will be overwritten in the output file unless JobFileAppend
is set in slurm.conf. In slurm version 1.3, srun's --open-mode=append
option will offer this control for each job.
* Changes in SLURM 1.2.22
=========================
-- In sched/wiki2, add support for MODIFYJOB option "MINSTARTTIME=<time>"
to modify a job's earliest start time.
-- In sbcast, fix bug with large files causing sbcast to die.
-- In sched/wiki2, add support for COMMENT= option in STARTJOB and CANCELJOB
commands.
-- Avoid printing negative job run time in squeue due to clock skew.
-- In sched/wiki and sched/wiki2, add support for wiki.conf option
HidePartitionJobs (see man pages for details).
-- Update to srun/sbatch --get-user-env option logic (needed by Moab).
-- In slurmctld (for Moab) added job->details->reserved_resources field
to report resources that were kept in reserve for job while it was
pending.
-- In sched/wiki (for Maui scheduler) report a pending job's node feature
requirements (from Miguel Roa, BSC).
-- Permit a user to change a pending job's TasksPerNode specification
using scontrol (from Miguel Roa, BSC).
-- Add support for node UP/DOWN event logging in jobacct/gold plugin
WARNING: using the jobacct/gold plugin slows the system startup. Set the
MessageTimeout variable in the slurm.conf to around 20+.
-- Added check at start of slurmctld to look for /tmp/slurm_gold_first. If
it is there and the gold plugin is in use, slurm will make a record of all
nodes in downed or drained state.
* Changes in SLURM 1.2.21
=========================
-- Fixed torque wrappers to look in the correct spot for the perl api
-- Do not treat user resetting his time limit to the current value as
an error.
-- Set correct executable names for Totalview when --multi-prog option
is used and more than one node is allocated to the job step.
-- When a batch job gets requeued, record in accounting logs that
the job was cancelled. The requeued job's submit time will be
set to the time of its requeue so it looks like a different job.
-- Prevent communication problems if the slurmd/slurmstepd have a
different JobAcct plugin configured than slurmctld.
-- Adding Gold plugin for job accounting
-- In sched/wiki2, add support for MODIFYJOB option "JOBNAME=<name>"
to modify a job's name.
-- Add configuration check for sys/syslog.h and include it as needed.
-- Add --propagate option to sbatch for control over limit propagation.
-- Added Gold interface to the jobacct plugin. To configure in the config
file specify...
JobAcctType=jobacct/gold