This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 1.3.0-pre3
=============================
* Changes in SLURM 1.3.0-pre2
=============================
-- Added new srun option --pty to start job with pseudo terminal attached
to task 0 (all other tasks have I/O discarded)
-- Disable user specifying jobid when sched/wiki2 configured (needed for
Moab releases until early 2007).
-- Report command, args and working directory for batch jobs with
"scontrol show job".
* Changes in SLURM 1.3.0-pre1
=============================

-- !!! SRUN CHANGES !!!
The srun options -A/--allocate, -b/--batch, and -a/--attach have been
removed! That functionality is now available in the separate commands
salloc, sbatch, and sattach, respectively.
-- Add new node state FAILING plus trigger for when node enters that state.
-- Add new configuration parameter "PrivateData". This can be used to
prevent a user from seeing jobs or job steps belonging to other users.
-- Added configuration parameters for node power save mode: ResumeProgram,
ResumeRate, SuspendExcNodes, SuspendExcParts, SuspendProgram and
SuspendRate.
-- Slurmctld maintains the IP address (rather than hostname) for srun
communications. This fixes some possible network routing issues.
-- Added global database plugin. Job accounting and Job completion are the
first to use it. Follow documentation to add more to the plugin.
-- Removed no-longer-needed jobacct/common/common_slurmctld.c since that is
replaced by the database plugin.
-- Added new configuration parameter: CryptoType.
Moved existing digital signature logic into new plugin: crypto/openssl.
Added new support for crypto/munge (available with GPL license).
* Changes in SLURM 1.2.14
=========================
-- Fix a couple of bugs in MPICH/MX support (from Asier Roa, BSC).
-- Fix perl api for AIX
-- Add wiki.conf parameter ExcludePartitions for selected partitions to
be scheduled directly by Slurm without Moab control.
-- Optimize load leveling for shared nodes (alloc.patch, contributed
by Chris Holmes, HP).
-- Added PMI_TIME environment variable for user to control how PMI
communications are spread out in time. See "man srun" for details.
-- Added PMI timing information to srun debug mode to aid in tuning.
Use "srun -vv ..." to see the information.
-- Added checkpoint/ompi (OpenMPI) plugin (still under development).
-- Fix bug in load leveling logic added to v1.2.13 which can cause an
infinite loop and hang slurmctld when sharing nodes between jobs.
* Changes in SLURM 1.2.13
=========================
-- Add slurm.conf parameter JobFileAppend.
-- Fix for segv in "scontrol listpids" on nodes not in SLURM config.
-- Add support for SCANCEL_CTLD env var.
-- In mpi/mvapich plugin, add startup timeout logic. Time based upon
SLURM_MVAPICH_TIMEOUT (value in seconds).
-- Fixed pick_step_node logic to only pick the number of nodes requested
from the user when excluding nodes, to avoid an error message.
-- Disable salloc, sbatch and srun -I/--immediate options with
Moab scheduler.
-- Added "contribs" directory with a Perl API and Torque wrappers for Torque
to SLURM migration. This directory should be used to put anything that
is outside of SLURM proper such as a different API. Perl APIs contributed
by Hongjia Cao (NUDT).
-- In sched/wiki2: add support for tasklist with node name expressions
and task counts (e.g. "TASKLIST=tux[1-4]*2:tux[12-14]*4").
-- In select/cons_res with sched/wiki2: fix bug in task layout logic.
-- Removed all curses info from the bluegene plugin putting it into smap
where it belongs.
-- Add support for job time limit specification formats: min, min:sec,
hour:min:sec, and days-hour:min:sec (formerly only supported minutes).
Applies to salloc, sbatch, and srun commands.
-- Improve scheduling support for exclusive constraint list: nodes can
now be in more than one constraint specified exclusively for a job
(e.g. "srun -C [rack1|rack2|rack3|rowB] srun").
-- Create separate MPICH/MX plugin (split out from MPICH/GM plugin)
-- Increase default MessageTimeout (in slurm.conf) from 5 to 10 secs.
-- Fix bug in batch job requeue if node zero of allocation fails to respond
to task launch request.
-- Improve load leveling logic to more evenly distribute the workload
(best_load.patch, contributed by Chris Holmes, HP).
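The time limit formats listed above for salloc, sbatch, and srun (min, min:sec, hour:min:sec, and days-hour:min:sec) can be illustrated with a small parser. This is only a sketch of how the four formats might be interpreted, not SLURM's own parsing code; the function name parse_time_limit and the choice to return seconds are assumptions made for illustration.

```python
def parse_time_limit(spec):
    """Convert a time limit string to seconds (hypothetical helper).

    Handles only the four formats named in the release note:
    min, min:sec, hour:min:sec, and days-hour:min:sec.
    """
    text = spec.strip()
    day_part, dash, clock = text.partition("-")
    if dash:
        days = int(day_part)
    else:
        days, clock = 0, text

    fields = [int(f) for f in clock.split(":")]
    if dash and len(fields) != 3:
        raise ValueError("days must be followed by hour:min:sec: %r" % spec)

    if len(fields) == 1:        # min
        hours, minutes, seconds = 0, fields[0], 0
    elif len(fields) == 2:      # min:sec
        hours, minutes, seconds = 0, fields[0], fields[1]
    elif len(fields) == 3:      # hour:min:sec
        hours, minutes, seconds = fields
    else:
        raise ValueError("unrecognized time limit: %r" % spec)

    if min(days, hours, minutes, seconds) < 0:
        raise ValueError("negative field in time limit: %r" % spec)
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

For example, parse_time_limit("90") treats a bare number as minutes, matching the formerly supported format, while parse_time_limit("1-12:00:00") reads as one day plus twelve hours.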
* Changes in SLURM 1.2.12
=========================
-- Increase maximum message size from 1MB to 16MB (from Ernest Artiaga, BSC).
-- In PMI_Abort(), log the event and abort the entire job step.
-- Add support for additional PMI functions: PMI_Get_clique_ranks and
PMI_Get_clique_size (from Chuck Clouston, Bull).
-- Report an error when a hostlist comes in appearing to be a box but not
formatted in XYZxXYZ format.
-- Add support for partition configuration "Shared=exclusive". This is
equivalent to "srun --exclusive" when select/cons_res is configured.
-- In sched/wiki2, report the reason for a node being unavailable for the
GETNODES command using the CAT="<reason>" field.
-- In sched/wiki2 with select/linear, duplicate hostnames in HOSTLIST, one
per allocated processor.
-- Fix bug in scancel when a specific signal is given and the job lacks
active steps.
-- In sched/wiki2, add support for NOTIFYJOB ARG=<jobid> MSG=<message>.
This sends a message to an active srun command.
-- salloc will now set SLURM_NPROCS to improve srun's behavior under salloc.
-- In sched/wiki2 and select/cons_res: ensure that Slurm's CPU allocation
is identical to Moab's (from Ernest Artiaga and Asier Roa, BSC).
-- Added "scontrol show slurmd" command to report the status of the local
slurmd daemon.
-- Set node DOWN if prolog fails on node zero of batch job launch.
-- Properly handle "srun --cpus-per-task" within a job allocation when
SLURM_TASKS_PER_NODE environment variable is not set.

-- Fixed return value of slurm_send_rc_msg: if msg->conn_fd is < 0, set
errno to ENOTCONN and return SLURM_ERROR instead of returning ENOTCONN.
-- Added read before we send anything down a socket to make sure the socket
is still there.
-- Add slurm.conf variables UnkillableStepProgram and UnkillableStepTimeout.

-- Enable nice file propagation from sbatch command.

* Changes in SLURM 1.2.11
=========================
-- Updated "etc/mpich1.slurm.patch" for direct srun launch of MPICH1_P4
tasks. See the "README" portion of the patch for details.
-- Added new scontrol command "show hostlist <hostnames>" to translate a list
of hostnames into a hostlist expression (e.g. "tux1,tux2" -> "tux[1-2]")
and "show hostnames <list>", which returns a list of nodes (one node per line)
from SLURM hostlist expression or from SLURM_NODELIST environment variable
if no hostlist specified.

-- Add the sbatch option "--wrap".
-- Add the sbatch option "--get-user-env".
-- Added support for mpich-mx (use the mpichgm plugin).

-- Make job's stdout and stderr file access rights be based upon user's umask
at job submit time.
-- Add support for additional PMI functions: PMI_Parse_option,
PMI_Args_to_keyval, PMI_Free_keyvals and PMI_Get_options (from Puenlap Lee
and Nancy Kritkausky, Bull).
-- Make default value of SchedulerPort (configuration parameter) be 7321.
-- Use SLURM_UMASK environment variable (if set) at job submit time as umask
for spawned job.
-- Correct some format issues in the man pages (from Gennero Oliva, ICAR).
-- Added support for parallel make across an existing SLURM allocation
based upon GNU make-3.81. Patch is in "etc/make.slurm.patch".
-- Added '-b' option to sbatch for easy MOAB transition to sbatch instead of
srun. Option does nothing in sbatch.
-- Changed wiki2's handling of a node state in Completing to return 'busy'
instead of 'running' which matches slurm version 1.1

-- Fix race condition in jobacct/linux with use of proctrack/pgid and a
realloc issue inside proctrack/linux.

-- Added MPICH1_P4 plugin for direct launch of mpich1/p4 tasks using srun
and a patched version of the mpi library. See "etc/mpich1.slurm.patch".
NOTE: This is still under development and not ready for production use.
-- Add new sinfo sort field "%E", which sorts by the time associated with a
node's state (from Prashanth Tamraparni, HP).
-- In sched/wiki: fix logic for restarting backup slurmctld.
-- Preload SLURM plugins early in the slurmstepd operation to avoid
multiple dlopens after forking (and to avoid a glibc bug
that leaves dlopen locks in a bad state after a fork).
-- Added MPICH1_P4 patch to launch tasks using srun rather than rsh and
automatically generate mpirun's machinefile based upon the job's
allocation. See "etc/mpich1.slurm.patch".
-- BLUEGENE - fix for overlap mode to mark all other base partitions as used
when creating a new block from the file to ensure we only use the base
partitions we are asking for.
-- Fix in proctrack/sgi_job plugin that could cause slurmstepd to seg_fault
preventing timely clean-up of batch jobs in some cases.
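The scontrol "show hostlist" and "show hostnames" entry above describes translating between a list of hostnames and a hostlist expression (e.g. "tux1,tux2" -> "tux[1-2]"). The sketch below illustrates that idea in both directions. It is not SLURM's hostlist implementation: the function names are hypothetical, and it assumes a single shared prefix, plain numeric suffixes without leading zeros, and at most one bracketed range list.

```python
import re


def compress_hostnames(names):
    """Fold names like ["tux1", "tux2"] into "tux[1-2]" (hypothetical helper)."""
    parsed = [re.fullmatch(r"(.*?)(\d+)", name) for name in names]
    if not names or not all(parsed):
        raise ValueError("every name needs a numeric suffix")
    prefixes = {m.group(1) for m in parsed}
    if len(prefixes) != 1:
        raise ValueError("this sketch supports a single prefix only")
    prefix = prefixes.pop()

    # Walk the sorted suffixes and collect runs of consecutive numbers.
    numbers = sorted({int(m.group(2)) for m in parsed})
    ranges = []
    start = prev = numbers[0]
    for n in numbers[1:]:
        if n != prev + 1:
            ranges.append((start, prev))
            start = n
        prev = n
    ranges.append((start, prev))

    body = ",".join(str(a) if a == b else "%d-%d" % (a, b) for a, b in ranges)
    return "%s[%s]" % (prefix, body)


def expand_hostlist(expr):
    """Expand "tux[1-3,7]" into one hostname per list element."""
    m = re.fullmatch(r"([^\[\]]*)\[([\d,\-]+)\]", expr)
    if not m:
        return [expr]  # no bracket expression: treat as a single host
    prefix, body = m.groups()
    hosts = []
    for part in body.split(","):
        low, _, high = part.partition("-")
        high = high or low
        hosts.extend("%s%d" % (prefix, i) for i in range(int(low), int(high) + 1))
    return hosts
```

Printing the result of expand_hostlist one element per line mirrors the "one node per line" output described for "show hostnames".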
* Changes in SLURM 1.2.7
========================
-- BLUEGENE - code to make it so you can make a 36x36x36 system.
The wiring should be correct for a system with x-dim of 1,2,4,5,8,13
in emulation mode. It will work with any real system no matter the size.
-- Major re-write of jobcomp/script plugin: fix memory leak and
general code clean-up.
-- Add ability to change MaxNodes and ExcNodeList for pending job
using scontrol.
-- Purge zombie processes spawned via event triggers.
-- Add support for power saving mode (experimental code to reduce voltage
and frequency on nodes that stay in the IDLE state, for more information
see http://www.llnl.gov/linux/slurm/power_save.html). None of this
code is enabled by default.

* Changes in SLURM 1.2.6
========================
-- Fix MPIRUN_PORT env variable in mvapich plugin
-- Disable setting triggers by other than user SlurmUser unless SlurmUser
is root for improved security.

* Changes in SLURM 1.2.5
========================
-- Fix nodelist truncation in "scontrol show jobs" output
-- In mpi/mpichgm, fix potential problem formatting GMPI_PORT, from