This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 2.2.0.pre0
=============================
-- Added SLURM_VERSION_NUMBER and removed SLURM_API_VERSION from
slurm/slurm.h.
-- Added support to handle communication with SLURM 2.1 clusters. Jobs
should not be lost in the future when upgrading to higher versions of
SLURM.
-- Added withdeleted options for listing clusters, users, and accounts
-- Remove PLPA task affinity functions due to that package being deprecated.
-- Preserve current partition state information rather than use contents of
slurm.conf file after slurmctld restart or reconfiguration.
-- Preserve current node Feature state information rather than use contents
of slurm.conf file after slurmctld restart or reconfiguration.
-- Modify SLURM's PMI library (for MPICH2) to properly execute an executable
program stand-alone (single MPI task launched without srun).
-- Made GrpCPUs and MaxCPUs limits work for select/cons_res.
-- Moved all SQL dependent plugins into a separate rpm slurm-sql. This
should be needed only where a connection to a database is needed (i.e.
where the slurmdbd is running).
-- Add command line option "no_sys_info" to PAM module to suppress system
logging of "access granted for user ...". Access denied and other errors
will still be logged.
* Changes in SLURM 2.1.2
=============================
-- Added nodelist to sview for jobs on non-bluegene systems
-- Correction in value of batch job environment variable SLURM_TASKS_PER_NODE
under some conditions.
-- When a node that is already drained/down silently fails, the reason
for draining the node is not changed.
-- Srun will ignore SLURM_NNODES environment variable and use the count of
currently allocated nodes if that count changes during the job's lifetime
(e.g. job allocation uses the --no-kill option and a node goes DOWN, job
step would previously always fail).
-- Made it so sacctmgr can't add blank user or account. The MySQL plugin
will also reject such requests.
-- Revert libpmi.so version for compatibility with SLURM version 2.0 and
earlier to avoid forcing applications using a specific libpmi.so version to
rebuild unnecessarily (revert from libpmi.so.21.0.0 to libpmi.so.0.0.0).
-- Restore support for a pending job's constraints (required node features)
when slurmctld is restarted (internal structure needed to be rebuilt).
-- Removed checkpoint_blcr.so from the plugin rpm in the slurm.spec since
it is also in the blcr rpm.
-- Fixed issue in sview where you were unable to edit the count
of jobs to share resources.
-- BLUEGENE - Fixed issue where tasks on steps weren't being displayed
correctly with scontrol and sview.
-- BLUEGENE - fixed wiki2 plugin to report correct task count for pending jobs.
* Changes in SLURM 2.1.1
=============================
-- Fix for case sensitive databases when a slurmctld has a mixed case
clustername: the string is now lower cased to ease comparisons.
-- Fix squeue to print the remaining nodes instead of a failed message
when a job is completing and has failed.
-- Fix sview core when searching for partitions by state.
-- Fixed setting the start time when querying in sacct to the
beginning of the day if not set previously.
-- Defined slurm_free_reservation_info_msg and slurm_free_topo_info_msg
in common/slurm_protocol_defs.h
-- Avoid generating error when a job step includes a memory specification and
memory is not configured as a consumable resource.
-- Patch for small memory leak in src/common/plugstack.c
-- Fix bug in which improperly formed job dependency specification can cause
slurmctld to abort.
-- Fixed issue where slurmctld wouldn't always get a message to send cluster
information when registering for the first time with the slurmdbd.
-- Add slurm_*_trigger.3 man pages for event trigger APIs.
-- Fix bug in job preemption logic that would free allocated memory twice.
-- Fix spelling issues (from Gennaro Oliva)
-- Fix issue when changing parents of an account in accounting: all children
weren't always sent to their respective slurmctlds until a restart.
-- Restore support for srun/salloc/sbatch option --hint=nomultithread to
bind tasks to cores rather than threads (broken in slurm v2.1.0-pre5).
-- Fix issue where a 2.0 sacct could not talk correctly to a 2.1 slurmdbd.
-- BLUEGENE - Fix issue where no partitions have any nodes assigned to them:
alert the user that no blocks can be created.
-- BLUEGENE - Fix smap to put BGP images when using -Dc on a Blue Gene/P system
-- Set SLURM_SUBMIT_DIR environment variable for srun and salloc commands to
match behavior of sbatch command.
-- Report WorkDir from "scontrol show job" command for jobs launched using
salloc and srun.
-- Correctly update the wckey when changing it on a pending job.
-- Set wckeyid correctly in accounting when cancelling a pending job.
-- BLUEGENE - critical fix where jobs would be killed incorrectly.
-- BLUEGENE - fix for sview putting multiple ionodes on to nodelists when
viewing the jobs tab.
-- Improve sview layout of blocks in use.
-- A user can now change the dimensions of the grid in sview.
-- BLUEGENE - improved startup speed further for large numbers of defined
blocks
-- Fix to _get_job_min_nodes() in wiki2/get_jobs.c suggested by Michal Novotny
-- BLUEGENE - fixed issues when updating a pending job when a node
count was incorrect for the asked for connection type.
-- BLUEGENE - fixed issue when combining blocks that are in ready states to
make a larger block from those or make multiple smaller blocks by
splitting the larger block. Previously this would only work with blocks
in a free state.
-- Fix bug in wiki(2) plugins where if HostFormat=2 and the task list is
greater than 64 we don't truncate. Previously this would mess up Moab
by sending a truncated task list when doing a get jobs.
-- Added update slurmctld debug level to sview when in admin mode.
-- Added logic to make sure if enforcing a memory limit when using the
jobacct_gather plugin a user can no longer turn off the logic to enforce
the limit.
-- Replaced many calls to getpwuid() with reentrant uid_to_string()
-- The slurmstepd will now refresh its log file handle on a reconfig.
Previously, if a log was rolled, any output from the stepd was lost.
* Changes in SLURM 2.1.0-pre9
=============================
-- Added the "scontrol update SlurmctldDebug" as the preferred alternative to
the "scontrol setdebug" command.
-- BLUEGENE - made it so when removing a block in an error state the nodes in
the block are set correctly in accounting as not in error.
-- Fixed issue where if slurmdbd is not up qos' are set up correctly for
-- scontrol, squeue, sview all display the correct node, cpu count along with
correct corresponding nodelist on completing jobs.
-- Patch (Mark Grondona) fixes serious security vulnerability in SLURM in
the spank_job_env functionality.
-- Improve spank_job_env interface and documentation
-- Add ESPANK_NOT_LOCAL error code to spank_err_t
-- Made the #define DECAY_INTERVAL used in the priority/multifactor plugin
a slurm.conf variable (PriorityCalcPeriod)
-- Added new macro SLURM_VERSION for use in autoconf scripts to determine
current version of slurm installed on system when building against the api.
-- Patch from Matthieu Hautreux that adds an entry into the error file when
a job or step receives a TERM or KILL signal.
-- Make it so env var SLURM_SRUN_COMM_HOST is overwritten if already in
* Changes in SLURM 2.1.0-pre8
=============================
-- Rearranged the "scontrol show job" output into functional groupings
-- Change the salloc/sbatch/srun -P option to -d (dependency)
-- Removed the srun -d option; must use srun --slurmd-debug instead
-- When running the mysql plugin natively MUNGE errors are now eliminated
when sending updates to slurmctlds.
-- Check to make sure we have a default account before looking to
fill in default association.
-- Accounting - Slurmctld and slurmdbd will now set uids of users which were
created after the start of the daemons on reconfig. Slurmdbd will
attempt to set previously nonexistent uids every hour.
-- Patch from Aaron Knister and Mark Grondona, to parse correctly quoted
#SBATCH options in a batch script.
-- job_desc_msg_t - in, out, err have been changed to std_in, std_out,
and std_err respectively. Needed for PySLURM, since Python sees (in)
as a keyword.
-- Changed the type of addr to struct sockaddr_in in _message_socket_accept()
in sattach.c, step_launch.c, and allocate_msg.c, and moved the function
into a common place for all the calls since the code was very similar.
-- proctrack/lua support has been added; see contribs/lua/protrack.lua
-- replaced local gtk m4 test with AM_PATH_GTK_2_0
-- changed AC_CHECK_LIB to AC_SEARCH_LIBS to avoid extra libs in
compile lines.
-- Patch from Matthieu Hautreux to improve error message in slurmd/req.c
-- Added support for split groups (from Matthieu Hautreux, CEA)
-- Patch from Mark Grondona to move blcr scripts into pkglibexecdir
-- Patch from Doug Parisek to calculate a job's projected start time under the
builtin scheduler.
-- Removed most global variables out of src/common/jobacct_common.h
* Changes in SLURM 2.1.0-pre7
=============================
-- BLUEGENE - make 2.1 run correctly on a real bluegene cluster
-- sacctmgr - Display better debug for when an admin specifies a nonexistent
parent account when changing parent accounts.
-- Added a mechanism to the slurmd to defer the epilog from starting until
after a running prolog has finished.
-- If a node reboots between status checks, the node is marked down unless
ReturnToService=2
-- Added -R option to slurmctld to recover partition state also when
restarting or reconfiguring.
* Changes in SLURM 2.1.0-pre6
=============================
-- When getting information about nodes in hidden partitions, return a node
name of NULL rather than returning no information about the node so that
node index information is still valid.
-- When querying database for jobs in certain state and a time period is
given, only jobs in that state during the period will be returned.
Previously, if a time period was given in sacct, jobs eligible to run or
running would be displayed; that is still the default if no states are
requested.
-- One can now query jobs based on size (nodes and/or cpus) (mysql plugin only)
-- Applied patch from Mark Grondona that tests for a missing config file before
any other processing in spank_init(). This now prevents fatal errors from
being mistakenly treated as recoverable.
-- --enable-debug no longer has to be stated at configure time to have
the slurmctld or slurmstepd dump core on a seg fault.
-- Moved the errant slurm_job_node_ready() declaration from job_info.h to