This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 1.2.0-pre2
=============================
* Changes in SLURM 1.2.0-pre1
=============================
-- Fix bug that could run a job's prolog more than once
-- Permit batch jobs to be requeued using "scontrol requeue <jobid>".
-- Send overcommit flag from srun in RPCs and have slurmd set SLURM_OVERCOMMIT
flag at batch job launch time.
-- Added new configuration parameter MessageTimeout (replaces #define in
the code)
* Changes in SLURM 1.1.2
========================
-- Fix bug in jobcomp/filetxt plugin to report proper NodeCnt when a job
fails due to a node failure.
-- Fix Bluegene configure to work with the new 64-bit libs.
-- Fix bug in controller that causes it to segfault when hit with a malformed
message.
-- For "srun --attach=X" to another user's job, report an error and exit (it
previously just hung).
-- BLUEGENE - fix for doing correct small block logic on user error.
-- BLUEGENE - Added support in slurmd to create a fake libdb2.so if it
doesn't exist so smap won't segfault.
-- BLUEGENE - "scontrol show job" reports "MaxProcs=None" and "Start=None"
if values are not specified at job submit time
-- Add retry logic for PMI communications, may be needed for highly parallel
jobs.
* Changes in SLURM 1.1.1
========================
-- Fix bug in packing job suspend/resume RPC.
-- If a user breaks out of srun before the allocation takes place, mark the
job as CANCELLED rather than COMPLETED and change its start and end time
to that time.
-- Fix bug in PMI support that prevented use of second PMI_Barrier call.
This fix is needed for MVAPICH2 use.
-- Add "-V" options to slurmctld and slurmd to print version number and exit.
-- Fix scalability bug in sbcast.
-- Fix bug in cons_res allocation strategy.
-- Fix bug in forwarding with MPI.
-- Fix bug in sacct forwarding with the stat option.
-- Added nodeid to sacct stat information.
-- Cleaned up the way slurm_send_recv_node_msg works; it no longer clears
errno.
-- Fix error handling bug in the networking code that causes the slurmd to
xassert if the server is not running when the slurmd tries to register.
* Changes in SLURM 1.1.0
========================
-- Fix bug that could temporarily make nodes DOWN when they are really
responding.
-- Fix bug preventing backup slurmctld from responding to PING RPCs.
-- Set "CFLAGS=-DISO8601" before configuration to get ISO8601 format
times for all SLURM commands. NOTE: This may break Moab, Maui, and/or
LSF schedulers.
-- Fix for srun -n and -O options when paired with -b.
-- Added logic for fanout to failover to forward list if main node is
unreachable
-- sacct also now keeps track of submitted, started and ending times of jobs
-- reinit config file mutex at beginning of slurmstepd to avoid fork issues
* Changes in SLURM 1.1.0-pre8
=============================
-- Fix bug in enforcement of partition's MaxNodes limit.
-- BLUEGENE - Added support for srun -w option; also fixed the geometry option
for srun.
* Changes in SLURM 1.1.0-pre7
=============================
-- Accounting works for AIX systems; use jobacct/aix.
-- Support large (over 2GB) files on 32-bit Linux systems.
-- Changed all writes to safe_write in srun.
-- Added $float to globals.example in the testsuite.
-- Set job's num_proc correctly for jobs that do not have exclusive use
of their allocated nodes.
-- Change in support for test suite: 'testsuite/expect/globals.example'
is now 'testsuite/expect/globals' and you can override variable
settings with a new file 'testsuite/expect/globals.local'.
-- Job suspend now sends SIGTSTP, sleep(1), sends SIGSTOP for better
MPI support.
-- Bluegene - before assigning a job to a block the plugin will check the bps
to make sure they aren't in error state.
-- Change time format in job completion logging (JobCompType=jobcomp/filetxt)
from "MM/DD HH:MM:SS" to "YYYY-MM-DDTHH:MM:SS", conforming with the ISO8601
standard format.
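The job completion time format change above can be illustrated with a short Python sketch. It is not SLURM code: the function name and the sample timestamp are hypothetical, and the two format strings simply mirror the old and new layouts named in the entry.

```python
from datetime import datetime

def format_times(ts):
    """Return (old_style, iso_style) strings for one datetime."""
    old_style = ts.strftime("%m/%d %H:%M:%S")      # "MM/DD HH:MM:SS"
    iso_style = ts.strftime("%Y-%m-%dT%H:%M:%S")   # "YYYY-MM-DDTHH:MM:SS"
    return old_style, iso_style

# Arbitrary illustrative timestamp, not taken from any real log.
sample = datetime(2006, 5, 17, 14, 3, 9)
old_style, iso_style = format_times(sample)
```

Unlike the old layout, the ISO8601 form carries the year, so log entries sort correctly as plain strings across year boundaries.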
* Changes in SLURM 1.1.0-pre6
=============================
-- Added logic to "stat" a running job with sacct option -S; use -j to specify
job.step.
-- Removed jobacct/bluegene (no real need for it; there does not appear to be
a way to gather the data yet).
-- Added support for mapping "%h" in configured SlurmdLog to the hostname.
-- Add PropagatePrioProcess to control propagation of a user's nice value
to spawned tasks (based upon work by Daniel Christians, HP).
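The "%h" mapping for SlurmdLog listed in this release can be pictured with the following Python sketch. It is an illustration only: the function name and the example path are hypothetical, and whether slurmd substitutes the short or the fully qualified hostname is not stated in the entry, so the sketch takes the hostname as a parameter.

```python
import socket

def expand_log_path(template, hostname=None):
    """Replace every "%h" in template with the given (or local) hostname."""
    if hostname is None:
        # Assumption: fall back to whatever the local system reports.
        hostname = socket.gethostname()
    return template.replace("%h", hostname)

# Hypothetical path and node name, for illustration only.
example = expand_log_path("/var/log/slurmd.%h.log", hostname="node17")
```

A per-host log name like this keeps daemons that share a file system from writing to the same log file.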
* Changes in SLURM 1.1.0-pre5
=============================
-- Added step completion RPC logic
-- Vastly changed sacct and the jobacct plugin. Read documentation for full
details.
-- Added jobacct plugins for AIX and BlueGene; they currently don't work.
-- Add support for srun option --ctrl-comm-ifhn to set PMI communications
address (Hongjia Cao, National University of Defense Technology).
-- Moved safe_read/write to slurm_protocol_defs.h removing multiple copies.
-- Remove vestigial functions slurm_allocate_resources_and_run() and
slurm_free_resource_allocation_and_run_response_msg().
-- Added support for different executable files and arguments by task based
upon a configuration file. See srun's --multi-prog option (based upon
work by Hongjia Cao, National University of Defense Technology).
-- Changed the way forward logic waits for fanout logic, mostly eliminating
scalability problems.
-- Changed -l option in sacct to display different parameters; see
sacct/sacct.h for details.
* Changes in SLURM 1.1.0-pre4
=============================
-- Bluegene specific - Added support to set bluegene block state to
free/error via scontrol update BlockName
-- Add needed symbol to select/bluegene in order to load plugin.
* Changes in SLURM 1.1.0-pre3
=============================
-- Added framework for XCPU job launch support.
-- New general configuration file parser and slurm.conf handling code.
Allows long lines to be continued on the next line by ending with a "\".
Whitespace is allowed between the key and "=", and between the "=" and
value.
WARNING: A NodeName may now occur only once in a slurm.conf file.
If you want to temporarily make nodes DOWN in the slurm.conf,
use the new DownNodes keyword (see "man slurm.conf").
-- Gracefully handle request to submit batch job from within an existing
batch job.
-- Warn user attempting to create a job allocation from within an existing job
allocation.
-- Add web page description for proctrack plugin.
-- Add new function slurm_get_rem_time() for job's time limit.
-- JobAcct plugin renamed from "log" to "linux" in preparation for support of
new system types.
WARNING: "JobAcctType=jobacct/log" is no longer supported.
-- Removed vestigial 'bg' names from bluegene plugin and smap.
-- InactiveLimit parameter is not enforced for RootOnly partitions.
-- Update select/cons_res web page (Susanne Balle, HP,
cons_res_doc_patch_3_29_06).
-- Build a "slurmd.test" along with slurmd. slurmd.test has the path to
slurmstepd set allowing it to run unmodified out of the builddir for
testing (Mark Grondona).
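The two parsing rules described for the new configuration file parser in this release (a trailing "\" continues a line, and whitespace is allowed around "=") can be sketched in Python as follows. This is a simplified illustration, not the SLURM parser: it assumes one key per logical line and treats "#" lines as comments, and the function names and sample keys are hypothetical.

```python
def logical_lines(text):
    """Join physical lines that end with a backslash into one logical line."""
    pending = ""
    for raw in text.splitlines():
        if raw.endswith("\\"):
            pending += raw[:-1]      # drop the backslash, keep accumulating
            continue
        yield pending + raw
        pending = ""
    if pending:
        yield pending                # file ended on a continued line

def parse_conf(text):
    """Return (key, value) pairs, tolerating whitespace around '='."""
    pairs = []
    for line in logical_lines(text):
        line = line.strip()
        if not line or line.startswith("#"):   # assumption: '#' comments
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in line: {line!r}")
        pairs.append((key.strip(), value.strip()))
    return pairs

# Hypothetical keys; the second entry is continued onto the next line.
sample = "KeyA = one\nKeyB=two \\\nthree\n"
parsed = parse_conf(sample)
```

Joining continued lines before splitting on "=" keeps the two rules independent, so a value may itself span several physical lines.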
* Changes in SLURM 1.1.0-pre2
=============================
-- Added "bcast" command to transmit copies of a file to compute nodes
with message fanout.
-- Bluegene specific - Added support for overlapping partitions and
dynamic partitioning.
-- Bluegene specific - Added support for nodecard sized blocks.
-- Added logic to accept 1k for 1024 and so on for --nodes option of srun.
This logic is also used throughout display tools such as smap, sinfo,
scontrol, and squeue.
-- Added bluegene.conf man page.
-- Added support for memory affinity, see srun --mem_bind option.
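The "1k for 1024" shorthand accepted by the --nodes option in this release can be sketched in Python as below. The function name is hypothetical, and the sketch handles only the "k" suffix named in the entry; which further suffixes "and so on" covers is not stated here.

```python
def parse_count(text):
    """Convert "1k" to 1024 and "2k" to 2048; plain digits pass through."""
    text = text.strip().lower()
    if text.endswith("k"):
        # Multiplier of 1024 follows the "1k for 1024" wording of the entry.
        return int(text[:-1]) * 1024
    return int(text)

counts = [parse_count(value) for value in ("1k", "2K", "512")]
```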
* Changes in SLURM 1.1.0-pre1
=============================
-- New --enable-multiple-slurmd configure parameter to allow running
more than one copy of slurmd on a node at the same time. Only
really useful for developers.
-- Communication from slurmctld and the srun launch command to the slurmd
daemons is now branched across all processes using a tree-type
algorithm. Spawn and batch mode work the same as before. New slurm.conf
variable TreeWidth=50 is default. This is the number of threads per
-- Configuration parameter HeartBeatInterval is deprecated. Now uses half
of SlurmdTimeout and SlurmctldTimeout for communications to slurmd and
slurmctld daemons respectively.
-- Add hash tables for select/cons_res plugin (Susanne Balle, HP,
patch_02222006).
-- Remove some use of cr_enabled flag in slurmctld job record, use
new flag "test_only" in select_g_job_test() instead.
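The tree-type communication introduced in this release can be pictured with a rough Python sketch. It is not SLURM's forwarding code: it assumes the sender splits the node list into at most TreeWidth groups and that the first node of each group relays the message to the rest of its group in the same way. Both function names are hypothetical.

```python
def split_for_fanout(nodes, tree_width=50):
    """Split nodes into at most tree_width contiguous, near-equal groups."""
    if tree_width < 1:
        raise ValueError("tree_width must be at least 1")
    count = min(tree_width, len(nodes))
    groups = []
    start = 0
    for i in range(count):
        # Spread the remainder over the first groups so that group
        # sizes differ by at most one.
        size = len(nodes) // count + (1 if i < len(nodes) % count else 0)
        groups.append(nodes[start:start + size])
        start += size
    return groups

def fanout_depth(nodes, tree_width=50):
    """Count relay levels when each group's first node forwards to the rest."""
    if not nodes:
        return 0
    groups = split_for_fanout(nodes, tree_width)
    return 1 + max(fanout_depth(group[1:], tree_width) for group in groups)

groups = split_for_fanout(list(range(7)), tree_width=3)
depth = fanout_depth(list(range(7)), tree_width=3)
```

Under these assumptions the number of relay levels grows roughly with the logarithm of the node count, which is the motivation for a tree over a flat send from one process.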
* Changes in SLURM 1.0.13
=========================
-- Fix for AllowGroups option to work when the /etc/group file doesn't
list all users in a group: also add the uids of the users in /etc/passwd
whose gid is the one being looked up.
-- Fix bug in InactiveLimit support that can potentially purge active jobs.
NOTE: This is highly unlikely except on very large AIX clusters.
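The AllowGroups fix in this release amounts to taking the union of two sources of group membership. A minimal Python sketch of that idea follows; it is not the slurmctld implementation, and the function name, the tuple layout, and the sample users are all hypothetical.

```python
def group_uids(gid, group_members, passwd_entries):
    """Collect uids of users in a group.

    group_members:  user names listed for the group (as in /etc/group)
    passwd_entries: (name, uid, primary_gid) tuples (as in /etc/passwd)
    """
    uid_by_name = {name: uid for name, uid, _ in passwd_entries}
    # Users explicitly listed in the group file.
    uids = {uid_by_name[n] for n in group_members if n in uid_by_name}
    # Users whose primary gid matches, even if the group file omits them.
    uids.update(uid for _, uid, primary in passwd_entries if primary == gid)
    return sorted(uids)

# Hypothetical data: bob has gid 100 as his primary group but is not
# listed among the group's members.
passwd = [("alice", 1001, 200), ("bob", 1002, 100), ("carol", 1003, 300)]
allowed = group_uids(100, ["alice"], passwd)
```

On a live system the same inputs could be read with Python's standard grp and pwd modules, for example grp.getgrgid(gid).gr_mem and pwd.getpwall().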
* Changes in SLURM 1.0.12
=========================
-- Report node state of DRAIN rather than DOWN if DOWN with DRAIN flag set.
-- Initialize job->mail_type to 0 (NONE) for job submission.
-- Fix for stalled task stdout/stderr when buffered I/O is used, and
a single line exceeds 4096 bytes.
-- Memory leak fixes for maui plugin (hjcao@nudt.edu.cn)