This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 1.2.0-pre7
=============================
-- BLUEGENE - added configurable images for bluegene block creation.
(No documentation outside of srun and sbatch just yet,
no security either)
* Changes in SLURM 1.2.0-pre6
=============================
-- Maintain actual job step run time with suspend/resume use.
-- Allow slurm.conf options to appear multiple times. SLURM will use the
last instance of any particular option.
-- Add version number to node state save file. Will not recover node
state information on restart from older version.
-- Add logic to save/restore multi-core state information.
-- Updated multi-core logic to use types uint16_t and uint32_t instead
of just type int.
-- Add support for Portable Linux Processor Affinity (PLPA, see
http://www.open-mpi.org/software/plpa).
-- When a job epilog completes on all non-DOWN nodes, immediately purge
its job steps that lack switch windows. Needed for LSF operation.
Based upon slurm.hp.node_fail.patch.
-- Modify srun to ignore entries on --nodelist for job step creation
if their count exceeds the task count. Based on slurm.hp.srun.patch.
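The "last instance wins" rule noted above for repeated slurm.conf options can be illustrated with a minimal sketch. This is not SLURM's parser: the function name parse_last_wins is hypothetical, the one-option-per-line format is a simplification, and the case-insensitive key comparison is an assumption of this sketch.

```python
def parse_last_wins(text):
    """Parse 'Key=Value' lines; a repeated key keeps its last value."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Assumption of this sketch: keys are compared case-insensitively.
        options[key.strip().lower()] = value.strip()
    return options


sample = """
MessageTimeout=5
MailProg=/bin/mail
MessageTimeout=10   # later instance overrides the earlier one
"""
print(parse_last_wins(sample)["messagetimeout"])
```

Storing options in a dictionary keyed by name gives the override behavior for free, since each later assignment replaces the earlier one.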
* Changes in SLURM 1.2.0-pre5
=============================
-- Patch from HP patch.1.2.0.pre4.061017.crcore_hints, supports cores as
consumable resource.
-- Added node_inx to job_step_info_t to get the node indices for mapping out
steps in a job by nodes.
-- sview grid added
-- BLUEGENE node_inx added to blocks for reference.
-- Automatic CPU_MASK generation for task launch, new srun option -B.
-- Automatic logical to physical processor identification and mapping.
-- Added new srun options to --cpu_bind: sockets, cores, and threads
-- Updated select/cons_res to operate at socket granularity.
-- New srun task distribution options to -m: plane
-- Multi-core support in sinfo, squeue, and scontrol.
-- Memory can be treated as a consumable resource.
-- New srun options --ntasks-per-[node|socket|core].
* Changes in SLURM 1.2.0-pre3
=============================
-- Remove configuration parameter SchedulerAuth (defunct).
-- Add NextJobId to "scontrol show config" output.
-- Add new slurm.conf parameter MailProg.
-- New forwarding logic. New receive_msg functions depending on what you
are expecting to get back. srun_node_id is no longer passed around in
a slurm_msg_t.
-- Remove sched/wiki plugin (use sched/wiki2 for now)
-- Disable pthread_create() for PMI_send when TotalView is running for
better performance.
-- Fixed certain tests in test suite to not run with bluegene or front-end
systems
-- Removed addresses from slurm_step_layout_t
-- Added new job field, "comment". Set by srun, salloc and sbatch. See
with "scontrol show job". Used in sched/wiki2.
-- Report a job's exit status in "scontrol show job".
-- In sched/wiki2: add support for JOBREQUEUE command.
* Changes in SLURM 1.2.0-pre2
=============================
-- Added function slurm_init_slurm_msg to be used to init any slurm_msg_t;
you no longer need to do any other type of initialization to the type.
-- Fixed task dist to work with hostfile and warn about asking for more tasks
than you have nodes for in arbitrary mode.
-- Added "account" field to job and step accounting information and sacct output.
-- Moved task layout to slurmctld instead of srun. Job step create returns
step_layout structure with hostnames and addresses that correspond
to those nodes.
-- Changed api slurm_lookup_allocation params,
resource_allocation_response_msg_t changed to job_alloc_info_response_msg_t
(the structure is only being renamed; its contents are the same).
-- alter resource_allocation_response_msg_t see slurm.h.in
-- remove old_job_alloc_msg_t and function slurm_confirm_alloc
-- Slurm configuration files now support an "Include" directive to
include other files inline.
-- BLUEGENE New --enable-bluegene-emulation configure parameter to allow
running system in bluegene emulation mode. Only
really useful for developers.
-- Added new tool sview, a GUI for displaying slurm info.
-- fixed bug in step layout to lay out tasks correctly
* Changes in SLURM 1.2.0-pre1
=============================
-- Fix bug that could run a job's prolog more than once
-- Permit batch jobs to be requeued, scontrol requeue <jobid>
-- Send overcommit flag from srun in RPCs and have slurmd set SLURM_OVERCOMMIT
flag at batch job launch time.
-- Added new configuration parameter MessageTimeout (replaces #define in
the code)
* Changes in SLURM 1.1.19
=========================
- BLUEGENE - make sure blocks are created in the order they are read in
from the bluegene.conf (static mode).
* Changes in SLURM 1.1.18
=========================
- In sched/wiki2, add support for EHost and EHostBackup configuration
parameters in wiki.conf file
- In sched/wiki2, fix memory management bug for JOBWILLRUN command.
- In sched/wiki2, consider job Busy while in Completing state for
KillWait+10 seconds (used to be 30 seconds).
- BLUEGENE - Fixes to allow full block creation on the system and not to add
passthrough nodes to the allocation when creating a block.
- BLUEGENE - Fix deadlock issue with starting and failing jobs at the same
time
- Make connect() non-blocking and poll() with timeout to avoid huge
waits under some conditions.
- Set "ENVIRONMENT=BATCH" environment variable for "srun --batch" jobs only.
- Add logic to save/restore select/cons_res state information.
- BLUEGENE - make all sprintf's into snprintf's
- Fix timeout calculation to work correctly for fan out.
- Fix for "srun -A" segfault on a node failure.
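The non-blocking connect() plus poll() with timeout approach mentioned above can be sketched as follows. This is an illustration of the general technique only, not SLURM's C implementation; the function name connect_with_timeout and its parameters are hypothetical, and select.poll is assumed to be available (it is not on Windows).

```python
import errno
import os
import select
import socket


def connect_with_timeout(host, port, timeout_ms):
    """Start a non-blocking connect(), then poll() until it completes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    # connect_ex() returns an errno value instead of raising.
    err = sock.connect_ex((host, port))
    if err not in (0, errno.EINPROGRESS):
        sock.close()
        raise OSError(err, os.strerror(err))
    # Wait for the socket to become writable, but no longer than timeout_ms.
    poller = select.poll()
    poller.register(sock, select.POLLOUT)
    if not poller.poll(timeout_ms):
        sock.close()
        raise TimeoutError("connect timed out")
    # Writable does not imply success; SO_ERROR holds the real outcome.
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err != 0:
        sock.close()
        raise OSError(err, os.strerror(err))
    sock.setblocking(True)
    return sock
```

The caller chooses the upper bound on the wait, so an unreachable peer costs at most timeout_ms rather than the much longer default TCP connect timeout.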
* Changes in SLURM 1.1.17
=========================
- BLUEGENE - fix to make dynamic partitioning not create a block where
there are nodes that are down or draining.
- Fix srun's default node count with an existing allocation when neither
SLURM_NNODES nor -N are set.
- Stop srun from setting SLURM_DISTRIBUTION under job steps when a
specific distribution was not explicitly requested by the user.
* Changes in SLURM 1.1.16
=========================
- BLUEGENE - fix to make prolog run 5 minutes longer to make sure we have
enough time to free the overlapping blocks when starting a new job on a
block.
- BLUEGENE - edit to the libsched_if.so to read env and look at
MPIRUN_PARTITION to see if we are in slurm or running mpirun natively.
- Plugins are now dlopened RTLD_LAZY instead of RTLD_NOW.
* Changes in SLURM 1.1.15
=========================
- BLUEGENE - fix to be able to create static partitions
- Fixed fanout timeout logic.
- Fix for slurmctld timeout on outgoing message (Hongjia Cao, NUDT.edu.cn).
* Changes in SLURM 1.1.14
=========================
- In sched/wiki2: report job/node id and state only if no changes since
time specified in request.
- In sched/wiki2: include a job's exit code in job state information.
- In sched/wiki2: add event notification logic on job submit and completion.
- In sched/wiki2: add support for JOBWILLRUN command type.
- In sched/wiki2: for job info, include required HOSTLIST if applicable.
- In sched/wiki2: for job info, replace PARTITIONMASK with RCLASS (report
partition name associated with a job, but no task count)
- In sched/wiki2: for job and node info, report all data if TS==0,
volatile data if TS<=update_time, state only if TS>update_time
- In sched/wiki2: add support for CMD=JOBSIGNAL ARG=jobid SIGNAL=name or #
- In sched/wiki2: add support for CMD=JOBMODIFY ARG=jobid [BANK=name]
[TIMELIMIT=minutes] [PARTITION=name]
- In sched/wiki2: add support for CMD=INITIALIZE ARG=[USEHOSTEXP=T|F]
[EPORT=#]; RESPONSE=EPORT=# USEHOSTEXP=T
- In sched/wiki2: fix memory leak.
- Fix sinfo node state filtering when asking for idle nodes that are also
draining.
- Add Fortran extension to slurm_get_rem_time() API.
- Fix bug when changing the time limit of a running job that has previously
been suspended (formerly failed to account for suspend time in setting
termination time).
- fix for step allocation to be able to specify only a few nodes in a
step and ask for more than specified.
- patch from Hongjia Cao for forwarding logic
- BLUEGENE - able to allocate specific nodes without locking up.
- BLUEGENE - better tracking of blocks that are created dynamically,
less hitting the db2.
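The timestamp rule listed above for sched/wiki2 job and node info (all data if TS==0, volatile data if TS<=update_time, state only if TS>update_time) can be read as a three-way selection. The sketch below is illustrative only; the function name report_level and the returned labels are hypothetical and do not come from the plugin source.

```python
def report_level(ts, update_time):
    """Pick how much data to report for a record, given the request's TS."""
    if ts == 0:
        return "all"        # TS==0: report everything
    if ts <= update_time:
        return "volatile"   # record changed at or after TS: report volatile data
    return "state"          # no change since TS: report state only


print(report_level(0, 100), report_level(50, 100), report_level(150, 100))
```

Checking TS==0 first matters, since zero would otherwise satisfy the TS<=update_time branch.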
* Changes in SLURM 1.1.13
=========================
- Fix hang in sched/wiki2 if Moab stops responding when
response is outgoing.
- BLUEGENE - fix to make sure the block is good to go when picking it
- BLUEGENE - add libsched_if.so so mpirun doesn't try to create a block
by itself.
- Enable specification of srun --jobid=# option with --batch (for user root).
- Verify that job actually starts when requested by sched/wiki2.
- Add new wiki.conf parameters: EPort and JobAggregationTime for event
notification logic (see wiki.conf man page for details)
* Changes in SLURM 1.1.12
=========================
- Sched/wiki2 to report a job's account as COMMENT response to GETJOBS
request.
- Add srun option "--comment" (maps to job account until slurm v1.2,
needed for Moab scheduler functionality).