This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 1.2.0-pre10
==============================
-- Fix sinfo node counts by state (%A and %F output options).
-- Add ability to change a node's features via "scontrol update". NOTE:
Also update slurm.conf to preserve the changes across a slurmctld restart
or reconfiguration.
-- Added new slurm.conf parameter TaskPluginParam.
-- Fix for job requeue and credential revoke logic from Hongjia Cao (NUDT).
* Changes in SLURM 1.2.0-pre9
=============================
-- Fix for select/cons_res state preservation over slurmctld restart,
patch.1.2.0-pre7.061130.cr_state from Dan Palermo.
-- Validate the product of socket*core*thread counts at node registration
rather than the individual values. Correct values must be specified in
slurm.conf with FastSchedule=1 for correct multi-core scheduling behavior.
* Changes in SLURM 1.2.0-pre8
=============================
-- Modify job state "reason" field to report why a job failed (previously
it reported only the reason a job was waiting to run). Requires a
cold-start of slurmctld (-c option).
-- For sched/wiki2 job state request, return REJMESSAGE= with reason for
a job's failure.
-- New FastSchedule configuration parameter option "2" means to base
scheduling decisions upon the node's configuration as specified in
slurm.conf and ignore the node's actual hardware configuration. This
can be useful for testing.
-- Add sinfo output format option "%C" for CPUs (active/idle/other/total).
Based upon work by Anne-Marie Wunderlin (BULL).
-- Assorted multi-core bug fixes (patch1.2.0-pre7.061128.mcorefixes).
-- Report SelectTypeParameters from "scontrol show config".
-- Build sched/wiki plugin for Maui Scheduler (based upon new sched/wiki2
code for Moab Scheduler).
-- BLUEGENE - Changed the way smaller partitions are tracked to use an
ionode range instead of quarter nodecard notation
(e.g. bgl000[0-4] instead of bgl000.0.0).
-- Patch from Hongjia Cao (EINPROGRESS error message change)
-- Fix for correct requid in jobacct plugin.
-- Added sub-second timing display to sacct.
* Changes in SLURM 1.2.0-pre7
=============================
-- BLUEGENE - added configurable images for bluegene block creation.
-- Support processor, core, and physical IDs that are not in numeric
order (in slurmd when gathering node state information; based on a patch
by Don Albert, Bull).
-- Fixed bug with AIX not looking in the correct directory for the proctrack
include files.
-- Removed global_srun.* from common and merged it into srun proper.
-- Added bluegene section to troubleshooting guide (web page).
-- NOTE: Requires a cold-start when upgrading from 1.2.0-pre6; the saved
state information for jobs changed.
-- BLUEGENE - Changed logic for wiring bgl blocks to be more maintainable.
(Not yet tested on a large system; works on a 2 base partition system.)
-- Do not read the select/cons_res state save file if slurmctld is
cold-started (with the "-c" option).
* Changes in SLURM 1.2.0-pre6
=============================
-- Maintain actual job step run time when suspend/resume is used.
-- Allow slurm.conf options to appear multiple times. SLURM will use the
last instance of any particular option.
-- Add version number to node state save file. Node state information will
not be recovered on restart from an older version.
-- Add logic to save/restore multi-core state information.
-- Updated multi-core logic to use types uint16_t and uint32_t instead
of just type int.
-- Add support for Portable Linux Processor Affinity (PLPA, see
http://www.open-mpi.org/software/plpa).
-- When a job epilog completes on all non-DOWN nodes, immediately purge
its job steps that lack switch windows. Needed for LSF operation.
Based upon slurm.hp.node_fail.patch.
-- Modify srun to ignore entries on --nodelist for job step creation
if their count exceeds the task count. Based on slurm.hp.srun.patch.
* Changes in SLURM 1.2.0-pre5
=============================
-- Patch from HP patch.1.2.0.pre4.061017.crcore_hints, supports cores as
a consumable resource.
-- Added node_inx to job_step_info_t to get the node indices for mapping out
steps in a job by nodes.
-- sview grid added
-- BLUEGENE node_inx added to blocks for reference.
-- Automatic CPU_MASK generation for task launch, new srun option -B.
-- Automatic logical to physical processor identification and mapping.
-- Added new srun options to --cpu_bind: sockets, cores, and threads
-- Updated select/cons_res to operate at socket granularity.
-- New srun task distribution options to -m: plane
-- Multi-core support in sinfo, squeue, and scontrol.
-- Memory can be treated as a consumable resource.
-- New srun options --ntasks-per-[node|socket|core].
* Changes in SLURM 1.2.0-pre3
=============================
-- Remove configuration parameter SchedulerAuth (defunct).
-- Add NextJobId to "scontrol show config" output.
-- Add new slurm.conf parameter MailProg.
-- New forwarding logic. New receive_msg functions, depending on what you
expect to get back. srun_node_id is no longer passed around in
a slurm_msg_t.
-- Remove sched/wiki plugin (use sched/wiki2 for now)
-- Disable pthread_create() for PMI_send when TotalView is running for
better performance.
-- Fixed certain tests in the test suite so they do not run on bluegene or
front-end systems.
-- Removed addresses from slurm_step_layout_t
-- Added new job field, "comment". Set by srun, salloc and sbatch. View it
with "scontrol show job". Used in sched/wiki2.
-- Report a job's exit status in "scontrol show job".
-- In sched/wiki2: add support for JOBREQUEUE command.
* Changes in SLURM 1.2.0-pre2
=============================
-- Added function slurm_init_slurm_msg to initialize any slurm_msg_t;
no other initialization of the type is needed.
-- Fixed task distribution to work with a hostfile, and warn about requesting
more tasks than there are nodes for in arbitrary mode.
-- Added "account" field to job and step accounting information and sacct output.
-- Moved task layout from srun to slurmctld. Job step creation returns a
step_layout structure with the hostnames and addresses that correspond
to those nodes.
-- Changed slurm_lookup_allocation API parameters:
resource_allocation_response_msg_t changed to job_alloc_info_response_msg_t.
The structure is only renamed; its contents are the same.
-- Altered resource_allocation_response_msg_t (see slurm.h.in).
-- Removed old_job_alloc_msg_t and function slurm_confirm_alloc.
-- Slurm configuration files now support an "Include" directive to
include other files inline.
-- BLUEGENE - New --enable-bluegene-emulation configure parameter to allow
running the system in bluegene emulation mode. Only really useful for
developers.
-- Added new tool sview, a GUI for displaying slurm info.
-- Fixed bug in step layout so tasks are laid out correctly.
* Changes in SLURM 1.2.0-pre1
=============================
-- Fix bug that could run a job's prolog more than once
-- Permit batch jobs to be requeued, scontrol requeue <jobid>
-- Send overcommit flag from srun in RPCs and have slurmd set SLURM_OVERCOMMIT
flag at batch job launch time.
-- Added new configuration parameter MessageTimeout (replaces #define in
the code)
* Changes in SLURM 1.1.22
=========================
* Changes in SLURM 1.1.21
=========================
- BLUEGENE - Wait on a fini to make sure all threads are finished before
cleaning up.
- BLUEGENE - Empty lists instead of destroying them, to avoid losing the
pointer to the list in the block allocator.
- BLUEGENE - added --enable-bluegene-emulation configure option to 1.1
- In sched/wiki2, enclose a job's COMMENT value in double quotes.
- In sched/wiki2, support newly defined SIGNALJOB command.
- In sched/wiki2, maintain open event socket, don't open and close
for each event.
- In sched/wiki2, fix for scalability problem starting large jobs.
- Fix logic to execute a batch job step (under an existing resource
allocation) as needed by LSF.
- Patches from Hongjia Cao (PMI finalize issues and type declaration).
- Delete pending job if its associated partition is deleted.
- Fix for correctly handling batch step completion and setting the
return code.
- Altered ncurses check to verify that programs can link before reporting
a working curses library and header.
- Fixed an init issue with forward_struct_init not being set correctly in
a few locations in the slurmd.
- Fix to use the NodeHostname (when specified in the slurm.conf file)
when starting jobs.
* Changes in SLURM 1.1.20
=========================
- Added new SPANK plugin hook slurm_spank_local_user_init() called
from srun after node allocation.
- Fixed bug with hostfile support not working on a direct srun
* Changes in SLURM 1.1.19
=========================
- BLUEGENE - Make sure blocks read in from bluegene.conf are created in
the same order (static mode).
- Fix logic in connect(), slurmctld fail-over was broken in v1.1.18.
- Fix logic to calculate the correct timeout for fan out.
* Changes in SLURM 1.1.18
=========================
- In sched/wiki2, add support for EHost and EHostBackup configuration