This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 1.3.0-pre1
=============================
-- Add new sinfo sort field "%E", which sorts by the time associated with a
node's state (from Prashanth Tamraparni, HP).
-- In sched/wiki: fix logic for restarting backup slurmctld.
-- Preload SLURM plugins early in the slurmstepd operation to avoid
multiple dlopens after forking (and to avoid a glibc bug
that leaves dlopen locks in a bad state after a fork).
-- Added MPICH1_P4 patch to launch tasks using srun rather than rsh and
automatically generate mpirun's machinefile based upon the job's
allocation. See "etc/mpich1.slurm.patch".
-- BLUEGENE - fix for overlap mode to mark all other base partitions as used
when creating a new block from the file to ensure we only use the base
partitions we are asking for.
-- Fix in proctrack/sgi_job plugin that could cause slurmstepd to seg fault,
preventing timely clean-up of batch jobs in some cases.
* Changes in SLURM 1.2.7
========================
-- BLUEGENE - add code to allow creation of a 36x36x36 system.
The wiring should be correct for a system with x-dim of 1,2,4,5,8,13
in emulation mode. It will work with any real system no matter the size.
-- Major re-write of jobcomp/script plugin: fix memory leak and
general code clean-up.
-- Add ability to change MaxNodes and ExcNodeList for pending job
using scontrol.
-- Purge zombie processes spawned via event triggers.
-- Add support for power saving mode (experimental code to reduce voltage
and frequency on nodes that stay in the IDLE state, for more information
see http://www.llnl.gov/linux/slurm/power_save.html). None of this
code is enabled by default.
* Changes in SLURM 1.2.6
========================
-- Fix MPIRUN_PORT env variable in mvapich plugin
-- For improved security, disable setting of triggers by any user other than
SlurmUser, unless SlurmUser is root.
* Changes in SLURM 1.2.5
========================
-- Fix nodelist truncation in "scontrol show jobs" output
-- In mpi/mpichgm, fix potential problem formatting GMPI_PORT, from
Ernest Artiaga, BSC.
-- In sched/wiki2 - Report job's account, from Ernest Artiaga, BSC.
-- Add sbatch option "--ntasks-per-node".
* Changes in SLURM 1.2.4
========================
-- In select/cons_res - fix for function argument type mis-match in getting
CPU count for a job, from Ernest Artiaga, BSC.
-- In sched/wiki2 - Report job's tasks_per_node requirement.
-- In the forward logic, fix to check for the case where the forwarding node
receives a connection but never gets the message from the sender (network
issue or similar). Also make sure that, when something comes back, everything
that was sent out is accounted for before calling it good.
-- Another fix to make sure steps with requested nodes have correct cpus
accounted for and a fix to make sure the user can't allocate more
cpus than they have requested.
* Changes in SLURM 1.2.3
========================
-- Cpuset logic added to task/affinity, from Don Albert (Bull) and
Moe Jette (LLNL). To enable, mount the /dev/cpuset file system and
set "TaskPluginParam=cpusets" in slurm.conf.
-- In sched/wiki2, fix possible overflow in job's nodelist, from
Ernest Artiaga, BSC.
-- Defer creation of new job steps until a suspended job is resumed.
-- In select/linear - fix for potential stack corruption bug.
* Changes in SLURM 1.2.2
========================
-- Added new command "strigger" for event trigger management, a new
capability. See "man strigger" for details.
-- srun --get-user-env now sends su's stderr to /dev/null
-- Fix in node_scheduling logic with multiple node_sets, from
Ernest Artiaga, BSC.
-- In select/cons_res, fix for function argument type mis-match in getting
CPU count for a job.
-- MPICHGM support bug fixes from Ernest Artiaga, BSC.
-- Support longer hostlist strings, from Ernest Artiaga, BSC.
-- Srun to use env vars for SLURM_PROLOG, SLURM_EPILOG, SLURM_TASK_PROLOG,
and SLURM_TASK_EPILOG. patch.1.2.0-pre11.070201.envproepilog from
Dan Palermo, HP.
-- Documentation update. patch.1.2.0-pre11.070201.mchtml from Dan Palermo, HP.
-- Set SLURM_DIST_CYCLIC = 1 (needed for HP MPI, slurm.hp.env.patch).
* Changes in SLURM 1.2.0-pre15
==============================
-- Fix for another spot where the backup controller calls switch/federation
code before switch/federation is initialized.
* Changes in SLURM 1.2.0-pre14
==============================
-- In sched/wiki2, clear required nodes list when a job is requeued.
Note that the required node list is set to every node used when
a job is started via sched/wiki2.
-- BLUEGENE - Added display of deallocating blocks to smap and other tools.
-- Make slurmctld's working directory be same as SlurmctldLogFile (if any),
otherwise StateSaveDir (which is likely a shared directory, possibly
making core file identification more difficult).
-- Fix bug in switch/federation that results in the backup controller
aborting if it receives an epilog-complete message.
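The working-directory rule in this release can be pictured with the short
sketch below. It is an illustration only, not slurmctld's code: the function
name and the example paths are hypothetical, and it assumes that "same as
SlurmctldLogFile" means the directory containing the log file.

```python
import os


def slurmctld_working_dir(log_file, state_save_dir):
    """Hypothetical helper: choose a working directory as described above."""
    # Assumption: use the directory that holds SlurmctldLogFile when one is set.
    if log_file:
        return os.path.dirname(log_file)
    # Otherwise fall back to StateSaveDir (likely a shared directory).
    return state_save_dir


# Example paths are made up for illustration.
print(slurmctld_working_dir("/var/log/slurm/slurmctld.log", "/shared/state"))
print(slurmctld_working_dir(None, "/shared/state"))
```

Keeping core files next to a per-host log file, rather than in a shared
state directory, is what makes them easier to attribute to one controller.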
* Changes in SLURM 1.2.0-pre13
==============================
-- Fix for --get-user-env.
* Changes in SLURM 1.2.0-pre12
==============================
-- BLUEGENE - Added correct node info for sinfo and sview for viewing
allocated nodes in a partition.
-- BLUEGENE - Added state save on slurmctld shutdown of blocks in an error
state on real systems and total block config on emulation systems.
-- Major update to Slurm's PMI internal logic for better scalability.
Communications now supported directly between application tasks via
Slurm's PMI library. Srun sends a single message to one task on each node
and that task forwards key-pairs to the other tasks on that node. The old
code sent key-pairs directly to each task.
NOTE: PMI applications must re-link with this new library.
-- For multi-core support: Fix task distribution bug and add automated
tests, patch.1.2.0-pre11.070111.plane from Dan Palermo (HP).
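The scalability gain from the PMI change in this release can be seen by
counting the messages srun itself has to send. The sketch below is only an
illustration under assumed inputs, not Slurm's PMI code: the function names
and the task layout are hypothetical.

```python
# Hypothetical illustration: message counts for distributing key-pairs
# under the old and new PMI schemes described in this release.

def srun_messages_old(tasks_per_node):
    """Old scheme: srun sends key-pairs directly to every task."""
    return sum(tasks_per_node)


def srun_messages_new(tasks_per_node):
    """New scheme: srun sends one message to one task on each node."""
    return sum(1 for count in tasks_per_node if count > 0)


def forwarded_messages_new(tasks_per_node):
    """New scheme: that task forwards to the other tasks on its node."""
    return sum(count - 1 for count in tasks_per_node if count > 0)


layout = [4, 4, 4, 2]  # assumed layout: task count on each of four nodes
print(srun_messages_old(layout))       # 14
print(srun_messages_new(layout))       # 4
print(forwarded_messages_new(layout))  # 10
```

With this layout srun's own traffic drops from one message per task to one
per node, and the remaining forwarding stays local to each node.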
* Changes in SLURM 1.2.0-pre11
==============================
-- Add multi-core options to slurm_step_launch API.
-- Add man pages for slurm_step_launch() and related functions.
-- Jobacct plugin now looks only at the proctrack list instead of the entire
list of processes running on the node, cutting down a lot of unnecessary
file opens in Linux and reducing the time to query the procs by
more than half.
-- Multi-core bug fix, mask re-use with multiple job steps,
patch.1.2.0-pre10.061214.affinity_stepid from Dan Palermo (HP).
-- Modify jobacct/linux plugin to completely eliminate open /proc files.
-- Added slurm_sched_plugin_reconfig() function to re-read config files.
-- BLUEGENE - --reboot option to srun, salloc, and sbatch actually works.
-- Modified step context and step launch APIs.
* Changes in SLURM 1.2.0-pre10
==============================
-- Fix for sinfo node state counts by state (%A and %F output options).
-- Add ability to change a node's features via "scontrol update". NOTE:
Update slurm.conf also to preserve changes over slurmctld restart or
reconfig.
NOTE: Job and node state information can not be preserved from earlier
versions.
-- Added new slurm.conf parameter TaskPluginParam.
-- Fix for job requeue and credential revoke logic from Hongjia Cao (NUDT).
-- Fix for incorrectly generated masks for task/affinity plugin,
patch.1.2.0-pre9.061207.bitfmthex from Dan Palermo (HP).
-- Make mask_cpu options of srun and slaunch commands not require a prefix
of "0x". patch.1.2.0-pre9.061208.srun_maskparse from Dan Palermo (HP).
-- Add -c support to the -B automatic mask generation for multi-core
support, patch.1.2.0-pre9.061208.mcore_cpuspertask from Dan Palermo (HP).
-- Fix bug in MASK_CPU calculation,
patch.1.2.0-pre9.061211.avail_cpuspertask from Dan Palermo (HP).
-- BLUEGENE - Added --reboot option to srun, salloc, and sbatch commands.
-- Add "scontrol listpids [JOBID[.STEPID]]" support.
-- Multi-core support patches, fixed SEGV and clean up output for large
task counts, patch.1.2.0-pre9.061212.cpubind_verbose from Dan Palermo (HP).
-- Make sure jobacct plugin files are closed before exec of user tasks to
prevent problems with job checkpoint/restart (based on work by
Hongjia Cao, NUDT).
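The mask_cpu change in this release means a hexadecimal mask is accepted
with or without a leading "0x". A minimal sketch of such tolerant parsing
follows; it is not the parser used by srun or slaunch, and both function
names are hypothetical.

```python
def parse_cpu_mask(text):
    """Parse a hexadecimal CPU mask, with or without a leading "0x"."""
    # int(..., 16) accepts an optional "0x"/"0X" prefix on its own.
    return int(text.strip(), 16)


def cpus_in_mask(mask):
    """Return the CPU indices whose bits are set in the mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


print(parse_cpu_mask("0x3") == parse_cpu_mask("3"))  # True
print(cpus_in_mask(parse_cpu_mask("a")))             # [1, 3]
```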
* Changes in SLURM 1.2.0-pre9
=============================
-- Fix for select/cons_res state preservation over slurmctld restart,
patch.1.2.0-pre7.061130.cr_state from Dan Palermo.
-- Validate product of socket*core*thread count on node registration rather
than individual values. Correct values will need to be specified in slurm.conf
with FastSchedule=1 for correct multi-core scheduling behavior.
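The registration check above compares the product of the three counts
rather than each count on its own. The sketch below shows the idea only.
The function name is hypothetical, and it assumes a node passes when its
reported product is not lower than the configured one; the actual
comparison in slurmctld may differ.

```python
def registration_ok(configured, reported):
    """Hypothetical check on (sockets, cores, threads) tuples."""
    def product(topology):
        sockets, cores, threads = topology
        return sockets * cores * threads

    # Assumption: pass when the reported total is at least the configured total.
    return product(reported) >= product(configured)


# Individual values differ, but both describe 8 logical processors.
print(registration_ok((2, 4, 1), (1, 8, 1)))  # True
# Only 4 logical processors reported against 8 configured.
print(registration_ok((2, 4, 1), (2, 2, 1)))  # False
```

Because a product match can hide a wrong per-socket or per-core layout, the
individual values in slurm.conf still need to be correct for multi-core
scheduling, as the entry notes.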
* Changes in SLURM 1.2.0-pre8
=============================
-- Modify job state "reason" field to report why a job failed (previously
reported only the reason a job was waiting to run). Requires cold-start of
slurmctld (-c option).
-- For sched/wiki2 job state request, return REJMESSAGE= with reason for
a job's failure.