This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 0.5.0-pre18
==============================
-- elan switch plugin memory leak plugged
-- added g_slurmctld_jobacct_fini() to release all memory (useful
to confirm no memory leaks)
* Changes in SLURM 0.5.0-pre17
==============================
-- slurmd calls the proctrack destroy function at job step completion
-- federation driver tries harder to clean up switch windows

* Changes in SLURM 0.5.0-pre16
==============================
-- Check slurm.conf values for under/overflows (some are 16 bit values).

-- Federation driver clears windows at job step completion
-- Modify code for clean build with gcc v4.0

-- New SLURM_NETWORK environment variable used by slurm_ll_api
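The slurm.conf under/overflow check listed for this release can be pictured with a minimal C sketch. It is not SLURM's actual parser: the function name parse_u16() and its return convention are hypothetical, and it only shows how a decimal value might be rejected when it does not fit in 16 bits.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not SLURM's real code): convert a decimal string
 * to a uint16_t, rejecting anything outside 0..65535.
 * Returns 0 on success, -1 on any parse or range error. */
int parse_u16(const char *text, uint16_t *out)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0')
        return -1;              /* nothing to parse */

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return -1;              /* not a clean decimal number */
    if (value < 0 || value > UINT16_MAX)
        return -1;              /* underflow or overflow for 16 bits */

    *out = (uint16_t)value;
    return 0;
}
```

Under these assumptions, "65535" is accepted while "65536" and "-1" are rejected instead of silently wrapping.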

* Changes in SLURM 0.5.0-pre15
==============================
-- Added "network" field to "scontrol show job" output.
-- Federation fix for unfreed windows when multiple adapters on
one node use the same LID
* Changes in SLURM 0.5.0-pre14
==============================
-- RDMA works on fed plugin.
* Changes in SLURM 0.5.0-pre13
==============================
-- Major mods to support checkpoint on AIX.
-- Job accounting documentation expanded, added tuning options, minor bug fixes
-- BGL wiring will now work on <= 4 node X-dim partitions and also 8 node
X-dim partitions.
-- ENV variables set for spawning jobs.
-- jobacct patch from HP to not erroneously lock a mutex in the
jobacct_log plugin.
-- switch/federation supports multiple adapters per task. sn_all behaviour
is now correct, and it also supports sn_single.
* Changes in SLURM 0.5.0-pre12
==============================
-- Minor build changes to support RPM creation on AIX
* Changes in SLURM 0.5.0-pre11
==============================
-- Slurmd tests for initialized session manager (user's) slurmd pid before
killing it to avoid killing system daemon (race condition).
-- srun --output or --error file names of "none" mapped to /dev/null for
batch jobs rather than a file actually named "none".
-- BGL: don't try to read bglblock state until they are all created to
avoid having BGL Bridge API seg fault.
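The "none" file name mapping described above for batch jobs can be sketched as follows. This is an illustration only: the function name example_batch_output_path() is hypothetical, and the exact, case-sensitive comparison is an assumption rather than something the changelog states.

```c
#include <string.h>

/* Hypothetical helper (not SLURM's real code): map an --output or
 * --error file name of "none" to /dev/null, and pass any other name
 * through unchanged. The exact-match comparison is an assumption. */
const char *example_batch_output_path(const char *name)
{
    if (name != NULL && strcmp(name, "none") == 0)
        return "/dev/null";
    return name;
}
```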
* Changes in SLURM 0.5.0-pre10
==============================
-- Fix bug that was resetting BGL job geometry on unrelated field update.
-- squeue and sinfo print timestamp in iterate mode by default.
-- added scrolling windows in smap
-- introduced new variable to start polling thread in the bluegene plugin.
-- Latest accounting patches from Riebs/HP, retry communications.
-- Added srun option --kill-on-bad-exit from Holmes/HP.
-- Support large (64-bit address) log files where possible.
-- Fix problem of signals being delivered twice to tasks. Note that as
part of the fix the slurmd session manager no longer calls setsid to
create a new session.
* Changes in SLURM 0.5.0-pre9
=============================
-- If a job and node are in COMPLETING state and slurmd stops responding for
SlurmdTimeout, then set the node DOWN and the job COMPLETED.
-- Add logic to switch/elan to track contexts allocated to active job steps
rather than just using a cyclic counter and hoping to avoid collisions.
-- Plug memory leak in freeing job info retrieved using API.
-- Bluegene Plugin handles long deallocating states from driver 202.
-- Fix bug in bitfmt2int() which can go off allocated memory.
* Changes in SLURM 0.5.0-pre8
=============================
-- BlueGene srun --geometry was not getting propagated properly.
-- Fix race condition with multiple simultaneous epilogs.
-- Modify slurmd to resend job completion RPC to slurmctld in the
case where slurmctld is not responding.
-- Updated sacct: handle cancelled jobs correctly, add user/group
output, add ntasks as synonym for nprocs, display error field
by default, display ncpus instead of nprocs
-- Parallelization of queuing jobs up to 32 at once. Variable
MAX_AGENT_COUNT used in bgl_job_run.c to specify.
* Changes in SLURM 0.5.0-pre7
=============================
-- Preserve next_job_id across restarts.
-- Add support for really long job names (256 bytes).
-- Add configuration parameter SchedulerRootFilter to control what
entity manages prioritization of jobs in RootOnly partition
(internal scheduler plugin or external entity).
-- Added support for job accounting.
-- Added support for consumable resource based node scheduling.
-- Permit batch job to be launched to pre-existing allocation.
* Changes in SLURM 0.5.0-pre6
=============================
-- Load bluegene.conf and federation.conf based upon SLURM_CONF env
var (if set).
-- Fix slurmd shutdown signal synchronization bug (not consistently
terminating).
-- Add doc/html/ibm.html document. Update bluegene.html.
-- Remove geometry[SYSTEM_DIMENSIONS] from opaque node_select data
type if SYSTEM_DIMENSIONS==0 (not ANSI C compliant).
-- Modify smap to test for valid libdb2.so before issuing any BGL
Bridge API calls.
-- Modify spec file for optional inclusion of select_bluegene and
sched_wiki plugin libraries.
-- Initialize job->network in data structure, could cause job
submit/update to fail depending upon what is left on stack.
* Changes in SLURM 0.5.0-pre5
=============================
-- Expand buffer to hold node_select info in job termination log.
-- Modify slurmctld node hashing function to reduce collisions.
-- Treat bglblock vanishing as fatal error for job, prolog and epilog
exit immediately.
* Changes in SLURM 0.5.0-pre4
=============================
-- Fix bug in slurmd that could double KillWait time on job timeout.
-- Fix bug in srun's error code reporting to slurmctld, could DOWN
a node if job run as root has non-zero error code.
-- Remove a node's partition info when removed from existing partition.
-- Use proctrack plugin to call all processes in a job step before
calling interconnect_postfini() to ensure no processes escape from
job and prevent switch windows from being released.
-- Added mail.html web page telling how to get on slurm mailing lists.
-- Added another directory to search for DB2 files on BGL system.
-- Added overview man page slurm.1.
-- Added new configure option "--with-db2-dir=PATH" for BGL.
* Changes in SLURM 0.5.0-pre3
=============================
-- Merge of SLURM v0.4-branch into v0.5/HEAD.
* Changes in SLURM 0.5.0-pre2
=============================
-- Fix bug in srun to clean-up upon failure of an allocated node
(srun -A would generate a segmentation fault, Chris Holmes, HP).
-- If slurmd's node name is mapped to NULL (due to bad configuration)
terminate slurmd with a fatal error and don't crash slurmctld.
-- Add SLURMD_DEBUG env var for use with AIX/POE in spawn_task RPC.
-- Always pack job's "features" for access by prolog/epilog
* Changes in SLURM 0.5.0-pre1
=============================
-- Add network option to srun and job creation API for specification
of communication protocol over IBM Federation switch.
-- Add new slurm.conf parameter ProctrackType (process tracking) and
associated plugin in the slurmd module.
-- Send node's switch state with job epilog completion RPC and
node registration (only when slurmd starts, not on periodic
registrations).
-- Add federation switch plugin.
-- Add new configuration keyword, SchedulerRootFilter, to control
external scheduler control of RootOnly partition (Chris Holmes, HP).
-- Modify logic to set process group ID for spawned processes (last
patch from slurm v0.3.11).

-- "srun -A" modified to return exit code of last command executed
(Chris Holmes, HP).
-- Add support for different slurm.conf files controlled via SLURM_CONF
env var (Brian O'Sullivan, pathscale)
-- Fix bug if srun given --uid without --gid option (Chris Holmes, HP).
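The SLURM_CONF selection mentioned above can be sketched as a simple environment lookup. This is a minimal illustration, not SLURM's implementation: the function name example_conf_path() is hypothetical, the fallback path is an assumed placeholder (the real default is chosen when SLURM is built), and treating an empty variable as unset is also an assumption.

```c
#include <stdlib.h>

/* Assumed fallback path, for illustration only. */
#define EXAMPLE_DEFAULT_CONF "/etc/slurm.conf"

/* Hypothetical helper (not SLURM's real code): return the configuration
 * file path, preferring the SLURM_CONF environment variable when it is
 * set and non-empty. */
const char *example_conf_path(void)
{
    const char *path = getenv("SLURM_CONF");

    if (path == NULL || *path == '\0')
        return EXAMPLE_DEFAULT_CONF;
    return path;
}
```

A site could then run daemons and commands against an alternate configuration by exporting SLURM_CONF before starting them.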
* Changes in SLURM 0.4.24
=========================
-- DRAIN nodes with switches on base partitions are in ERROR, MISSING,
or DOWN states.
* Changes in SLURM 0.4.23
=========================
-- Modified bluegene plugin to only sync bglblocks to jobs on initial
startup, not on reconfig. Fixes race condition.
-- Modified bluegene plugin to work with 141 driver. Enabling it to
only have to reboot when switching from coproc -> virtual and back.
-- added support for a full system partition to make sure every other
partition is free and vice versa.
-- smap resizing issue fixed.
-- change prolog not to add time when a partition is in deallocating
state.
-- NOTE: This version of SLURM requires BGL driver 141/2005.
* Changes in SLURM 0.4.22
=========================
-- Modified bluegene plugin to not do anything if the bluegene.conf file
is altered.
-- added checking for lists before trying to create iterator on the list.