This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins. 

* Changes in SLURM 0.5.0-pre18
==============================
 -- elan switch plugin memory leak plugged
 -- added g_slurmctld_jobacct_fini() to release all memory (useful 
    to confirm no memory leaks)

* Changes in SLURM 0.5.0-pre17
==============================
 -- slurmd calls the proctrack destroy function at job step completion
 -- federation driver tries harder to clean up switch windows
 -- BGL wiring changes
* Changes in SLURM 0.5.0-pre16
==============================
 -- Check slurm.conf values for under/overflows (some are 16 bit values).
 -- Federation driver clears windows at job step completion
 -- Modify code for clean build with gcc v4.0
 -- New SLURM_NETWORK environment variable used by slurm_ll_api
* Changes in SLURM 0.5.0-pre15
==============================
 -- Added "network" field to "scontrol show job" output. 
 -- Federation fix for unfreed windows when multiple adapters on
    one node use the same LID
* Changes in SLURM 0.5.0-pre14
==============================
 -- RDMA works on fed plugin.

* Changes in SLURM 0.5.0-pre13
==============================
 -- Major mods to support checkpoint on AIX.
 -- Job accounting documentation expanded, added tuning options, minor bug fixes
 -- BGL wiring will now work on <= 4 node X-dim partitions and also 8 node 
    X-dim partitions.
 -- ENV variables set for spawning jobs. 
 -- jobacct patch from HP to not erroneously lock a mutex in the 
    jobacct_log plugin.
 -- switch/federation supports multiple adapters per task.  sn_all behaviour
    is now correct, and it also supports sn_single.
* Changes in SLURM 0.5.0-pre12
==============================
 -- Minor build changes to support RPM creation on AIX

* Changes in SLURM 0.5.0-pre11
==============================
 -- Slurmd tests for initialized session manager (user's) slurmd pid before 
    killing it to avoid killing system daemon (race condition).
 -- srun --output or --error file names of "none" mapped to /dev/null for 
    batch jobs rather than a file actually named "none".
 -- BGL: don't try to read bglblock state until they are all created to 
    avoid having BGL Bridge API seg fault.

* Changes in SLURM 0.5.0-pre10
==============================
 -- Fix bug that was resetting BGL job geometry on unrelated field update.
 -- squeue and sinfo print timestamp in iterate mode by default.
 -- added scrolling windows in smap
 -- introduced new variable to start polling thread in the bluegene plugin.
 -- Latest accounting patches from Riebs/HP, retry communications.
 -- Added srun option --kill-on-bad-exit from Holmes/HP.
 -- Support large (64-bit address) log files where possible.
 -- Fix problem of signals being delivered twice to tasks.  Note that as
    part of the fix the slurmd session manager no longer calls setsid to
    create a new session.
* Changes in SLURM 0.5.0-pre9
=============================
 -- If a job and node are in COMPLETING state and slurmd stops responding for
    SlurmdTimeout, then set the node DOWN and the job COMPLETED.
 -- Add logic to switch/elan to track contexts allocated to active job steps 
    rather than just using a cyclic counter and hoping to avoid collisions. 
 -- Plug memory leak in freeing job info retrieved using API.
 -- Bluegene Plugin handles long deallocating states from driver 202.
 -- Fix bug in bitfmt2int() which can go off allocated memory.
* Changes in SLURM 0.5.0-pre8
=============================
 -- BlueGene srun --geometry was not getting propagated properly.
 -- Fix race condition with multiple simultaneous epilogs.
 -- Modify slurmd to resend job completion RPC to slurmctld in the 
    case where slurmctld is not responding.
 -- Updated sacct: handle cancelled jobs correctly, add user/group
    output, add ntasks as synonym for nprocs, display error field 
    by default, display ncpus instead of nprocs
 -- Parallelization of queuing jobs, up to 32 at once.  Variable 
    MAX_AGENT_COUNT in bgl_job_run.c specifies the limit.
 -- bgl_job_run.c fixed threading issue with uid_to_string use.
* Changes in SLURM 0.5.0-pre7
=============================
 -- Preserve next_job_id across restarts.
 -- Add support for really long job names (256 bytes).
 -- Add configuration parameter SchedulerRootFilter to control what 
    entity manages prioritization of jobs in RootOnly partition 
    (internal scheduler plugin or external entity).
 -- Added support for job accounting.
 -- Added support for consumable resource based node scheduling.
 -- Permit batch job to be launched to pre-existing allocation.

* Changes in SLURM 0.5.0-pre6
=============================
 -- Load bluegene.conf and federation.conf based upon SLURM_CONF env 
    var (if set).
 -- Fix slurmd shutdown signal synchronization bug (not consistently 
    terminating).
 -- Add doc/html/ibm.html document. Update bluegene.html.
 -- Add sfree to bluegene plugin. 
 -- Remove geometry[SYSTEM_DIMENSIONS] from opaque node_select data 
    type if SYSTEM_DIMENSIONS==0 (not ANSI-C compliant).
 -- Modify smap to test for valid libdb2.so before issuing any BGL 
    Bridge API calls.
 -- Modify spec file for optional inclusion of select_bluegene and 
    sched_wiki plugin libraries.
 -- Initialize job->network in data structure, could cause job 
    submit/update to fail depending upon what is left on stack.
* Changes in SLURM 0.5.0-pre5
=============================
 -- Expand buffer to hold node_select info in job termination log.
 -- Modify slurmctld node hashing function to reduce collisions.
 -- Treat bglblock vanishing as fatal error for job, prolog and epilog 
    exit immediately.
 -- bug fix for following multiple X-dim partitions
* Changes in SLURM 0.5.0-pre4
=============================
 -- Fix bug in slurmd that could double KillWait time on job timeout.
 -- Fix bug in srun's error code reporting to slurmctld, could DOWN 
    a node if job run as root has non-zero error code.
 -- Remove a node's partition info when removed from existing partition.
 -- Use proctrack plugin to call all processes in a job step before 
    calling interconnect_postfini() to ensure no processes escape from 
    job and prevent switch windows from being released.
 -- Added mail.html web page telling how to get on slurm mailing lists.
 -- Added another directory to search for DB2 files on BGL system.
 -- Added overview man page slurm.1.
 -- Added new configure option "--with-db2-dir=PATH" for BGL.

* Changes in SLURM 0.5.0-pre3
=============================
 -- Merge of SLURM v0.4-branch into v0.5/HEAD.

* Changes in SLURM 0.5.0-pre2
=============================
 -- Fix bug in srun to clean-up upon failure of an allocated node
    (srun -A would generate a segmentation fault, Chris Holmes, HP).
 -- If slurmd's node name is mapped to NULL (due to bad configuration)
    terminate slurmd with a fatal error and don't crash slurmctld.
 -- Add SLURMD_DEBUG env var for use with AIX/POE in spawn_task RPC.
 -- Always pack job's "features" for access by prolog/epilog
* Changes in SLURM 0.5.0-pre1
=============================
 -- Add network option to srun and job creation API for specification 
    of communication protocol over IBM Federation switch.
 -- Add new slurm.conf parameter ProctrackType (process tracking) and 
    associated plugin in the slurmd module.
 -- Send node's switch state with job epilog completion RPC and 
    node registration (only when slurmd starts, not on periodic 
    registrations).
 -- Add federation switch plugin.
 -- Add new configuration keyword, SchedulerRootFilter, to control 
    external scheduler control of RootOnly partition (Chris Holmes, HP).
 -- Modify logic to set process group ID for spawned processes (last 
    patch from slurm v0.3.11).
 -- "srun -A" modified to return exit code of last command executed
    (Chris Holmes, HP).
 -- Add support for different slurm.conf files controlled via SLURM_CONF
    env var (Brian O'Sullivan, pathscale)
 -- Fix bug if srun given --uid without --gid option (Chris Holmes, HP).

* Changes in SLURM 0.4.24
=========================
 -- DRAIN nodes with switches on base partitions that are in ERROR, MISSING, 
    or DOWN states.
 
* Changes in SLURM 0.4.23
========================= 
 -- Modified bluegene plugin to only sync bglblocks to jobs on initial 
    startup, not on reconfig. Fixes race condition.
 -- Modified bluegene plugin to work with 141 driver, so that a reboot is 
    only needed when switching from coproc -> virtual and back.
 -- added support for a full system partition to make sure every other 
    partition is free and vice versa.
 -- smap resizing issue fixed.
 -- change prolog not to add time when a partition is in deallocating 
    state.
 -- NOTE: This version of SLURM requires BGL driver 141/2005.

* Changes in SLURM 0.4.22
=========================
 -- Modified bluegene plugin to not do anything if the bluegene.conf file 
    is altered.
 -- added checking for lists before trying to create iterator on the list.
