This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins. 

* Changes in SLURM 0.5.0-pre10
==============================
 -- squeue and sinfo print timestamp in iterate mode by default.
 -- Latest accounting patches from Riebs/HP, retry communications.

* Changes in SLURM 0.5.0-pre9
=============================
 -- If a job and node are in COMPLETING state and slurmd stops responding for
    SlurmdTimeout, then set the node DOWN and the job COMPLETED.
 -- Add logic to switch/elan to track contexts allocated to active job steps 
    rather than just using a cyclic counter and hoping to avoid collisions. 
 -- Plug memory leak in freeing job info retrieved using API.
 -- Bluegene Plugin handles long deallocating states from driver 202.
 -- Fix bug in bitfmt2int() which can go off allocated memory.

* Changes in SLURM 0.5.0-pre8
=============================
 -- BlueGene srun --geometry was not getting propagated properly.
 -- Fix race condition with multiple simultaneous epilogs.
 -- Modify slurmd to resend job completion RPC to slurmctld in the 
    case where slurmctld is not responding.
 -- Updated sacct: handle cancelled jobs correctly, add user/group
    output, add ntasks as synonym for nprocs, display error field 
    by default, display ncpus instead of nprocs.
 -- Parallelize queuing of jobs, up to 32 at once. The variable 
    MAX_AGENT_COUNT in bgl_job_run.c sets the limit.
 -- Fixed threading issue with uid_to_string use in bgl_job_run.c.

* Changes in SLURM 0.5.0-pre7
=============================
 -- Preserve next_job_id across restarts.
 -- Add support for really long job names (256 bytes).
 -- Add configuration parameter SchedulerRootFilter to control what 
    entity manages prioritization of jobs in RootOnly partition 
    (internal scheduler plugin or external entity).
 -- Added support for job accounting.
 -- Added support for consumable resource based node scheduling.
 -- Permit batch job to be launched to pre-existing allocation.

* Changes in SLURM 0.5.0-pre6
=============================
 -- Load bluegene.conf and federation.conf based upon SLURM_CONF env 
    var (if set).
 -- Fix slurmd shutdown signal synchronization bug (not consistently 
    terminating).
 -- Add doc/html/ibm.html document. Update bluegene.html.
 -- Add sfree to bluegene plugin. 
 -- Remove geometry[SYSTEM_DIMENSIONS] from opaque node_select data 
    type if SYSTEM_DIMENSIONS==0 (not ANSI C compliant).
 -- Modify smap to test for valid libdb2.so before issuing any BGL 
    Bridge API calls.
 -- Modify spec file for optional inclusion of select_bluegene and 
    sched_wiki plugin libraries.
 -- Initialize job->network in data structure, could cause job 
    submit/update to fail depending upon what is left on stack.

* Changes in SLURM 0.5.0-pre5
=============================
 -- Expand buffer to hold node_select info in job termination log.
 -- Modify slurmctld node hashing function to reduce collisions.
 -- Treat bglblock vanishing as fatal error for job, prolog and epilog 
    exit immediately.
 -- Bug fix for following multiple X-dim partitions.

* Changes in SLURM 0.5.0-pre4
=============================
 -- Fix bug in slurmd that could double KillWait time on job timeout.
 -- Fix bug in srun's error code reporting to slurmctld, could DOWN 
    a node if job run as root has non-zero error code.
 -- Remove a node's partition info when removed from existing partition.
 -- Use proctrack plugin to kill all processes in a job step before 
    calling interconnect_postfini() to ensure no processes escape from 
    job and prevent switch windows from being released.
 -- Added mail.html web page telling how to get on slurm mailing lists.
 -- Added another directory to search for DB2 files on BGL system.
 -- Added overview man page slurm.1.
 -- Added new configure option "--with-db2-dir=PATH" for BGL.

* Changes in SLURM 0.5.0-pre3
=============================
 -- Merge of SLURM v0.4-branch into v0.5/HEAD.

* Changes in SLURM 0.5.0-pre2
=============================
 -- Fix bug in srun to clean-up upon failure of an allocated node
    (srun -A would generate a segmentation fault, Chris Holmes, HP).
 -- If slurmd's node name is mapped to NULL (due to bad configuration)
    terminate slurmd with a fatal error and don't crash slurmctld.
 -- Add SLURMD_DEBUG env var for use with AIX/POE in spawn_task RPC.
 -- Always pack job's "features" for access by prolog/epilog.

* Changes in SLURM 0.5.0-pre1
=============================
 -- Add network option to srun and job creation API for specification 
    of communication protocol over IBM Federation switch.
 -- Add new slurm.conf parameter ProctrackType (process tracking) and 
    associated plugin in the slurmd module.
 -- Send node's switch state with job epilog completion RPC and 
    node registration (only when slurmd starts, not on periodic 
    registrations).
 -- Add federation switch plugin.
 -- Add new configuration keyword, SchedulerRootFilter, to control 
    external scheduler control of RootOnly partition (Chris Holmes, HP).
 -- Modify logic to set process group ID for spawned processes (last 
    patch from slurm v0.3.11).
 -- "srun -A" modified to return exit code of last command executed
    (Chris Holmes, HP).
 -- Add support for different slurm.conf files controlled via SLURM_CONF
    env var (Brian O'Sullivan, PathScale).
 -- Fix bug if srun given --uid without --gid option (Chris Holmes, HP).

* Changes in SLURM 0.4.24
=========================
 -- DRAIN nodes when switches on base partitions are in ERROR, MISSING, 
    or DOWN states.
 
* Changes in SLURM 0.4.23
========================= 
 -- Modified bluegene plugin to only sync bglblocks to jobs on initial 
    startup, not on reconfig. Fixes race condition.
 -- Modified bluegene plugin to work with driver 141, so that a reboot 
    is only required when switching from coproc -> virtual and back.
 -- Added support for a full system partition to make sure every other 
    partition is free and vice versa.
 -- smap resizing issue fixed.
 -- Changed prolog not to add time when a partition is in deallocating 
    state.
 -- NOTE: This version of SLURM requires BGL driver 141/2005.

* Changes in SLURM 0.4.22
=========================
 -- Modified bluegene plugin to not do anything if the bluegene.conf file 
    is altered.
 -- Added checking for lists before trying to create iterator on the list.

* Changes in SLURM 0.4.21
=========================
 -- Fix race condition with time in Status Thread of BGL.
 -- Fix no leading zeros in smap output.

* Changes in SLURM 0.4.20
=========================
 -- Smap output is more user friendly with -c option

* Changes in SLURM 0.4.19
=========================
 -- Added new RPCs for getting bglblock state info remotely and cache data 
    within the plugin (permits removal of DB2 access from BGL FEN and 
    dramatically increases smap responsiveness, also changed prolog/epilog
    operation).
 -- Move smap executable to main slurm RPM (from separate RPM).
 -- smap uses RPC instead of DB2 to get info about bgl partitions.
 -- Status function added to bluegene_agent thread. Keeps current state
    of BGL partitions, updating every second. Will handle multiple attempts 
    at booting if booting a partition fails.

* Changes in SLURM 0.4.18
=========================
 -- Added error checking of rm_remove_partition calls.
 -- job_term() was terminating a job in real time rather than 
    queueing the request. This would result in slurmctld hanging 
    for many seconds when a job termination was required.

* Changes in SLURM 0.4.17
=========================
 -- Bug fixes from testing 0.4.16.

* Changes in SLURM 0.4.16
=========================
 -- Added error checking to a bunch of Bridge API calls and more 
    gracefully handle failure modes.
 -- Made smap more robust for more jobs.

* Changes in SLURM 0.4.15
=========================
 -- Added error checking to a bunch of Bridge API calls and more 
    gracefully handle failure modes.

* Changes in SLURM 0.4.14
=========================
 -- Job state is kept on warm start of SLURM.

* Changes in SLURM 0.4.13
=========================
 -- Epilog fix for bgl plugin.

* Changes in SLURM 0.4.12
=========================
 -- Bug fix for new API calls.
 -- Added BridgeAPILogFile as an option for bluegene.conf file.
 
* Changes in SLURM 0.4.11
=========================
 -- Changed as many rm_get_partition() calls to rm_get_partitions_info() 
    as we could to save time.
 
* Changes in SLURM 0.4.10