Skip to content
Snippets Groups Projects
NEWS 52.3 KiB
Newer Older
Christopher J. Morrone's avatar
Christopher J. Morrone committed
This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins. 
Christopher J. Morrone's avatar
Christopher J. Morrone committed
* Changes in SLURM 0.6.0-pre3
 -- Add code so job request for shared nodes gets explicitly requested 
    nodes, but lightly loaded nodes otherwise.
 -- Add job step name field.
 -- Add job step network specification field.
Christopher J. Morrone's avatar
Christopher J. Morrone committed
 -- Add proctrack/rms plugin
 -- Change the proctrack API to send a slurmd_job_t pointer to both
    slurm_container_create() and slurm_container_add().  One of those
    functions MUST set job->cont_id.
 -- Remove vestigial node_use (virtual or coprocessor) field from job
    request RPC.
 -- Fix mpich-gm bugs, thanks to Takao Hatazaki (HP).
Christopher J. Morrone's avatar
Christopher J. Morrone committed

* Changes in SLURM 0.6.0-pre2
=============================
 -- Removed "make rpm" target.
* Changes in SLURM 0.6.0-pre1
=============================
 -- Added bgl/partition_allocator/smap changes from 0.5.7.
 -- Added configurable resource limit propagation  (Daniel Christians, HP).
Danny Auble's avatar
Danny Auble committed
 -- Added mpi plugin specify at start of srun.
 -- Changed SlurmUser ID from 16-bit to 32-bit.
 -- Added MpiDefault slurm.conf parameter.
 -- Remove KillTree configuration parameter (replace with
    "ProctrackType=proctrack/linuxproc")
 -- Remove MpichGmDirectSupport configuration parameter (replace with
    "MpiDefault=mpich-gm")
 -- Make default plugin be "none" for mpi.
 -- Added mpi/none plugin and made it the default.
 -- Replace extern program_invocation_short_name with program_invocation_name
    due to short name being truncated to 16 bytes on some systems.
 -- Added support for Elan clusters with different CPU counts on nodes
    (Chris Holmes, HP).
 -- Added Consumable Resources web page (Susanne Balle, HP).
Christopher J. Morrone's avatar
Christopher J. Morrone committed
 -- "Session manager" slurmd process has been eliminated.
 -- switch/federation fixes migrated from 0.5.*
 -- srun pthreads really set detached, fixes scaling problem

* Changes in SLURM 0.5.7
========================
 -- added infrastructure for (eventual) support of AIX checkpointing
    of slurm batch and interactive poe jobs
 -- added wiring for BGL to do wiring for physical location first and then
    logical.
 -- only one thread used to query database before polling thread is there.
 -- slurmd can now handle 130 jobs at one time instead of 64.

* Changes in SLURM 0.5.6
========================
 -- fix for BGL hostnames and full system partition finding

* Changes in SLURM 0.5.5
========================
 -- Increase SLURM_MESSAGE_TIMEOUT_MSEC_STATIC to 15000
 -- Fix for premature timeout in _slurm_send_timeout
 -- Fix for federation overlapping calls to non-thread-safe _get_adapters

* Changes in SLURM 0.5.4
========================
 -- Added support for no reboot for VN to CO on BGL
 -- Fix for if a job starts after it finishes on BGL

* Changes in SLURM 0.5.3
========================
 -- federation patch so the slurm controller has sane window status at
    start-up regardless of the window status reported in the slurmd
    registration.
 -- federation driver exits with fatal() if the federation driver can not
    find all of the adapters listed in the federation.conf

* Changes in SLURM 0.5.2
========================
 -- Extra federation driver sanity checks

* Changes in SLURM 0.5.1
========================
 -- Fix federation driver bad free(), other minor fed fixes
 -- Allow slurm to parse very long lines in the slurm.conf
Moe Jette's avatar
Moe Jette committed
* Changes in SLURM 0.5.0
========================
 -- Fix race condition in job accouting plugin, could hang slurmd
 -- Report SlurmUser id over 16 bits as an error (fix on v0.6)

* Changes in SLURM 0.5.0-pre19
==============================
 -- Fix memory management bug in federation driver

* Changes in SLURM 0.5.0-pre18
==============================
 -- elan switch plugin memory leak plugged
 -- added g_slurmctld_jobacct_fini() to release all memory (useful 
    to confirm no memory leaks)
 -- Fix slurmd bug introduced in pre17
* Changes in SLURM 0.5.0-pre17
==============================
 -- slurmd calls the proctrack destroy function at job step completion
 -- federation driver tries harder to clean up switch windows
 -- BGL wiring changes
* Changes in SLURM 0.5.0-pre16
==============================
 -- Check slurm.conf values for under/overflows (some are 16 bit values).
 -- Federation driver clears windows at job step completion
 -- Modify code for clean build with gcc v4.0
 -- New SLURM_NETWORK environmant variable used by slurm_ll_api
* Changes in SLURM 0.5.0-pre15
==============================
 -- Added "network" field to "scontrol show job" output. 
 -- Federation fix for unfreed windows when multiple adapters on
    one node use the same LID
Danny Auble's avatar
Danny Auble committed
* Changes in SLURM 0.5.0-pre14
==============================
 -- RDMA works on fed plugin.

* Changes in SLURM 0.5.0-pre13
==============================
 -- Major mods to support checkpoint on AIX.
 -- Job accounting documenation expanded, added tuning options, minor bug fixes
Danny Auble's avatar
Danny Auble committed
 -- BGL wiring will now work on <= 4 node X-dim partitions and also 8 node 
    X-dim partitions.
 -- ENV variables set for spawning jobs. 
 -- jobacct patch from HP to not erroneously lock a mutex in the 
    jobacct_log plugin.
 -- switch/federation supports multiple adapters per task.  sn_all behaviour
    is now correct, and it also supports sn_single.
* Changes in SLURM 0.5.0-pre12
==============================
 -- Minor build changes to support RPM creation on AIX

Moe Jette's avatar
Moe Jette committed
* Changes in SLURM 0.5.0-pre11
==============================
 -- Slurmd tests for initialized session manager (user's) slurmd pid before 
    killing it to avoid killing system daemon (race condition).
 -- srun --output or --error file names of "none" mapped to /dev/null for 
    batch jobs rather than a file actually named "none".
Moe Jette's avatar
Moe Jette committed
 -- BGL: don't try to read bglblock state until they are all created to 
    avoid having BGL Bridge API seg fault.
Moe Jette's avatar
Moe Jette committed

* Changes in SLURM 0.5.0-pre10
==============================
 -- Fix bug that was resetting BGL job geometry on unrelated field update.
 -- squeue and sinfo print timestamp in interate mode by default.
 -- added scrolling windows in smap
 -- introduced new variable to start polling thread in the bluegene plugin.
 -- Latest accounting patches from Riebs/HP, retry communications.
 -- Added srun option --kill-on-bad-exit from Holmes/HP.
 -- Support large (64-bit address) log files where possible.
 -- Fix problem of signals being delivered twice to tasks.  Note that as
    part of the fix the slurmd session manger no longer calls setsid to
    create a new session.
* Changes in SLURM 0.5.0-pre9
=============================
 -- If a job and node are in COMPLETING state and slurmd stops responding for
    SlurmdTimeout, then set the node DOWN and the job COMPLETED.
 -- Add logic to switch/elan to track contexts allocated to active job steps 
    rather than just using a cyclic counter and hoping to avoid collisions. 
 -- Plug memory leak in freeing job info retrieved using API.
Danny Auble's avatar
Danny Auble committed
 -- Bluegene Plugin handles long deallocating states from driver 202.
 -- Fix bug in bitfmt2int() which can go off allocated memory.
* Changes in SLURM 0.5.0-pre8
=============================
 -- BlueGene srun --geometry was not getting propogated properly.
 -- Fix race condition with multiple simultaneous epilogs.
 -- Modify slurmd to resend job completion RPC to slurmctld in the 
    case where slurmctld is not responding.
Moe Jette's avatar
Moe Jette committed
 -- Updated sacct: handle cancelled jobs correctly, add user/group
    output, add ntasks ans synonym for nprocs, display error field 
    by default, display ncpus instead of nprocs
Danny Auble's avatar
Danny Auble committed
 -- Parallelization of queing jobs up to 32 at once.  Variable 
    MAX_AGENT_COUNT used in bgl_job_run.c to specify.
Danny Auble's avatar
Danny Auble committed
 -- bgl_job_run.c fixed threading issue with uid_to_string use.
* Changes in SLURM 0.5.0-pre7
=============================
 -- Preserve next_job_id across restarts.
 -- Add support for really long job names (256 bytes).
 -- Add configuration parameter SchedulerRootFilter to control what 
    entity manages prioritization of jobs in RootOnly partition 
    (internal scheduler plugin or external entity).
 -- Added support for job accounting.
 -- Added support for consumable resource based node scheduling.
 -- Permit batch job to be launched to re-existing allocation.

* Changes in SLURM 0.5.0-pre6
=============================
 -- Load bluegene.conf and federation.conf based upon SLURM_CONF env 
    var (if set).
 -- Fix slurmd shutdown signal synchronization bug (not consistently 
Loading
Loading full blame...