This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins. 
* Changes in SLURM 1.2.0-pre3
=============================
 -- Remove configuration parameter SchedulerAuth (defunct).
 -- Add NextJobId to "scontrol show config" output.
 -- Add new slurm.conf parameter MailProg.
 -- New forwarding logic.  New receive_msg functions, depending on what you
    are expecting to get back.  srun_node_id is no longer passed around in
    a slurm_msg_t.
 -- Remove sched/wiki plugin (use sched/wiki2 for now)
 -- Disable pthread_create() for PMI_send when TotalView is running for 
    better performance.
 -- Fixed certain tests in the test suite so they do not run on BlueGene or
    front-end systems.
 -- Removed addresses from slurm_step_layout_t
 -- Added new job field, "comment". Set by srun, salloc and sbatch. View
    with "scontrol show job". Used in sched/wiki2.
 -- Report a job's exit status in "scontrol show job".
* Changes in SLURM 1.2.0-pre2
=============================
 -- Added function slurm_init_slurm_msg to initialize any slurm_msg_t;
    no other initialization of the structure is needed.
 -- Fixed task distribution to work with hostfile, and warn when asking for
    more tasks than you have nodes for in arbitrary mode.
 -- Added "account" field to job and step accounting information and sacct output.
 -- Moved task layout to slurmctld instead of srun.  Job step create returns
    a step_layout structure with the hostnames and addresses that correspond
    to those nodes.
 -- Changed slurm_lookup_allocation API parameters:
    resource_allocation_response_msg_t changed to job_alloc_info_response_msg_t.
    The structure is only renamed; its contents are the same.
 -- Altered resource_allocation_response_msg_t; see slurm.h.in.
 -- Removed old_job_alloc_msg_t and function slurm_confirm_alloc.
 -- Slurm configuration files now support an "Include" directive to
    include other files inline.
 -- BLUEGENE - New --enable-bluegene-emulation configure parameter to allow
    running the system in BlueGene emulation mode.  Only really useful for
    developers.
 -- Added new tool sview, a GUI for displaying SLURM info.
 -- fixed bug in step layout to lay out tasks correctly

* Changes in SLURM 1.2.0-pre1
=============================
 -- Fix bug that could run a job's prolog more than once
 -- Permit batch jobs to be requeued, scontrol requeue <jobid>
 -- Send overcommit flag from srun in RPCs and have slurmd set SLURM_OVERCOMMIT
    flag at batch job launch time.
 -- Added new configuration parameter MessageTimeout (replaces #define in 
    the code)
 -- Added support for OSX build.
* Changes in SLURM 1.1.14
=========================
 - In sched/wiki2: report only job/node id and state if there have been no
   changes since the time specified in the request.
 - In sched/wiki2: include a job's exit code in job state information.
 - In sched/wiki2: add event notification logic.
 - In sched/wiki2: add support for JOBWILLRUN command type.
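The first sched/wiki2 entry above describes a conditional report: when nothing has changed since the time given in the request, only the ID and state are sent. A minimal C sketch of that rule follows; the `job_rec` struct, the `format_job` function, and the `ID:FIELD=value;` output syntax are all hypothetical illustrations, not the plugin's actual code or the exact wiki protocol format.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical job record; field names are illustrative only. */
struct job_rec {
    unsigned int job_id;
    const char *state;
    time_t last_update;   /* when this job's record last changed */
    int exit_code;
};

/* Write a job's status into BUF. If the job has not changed since
 * UPDATE_TIME, report only its ID and state; otherwise also include
 * the additional fields (here just the exit code). The "ID:FIELD=value;"
 * syntax is an assumed, simplified format. Returns snprintf's result. */
static int format_job(const struct job_rec *job, time_t update_time,
                      char *buf, size_t len)
{
    if (job->last_update <= update_time)
        return snprintf(buf, len, "%u:STATE=%s;", job->job_id, job->state);
    return snprintf(buf, len, "%u:STATE=%s;EXITCODE=%d;",
                    job->job_id, job->state, job->exit_code);
}
```

A scheduler polling every few seconds then receives a short record for most jobs and the full record only for jobs that actually changed.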

* Changes in SLURM 1.1.13
=========================
 - Fix hang in sched/wiki2 if Moab stops responding when a response is
   outgoing.
 - BLUEGENE - fix to make sure the block is good to go when picking it
 - BLUEGENE - add libsched_if.so so mpirun doesn't try to create a block
   by itself.
 - Enable specification of srun --jobid=# option with --batch (for user root).
 - Verify that job actually starts when requested by sched/wiki2.
 - Add new wiki.conf parameters: EPort and JobAggregationTime for event
   notification logic (see wiki.conf man page for details).
 - Sched/wiki2 to report a job's account as COMMENT response to GETJOBS
    request.
 - Add srun option "--comment" (maps to job account until slurm v1.2, 
   needed for Moab scheduler functionality).
 - Fixed some timeout issues in the controller, which should stop the
   excessive timeouts seen previously.
 - Unit conversion of node counts (e.g. 1024 => 1k) only happens on
   BlueGene/L systems.
 - Sched/wiki2 to report a job's COMPLETETIME and SUSPENDTIME in GETJOBS
   response.
 - Added support for Mellanox's version of mvapich-0.9.7.
* Changes in SLURM 1.1.11
=========================
 - Update file headers adding permission to link with OpenSSL.
 - Enable sched/wiki2 message authentication.
 - Fix libpmi compilation issue.
 - Remove "gcc-c++ python" from slurm.spec BuildRequires.  It breaks
   the AIX build, so we'll have to find another way to deal with that.
* Changes in SLURM 1.1.10
=========================
 -- task distribution fix for steps that are smaller than job allocation.
 -- BLUEGENE - fix to send success only when the block was actually created
    while trying to allocate it.
 -- Fix so that if slurm_send_recv_node_msg fails on the send, the auth_cred
    returned in the response is NULL.
 -- Fix switch/federation plugin so backup controller can assume control 
    repeatedly without leaking or corrupting memory.
 -- Add new error code (for Maui/Moab scheduler): ESLURM_JOB_HELD
 -- Tweak slurmctld's node ping logic to better handle failed nodes with 
    hierarchical communications fail-over logic.
 -- Add support for sched/wiki specific configuration file "wiki.conf".
 -- Added sched/wiki2 plugin (new experimental wiki plugin).
* Changes in SLURM 1.1.9
========================
 -- BLUEGENE - fix to handle a NO_VAL sent in as num procs in the job 
    description.
 -- Fix bug in slurmstepd code for parsing --multi-prog command script.
    Parser was failing for commands with no arguments.
 -- Fix bug to check unsigned ints correctly in bitstring.c
 -- Alter node count conversion to kilo units so that only numbers divisible
    by 1024 or 512 are converted.
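The conversion rule above can be sketched as a small formatting helper. This is a hypothetical illustration, not SLURM's own conversion routine: `format_node_count` is an invented name, and the assumption that non-1024 multiples of 512 print with a ".5k" suffix is ours.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: convert a node count to kilo units only when it
 * is a nonzero multiple of 512. Multiples of 1024 print as whole "k"
 * values; other multiples of 512 (assumed) end in ".5k". Any other
 * count is printed unchanged. */
static void format_node_count(unsigned int count, char *buf, size_t len)
{
    if (count != 0 && count % 1024 == 0)
        snprintf(buf, len, "%uk", count / 1024);
    else if (count != 0 && count % 512 == 0)
        snprintf(buf, len, "%u.5k", count / 1024); /* 1536 -> "1.5k" */
    else
        snprintf(buf, len, "%u", count);           /* 1000 -> "1000" */
}
```

Restricting conversion to exact multiples avoids displaying a rounded value such as "1k" for a count like 1000, which would misstate the allocation.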
* Changes in SLURM 1.1.8
========================
 -- Added bug fixes (fault-tolerance and memory leaks) from Hongjia Cao 
 -- Fixed some potential BLUEGENE issues with the bridge log file not having
    a mutex around the fclose and fopen.
 -- BLUEGENE - srun -n procs now registers correctly.
 -- Fixed problem with reattach double allocating step_layout->tids
 -- BLUEGENE - fix race condition where job is finished before it starts.
* Changes in SLURM 1.1.7
========================
 -- BLUEGENE - fixed issue with allocating nodes, since requests for 32, 128,
    or 512 nodes all mean 1 to the controller.
 -- Add "Include" directive to slurm.conf files.  If "Include" is found
    at the beginning of a line followed by whitespace and then
    the full path to a file, that file is included inline with the current
    slurm.conf file.
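The "Include" rule above (keyword at the start of the line, then whitespace, then a full path) can be sketched as a line matcher. This is a hypothetical illustration, not SLURM's parser: `parse_include_line` is an invented name, and it assumes the path starts with '/' and contains no whitespace.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical matcher: if LINE begins with "Include" followed by
 * whitespace and an absolute path, copy the path into PATH and return 1;
 * otherwise return 0. Assumes paths contain no whitespace. */
static int parse_include_line(const char *line, char *path, size_t len)
{
    static const char kw[] = "Include";
    const size_t kwlen = sizeof(kw) - 1;
    const char *p;
    size_t n;

    /* Keyword must be at the start of the line and followed by whitespace. */
    if (strncmp(line, kw, kwlen) != 0 ||
        !isspace((unsigned char)line[kwlen]))
        return 0;

    p = line + kwlen;
    while (isspace((unsigned char)*p))
        p++;
    if (*p != '/')                  /* require a full (absolute) path */
        return 0;

    n = strcspn(p, " \t\r\n");      /* path ends at the next whitespace */
    if (n >= len)
        return 0;                   /* caller's buffer is too small */
    memcpy(path, p, n);
    path[n] = '\0';
    return 1;
}
```

A caller reading slurm.conf line by line would open the returned path and parse its lines in place before continuing with the current file.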
* Changes in SLURM 1.1.6
========================
 -- Improved task layout for relative positions
 -- Fixed heterogeneous cpu overcommit issue
 -- Fix bug where srun would hang if it ran on one node and that 
    node's slurmd died
 -- Fix bug where srun task layout would be bad when min-max node range is 
    specified (e.g. "srun -N1-4 ...")
 -- Made slurmctld_conf.node_prefix only be set on Bluegene systems.
 -- Fixed a race condition in the controller to make it so a plugin thread
    wouldn't be able to access the slurmctld_conf structure before it was 
    filled.
* Changes in SLURM 1.1.5
========================
 -- Ignore partition's MaxNodes for SlurmUser and root.
 -- Fix possible memory corruption with use of PMI_KVS_Create call.
 -- Fix race condition with multiple PMI_KVS_Barrier calls.
 -- Fix logic in which slurmctld outgoing RPC requests could get delayed.
 -- Fix logic for laying out steps without a hostlist.
* Changes in SLURM 1.1.4
========================
 -- Improve error handling in hierarchical communications logic.

* Changes in SLURM 1.1.3
========================
 -- Fix big-endian bug in the bitstring code which plagued AIX.
 -- Fix bug in handling srun's --multi-prog option, could go off end of buffer.
 -- Added support for job step completion (and switch window release) on 
    subset of allocated nodes.
 -- BLUEGENE - removed configure option --with-bg-link; the bridge is now
    linked with dlopen, so fake database .so files are no longer needed on
    the front-end node.
 -- BLUEGENE - implemented use of rm_get_partition_info instead of
    ...partitions_info, a much better design that improves stability.
 -- Streamline PMI communications and increase timeouts for highly parallel 
    jobs. Improves scalability of PMI.
* Changes in SLURM 1.1.2
========================
 -- Fix bug in jobcomp/filetxt plugin to report proper NodeCnt when a job 
    fails due to a node failure.
 -- Fix Bluegene configure to work with the new 64bit libs.
 -- Fix bug in controller that causes it to segfault when hit with a malformed
    message.
 -- For "srun --attach=X" to another user's job, report an error and exit (it
    previously just hung).
 -- BLUEGENE - fix for doing correct small block logic on user error. 
 -- BLUEGENE - Added support in slurmd to create a fake libdb2.so if it
    doesn't exist so smap won't seg fault
 -- BLUEGENE - "scontrol show job" reports "MaxProcs=None" and "Start=None"
    if values are not specified at job submit time
 -- Add retry logic for PMI communications, may be needed for highly parallel
    jobs.
 -- Fix bug in slurmd where variable is used in logging message after freed
    (slurmstepd rank info).
 -- Fix bug in scontrol show daemons if NodeName=localhost will work now to