This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 1.2.0-pre4
=============================
 -- Added node_inx to job_step_info_t to get the node indices for mapping out
    steps in a job by nodes.
 -- sview grid added
 -- BLUEGENE node_inx added to blocks for reference.
 -- Automatic CPU_MASK generation for task launch, new srun option -B.
 -- Automatic logical to physical processor identification and mapping.
 -- Added new srun options to --cpu_bind: sockets, cores, and threads
 -- Updated select/cons_res to operate at socket granularity.
 -- New srun task distribution options to -m: plane
 -- Multi-core support in sinfo, squeue, and scontrol.
 -- Memory can be treated as a consumable resource.
 -- New srun options --ntasks_per_[node|socket|core].

* Changes in SLURM 1.2.0-pre3
=============================
 -- Remove configuration parameter SchedulerAuth (defunct).
 -- Add NextJobId to "scontrol show config" output.
 -- Add new slurm.conf parameter MailProg.
 -- New forwarding logic.  New receive_msg functions depending on what you
    are expecting to get back.  srun_node_id is no longer passed around in
    a slurm_msg_t.
 -- Remove sched/wiki plugin (use sched/wiki2 for now)
 -- Disable pthread_create() for PMI_send when TotalView is running for 
    better performance.
 -- Fixed certain tests in test suite to not run with bluegene or front-end 
 -- Removed addresses from slurm_step_layout_t
 -- Added new job field, "comment". Set by srun, salloc and sbatch. View it
    with "scontrol show job". Used in sched/wiki2.
 -- Report a job's exit status in "scontrol show job".
 -- In sched/wiki2: add support for JOBREQUEUE command.
* Changes in SLURM 1.2.0-pre2
=============================
 -- Added function slurm_init_slurm_msg to initialize any slurm_msg_t.
    No other initialization of the type is needed.

 -- Fixed task dist to work with hostfile and warn about asking for more tasks
    than you have nodes for in arbitrary mode.
 -- Added "account" field to job and step accounting information and sacct output.
 -- Moved task layout to slurmctld instead of srun.  Job step create returns
    step_layout structure with hostnames and addresses that correspond
    to those nodes.
 -- Changed api slurm_lookup_allocation params:
    resource_allocation_response_msg_t changed to job_alloc_info_response_msg_t.
    The structure is being renamed, so its contents are the same.
 -- alter resource_allocation_response_msg_t see slurm.h.in
 -- remove old_job_alloc_msg_t and function slurm_confirm_alloc	
 -- Slurm configuration files now support an "Include" directive to
    include other files inline.
 -- BLUEGENE New --enable-bluegene-emulation configure parameter to allow 
    running system in bluegene emulation mode.  Only
    really useful for developers.
 -- Added new tool sview, a GUI for displaying slurm info.
 -- fixed bug in step layout to lay out tasks correctly

* Changes in SLURM 1.2.0-pre1
=============================
 -- Fix bug that could run a job's prolog more than once
 -- Permit batch jobs to be requeued, scontrol requeue <jobid>
 -- Send overcommit flag from srun in RPCs and have slurmd set SLURM_OVERCOMMIT
    flag at batch job launch time.
 -- Added new configuration parameter MessageTimeout (replaces #define in 
    the code)
 -- Added support for OSX build.
* Changes in SLURM 1.1.14
=========================
 - In sched/wiki2: report job/node id and state only if no changes since 
   time specified in request.
 - In sched/wiki2: include a job's exit code in job state information.
 - In sched/wiki2: add event notification logic on job submit and completion.
 - In sched/wiki2: add support for JOBWILLRUN command type.
 - In sched/wiki2: for job info, include required HOSTLIST if applicable.
 - In sched/wiki2: for job info, replace PARTITIONMASK with RCLASS (report
   partition name associated with a job, but no task count)
 - In sched/wiki2: for job and node info, report all data if TS==0,
   volatile data if TS<=update_time, state only if TS>update_time
 - In sched/wiki2: add support for CMD=JOBSIGNAL ARG=jobid SIGNAL=name or #
 - In sched/wiki2: add support for CMD=JOBMODIFY ARG=jobid [BANK=name]
   [TIMELIMIT=minutes] [PARTITION=name]
 - In sched/wiki2: add support for CMD=INITIALIZE ARG=[USEHOSTEXP=T|F]
   [EPORT=#]; RESPONSE=EPORT=# USEHOSTEXP=T
 - Fix sinfo node state filtering when asking for idle nodes that are also 
   draining. 
 - Add Fortran extension to slurm_get_rem_time() API.
 - Fix bug when changing the time limit of a running job that has previously 
   been suspended (formerly failed to account for suspend time in setting 
   termination time).
 - fix for step allocation to be able to specify only a few nodes in a
   step and ask for more than specified.
 - patch from Hongjia Cao for forwarding logic
 - BLUEGENE - able to allocate specific nodes without locking up.
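
The TS rule for sched/wiki2 job and node info listed above can be read as a
three-way decision. The sketch below is only an illustration of that rule as
worded in this file; the function name and the returned labels are
hypothetical and are not taken from the plugin source.

```python
def wiki2_report_level(ts, update_time):
    """Pick how much job/node data to report for a request timestamp TS.

    Hypothetical helper: mirrors only the rule stated in the NEWS entry,
    not the actual sched/wiki2 implementation.
    """
    if ts == 0:
        return "all"        # TS==0: report all data
    if ts <= update_time:
        return "volatile"   # record changed at or after TS: report volatile data
    return "state"          # no changes since TS: report state only
```

For example, a request with TS=0 always receives the full record, while a
request whose TS is later than the record's last update receives state only.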
* Changes in SLURM 1.1.13
=========================
 - Fix hang in sched/wiki2 if Moab stops responding when
   response is outgoing.
 - BLUEGENE - fix to make sure the block is good to go when picking it
 - BLUEGENE - add libsched_if.so so mpirun doesn't try to create a block
   by itself.
 - Enable specification of srun --jobid=# option with --batch (for user root).
 - Verify that job actually starts when requested by sched/wiki2.
 - Add new wiki.conf parameters: EPort and JobAggregationTime for event
   notification logic (see wiki.conf man page for details)
 - Sched/wiki2 to report a job's account as COMMENT response to GETJOBS
    request.
 - Add srun option "--comment" (maps to job account until slurm v1.2, 
   needed for Moab scheduler functionality).
 - fixed some timeout issues in the controller, which should stop the
   issues with excessive timeouts.
 - unit conversion (i.e. 1024 => 1k) only happens on bgl systems for node 
   count.
 - Sched/wiki2 to report a job's COMPLETETIME and SUSPENDTIME in GETJOBS
   response.
 - Added support for Mellanox's version of mvapich-0.9.7.
* Changes in SLURM 1.1.11
=========================
 - Update file headers adding permission to link with OpenSSL.
 - Enable sched/wiki2 message authentication.
 - Fix libpmi compilation issue.
 - Remove "gcc-c++ python" from slurm.spec BuildRequires.  It breaks
   the AIX build, so we'll have to find another way to deal with that.
* Changes in SLURM 1.1.10
=========================
 -- task distribution fix for steps that are smaller than job allocation.
 -- BLUEGENE - fix to only send a success when block was created when trying
    to allocate the block.
 -- fix so if slurm_send_recv_node_msg fails on the send the auth_cred returned
    by the resp is NULL.
 -- Fix switch/federation plugin so backup controller can assume control 
    repeatedly without leaking or corrupting memory.
 -- Add new error code (for Maui/Moab scheduler): ESLURM_JOB_HELD
 -- Tweak slurmctld's node ping logic to better handle failed nodes with 
    hierarchical communications fail-over logic.
 -- Add support for sched/wiki specific configuration file "wiki.conf".
 -- Added sched/wiki2 plugin (new experimental wiki plugin).
* Changes in SLURM 1.1.9
========================
 -- BLUEGENE - fix to handle a NO_VAL sent in as num procs in the job 
    description.
 -- Fix bug in slurmstepd code for parsing --multi-prog command script.
    Parser was failing for commands with no arguments.
 -- Fix bug to check unsigned ints correctly in bitstring.c
 -- Alter node count conversion to kilo units to only convert numbers
    divisible by 1024 or 512
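
The node count conversion rule above might look roughly like the sketch
below. Only the divisibility test comes from this file. The function name,
the "1.5k" style for multiples of 512, and leaving counts below 1024
unconverted are all assumptions made for illustration.

```python
def format_node_count(n):
    """Render a node count, abbreviating to kilo units only when it is exact.

    Hypothetical helper; the output format is an assumption, not SLURM's code.
    """
    if n >= 1024 and n % 1024 == 0:
        return f"{n // 1024}k"        # whole multiples of 1024, e.g. 2048
    if n >= 1024 and n % 512 == 0:
        return f"{n / 1024:.1f}k"     # odd multiples of 512, e.g. 1536
    return str(n)                     # anything else is printed unconverted
```

Under these assumptions a count such as 1500 is shown in full, since
abbreviating it would lose information.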
* Changes in SLURM 1.1.8
========================
 -- Added bug fixes (fault-tolerance and memory leaks) from Hongjia Cao 
 -- Fixed some potential BLUEGENE issues with the bridge log file not having
    a mutex around the fclose and fopen.
 -- BLUEGENE - srun -n procs now registers correctly
 -- Fixed problem with reattach double allocating step_layout->tids
 -- BLUEGENE - fix race condition where job is finished before it starts.
* Changes in SLURM 1.1.7
========================
 -- BLUEGENE - fixed issue with doing an allocation for nodes since asking 
    for 32,128, or 512 all mean 1 to the controller. 
 -- Add "Include" directive to slurm.conf files.  If "Include" is found
    at the beginning of a line followed by whitespace and then
    the full path to a file, that file is included inline with the current
    slurm.conf file.
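
The "Include" behavior described above can be sketched as a small line
expander. This is not SLURM's parser. It assumes the keyword is matched
exactly as "Include" at the start of a line, treats nested includes the same
way, and adds a depth limit; the function name and the limit are
hypothetical.

```python
import os
import re

# Assumption: "Include" must start the line, followed by whitespace and a path.
INCLUDE_RE = re.compile(r"^Include\s+(\S.*?)\s*$")


def expand_includes(path, depth=0, max_depth=10):
    """Return the lines of `path` with each Include line replaced inline."""
    if depth > max_depth:
        raise RecursionError(f"Include nesting too deep at {path}")
    out = []
    with open(path) as conf:
        for line in conf:
            line = line.rstrip("\n")
            match = INCLUDE_RE.match(line)
            # Only a full (absolute) path qualifies, per the entry above.
            if match and os.path.isabs(match.group(1)):
                out.extend(expand_includes(match.group(1), depth + 1, max_depth))
            else:
                out.append(line)
    return out
```

A line with leading whitespace before "Include" is left untouched in this
sketch, because the entry requires the directive at the beginning of a line.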
* Changes in SLURM 1.1.6
========================
 -- Improved task layout for relative positions
 -- Fixed heterogeneous cpu overcommit issue
 -- Fix bug where srun would hang if it ran on one node and that 
    node's slurmd died
 -- Fix bug where srun task layout would be bad when min-max node range is 
    specified (e.g. "srun -N1-4 ...")
 -- Made slurmctld_conf.node_prefix only be set on Bluegene systems.
 -- Fixed a race condition in the controller to make it so a plugin thread
    wouldn't be able to access the slurmctld_conf structure before it was 
    filled.
* Changes in SLURM 1.1.5
========================
 -- Ignore partition's MaxNodes for SlurmUser and root.
 -- Fix possible memory corruption with use of PMI_KVS_Create call.
 -- Fix race condition when multiple PMI_KVS_Barrier calls.
 -- Fix logic in which slurmctld outgoing RPC requests could get delayed.
 -- Fix logic for laying out steps without a hostlist.