This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 1.2.0-pre7
=============================
 -- BLUEGENE - added configurable images for bluegene block creation.
    (No documentation outside of srun and sbatch just yet,
     no security either)

* Changes in SLURM 1.2.0-pre6
=============================
 -- Maintain actual job step run time with suspend/resume use.
 -- Allow slurm.conf options to appear multiple times.  SLURM will use the
    last instance of any particular option.
 -- Add version number to node state save file. Will not recover node 
    state information on restart from older version.
 -- Add logic to save/restore multi-core state information.
 -- Updated multi-core logic to use types uint16_t and uint32_t instead 
    of just type int.
 -- Fix for race condition in forwarding logic, from Hongjia Cao.
 -- Add support for Portable Linux Processor Affinity (PLPA, see
    http://www.open-mpi.org/software/plpa).
 -- When a job epilog completes on all non-DOWN nodes, immediately purge
    its job steps that lack switch windows. Needed for LSF operation.
    Based upon slurm.hp.node_fail.patch.
 -- Modify srun to ignore entries on --nodelist for job step creation 
    if their count exceeds the task count. Based on slurm.hp.srun.patch.
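
The "last instance wins" rule for repeated slurm.conf options noted above can
be illustrated with a minimal sketch. It is not SLURM's parser: it assumes one
Key=Value pair per line, "#" comments, and case-insensitive keys, and the
function name parse_conf is hypothetical.

```python
def parse_conf(text):
    """Return a dict of options; a later duplicate overrides an earlier one."""
    options = {}
    for raw in text.splitlines():
        # Assumption: everything after '#' is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Assigning to the same key again keeps only the last value seen.
        options[key.strip().lower()] = value.strip()
    return options


sample = "MessageTimeout=5\nMailProg=/bin/mail\nMessageTimeout=10\n"
config = parse_conf(sample)
```

Here config holds a single MessageTimeout entry, taken from the second
occurrence in the sample text.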

* Changes in SLURM 1.2.0-pre5
=============================
 -- Patch from HP patch.1.2.0.pre4.061017.crcore_hints, supports cores as 
    consumable resource.

* Changes in SLURM 1.2.0-pre4
=============================
 -- Added node_inx to job_step_info_t to get the node indices for mapping out
    steps in a job by nodes.
 -- sview grid added
 -- BLUEGENE node_inx added to blocks for reference.
 -- Automatic CPU_MASK generation for task launch, new srun option -B.
 -- Automatic logical to physical processor identification and mapping.
 -- Added new srun options to --cpu_bind: sockets, cores, and threads
 -- Updated select/cons_res to operate at socket granularity.
 -- New srun task distribution options to -m: plane
 -- Multi-core support in sinfo, squeue, and scontrol.
 -- Memory can be treated as a consumable resource.
 -- New srun options --ntasks-per-[node|socket|core].

* Changes in SLURM 1.2.0-pre3
=============================
 -- Remove configuration parameter SchedulerAuth (defunct).
 -- Add NextJobId to "scontrol show config" output.
 -- Add new slurm.conf parameter MailProg.
 -- New forwarding logic.  New receive_msg functions depending on what you
    are expecting to get back.  srun_node_id is no longer passed around in
    a slurm_msg_t.
 -- Remove sched/wiki plugin (use sched/wiki2 for now)
 -- Disable pthread_create() for PMI_send when TotalView is running for 
    better performance.
 -- Fixed certain tests in test suite to not run with bluegene or front-end 
 -- Removed addresses from slurm_step_layout_t
 -- Added new job field, "comment". Set by srun, salloc and sbatch. See 
    with "scontrol show job". Used in sched/wiki2.
 -- Report a job's exit status in "scontrol show job".
 -- In sched/wiki2: add support for JOBREQUEUE command.
* Changes in SLURM 1.2.0-pre2
=============================
 -- Added function slurm_init_slurm_msg to initialize any slurm_msg_t;
    no other initialization of that type is needed.

* Changes in SLURM 1.2.0-pre2
=============================
 -- Fixed task dist to work with hostfile and warn about asking for more tasks 
    than you have nodes for in arbitrary mode.
 -- Added "account" field to job and step accounting information and sacct output.
 -- Moved task layout to slurmctld instead of srun.  Job step create returns
    step_layout structure with hostnames and addresses that correspond
    to those nodes.
 -- Changed API slurm_lookup_allocation params:
    resource_allocation_response_msg_t changed to job_alloc_info_response_msg_t.
    The structure is only being renamed, so its contents are the same.
 -- alter resource_allocation_response_msg_t see slurm.h.in
 -- remove old_job_alloc_msg_t and function slurm_confirm_alloc	
 -- Slurm configuration files now support an "Include" directive to
    include other files inline.
 -- BLUEGENE New --enable-bluegene-emulation configure parameter to allow 
    running system in bluegene emulation mode.  Only
    really useful for developers.
 -- Added new tool sview, a GUI for displaying slurm info.
 -- fixed bug in step layout to lay out tasks correctly

* Changes in SLURM 1.2.0-pre1
=============================
 -- Fix bug that could run a job's prolog more than once
 -- Permit batch jobs to be requeued, scontrol requeue <jobid>
 -- Send overcommit flag from srun in RPCs and have slurmd set SLURM_OVERCOMMIT
    flag at batch job launch time.
 -- Added new configuration parameter MessageTimeout (replaces #define in 
    the code)
 -- Added support for OSX build.
* Changes in SLURM 1.1.19
=========================
 - BLUEGENE - make sure blocks read in from the bluegene.conf are created
   in that order (static mode).
* Changes in SLURM 1.1.18
=========================
 - In sched/wiki2, add support for EHost and EHostBackup configuration 
   parameters in wiki.conf file
 - In sched/wiki2, fix memory management bug for JOBWILLRUN command.
 - In sched/wiki2, consider job Busy while in Completing state for 
   KillWait+10 seconds (used to be 30 seconds).
 - BLUEGENE - Fixes to allow full block creation on the system and not to add
   passthrough nodes to the allocation when creating a block. 
 - BLUEGENE - Fix deadlock issue with starting and failing jobs at the same
   time
 - Make connect() non-blocking and poll() with timeout to avoid huge 
   waits under some conditions.
 - Set "ENVIRONMENT=BATCH" environment variable for "srun --batch" jobs only.
 - Add logic to save/restore select/cons_res state information.
 - BLUEGENE - make all sprintf's into snprintf's
 - Fix timeout calculation to work correctly for fan out.
 - Fix for "srun -A" segfault on a node failure.
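
The non-blocking connect() plus poll() pattern mentioned in the 1.1.18 entries
can be sketched as follows. This is an illustration of the general technique on
a POSIX system, not SLURM's C implementation; the function name
connect_with_timeout and its parameters are hypothetical.

```python
import errno
import os
import select
import socket


def connect_with_timeout(host, port, timeout_ms):
    """Start a non-blocking connect and wait at most timeout_ms for it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    rc = sock.connect_ex((host, port))  # returns an errno value, never blocks
    if rc not in (0, errno.EINPROGRESS):
        sock.close()
        raise OSError(rc, os.strerror(rc))
    if rc == errno.EINPROGRESS:
        # The socket becomes writable once the connect attempt has finished.
        poller = select.poll()
        poller.register(sock, select.POLLOUT)
        if not poller.poll(timeout_ms):
            sock.close()
            raise TimeoutError("connect timed out")
        # Writable does not mean success; SO_ERROR holds the real outcome.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err != 0:
            sock.close()
            raise OSError(err, os.strerror(err))
    return sock
```

The caller bounds the wait with timeout_ms instead of relying on the much
longer default timeout of a blocking connect().
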
* Changes in SLURM 1.1.17
=========================
 - BLUEGENE - fix to keep dynamic partitioning from creating blocks where
   there are nodes that are down or draining.
 - Fix srun's default node count with an existing allocation when neither
   SLURM_NNODES nor -N are set.
 - Stop srun from setting SLURM_DISTRIBUTION under job steps when a
   specific distribution was not explicitly requested by the user.
* Changes in SLURM 1.1.16
=========================
 - BLUEGENE - fix to make prolog run 5 minutes longer to make sure we have
   enough time to free the overlapping blocks when starting a new job on a 
   block.
 - BLUEGENE - edit to the libsched_if.so to read env and look at 
   MPIRUN_PARTITION to see if we are in slurm or running mpirun natively.
 - Plugins are now dlopened RTLD_LAZY instead of RTLD_NOW.

* Changes in SLURM 1.1.15
=========================
 - BLUEGENE - fix to be able to create static partitions
 - Fixed fanout timeout logic.
 - Fix for slurmctld timeout on outgoing message (Hongjia Cao, NUDT.edu.cn).

* Changes in SLURM 1.1.14
=========================
 - In sched/wiki2: report job/node id and state only if no changes since 
   time specified in request.
 - In sched/wiki2: include a job's exit code in job state information.
 - In sched/wiki2: add event notification logic on job submit and completion.
 - In sched/wiki2: add support for JOBWILLRUN command type.
 - In sched/wiki2: for job info, include required HOSTLIST if applicable.
 - In sched/wiki2: for job info, replace PARTITIONMASK with RCLASS (report
   partition name associated with a job, but no task count)
 - In sched/wiki2: for job and node info, report all data if TS==0, 
   volatile data if TS<=update_time, state only if TS>update_time
 - In sched/wiki2: add support for CMD=JOBSIGNAL ARG=jobid SIGNAL=name or #
 - In sched/wiki2: add support for CMD=JOBMODIFY ARG=jobid [BANK=name]
   [TIMELIMIT=minutes] [PARTITION=name]
 - In sched/wiki2: add support for CMD=INITIALIZE ARG=[USEHOSTEXP=T|F]
   [EPORT=#]; RESPONSE=EPORT=# USEHOSTEXP=T
 - In sched/wiki2: fix memory leak.
 - Fix sinfo node state filtering when asking for idle nodes that are also 
   draining. 
 - Add Fortran extension to slurm_get_rem_time() API.
 - Fix bug when changing the time limit of a running job that has previously 
   been suspended (formerly failed to account for suspend time in setting 
   termination time).
 - fix for step allocation to be able to specify only a few nodes in a 
   step and ask for more than specified.
 - patch from Hongjia Cao for forwarding logic
 - BLUEGENE - able to allocate specific nodes without locking up.
 - BLUEGENE - better tracking of blocks that are created dynamically, 
   less hitting the db2.
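
The CMD=... ARG=... KEY=VALUE request shape listed in the sched/wiki2 entries
above can be tokenized with a short sketch. It covers only the command text as
written in those entries and does not model any other framing of the real
protocol; the function name parse_wiki_command, the job id 1234, and the
partition name "debug" are made up for illustration.

```python
def parse_wiki_command(message):
    """Split a 'CMD=... ARG=... KEY=VALUE' string into a dict of fields."""
    fields = {}
    for token in message.split():
        # Split on the first '=' only, so a value may itself contain '='.
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"token without '=': {token!r}")
        fields[key.upper()] = value
    return fields


request = parse_wiki_command(
    "CMD=JOBMODIFY ARG=1234 TIMELIMIT=30 PARTITION=debug"
)
```

Optional fields such as BANK are simply absent from the resulting dict when
the request does not carry them.
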
* Changes in SLURM 1.1.13
=========================
 - Fix hang in sched/wiki2 if Moab stops responding when
   response is outgoing.
 - BLUEGENE - fix to make sure the block is good to go when picking it
 - BLUEGENE - add libsched_if.so so mpirun doesn't try to create a block
   by itself.
 - Enable specification of srun --jobid=# option with --batch (for user root).
 - Verify that job actually starts when requested by sched/wiki2.
 - Add new wiki.conf parameters: EPort and JobAggregationTime for event 
   notification logic (see wiki.conf man page for details)
 - Sched/wiki2 to report a job's account as COMMENT response to GETJOBS
   request.
 - Add srun option "--comment" (maps to job account until slurm v1.2, 
   needed for Moab scheduler functionality).