This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 1.3.0-pre1
=============================
 -- !!! SRUN CHANGES !!!
    The srun options -A/--allocate, -b/--batch, and -a/--attach have been
    removed!  That functionality is now available in the separate commands
    salloc, sbatch, and sattach, respectively.
 -- Add new node state FAILING plus trigger for when node enters that state.
 -- Add new configuration parameter "PrivateData". This can be used to
    prevent a user from seeing jobs or job steps belonging to other users.
 -- Added configuration parameters for node power save mode: ResumeProgram,
    ResumeRate, SuspendExcNodes, SuspendExcParts, SuspendProgram and SuspendRate.
 -- Slurmctld maintains the IP address (rather than hostname) for srun 
    communications. This fixes some possible network routing issues.
 -- Added global database plugin.  Job accounting and Job completion are the 
    first to use it.  Follow documentation to add more to the plugin.
 -- Removed no-longer-needed jobacct/common/common_slurmctld.c since that is
    replaced by the database plugin.
* Changes in SLURM 1.2.12
=========================
 -- Increase maximum message size from 1MB to 16MB, from Ernest Artiaga, BSC. 

* Changes in SLURM 1.2.11
=========================
 -- Updated "etc/mpich1.slurm.patch" for direct srun launch of MPICH1_P4
    tasks. See the "README" portion of the patch for details.
 -- Added new scontrol command "show hostlist <hostnames>" to translate a list
    of hostnames into a hostlist expression (e.g. "tux1,tux2" -> "tux[1-2]")
    and "show hostnames <list>" to return a list of nodes (one node per line)
    from a SLURM hostlist expression or from the SLURM_NODELIST environment
    variable if no hostlist is specified.
 -- Add the sbatch option "--get-user-env".
 -- Added support for mpich-mx (use the mpichgm plugin).
 -- Make job's stdout and stderr file access rights be based upon user's umask
    at job submit time.
 -- Add support for additional PMI functions: PMI_Parse_option,
    PMI_Args_to_keyval, PMI_Free_keyvals and PMI_Get_options (from Puenlap Lee
    and Nancy Kritkausky, Bull).
 -- Make default value of SchedulerPort (configuration parameter) be 7321.
 -- Use SLURM_UMASK environment variable (if set) at job submit time as umask 
    for spawned job.
 -- Correct some format issues in the man pages (from Gennero Oliva, ICAR).
 -- Added support for parallel make across an existing SLURM allocation
    based upon GNU make-3.81. Patch is in "etc/make.slurm.patch".
 -- Added '-b' option to sbatch for easy MOAB transition to sbatch instead of
    srun.  Option does nothing in sbatch.
 -- Changed wiki2's handling of a node state in Completing to return 'busy'
    instead of 'running' which is already there in slurm1.1
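
The hostlist translation described for "scontrol show hostlist" and
"scontrol show hostnames" above can be illustrated with a small sketch. This
is not SLURM's implementation: the function names compress_hostnames and
expand_hostlist are hypothetical, only a single numeric suffix per name is
handled, and leading zeros in suffixes are not preserved.

```python
import re
from itertools import groupby


def compress_hostnames(hostnames):
    """Collapse names such as tux1,tux2 into the expression tux[1-2]."""
    suffixes = {}   # prefix -> numeric suffixes seen for that prefix
    plain = []      # names without a trailing number are passed through
    for name in hostnames:
        match = re.fullmatch(r"(.*?)(\d+)", name)
        if match:
            suffixes.setdefault(match.group(1), []).append(int(match.group(2)))
        else:
            plain.append(name)

    parts = list(plain)
    for prefix, numbers in suffixes.items():
        numbers = sorted(set(numbers))
        if len(numbers) == 1:
            parts.append(f"{prefix}{numbers[0]}")
            continue
        ranges = []
        # Consecutive numbers share the same (value - position) difference.
        for _, run in groupby(enumerate(numbers), key=lambda p: p[1] - p[0]):
            run = [number for _, number in run]
            ranges.append(str(run[0]) if len(run) == 1 else f"{run[0]}-{run[-1]}")
        parts.append(f"{prefix}[{','.join(ranges)}]")
    return ",".join(parts)


def expand_hostlist(expression):
    """Expand a single bracketed expression such as tux[1-2,4] into names."""
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]", expression)
    if not match:
        return [expression]
    prefix, body = match.groups()
    names = []
    for item in body.split(","):
        low, _, high = item.partition("-")
        for number in range(int(low), int(high or low) + 1):
            names.append(f"{prefix}{number}")
    return names
```

Printing "\n".join(expand_hostlist("tux[1-2]")) gives one node name per line,
which mirrors the output shape described for "show hostnames".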
* Changes in SLURM 1.2.10
=========================
 -- Fix race condition in jobacct/linux with use of proctrack/pgid and a
    realloc issue inside proctrack/linux
 -- Added MPICH1_P4 plugin for direct launch of mpich1/p4 tasks using srun
    and a patched version of the mpi library. See "etc/mpich1.slurm.patch".
    NOTE: This is still under development and not ready for production use.

* Changes in SLURM 1.2.9
========================
 -- Add new sinfo sort field "%E", which sorts by the time associated with a
    node's state (from Prashanth Tamraparni, HP).
 -- In sched/wiki: fix logic for restarting backup slurmctld.
 -- Preload SLURM plugins early in the slurmstepd operation to avoid
    multiple dlopens after forking (and to avoid a glibc bug
    that leaves dlopen locks in a bad state after a fork).
 -- Added MPICH1_P4 patch to launch tasks using srun rather than rsh and
    automatically generate mpirun's machinefile based upon the job's 
    allocation.    See "etc/mpich1.slurm.patch".
 -- BLUEGENE - fix for overlap mode to mark all other base partitions as used
    when creating a new block from the file to ensure we only use the base
    partitions we are asking for.

* Changes in SLURM 1.2.8
========================
 -- Added mpi/mpich1_shmem plugin.
 -- Fix in proctrack/sgi_job plugin that could cause slurmstepd to seg_fault
    preventing timely clean-up of batch jobs in some cases.

* Changes in SLURM 1.2.7
========================
 -- BLUEGENE - code to make it so you can make a 36x36x36 system.  
    The wiring should be correct for a system with x-dim of 1,2,4,5,8,13
    in emulation mode.  It will work with any real system no matter the size.
 -- Major re-write of jobcomp/script plugin: fix memory leak and 
    general code clean-up.
 -- Add ability to change MaxNodes and ExcNodeList for pending job 
    using scontrol.
 -- Purge zombie processes spawned via event triggers.
 -- Add support for power saving mode (experimental code to reduce voltage
    and frequency on nodes that stay in the IDLE state, for more information 
    see http://www.llnl.gov/linux/slurm/power_save.html). None of this
    code is enabled by default.
* Changes in SLURM 1.2.6
========================
 -- Fix MPIRUN_PORT env variable in mvapich plugin
 -- Disable setting triggers by other than user SlurmUser unless SlurmUser
    is root for improved security.
 -- Add event trigger for IDLE nodes.
* Changes in SLURM 1.2.5
========================
 -- Fix nodelist truncation in "scontrol show jobs" output
 -- In mpi/mpichgm, fix potential problem formatting GMPI_PORT, from
    Ernest Artiaga, BSC.
 -- In sched/wiki2 - Report job's account, from Ernest Artiaga, BSC.
 -- Add sbatch option "--ntasks-per-node".
* Changes in SLURM 1.2.4
========================
 -- In select/cons_res - fix for function argument type mis-match in getting
    CPU count for a job, from Ernest Artiaga, BSC.
 -- In sched/wiki2 - Report job's tasks_per_node requirement.
 -- In forward logic, fix to check whether the forwarding node receives a
    connection but never gets the message from the sender (network issue or
    similar). Also make sure that, when something comes back, we account for
    everything we sent out before we call it good.
 -- Another fix to make sure steps with requested nodes have correct cpus
    accounted for and a fix to make sure the user can't allocate more 
    cpus than they have requested.
* Changes in SLURM 1.2.3
========================
 -- Cpuset logic added to task/affinity, from Don Albert (Bull) and
    Moe Jette (LLNL).  To enable, the /dev/cpuset file system must be mounted
    and "TaskPluginParam=cpusets" set in slurm.conf.
 -- In sched/wiki2, fix possible overflow in job's nodelist, from 
    Ernest Artiaga, BSC.
 -- Defer creation of new job steps until a suspended job is resumed.
 -- In select/linear - fix for potential stack corruption bug.
* Changes in SLURM 1.2.2
========================
 -- Added new command "strigger" for event trigger management, a new 
    capability. See "man strigger" for details.
 -- srun --get-user-env now sends su's stderr to /dev/null
 -- Fix in node_scheduling logic with multiple node_sets, from 
    Ernest Artiaga, BSC.
 -- In select/cons_res, fix for function argument type mis-match in getting 
    CPU count for a job.
* Changes in SLURM 1.2.1
========================
 -- MPICHGM support bug fixes from Ernest Artiaga, BSC.
 -- Support longer hostlist strings, from Ernest Artiaga, BSC.

* Changes in SLURM 1.2.0
========================
 -- Srun to use env vars for SLURM_PROLOG, SLURM_EPILOG, SLURM_TASK_PROLOG, 
    and SLURM_TASK_EPILOG. patch.1.2.0-pre11.070201.envproepilog from 
    Dan Palermo, HP.
 -- Documentation update. patch.1.2.0-pre11.070201.mchtml from Dan Palermo, HP.
 -- Set SLURM_DIST_CYCLIC = 1 (needed for HP MPI, slurm.hp.env.patch).
* Changes in SLURM 1.2.0-pre15
==============================
 -- Fix for another spot where the backup controller calls switch/federation
    code before switch/federation is initialized.

* Changes in SLURM 1.2.0-pre14
==============================
 -- In sched/wiki2, clear required nodes list when a job is requeued.
    Note that the required node list is set to every node used when 
    a job is started via sched/wiki2.
 -- BLUEGENE - Added display of deallocating blocks to smap and other tools. 
 -- Make slurmctld's working directory be same as SlurmctldLogFile (if any),
    otherwise StateSaveDir (which is likely a shared directory, possibly 
    making core file identification more difficult).
 -- Fix bug in switch/federation that results in the backup controller
    aborting if it receives an epilog-complete message.
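
The working-directory rule noted above for slurmctld (directory of
SlurmctldLogFile if one is configured, otherwise StateSaveDir) amounts to a
simple fallback. The sketch below only illustrates that rule; the function
name and the example paths are hypothetical, and it is not taken from the
slurmctld source.

```python
import posixpath


def slurmctld_working_dir(slurmctld_log_file, state_save_dir):
    """Pick the directory the daemon would change into (illustrative only)."""
    if slurmctld_log_file:
        # Use the directory that holds the log file, typically node-local.
        return posixpath.dirname(slurmctld_log_file)
    # No log file configured: fall back to the state save directory.
    return state_save_dir
```

With a log file of "/var/log/slurm/slurmctld.log" (an assumed path) the
function returns "/var/log/slurm"; with no log file it returns whatever was
given as the state save directory.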
* Changes in SLURM 1.2.0-pre13
==============================
 -- Fix for --get-user-env.

* Changes in SLURM 1.2.0-pre12
==============================
 -- BLUEGENE - Added correct node info for sinfo and sview for viewing
    allocated nodes in a partition.
 -- BLUEGENE - Added state save on slurmctld shutdown of blocks in an error 
    state on real systems and total block config on emulation systems.
 -- Major update to Slurm's PMI internal logic for better scalability.
    Communications now supported directly between application tasks via 
    Slurm's PMI library. Srun sends a single message to one task on each node
    and that task forwards key-pairs to the other tasks on that node. The old
    code sent key-pairs directly to each task.
    NOTE: PMI applications must re-link with this new library.
 -- For multi-core support: Fix task distribution bug and add automated 
    tests, patch.1.2.0-pre11.070111.plane from Dan Palermo (HP).
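
The scalability effect of the PMI change described above can be sketched by
counting the messages srun has to send under each scheme. This is a toy model
under stated assumptions, not Slurm's PMI code: the function names are
hypothetical, and the choice of the lowest-numbered task on each node as the
forwarding task is an assumption made for illustration.

```python
from collections import defaultdict


def srun_messages_direct(task_to_node):
    """Old scheme: srun sends the key-pairs to every task itself."""
    return len(task_to_node)


def srun_forwarding_plan(task_to_node):
    """New scheme: srun contacts one task per node, which forwards locally.

    Returns a dict mapping each contacted task to the tasks it forwards to.
    """
    tasks_by_node = defaultdict(list)
    for task, node in sorted(task_to_node.items()):
        tasks_by_node[node].append(task)
    plan = {}
    for tasks in tasks_by_node.values():
        # Assumption: the first (lowest-numbered) task on the node forwards.
        plan[tasks[0]] = tasks[1:]
    return plan
```

For four tasks spread over two nodes, the old scheme needs four messages from
srun while the plan above needs two, one per node; the remaining deliveries
happen between tasks on the same node.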
* Changes in SLURM 1.2.0-pre11
==============================
 -- Add multi-core options to slurm_step_launch API.
 -- Add man pages for slurm_step_launch() and related functions.
 -- Jobacct plugin only looks at the proctrack list instead of the entire
    list of processes running on the node. Cutting down a lot of unnecessary