This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 1.3.0-pre3
=============================

* Changes in SLURM 1.3.0-pre2
=============================
 -- Added new srun option --pty to start job with pseudo terminal attached 
    to task 0 (all other tasks have I/O discarded)
 -- Disable user specifying jobid when sched/wiki2 configured (needed for 
    Moab releases until early 2007).
 -- Report command, args and working directory for batch jobs with 
    "scontrol show job".
* Changes in SLURM 1.3.0-pre1
=============================
 -- !!! SRUN CHANGES !!!
    The srun options -A/--allocate, -b/--batch, and -a/--attach have been
    removed!  That functionality is now available in the separate commands
    salloc, sbatch, and sattach, respectively.
 -- Add new node state FAILING plus trigger for when node enters that state.
 -- Add new configuration parameter "PrivateData". This can be used to
    prevent a user from seeing jobs or job steps belonging to other users.
 -- Added configuration parameters for node power save mode: ResumeProgram,
    ResumeRate, SuspendExcNodes, SuspendExcParts, SuspendProgram and
    SuspendRate.
 -- Slurmctld maintains the IP address (rather than hostname) for srun 
    communications. This fixes some possible network routing issues.
 -- Added global database plugin.  Job accounting and Job completion are the 
    first to use it.  Follow documentation to add more to the plugin.
 -- Removed no-longer-needed jobacct/common/common_slurmctld.c since that is
    replaced by the database plugin.
 -- Added new configuration parameter: CryptoType.
    Moved existing digital signature logic into new plugin: crypto/openssl.
    Added new support for crypto/munge (available with GPL license).
* Changes in SLURM 1.2.14
=========================
 -- Fix a couple of bugs in MPICH/MX support (from Asier Roa, BSC).
 -- Fix perl api for AIX
 -- Add wiki.conf parameter ExcludePartitions for selected partitions to
    be directly scheduled by Slurm without Moab control.
 -- Optimize load leveling for shared nodes (alloc.patch, contributed 
    by Chris Holmes, HP).
 -- Added PMI_TIME environment variable for user to control how PMI 
    communications are spread out in time. See "man srun" for details.
 -- Added PMI timing information to srun debug mode to aid in tuning.
    Use "srun -vv ..." to see the information.
 -- Added checkpoint/ompi (OpenMPI) plugin (still under development).
 -- Fix bug in load leveling logic added to v1.2.13 which can cause an 
    infinite loop and hang slurmctld when sharing nodes between jobs.
* Changes in SLURM 1.2.13
=========================
 -- Add slurm.conf parameter JobFileAppend.
 -- Fix for segv in "scontrol listpids" on nodes not in SLURM config.
 -- Add support for SCANCEL_CTLD env var.
 -- In mpi/mvapich plugin, add startup timeout logic. Time based upon 
    SLURM_MVAPICH_TIMEOUT (value in seconds).
 -- Fixed pick_step_node logic to only pick the number of nodes requested
    from the user when excluding nodes, to avoid an error message.
 -- Disable salloc, sbatch and srun -I/--immediate options with 
    Moab scheduler.
 -- Added "contribs" directory with a Perl API and Torque wrappers for Torque 
    to SLURM migration.  This directory should be used to put anything that 
    is outside of SLURM proper such as a different API. Perl APIs contributed 
    by Hongjia Cao (NUDT).
 -- In sched/wiki2: add support for tasklist with node name expressions
    and task counts (e.g. "TASKLIST=tux[1-4]*2:tux[12-14]*4").
 -- In select/cons_res with sched/wiki2: fix bug in task layout logic.
 -- Removed all curses info from the bluegene plugin putting it into smap
    where it belongs.  
 -- Add support for job time limit specification formats: min, min:sec, 
    hour:min:sec, and days-hour:min:sec (formerly only supported minutes).
    Applies to salloc, sbatch, and srun commands.
 -- Improve scheduling support for exclusive constraint list: nodes can
    now be in more than one constraint specified exclusively for a job
    (e.g. "srun -C [rack1|rack2|rack3|rowB] srun")
 -- Create separate MPICH/MX plugin (split out from MPICH/GM plugin)
 -- Increase default MessageTimeout (in slurm.conf) from 5 to 10 secs.
 -- Fix bug in batch job requeue if node zero of allocation fails to respond 
    to task launch request.
 -- Improve load leveling logic to more evenly distribute the workload 
    (best_load.patch, contributed by Chris Holmes, HP).
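
    The time limit formats added in 1.2.13 (min, min:sec, hour:min:sec and
    days-hour:min:sec) can be illustrated with a minimal Python sketch. This
    is not SLURM's own parser; the function name parse_time_limit and the
    choice to return a total number of seconds are assumptions made for
    illustration only.

```python
def parse_time_limit(spec: str) -> int:
    """Convert a time limit string to seconds (illustrative sketch only).

    Accepted formats, as listed in the 1.2.13 notes:
    min, min:sec, hour:min:sec, days-hour:min:sec.
    """
    days = hours = minutes = seconds = 0
    if "-" in spec:
        # days-hour:min:sec: the part after the dash must have three fields
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in rest.split(":")]
        if len(fields) != 3:
            raise ValueError(f"expected days-hour:min:sec, got {spec!r}")
        hours, minutes, seconds = fields
    else:
        fields = [int(f) for f in spec.split(":")]
        if len(fields) == 1:
            minutes = fields[0]            # min
        elif len(fields) == 2:
            minutes, seconds = fields      # min:sec
        elif len(fields) == 3:
            hours, minutes, seconds = fields  # hour:min:sec
        else:
            raise ValueError(f"unrecognized time limit {spec!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


if __name__ == "__main__":
    for spec in ("90", "1:30", "2:00:00", "1-12:00:00"):
        print(spec, "->", parse_time_limit(spec), "seconds")
```

    Note that a bare number is read as minutes, matching the earlier
    minutes-only behavior described above.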
* Changes in SLURM 1.2.12
=========================
 -- Increase maximum message size from 1MB to 16MB (from Ernest Artiaga, BSC). 
 -- In PMI_Abort(), log the event and abort the entire job step.
 -- Add support for additional PMI functions: PMI_Get_clique_ranks and 
    PMI_Get_clique_size (from Chuck Clouston, Bull).
 -- Report an error when a hostlist comes in appearing to be a box but not 
    formatted in XYZxXYZ format.
 -- Add support for partition configuration "Shared=exclusive". This is 
    equivalent to "srun --exclusive" when select/cons_res is configured.
 -- In sched/wiki2, report the reason for a node being unavailable for the 
    GETNODES command using the CAT="<reason>" field.
 -- In sched/wiki2 with select/linear, duplicate hostnames in HOSTLIST, one
    per allocated processor.
 -- Fix bug in scancel with specific signal and job lacks active steps.
 -- In sched/wiki2, add support for NOTIFYJOB ARG=<jobid> MSG=<message>.
    This sends a message to an active srun command.
 -- salloc will now set SLURM_NPROCS to improve srun's behavior under salloc.
 -- In sched/wiki2 and select/cons_res: ensure that Slurm's CPU allocation
    is identical to Moab's (from Ernest Artiaga and Asier Roa, BSC).
 -- Added "scontrol show slurmd" command to status local slurmd daemon.
 -- Set node DOWN if prolog fails on node zero of batch job launch.
 -- Properly handle "srun --cpus-per-task" within a job allocation when
    SLURM_TASKS_PER_NODE environment variable is not set.
 -- Fixed return value of slurm_send_rc_msg: if msg->conn_fd is < 0, set errno
    to ENOTCONN and return SLURM_ERROR instead of returning ENOTCONN.
 -- Added read before we send anything down a socket to make sure the socket
    is still there.
 -- Add slurm.conf variables UnkillableStepProgram and UnkillableStepTimeout.
 -- Enable nice file propagation from sbatch command.
* Changes in SLURM 1.2.11
=========================
 -- Updated "etc/mpich1.slurm.patch" for direct srun launch of MPICH1_P4
    tasks. See the "README" portion of the patch for details.
 -- Added new scontrol command "show hostlist <hostnames>" to translate a list
    of hostnames into a hostlist expression (e.g. "tux1,tux2" -> "tux[1-2]")
    and "show hostnames <list>", which returns a list of nodes (one node per
    line) from a SLURM hostlist expression or from the SLURM_NODELIST
    environment variable if no hostlist is specified.
 -- Add the sbatch option "--get-user-env".
 -- Added support for mpich-mx (use the mpichgm plugin).
 -- Make job's stdout and stderr file access rights be based upon user's umask
    at job submit time.
 -- Add support for additional PMI functions: PMI_Parse_option,
    PMI_Args_to_keyval, PMI_Free_keyvals and PMI_Get_options (from Puenlap Lee
    and Nancy Kritkausky, Bull).
 -- Make default value of SchedulerPort (configuration parameter) be 7321.
 -- Use SLURM_UMASK environment variable (if set) at job submit time as umask 
    for spawned job.
 -- Correct some format issues in the man pages (from Gennero Oliva, ICAR).
 -- Added support for parallel make across an existing SLURM allocation
    based upon GNU make-3.81. Patch is in "etc/make.slurm.patch".
 -- Added '-b' option to sbatch for easy Moab transition to sbatch instead of
    srun.  Option does nothing in sbatch.
 -- Changed wiki2's handling of a node state in Completing to return 'busy' 
    instead of 'running' which matches slurm version 1.1
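
    The hostlist translation performed by "scontrol show hostlist" and
    "scontrol show hostnames" in 1.2.11 can be sketched in Python as follows.
    This is an illustration of the idea, not SLURM's implementation: the
    function names are hypothetical, only a single numeric suffix per name is
    handled, and leading zeros in suffixes are not preserved.

```python
import re


def compress_hostlist(names):
    """Collapse names such as tux1,tux2 into tux[1-2] (illustrative sketch)."""
    by_prefix = {}  # prefix -> list of numeric suffixes, in first-seen order
    for name in names:
        m = re.fullmatch(r"(.*?)(\d+)", name)
        if m is None:
            raise ValueError(f"no numeric suffix in {name!r}")
        by_prefix.setdefault(m.group(1), []).append(int(m.group(2)))

    parts = []
    for prefix, nums in by_prefix.items():
        nums = sorted(set(nums))
        # Merge runs of consecutive numbers into (start, end) ranges.
        ranges = []
        start = prev = nums[0]
        for n in nums[1:]:
            if n != prev + 1:
                ranges.append((start, prev))
                start = n
            prev = n
        ranges.append((start, prev))
        if len(ranges) == 1 and ranges[0][0] == ranges[0][1]:
            parts.append(f"{prefix}{ranges[0][0]}")  # single host, no brackets
        else:
            body = ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)
            parts.append(f"{prefix}[{body}]")
    return ",".join(parts)


def expand_hostlist(expr):
    """Expand one expression such as tux[1-3,7] into individual names."""
    m = re.fullmatch(r"([^\[\],]*)\[([0-9,\-]+)\]", expr)
    if m is None:
        return [expr]  # plain hostname without a bracketed range
    prefix, body = m.groups()
    hosts = []
    for item in body.split(","):
        lo, _, hi = item.partition("-")
        hi = hi or lo  # a lone number is a range of one
        hosts.extend(f"{prefix}{n}" for n in range(int(lo), int(hi) + 1))
    return hosts


if __name__ == "__main__":
    print(compress_hostlist(["tux1", "tux2"]))
    print("\n".join(expand_hostlist("tux[1-3,7]")))
```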
* Changes in SLURM 1.2.10
=========================
 -- Fix race condition in jobacct/linux with use of proctrack/pgid and a
    realloc issue inside proctrack/linux
 -- Added MPICH1_P4 plugin for direct launch of mpich1/p4 tasks using srun
    and a patched version of the mpi library. See "etc/mpich1.slurm.patch".
    NOTE: This is still under development and not ready for production use.

* Changes in SLURM 1.2.9
========================
 -- Add new sinfo sort field "%E", which sorts by the time associated with a
    node's state (from Prashanth Tamraparni, HP).
 -- In sched/wiki: fix logic for restarting backup slurmctld.
 -- Preload SLURM plugins early in the slurmstepd operation to avoid
    multiple dlopens after forking (and to avoid a glibc bug
    that leaves dlopen locks in a bad state after a fork).
 -- Added MPICH1_P4 patch to launch tasks using srun rather than rsh and
    automatically generate mpirun's machinefile based upon the job's 
    allocation.    See "etc/mpich1.slurm.patch".
 -- BLUEGENE - fix for overlap mode to mark all other base partitions as used
    when creating a new block from the file to ensure we only use the base
    partitions we are asking for.

* Changes in SLURM 1.2.8
========================
 -- Added mpi/mpich1_shmem plugin.
 -- Fix in proctrack/sgi_job plugin that could cause slurmstepd to seg_fault
    preventing timely clean-up of batch jobs in some cases.

* Changes in SLURM 1.2.7
========================
 -- BLUEGENE - code to make it so you can make a 36x36x36 system.  
    The wiring should be correct for a system with x-dim of 1,2,4,5,8,13
    in emulation mode.  It will work with any real system no matter the size.
 -- Major re-write of jobcomp/script plugin: fix memory leak and 
    general code clean-up.
 -- Add ability to change MaxNodes and ExcNodeList for pending job 
    using scontrol.
 -- Purge zombie processes spawned via event triggers.
 -- Add support for power saving mode (experimental code to reduce voltage
    and frequency on nodes that stay in the IDLE state, for more information 
    see http://www.llnl.gov/linux/slurm/power_save.html). None of this
    code is enabled by default.
* Changes in SLURM 1.2.6
========================
 -- Fix MPIRUN_PORT env variable in mvapich plugin
 -- Disable setting triggers by other than user SlurmUser unless SlurmUser
    is root for improved security.
 -- Add event trigger for IDLE nodes.
* Changes in SLURM 1.2.5
========================
 -- Fix nodelist truncation in "scontrol show jobs" output
 -- In mpi/mpichgm, fix potential problem formatting GMPI_PORT, from