Skip to content
Snippets Groups Projects
NEWS 79.3 KiB
Newer Older
Christopher J. Morrone's avatar
Christopher J. Morrone committed
This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins. 
Moe Jette's avatar
Moe Jette committed
* Changes in SLURM 1.2.0-pre2
=============================

* Changes in SLURM 1.2.0-pre1
=============================
 -- Fix bug that could run a job's prolog more than once
 -- Permit batch jobs to be requeued, scontrol requeue <jobid>
 -- Send overcommit flag from srun in RPCs and have slurmd set SLURM_OVERCOMMIT
    flag at batch job launch time.
 -- Added new configuration parameter MessageTimeout (replaces #define in 
    the code)
Moe Jette's avatar
Moe Jette committed
 -- Added support for OSX build.
* Changes in SLURM 1.1.2
========================
 -- Fix bug in jobcomp/filetxt plugin to report proper NodeCnt when a job 
    fails due to a node failure.
 -- Fix Bluegene configure to work with the new 64bit libs.
 -- Fix bug in controller that causes it to segfault when hit with a malformed
    message.
 -- For "srun --attach=X" to other users job, report an error and exit (it 
    previously just hung).
 -- BLUEGENE - fix for doing correct small block logic on user error. 
 -- BLUEGENE - Added support in slurmd to create a fake libdb2.so if it
    doesn't exist so smap won't seg fault
 -- BLUEGENE - "scontrol show job" reports "MaxProcs=None" and "Start=None"
    if values are not specified at job submit time
 -- Add retry logic for PMI communications, may be needed for highly parallel
    jobs.
* Changes in SLURM 1.1.1
========================
 -- Fix bug in packing job suspend/resume RPC.
 -- If a user breaks out of srun before the allocation takes place, mark the 
    job as CANCELLED rather than COMPLETED and change its start and end time 
    to that time.
 -- Fix bug in PMI support that prevented use of second PMI_Barrier call.
    This fix is needed for MVAPICH2 use.
 -- Add "-V" options to slurmctld and slurmd to print version number and exit.
 -- Fix scalability bug in sbcast.
 -- Fix bug in cons_res allocation strategy.
 -- Fix bug in forwarding with mpi
 -- Fix bug sacct forwarding with stat option
 -- Added nodeid to sacct stat information
 -- cleaned up way slurm_send_recv_node_msg works no more clearing errno
 -- Fix error handling bug in the networking code that causes the slurmd to
    xassert if the server is not running when the slurmd tries to register.

* Changes in SLURM 1.1.0
========================
Moe Jette's avatar
Moe Jette committed
 -- Fix bug that could temporarily make nodes DOWN when they are really 
 -- Fix bug preventing backup slurmctld from responding to PING RPCs.
 -- Set "CFLAGS=-DISO8601" before configuration to get ISO8601 format 
    times for all SLURM commands. NOTE: This may break Moab, Maui, and/or 
    LSF schedulers.
 -- Fix for srun -n and -O options when paired with -b.
 -- Added logic for fanout to failover to forward list if main node is 
    unreachable 
 -- sacct also now keeps track of submitted, started and ending times of jobs
 -- reinit config file mutex at beginning of slurmstepd to avoid fork issues
=============================
 -- Fix bug in enforcement of partition's MaxNodes limit.
 -- BLUEGENE - added support for srun -w option also fixed the geometry option
    for srun.
 -- Accounting works for aix systems, use jobacct/aix
 -- Support large (over 2GB) files on 32-bit linux systems
 -- changed all writes to safe_write in srun
 -- added $float to globals.example in the testsuite
 -- Set job's num_proc correctly for jobs that do not have exclusive use 
    of it's allocated nodes.
 -- Change in support for test suite: 'testsuite/expect/globals.example'
    is now 'testsuite/expect/globals' and you can override variable 
    settings with a new file 'testsuite/expect/globals.local'.
 -- Job suspend now sends SIGTSTP, sleep(1), sends SIGSTOP for better
    MPI support.
Moe Jette's avatar
Moe Jette committed
 -- Plug a bunch of memory leaks in various places.
 -- Bluegene - before assigning a job to a block the plugin will check the bps
    to make sure they aren't in error state.
 -- Change time format in job completion logging (JobCompType=jobcomp/filetxt)
    from "MM/DD HH:MM:SS" to "YYYY-MM-DDTHH:MM:SS", conforming with the ISO8601 
* Changes in SLURM 1.1.0-pre6
=============================
 -- Added logic to "stat" a running job with sacct option -S use -j to specify
    job.step 
 -- removed jobacct/bluegene (no real need for this) meaning, I don't think 
    there is a way to gather the data yet.
 -- Added support for mapping "%h" in configured SlurmdLog to the hostname.
 -- Add PropagatePrioProcess to control propagation of a user's nice value 
    to spawned tasks (based upon work by Daniel Christians, HP).
* Changes in SLURM 1.1.0-pre5
=============================
 -- Added step completion RPC logic
 -- Vastly changed sacct and the jobacct plugin.  Read documentation for full
    details.
 -- Added jobacct plugin for AIX and BlueGene, they currently don't work, 
    but infrastructure is in place.
 -- Add support for srun option --ctrl-comm-ifhn to set PMI communications
    address (Hongjia Cao, National University of Defense Technology).
Danny Auble's avatar
Danny Auble committed
 -- Moved safe_read/write to slurm_protocol_defs.h removing multiple copies.
 -- Remove vestigial functions slurm_allocate_resources_and_run() and 
    slurm_free_resource_allocation_and_run_response_msg().
 -- Added support for different executable files and arguments by task based
    upon a configuration file. See srun's --multi-prog option (based upon 
    work by Hongjia Cao, National University of Defense Technology).
 -- moved the way forward logic waited for fanout logic mostly eliminating 
    problems with scalability issues.
Danny Auble's avatar
Danny Auble committed
 -- changed -l option in sacct to display different params see sacct/sacct.h
    for details.
* Changes in SLURM 1.1.0-pre4
=============================
 -- Bluegene specific - Added support to set bluegene block state to 
    free/error via scontrol update BlockName 
Moe Jette's avatar
Moe Jette committed
 -- Add needed symbol to select/bluegene in order to load plugin.
Moe Jette's avatar
Moe Jette committed
* Changes in SLURM 1.1.0-pre3
=============================
 -- Added framework for XCPU job launch support.
 -- New general configuration file parser and slurm.conf handling code.
    Allows long lines to be continued on the next line by ending with a "\".
    Whitespace is allowed between the key and "=", and between the "=" and
    value.
    WARNING: A NodeName may now occur only once in a slurm.conf file.
             If you want to temporarily make nodes DOWN in the slurm.conf,
             use the new DownNodes keyword (see "man slurm.conf").
 -- Gracefully handle request to submit batch job from within an existing 
    batch job.
 -- Warn user attempting to create a job allocation from within an existing job
    allocation.
 -- Add web page description for proctrack plugin.
 -- Add new function slurm_get_rem_time() for job's time limit.
 -- JobAcct plugin renamed from "log" to "linux" in preparation for support of 
    new system types. 
    WARNING: "JobAcctType=jobacct/log" is no longer supported.
 -- Removed vestigal 'bg' names from bluegene plugin and smap
 -- InactiveLimit parameter is not enforced for RootOnly partitions.
 -- Update select/cons_res web page (Susanne Balle, HP, 
    cons_res_doc_patch_3_29_06).
 -- Build a "slurmd.test" along with slurmd. slurmd.test has the path to 
    slurmstepd set allowing it to run unmodified out of the builddir for 
    testing (Mark Grondona).
* Changes in SLURM 1.1.0-pre2
=============================
 -- Added "bcast" command to transmit copies of a file to compute nodes
    with message fanout.
 -- Bluegene specific - Added support for overlapping partitions and 
    dynamic partitioning. 
 -- Bluegene specific - Added support for nodecard sized blocks.
 -- Added logic to accept 1k for 1024 and so on for --nodes option of srun. 
    This logic is through display tools such as smap, sinfo, scontrol, and 
    squeue.
 -- Added bluegene.conf man page.
 -- Added support for memory affinity, see srun --mem_bind option.
* Changes in SLURM 1.1.0-pre1
=============================
 -- New --enable-multiple-slurmd configure parameter to allow running
    more than one copy of slurmd on a node at the same time.  Only
    really useful for developers.
 -- New communication is now branched on all processes to slurmd's from 
    slurmctld and srun launch command.  This is done with a tree type 
    algorithm.  Spawn and batch mode work the same as before.  New slurm.conf
    variable TreeWidth=50 is default.  This is the number of threads per 
    stop on the tree.  
 -- Configuration parameter HeartBeatInterval is depracated. Now used half
    of SlurmdTimeout and SlurmctldTimeout for communications to slurmd and
    slurmctld daemons repsectively.
 -- Add hash tables for select/cons_res plugin (Susanne Balle, HP, 
    patch_02222006).
 -- Remove some use of cr_enabled flag in slurmctld job record, use 
    new flag "test_only" in select_g_job_test() instead.
* Changes in SLURM 1.0.13
=========================
 -- Fix for AllowGroups option to work when the /etc/group file doesn't 
    contain all users in group by adding the uids of the names in /etc/passwd
    that have a gid of that which we are looking for.
 -- Fix bug in InactiveLimit support that can potentially purge active jobs.
    NOTE: This is highly unlikely except on very large AIX clusters.
* Changes in SLURM 1.0.12
=========================
 -- Report node state of DRAIN rather than DOWN if DOWN with DRAIN flag set.
 -- Initialize job->mail_type to 0 (NONE) for job submission.
 -- Fix for stalled task stdout/stderr when buffered I/O is used, and
    a single line exceeds 4096 bytes.
 -- Memory leak fixes for maui plugin (hjcao@nudt.edu.cn)
Loading
Loading full blame...