This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins. 

* Changes in SLURM 1.2.0-pre2
=============================
 -- Fixed task dist to work with hostfile and warn about asking for more tasks
    than you have nodes for in arbitrary mode.

* Changes in SLURM 1.2.0-pre1
=============================
 -- Fix bug that could run a job's prolog more than once
 -- Permit batch jobs to be requeued with "scontrol requeue <jobid>".
 -- Send overcommit flag from srun in RPCs and have slurmd set SLURM_OVERCOMMIT
    flag at batch job launch time.
 -- Added new configuration parameter MessageTimeout (replaces #define in 
    the code)
 -- Added support for OSX build.

* Changes in SLURM 1.1.5
========================

* Changes in SLURM 1.1.4
========================
 -- Improve error handling in hierarchical communications logic.

* Changes in SLURM 1.1.3
========================
 -- Fix big-endian bug in the bitstring code which plagued AIX.
 -- Fix bug in handling srun's --multi-prog option, could go off end of buffer.
 -- Added support for job step completion (and switch window release) on 
    subset of allocated nodes.
 -- BLUEGENE - removed configure option --with-bg-link. The bridge is now
    linked with dlopen, so fake database .so files are no longer needed on
    the frontend node.
 -- BLUEGENE - implemented use of rm_get_partition_info instead of
    ...partitions_info, which gives a much better design and improves
    stability.
 -- Streamline PMI communications and increase timeouts for highly parallel 
    jobs. Improves scalability of PMI.

* Changes in SLURM 1.1.2
========================
 -- Fix bug in jobcomp/filetxt plugin to report proper NodeCnt when a job 
    fails due to a node failure.
 -- Fix Bluegene configure to work with the new 64bit libs.
 -- Fix bug in controller that causes it to segfault when hit with a malformed
    message.
 -- For "srun --attach=X" to another user's job, report an error and exit (it
    previously just hung).
 -- BLUEGENE - fix for doing correct small block logic on user error. 
 -- BLUEGENE - Added support in slurmd to create a fake libdb2.so if it
    doesn't exist so smap won't seg fault
 -- BLUEGENE - "scontrol show job" reports "MaxProcs=None" and "Start=None"
    if values are not specified at job submit time
 -- Add retry logic for PMI communications, may be needed for highly parallel
    jobs.
 -- Fix bug in slurmd where variable is used in logging message after freed
    (slurmstepd rank info).
 -- Fix bug in "scontrol show daemons": with NodeName=localhost it now
    displays slurmd as running on that node.
 -- Patch from HP for init nodes before init_bitmaps
 -- ctrl-c killed sruns will result in job state as cancelled instead of 
    completed.
 -- BLUEGENE - added configure option --with-bg-link to choose dynamic linking
    or static linking with the bridgeapi.

* Changes in SLURM 1.1.1
========================
 -- Fix bug in packing job suspend/resume RPC.
 -- If a user breaks out of srun before the allocation takes place, mark the 
    job as CANCELLED rather than COMPLETED and change its start and end time 
    to that time.
 -- Fix bug in PMI support that prevented use of second PMI_Barrier call.
    This fix is needed for MVAPICH2 use.
 -- Add "-V" options to slurmctld and slurmd to print version number and exit.
 -- Fix scalability bug in sbcast.
 -- Fix bug in cons_res allocation strategy.
 -- Fix bug in forwarding with mpi
 -- Fix bug in sacct forwarding with stat option
 -- Added nodeid to sacct stat information
 -- Cleaned up the way slurm_send_recv_node_msg works; it no longer clears errno
 -- Fix error handling bug in the networking code that causes the slurmd to
    xassert if the server is not running when the slurmd tries to register.

* Changes in SLURM 1.1.0
========================
 -- Fix bug that could temporarily make nodes DOWN when they are really 
 -- Fix bug preventing backup slurmctld from responding to PING RPCs.
 -- Set "CFLAGS=-DISO8601" before configuration to get ISO8601 format 
    times for all SLURM commands. NOTE: This may break Moab, Maui, and/or 
    LSF schedulers.
 -- Fix for srun -n and -O options when paired with -b.
 -- Added logic for fanout to fail over to the forward list if the main node
    is unreachable
 -- sacct also now keeps track of submitted, started and ending times of jobs
 -- reinit config file mutex at beginning of slurmstepd to avoid fork issues
=============================
 -- Fix bug in enforcement of partition's MaxNodes limit.
 -- BLUEGENE - added support for srun -w option; also fixed the geometry
    option for srun.
 -- Accounting now works for AIX systems; use jobacct/aix
 -- Support large (over 2GB) files on 32-bit linux systems
 -- changed all writes to safe_write in srun
 -- added $float to globals.example in the testsuite
 -- Set job's num_proc correctly for jobs that do not have exclusive use
    of their allocated nodes.
 -- Change in support for test suite: 'testsuite/expect/globals.example'
    is now 'testsuite/expect/globals' and you can override variable 
    settings with a new file 'testsuite/expect/globals.local'.
 -- Job suspend now sends SIGTSTP, sleep(1), sends SIGSTOP for better
    MPI support.
 -- Plug a bunch of memory leaks in various places.
 -- Bluegene - before assigning a job to a block the plugin will check the bps
    to make sure they aren't in error state.
 -- Change time format in job completion logging (JobCompType=jobcomp/filetxt)
    from "MM/DD HH:MM:SS" to "YYYY-MM-DDTHH:MM:SS", conforming with the ISO8601
    standard.
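
The jobcomp/filetxt time format change noted above can be illustrated with
a minimal sketch. It assumes a site wants to convert timestamps from old
log records to the new layout. The function name convert_timestamp and the
explicit year argument are hypothetical (the old format carried no year);
this is not SLURM code.

```python
from datetime import datetime

# Old and new jobcomp/filetxt time layouts, expressed as strftime patterns.
OLD_FORMAT = "%m/%d %H:%M:%S"        # "MM/DD HH:MM:SS"
NEW_FORMAT = "%Y-%m-%dT%H:%M:%S"     # "YYYY-MM-DDTHH:MM:SS"


def convert_timestamp(old_stamp, year):
    """Convert an old-style stamp to the ISO8601-style layout.

    The old format has no year, so the caller must supply one
    (an assumption of this sketch).
    """
    parsed = datetime.strptime(f"{year}/{old_stamp}", "%Y/" + OLD_FORMAT)
    return parsed.strftime(NEW_FORMAT)


if __name__ == "__main__":
    print(convert_timestamp("06/15 13:45:09", 2006))
```

Tools that parse the old layout need a similar adjustment after upgrading.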

* Changes in SLURM 1.1.0-pre6
=============================
 -- Added logic to "stat" a running job with sacct option -S; use -j to
    specify the job.step
 -- Removed jobacct/bluegene (no real need for it, since there does not yet
    appear to be a way to gather the data).
 -- Added support for mapping "%h" in configured SlurmdLog to the hostname.
 -- Add PropagatePrioProcess to control propagation of a user's nice value 
    to spawned tasks (based upon work by Daniel Christians, HP).

* Changes in SLURM 1.1.0-pre5
=============================
 -- Added step completion RPC logic
 -- Vastly changed sacct and the jobacct plugin.  Read documentation for full
    details.
 -- Added jobacct plugins for AIX and BlueGene; they do not work yet, but
    the infrastructure is in place.
 -- Add support for srun option --ctrl-comm-ifhn to set PMI communications
    address (Hongjia Cao, National University of Defense Technology).
 -- Moved safe_read/write to slurm_protocol_defs.h removing multiple copies.
 -- Remove vestigial functions slurm_allocate_resources_and_run() and 
    slurm_free_resource_allocation_and_run_response_msg().
 -- Added support for different executable files and arguments by task based
    upon a configuration file. See srun's --multi-prog option (based upon 
    work by Hongjia Cao, National University of Defense Technology).
 -- Changed the way the forward logic waits for the fanout logic, mostly
    eliminating scalability problems.
 -- Changed -l option in sacct to display different params; see sacct/sacct.h
    for details.

* Changes in SLURM 1.1.0-pre4
=============================
 -- Bluegene specific - Added support to set bluegene block state to 
    free/error via scontrol update BlockName 
 -- Add needed symbol to select/bluegene in order to load plugin.

* Changes in SLURM 1.1.0-pre3
=============================
 -- Added framework for XCPU job launch support.
 -- New general configuration file parser and slurm.conf handling code.
    Allows long lines to be continued on the next line by ending with a "\".
    Whitespace is allowed between the key and "=", and between the "=" and
    value.
    WARNING: A NodeName may now occur only once in a slurm.conf file.
             If you want to temporarily make nodes DOWN in the slurm.conf,
             use the new DownNodes keyword (see "man slurm.conf").
 -- Gracefully handle request to submit batch job from within an existing 
    batch job.
 -- Warn user attempting to create a job allocation from within an existing job
    allocation.
 -- Add web page description for proctrack plugin.
 -- Add new function slurm_get_rem_time() for job's time limit.
 -- JobAcct plugin renamed from "log" to "linux" in preparation for support of 
    new system types. 
    WARNING: "JobAcctType=jobacct/log" is no longer supported.
 -- Removed vestigial 'bg' names from bluegene plugin and smap
 -- InactiveLimit parameter is not enforced for RootOnly partitions.
 -- Update select/cons_res web page (Susanne Balle, HP, 
    cons_res_doc_patch_3_29_06).
 -- Build a "slurmd.test" along with slurmd. slurmd.test has the path to 
    slurmstepd set allowing it to run unmodified out of the builddir for 
    testing (Mark Grondona).
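
The slurm.conf syntax rules described in this release (continuation of long
lines with a trailing "\", and optional whitespace around "=") can be
illustrated with a minimal sketch. It is not the SLURM parser (which is
written in C). The function name parse_conf is hypothetical, treating "#"
as a comment marker is an assumption of this sketch, and quoted values
containing spaces are not handled.

```python
import re


def parse_conf(text):
    """Return one list of (key, value) pairs per logical config line."""
    # Step 1: join physical lines that end with a backslash.
    logical_lines = []
    pending = ""
    for raw in text.splitlines():
        stripped = raw.rstrip()
        if stripped.endswith("\\"):
            pending += stripped[:-1]      # drop the backslash, keep going
            continue
        logical_lines.append(pending + raw)
        pending = ""
    if pending:                           # file ended on a continuation
        logical_lines.append(pending)

    # Step 2: extract key/value pairs, allowing whitespace around "=".
    result = []
    for line in logical_lines:
        line = line.split("#", 1)[0].strip()   # assumed comment syntax
        if not line:
            continue
        result.append(re.findall(r"(\w+)\s*=\s*(\S+)", line))
    return result


if __name__ == "__main__":
    sample = "NodeName = n[1-4] \\\n   Procs=2\nDownNodes = n3 State=DOWN\n"
    for pairs in parse_conf(sample):
        print(pairs)
```

A real parser would also need to reject a NodeName that occurs more than
once, per the warning above.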

* Changes in SLURM 1.1.0-pre2
=============================
 -- Added "bcast" command to transmit copies of a file to compute nodes
    with message fanout.
 -- Bluegene specific - Added support for overlapping partitions and 
    dynamic partitioning. 
 -- Bluegene specific - Added support for nodecard sized blocks.
 -- Added logic to accept 1k for 1024 and so on for --nodes option of srun.
    The same convention is used by display tools such as smap, sinfo,
    scontrol, and squeue.
 -- Added bluegene.conf man page.
 -- Added support for memory affinity, see srun --mem_bind option.
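
The "1k for 1024" convention for node counts mentioned above can be
illustrated with a minimal sketch. The function names parse_node_count and
format_node_count are hypothetical and this is not SLURM code. Only the "k"
suffix is covered, since that is the only one this entry names.

```python
def parse_node_count(text):
    """Parse a node count such as "512" or "2k" (k = 1024)."""
    text = text.strip().lower()
    multiplier = 1
    if text.endswith("k"):
        multiplier = 1024
        text = text[:-1]
    return int(text) * multiplier


def format_node_count(count):
    """Show exact multiples of 1024 with a "k" suffix, others unchanged."""
    if count >= 1024 and count % 1024 == 0:
        return f"{count // 1024}k"
    return str(count)


if __name__ == "__main__":
    print(parse_node_count("2k"))
    print(format_node_count(2048))
```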

* Changes in SLURM 1.1.0-pre1
=============================