This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins. 
* Changes in SLURM 1.1.0-pre9
=============================
 -- Fix bug that could temporarily make nodes DOWN when they are really
    responding.
 -- Fix bug preventing backup slurmctld from responding to PING RPCs.
 -- Set "CFLAGS=-DISO8601" before configuration to get ISO8601 format 
    times for all SLURM commands. NOTE: This may break Moab, Maui, and/or 
    LSF schedulers.
=============================
 -- Fix bug in enforcement of partition's MaxNodes limit.
 -- BLUEGENE - Added support for srun -w option; also fixed the geometry
    option for srun.
 -- Accounting works for AIX systems; use jobacct/aix.
 -- Support large (over 2GB) files on 32-bit Linux systems.
 -- changed all writes to safe_write in srun
 -- added $float to globals.example in the testsuite
 -- Set job's num_proc correctly for jobs that do not have exclusive use
    of their allocated nodes.
 -- Change in support for test suite: 'testsuite/expect/globals.example'
    is now 'testsuite/expect/globals' and you can override variable 
    settings with a new file 'testsuite/expect/globals.local'.
 -- Job suspend now sends SIGTSTP, sleep(1), sends SIGSTOP for better
    MPI support.
 -- Plug a bunch of memory leaks in various places.
 -- Bluegene - before assigning a job to a block the plugin will check the bps
    to make sure they aren't in error state.
 -- Change time format in job completion logging (JobCompType=jobcomp/filetxt)
    from "MM/DD HH:MM:SS" to "YYYY-MM-DDTHH:MM:SS", conforming with the ISO8601 
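The two time formats named in the job completion logging entry above can be
compared with a minimal sketch. The helper names here are hypothetical and
this is not the jobcomp/filetxt code, only an illustration of the formats.

```python
from datetime import datetime

# strftime patterns for the two layouts named in the entry above.
OLD_FORMAT = "%m/%d %H:%M:%S"       # "MM/DD HH:MM:SS" (no year)
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S"    # "YYYY-MM-DDTHH:MM:SS"


def format_old(ts):
    """Render a timestamp in the older layout, which drops the year."""
    return ts.strftime(OLD_FORMAT)


def format_iso8601(ts):
    """Render a timestamp in the ISO 8601 style layout."""
    return ts.strftime(ISO_FORMAT)


def parse_iso8601(text):
    """Parse a timestamp written by format_iso8601()."""
    return datetime.strptime(text, ISO_FORMAT)


if __name__ == "__main__":
    example = datetime(2006, 4, 5, 13, 7, 9)
    print(format_old(example))      # 04/05 13:07:09
    print(format_iso8601(example))  # 2006-04-05T13:07:09
```

Because the older layout carries no year, tools that parse these logs cannot
convert old records to the new layout without supplying the year themselves.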
* Changes in SLURM 1.1.0-pre6
=============================
 -- Added logic to "stat" a running job with sacct option -S; use -j to
    specify the job.step.
 -- Removed jobacct/bluegene (no real need for it, since there does not yet
    appear to be a way to gather the data).
 -- Added support for mapping "%h" in configured SlurmdLog to the hostname.
 -- Add PropagatePrioProcess to control propagation of a user's nice value 
    to spawned tasks (based upon work by Daniel Christians, HP).
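The "%h" mapping in SlurmdLog described above amounts to a substitution on
the configured path. A minimal sketch, assuming a plain string replacement;
the function name and the sample path are hypothetical, not taken from SLURM.

```python
import socket


def expand_slurmd_log(path_template, hostname=None):
    """Replace every "%h" in the configured path with the hostname.

    If no hostname is given, fall back to this machine's hostname.
    """
    if hostname is None:
        hostname = socket.gethostname()
    return path_template.replace("%h", hostname)


if __name__ == "__main__":
    # Hypothetical path and node name, for illustration only.
    print(expand_slurmd_log("/var/log/slurmd.%h.log", "node17"))
    # /var/log/slurmd.node17.log
```

This lets a single slurm.conf shared by all nodes produce a distinct log
file per node.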
* Changes in SLURM 1.1.0-pre5
=============================
 -- Added step completion RPC logic
 -- Vastly changed sacct and the jobacct plugin.  Read documentation for full
    details.
 -- Added jobacct plugins for AIX and BlueGene. They currently don't work,
    but the infrastructure is in place.
 -- Add support for srun option --ctrl-comm-ifhn to set PMI communications
    address (Hongjia Cao, National University of Defense Technology).
 -- Moved safe_read/write to slurm_protocol_defs.h removing multiple copies.
 -- Remove vestigial functions slurm_allocate_resources_and_run() and 
    slurm_free_resource_allocation_and_run_response_msg().
 -- Added support for different executable files and arguments by task based
    upon a configuration file. See srun's --multi-prog option (based upon 
    work by Hongjia Cao, National University of Defense Technology).
 -- Changed the way the forward logic waits for the fanout logic, mostly
    eliminating scalability problems.
 -- Changed -l option in sacct to display different parameters; see
    sacct/sacct.h for details.
* Changes in SLURM 1.1.0-pre4
=============================
 -- Bluegene specific - Added support to set bluegene block state to 
    free/error via scontrol update BlockName 
 -- Add needed symbol to select/bluegene in order to load plugin.
* Changes in SLURM 1.1.0-pre3
=============================
 -- Added framework for XCPU job launch support.
 -- New general configuration file parser and slurm.conf handling code.
    Allows long lines to be continued on the next line by ending with a "\".
    Whitespace is allowed between the key and "=", and between the "=" and
    value.
    WARNING: A NodeName may now occur only once in a slurm.conf file.
             If you want to temporarily make nodes DOWN in the slurm.conf,
             use the new DownNodes keyword (see "man slurm.conf").
 -- Gracefully handle request to submit batch job from within an existing 
    batch job.
 -- Warn user attempting to create a job allocation from within an existing job
    allocation.
 -- Add web page description for proctrack plugin.
 -- Add new function slurm_get_rem_time() for job's time limit.
 -- JobAcct plugin renamed from "log" to "linux" in preparation for support of 
    new system types. 
    WARNING: "JobAcctType=jobacct/log" is no longer supported.
 -- Removed vestigial 'bg' names from bluegene plugin and smap
 -- InactiveLimit parameter is not enforced for RootOnly partitions.
 -- Update select/cons_res web page (Susanne Balle, HP, 
    cons_res_doc_patch_3_29_06).
 -- Build a "slurmd.test" along with slurmd. slurmd.test has the path to 
    slurmstepd set allowing it to run unmodified out of the builddir for 
    testing (Mark Grondona).
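The line handling described for the new configuration parser above (a
trailing "\" continues a line, and whitespace is allowed around "=") can be
sketched as follows. This is an illustrative reading of those two rules
only, not the SLURM parser; the function names and sample values are
hypothetical.

```python
import re

# A key, optional whitespace, "=", optional whitespace, then a value
# that runs up to the next whitespace.
PAIR = re.compile(r"(\w+)\s*=\s*(\S+)")


def join_continued_lines(text):
    """Merge physical lines ending in a backslash with the following line."""
    logical, pending = [], ""
    for raw in text.splitlines():
        if raw.endswith("\\"):
            pending += raw[:-1]        # drop the backslash, keep reading
            continue
        logical.append(pending + raw)
        pending = ""
    if pending:                        # file ended on a continued line
        logical.append(pending)
    return logical


def parse_config(text):
    """Return one list of (key, value) pairs per logical line."""
    return [PAIR.findall(line) for line in join_continued_lines(text)]


if __name__ == "__main__":
    sample = "NodeName = n[1-4] \\\n    Procs=2\nTreeWidth = 50\n"
    for pairs in parse_config(sample):
        print(pairs)
```

The sketch ignores everything else a real parser must handle, such as
comments, quoting, and rejecting a NodeName that occurs more than once.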
* Changes in SLURM 1.1.0-pre2
=============================
 -- Added "bcast" command to transmit copies of a file to compute nodes
    with message fanout.
 -- Bluegene specific - Added support for overlapping partitions and 
    dynamic partitioning. 
 -- Bluegene specific - Added support for nodecard sized blocks.
 -- Added logic to accept 1k for 1024 and so on for --nodes option of srun.
    This logic also applies throughout display tools such as smap, sinfo,
    scontrol, and squeue.
 -- Added bluegene.conf man page.
 -- Added support for memory affinity, see srun --mem_bind option.
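The "1k for 1024" convention noted above for the --nodes option can be
sketched as a pair of helpers. Only the "k" suffix named in the entry is
handled; the function names are hypothetical, and the actual srun and
display-tool code may accept other forms.

```python
def parse_node_count(text):
    """Convert "1k" style text to an integer, where k means 1024."""
    text = text.strip()
    if text.endswith("k"):
        return int(text[:-1]) * 1024
    return int(text)


def format_node_count(count):
    """Display exact multiples of 1024 with a "k" suffix."""
    if count >= 1024 and count % 1024 == 0:
        return f"{count // 1024}k"
    return str(count)


if __name__ == "__main__":
    print(parse_node_count("1k"))    # 1024
    print(format_node_count(2048))   # 2k
    print(format_node_count(1000))   # 1000
```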
* Changes in SLURM 1.1.0-pre1
=============================
 -- New --enable-multiple-slurmd configure parameter to allow running
    more than one copy of slurmd on a node at the same time.  Only
    really useful for developers.
 -- Communication from slurmctld and the srun launch command to the slurmds
    is now branched using a tree-type algorithm.  Spawn and batch mode work
    the same as before.  New slurm.conf variable TreeWidth (default 50)
    sets the number of threads per stop on the tree.
 -- Configuration parameter HeartBeatInterval is deprecated. Half of
    SlurmdTimeout and SlurmctldTimeout are now used for communications to
    the slurmd and slurmctld daemons, respectively.
 -- Add hash tables for select/cons_res plugin (Susanne Balle, HP, 
    patch_02222006).
 -- Remove some use of cr_enabled flag in slurmctld job record, use 
    new flag "test_only" in select_g_job_test() instead.
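The tree-type fanout controlled by TreeWidth above can be pictured with a
small sketch. It assumes each sender splits its node list into at most
TreeWidth groups and hands each group to that group's first node, which
forwards to the rest. That splitting rule is an assumption made for
illustration and may differ from what slurmctld and srun actually do; all
names are hypothetical.

```python
def split_for_fanout(nodes, tree_width=50):
    """Split nodes into at most tree_width contiguous, near-equal groups."""
    if not nodes:
        return []
    groups = min(tree_width, len(nodes))
    base, extra = divmod(len(nodes), groups)
    result, start = [], 0
    for i in range(groups):
        size = base + (1 if i < extra else 0)
        result.append(nodes[start:start + size])
        start += size
    return result


def build_tree(nodes, tree_width=50):
    """Map each group's first node to the subtree it forwards to."""
    tree = {}
    for group in split_for_fanout(nodes, tree_width):
        head, rest = group[0], group[1:]
        tree[head] = build_tree(rest, tree_width)
    return tree


def depth(tree):
    """Number of forwarding hops needed to reach the deepest node."""
    if not tree:
        return 0
    return 1 + max(depth(sub) for sub in tree.values())


if __name__ == "__main__":
    nodes = [f"n{i}" for i in range(7)]
    print(build_tree(nodes, tree_width=2))
    print(depth(build_tree(nodes, tree_width=2)))   # 3
    print(depth(build_tree(nodes, tree_width=50)))  # 1
```

With the default width of 50, any job on 50 or fewer nodes is reached in a
single hop under this model; wider jobs add forwarding levels instead of
adding threads at the sender.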
* Changes in SLURM 1.0.13
=========================
 -- Fix for AllowGroups option to work when the /etc/group file doesn't
    contain all users in a group: also add the uids of users in /etc/passwd
    whose gid matches the group being looked up.
 -- Fix bug in InactiveLimit support that can potentially purge active jobs.
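The AllowGroups fix above combines two sources of membership: names listed
for the group in /etc/group, and users in /etc/passwd whose primary gid is
that group. A minimal sketch of that union, using plain tuples in place of
the real files; the function name and the data layout are hypothetical.

```python
def group_member_uids(group_name, group_entries, passwd_entries):
    """Return the set of uids that belong to group_name.

    group_entries:  iterable of (group name, gid, [member user names])
    passwd_entries: iterable of (user name, uid, primary gid)
    """
    gid, members = None, []
    for name, entry_gid, entry_members in group_entries:
        if name == group_name:
            gid, members = entry_gid, entry_members
            break
    if gid is None:
        return set()

    uid_by_name = {name: uid for name, uid, _ in passwd_entries}
    # Members named explicitly in the group entry.
    uids = {uid_by_name[m] for m in members if m in uid_by_name}
    # Users whose primary gid is this group but who are not listed in it.
    uids.update(uid for _, uid, pgid in passwd_entries if pgid == gid)
    return uids


if __name__ == "__main__":
    groups = [("hpc", 500, ["alice"])]
    passwd = [("alice", 1001, 100), ("bob", 1002, 500), ("carol", 1003, 100)]
    print(sorted(group_member_uids("hpc", groups, passwd)))  # [1001, 1002]
```

Here bob is not listed under "hpc" in the group data but still counts as a
member, because his primary gid matches.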
 
* Changes in SLURM 1.0.12
=========================
 -- Report node state of DRAIN rather than DOWN if DOWN with DRAIN flag set.
 -- Initialize job->mail_type to 0 (NONE) for job submission.
 -- Fix for stalled task stdout/stderr when buffered I/O is used, and
    a single line exceeds 4096 bytes.
 -- Memory leak fixes for maui plugin (hjcao@nudt.edu.cn)
 -- Fix for spinning srun when the terminal to which srun is talking
    goes away.
 -- Don't set avail_node_bitmap for DRAINED nodes on slurmctld reconfig
    (can schedule a job on drained node after reconfig).
* Changes in SLURM 1.0.11
=========================
 -- Fix for slurmstepd hang when launching a task. (Needed to install
    list library's atfork handlers).
 -- Fix memory leak on AIX (and possibly other architectures) due to
    missing pthread_attr_destroy() calls.
 -- Fix rare task standard I/O setup bug.  When the bug hit, stdin, stdout,
    or stderr could be an invalid file descriptor.
 -- General slurmstepd file descriptor cleanup.
 -- Fix memory leak in job accounting logic (Andy Riebs, HP, memory_leak.patch).
* Changes in SLURM 1.0.10
=========================
 -- Fix for job accounting logic submitted by Andy Riebs to handle issues
    with suspending jobs and such (patch file named requeue.patch).
 -- Make select/cons_res interoperate with mpi/lam plugin for task counts.
 -- Fix race condition where srun could seg-fault due to use of logging functions
    within pthread after calling log_fini.
 -- Code changes for clean build with gcc 2.96 (gcc_2_96.patch, Takao Hatazaki, HP).
 -- Add CacheGroups configuration support in configurator.html (configurator.patch,
    Takao Hatazaki, HP).
 -- Fix bug preventing use of mpich-gm plugin (mpichgm.patch, Takao Hatazaki, HP).
* Changes in SLURM 1.0.9
========================
 -- Fix job accounting logic to open new log file on slurmctld reconfig.
    (Andy Riebs, slurm.hp.logfile.patch).
 -- Fix bug which allows a user to run a batch script on a node not allocated
    by the slurmctld.
 -- Fix poe MP_HOSTFILE handling bug on AIX.

* Changes in SLURM 1.0.8
========================
 -- Fix to communication between slurmd and slurmstepd to allow for partial
    reads and writes on their communication pipes.
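Handling partial reads and writes on a pipe, as in the fix above, generally
means looping until the full byte count has been transferred. A minimal
sketch of that pattern; it is not the slurmd or slurmstepd code, and the
function names are hypothetical.

```python
import os


def read_exact(fd, nbytes):
    """Read exactly nbytes from fd, looping over short reads."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:                  # writer closed before we got it all
            raise EOFError(f"expected {remaining} more bytes")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def write_all(fd, data):
    """Write all of data to fd, looping over short writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]          # keep only the unwritten tail


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    write_all(write_end, b"hello ")
    write_all(write_end, b"world")
    print(read_exact(read_end, 11))    # b'hello world'
    os.close(read_end)
    os.close(write_end)
```

A single read() or write() on a pipe may transfer fewer bytes than asked
for, so code that assumes one call moves a whole message can stall or
desynchronize.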

* Changes in SLURM 1.0.7
========================
 -- Change in how AuthType=auth/dummy is handled for security testing.
 -- Fix for bluegene systems to allow full system partitions to stay booted 
    when other jobs are submitted to the queue.

* Changes in SLURM 1.0.6
========================
 -- Prevent slurmstepd from crashing when srun attaches to batch job.

* Changes in SLURM 1.0.5
========================