This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins. 
 -- Accounting works for AIX systems; use jobacct/aix
 -- Support large (over 2GB) files on 32-bit linux systems
 -- changed all writes to safe_write in srun
 -- added $float to globals.example in the testsuite
 -- Limit attempted BGL jobs scheduled at once to 100 (too slow with
    dynamic scheduling to handle hundreds of jobs at once).
 -- Set job's num_proc correctly for jobs that do not have exclusive use
    of their allocated nodes.
* Changes in SLURM 1.1.0-pre6
=============================
 -- Added logic to "stat" a running job with sacct option -S; use -j to
    specify the job.step.
 -- Removed jobacct/bluegene (no real need for this, since there does not
    appear to be a way to gather the data yet).
 -- Added support for mapping "%h" in configured SlurmdLog to the hostname.
 -- Add PropagatePrioProcess to control propagation of a user's nice value 
    to spawned tasks (based upon work by Daniel Christians, HP).
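
The "%h" mapping for SlurmdLog noted above can be illustrated with a short
sketch. It is written in Python rather than SLURM's C, and expand_log_path
is a hypothetical helper name, not a SLURM function. It only shows the
substitution idea, assuming every "%h" in the configured path is replaced.

```python
import socket


def expand_log_path(template, hostname=None):
    """Replace every "%h" in a configured log path with the hostname.

    Hypothetical helper for illustration; not part of SLURM.
    """
    if hostname is None:
        # Assumption: fall back to whatever name the local system reports.
        hostname = socket.gethostname()
    return template.replace("%h", hostname)


print(expand_log_path("/var/log/slurmd.%h.log", hostname="node12"))
```

With a path such as the one above, each node would write to its own log
file even when all nodes share one slurm.conf.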
* Changes in SLURM 1.1.0-pre5
=============================
 -- Added step completion RPC logic
 -- Vastly changed sacct and the jobacct plugin.  Read documentation for full
    details.
 -- Added jobacct plugins for AIX and BlueGene; they currently don't work,
    but the infrastructure is in place.
 -- Add support for srun option --ctrl-comm-ifhn to set PMI communications
    address (Hongjia Cao, National University of Defense Technology).
 -- Moved safe_read/write to slurm_protocol_defs.h removing multiple copies.
 -- Remove vestigial functions slurm_allocate_resources_and_run() and 
    slurm_free_resource_allocation_and_run_response_msg().
 -- Added support for different executable files and arguments by task based
    upon a configuration file. See srun's --multi-prog option (based upon 
    work by Hongjia Cao, National University of Defense Technology).
 -- Changed the way forward logic waits for fanout logic, mostly eliminating
    scalability problems.
 -- Changed -l option in sacct to display different params; see
    sacct/sacct.h for details.
* Changes in SLURM 1.1.0-pre4
=============================
 -- Bluegene specific - Added support to set bluegene block state to 
    free/error via scontrol update BlockName 
 -- Add needed symbol to select/bluegene in order to load plugin.
* Changes in SLURM 1.1.0-pre3
=============================
 -- Added framework for XCPU job launch support.
 -- New general configuration file parser and slurm.conf handling code.
    Allows long lines to be continued on the next line by ending with a "\".
    Whitespace is allowed between the key and "=", and between the "=" and
    value.
    WARNING: A NodeName may now occur only once in a slurm.conf file.
             If you want to temporarily make nodes DOWN in the slurm.conf,
             use the new DownNodes keyword (see "man slurm.conf").
 -- Gracefully handle request to submit batch job from within an existing 
    batch job.
 -- Warn user attempting to create a job allocation from within an existing job
    allocation.
 -- Add web page description for proctrack plugin.
 -- Add new function slurm_get_rem_time() for job's time limit.
 -- JobAcct plugin renamed from "log" to "linux" in preparation for support of 
    new system types. 
    WARNING: "JobAcctType=jobacct/log" is no longer supported.
 -- Removed vestigial 'bg' names from bluegene plugin and smap
 -- InactiveLimit parameter is not enforced for RootOnly partitions.
 -- Update select/cons_res web page (Susanne Balle, HP, 
    cons_res_doc_patch_3_29_06).
 -- Build a "slurmd.test" along with slurmd. slurmd.test has the path to 
    slurmstepd set allowing it to run unmodified out of the builddir for 
    testing (Mark Grondona).
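
The line-continuation and whitespace rules of the new configuration parser
described in this section can be sketched as follows. This is a simplified
Python illustration, not SLURM's parser: parse_conf is a hypothetical name,
and the sketch assumes one key=value pair per logical line, which the real
slurm.conf format does not require.

```python
def parse_conf(text):
    """Parse key=value lines, joining lines that end with a backslash.

    Hypothetical, simplified sketch; assumes one pair per logical line.
    """
    logical_lines = []
    pending = ""
    for raw in text.splitlines():
        if raw.rstrip().endswith("\\"):
            # Drop the trailing backslash and keep accumulating.
            pending += raw.rstrip()[:-1]
            continue
        logical_lines.append(pending + raw)
        pending = ""
    if pending:
        logical_lines.append(pending)

    pairs = []
    for line in logical_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("missing '=' in line: " + line)
        # Whitespace around the key and the value is ignored.
        pairs.append((key.strip(), value.strip()))
    return pairs


sample = "ControlMachine = head\nEpilog = /etc/slurm/\\\nepilog.sh\n"
print(parse_conf(sample))
```

The sample input uses made-up values; it shows "key = value" with spaces
around the "=" and a value continued onto the next line.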
* Changes in SLURM 1.1.0-pre2
=============================
 -- Added "bcast" command to transmit copies of a file to compute nodes
    with message fanout.
 -- Bluegene specific - Added support for overlapping partitions and 
    dynamic partitioning. 
 -- Bluegene specific - Added support for nodecard sized blocks.
 -- Added logic to accept 1k for 1024 and so on for --nodes option of srun.
    This logic is also carried through to display tools such as smap, sinfo,
    scontrol, and squeue.
 -- Added bluegene.conf man page.
 -- Added support for memory affinity, see srun --mem_bind option.
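
The "1k for 1024" shorthand mentioned in this section can be sketched as
below. parse_count is a hypothetical Python helper, not SLURM code, and it
assumes only a trailing "k" suffix (either case) is recognized.

```python
def parse_count(text):
    """Convert "1k" to 1024, "2k" to 2048; plain digits pass through.

    Hypothetical helper; only the "k" suffix is assumed here.
    """
    text = text.strip().lower()
    if text.endswith("k"):
        return int(text[:-1]) * 1024
    return int(text)


print(parse_count("1k"), parse_count("512"))
```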
* Changes in SLURM 1.1.0-pre1
=============================
 -- New --enable-multiple-slurmd configure parameter to allow running
    more than one copy of slurmd on a node at the same time.  Only
    really useful for developers.
 -- Communication from slurmctld and the srun launch command to the slurmd
    daemons is now branched using a tree-type algorithm.  Spawn and batch
    mode work the same as before.  New slurm.conf variable TreeWidth
    (default 50) sets the number of threads per stop on the tree.
 -- Configuration parameter HeartBeatInterval is deprecated. Half of
    SlurmdTimeout and SlurmctldTimeout is now used for communications to
    the slurmd and slurmctld daemons, respectively.
 -- Add hash tables for select/cons_res plugin (Susanne Balle, HP, 
    patch_02222006).
 -- Remove some use of cr_enabled flag in slurmctld job record, use 
    new flag "test_only" in select_g_job_test() instead.
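
The tree-type fanout controlled by TreeWidth in this section can be pictured
with the sketch below. It is an assumption-laden Python illustration, not
SLURM's algorithm: split_for_forwarding is a hypothetical name, and the
sketch assumes the sender contacts one node per group directly and hands it
the rest of that group to forward to.

```python
def split_for_forwarding(nodes, tree_width):
    """Split nodes into at most tree_width groups of near-equal size.

    Returns (direct_target, forward_list) pairs. Hypothetical sketch.
    """
    if tree_width < 1:
        raise ValueError("tree_width must be at least 1")
    groups = min(tree_width, len(nodes))
    result = []
    start = 0
    for i in range(groups):
        # Spread the remainder over the first few groups.
        size = len(nodes) // groups + (1 if i < len(nodes) % groups else 0)
        chunk = nodes[start:start + size]
        result.append((chunk[0], chunk[1:]))
        start += size
    return result


nodes = ["n%d" % i for i in range(7)]
for target, forward in split_for_forwarding(nodes, 3):
    print(target, "forwards to", forward)
```

Each direct target could apply the same split to its forward list, so the
number of simultaneous connections at any one stop stays at or below the
configured width.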
* Changes in SLURM 1.0.12
=========================
 -- Report node state of DRAIN rather than DOWN if DOWN with DRAIN flag set.
 -- Initialize job->mail_type to 0 (NONE) for job submission.
 -- Fix for stalled task stdout/stderr when buffered I/O is used, and
    a single line exceeds 4096 bytes.
 -- Fix for spinning srun when the terminal to which srun is talking
    goes away.
 -- Don't set avail_node_bitmap for DRAINED nodes on slurmctld reconfig
    (can schedule a job on drained node after reconfig).
* Changes in SLURM 1.0.11
=========================
 -- Fix for slurmstepd hang when launching a task. (Needed to install
    list library's atfork handlers).
 -- Fix memory leak on AIX (and possibly other architectures) due to
    missing pthread_attr_destroy() calls.
 -- Fix rare task standard I/O setup bug.  When the bug hit, stdin, stdout,
    or stderr could be an invalid file descriptor.
 -- General slurmstepd file descriptor cleanup.
 -- Fix memory leak in job accounting logic (Andy Riebs, HP, memory_leak.patch).
* Changes in SLURM 1.0.10
=========================
 -- Fix for job accounting logic to handle issues with suspending jobs
    and such (Andy Riebs, requeue.patch).
 -- Make select/cons_res interoperate with mpi/lam plugin for task counts.
 -- Fix race condition where srun could seg-fault due to use of logging functions
    within pthread after calling log_fini.
 -- Code changes for clean build with gcc 2.96 (gcc_2_96.patch, Takao Hatazaki, HP).
 -- Add CacheGroups configuration support in configurator.html (configurator.patch,
    Takao Hatazaki, HP).
 -- Fix bug preventing use of mpich-gm plugin (mpichgm.patch, Takao Hatazaki, HP).
* Changes in SLURM 1.0.9
========================
 -- Fix job accounting logic to open new log file on slurmctld reconfig.
    (Andy Riebs, slurm.hp.logfile.patch).
 -- Fix bug which allows a user to run a batch script on a node not allocated
    by the slurmctld.
 -- Fix poe MP_HOSTFILE handling bug on AIX.

* Changes in SLURM 1.0.8
========================
 -- Fix to communication between slurmd and slurmstepd to allow for partial
    reads and writes on their communication pipes.
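
Handling partial reads and writes on a pipe, as in the fix above, generally
means looping until the full byte count has been transferred. The Python
sketch below illustrates that pattern only; read_exactly and write_all are
hypothetical names and do not mirror the slurmd/slurmstepd code.

```python
import os


def read_exactly(fd, nbytes):
    """Read nbytes from fd, looping because os.read may return fewer."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:
            raise EOFError("pipe closed with %d bytes still expected" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def write_all(fd, data):
    """Write all of data to fd, looping because os.write may be partial."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


r, w = os.pipe()
write_all(w, b"hello world")
print(read_exactly(r, 5))
```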

* Changes in SLURM 1.0.7
========================
 -- Change in how AuthType=auth/dummy is handled for security testing.
 -- Fix for bluegene systems to allow full system partitions to stay booted 
    when other jobs are submitted to the queue.

* Changes in SLURM 1.0.6
========================
 -- Prevent slurmstepd from crashing when srun attaches to batch job.

* Changes in SLURM 1.0.5
========================
 -- Restructure logic for scheduling BlueGene small block jobs. Added
    "test_only" flag to select_p_job_test() in select plugin.
 -- Correct squeue "NODELIST" output for BlueGene small block jobs.
 -- Fix possible deadlock situations on BlueGene plugin on errors.

* Changes in SLURM 1.0.4
========================
 -- Release job allocation if step creation fails (especially for BlueGene).
 -- Fix bug in select/bluegene warm start with changed bglblock layout.
 -- Fix bug for queuing full-system BlueGene jobs.

* Changes in SLURM 1.0.3
========================
 -- Fix bug that could refuse to queue batch jobs for BlueGene system.
 -- Add BlueGene plugin mutex lock for reconfig.
 -- Ignore BlueGene bgljobs in ERROR state (don't try to kill).
 -- Fix job accounting for batch jobs (Andy Riebs, HP, 
    slurm.hp.jobacct_divby0a.patch).
 -- Added proctrack/linuxproc.so to the main RPM.
 -- Added mutex around bridge api file to avoid locking up the api.
 -- BlueGene mod: Terminate slurm_prolog and slurm_epilog immediately if 
    SLURM_JOBID environment variable is invalid.
 -- Federation driver: allow selection of a specific switch interface
    (sni0, sni1, etc.) with -euidevice/MP_EUIDEVICE.
 -- Return an error for "scontrol reconfig" if there is already one in
    progress
* Changes in SLURM 1.0.2
========================
 -- Correctly report DRAINED node state as type OTHER for "sinfo --summarize".
 -- Fixes in sacct use of malloc (Andy Riebs, HP, sacct_malloc.patch).
 -- Smap mods: eliminate screen flicker, fix window resize, report clearer
    message if window too small (Dan Palermo, HP, patch.1.0.0.1.060126.smap).