This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins. 
* Changes in SLURM 1.0.1
========================
 -- Assorted updates and clarifications in documentation.
 -- Able to detect which munge installation to use (32-bit or 64-bit).
* Changes in SLURM 1.0.0
========================
 -- Fix sinfo filtering bug, especially "sinfo -R" output.
 -- Fix node state change bug, resuming down or drained nodes.
 -- Fix "scontrol show config" to display JobCredentialPrivateKey instead
    of JobCredPrivateKey and JobCredentialPublicCertificate instead of
    JobCredPublicKey.  They now match the options in the slurm.conf.
 -- Fix bug in job accounting for very long node list records (Andy Riebs,
    HP, sacct_buf.patch).
 -- BLUEGENE SPECIFIC - added load function to smap to load an already
    existing bluegene.conf file.
 -- Fix bug in sacct: If user requests specific job or job step ID,
    only the last one with that ID will be reported. If multiple 
    nodes fail, the job has its state recorded as "JOB_TERMINATED...nf"
    (Andy Riebs, HP, slurm.hp.sacct_dup.patch).
 -- Fix some inconsistencies in sacct's help message (Andy Riebs, HP, 
    slurm.hp.sacct_help.patch).
 -- Validate input to sacct command and allow embedded spaces in
    arguments (Andy Riebs, HP, slurm.hp.sacct_validate.patch).
* Changes in SLURM 0.7.0-pre8
=============================
 -- BGL specific -- bug fix for smap configure function down configuration
 -- Add support for job suspend/resume.
 -- Add slurmd cache for group IDs (Takao Hatazaki, HP).
 -- Fix bug in processing of "#SLURM" batch script option parsing.
* Changes in SLURM 0.7.0-pre7
=============================
 -- Fix issue with NODE_STATE_COMPLETING, could start job on node before
    epilog completed.
 -- Added some infrastructure for job suspend/resume (scontrol, api, and 
    slurmctld stub).
 -- Set job's num_procs to the actual processor count allocated to the job.
 -- Fix bug in HAVE_FRONT_END support for cluster emulation.
* Changes in SLURM 0.7.0-pre6
=============================
 -- Added support for task affinity for binding tasks to CPUs (Daniel
    Palermo, HP).
 -- Integrate task affinity support with configuration, add validation 
    test.
* Changes in SLURM 0.7.0-pre5
=============================
 -- Enhanced performance and debugging for slurmctld reconfiguration.
 -- Add "scontrol update Jobid=# Nice=#" support.
 -- Basic slurmctld and tool functionality validated to 16k nodes.
 -- squeue and smap now display correct info for jobs in bluegene environment.
 -- Fix setting of SLURM_NODELIST for batch jobs.
 -- Add SubmitTime to job information available for display.
 -- API function slurm_confirm_allocation() has been marked OBSOLETE
    and will go away in some future version of SLURM.  Use
    slurm_allocation_lookup() instead.
 -- New API calls slurm_signal_job and slurm_signal_job_step to send
    signals directly to the slurmds without triggering the shutdown sequence.
 -- remove "uid" from old_job_alloc_msg_t, no longer needed.
 -- Several bug fixes in maui scheduler plugin from Dave Jackson
    (Cluster Resources).
* Changes in SLURM 0.7.0-pre4
=============================
 -- Remove BNR library functions and add those for PMI (KVS and basic
    MPI-1 functions only for now)
 -- Added Hostfile support for POE and srun.  MP_HOSTFILE env var to set
    location of hostfile.  Tasks will run from list order in the file.  
 -- Removes the slurmd's use of SysV shared memory.  Instead the slurmd
    communicates with the slurmstepd processes through the slurmstepd's
    new named unix domain socket.  The "stepd_api" is used to talk to the
    slurmstepd (src/slurmd/common/stepd_api.[ch]).
 -- Bluegene specific - bluegene block allocator will find most any 
    partition size now.  Added support to start at any point in smap 
    to request a partition instead of always starting at 000.
 -- Bluegene specific - Support to smap to down or bring up nodes in 
    configure mode.  Added commands include allup, alldown, 
    up [range], down [range]
 -- Time format in sinfo/squeue/smap/sacct changed from D:HH:MM:SS to 
    D-HH:MM:SS per POSIX standards document.
 -- Treat scontrol update request without any requested changes as an 
    error condition.
 -- Bluegene plugin renamed with BG instead of BGL.  partition_allocator moved 
    into bluegene plugin and renamed block_allocator.  Format for bluegene.conf
    file changed also.  Read bluegene html page.  Code is backwards compatible;
    smap will generate the new format.
 -- Add srun option --nice to give user some control over job priority.
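
To illustrate the time format change noted above (D:HH:MM:SS to D-HH:MM:SS),
here is a minimal Python sketch that renders a duration in seconds in the new
style. The function name format_elapsed is hypothetical and not part of SLURM,
and always printing the day field is an assumption made for simplicity; the
actual tools may differ.

```python
def format_elapsed(seconds: int) -> str:
    """Render a duration as D-HH:MM:SS.

    Illustrative only: always emitting the day field is an assumption,
    not a statement about what sinfo/squeue/smap/sacct actually print.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    days, rem = divmod(seconds, 86400)   # 86400 seconds per day
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{secs:02d}"
```

The hyphen separates the day count from the clock fields, so a value such as
1-02:03:04 cannot be misread as having four colon-separated clock fields.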
* Changes in SLURM 0.7.0-pre3
=============================
 -- Restructure node states: DRAINING and DRAINED states are replaced 
    with a DRAIN flag. COMPLETING state is changed to a COMPLETING flag. 
 -- Test suite moved into testsuite/expect from separate repository.
 -- Added new document describing slurm APIs (doc/html/api.html).
 -- Permit nodes to be in multiple partitions simultaneously.
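
The node state restructuring noted above (DRAIN and COMPLETING becoming flags
rather than separate states) can be pictured as a base state combined with
independent flag bits. The Python sketch below is only an illustration of that
idea: the constant names, bit values, and the describe_state helper are all
hypothetical and are not taken from SLURM's headers.

```python
# Hypothetical layout: low byte holds the base state, higher bits hold flags.
# These values are assumptions for illustration, not SLURM's real constants.
STATE_BASE_MASK = 0x00FF
FLAG_DRAIN = 0x0100
FLAG_COMPLETING = 0x0200

BASE_STATES = {0: "UNKNOWN", 1: "DOWN", 2: "IDLE", 3: "ALLOCATED"}


def describe_state(state: int) -> str:
    """Return the base state name followed by any flags that are set."""
    parts = [BASE_STATES.get(state & STATE_BASE_MASK, "UNKNOWN")]
    if state & FLAG_DRAIN:
        parts.append("DRAIN")
    if state & FLAG_COMPLETING:
        parts.append("COMPLETING")
    return "+".join(parts)
```

With flags, a node can be draining while still allocated, or completing while
idle, without needing a dedicated state for every combination.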
* Changes in SLURM 0.7.0-pre2
=============================
 -- New stdio protocol.  Now srun has just a single TCP stream to each node
    of a job-step.  srun and slurmd communicate over the TCP stream using a
    simple messaging protocol.
 -- Added task plugin and use task prolog/epilog(s).
 -- New slurmd_step functionality added.  Fork exec instead of using shared
    memory.  Not completely tested.
 -- BGL small partition logic in place in plugin and smap.  Scheduler needs  
    to be rewritten to handle multiple partitions on a single node. No 
    documentation written on process yet.
 -- If running select/bluegene plugin without access to BGL DB2, then 
    full-system bglblock is of system size defined in bluegene.conf.
* Changes in SLURM 0.7.0-pre1
=============================
 -- Support deferred initiation of job (e.g. srun --begin=11:30 ...).
 -- Add support for srun --cpus-per-task through task allocation in 
    slurmctld.
 -- fixed partition_allocator to work without curses
 -- made change to srun to start message thread before other threads 
    to make sure localtime doesn't interfere.   
 -- Added new RPCs for slurmctld REQUEST_TERMINATE_JOB or TASKS, 
    REQUEST_KILL_JOB/TASKS changed to REQUEST_SIGNAL_JOB/TASKS.
 -- Add support for e-mail notification on job state changes.
 -- Some infrastructure added for task launch controls (slurm.conf:
    TaskProlog, TaskEpilog, TaskPlugin; srun --task-prolog, --task-epilog).
* Changes in SLURM 0.6.11
=========================
 -- Fix bug in sinfo partition sorting order.
 -- Fix bugs in srun use of #SLURM options in batch script.
* Changes in SLURM 0.6.10
=========================
 -- Fix for slurmd job termination logic (could hang in COMPLETING state).
 -- Sacct bug fixes: Report correct user name for job step, show "uid.gid"
    as fifth field of job step record (Andy Riebs, slurm.hp.sacct_uid.patch).
 -- Add job_id to maui scheduler plugin start job status message.
 -- Fix for srun's handling of null characters in stdout or stderr.
 -- Update job accounting for larger systems (Andy Riebs, uptodate.patch).
 -- Fixes for proctrack/linuxproc and mpich-gm support (Takao Hatazaki, HP).
 -- Fix bug in switch/elan for large task count job having irregular task 
    distribution across nodes.
* Changes in SLURM 0.6.9
========================
 -- Fix bug in mpi plugin to set the ID correctly
 -- Accounting bug causing segv fixed (Andy Riebs, 14oct.jobacct.patch)
 -- Fix for failed launch of a debugged job (e.g. bad executable name).
 -- Wiki plugin fix for tracking allocated nodes (Ernest Artiaga, BSC).
 -- Fix memory leaks in slurmctld and federation plugin.
 -- Fix segfault in federation plugin function fed_libstate_clear().
 -- Align job accounting data (Andy Riebs, slurm.hp.unal_jobacct.patch)
 -- Restore switch state in backup controller restarts
* Changes in SLURM 0.6.8
========================
 -- Invalid AllowGroup value in slurm.conf to not cause seg fault.
 -- Fix bug that would cause slurmctld to seg-fault with select/cons_res
    and batch job containing more than one step.
* Changes in SLURM 0.6.7
========================
 -- Make proctrack/linuxproc thread safe, could cause slurmd seg fault.
 -- Propagate umask from srun to spawned tasks.
 -- Fix problem in switch/elan error handling that could hang a slurmd 
    step manager process.
 -- Build on AIX with -bmaxdata:0x70000000 for memory limit more than 256MB.
 -- Restore srun's return code support.
* Changes in SLURM 0.6.6
========================
 -- Fix for bad socket close() in the spawn-io code.

* Changes in SLURM 0.6.5
========================
 -- Sacct to report on job steps that never actually started.
 -- Added proctrack/rms to elan rpm.
 -- Restructure slurmctld/agent.c logic to ensure timely reaping of
    terminating pthreads.
 -- Srun not to hang if job fails before all task launches have completed.
 -- Fix for consumable resources properly scheduling nodes that have more 
    nodes than configured (Susanne Balle, HP, cons_res_patch.10.14.2005)

* Changes in SLURM 0.6.4
========================
 -- Bluegene plugin drains an entire bglblock on repeated boot failures
    only if it has not identified a specific node as being bad.

* Changes in SLURM 0.6.3
========================
 -- Fix slurmctld mem leaks (step name and hostlist struct).
 -- Bluegene plugin sets end time for job terminated due to removed 
    bglblock.

* Changes in SLURM 0.6.2
========================