NEWS

This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins. 

* Changes in SLURM 0.7.0-pre4
=============================
 -- Remove BNR libary functions and add those for PMI (KVS and basic
    MPI-1 functions only for now)
 -- Added Hostfile support for POE and srun.  MP_HOSTFILE env var to set
    location of hostfile.  Tasks will run from list order in the file.  
 -- Removes the slurmd's use of SysV shared memory.  Instead the slurmd
    communicates with the slurmstepd processes through the slurmstepd's
    new named unix domain socket.  The "stepd_api" is used to talk to the
    slurmstepd (src/slurmd/common/stepd_api.[ch]).
 -- Bluegene specific - bluegene block allocator will find most any 
    partition size now.  Added support to start at any point in smap 
    to request a partition instead of always starting at 000.
 -- Bluegene specific - Support to smap to down or bring up nodes in 
    configure mode.  Added commands include allup, alldown, 
    up [range], down [range]
 -- Time format in sinfo/squeue/smap/sacct changed from D:HH:MM:SS to 
    D-HH:MM:SS per POSIX standards document.
 -- Treat scontrol update request without any requested changes as an 
    error condition.

* Changes in SLURM 0.7.0-pre3
=============================
 -- Restructure node states: DRAINING and DRAINED states are replaced 
    with a DRAIN flag. COMPLETING state is changed to a COMPLETING flag. 
 -- Test suite moved into testsuite/expect from separate repository.
 -- Added new document describing slurm APIs (doc/html/api.html).
 -- Permit nodes to be in multiple partitions simultaneously.

* Changes in SLURM 0.7.0-pre2
=============================
 -- New stdio protocol.  Now srun has just a single TCP stream to each node
    of a job-step.  srun and slurmd comminicate over the TCP stream using a
    simple messaging protocol.
 -- Added task plugin and use task prolog/epilog(s).
 -- New slurmd_step functionality added.  Fork exec instead of using shared
    memory.  Not completely tested.
 -- BGL small partition logic in place in plugin and smap.  Scheduler needs  
    to be rewritten to handle multiple partitions on a single node. No 
    documentation written on process yet.
 -- If running select/bluegene plugin without access to BGL DB2, then 
    full-system bglblock is of system size defined in bluegene.conf.

* Changes in SLURM 0.7.0-pre1
=============================
 -- Support defered initiation of job (e.g. srun --begin=11:30 ...).
 -- Add support for srun --cpus-per-task through task allocation in 
    slurmctld.
 -- fixed partition_allocator to work without curses
 -- made change to srun to start message thread before other threads 
    to make sure localtime doesn't interfere.   
 -- Added new RPCs for slurmctld REQUEST_TERMINATE_JOB or TASKS, 
    REQUEST_KILL_JOB/TASKS changed to REQUEST_SIGNAL_JOB/TASKS.
 -- Add support for e-mail notification on job state changes.
 -- Some infrastructure added for task launch controls (slurm.conf:
    TaskProlog, TaskEpilog, TaskPlugin; srun --task-prolog, --task-epilog).

* Changes in SLURM 0.6.9
========================
 -- Fix bug in mpi plugin to set the ID correctly
 -- Accounting bug causing segv fixed (Andy Riebs, 14oct.jobacct.patch)
 -- Fix for failed launch of a debugged job (e.g. bad executable name).
 -- Wiki plugin fix for tracking allocated nodes (Ernest Artiaga, BSC).

* Changes in SLURM 0.6.8
========================
 -- Invalid AllowGroup value in slurm.conf to not cause seg fault.
 -- Fix bug that would cause slurmctld to seg-fault with select/cons_res
    and batch job containing more than one step.

* Changes in SLURM 0.6.7
========================
 -- Make proctrack/linuxproc thread safe, could cause slurmd seg fault.
 -- Propagate umask from srun to spawned tasks.
 -- Fix problem in switch/elan error handling that could hang a slurmd 
    step manager process.
 -- Build on AIX with -bmaxdata:0x70000000 for memory limit more than 256MB.
 -- Restore srun's return code support.

* Changes in SLURM 0.6.6
========================
 -- Fix for bad socket close() in the spawn-io code.

* Changes in SLURM 0.6.5
========================
 -- Sacct to report on job steps that never actually started.
 -- Added proctrack/rms to elan rpm.
 -- Restructure slurmctld/agent.c logic to insure timely reaping of 
    terminating pthreads.
 -- Srun not to hang if job fails before task launches not all completed.
 -- Fix for consumable resources properly scheduling nodes that have more 
    nodes than configured (Susanne Balle, HP, cons_res_patch.10.14.2005)

* Changes in SLURM 0.6.4
========================
 -- Bluegene plugin drains an entire bglblock on repeated boot failures
    only if it has not identified a specific node as being bad.

* Changes in SLURM 0.6.3
========================
 -- Fix slurmctld mem leaks (step name and hostlist struct).
 -- Bluegene plugin sets end time for job terminated due to removed 
    bglblock.

* Changes in SLURM 0.6.2
========================
 -- Fix sinfo and squeue formatting to properly handle slurm nodes, 
    jobs, and other names containing "%".

* Changes in SLURM 0.6.1
========================
 -- Fixed smap -Db to display slurm partitions correctly (take 2).
 -- Add srun fork() retry logic for very heavily loaded system.
 -- Fix possible srun hang on task launch failure.
 -- Add support for mvapich v0.9.4, 0.9.5 and gen2.

* Changes in SLURM 0.6.0
========================
 -- Add documentation for ProctrackType=proctrack/rms.
 -- Make proctrack/rms be the default for switch/elan.
 -- Do not preceed SIGKILL or SIGTERM to job step with (non-requested) SIGCONT.
 -- Fixed smap -Db to display slurm partitions correctly.  
 -- Explicitly disallow ProctrackType=proctrack/linuxproc with 
    SwitchType=switch/elan. They will not work properly together.

* Changes in SLURM 0.6.0-pre8
=============================
 -- Remove debugging xassert in switch/federation that were accidentally
    committed
 -- Make slurmd step manager retry slurm_container_destroy() indefinitely
    instead of giving up after 30 seconds.  If something prevents a job
    step's processes from being killed, the job will be stuck in the
    completing until the container destroy succeeds.

* Changes in SLURM 0.6.0-pre7
=============================
 -- Disable localtime_r() calls from forked processes (semaphore set 
    in another pthread can deadlock calls to localtime_r made from 
    the forked process, this will be properly fixed in the next 
    major release of SLURM).
 -- Added SLURM_LOCALID environment variable for spawned tasks
    (Dan Palermo, HP).
 -- Modify switch logic to restore state based exclusively upon
    recovered job steps (not state save file).
 -- Gracefully refuse job if there are too many job steps in slurmd.
 -- Fix race condition in job completion that can leave nodes in 
    COMPLETING state after job is COMPLETED.
 -- Added frees for BGL BrigeAPI strdups that were to this point unknown.
 -- smap scrolls correctly for BGL systems.
 -- slurm_pid2jobid() API call will now return the jobid for a step
    manager slurmd process.

* Changes in SLURM 0.6.0-pre6
=============================
 -- Added logic to return scheduled nodes to Maui scheduler (David
    Jackson, Cluster Resources)
 -- Fix bug in handling job request with maximum node count.
 -- Fix node selection scheduling bug with heterogeneous nodes and
    srun --cpus-per-task option
 -- Generate error file to note prolog failures.

* Changes in SLURM 0.6.0-pre5
=============================
 -- Modify sfree (BGL command) so that --all option no longer requires
    an argument.
 -- Modify smap so it shows all nodes and partitions by default (even 
    nodes that the user can't access, otherwise there are holes in 
    its maps).
 -- Added module to parse time string (src/common/parse_time.c) for 
    future use.
 -- Fix BlueGene hostlist processing for non-rectangular prisms and
    add string length checking.
 -- Modify orphan batch job time calculation for BGL to account for 
    slowness when booting many bglblocks at the same time.

* Changes in SLURM 0.6.0-pre4
=============================
 -- Added etc/slurm.epilog.clean to kill processes initiated outside of 
    slurm when a user's last job on a node terminates.
 -- Added config.xml and configurator.html files for use by OSCAR.
 -- Increased maximum job step count from 64 to 130 for BGL systems only.

* Changes in SLURM 0.6.0-pre3
=============================
 -- Add code so job request for shared nodes gets explicitly requested 
    nodes, but lightly loaded nodes otherwise.
 -- Add job step name field.
 -- Add job step network specification field.
 -- Add proctrack/rms plugin
 -- Change the proctrack API to send a slurmd_job_t pointer to both
    slurm_container_create() and slurm_container_add().  One of those
    functions MUST set job->cont_id.
 -- Remove vestigial node_use (virtual or coprocessor) field from job
    request RPC.
 -- Fix mpich-gm bugs, thanks to Takao Hatazaki (HP).
 -- Fix code for clean build with gcc 2.96, Takao Hatazaki (HP).
 -- Add node update state of "RESUME" to return DRAINED, DRAINING, or