This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins. 
* Changes in SLURM 0.7.0-pre4
=============================
 -- Remove BNR library functions and add those for PMI (KVS and basic
    MPI-1 functions only for now)
 -- Added hostfile support for POE and srun.  Set the MP_HOSTFILE env var to
    the location of the hostfile.  Tasks run in the order listed in the file.
 -- Removes the slurmd's use of SysV shared memory.  Instead the slurmd
    communicates with the slurmstepd processes through the slurmstepd's
    new named unix domain socket.  The "stepd_api" is used to talk to the
    slurmstepd (src/slurmd/common/stepd_api.[ch]).
 -- Bluegene specific - bluegene block allocator can now find almost any
    partition size.  Added support to start at any point in smap
    when requesting a partition instead of always starting at 000.
 -- Bluegene specific - Added smap support to bring nodes down or up in
    configure mode.  Added commands include allup, alldown,
    up [range], down [range]
 -- Time format in sinfo/squeue/smap/sacct changed from D:HH:MM:SS to 
    D-HH:MM:SS per POSIX standards document.
 -- Treat scontrol update request without any requested changes as an 
    error condition.
* Changes in SLURM 0.7.0-pre3
=============================
 -- Restructure node states: DRAINING and DRAINED states are replaced 
    with a DRAIN flag. COMPLETING state is changed to a COMPLETING flag. 
 -- Test suite moved into testsuite/expect from separate repository.
 -- Added new document describing slurm APIs (doc/html/api.html).
 -- Permit nodes to be in multiple partitions simultaneously.
* Changes in SLURM 0.7.0-pre2
=============================
 -- New stdio protocol.  Now srun has just a single TCP stream to each node
    of a job-step.  srun and slurmd communicate over the TCP stream using a
    simple messaging protocol.
 -- Added task plugin and use task prolog/epilog(s).
 -- New slurmd_step functionality added.  Fork exec instead of using shared
    memory.  Not completely tested.
 -- BGL small partition logic in place in plugin and smap.  Scheduler needs  
    to be rewritten to handle multiple partitions on a single node. No 
    documentation written on process yet.
 -- If running select/bluegene plugin without access to BGL DB2, then 
    full-system bglblock is of system size defined in bluegene.conf.
* Changes in SLURM 0.7.0-pre1
=============================
 -- Support deferred initiation of job (e.g. srun --begin=11:30 ...).
 -- Add support for srun --cpus-per-task through task allocation in 
    slurmctld.
 -- Fixed partition_allocator to work without curses.
 -- Changed srun to start the message thread before other threads
    to make sure localtime doesn't interfere.
 -- Added new RPCs for slurmctld: REQUEST_TERMINATE_JOB or TASKS.
    REQUEST_KILL_JOB/TASKS changed to REQUEST_SIGNAL_JOB/TASKS.
 -- Add support for e-mail notification on job state changes.
 -- Some infrastructure added for task launch controls (slurm.conf:
    TaskProlog, TaskEpilog, TaskPlugin; srun --task-prolog, --task-epilog).
* Changes in SLURM 0.6.9
========================
 -- Fix bug in mpi plugin to set the ID correctly
 -- Accounting bug causing segv fixed (Andy Riebs, 14oct.jobacct.patch)
 -- Fix for failed launch of a debugged job (e.g. bad executable name).
 -- Wiki plugin fix for tracking allocated nodes (Ernest Artiaga, BSC).
* Changes in SLURM 0.6.8
========================
 -- Invalid AllowGroup value in slurm.conf no longer causes a seg fault.
 -- Fix bug that would cause slurmctld to seg-fault with select/cons_res
    and batch job containing more than one step.
* Changes in SLURM 0.6.7
========================
 -- Make proctrack/linuxproc thread safe; the lack of thread safety could
    cause a slurmd seg fault.
 -- Propagate umask from srun to spawned tasks.
 -- Fix problem in switch/elan error handling that could hang a slurmd 
    step manager process.
 -- Build on AIX with -bmaxdata:0x70000000 for memory limit more than 256MB.
 -- Restore srun's return code support.
* Changes in SLURM 0.6.6
========================
 -- Fix for bad socket close() in the spawn-io code.

* Changes in SLURM 0.6.5
========================
 -- Sacct to report on job steps that never actually started.
 -- Added proctrack/rms to elan rpm.
 -- Restructure slurmctld/agent.c logic to ensure timely reaping of
    terminating pthreads.
 -- Srun no longer hangs if a job fails before all task launches have completed.
 -- Fix for consumable resources properly scheduling nodes that have more 
    nodes than configured (Susanne Balle, HP, cons_res_patch.10.14.2005)

* Changes in SLURM 0.6.4
========================
 -- Bluegene plugin drains an entire bglblock on repeated boot failures
    only if it has not identified a specific node as being bad.

* Changes in SLURM 0.6.3
========================
 -- Fix slurmctld mem leaks (step name and hostlist struct).
 -- Bluegene plugin sets end time for job terminated due to removed 
    bglblock.

* Changes in SLURM 0.6.2
========================
 -- Fix sinfo and squeue formatting to properly handle slurm nodes, 
    jobs, and other names containing "%".

* Changes in SLURM 0.6.1
========================
 -- Fixed smap -Db to display slurm partitions correctly (take 2).
 -- Add srun fork() retry logic for very heavily loaded system.
 -- Fix possible srun hang on task launch failure.
 -- Add support for mvapich v0.9.4, 0.9.5 and gen2.

* Changes in SLURM 0.6.0
========================
 -- Add documentation for ProctrackType=proctrack/rms.
 -- Make proctrack/rms be the default for switch/elan.
 -- Do not precede SIGKILL or SIGTERM to job step with (non-requested) SIGCONT.
 -- Fixed smap -Db to display slurm partitions correctly.  
 -- Explicitly disallow ProctrackType=proctrack/linuxproc with 
    SwitchType=switch/elan. They will not work properly together.

* Changes in SLURM 0.6.0-pre8
=============================
 -- Remove debugging xasserts in switch/federation that were accidentally
    committed.
 -- Make slurmd step manager retry slurm_container_destroy() indefinitely
    instead of giving up after 30 seconds.  If something prevents a job
    step's processes from being killed, the job will be stuck in the
    COMPLETING state until the container destroy succeeds.

* Changes in SLURM 0.6.0-pre7
=============================
 -- Disable localtime_r() calls from forked processes (semaphore set 
    in another pthread can deadlock calls to localtime_r made from 
    the forked process, this will be properly fixed in the next 
    major release of SLURM).
 -- Added SLURM_LOCALID environment variable for spawned tasks
    (Dan Palermo, HP).
 -- Modify switch logic to restore state based exclusively upon
    recovered job steps (not state save file).
 -- Gracefully refuse job if there are too many job steps in slurmd.
 -- Fix race condition in job completion that can leave nodes in 
    COMPLETING state after job is COMPLETED.
 -- Added frees for BGL Bridge API strdups that were previously unknown.
 -- smap scrolls correctly for BGL systems.
 -- slurm_pid2jobid() API call will now return the jobid for a step
    manager slurmd process.
* Changes in SLURM 0.6.0-pre6
=============================
 -- Added logic to return scheduled nodes to Maui scheduler (David
    Jackson, Cluster Resources)
 -- Fix bug in handling job request with maximum node count.
 -- Fix node selection scheduling bug with heterogeneous nodes and
    srun --cpus-per-task option
 -- Generate error file to note prolog failures.

* Changes in SLURM 0.6.0-pre5
=============================
 -- Modify sfree (BGL command) so that --all option no longer requires
    an argument.
 -- Modify smap so it shows all nodes and partitions by default (even 
    nodes that the user can't access, otherwise there are holes in 
 -- Added module to parse time string (src/common/parse_time.c) for 
    future use.
 -- Fix BlueGene hostlist processing for non-rectangular prisms and
    add string length checking.
 -- Modify orphan batch job time calculation for BGL to account for 
    slowness when booting many bglblocks at the same time.
* Changes in SLURM 0.6.0-pre4
=============================
 -- Added etc/slurm.epilog.clean to kill processes initiated outside of 
    slurm when a user's last job on a node terminates.
 -- Added config.xml and configurator.html files for use by OSCAR.
 -- Increased maximum job step count from 64 to 130 for BGL systems only.
* Changes in SLURM 0.6.0-pre3
=============================
 -- Add code so job request for shared nodes gets explicitly requested 
    nodes, but lightly loaded nodes otherwise.
 -- Add job step name field.
 -- Add job step network specification field.
 -- Add proctrack/rms plugin
 -- Change the proctrack API to send a slurmd_job_t pointer to both
    slurm_container_create() and slurm_container_add().  One of those
    functions MUST set job->cont_id.
 -- Remove vestigial node_use (virtual or coprocessor) field from job
    request RPC.
 -- Fix mpich-gm bugs, thanks to Takao Hatazaki (HP).
 -- Fix code for clean build with gcc 2.96, Takao Hatazaki (HP).
 -- Add node update state of "RESUME" to return DRAINED, DRAINING, or 