This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins. 
* Changes in SLURM 0.7.0-pre2
=============================
 -- New stdio protocol.  Now srun has just a single TCP stream to each node
    of a job-step.  srun and slurmd communicate over the TCP stream using a
    simple messaging protocol.
 -- Added task plugin and use task prolog/epilog(s).
 -- New slurmd_step functionality added.  Fork exec instead of using shared
    memory.  Not completely tested.
 -- BGL small partition logic in place in plugin and smap.  Scheduler needs  
    to be rewritten to handle multiple partitions on a single node. No 
    documentation written on process yet.
 -- If running select/bluegene plugin without access to BGL DB2, then 
    full-system bglblock is of system size defined in bluegene.conf.
* Changes in SLURM 0.7.0-pre1
=============================
 -- Support deferred initiation of job (e.g. srun --begin=11:30 ...).
 -- Add support for srun --cpus-per-task through task allocation in 
    slurmctld.
 -- fixed partition_allocator to work without curses
 -- made change to srun to start message thread before other threads 
    to make sure localtime doesn't interfere.   
 -- Added new RPCs for slurmctld REQUEST_TERMINATE_JOB or TASKS, 
    REQUEST_KILL_JOB/TASKS changed to REQUEST_SIGNAL_JOB/TASKS.
 -- Add support for e-mail notification on job state changes.
 -- Some infrastructure added for task launch controls (slurm.conf:
    TaskProlog, TaskEpilog, TaskPlugin; srun --task-prolog, --task-epilog).
* Changes in SLURM 0.6.7
========================
 -- Make proctrack/linuxproc thread safe, could cause slurmd seg fault.
 -- Propagate umask from srun to spawned tasks.
 -- Fix problem in switch/elan error handling that could hang a slurmd 
    step manager process.
* Changes in SLURM 0.6.6
========================
 -- Fix for bad socket close() in the spawn-io code.

* Changes in SLURM 0.6.5
========================
 -- Sacct to report on job steps that never actually started.
 -- Added proctrack/rms to elan rpm.
 -- Restructure slurmctld/agent.c logic to ensure timely reaping of 
    terminating pthreads.
 -- Srun no longer hangs if a job fails before all task launches complete.
 -- Fix for consumable resources properly scheduling nodes that have more 
    nodes than configured (Susanne Balle, HP, cons_res_patch.10.14.2005)

* Changes in SLURM 0.6.4
========================
 -- Bluegene plugin drains an entire bglblock on repeated boot failures
    only if it has not identified a specific node as being bad.

* Changes in SLURM 0.6.3
========================
 -- Fix slurmctld mem leaks (step name and hostlist struct).
 -- Bluegene plugin sets end time for job terminated due to removed 
    bglblock.

* Changes in SLURM 0.6.2
========================
 -- Fix sinfo and squeue formatting to properly handle slurm nodes, 
    jobs, and other names containing "%".

* Changes in SLURM 0.6.1
========================
 -- Fixed smap -Db to display slurm partitions correctly (take 2).
 -- Add srun fork() retry logic for very heavily loaded system.
 -- Fix possible srun hang on task launch failure.
 -- Add support for mvapich v0.9.4, 0.9.5 and gen2.

* Changes in SLURM 0.6.0
========================
 -- Add documentation for ProctrackType=proctrack/rms.
 -- Make proctrack/rms be the default for switch/elan.
 -- Do not precede SIGKILL or SIGTERM to job step with (non-requested) SIGCONT.
 -- Fixed smap -Db to display slurm partitions correctly.  
 -- Explicitly disallow ProctrackType=proctrack/linuxproc with 
    SwitchType=switch/elan. They will not work properly together.

* Changes in SLURM 0.6.0-pre8
=============================
 -- Remove debugging xasserts in switch/federation that were accidentally
    committed.
 -- Make slurmd step manager retry slurm_container_destroy() indefinitely
    instead of giving up after 30 seconds.  If something prevents a job
    step's processes from being killed, the job will be stuck in the
    completing state until the container destroy succeeds.

* Changes in SLURM 0.6.0-pre7
=============================
 -- Disable localtime_r() calls from forked processes (semaphore set 
    in another pthread can deadlock calls to localtime_r made from 
    the forked process, this will be properly fixed in the next 
    major release of SLURM).
 -- Added SLURM_LOCALID environment variable for spawned tasks
    (Dan Palermo, HP).
 -- Modify switch logic to restore state based exclusively upon
    recovered job steps (not state save file).
 -- Gracefully refuse job if there are too many job steps in slurmd.
 -- Fix race condition in job completion that can leave nodes in 
    COMPLETING state after job is COMPLETED.
 -- Added frees for BGL BridgeAPI strdups that were to this point unknown.
 -- smap scrolls correctly for BGL systems.
 -- slurm_pid2jobid() API call will now return the jobid for a step
    manager slurmd process.
* Changes in SLURM 0.6.0-pre6
=============================
 -- Added logic to return scheduled nodes to Maui scheduler (David
    Jackson, Cluster Resources)
 -- Fix bug in handling job request with maximum node count.
 -- Fix node selection scheduling bug with heterogeneous nodes and
    srun --cpus-per-task option
 -- Generate error file to note prolog failures.

* Changes in SLURM 0.6.0-pre5
=============================
 -- Modify sfree (BGL command) so that --all option no longer requires
    an argument.
 -- Modify smap so it shows all nodes and partitions by default (even 
    nodes that the user can't access, otherwise there are holes in 
 -- Added module to parse time string (src/common/parse_time.c) for 
    future use.
 -- Fix BlueGene hostlist processing for non-rectangular prisms and
    add string length checking.
 -- Modify orphan batch job time calculation for BGL to account for 
    slowness when booting many bglblocks at the same time.
* Changes in SLURM 0.6.0-pre4
=============================
 -- Added etc/slurm.epilog.clean to kill processes initiated outside of 
    slurm when a user's last job on a node terminates.
 -- Added config.xml and configurator.html files for use by OSCAR.
 -- Increased maximum job step count from 64 to 130 for BGL systems only.
* Changes in SLURM 0.6.0-pre3
=============================
 -- Add code so job request for shared nodes gets explicitly requested 
    nodes, but lightly loaded nodes otherwise.
 -- Add job step name field.
 -- Add job step network specification field.
 -- Add proctrack/rms plugin
 -- Change the proctrack API to send a slurmd_job_t pointer to both
    slurm_container_create() and slurm_container_add().  One of those
    functions MUST set job->cont_id.
 -- Remove vestigial node_use (virtual or coprocessor) field from job
    request RPC.
 -- Fix mpich-gm bugs, thanks to Takao Hatazaki (HP).
 -- Fix code for clean build with gcc 2.96, Takao Hatazaki (HP).
 -- Add node update state of "RESUME" to return DRAINED, DRAINING, or 
    DOWN node to service (IDLE or ALLOCATED state).
 -- smap keeps trying to connect to slurmctld in iterative mode rather 
    than just aborting on failure.
 -- Add squeue option --node to filter by node name.
 -- Modify squeue --user option to accept not only user names, but also
    user IDs.

* Changes in SLURM 0.6.0-pre2
=============================
 -- Removed "make rpm" target.
* Changes in SLURM 0.6.0-pre1
=============================
 -- Added bgl/partition_allocator/smap changes from 0.5.7.
 -- Added configurable resource limit propagation  (Daniel Christians, HP).
 -- Added mpi plugin, specified at start of srun.
 -- Changed SlurmUser ID from 16-bit to 32-bit.
 -- Added MpiDefault slurm.conf parameter.
 -- Remove KillTree configuration parameter (replace with
    "ProctrackType=proctrack/linuxproc")
 -- Remove MpichGmDirectSupport configuration parameter (replace with
    "MpiDefault=mpich-gm")
 -- Make default plugin be "none" for mpi.
 -- Added mpi/none plugin and made it the default.
 -- Replace extern program_invocation_short_name with program_invocation_name
    due to short name being truncated to 16 bytes on some systems.
 -- Added support for Elan clusters with different CPU counts on nodes
    (Chris Holmes, HP).
 -- Added Consumable Resources web page (Susanne Balle, HP).
 -- "Session manager" slurmd process has been eliminated.
 -- switch/federation fixes migrated from 0.5.*
 -- srun pthreads really set detached, fixes scaling problem
 -- srun spawns message handler process so it can now be stopped (via 
    Ctrl-Z or TotalView) without inducing failures.
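
    The two configuration parameter removals listed above could be
    reflected in slurm.conf roughly as sketched here.  Only the parameter
    names and the replacement values come from the entries above; the old
    values shown in the comments are assumptions for illustration, and
    MpiDefault=mpich-gm would presumably apply only to sites that
    previously relied on MpichGmDirectSupport.

```
# Before 0.6.0-pre1 (old values are assumed, for illustration only):
#   KillTree=YES
#   MpichGmDirectSupport=YES

# From 0.6.0-pre1 on, per the entries above:
ProctrackType=proctrack/linuxproc
MpiDefault=mpich-gm
```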

* Changes in SLURM 0.5.7
========================
 -- added infrastructure for (eventual) support of AIX checkpointing
    of slurm batch and interactive poe jobs
 -- added wiring for BGL to do wiring for physical location first and then
    logical.
 -- only one thread used to query database before polling thread is there.

* Changes in SLURM 0.5.6