This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 1.2.2
========================
-- srun --get-user-env now sends su's stderr to /dev/null
-- MPICHGM support bug fixes from Ernest Artiaga, BSC.
-- Support longer hostlist strings, from Ernest Artiaga, BSC.
-- Srun to use env vars for SLURM_PROLOG, SLURM_EPILOG, SLURM_TASK_PROLOG,
and SLURM_TASK_EPILOG. patch.1.2.0-pre11.070201.envproepilog from
Dan Palermo, HP.
-- Documentation update. patch.1.2.0-pre11.070201.mchtml from Dan Palermo, HP.
-- Set SLURM_DIST_CYCLIC = 1 (needed for HP MPI, slurm.hp.env.patch).
* Changes in SLURM 1.2.0-pre15
==============================
-- Fix for another spot where the backup controller calls switch/federation
code before switch/federation is initialized.
* Changes in SLURM 1.2.0-pre14
==============================
-- In sched/wiki2, clear required nodes list when a job is requeued.
Note that the required node list is set to every node used when
a job is started via sched/wiki2.
-- BLUEGENE - Added display of deallocating blocks to smap and other tools.
-- Make slurmctld's working directory be same as SlurmctldLogFile (if any),
otherwise StateSaveDir (which is likely a shared directory, possibly
making core file identification more difficult).
-- Fix bug in switch/federation that results in the backup controller
aborting if it receives an epilog-complete message.
* Changes in SLURM 1.2.0-pre13
==============================
-- Fix for --get-user-env.
* Changes in SLURM 1.2.0-pre12
==============================

-- BLUEGENE - Added correct node info for sinfo and sview for viewing
allocated nodes in a partition.
-- BLUEGENE - Added state save on slurmctld shutdown of blocks in an error
state on real systems and total block config on emulation systems.
-- Major update to Slurm's PMI internal logic for better scalability.
Communications now supported directly between application tasks via
Slurm's PMI library. Srun sends single message to one task on each node
and that task forwards key-pairs to other tasks on that node. The old
code sent key-pairs directly to each task.
NOTE: PMI applications must re-link with this new library.
-- For multi-core support: Fix task distribution bug and add automated
tests, patch.1.2.0-pre11.070111.plane from Dan Palermo (HP).
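Illustration (not SLURM code): the PMI change described in this release can be pictured by counting the messages that srun itself has to send. The Python sketch below assumes a hypothetical layout given as a list of task counts per node; all function names are invented for this example.

```python
def srun_messages_old(tasks_per_node):
    """Old scheme: srun sends key-pairs directly to every task."""
    return sum(tasks_per_node)


def srun_messages_new(tasks_per_node):
    """New scheme: srun sends a single message to one task on each node."""
    return sum(1 for count in tasks_per_node if count > 0)


def local_forwards(tasks_per_node):
    """Forwards done on the nodes: the receiving task passes the
    key-pairs on to the other tasks on its own node."""
    return sum(count - 1 for count in tasks_per_node if count > 0)


layout = [8, 8, 8, 8]  # hypothetical job: 4 nodes, 8 tasks on each
print(srun_messages_old(layout), srun_messages_new(layout), local_forwards(layout))
```

For this layout srun goes from 32 outgoing messages to 4, with the remaining 28 deliveries handled locally on the nodes.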
* Changes in SLURM 1.2.0-pre11
==============================
-- Add multi-core options to slurm_step_launch API.
-- Add man pages for slurm_step_launch() and related functions.

-- Jobacct plugin only looks at the proctrack list instead of the entire
list of processes running on the node. This cuts down a lot of unnecessary
file opens in linux and cuts the time to query the procs by
more than half.
-- Multi-core bug fix, mask re-use with multiple job steps,
patch.1.2.0-pre10.061214.affinity_stepid from Dan Palermo (HP).
-- Modify jobacct/linux plugin to completely eliminate open /proc files.
-- Added slurm_sched_plugin_reconfig() function to re-read config files.
-- BLUEGENE - --reboot option to srun, salloc, and sbatch actually works.
-- Modified step context and step launch APIs.
* Changes in SLURM 1.2.0-pre10
==============================
-- Fix for sinfo node state counts by state (%A and %F output options).
-- Add ability to change a node's features via "scontrol update". NOTE:
Update slurm.conf also to preserve changes over slurmctld restart or
reconfig.
NOTE: Job and node state information can not be preserved from earlier
versions.
-- Added new slurm.conf parameter TaskPluginParam.
-- Fix for job requeue and credential revoke logic from Hongjia Cao (NUDT).
-- Fix for incorrectly generated masks for task/affinity plugin,
patch.1.2.0-pre9.061207.bitfmthex from Dan Palermo (HP).
-- Make mask_cpu options of srun and slaunch commands not require prefix
of "0x". patch.1.2.0-pre9.061208.srun_maskparse from Dan Palermo (HP).
-- Add -c support to the -B automatic mask generation for multi-core
support, patch.1.2.0-pre9.061208.mcore_cpuspertask from Dan Palermo (HP).
-- Fix bug in MASK_CPU calculation,
patch.1.2.0-pre9.061211.avail_cpuspertask from Dan Palermo (HP).
-- BLUEGENE - Added --reboot option to srun, salloc, and sbatch commands.
-- Add "scontrol listpids [JOBID[.STEPID]]" support.
-- Multi-core support patches, fixed SEGV and clean up output for large
task counts, patch.1.2.0-pre9.061212.cpubind_verbose from Dan Palermo (HP).
-- Make sure jobacct plugin files are closed before exec of user tasks to
prevent problems with job checkpoint/restart (based on work by
Hongjia Cao, NUDT).
* Changes in SLURM 1.2.0-pre9
=============================
-- Fix for select/cons_res state preservation over slurmctld restart,
patch.1.2.0-pre7.061130.cr_state from Dan Palermo.
-- Validate product of socket*core*thread count on node registration rather
than individual values. Correct values will need to be specified in slurm.conf
with FastSchedule=1 for correct multi-core scheduling behavior.
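Illustration (not SLURM code): a minimal Python sketch of validating the socket*core*thread product instead of the individual values. The Topology class and registration_ok() are hypothetical names, and treating "at least as many CPUs as configured" as the passing condition is an assumption; the entry above only says that the product is what gets validated.

```python
from dataclasses import dataclass


@dataclass
class Topology:
    sockets: int
    cores: int    # cores per socket
    threads: int  # threads per core

    def cpus(self) -> int:
        """Total logical CPUs: sockets * cores * threads."""
        return self.sockets * self.cores * self.threads


def registration_ok(configured: Topology, reported: Topology) -> bool:
    """Compare only the products, not the individual counts.

    Assumption: a node passes when it reports at least as many
    CPUs as slurm.conf configures for it.
    """
    return reported.cpus() >= configured.cpus()


# Individual values differ (2x4x1 vs 1x8x1) but the products match.
print(registration_ok(Topology(2, 4, 1), Topology(1, 8, 1)))
```

A check of this kind accepts a node whose layout is described differently from slurm.conf, which is why the entry asks for correct individual values to be specified when multi-core scheduling depends on them.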
* Changes in SLURM 1.2.0-pre8
=============================
-- Modify job state "reason" field to report why a job failed (previously
reported only reason waiting to run). Requires cold-start of
slurmctld (-c option).
-- For sched/wiki2 job state request, return REJMESSAGE= with reason for
a job's failure.
-- New FastSchedule configuration parameter option "2" means to base
scheduling decisions upon the node's configuration as specified in
slurm.conf and ignore the node's actual hardware configuration. This
can be useful for testing.
-- Add sinfo output format option "%C" for CPUs (active/idle/other/total).
Based upon work by Anne-Marie Wunderlin (BULL).
-- Assorted multi-core bug fixes (patch1.2.0-pre7.061128.mcorefixes).
-- Report SelectTypeParameters from "scontrol show config".
-- Build sched/wiki plugin for Maui Scheduler (based upon new sched/wiki2
code for Moab Scheduler).

-- BLUEGENE - changed way of keeping track of smaller partitions using
ionode range instead of quarter nodecard notation.
(i.e. bgl000[0-3] instead of bgl000.0.0)
-- Patch from Hongjia Cao (EINPROGRESS error message change)
-- Fix for correct requid for jobacct plugin
-- Added subsec timing display for sacct
* Changes in SLURM 1.2.0-pre7
=============================
-- BLUEGENE - added configurable images for bluegene block creation.
-- Support processors, core, and physical IDs that are not in numeric
order (in slurmd when gathering node state information, based on patch
by Don Albert, Bull).
-- Fixed bug with aix not looking in the correct dir for the proctrack
include files
-- Removed global_srun.* from common, merged it into srun proper
-- Added bluegene section to troubleshooting guide (web page).
-- NOTE: Requires cold-start when moving from 1.2.0-pre6, save state
info for jobs changed.
-- BLUEGENE - Changed logic for wiring bgl blocks to be more maintainable.
(Haven't tested on large system yet, works on 2 base partition system)
-- Do not read the select/cons_res state save file if slurmctld is
cold-started (with the "-c" option).
* Changes in SLURM 1.2.0-pre6
=============================
-- Maintain actual job step run time with suspend/resume use.
-- Allow slurm.conf options to appear multiple times. SLURM will use the
last instance of any particular option.
-- Add version number to node state save file. Will not recover node
state information on restart from older version.
-- Add logic to save/restore multi-core state information.
-- Updated multi-core logic to use types uint16_t and uint32_t instead
of just type int.
-- Race condition for forwarding logic fix from Hongjia Cao
-- Add support for Portable Linux Processor Affinity (PLPA, see
http://www.open-mpi.org/software/plpa).
-- When a job epilog completes on all non-DOWN nodes, immediately purge
its job steps that lack switch windows. Needed for LSF operation.
Based upon slurm.hp.node_fail.patch.
-- Modify srun to ignore entries on --nodelist for job step creation
if their count exceeds the task count. Based on slurm.hp.srun.patch.
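Illustration (not SLURM code): the "last instance wins" rule for repeated slurm.conf options in this release can be sketched in Python as below. The parser is deliberately simplified and hypothetical: it assumes one Key=Value pair per line and "#" comments, which is narrower than what a real slurm.conf may contain.

```python
def parse_options(lines):
    """Collect Key=Value options; a later instance replaces an earlier one."""
    options = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if "=" not in line:
            continue  # blank line or something this sketch does not handle
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()  # overwrite any earlier value
    return options


conf = [
    "FastSchedule=1",
    "MailProg=/bin/mail   # hypothetical path",
    "FastSchedule=2",
]
print(parse_options(conf)["FastSchedule"])
```

Here the second FastSchedule line replaces the first, so the value used is 2.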
* Changes in SLURM 1.2.0-pre5
=============================
-- Patch from HP patch.1.2.0.pre4.061017.crcore_hints, supports cores as
consumable resource.
-- Added node_inx to job_step_info_t to get the node indices for mapping out
steps in a job by nodes.
-- sview grid added
-- BLUEGENE node_inx added to blocks for reference.
-- Automatic CPU_MASK generation for task launch, new srun option -B.
-- Automatic logical to physical processor identification and mapping.
-- Added new srun options to --cpu_bind: sockets, cores, and threads
-- Updated select/cons_res to operate at socket granularity.
-- New srun task distribution options to -m: plane
-- Multi-core support in sinfo, squeue, and scontrol.
-- Memory can be treated as a consumable resource.
-- New srun options --ntasks-per-[node|socket|core].
* Changes in SLURM 1.2.0-pre3
=============================
-- Remove configuration parameter SchedulerAuth (defunct).
-- Add NextJobId to "scontrol show config" output.
-- Add new slurm.conf parameter MailProg.
-- New forwarding logic. New receive_msg functions depending on what you
are expecting to get back. No srun_node_id anymore passed around in
a slurm_msg_t
-- Remove sched/wiki plugin (use sched/wiki2 for now)
-- Disable pthread_create() for PMI_send when TotalView is running for
better performance.
-- Fixed certain tests in test suite to not run with bluegene or front-end
systems
-- Removed addresses from slurm_step_layout_t
-- Added new job field, "comment". Set by srun, salloc and sbatch. See
with "scontrol show job". Used in sched/wiki2.
-- Report a job's exit status in "scontrol show job".
-- In sched/wiki2: add support for JOBREQUEUE command.
* Changes in SLURM 1.2.0-pre2
=============================
-- Added function slurm_init_slurm_msg to be used to init any slurm_msg_t;
you no longer need to do any other type of initialization to the type.
* Changes in SLURM 1.2.0-pre2
=============================
-- Fixed task dist to work with hostfile and warn about asking for more tasks
than you have nodes for in arbitrary mode.
-- Added "account" field to job and step accounting information and sacct output.

-- Moved task layout to slurmctld instead of srun. Job step create returns
step_layout structure with hostnames and addresses that correspond
to those nodes.
-- Changed api slurm_lookup_allocation params,
resource_allocation_response_msg_t changed to job_alloc_info_response_msg_t
this structure is being renamed so contents are the same.
-- alter resource_allocation_response_msg_t see slurm.h.in
-- remove old_job_alloc_msg_t and function slurm_confirm_alloc
-- Slurm configuration files now support an "Include" directive to
include other files inline.

-- BLUEGENE New --enable-bluegene-emulation configure parameter to allow
running system in bluegene emulation mode. Only
really useful for developers.
-- Added new tool sview, a GUI for displaying slurm info.
-- fixed bug in step layout to lay out tasks correctly
* Changes in SLURM 1.2.0-pre1
=============================
-- Fix bug that could run a job's prolog more than once
-- Permit batch jobs to be requeued, scontrol requeue <jobid>
-- Send overcommit flag from srun in RPCs and have slurmd set SLURM_OVERCOMMIT
flag at batch job launch time.
-- Added new configuration parameter MessageTimeout (replaces #define in
the code)
* Changes in SLURM 1.1.33
=========================
- sched/wiki - Do not wait for job completion before permitting
additional jobs to be scheduled.
* Changes in SLURM 1.1.32
=========================
- If a job's stdout/err file names are unusable (bad path), use the
default names.
- sched/wiki - Fix logic to be compatible with select/cons_res plugin
for allocating individual processors within nodes.
- Fix job end time calculation when changed from an initial value of
INFINITE.

* Changes in SLURM 1.1.31
=========================
- Correctly identify a user's login shell when running "srun -b --uid"
as root. Use the --uid field for the /etc/passwd lookup instead of
getuid().
* Changes in SLURM 1.1.30
=========================

- Fix to make sure users don't include and exclude the same node in
their srun line.
- mpi/mvapich: Forcibly terminate job 60s after first MPI_Abort()
to avoid waiting indefinitely for hung processes.
- proctrack/sgi_job: Fix segv when destroying an active job container
with processes still running.
- Abort a job's stdout/err to srun if not processed within 5 minutes
(prevents node hanging in completing state if the srun is stopped).
* Changes in SLURM 1.1.29
=========================
- Fix bug which could leave orphan process put into background from
batch script.
* Changes in SLURM 1.1.28
=========================
- BLUEGENE - Fixed issue with nodes that return to service outside of an
admin state; their state is now updated in the bluegene plugin.
- Fix for --get-user-env parsing of non-printing characters in users' logins.
- Restore "squeue -n localhost" support.
- Report lack of PATH env var as verbose message, not error in srun.
* Changes in SLURM 1.1.27
=========================
- Fix possible race condition for two simultaneous "scontrol show config"
calls resulting in slurm_xfree() Error: from read_config.c:642
- BLUEGENE - Put back logic to make a block fail a boot 3 times before
cancelling a user's job.
- Fix problem using srun --exclude option for a job step.
- Fix problem generating slurmd error "Unrecognized request: 0" with
some compilers.
* Changes in SLURM 1.1.26
=========================
- In sched/wiki2, fixes for support of job features.
- In sched/wiki2, add "FLAGS=INTERACTIVE;" to GETJOBS response for
non-batch (not srun --batch) jobs.
* Changes in SLURM 1.1.25
=========================
- switch/elan: Fix for "Failed to initialise stats structure" from
libelan when ELAN_STATKEY > MAX_INT.
- Tune PMI support logic for better scalability and performance.
- Fix for running a task on each node of an allocation if not specified.
- In sched/wiki2, set TASKLIST for running jobs.
- In sched/wiki2, set STARTDATE for pending jobs with deferred start.
- Added srun --get-user-env option (for Moab scheduler).
* Changes in SLURM 1.1.24
=========================
- In sched/wiki2, add support for direct "srun --dependency=" use.
- mpi/mvapich: Add support for MVAPICH protocol version 6.
- In sched/wiki2, change "JOBMODIFY" command to "MODIFYJOB".
- In sched/wiki2, change "JOBREQUEUE" command to "REQUEUEJOB".
- For sched/wiki2, permit normal user to specify arbitrary job id.
- In sched/wiki2, set buffer pointer to NULL after free() to avoid
possible memory corruption.
- In sched/wiki2, report a job's exit code on completion.
- For AIX, fix mail for job event notification.
- Add documentation for propagation options in man srun and slurm.conf.
* Changes in SLURM 1.1.23
=========================
- Fix bug in non-blocking connect() code affecting AIX.

* Changes in SLURM 1.1.22
=========================
- Add squeue option to print a job step's task count (-o %A).
- Initialize forward_struct to avoid trying to free a bad pointer,
patch from Anton Blanchard (SAMBA).
- In sched/wiki2, fix fatal race condition on slurmctld startup.
- Fix for displaying launching verbose messages for each node under the
tree instead of just the head one.
- Fix job suspend bug, job accounting plugin would SEGV when given a
bad job ID.
* Changes in SLURM 1.1.21
=========================
- BLUEGENE - Wait on a fini to make sure all threads are finished before
cleaning up.
- BLUEGENE - replacements to not destroy lists but just empty it to avoid
losing the pointer to the list in the block allocator.
- BLUEGENE - added --enable-bluegene-emulation configure option to 1.1
- In sched/wiki2, enclose a job's COMMENT value in double quotes.
- In sched/wiki2, support newly defined SIGNALJOB command.
- In sched/wiki2, maintain open event socket, don't open and close
for each event.
- In sched/wiki2, fix for scalability problem starting large jobs.
- Fix logic to execute a batch job step (under an existing resource
allocation) as needed by LSF.
- Patches from Hongjia Cao (pmi finalize issues and type declaration)
- Delete pending job if its associated partition is deleted.
- fix for handling batch steps completing correctly and setting the
return code.
- Altered ncurses check to make sure programs can link before saying we
have a working curses lib and header.
- Fixed an init issue with forward_struct_init not being set correctly in
a few locations in the slurmd.
- Fix for user to use the NodeHostname (when specified in the slurm.conf file)
to start jobs on.

* Changes in SLURM 1.1.20
=========================
- Added new SPANK plugin hook slurm_spank_local_user_init() called
from srun after node allocation.
- Fixed bug with hostfile support not working on a direct srun
* Changes in SLURM 1.1.19
=========================
- BLUEGENE - make sure the order of blocks read in from the bluegene.conf
are created in that order (static mode).
- Fix logic in connect(), slurmctld fail-over was broken in v1.1.18.
- Fix logic to calculate the correct timeout for fan out.
* Changes in SLURM 1.1.18
=========================
- In sched/wiki2, add support for EHost and EHostBackup configuration
parameters in wiki.conf file
- In sched/wiki2, fix memory management bug for JOBWILLRUN command.
- In sched/wiki2, consider job Busy while in Completing state for
KillWait+10 seconds (used to be 30 seconds).
- BLUEGENE - Fixes to allow full block creation on the system and not to add
passthrough nodes to the allocation when creating a block.
- BLUEGENE - Fix deadlock issue with starting and failing jobs at the same
time
- Make connect() non-blocking and poll() with timeout to avoid huge
waits under some conditions.
- Set "ENVIRONMENT=BATCH" environment variable for "srun --batch" jobs only.
- Add logic to save/restore select/cons_res state information.

- BLUEGENE - make all sprintf's into snprintf's

- Fix for "srun -A" segfault on a node failure.
* Changes in SLURM 1.1.17
=========================
- BLUEGENE - fix to make dynamic partitioning not go create block where
there are nodes that are down or draining.
- Fix srun's default node count with an existing allocation when neither
SLURM_NNODES nor -N are set.
- Stop srun from setting SLURM_DISTRIBUTION under job steps when a
specific distribution was not explicitly requested by the user.
* Changes in SLURM 1.1.16
=========================
- BLUEGENE - fix to make prolog run 5 minutes longer to make sure we have
enough time to free the overlapping blocks when starting a new job on a
block.
- BLUEGENE - edit to the libsched_if.so to read env and look at
MPIRUN_PARTITION to see if we are in slurm or running mpirun natively.
- Plugins are now dlopened RTLD_LAZY instead of RTLD_NOW.
* Changes in SLURM 1.1.15
=========================
- BLUEGENE - fix to be able to create static partitions
- Fixed fanout timeout logic.
- Fix for slurmctld timeout on outgoing message (Hongjia Cao, NUDT.edu.cn).
* Changes in SLURM 1.1.14
=========================
- In sched/wiki2: report job/node id and state only if no changes since
time specified in request.
- In sched/wiki2: include a job's exit code in job state information.
- In sched/wiki2: add event notification logic on job submit and completion.
- In sched/wiki2: add support for JOBWILLRUN command type.
- In sched/wiki2: for job info, include required HOSTLIST if applicable.
- In sched/wiki2: for job info, replace PARTITIONMASK with RCLASS (report
partition name associated with a job, but no task count)
- In sched/wiki2: for job and node info, report all data if TS==0,
volatile data if TS<=update_time, state only if TS>update_time
- In sched/wiki2: add support for CMD=JOBSIGNAL ARG=jobid SIGNAL=name or #
- In sched/wiki2: add support for CMD=JOBMODIFY ARG=jobid [BANK=name]
[TIMELIMIT=minutes] [PARTITION=name]
- In sched/wiki2: add support for CMD=INITIALIZE ARG=[USEHOSTEXP=T|F]
[EPORT=#]; RESPONSE=EPORT=# USEHOSTEXP=T
- In sched/wiki2: fix memory leak.
- Fix sinfo node state filtering when asking for idle nodes that are also
draining.
- Add Fortran extension to slurm_get_rem_time() API.
- Fix bug when changing the time limit of a running job that has previously
been suspended (formerly failed to account for suspend time in setting
termination time).
- fix for step allocation to be able to specify only a few nodes in a
step and ask for more than specified.
- patch from Hongjia Cao for forwarding logic
- BLUEGENE - able to allocate specific nodes without locking up.
- BLUEGENE - better tracking of blocks that are created dynamically,
less hitting the db2.
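Illustration (not SLURM code): the sched/wiki2 rule in this release for how much job and node data to report, based on the request timestamp TS, can be written as a small decision function. The Python sketch below is hypothetical; the function name and the returned labels are invented, and only the three comparisons come from the entry above.

```python
def wiki2_report_level(ts: int, update_time: int) -> str:
    """Pick how much data to report for a record.

    ts          -- timestamp sent with the scheduler's request
    update_time -- time the record was last changed
    """
    if ts == 0:
        return "all"       # TS==0: report all data
    if ts <= update_time:
        return "volatile"  # changed since TS: report volatile data
    return "state"         # TS>update_time: report state only


print(wiki2_report_level(0, 1000))
print(wiki2_report_level(900, 1000))
print(wiki2_report_level(1100, 1000))
```

The TS==0 test has to come first, since 0 would otherwise also satisfy TS<=update_time.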
* Changes in SLURM 1.1.13
=========================
- Fix hang in sched/wiki2 if Moab stops responding when
response is outgoing.
- BLUEGENE - fix to make sure the block is good to go when picking it
- BLUEGENE - add libsched_if.so so mpirun doesn't try to create a block
by itself.
- Enable specification of srun --jobid=# option with --batch (for user root).
- Verify that job actually starts when requested by sched/wiki2.
- Add new wiki.conf parameters: EPort and JobAggregationTime for event
notification logic (see wiki.conf man page for details)
* Changes in SLURM 1.1.12
=========================
- Sched/wiki2 to report a job's account as COMMENT response to GETJOBS
request.
- Add srun option "--comment" (maps to job account until slurm v1.2,
needed for Moab scheduler functionality).
- fixed some timeout issues in the controller hopefully stopping all the
issues with excessive timeouts.
- unit conversion (i.e. 1024 => 1k) only happens on bgl systems for node
count.
- Sched/wiki2 to report a job's COMPLETETIME and SUSPENDTIME in GETJOBS
response.
- Added support for Mellanox's version of mvapich-0.9.7.
* Changes in SLURM 1.1.11
=========================
- Update file headers adding permission to link with OpenSSL.
- Enable sched/wiki2 message authentication.
- Fix libpmi compilation issue.
- Remove "gcc-c++ python" from slurm.spec BuildRequires. It breaks
the AIX build, so we'll have to find another way to deal with that.
* Changes in SLURM 1.1.10
=========================
-- task distribution fix for steps that are smaller than job allocation.
-- BLUEGENE - fix to only send a success when block was created when trying
to allocate the block.
-- fix so if slurm_send_recv_node_msg fails on the send the auth_cred returned
by the resp is NULL.
-- Fix switch/federation plugin so backup controller can assume control
repeatedly without leaking or corrupting memory.
-- Add new error code (for Maui/Moab scheduler): ESLURM_JOB_HELD
-- Tweak slurmctld's node ping logic to better handle failed nodes with
hierarchical communications fail-over logic.
-- Add support for sched/wiki specific configuration file "wiki.conf".
-- Added sched/wiki2 plugin (new experimental wiki plugin).
* Changes in SLURM 1.1.9
========================

-- BLUEGENE - fix to handle a NO_VAL sent in as num procs in the job
description.
-- Fix bug in slurmstepd code for parsing --multi-prog command script.
Parser was failing for commands with no arguments.
-- Fix bug to check unsigned ints correctly in bitstring.c
-- Alter node count conversion to kilo to only convert numbers divisible by
1024 or 512
* Changes in SLURM 1.1.8
========================
-- Added bug fixes (fault-tolerance and memory leaks) from Hongjia Cao
<hjcao@nudt.edu.cn>
-- Fixed some potential BLUEGENE issues with the bridge log file not having
a mutex around the fclose and fopen.
-- BLUEGENE - srun -n procs now registers correctly
-- Fixed problem with reattach double allocating step_layout->tids
-- BLUEGENE - fix race condition where job is finished before it starts.
* Changes in SLURM 1.1.7
========================
-- BLUEGENE - fixed issue with doing an allocation for nodes since asking
for 32,128, or 512 all mean 1 to the controller.

-- Add "Include" directive to slurm.conf files. If "Include" is found
at the beginning of a line followed by whitespace and then
the full path to a file, that file is included inline with the current
slurm.conf file.
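Illustration (not SLURM code): the "Include" behavior described in this release can be sketched in Python as below. The function expand_includes() and its depth limit are hypothetical and added only for this example; matching the keyword literally (case-sensitive) is also an assumption.

```python
import re

# "Include" at the beginning of a line, then whitespace, then the file path.
# Assumption: the keyword is matched literally.
INCLUDE_RE = re.compile(r"^Include\s+(\S.*)$")


def expand_includes(path, depth=0, max_depth=10):
    """Return the lines of `path` with each Include line replaced inline."""
    if depth > max_depth:
        # Guard against include loops; this limit is part of the sketch only.
        raise RecursionError("Include nesting too deep: " + path)
    lines = []
    with open(path) as conf:
        for raw in conf:
            line = raw.rstrip("\n")
            match = INCLUDE_RE.match(line)
            if match:
                included = match.group(1).strip()
                lines.extend(expand_includes(included, depth + 1, max_depth))
            else:
                lines.append(line)
    return lines
```

Calling expand_includes() on a configuration file returns its lines with the contents of each included file spliced in at the position of the Include line. A line where "Include" is preceded by other text is left untouched, matching the "beginning of a line" wording above.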
* Changes in SLURM 1.1.6
========================
-- Improved task layout for relative positions
-- Fixed heterogeneous cpu overcommit issue
-- Fix bug where srun would hang if it ran on one node and that
node's slurmd died
-- Fix bug where srun task layout would be bad when min-max node range is
specified (e.g. "srun -N1-4 ...")
-- Made slurmctld_conf.node_prefix only be set on Bluegene systems.
-- Fixed a race condition in the controller to make it so a plugin thread
wouldn't be able to access the slurmctld_conf structure before it was
filled.
* Changes in SLURM 1.1.5
========================
-- Ignore partition's MaxNodes for SlurmUser and root.
-- Fix possible memory corruption with use of PMI_KVS_Create call.
-- Fix race condition when multiple PMI_KVS_Barrier calls.
-- Fix logic in which slurmctld outgoing RPC requests could get delayed.
-- Fix logic for laying out steps without a hostlist.
* Changes in SLURM 1.1.4
========================
-- Improve error handling in hierarchical communications logic.
* Changes in SLURM 1.1.3
========================
-- Fix big-endian bug in the bitstring code which plagued AIX.
-- Fix bug in handling srun's --multi-prog option, could go off end of buffer.
-- Added support for job step completion (and switch window release) on
subset of allocated nodes.
-- BLUEGENE - removed configure option --with-bg-link; bridge is now linked
with dlopen, no longer needing fake database .so files on frontend node.
-- BLUEGENE - implemented use of rm_get_partition_info instead of
...partitions_info which has made a much better design improving stability.
-- Streamline PMI communications and increase timeouts for highly parallel
jobs. Improves scalability of PMI.
* Changes in SLURM 1.1.2
========================
-- Fix bug in jobcomp/filetxt plugin to report proper NodeCnt when a job
fails due to a node failure.
-- Fix Bluegene configure to work with the new 64bit libs.
-- Fix bug in controller that causes it to segfault when hit with a malformed
message.
-- For "srun --attach=X" to another user's job, report an error and exit (it
previously just hung).
-- BLUEGENE - fix for doing correct small block logic on user error.
-- BLUEGENE - Added support in slurmd to create a fake libdb2.so if it
doesn't exist so smap won't seg fault
-- BLUEGENE - "scontrol show job" reports "MaxProcs=None" and "Start=None"
if values are not specified at job submit time
-- Add retry logic for PMI communications, may be needed for highly parallel
jobs.
-- Fix bug in slurmd where variable is used in logging message after freed
(slurmstepd rank info).
-- Fix bug in "scontrol show daemons": NodeName=localhost now works and
displays slurmd as running where it actually is.
-- Patch from HP for init nodes before init_bitmaps
-- ctrl-c killed sruns will result in job state as cancelled instead of
completed.
-- BLUEGENE - added configure option --with-bg-link to choose dynamic linking
or static linking with the bridgeapi.
* Changes in SLURM 1.1.1
========================
-- Fix bug in packing job suspend/resume RPC.
-- If a user breaks out of srun before the allocation takes place, mark the
job as CANCELLED rather than COMPLETED and change its start and end time
to that time.
-- Fix bug in PMI support that prevented use of second PMI_Barrier call.
This fix is needed for MVAPICH2 use.
-- Add "-V" options to slurmctld and slurmd to print version number and exit.
-- Fix scalability bug in sbcast.
-- Fix bug in cons_res allocation strategy.
-- Fix bug in forwarding with mpi
-- Fix bug sacct forwarding with stat option
-- Added nodeid to sacct stat information
-- Cleaned up the way slurm_send_recv_node_msg works; no more clearing errno
-- Fix error handling bug in the networking code that causes the slurmd to
xassert if the server is not running when the slurmd tries to register.
* Changes in SLURM 1.1.0
========================