This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 2.5.0.pre1
=============================
-- Add new output to "scontrol show configuration" of LicensesUsed. Output is
"name:used/total".
-- Changed jobacct_gather plugin infrastructure to be cleaner and easier to
maintain.
-- Change license option count separator from "*" to ":" for consistency with
the gres option (e.g. "--licenses=foo:2 --gres=gpu:2"). The "*" will still
be accepted, but is no longer documented.
-- Permit more than 100 jobs to be scheduled per node (new limit is 10,000
jobs).
-- Restructure of srun code to allow outside programs to utilize existing
logic.
-- Cray - Improve support for zero compute node resource allocations.
The partition used can now be configured with no nodes.
-- BGQ - make it so srun -i<taskid> works correctly.
-- Fix parse_uint32/16 to complain if a non-digit is given.
-- Add SUBMITHOST to job state passed to Moab via sched/wiki2. Patch by Jon
Bringhurst (LANL).
-- BGQ - Fix issue when running with AllowSubBlockAllocations=Yes without
compiling with --enable-debug.
-- Modify scontrol to require "-dd" option to report batch job's script. Patch
from Don Albert, Bull.
-- Modify SchedulerParameters option to match documentation: "bf_res="
changed to "bf_resolution=". Patch from Rod Schultz, Bull.
-- Fix bug that clears job pending reason field. Patch from Don Lipari, LLNL.
-- In etc/init.d/slurm move check for scontrol after sourcing
/etc/sysconfig/slurm. Patch from Andy Wettstein, University of Chicago.
-- Fix in scheduling logic that can delay jobs with min/max node counts.
-- BGQ - fix issue so that when a step uses the entire allocation and the
next step in the allocation uses only part of the allocation, the next step
gets the correct cnodes.
-- BGQ - Fix checking for IO on a block with new IBM driver V1R1M1; the
previous function didn't always work correctly.
-- BGQ - Fix issue when a nodeboard goes down and you want to combine blocks
to make a larger small block and are running with sub-blocks.
-- BLUEGENE - Better logic for making small blocks around bad nodeboard/card.
-- BGQ - When using an old IBM driver cnodes that go into error because of
a job kill timeout aren't always reported to the system. This is now
handled by the runjob_mux plugin.
-- BGQ - Added information on how to setup the runjob_mux to run as SlurmUser.
-- Improve memory consumption on step layouts with high task count.
-- BGQ - quieter debug when the real time server comes back but there are
still messages we find when we poll but haven't given it back to the real
time yet.
-- BGQ - fix for if a request comes in smaller than the smallest block and
we must use a small block instead of a shared midplane block.
-- Fix issues on large jobs (>64k tasks) to have the correct counter type when
packing the step layout structure.
-- BGQ - fix issue so that if a user asks for tasks and ntasks-per-node
but not node count, the node count is correctly figured out.
-- Move logic to always use the 1st alphanumeric node as the batch host for
batch jobs.
-- BLUEGENE - fix race condition where if a nodeboard/card goes down at the
same time a block is destroyed and that block just happens to be the
smallest overlapping block over the bad hardware.
-- Fix bug when querying accounting looking for a job node size.
-- BLUEGENE - fix possible race condition if cleaning up a block and the
removal of the job on the block failed.
-- BLUEGENE - fix issue if a cable was in an error state make it so we can
check if a block is still makable if the cable wasn't in error.
-- Put node names in alphabetic order in node table.
-- If preempted job should have a grace time and preempt mode is not cancel
but job is going to be canceled because it is interactive or other reason
it now receives the grace time.
-- BGQ - Modified documents to explain new plugin_flags needed in bg.properties
in order for the runjob_mux to run correctly.
-- BGQ - change linking from libslurm.o to libslurmhelper.la to avoid warning.
-- Improve task binding logic by making fuller use of HWLOC library,
especially with respect to Opteron 6000 series processors. Work contributed
by Komoto Masahiro.
-- Add new configuration parameter PriorityFlags, based upon work by
Carles Fenoy (Barcelona Supercomputer Center).
-- Modify the step completion RPC between slurmd and slurmstepd in order to
eliminate a possible deadlock. Based on work by Matthieu Hautreux, CEA.
-- Change the owner of slurmctld and slurmdbd log files to the appropriate
user. Without this change the files will be created by and owned by the
user starting the daemons (likely user root).
-- Reorganize the slurmstepd logic in order to better support NFS and
Kerberos credentials via the AUKS plugin. Work by Matthieu Hautreux, CEA.
-- Fix bug in allocating GRES that are associated with specific CPUs. In some
cases the code allocated first available GRES to job instead of allocating
GRES accessible to the specific CPUs allocated to the job.
-- spank: Add callbacks in slurmd: slurm_spank_slurmd_{init,exit}
and job epilog/prolog: slurm_spank_job_{prolog,epilog}
-- spank: Add spank_option_getopt() function to api
-- Change resolution of switch wait time from minutes to seconds.
-- Added GrpCPUMins to the output of sshare -l for those using hard limit
accounting. Work contributed by Mark Nelson.
-- Added mpi/pmi2 plugin for complete support of pmi2 including acquiring
additional resources for newly launched tasks. Contributed by Hongjia Cao,
NUDT.
-- BGQ - fixed issue where if a user asked for a specific node count and more
tasks than possible without overcommit the request would be allowed on more
nodes than requested.
-- Add support for new SchedulerParameters of bf_max_job_user, maximum number
of jobs to attempt backfilling per user. Work by Bjørn-Helge Mevik,
University of Oslo.
-- BLUEGENE - fixed issue where MaxNodes limit on a partition only limited
larger than midplane jobs.
-- Added cpu_run_min to the output of sshare --long. Work contributed by
Mark Nelson.
-- BGQ - allow regular users to resolve Rack-Midplane to AXYZ coords.
-- Add sinfo output format option of "%R" for partition name without "*"
appended for default partition.
-- Cray - Add support for zero compute node resource allocation to run batch
script on front-end node with no ALPS reservation. Useful for pre- or post-
processing.
-- Support for cyclic distribution of cpus in task/cgroup plugin from Martin
Perry, Bull.
-- GrpMEM limit for QOSes and associations added. Patch from Bjørn-Helge Mevik,
University of Oslo.
-- Various performance improvements for up to 500% higher throughput depending
upon configuration. Work supported by the Oak Ridge National Laboratory
Extreme Scale Systems Center.
-- Added jobacct_gather/cgroup plugin. It is not advised to use this in
production as it isn't currently complete and doesn't provide an equivalent
substitution for jobacct_gather/linux yet. Work by Martin Perry, Bull.
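The license entries above ("--licenses=foo:2" with "*" still accepted, and the
LicensesUsed output of "name:used/total") can be illustrated with a minimal
Python sketch. This is not SLURM's own parser. The function names are
hypothetical, and the sketch assumes that entries are comma-separated and that
a license given without a count requests one unit.

```python
import re


def parse_license_spec(spec):
    """Parse a license request such as "foo:2,bar*3,baz" into {name: count}.

    Assumption: entries are comma-separated and a missing count means 1.
    Both ":" (documented) and "*" (legacy) are accepted as separators.
    """
    counts = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        match = re.fullmatch(r"([^:*]+)(?:[:*](\d+))?", item)
        if match is None:
            raise ValueError(f"malformed license entry: {item!r}")
        name = match.group(1)
        count = int(match.group(2) or 1)
        counts[name] = counts.get(name, 0) + count
    return counts


def parse_licenses_used(value):
    """Parse "name:used/total" entries into {name: (used, total)}.

    Assumption: multiple licenses are reported as a comma-separated list.
    """
    usage = {}
    for item in value.split(","):
        name, _, rest = item.strip().partition(":")
        used, _, total = rest.partition("/")
        usage[name] = (int(used), int(total))
    return usage
```

For example, parse_license_spec("foo:2,bar*3") treats both separators the same
way, which mirrors the backward compatibility described in the entry.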
* Changes in SLURM 2.4.0.pre4
=============================
-- Add logic to cache GPU file information (bitmap index mapping to device
file number) in the slurmd daemon and transfer that information to the
slurmstepd whenever a job step is initiated. This is needed to set the
appropriate CUDA_VISIBLE_DEVICES environment variable value when the
devices are not in strict numeric order (e.g. some GPUs are skipped).
Based upon work by Nicolas Bigaouette.
-- BGQ - Remove ability to make a sub-block with a geometry with one or more
of its dimensions of length 3. There is a limitation in the IBM I/O
subsystem that is problematic with multiple sub-blocks with a dimension
of length 3, so we will not allow them to be created. This means that
if you ask the system for an allocation of 12 c-nodes you will
be given 16. If this is ever fixed in BGQ you can remove this patch.
-- BLUEGENE - Better handling of blocks that go into error state or deallocate
while jobs are running on them.
-- BGQ - fix for handling mix of steps running at same time some of which
are full allocation jobs, and others that are smaller.
-- BGQ - fix for core dump after running multiple sub-block jobs on static
blocks.
-- BGQ - fixed sync issue where if a job finishes in SLURM but not in mmcs
for a long time after the SLURM job has been flushed from the system
we don't have to worry about rebooting the block to sync the system.
-- BGQ - In scontrol/sview node counts are now displayed with
CnodeCount/CnodeErrCount so to point out there are cnodes in an error state
on the block. Draining the block and having it reboot when all jobs are
gone will clear up the cnodes in Software Failure.
-- Change default SchedulerParameters max_switch_wait field value from 60 to
300 seconds.
-- BGQ - catch errors from the kill option of the runjob client.
-- BLUEGENE - make it so the epilog runs until slurmctld tells it the job is
gone. Previously it had a timelimit which has proven to not be the right
thing.
-- FRONTEND - fix issue where if a compute node was in a down state and
an admin updates the node to idle/resume the compute nodes will go
instantly to idle instead of idle* which means no response.
-- Fix regression in 2.4.0.pre3 where number of submitted jobs limit wasn't
being honored for QOS.
-- Cray - Enable logging of BASIL communications with environment variables.
Set XML_LOG to enable logging. Set XML_LOG_LOC to specify path to log file
or "SLURM" to write to SlurmctldLogFile or unset for "slurm_basil_xml.log".
Patch from Steve Tronfinoff, CSCS.
-- FRONTEND - if a front end unexpectedly reboots kill all jobs but don't
mark front end node down.
-- FRONTEND - don't down a front end node if you have an epilog error
-- BLUEGENE - if a job has an epilog error don't down the midplane it was
running on.
-- BGQ - added new DebugFlag (NoRealTime) for only printing debug from
state change while the realtime server is running.
-- Fix multi-cluster mode with sview starting on a non-bluegene cluster going
to a bluegene cluster.
-- BLUEGENE - ability to show Rack Midplane name of midplanes in sview and
scontrol.
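The GPU file caching entry above (bitmap index mapping to device file number)
can be illustrated with a minimal Python sketch. It is not the slurmd
implementation. The function names are hypothetical, and the sketch assumes
that the device number is the trailing integer of each device file name and
that the bitmap index is the position of the file in the configured list.

```python
import re


def device_numbers(device_files):
    """Return a list where position = bitmap index, value = device number.

    Assumption: the device number is the trailing integer in the file name,
    e.g. "/dev/nvidia3" -> 3.
    """
    numbers = []
    for path in device_files:
        match = re.search(r"(\d+)$", path)
        if match is None:
            raise ValueError(f"no device number in {path!r}")
        numbers.append(int(match.group(1)))
    return numbers


def cuda_visible_devices(index_to_number, allocated_indexes):
    """Build a CUDA_VISIBLE_DEVICES value from allocated bitmap indexes."""
    return ",".join(str(index_to_number[i]) for i in sorted(allocated_indexes))


# Hypothetical node where /dev/nvidia2 is skipped.
files = ["/dev/nvidia0", "/dev/nvidia1", "/dev/nvidia3"]
mapping = device_numbers(files)
value = cuda_visible_devices(mapping, [1, 2])
```

With the skipped device in this hypothetical list, bitmap indexes 1 and 2 map
to device numbers 1 and 3, so using the bitmap indexes directly would name the
wrong device.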
* Changes in SLURM 2.4.0.pre3
=============================
-- Let a job be submitted even if it exceeds a QOS limit. Job will be left
in a pending state until the QOS limit or job parameters change. Patch by
Phil Eckert, LLNL.
-- Add sacct support for the option "--name". Work by Yuri D'Elia, Center for
Biomedicine, EURAC Research, Italy.
-- Add an srun shepard process to cancel a job and/or step if the srun process
is killed abnormally (e.g. SIGKILL).
-- BGQ - handle deadlock issue when a nodeboard goes into an error state.
-- BGQ - more thorough handling of blocks with multiple jobs running on them.
-- Fix man2html process to compile in the build directory instead of the
source dir.
-- Behavior of srun --multi-prog modified so that any program arguments
specified on the command line will be appended to the program arguments
specified in the program configuration file.
-- Add new command, sdiag, which reports a variety of job scheduling