This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 1.4.0-pre2
=============================
-- Remove srun's --ctrl-comm-ifhn-addr option (for PMI/MPICH2). It is no
longer needed.
-- Modify power save mode so that nodes can be powered off when idle. See
https://computing.llnl.gov/linux/slurm/power_save.html or
"man slurm.conf" (SuspendProgram and related parameters) for more
information; a configuration sketch follows this section's entries.
-- Added configuration parameter PrologSlurmctld, which can be used to boot
nodes into a particular state for each job. See "man slurm.conf" for
details.
-- Add configuration parameter CompleteTime to control how long to wait for
a job's completion before allocating already released resources to pending
jobs. This can be used to reduce fragmentation of resources. See
"man slurm.conf" for details.
-- Make default CryptoType=crypto/munge. OpenSSL is now completely optional.
-- Make default AuthType=auth/munge rather than auth/none.
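An illustrative slurm.conf sketch tying together the power save, PrologSlurmctld
and munge entries above; the program paths and the SuspendTime value are
placeholders, and "man slurm.conf" remains the authoritative reference:
    # Hypothetical slurm.conf excerpt (paths and values are site-specific)
    SuspendProgram=/usr/local/sbin/slurm_suspend   # powers idle nodes off
    ResumeProgram=/usr/local/sbin/slurm_resume     # powers nodes back on when needed
    SuspendTime=600                                # seconds a node must be idle before power off
    PrologSlurmctld=/usr/local/sbin/job_prolog     # run by slurmctld to boot nodes into a job's state
    CryptoType=crypto/munge                        # new default; OpenSSL now optional
    AuthType=auth/munge                            # new default (was auth/none)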
* Changes in SLURM 1.4.0-pre1
=============================
-- Save/restore a job's task_distribution option on slurmctld restart.
NOTE: SLURM must be cold-started on conversion from version 1.3.x.
-- Remove task_mem from job step credential (only job_mem is used now).
-- Remove --task-mem and --job-mem options from salloc, sbatch and srun
(use --mem-per-cpu or --mem instead; see the example after this section's
entries).
-- Remove DefMemPerTask from slurm.conf (use DefMemPerCPU or DefMemPerNode
instead).
-- Modify slurm_step_launch API call. Move launch host from function argument
to element in the data structure slurm_step_launch_params_t, which is
used as a function argument.
-- Add state_reason_string to job state with optional details about why
a job is pending.
-- Make "scontrol show node" output match scontrol input for some fields
("Cores" changed to "CoresPerSocket", etc.).
-- Add support for a new node state "FUTURE" in slurm.conf. These node records
are created in SLURM tables for future use without a reboot of the SLURM
daemons, but are not reported by any SLURM commands or APIs.
* Changes in SLURM 1.3.9
========================
-- Fix jobs being cancelled by ctrl-C to have correct cancelled state in
accounting.
-- Slurmdbd will now cache only user data, making for a faster start up.
-- Improved support for job steps in FRONT_END systems
-- Added support to dump and load association information in the controller
on start up if slurmdbd is unresponsive
-- BLUEGENE - Added support for sched/backfill plugin
-- sched/backfill modified to initiate multiple jobs per cycle.
-- Increase buffer size in srun to hold task list expressions. Critical
for jobs with 16k tasks or more.
-- Added support for eligible jobs and downed nodes to be sent to accounting
from the controller the first time accounting is turned on.
-- Correct srun logic to support --tasks-per-node option without task count.
-- Logic in place to handle multiple versions of RPCs within the slurmdbd.
THE SLURMDBD MUST BE UPGRADED TO THIS VERSION OR IT WILL NOT BE ABLE TO TALK
WITH THE SLURMCTLD. Older versions of the slurmctld will continue to talk to
the new slurmdbd.
-- Add support for new job dependency type: singleton. Only one job from a
given user with a given name will execute with this dependency type; a usage
example follows this section's entries. From Matthieu Hautreux, CEA.
-- Updated contribs/python/hostlist: See "CHANGES" file in that directory
for details. From Kent Engstrom, NSC.
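A usage sketch for the singleton dependency noted above; the job name and
script are hypothetical:
    # Only one "nightly_sync" job from this user runs at a time; later
    # submissions with the same name wait for the earlier one to finish.
    sbatch --job-name=nightly_sync --dependency=singleton sync.sh
    sbatch --job-name=nightly_sync --dependency=singleton sync.sh   # queued until the first completes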
* Changes in SLURM 1.3.8
========================
-- Added PrivateData flags for Users, Usage, and Accounts to Accounting.
If using slurmdbd, set in the slurmdbd.conf file. Otherwise set in the
slurm.conf file. See "man slurm.conf" or "man slurmdbd.conf" for details;
an example follows this section's entries.
-- Reduce frequency of resending job kill RPCs. Helpful in the event of
network problems or down nodes.
-- Fix memory leak caused under heavy load when running with select/cons_res
plus sched/backfill.
-- For salloc, if no local command is specified, execute the user's default
shell.
-- BLUEGENE - When starting a job, blocks that must be freed are first checked
to verify that no other job is running on them. If one is found, the new job
is requeued; no job will be lost.
-- BLUEGENE - Set MPI environment variables from salloc.
-- BLUEGENE - Fix threading issue for overlap mode
-- Reject batch scripts containing DOS linebreaks.
-- BLUEGENE - Added wait for block boot to salloc
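An example of the new accounting PrivateData flags noted above; this line
would go in slurmdbd.conf when slurmdbd is in use, otherwise in slurm.conf:
    # Hide other users' account, usage and user records
    PrivateData=accounts,usage,users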
* Changes in SLURM 1.3.7
========================
-- Add jobid/stepid to MESSAGE_TASK_EXIT to address race condition when
a job step is cancelled, another is started immediately (before the
first one completely terminates) and ports are reused.
NOTE: This change requires that SLURM be updated on all nodes of the
cluster at the same time. There will be no impact upon currently running
jobs (they will ignore the jobid/stepid at the end of the message).
-- Added Python module to process hostlists as used by SLURM. See
contribs/python/hostlist. Supplied by Kent Engstrom, National
Supercomputer Centre, Sweden.
-- Report task termination due to signal (restored functionality present
in slurm v1.2).
-- Remove sbatch test for script size being no larger than 64k bytes.
The current limit is 4GB.
-- Disable FastSchedule=0 use with SchedulerType=sched/gang. Node
configuration must be specified in slurm.conf for gang scheduling now.
-- For sched/wiki and sched/wiki2 (Maui or Moab scheduler) disable the ability
of a non-root user to change a job's comment field (used by Maui/Moab for
storing scheduler state information).
-- For sched/wiki (Maui) add pending job's future start time to the state
info reported to Maui.
-- Improve reliability of job requeue logic on node failure.
-- Add logic to ping non-responsive nodes even if SlurmdTimeout=0. This permits
the node to be returned to use when it starts responding rather than
remaining in a non-usable state.
-- Honor HealthCheckInterval values that are smaller than SlurmdTimeout.
-- For non-responding nodes, log them all on a single line with a hostlist
expression rather than one line per node. The frequency of these log messages
depends upon the SlurmctldDebug value, ranging from every 300 seconds at
SlurmctldDebug<=3 to every 1 second at SlurmctldDebug>=5.
-- If a DOWN node is resumed, set its state to IDLE & NOT_RESPONDING and
ping the node immediately to clear the NOT_RESPONDING flag.
-- Log that a job's time limit is reached, but don't send SIGXCPU.
-- Fixed gid to be set in slurmstepd when run by root
-- Changed getpwent to getpwent_r in the slurmctld and slurmd
-- Increase timeout on most slurmdbd communications to 60 secs (time for
substantial database updates).
-- Treat the srun --begin= option as a failure when given a value of "now"
plus a time unit without a numeric component (e.g. "--begin=now+hours").
-- Eliminate a memory leak associated with notifying srun of allocated
nodes having failed.
-- Add scontrol shutdown option of "slurmctld" to just shut down the
slurmctld daemon and leave the slurmd daemons running (example after this
section's entries).
-- Do not require JobCredentialPrivateKey or JobCredentialPublicCertificate
in slurm.conf if using CryptoType=crypto/munge.
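For example, the new scontrol shutdown option noted above:
    scontrol shutdown slurmctld   # stop only the slurmctld daemon; slurmd daemons keep running
    scontrol shutdown             # previous behavior: stop slurmctld and all slurmd daemons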
* Changes in SLURM 1.3.6
========================
-- Add new function to get information for a single job rather than always
getting information for all jobs. Improved performance of some commands.
NOTE: This new RPC means that the slurmctld daemons should be updated
before or at the same time as the compute nodes in order to process it.
-- In salloc, sbatch, and srun replace --task-mem options with --mem-per-cpu
(--task-mem will continue to be accepted for now, but is not documented).
Replace DefMemPerTask and MaxMemPerTask with DefMemPerCPU, DefMemPerNode,
MaxMemPerCPU and MaxMemPerNode in slurm.conf (old options still accepted
for now, but mapped to "PerCPU" parameters and not documented). Allocate
a job's memory at the same time that processors are allocated based
upon the --mem or --mem-per-cpu option rather than when job steps are
initiated.
-- Altered QOS in accounting to be a list of admin-defined states; an
account or user can now have multiple QOS's. They need to be defined using
'sacctmgr add qos' and are no longer an enum (see the sketch below). If none
are defined, Normal will be the QOS for everything. Right now this is only
for use with MOAB and does nothing outside of that.
-- Added spank_get_item support for field S_STEP_CPUS_PER_TASK.
-- Make corrections in spank_get_item for field S_JOB_NCPUS, which previously
reported the task count rather than the CPU count.
-- Convert configuration parameter PrivateData from on/off flag to have
separate flags for job, partition, and node data. See "man slurm.conf"
for details.
-- Fix bug, failed to load DisableRootJobs configuration parameter.
-- Altered sacctmgr to always return a non-zero exit code on error and send
error messages to stderr.
-- Fix processing of auth/munge authentication key for messages originating
in slurmdbd and sent to slurmctld.
-- If srun is allocating resources (not within sbatch or salloc) and MaxWait
is configured to a non-zero value then wait indefinitely for the resource
allocation rather than aborting the request after MaxWait time.
-- For Moab only: add logic to reap defunct "su" processes that are spawned by
slurmd to load user's environment variables.
-- Added more support for "dumping" account information to a flat file and
reading it in again, to protect data in case something bad happens to the
database.
-- Sacct will now report account names for job steps.
-- For AIX: Remove MP_POERESTART_ENV environment variable, disabling
poerestart command. User must explicitly set MP_POERESTART_ENV before
executing poerestart.
-- Put back notification that a job has been allocated resources when it was
pending.
-- Some updates to man page formatting from Gennaro Oliva, ICAR.
-- Smarter loading of plugins (doesn't stat every file in the plugin dir)
-- In sched/backfill avoid trying to schedule jobs on DOWN or DRAINED nodes.
-- Forward exit_code from step completion to slurmdbd.
-- Add retry logic to socket connect() call from client which can fail
when the slurmctld is under heavy load.
-- Fixed bug so that associations are added correctly.
-- Added support for associations for user root.
-- For Moab, sbatch --get-user-env option processed by slurmd daemon
rather than the sbatch command itself to permit faster response
for Moab.
-- IMPORTANT FIX: This only affects use of select/cons_res when allocating
resources by core or socket, not by CPU (default for SelectTypeParameter).
We were not saving a pending job's task distribution, so after restarting
slurmctld, select/cons_res was over-allocating resources based upon an
invalid task distribution value. Since we can't save the value without
changing the state save file format, we'll just set it to the default
value for now and save it in Slurm v1.4. This may result in a slight
variation on how sockets and cores are allocated to jobs, but at least
resources will not be over-allocated.
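A sketch of defining QOS's with sacctmgr, as noted earlier in this section;
the QOS name is hypothetical:
    sacctmgr add qos expedite     # QOS's are now admin-defined, not a fixed enum
    sacctmgr list qos             # if none are defined, Normal is used for everything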