This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.
* Changes in Slurm 17.11.0pre3
==============================
-- Added the following jobcomp/script environment variables: CLUSTER,
DEPENDENCY, DERIVED_EC, EXITCODE, GROUPNAME, QOS, RESERVATION, USERNAME.
The format of LIMIT (job time limit) has been modified to D-HH:MM:SS.
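For illustration only, a jobcomp/script completion program could read these
from its environment; a minimal C sketch, not taken from the Slurm sources
(the output format here is invented):

    #include <stdio.h>
    #include <stdlib.h>

    /* Print a few of the variables exported by jobcomp/script. */
    int main(void)
    {
        const char *cluster  = getenv("CLUSTER");
        const char *qos      = getenv("QOS");
        const char *exitcode = getenv("EXITCODE");
        const char *limit    = getenv("LIMIT");  /* now D-HH:MM:SS */

        fprintf(stderr, "cluster=%s qos=%s exit=%s limit=%s\n",
                cluster ? cluster : "?", qos ? qos : "?",
                exitcode ? exitcode : "?", limit ? limit : "?");
        return 0;
    }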
-- Fix QOS usage factor applying to individual TRES run minute usage.
-- Print numbers using exponential format if required to fit in allocated
field width. The sacctmgr and sshare commands are impacted.
-- Make it so a backup DBD doesn't attempt to create database tables and
relies on the primary to do so.
-- By default have Slurm dynamically link to libslurm.so instead of static
linking. If static linking is desired configure with
--without-shared-libslurm.
-- Change --workdir in sbatch to be --chdir as in all other commands (salloc,
srun).
-- Add WorkDir to the job record in the database.
-- Make the UsageFactor of a QOS work when the QOS has the NoDecay flag.
-- Add MaxQueryTimeRange option to slurmdbd.conf to limit accounting query
ranges when fetching job records.
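For example, to cap accounting queries at 60 days one might set the
following in slurmdbd.conf (the value uses Slurm's usual time span syntax;
this particular limit is illustrative):

    MaxQueryTimeRange=60-00:00:00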
-- Add LaunchParameters=batch_step_set_cpu_freq to allow the setting of the cpu
frequency on the batch step.
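A minimal illustration, assuming the usual slurm.conf syntax: with

    LaunchParameters=batch_step_set_cpu_freq

a batch job's --cpu-freq request can also take effect on the batch step
itself rather than only on subsequent srun steps.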
-- CRAY - Fix applications statically linked to CRAY's PMI.
-- Fix - Raise an error back to the user when trying to update currently
unsupported core-based reservations.
-- Do not print TmpDisk space as part of 'slurmd -C' line.
-- Fix to test MaxMemPerCPU/Node partition limits when scheduling; previously
these were only checked at submit time.
-- Work for heterogeneous job support (complete solution in v17.11):
* Set SLURM_PROCID environment variable to reflect global task rank (needed
by MPI; see the sketch after this list).
* Set SLURM_NTASKS environment variable to reflect global task count (needed
by MPI).
* In srun, if only some steps are allocated and one step allocation fails,
then delete all allocated steps.
* Get SPANK plugins working with heterogeneous jobs. The
spank_init_post_opt() function is executed once per job component.
* Modify sbcast command and srun's --bcast option to support heterogeneous
jobs.
* Set more environment variables for MPI: SLURM_GTIDS and SLURM_NODEID.
* Prevent a heterogeneous job allocation from including the same nodes in
multiple components (required by MPI jobs spanning components).
* Modify step create logic so that all components of a heterogeneous job
launched by a single srun command have the same step ID value.
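As a sketch of how a launched task (or an MPI library) might consume the
variables above; only the SLURM_* names come from these entries, everything
else is illustrative:

    #include <stdio.h>
    #include <stdlib.h>

    /* Read an integer environment variable with a fallback default. */
    static int env_int(const char *name, int dflt)
    {
        const char *v = getenv(name);
        return v ? atoi(v) : dflt;
    }

    int main(void)
    {
        /* With heterogeneous jobs these now give the global task rank
         * and count across all components of the job. */
        int rank = env_int("SLURM_PROCID", 0);
        int size = env_int("SLURM_NTASKS", 1);
        int node = env_int("SLURM_NODEID", 0);

        printf("task %d of %d on relative node %d\n", rank, size, node);
        return 0;
    }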
-- Modify output of "--mpi=list" to avoid duplicates for version numbers in
mpi/pmix plugin names.
* Changes in Slurm 17.11.0pre2
==============================
-- Initial work for heterogeneous job support (complete solution in v17.11):
* Modified salloc, sbatch and srun commands to parse command line, job
script and environment variables to recognize requests for heterogeneous
jobs. Same commands also modified to set environment variables describing
each component of the heterogeneous job.
* Modified job allocate, batch job submit and job "will-run" requests to
pass a list of job specifications and get a list of responses.
* Modify slurmctld daemon to process a heterogeneous job request and create
multiple job records as needed.
* Added new fields to job record: pack_job_id, pack_job_offset and
pack_job_set (set of job IDs). Added to slurmctld state save/restore
logic and job information reported.
* Display new job fields in "scontrol show job" output.
* Modify squeue command to display heterogeneous job records using "#+#"
format. The squeue --job=# output lists all components of a heterogeneous
job (see the example after this list).
* Modify scancel logic to cancel all components of a heterogeneous job with
a single request/RPC.
* Configuration parameter DebugFlags value of "HeteroJobs" added.
* Job requeue and suspend/resume modified to operate on all components of
a heterogeneous job with a single request/RPC.
* New web page added to describe heterogeneous jobs.
* Descriptions of new API added to man pages.
* Modified email notifications to only operate on the first job component.
* Purge heterogeneous job records at the same time rather than individually.
* Modified logic for heterogeneous jobs submitted to multiple clusters
("--clusters=...") so the job will be routed to the cluster that is
expected to start all components earliest.
* Modified srun to create multiple job steps for heterogeneous job
allocations.
* Modified launch plugin to accept a pointer to job step options structure
rather than work from a single/common data structure.
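As an illustration of the "#+#" notation above (not captured output): the
components of a heterogeneous job 1234 would be displayed as 1234+0, 1234+1
and so on, and "squeue --job=1234" would list every component.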
-- Improve backfill scheduling algorithm with respect to starting jobs as soon
as possible while avoiding advanced reservations.
-- Add URG as an option to 'scancel --signal'.
-- Check if the buffer returned from slurm_persist_msg_pack() isn't NULL.
-- Modify all daemons to re-open log files on receipt of SIGUSR2 signal. This
is much faster than using SIGHUP to re-read the configuration file and
rebuild various tables.
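The general shape of such a handler, as a hedged sketch rather than the
actual Slurm code (all names here are invented):

    #include <signal.h>
    #include <string.h>

    static volatile sig_atomic_t reopen_logs = 0;

    /* Only set a flag here; the re-open happens in the main loop. */
    static void usr2_handler(int signo)
    {
        (void) signo;
        reopen_logs = 1;
    }

    int install_usr2_handler(void)
    {
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = usr2_handler;
        sigemptyset(&sa.sa_mask);
        return sigaction(SIGUSR2, &sa, NULL);
    }

The daemon's main loop tests reopen_logs and re-opens its log files when it
is set, which lets a rotation tool move the old files aside without a
restart or a full reconfiguration.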
-- Add PrivateData=events configuration parameter.
-- Work for heterogeneous job support (complete solution in v17.11):
* Add pointer to job option structure to job_step_create_allocation().
* Parallelize task launch for heterogeneous job allocations (initial work).
* Make packjobid, packjoboffset, and packjobidset fields available in squeue
output.
* Modify smap command to display heterogeneous job records using "#+#"
format.
* Add srun --pack-group and --mpi-combine options to control job step
launch behaviour (not fully implemented).
* Add pack job component ID to srun --label output (e.g. "P0 1:" for
job component 0 and task 1).
* jobcomp/elasticsearch: Add pack_job_id and pack_job_offset fields.
* sview: Modified to display pack job information.
* Major re-write of task state container logic to support a list of
containers rather than one container per srun command.
* Add srun pack job environment variables when performing job allocation.
-- Set Reason=dependency over Reason=JobArrayTaskLimit for pending jobs.
-- Add slurm.conf configuration parameters SlurmctldSyslogDebug and
SlurmdSyslogDebug to control which messages from the slurmctld and slurmd
daemons get written to syslog.
-- Add slurmdbd.conf configuration parameter DebugLevelSyslog to control which
messages from the slurmdbd daemon get written to syslog.
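For example (the level names follow the existing SlurmctldDebug/SlurmdDebug
values; these particular choices are illustrative):

    # slurm.conf
    SlurmctldSyslogDebug=info
    SlurmdSyslogDebug=error

    # slurmdbd.conf
    DebugLevelSyslog=info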
-- Fix handling of GroupUpdateForce option.
-- Work for heterogeneous job support (complete solution in v17.11):
* Add support to sched/backfill for concurrent allocation of all pack job
components including support of --time-min option.
* Defer initiation of a heterogeneous job until all components can be started
at the same time, taking into consideration association and QOS limits
for the job as a whole.
* Perform limit check on heterogeneous job as a whole at submit time to
reject jobs that will never be able to run.
* Add pack_job_id and pack_job_offset to accounting database.
* Modified sacct to accept pack job ID specification using "#+#" notation.
* Modified sstat to accept pack job ID specification using "#+#" notation.
-- Clear a job's "wait reason" value of "BeginTime" after that time has passed.
Previously a reason of "BeginTime" could be reported long after the job's
requested begin time had passed.
-- Split group_info in slurm_ctl_conf_t into group_force and group_time.
-- Work for heterogeneous job support (complete solution in v17.11):
* Fix I/O race condition on step termination for srun launching multiple
pack job groups.
* If prolog is running when attempting to signal a step, then return EAGAIN
and retry rather than simply returning SLURM_ERROR and aborting.
* Modify launch/slurm plugin to signal all components of a pack job rather
than just the one (modify to use a list of step context records).
* Add logic to support srun --mpi-combine option.
* Set up debugger data structures.
* Disable cancellation of an individual component while the job is pending.
* Modify scontrol job hold/release and update to operate with heterogeneous
job id specification (e.g. "scontrol hold 123+4").
* If srun lacks an application specification for some component, the next
application specified will be used for the earlier components.
* Changes in Slurm 17.11.0pre1
==============================
-- Interpret all format options in output/error file to log prolog errors. Prior
logic only supported "%j" (job ID) option.
-- Add the configure option --with-shared-libslurm which will link to
libslurm.so instead of libslurm.o, thus reducing the footprint of all the
binaries.
-- In switch plugin, added plugin_id symbol to plugins and wrapped
switch_jobinfo_t with dynamic_plugin_data_t in interface calls in
order to pass switch information between clusters with different switch
types.
-- Switch naming of acct_gather_infiniband to acct_gather_interconnect.
-- Add a last_sched_eval timestamp to record when a job was last evaluated
by the main scheduler or backfill.
-- Add scancel "--hurry" option to avoid staging out any burst buffer data.
-- Simplify the sched plugin interface.
-- Add new advanced reservation flags of "weekday" (repeat on each weekday;
Monday through Friday) and "weekend" (repeat on each weekend day; Saturday
and Sunday).
-- Add new advanced reservation flag of "flex", which permits jobs requesting
the reservation to begin prior to the reservation's start time and use
resources inside or outside of the reservation. A typical use case is to
prevent jobs that do not request the reservation from using the reserved
resources, without forcing jobs that do request the reservation to run
only within the reserved time frame.
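Illustrative reservation requests using the new flags (names, nodes and
times are invented):

    scontrol create reservation reservationname=weekday_maint \
        nodes=tux[0-7] starttime=2017-10-02T08:00:00 duration=120 \
        users=root flags=weekday

    scontrol create reservation reservationname=proj_flex nodes=tux[8-15] \
        starttime=now duration=720 users=alice flags=flex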
-- Node "OS" field expanded from "sysname" to "sysname release version" (e.g.
change from "Linux" to
"Linux 4.8.0-28-generic #28-Ubuntu SMP Sat Feb 8 09:15:00 UTC 2017").
-- jobcomp/elasticsearch - Add "job_name" and "wc_key" fields to stored
information.
-- jobcomp/filetxt - Add ArrayJobId, ArrayTaskId, ReservationName, Gres,
Account, QOS, WcKey, Cluster, SubmitTime, EligibleTime, DerivedExitCode and
ExitCode.
-- scontrol modified to report core IDs for reservations containing individual
cores.
-- MYSQL - Get rid of table join during rollup which speeds up the process
dramatically on large job/step tables.
-- Add ability to define features on clusters for directing federated jobs to
different clusters.
-- Add new RPC to process multiple federation RPCs in a single communication.
-- Modify slurm_load_jobs() function to load job information from all clusters
in a federation.
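A minimal caller, assuming the show flags from this release (SHOW_FEDERATION
for all clusters in a federation, SHOW_LOCAL for the previous behavior):

    #include <stdio.h>
    #include <slurm/slurm.h>
    #include <slurm/slurm_errno.h>

    int main(void)
    {
        job_info_msg_t *jobs = NULL;

        /* Ask for job records from every cluster in the federation. */
        if (slurm_load_jobs((time_t) 0, &jobs, SHOW_FEDERATION)) {
            slurm_perror("slurm_load_jobs");
            return 1;
        }
        printf("loaded %u job records\n", jobs->record_count);
        slurm_free_job_info_msg(jobs);
        return 0;
    }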
-- Add squeue --local and --sibling options to modify filtering of jobs on
federated clusters.
-- Add SchedulerParameters option of bf_max_job_user_part to specify the
maximum number of jobs per user for any single partition. This differs from
bf_max_job_user in that a separate counter is applied to each partition
rather than having a single counter per user applied to all partitions.
-- Modify backfill logic so that bf_max_job_user, bf_max_job_part and
bf_max_job_user_part options can all be used independently of each other.
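For example (the limits shown are arbitrary):

    SchedulerParameters=bf_max_job_user=50,bf_max_job_user_part=10

would let backfill consider up to 50 jobs per user overall, and at most 10
jobs per user within any one partition.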
-- Add sprio -p/--partition option to filter jobs by partition name.