RELEASE NOTES FOR SLURM VERSION 2.0
10 February 2009 (after SLURM 1.4.0-pre7 released)
IMPORTANT NOTE:
SLURM state files in version 2.0 are different from those of version 1.3.
After installing SLURM version 2.0, plan to restart without preserving
jobs or other state information. While SLURM version 1.3 is still running,
cancel all pending and running jobs (e.g.
"scancel --state=pending; scancel --state=running"). Then stop and restart
daemons with the "-c" option or use "/etc/init.d/slurm startclean".
If using the slurmdbd (SLURM DataBase Daemon) you must update this first.
The 2.0 slurmdbd will work with SLURM daemons at version 1.3.7 and above.
You do not need to update all clusters at the same time, but it is very
important to update the slurmdbd first and have it running before updating
any other clusters that make use of it. No real harm will come from updating
your systems before the slurmdbd, but they will not talk to each other
until you do.
There are substantial changes in the slurm.conf configuration file. It
is recommended that you rebuild your configuration file using the tool
doc/html/configurator.html that comes with the distribution.
SLURM can continue to be used as a simple resource manager, but optional
plugins support sophisticated scheduling algorithms. These plugins do require
the use of a database containing user and bank account information, so
more administration work is required. SLURM's modular design lets you
control the functionality that you want it to provide.
HIGHLIGHTS
* Sophisticated scheduling algorithms are available in a new plugin. Jobs
can be prioritized based upon their age, size and/or fair-share resource
allocation using hierarchical bank accounts. For more information see:
https://computing.llnl.gov/linux/slurm/job_priority.html
* An assortment of resource limits can be imposed upon individual users
and/or hierarchical bank accounts such as maximum job time limit, maximum
job size and maximum number of running jobs. For more information see:
https://computing.llnl.gov/linux/slurm/resource_limits.html
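  As an illustration, such limits might be set with sacctmgr along these lines
  (the account name and limit values are examples only):
    sacctmgr modify account name=physics set MaxJobs=50 MaxWall=24:00:00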
* Advanced reservations can be made to ensure resources will be available when
needed. For more information see:
https://computing.llnl.gov/linux/slurm/reservations.html
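  For example, a maintenance reservation might be created as follows (the
  reservation name, start time, duration and node list are illustrative only):
    scontrol create reservation ReservationName=maint Users=root \
             StartTime=2009-06-01T08:00:00 Duration=120 Nodes=tux[0-7] Flags=MAINT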
* Nodes can now be completely powered down when idle and automatically
  restarted when there is work available. For more information see:
https://computing.llnl.gov/linux/slurm/power_save.html
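  A minimal slurm.conf sketch (the program paths are site-provided scripts and
  the threshold is an example, not a default):
    SuspendTime=1800                        # power down nodes idle for 30 minutes
    SuspendProgram=/etc/slurm/node_suspend  # site script to power a node down
    ResumeProgram=/etc/slurm/node_resume    # site script to power a node back up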
* SLURM has been modified to allocate specific cores to jobs and job steps in
the centralized scheduler rather than the daemons running on the individual
compute nodes. This permits effective preemption and gang scheduling of jobs.
* A new configuration parameter, PrologSlurmctld, can be used to support the
booting of different operating systems for each job. See "man slurm.conf"
for details.
* Preemption of jobs from lower priority partitions in order to execute jobs
in higher priority partitions is now supported. The jobs from the lower
priority partition will resume once the preempting job completes. For more
information see:
https://computing.llnl.gov/linux/slurm/preempt.html
* Added support for optimized resource allocation with respect to network
  topology. This requires that switch configuration information be added to
  slurm.conf.
* Support added for Sun Constellation system with optimized resource allocation
for a 3-dimensional torus interconnect. For more information see:
https://computing.llnl.gov/linux/slurm/sun_const.html
* Support added for IBM BlueGene/P systems, including High Throughput Computing
(HTC) mode.
CONFIGURATION FILE CHANGES (see "man slurm.conf" for details)
* The default AuthType is now "auth/munge" rather than "auth/none".
* The default CryptoType is now "crypto/munge". OpenSSL is no longer required
by SLURM in the default configuration.
* DefaultTime has been added to specify a default job time limit in the
partition. If not set, the partition's MaxTime is used.
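  For example (the partition name, node list and limits are illustrative):
    PartitionName=debug Nodes=tux[0-31] DefaultTime=30 MaxTime=120 State=UP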
* PrologSlurmctld has been added and can be used to boot nodes into a
particular state for each job.
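  For example (the script path is hypothetical):
    PrologSlurmctld=/etc/slurm/prolog_slurmctld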
* DefMemPerTask has been removed. Use DefMemPerCPU or DefMemPerNode instead.
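  For example, a former "DefMemPerTask=512" would typically be replaced by one
  of the following (values in megabytes, illustrative only):
    DefMemPerCPU=512
    DefMemPerNode=2048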
* KillOnBadExit added to immediately terminate a job step whenever any task
  terminates with a non-zero exit code.
* Added new node state of "FUTURE". These node records are created in SLURM
tables for future use without a reboot of the SLURM daemons, but are not
reported by any SLURM commands or APIs.
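  For example (the node names and CPU count are illustrative):
    NodeName=tux[32-63] CPUs=8 State=FUTURE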
* BatchStartTime has been added to control how long to wait for a batch job
to start (complete Prolog, load environment for Moab, etc.).
* CompleteTime has been added to control how long to wait for a job's
completion before allocating already released resources to pending jobs.
* OverTimeLimit added to permit jobs to exceed their (soft) time limit by a
configurable amount. Backfill scheduling will be based upon the soft time
limit.
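  For example, to let jobs run up to 10 minutes past their requested limit
  (the value is illustrative):
    OverTimeLimit=10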
* For select/cons_res or sched/gang only: Each node's processor count must be
specified in the configuration file. Additional resources found by SLURM
daemons on the compute nodes will not be used.
* DebugFlags added to provide detailed logging for specific subsystems.
* Added job priority plugin. The default for PriorityType is "priority/basic",
  which preserves the existing SLURM behavior (job priorities are assigned at
  submit time with decreasing value). "priority/multifactor" is a new plugin
  that sets a job's priority based on many different configuration parameters
  as described here:
https://computing.llnl.gov/linux/slurm/job_priority.html
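  An illustrative slurm.conf sketch (the weights are examples only, not
  recommended values):
    PriorityType=priority/multifactor
    PriorityDecayHalfLife=7-0
    PriorityWeightAge=1000
    PriorityWeightFairshare=10000
    PriorityWeightJobSize=1000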
* The task/affinity plugin will automatically bind a job step to the CPUs
it has been allocated. The entity bound to (sockets, cores or threads)
will be automatically set based upon the allocation size and task count.
SLURM's SPANK cpuset plugin is no longer needed.
* Added switch topology configuration options: TopologyPlugin, SwitchName,
Nodes, Switches.
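  An illustrative sketch of a two-level switch tree (the switch and node names
  are examples only):
    TopologyPlugin=topology/tree
    SwitchName=s0 Nodes=tux[0-15]
    SwitchName=s1 Nodes=tux[16-31]
    SwitchName=root Switches=s[0-1]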
* BLUEGENE - Added option DenyPassthrough in the bluegene.conf. Can be set
to any combination of X, Y and Z to prevent passthroughs when running in
dynamic layout mode. (see "man bluegene.conf" for details)
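  For example, to forbid passthroughs in all three dimensions:
    DenyPassthrough=X,Y,Z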
COMMAND CHANGES (see man pages for details)
* --task-mem and --job-mem options have been removed from salloc, sbatch and
srun. Use --mem-per-cpu or --mem instead.
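  For example (values in megabytes, illustrative only):
    srun --mem-per-cpu=1024 ./a.out    # memory per allocated CPU
    sbatch --mem=4096 job.sh           # memory per allocated node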
* Added the srun option --preserve-env to pass the current values of
environment variables SLURM_NNODES and SLURM_NPROCS through to the
executable, rather than computing them from command line parameters.
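  For example (the executable name is illustrative):
    srun -n2 --preserve-env ./a.out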
* --ctrl-comm-ifhn-addr option has been removed from the srun command (it is
no longer useful).
* Batch jobs have an environment variable SLURM_RESTART_COUNT set when
restarted.
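  A batch script can test for this, for example:
    if [ "${SLURM_RESTART_COUNT:-0}" -gt 0 ]; then
        echo "restart number $SLURM_RESTART_COUNT"
    fi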
* To create a partition using the scontrol command, use the "create" command
rather than "update" with a new partition name.
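  For example (the partition name and node list are illustrative):
    scontrol create PartitionName=debug Nodes=tux[0-15] MaxTime=60 State=UP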
* The time format of all SLURM commands is now ISO 8601 (yyyy-mm-ddThh:mm:ss)
unless the configure option "--disable-iso8601" is used at build time.
* "sacct -S" can no longer be used to report the status of a running job. Use
  sstat instead.
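  For example, to report status information for a running job (the job ID is
  illustrative):
    sstat --jobs=1234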
* sacct and sstat have been rewritten to have a more sacctmgr-like feel.
ACCOUNTING CHANGES
* Added ability for slurmdbd to archive and purge step and/or job records.
* Added support for Workload Characterization Key (WCKey) in accounting
records. This is an optional string that can be used to identify the type of
work being performed (in addition to user ID, account name, job name, etc.).
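  For example (the key name is illustrative):
    sbatch --wckey=climate_modeling job.sh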
OTHER CHANGES
* Modify PMI_Get_clique_ranks() to return an array of integers rather
than a char * to satisfy PMI standard. Correct logic in
PMI_Get_clique_size() for when srun --overcommit option is used.