Commit 4e8d01fd authored by Moe Jette

major mods to start RELEASE_NOTES for slurm v2.3

parent 8de2a07a
RELEASE NOTES FOR SLURM VERSION 2.2
1 December 2010
RELEASE NOTES FOR SLURM VERSION 2.3
3 January 2011
IMPORTANT NOTE:
If using the slurmdbd (SLURM DataBase Daemon), you must update this first.
The 2.2 slurmdbd will work with SLURM daemons of version 2.1.3 and above.
The 2.3 slurmdbd will work with SLURM daemons of version 2.1.3 and above.
You will not need to update all clusters at the same time, but it is very
important to update slurmdbd first and have it running before updating
any other clusters making use of it. No real harm will come from updating
@@ -18,335 +18,79 @@ innodb_buffer_pool_size=64M
under the [mysqld] reference and restarting the mysqld. This is needed when
converting large tables over to the new database schema.
SLURM can be upgraded from version 2.1 to version 2.2 without loss of jobs or
SLURM can be upgraded from version 2.2 to version 2.3 without loss of jobs or
other state information.
HIGHLIGHTS
==========
* Slurmctld restart/reconfiguration operations have been altered.
NOTE: There will be no change in behavior unless partition configuration
or node Features/Weight are altered using the scontrol command to differ
from the contents of the slurm.conf configuration file.
Preserve current partition state information plus node Feature and Weight
state information after slurmctld receives a SIGHUP signal or is restarted
with the -R option. Recreate partition plus node information (except node
State and Reason) from slurm.conf file after executing "scontrol reconfig"
or restarting slurmctld *without* the -R option.
    OPERATION              ACTION
    slurmctld -R           Recover all job, node and partition state
    slurmctld              Recover job state, recreate node and partition state
    slurmctld -c           Recover no jobs, recreate node and partition state
    SIGHUP to slurmctld    Preserve all job, node and partition state
    scontrol reconfig      Preserve job state, recreate node and partition state
Old logic preserved node Feature plus partition state after "slurmctld" or
"scontrol reconfig" rather than recreating it from slurm.conf. Node Weight
was formerly always recreated from slurm.conf.
* SLURM commands (squeue, sinfo, sview, etc.) can now operate between
clusters. Jobs can also be submitted with sbatch to other cluster(s), with the
job routed to the one cluster expected to initiate the job first.
* Accounting through the SlurmDBD with the MySQL plugin can now support
a default account and wckey per cluster.
* For architectures where the slurmd daemon executes on front end nodes (Cray
and BlueGene systems), more than one slurmd daemon may be executed using more
than one front end node for improved fault tolerance and performance.
CONFIGURATION FILE CHANGES (see "man slurm.conf" for details)
=============================================================
* A hash of the slurm.conf running on each node in the cluster is sent when
registering with the slurmctld so it can verify the slurm.conf is the same
as the one it is running. If not, an error message is displayed. To
silence this message, add NO_CONF_HASH to DebugFlags in your slurm.conf.
* Added VSizeFactor to enforce virtual memory limits for jobs and job steps as
a percentage of their real memory allocation.
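For example, a hypothetical slurm.conf setting (the value is illustrative only):
    VSizeFactor=110
would limit each job's virtual memory to 110% of its real memory allocation, so
a job allocated 4 GB of real memory could use roughly 4.4 GB of virtual memory.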
* Added new option for SelectTypeParameters of CR_ONE_TASK_PER_CORE. This
option will allocate one task per core by default. Without this option,
by default one task will be allocated per thread on nodes with more than
one ThreadsPerCore configured (i.e. no change in behavior without this
option).
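A minimal sketch, assuming the consumable resource plugin is in use (values are
illustrative only):
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core,CR_ONE_TASK_PER_CORE
With this setting, a node configured with ThreadsPerCore=2 would still be
allocated one task per core by default.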
* Added new configuration parameters GroupUpdateForce and GroupUpdateTime. These
control when slurmctld updates its information about which users are in the
groups allowed to use partitions. NOTE: There is no change in the default
behavior.
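For example (illustrative values only):
    GroupUpdateForce=1
    GroupUpdateTime=600
would have slurmctld refresh its group membership information every 600 seconds,
even if the /etc/group file has not changed.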
* Added new configuration parameters SlurmSchedLogFile and SlurmSchedLogLevel
to support writing scheduling events to a separate log file.
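For example (the log file path is hypothetical):
    SlurmSchedLogFile=/var/log/slurm/sched.log
    SlurmSchedLogLevel=1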
* Added new configuration parameter JobSubmitPlugins which provides a mechanism
to set default job parameters or perform other site-configurable actions at
job submit time. Site-specific job submission plugins may be written in either
C or Lua.
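For example, a site could enable a Lua plugin with the following slurm.conf
line (a sketch; the site must supply the job submit script itself):
    JobSubmitPlugins=lua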
* MaxJobCount changed from 16-bit to 32-bit field. The default MaxJobCount was
changed from 5,000 to 10,000.
* Added support for a PropagatePrioProcess configuration parameter value of 2
to restrict spawned task nice values to that of the slurmd daemon plus 1.
This ensures that the slurmd daemon always has a higher scheduling priority
than spawned tasks. Also added support in slurmctld, slurmd and slurmdbd for
an option of "-n <value>" to reset the daemon's nice value.
* Support has been added for the allocation of generic resources (GRES). A
new configuration parameter, GresPlugins, has been added along with a node-
specific parameter, Gres. There is also a gres.conf file to be configured on
each node. For more information, see the web page
https://computing.llnl.gov/linux/slurm/gang_scheduling.html
Support for enforcement of these allocations using Linux CGroup will be
provided in a later release.
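A minimal sketch (node names, device files and counts are hypothetical):
  In slurm.conf:
    GresPlugins=gpu
    NodeName=tux[0-15] Gres=gpu:2 ...
  In gres.conf on each of those nodes:
    Name=gpu File=/dev/nvidia0
    Name=gpu File=/dev/nvidia1
A job could then request the resource with an option such as "srun --gres=gpu:1".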
* Added support for new partition states of DRAIN (run queued jobs, but accept
no new jobs) and INACTIVE (do not accept or run any more jobs) and new
partition option of "Alternate" (alternate partition to use for jobs
submitted to partitions that are currently in a state of DRAIN or INACTIVE).
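For example (partition names are hypothetical), with a slurm.conf entry such as
    PartitionName=batch Alternate=backup ...
an administrator could run "scontrol update PartitionName=batch State=DRAIN";
jobs subsequently submitted to "batch" would then be run in the "backup"
partition until the state is set back to UP.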
* Added the ability to configure PreemptMode on a per-partition or per-QOS
basis.
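A sketch for the per-partition case (partition definitions are hypothetical and
abbreviated):
    PreemptType=preempt/partition_prio
    PartitionName=low  Priority=1  PreemptMode=requeue ...
    PartitionName=high Priority=10 PreemptMode=off ...
Here jobs in the "low" partition could be requeued to free resources for jobs
in the "high" partition.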
* Modified the meaning of InactiveLimit slightly. It will now cancel the job
allocation created using the salloc or srun command if those commands cease
responding for the InactiveLimit regardless of any running job steps. This
parameter will no longer affect jobs spawned using sbatch.
* Added SchedulerParameters option of bf_window to control how far into the
future that the backfill scheduler will look when considering jobs to start.
The default value is one day.
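For example, to have the backfill scheduler consider jobs up to two days in the
future (an illustrative value, specified in minutes):
    SchedulerParameters=bf_window=2880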
* Added the ability to specify a range of ports in the SlurmctldPort parameter
for better handling of high bursts of RPCs (e.g. "SlurmctldPort=1234-1237").
COMMAND CHANGES (see man pages for details)
===========================================
* sinfo -R now has the user and timestamp in separate fields from the reason.
* Job submission commands (salloc, sbatch and srun) have a new option,
--time-min, that permits the job's time limit to be reduced, down to the
specified minimum, if that is required to start the job early through
backfill scheduling.
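For example (the script name and limits are hypothetical):
    sbatch --time=8:00:00 --time-min=2:00:00 my_job.sh
would let the backfill scheduler start the job in any window of at least two
hours, reducing the job's time limit accordingly.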
* scontrol now has the ability to change a job step's time limit.
* scontrol now has the ability to shrink a job's size. Use a command of
"scontrol update JobId=# NumNodes=#" or
"scontrol update JobId=# NodeList=<names>". This command generates a script
to be executed in order to reset SLURM environment variables for proper
execution of subsequent job steps.
* We have given Operators, Administrators, and bank account Coordinators (as
defined in the SLURM database) the ability to invoke commands that view/modify
user jobs and reservations. Previously, one had to be root to invoke
"scontrol update JobId" for example. In addition, Administrators have the
ability to view/modify node and partition info without having to become root.
For more details, see the AUTHORIZATION section of the man pages for the
following commands: scontrol, scancel and sbcast.
* In order to support more than one front end node, new parameters have been
added to support a new data structure: FrontendName, FrontendAddr, Port,
State and Reason.
* Users can hold and release their own jobs. Submit in held state using the srun
or sbatch --hold or -H options. Hold after submission using the command
"scontrol hold <jobid>". Release with "scontrol release <jobid>". Users cannot
release jobs held by a system administrator unless the administrator uses
the command "scontrol uhold <jobid>" ("uhold" for "user hold").
* Added DebugFlags option of Frontend.
* Add support for slurmctld and slurmd option of "-n <value>" to reset the
daemon's nice value.
* srun's --core option has been removed. Use the SPANK "Core" plugin from
http://code.google.com/p/slurm-spank-plugins/ for continued support.
* Added salloc and sbatch option --wait-for-nodes. If set non-zero, job
initiation will be delayed until all allocated nodes have booted. Salloc
will log the delay with the messages "Waiting for nodes to boot" and "Nodes
are ready for use".
* Added scontrol "wait_job <job_id>" option to wait for nodes to boot as needed.
Useful for batch jobs (in Prolog, PrologSlurmctld or the script) if powering
down idle nodes.
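A sketch of how this might be used at the top of a batch script (the
application name is hypothetical):
    #!/bin/sh
    scontrol wait_job $SLURM_JOB_ID
    srun my_application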
* Modified sview to display database configuration and add/remove visible tabs.
* Modified sview to save its default configuration in the .slurm/sviewrc file.
Default settings can be saved using the menu Options->Set Default Settings
or by typing Ctrl-S.
* scontrol has the ability to get and set front end node state.
* Modified select/cons_res plugin so that if MaxMemPerCPU is configured and a
job specifies its memory requirement, then more CPUs than requested will
automatically be allocated to the job to honor the MaxMemPerCPU parameter.
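For example (illustrative values), with MaxMemPerCPU=2048 configured, a job
submitted with "--mem-per-cpu=4096" would automatically be allocated two CPUs
per task so that the configured memory limit per CPU is honored.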
BLUEGENE SPECIFIC CHANGES
=========================
OTHER CHANGES
=============
* Added support for a default account and wckey per cluster within accounting.
* Added support for several new trigger types: SlurmDBD failure/restart,
Database failure/restart, Slurmctld failure/restart.
* Support has been added for TotalView to attach to a subset of launched tasks
instead of requiring that all tasks be attached to. This is the default
behavior unless an option of "--enable-partial-attach=no" is passed to the
configure (build) script.
* A web application (chart_stats.cgi) has been added that invokes sreport to
retrieve from the accounting storage db a user's request for job usage or
machine utilization statistics and charts the results to a browser.
* Much functionality has been added to account_storage/pgsql. The plugin
is still in an early beta state.
* SLURM's PMI library (for MPICH2) has been modified to properly execute an
executable program stand-alone (single MPI task launched without srun).
* The PMI was also modified to use more socket connections for better
scalability and to clear state between job step invocations.
* Added support for spank_get_item() to get S_STEP_ALLOC_CORES and
S_STEP_ALLOC_MEM. Support will remain for S_JOB_ALLOC_CORES and
S_JOB_ALLOC_MEM.
* Changed error message from "Requested time limit exceeds partition limit"
to "Requested time limit is invalid (exceeds some limit)". The error can be
triggered by a time limit exceeding the user/bank limit or the time-min
exceeding the job or partition's time limit.
* Added proctrack/cgroup plugin which uses Linux control groups (aka cgroup) to
track processes on Linux systems with this feature (kernel >= 2.6.24).
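To enable it (a minimal sketch):
    ProctrackType=proctrack/cgroup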
* Added the derived_ec (exit code) member to job_info_t. exit_code captures
the exit code of the job script (or salloc) while derived_ec contains the
highest exit code of all the job steps.
* Added the derived exit code and derived exit string fields to the database's
job record. Both can be modified by the user after the job completes. See
job_exit_code.html
API CHANGES
===========
Changed members of the following structs
========================================
job_info_t
    num_procs -> num_cpus
    job_min_cpus -> pn_min_cpus
    job_min_memory -> pn_min_memory
    job_min_tmp_disk -> pn_min_tmp_disk
    min_sockets -> sockets_per_node
    min_cores -> cores_per_socket
    min_threads -> threads_per_core
job_desc_msg_t
    num_procs -> min_cpus
    job_min_cpus -> pn_min_cpus
    job_min_memory -> pn_min_memory
    job_min_tmp_disk -> pn_min_tmp_disk
    min_sockets -> sockets_per_node
    min_cores -> cores_per_socket
    min_threads -> threads_per_core
partition_info_t
    state_up (new states added PARTITION_DRAIN and PARTITION_INACTIVE)
    default_part -> flags (as PART_FLAG_DEFAULT flag)
    disable_root_jobs -> flags (as PART_FLAG_NO_ROOT flag)
    hidden -> flags (as PART_FLAG_HIDDEN flag)
    root_only -> flags (as PART_FLAG_ROOT_ONLY flag)
slurm_step_ctx_params_t
    node_count -> min_nodes
slurm_ctl_conf_t
    cache_groups -> group_info (as GROUP_CACHE flag)
Added the following struct definitions
======================================
block_info_t (BlueGene-specific information)
    reason
front_end_info_msg_t        entirely new structure
front_end_info_t            entirely new structure
job_info_t
    derived_ec
    gres
    max_cpus
    resize_time
    show_flags
    time_min
job_desc_msg_t
    gres
    max_cpus
    time_min
    wait_all_nodes
job_step_info_t
    gres
node_info_t
    boot_time
    gres
    reason_time
    reason_uid
    slurmd_start_time
partition_info_t
    alternate
    flags
    preempt_mode
slurm_ctl_conf_t
    gres_plugins
    group_info
    hash_val
    job_submit_plugins
    sched_logfile
    sched_log_level
    slurmctld_port_count
    vsize_factor
slurm_step_ctx_params_t
    features
    gres
    max_nodes
update_node_msg_t
    gres
    preempt_mode
    reason_uid
job_info_t
    batch_host              name of the host running the batch script
slurm_step_layout
    front_end               name of front end host running the step
update_front_end_msg_t      entirely new structure
Changed the following enums
===========================
NONE
DEBUG_FLAG_FRONT_END added DebugFlags of Frontend
TRIGGER_RES_TYPE_FRONT_END added trigger for frontend state changes
Added the following API's
=========================
slurm_checkpoint_requeue()
slurm_init_update_step_msg()
slurm_job_step_get_pids()
slurm_job_step_pids_free()
slurm_job_step_pids_response_msg_free()
slurm_job_step_stat()
slurm_job_step_stat_free()
slurm_job_step_stat_response_msg_free()
slurm_list_append()
slurm_list_count()
slurm_list_create()
slurm_list_destroy()
slurm_list_find()
slurm_list_is_empty()
slurm_list_iterator_create()
slurm_list_iterator_reset()
slurm_list_iterator_destroy()
slurm_list_next()
slurm_list_sort()
slurm_set_schedlog_level()
slurm_step_launch_fwd_wake()
slurm_update_step()
slurm_free_front_end_info_msg()      free front end state information
slurm_init_update_front_end_msg()    initialize data structure for front end update
slurm_load_front_end()               load front end state information
slurm_print_front_end_info_msg()     print all front end state information
slurm_print_front_end_table()        print state information for one front end node
slurm_sprint_front_end_table()       output state information for one front end node
slurm_update_front_end()             update state of front end node
Changed the following API's
===========================
slurm_load_block_info(): Added show_flag parameter
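As an illustration of the new front end node APIs listed above, the following
is a minimal C sketch. The exact signatures are assumed by analogy with the
existing node information APIs (e.g. slurm_load_node()) and are not a verbatim
excerpt from the SLURM headers:

    #include <stdio.h>
    #include <stdlib.h>
    #include <slurm/slurm.h>
    #include <slurm/slurm_errno.h>

    int main(void)
    {
        front_end_info_msg_t *fe_info = NULL;

        /* Load state information for all front end nodes; the first
         * argument is the time of the last known update (0 = load all). */
        if (slurm_load_front_end((time_t) 0, &fe_info) != SLURM_SUCCESS) {
            slurm_perror("slurm_load_front_end");
            exit(1);
        }

        /* Print the state of every front end node, one line per node. */
        slurm_print_front_end_info_msg(stdout, fe_info, 1);

        slurm_free_front_end_info_msg(fe_info);
        return 0;
    }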
LLNL CHAOS-SPECIFIC RELEASE NOTES FOR SLURM VERSION 2.2
1 December 2010
LLNL CHAOS-SPECIFIC RELEASE NOTES FOR SLURM VERSION 2.3
3 January 2011
This lists only the most significant changes from SLURM v2.1 to v2.2
This lists only the most significant changes from SLURM v2.2 to v2.3
with respect to Chaos systems. See the file RELEASE_NOTES for a more
complete description of changes.
Mostly for system administrators:
* SLURM version 2.2 is able to read version 2.1 state files and preserve all
running and pending state. SLURM version 2.1 is *not* able to use state save
files generated by version 2.2, so this is a non-reversible transition.
* Added new configuration parameter JobSubmitPlugins which provides a mechanism
to set default job parameters or perform other site-configurable actions at
job submit time. Site-specific job submission plugins may be written in either
C or Lua.
* We have given Operators, Administrators, and bank account Coordinators (as
defined in the SLURM database) the ability to invoke commands that view/modify
user jobs and reservations. Previously, one had to be root to invoke
"scontrol update JobId" for example. In addition, Administrators have the
ability to view/modify node and partition info without having to become root.
For more details, see AUTHORIZATION section of the man pages for the
following commands: scontrol, scancel and sbcast.
Mostly for users:
* Job submission commands (salloc, sbatch and srun) have a new option,
--time-min, that permits the job's time limit to be reduced, down to the
specified minimum, if that is required to start the job early through
backfill scheduling.
* Support has been added for TotalView to attach to a subset of launched tasks
instead of requiring that all tasks be attached to.
* scontrol now has the ability to shrink a job's size. Use a command of
"scontrol update JobId=# NumNodes=#" or
"scontrol update JobId=# NodeList=<names>". This command generates a script
to be executed in order to reset SLURM environment variables for proper
execution of subsequent job steps.
* Users can hold and release their own jobs. Submit in held state using the srun
or sbatch --hold or -H options. Hold after submission using the command
"scontrol hold <jobid>". Release with "scontrol release <jobid>". Users cannot
release jobs held by a system administrator.
* Added support for a default account and wckey per cluster within accounting.
* SLURM commands (squeue, sinfo, sbatch, etc.) can now operate between
clusters. Jobs can also be submitted with sbatch to other cluster(s), with the
job routed to the one cluster expected to initiate the job first. This
functionality relies upon the SlurmDBD (SLURM DataBase Daemon) to provide
communication information (address and port) for a command to locate the
SLURM control daemon (slurmctld) on other clusters.