From 4e8d01fd977dcab4be2dda4ce4aac272fac4df44 Mon Sep 17 00:00:00 2001
From: Moe Jette <jette1@llnl.gov>
Date: Mon, 3 Jan 2011 19:51:46 +0000
Subject: [PATCH] major mods to start RELEASE_NOTES for slurm v2.3

---
 RELEASE_NOTES      | 328 +++++----------------------------------------
 RELEASE_NOTES_LLNL |  51 +------
 2 files changed, 39 insertions(+), 340 deletions(-)

diff --git a/RELEASE_NOTES b/RELEASE_NOTES
index 15fd5105baf..1c91bfaf168 100644
--- a/RELEASE_NOTES
+++ b/RELEASE_NOTES
@@ -1,10 +1,10 @@
-RELEASE NOTES FOR SLURM VERSION 2.2
-1 December 2010
+RELEASE NOTES FOR SLURM VERSION 2.3
+3 January 2011
 
 IMPORTANT NOTE:
 If using the slurmdbd (SLURM DataBase Daemon) you must update this first.
 
-The 2.2 slurmdbd will work with SLURM daemons of version 2.1.3 and above.
+The 2.3 slurmdbd will work with SLURM daemons of version 2.1.3 and above.
 You will not need to update all clusters at the same time, but it is very
 important to update slurmdbd first and having it running before updating
 any other clusters making use of it. No real harm will come from updating
@@ -18,335 +18,79 @@ innodb_buffer_pool_size=64M
 under the [mysqld] reference and restarting the mysqld. This is needed when
 converting large tables over to the new database schema.
 
-SLURM can be upgraded from version 2.1 to version 2.2 without loss of jobs or
+SLURM can be upgraded from version 2.2 to version 2.3 without loss of jobs or
 other state information.
 
 HIGHLIGHTS
 ==========
-* Slurmctld restart/reconfiguration operations have been altered.
-  NOTE: There will be no change in behavior unless partition configuration
-  or node Features/Weight are altered using the scontrol command to differ
-  from the contents of the slurm.conf configuration file.
-
-  Preserve current partition state information plus node Feature and Weight
-  state information after slurmctld receives a SIGHUP signal or is restarted
-  with the -R option. Recreate partition plus node information (except node
-  State and Reason) from slurm.conf file after executing "scontrol reconfig"
-  or restarting slurmctld *without* the -R option.
-
-  OPERATION            ACTION
-  slurmctld -R         Recover all job, node and partition state
-  slurmctld            Recover job state, recreate node and partition state
-  slurmctld -c         Recover no jobs, recreate node and partition state
-  SIGHUP to slurmctld  Preserve all job, node and partition state
-  scontrol reconfig    Preserve job state, recreate node and partition state
-
-  Old logic preserved node Feature plus partition state after "slurmctld" or
-  "scontrol reconfig" rather than recreating it from slurm.conf. Node Weight
-  was formerly always recreated from slurm.conf.
-
-* SLURM commands (squeue, sinfo, sview, etc...) can now operate between
-  clusters. Jobs can also be submitted with sbatch to other cluster(s) with the
-  job routed to the one cluster expected to initiated the job first.
-
-* Accounting through the SlurmDBD with the MySQL plugin can now support
-  a default account and wckey per cluster.
+* For architectures where the slurmd daemon executes on front end nodes (Cray
+  and BlueGene systems), more than one slurmd daemon may be executed using
+  more than one front end node for improved fault tolerance and performance.
+
 CONFIGURATION FILE CHANGES (see "man slurm.conf" for details)
 =============================================================
-* A hash of the slurm.conf running on each node in the cluster is sent when
-  registering with the slurmctld so it can verify the slurm.conf is the same
-  as the one it is running.
-  If not an error message is displayed. To silence this message add
-  NO_CONF_HASH to DebugFlags in your slurm.conf.
-
-* Added VSizeFactor to enforce virtual memory limits for jobs and job steps as
-  a percentage of their real memory allocation.
-
-* Added new option for SelectTypeParameters of CR_ONE_TASK_PER_CORE. This
-  option will allocate one task per core by default. Without this option,
-  by default one task will be allocated per thread on nodes with more than
-  one ThreadsPerCore configured (i.e. no change in behavior without this
-  option).
-
-* Add new configuration parameters GroupUpdateForce and GroupUpdateTime. These
-  control when slurmctld updates its information of which users are in the
-  groups allowed to use partitions. NOTE: There is no change in the default
-  behavior.
-
-* Added new configuration parameters SlurmSchedLogFile and SlurmSchedLogLevel
-  to support writing scheduling events to a separate log file.
-
-* Added new configuration parameter JobSubmitPlugins which provides a mechanism
-  to set default job parameters or perform other site-configurable actions at
-  job submit time. Site-specific job submission plugins may be written either C
-  or LUA.
-
-* MaxJobCount changed from 16-bit to 32-bit field. The default MaxJobCount was
-  changed from 5,000 to 10,000.
-
-* Added support for a PropagatePrioProcess configuration parameter value of 2
-  to restrict spawned task nice values to that of the slurmd daemon plus 1.
-  This insures that the slurmd daemon always have a higher scheduling priority
-  than spawned tasks. Also added support in slurmctld, slurmd and slurmdbd for
-  option of "-n <value>" to reset the daemon's nice value.
-
-* Support has been added for the allocation of generic resources (GRES). A
-  new configuration parameter, GresPlugins, has been added along with a node-
-  specific parameter, Gres. There is also a gres.conf file to be configured on
-  each node. For more information, see the web page
-  https://computing.llnl.gov/linux/slurm/gang_scheduling.html
-  Support for enforcement of these allocations using Linux CGroup will be
-  provided in a later release.
-
-* Added support for new partition states of DRAIN (run queued jobs, but accept
-  no new jobs) and INACTIVE (do not accept or run any more jobs) and new
-  partition option of "Alternate" (alternate partition to use for jobs
-  submitted to partitions that are currently in a state of DRAIN or INACTIVE).
-
-* Added the ability to configure PreemptMode on a per-partition or per-QOS
-  basis.
-
-* Modified the meaning of InactiveLimit slightly. It will now cancel the job
-  allocation created using the salloc or srun command if those commands cease
-  responding for the InactiveLimit regardless of any running job steps. This
-  parameter will no longer effect jobs spawned using sbatch.
-
-* Added SchedulerParameters option of bf_window to control how far into the
-  future that the backfill scheduler will look when considering jobs to start.
-  The default value is one day.
-
-* Added the ability to specify a range of ports in the SlurmctldPort parameter
-  for better handling of high bursts of RPCs (e.g. "SlurmctldPort=1234-1237").
-
-COMMAND CHANGES (see man pages for details)
-===========================================
-* sinfo -R now has the user and timestamp in separate fields from the reason.
-
-* Job submission commands (salloc, sbatch and srun) have a new option,
-  --time-min, that permits the job's time limit to be reduced to the extent
-  required to start early through backfill scheduling with the minimum value
-  as specified.
-
-* scontrol now has the ability to change a job step's time limit.
-
-* scontrol now has the ability to shrink a job's size. Use a command of
-  "scontrol update JobId=# NumNodes=#" or
-  "scontrol update JobId=# NodeList=<names>". This command generates a script
-  to be executed in order to reset SLURM environment variables for proper
-  execution of subsequent job steps.
-
-* We have given Operators, Administrators, and bank account Coordinators (as
-  defined in the SLURM database) the ability to invoke commands that view/modify
-  user jobs and reservations. Previously, one had to be root to invoke
-  "scontrol update JobId" for example. In addition, Administrators have the
-  ability to view/modify node and partition info without having to become root.
-  For moredetails, see AUTHORIZATION section of the man pages for the
-  following commands: scontrol, scancel and sbcast.
+* In order to support more than one front end node, new configuration
+  parameters have been added to define them: FrontendName, FrontendAddr,
+  Port, State and Reason.
-* Users can hold and release their own jobs. Submit in held state using srun
-  or sbatch --hold or -H options. Hold after submission using the command
-  "scontrol hold <jobid>". Release with "scontrol release <jobid>". Users can
-  not release jobs held by a system administrator unless the adminstrator uses
-  the command "scontrol uhold <jobid>" ("uhold" for "user hold").
+* A DebugFlags value of Frontend has been added.
-* Add support for slurmctld and slurmd option of "-n <value>" to reset the
-  daemon's nice value.
-
-* srun's --core option has been removed. Use the SPANK "Core" plugin from
-  http://code.google.com/p/slurm-spank-plugins/ for continued support.
-
-* Added salloc and sbatch option --wait-for-nodes. If set non-zero, job
-  initiation will be delayed until all allocated nodes have booted. Salloc
-  will log the delay with the messages "Waiting for nodes to boot" and "Nodes
-  are ready for use".
-
-* Added scontrol "wait_job <job_id>" option to wait for nodes to boot as needed.
-  Useful for batch jobs (in Prolog, PrologSlurmctld or the script) if powering
-  down idle nodes.
-
-* Modified sview to display database configuration and add/remove visible tabs.
-
-* Modified sview to save default configuration in .slurm/sviewrc file.
-  Default setting can be set by using the menus Options->Set Default Settings
-  or typing Ctrl-S.
+COMMAND CHANGES (see man pages for details)
+===========================================
+* scontrol now has the ability to get and set front end node state.
-* Modified select/cons_res plugin so that if MaxMemPerCPU is configured and a
-  job specifies it's memory requirement, then more CPUs than requested will
-  automatically be allocated to a job to honor the MaxMemPerCPU parameter.
 BLUEGENE SPECIFIC CHANGES
 =========================
+
 OTHER CHANGES
 =============
-* Added support for a default account and wckey per cluster within accounting.
-
-* Added support for several new trigger types: SlurmDBD failure/restart,
-  Database failure/restart, Slurmctld failure/restart.
-
-* Support has been added for TotalView to attach to a subset of launched tasks
-  instead of requiring that all tasks be attached to.
-  This is the default
-  behavior unless an option of "--enable-partial-attach=no" be passed to the
-  configure (build) script.
-
-* A web application (chart_stats.cgi) has been added that invokes sreport to
-  retrieve from the accounting storage db a user's request for job usage or
-  machine utilization statistics and charts the results to a browser.
-
-* Much functionality has been added to account_storage/pgsql. The plugin
-  is still in a very beta state.
-
-* SLURM's PMI library (for MPICH2) has been modified to properly execute an
-  executable program stand-alone (single MPI task launched without srun).
-
-* The PMI was also modified to use more socket connections for better
-  scalability and to clear state between job step invocations.
-
-* Added support for spank_get_item() to get S_STEP_ALLOC_CORES and
-  S_STEP_ALLOC_MEM. Support will remain for S_JOB_ALLOC_CORES and
-  S_JOB_ALLOC_MEM.
-
-* Changed error message from "Requested time limit exceeds partition limit"
-  to "Requested time limit is invalid (exceeds some limit)". The error can be
-  triggered by a time limit exceeding the user/bank limit or the time-min
-  exceeding the job or partition's time limit.
-
-* Added proctrack/cgroup plugin which uses Linux control groups (aka cgroup) to
-  track processes on Linux systems with this feature (kernel >= 2.6.24).
-
-* Added the derived_ec (exit code) member to job_info_t. exit_code captures
-  the exit code of the job script (or salloc) while derived_ec contains the
-  highest exit code of all the job steps.
-
-* Added the derived exit code and derived exit string fields to the database's
-  job record. Both can be modified by the user after the job completes. See
-  job_exit_code.html
 API CHANGES
 ===========
+
 Changed members of the following structs
 ========================================
-job_info_t
-        num_procs -> num_cpus
-        job_min_cpus -> pn_min_cpus
-        job_min_memory -> pn_min_memory
-        job_min_tmp_disk -> pn_min_tmp_disk
-        min_sockets -> sockets_per_node
-        min_cores -> cores_per_socket
-        min_threads -> threads_per_core
-
-job_desc_msg_t
-        num_procs -> min_cpus
-        job_min_cpus -> pn_min_cpus
-        job_min_memory -> pn_min_memory
-        job_min_tmp_disk -> pn_min_tmp_disk
-        min_sockets -> sockets_per_node
-        min_cores -> cores_per_socket
-        min_threads -> threads_per_core
-
-partition_info_t
-        state_up (new states added PARTITION_DRAIN and PARTITION_INACTIVE)
-        default_part -> flags (as PART_FLAG_DEFAULT flag)
-        disable_root_jobs -> flags (as PART_FLAG_NO_ROOT flag)
-        hidden -> flags (as PART_FLAG_HIDDEN flag)
-        root_only -> flags (as PART_FLAG_ROOT_ONLY flag)
-
-slurm_step_ctx_params_t
-        node_count -> min_nodes
-
-slurm_ctl_conf_t
-        cache_groups -> group_info (as GROUP_CACHE flag)
 Added the following struct definitions
 ======================================
-block_info_t (BlueGene-specific information)
-        reason
+front_end_info_msg_t        entirely new structure
+
+front_end_info_t            entirely new structure
 job_info_t
-        derived_ec
-        gres
-        max_cpus
-        resize_time
-        show_flags
-        time_min
-
-job_desc_msg_t
-        gres
-        max_cpus
-        time_min
-        wait_all_nodes
-
-job_step_info_t
-        gres
-
-node_info_t
-        boot_time
-        gres
-        reason_time
-        reason_uid
-        slurmd_start_time
-
-partition_info_t
-        alternate
-        flags
-        preempt_mode
-
-slurm_ctl_conf_t
-        gres_plugins
-        group_info
-        hash_val
-        job_submit_plugins
-        sched_logfile
-        sched_log_level
-        slurmctld_port_count
-        vsize_factor
-
-slurm_step_ctx_params_t
-        features
-        gres
-        max_nodes
-
-update_node_msg_t
-        gres
-        preempt_mode
-        reason_uid
+        batch_host          name of the host running the batch script
+
+slurm_step_layout
+        front_end           name of front end host running the step
+
+update_front_end_msg_t      entirely new structure
 Changed the following enums
 ===========================
-NONE
+DEBUG_FLAG_FRONT_END        added DebugFlags of Frontend
+
+TRIGGER_RES_TYPE_FRONT_END  added trigger for frontend state changes
 Added the following API's
 =========================
-slurm_checkpoint_requeue()
-slurm_init_update_step_msg()
-slurm_job_step_get_pids()
-slurm_job_step_pids_free()
-slurm_job_step_pids_response_msg_free()
-slurm_job_step_stat()
-slurm_job_step_stat_free()
-slurm_job_step_stat_response_msg_free()
-slurm_list_append()
-slurm_list_count()
-slurm_list_create()
-slurm_list_destroy()
-slurm_list_find()
-slurm_list_is_empty()
-slurm_list_iterator_create()
-slurm_list_iterator_reset()
-slurm_list_iterator_destroy()
-slurm_list_next()
-slurm_list_sort()
-slurm_set_schedlog_level()
-slurm_step_launch_fwd_wake()
-slurm_update_step()
+slurm_free_front_end_info_msg()    free front end state information
+slurm_init_update_front_end_msg()  initialize data structure for front end update
+slurm_load_front_end()             load front end state information
+slurm_print_front_end_info_msg()   print all front end state information
+slurm_print_front_end_table()      print state information for one front end node
+slurm_sprint_front_end_table()     output state information for one front end node
+slurm_update_front_end()           update state of front end node
 Changed the following API's
 ===========================
-slurm_load_block_info(): Added show_flag parameter
diff --git a/RELEASE_NOTES_LLNL b/RELEASE_NOTES_LLNL
index 207d17148ed..de8ffaf3dcb 100644
--- a/RELEASE_NOTES_LLNL
+++ b/RELEASE_NOTES_LLNL
@@ -1,55 +1,10 @@
-LLNL CHAOS-SPECIFIC RELEASE NOTES FOR SLURM VERSION 2.2
-1 December 2010
+LLNL CHAOS-SPECIFIC RELEASE NOTES FOR SLURM VERSION 2.3
+3 January 2011
 
-This lists only the most significant changes from SLURM v2.1 to v2.2
+This lists only the most significant changes from SLURM v2.2 to v2.3
 with respect to Chaos systems. See the file RELEASE_NOTES for a more
 complete description of changes.
 
 Mostly for system administrators:
-* SLURM version 2.2 is able to read version 2.1 state files and preserve all
-  running and pending state. SLURM version 2.1 is *not* able to use state save
-  files generated by version 2.2, so this is a non-reversible transition.
-
-* Added new configuration parameter JobSubmitPlugins which provides a mechanism
-  to set default job parameters or perform other site-configurable actions at
-  job submit time. Site-specific job submission plugins may be written either C
-  or LUA.
-
-* We have given Operators, Administrators, and bank account Coordinators (as
-  defined in the SLURM database) the ability to invoke commands that view/modify
-  user jobs and reservations. Previously, one had to be root to invoke
-  "scontrol update JobId" for example. In addition, Administrators have the
-  ability to view/modify node and partition info without having to become root.
-  For more details, see AUTHORIZATION section of the man pages for the
-  following commands: scontrol, scancel and sbcast.
-
 Mostly for users:
-
-* Job submission commands (salloc, sbatch and srun) have a new option,
-  --time-min, that permits the job's time limit to be reduced to the extent
-  required to start early through backfill scheduling with the minimum value
-  as specified.
-
-* Support has been added for TotalView to attach to a subset of launched tasks
-  instead of requiring that all tasks be attached to.
-
-* scontrol now has the ability to shrink a job's size. Use a command of
-  "scontrol update JobId=# NumNodes=#" or
-  "scontrol update JobId=# NodeList=<names>". This command generates a script
-  to be executed in order to reset SLURM environment variables for proper
-  execution of subsequent job steps.
-
-* Users can hold and release their own jobs. Submit in held state using srun
-  or sbatch --hold or -H options. Hold after submission using the command
-  "scontrol hold <jobid>". Release with "scontrol release <jobid>". Users can
-  not release jobs held by system administrator.
-
-* Added support for a default account and wckey per cluster within accounting.
-
-* SLURM commands (squeue, sinfo, sbatch, etc...) can now operate between
-  clusters. Jobs can also be submitted with sbatch to other cluster(s) with the
-  job routed to the one cluster expected to initiated the job first. This
-  functionality relies upon the SlurmDBD (SLURM DataBase Daemon) to provide
-  communication information (address and port) for a command to locate the
-  SLURM control daemon (slurmctld) on other clusters.
--
GitLab
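
For sites that want to code against the new front end calls listed above, the
following is a minimal usage sketch. It is illustrative only: it assumes the
new functions mirror the existing node information API, i.e. that
slurm_load_front_end() takes an update time and a front_end_info_msg_t **
and returns SLURM_SUCCESS on success. Verify the actual prototypes in
slurm/slurm.h from the 2.3 tree before relying on them.

/* front_end_demo.c - sketch only; prototypes assumed, see note above.
 * Build (roughly): gcc front_end_demo.c -lslurm -o front_end_demo
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#include <slurm/slurm.h>
#include <slurm/slurm_errno.h>

int main(void)
{
    front_end_info_msg_t *fe_msg = NULL;

    /* Load the state of all configured front end nodes */
    if (slurm_load_front_end((time_t) 0, &fe_msg) != SLURM_SUCCESS) {
        slurm_perror("slurm_load_front_end");
        exit(1);
    }

    /* Print one line of state per front end node (one_liner = 1) */
    slurm_print_front_end_info_msg(stdout, fe_msg, 1);

    /* Free the message allocated by slurm_load_front_end() */
    slurm_free_front_end_info_msg(fe_msg);
    return 0;
}

The same load/print/free pattern should apply to the update path used by
scontrol (slurm_init_update_front_end_msg() followed by
slurm_update_front_end()), which is not shown here.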