From d81cb04a3f94dc6512a0177e979e53dbb043107b Mon Sep 17 00:00:00 2001
From: Danny Auble <da@llnl.gov>
Date: Wed, 6 Jan 2010 00:20:06 +0000
Subject: [PATCH] cleared release notes for new version

---
 RELEASE_NOTES      | 149 +++++---------------------------------------------
 RELEASE_NOTES_LLNL |  36 +++--------
 2 files changed, 22 insertions(+), 163 deletions(-)

diff --git a/RELEASE_NOTES b/RELEASE_NOTES
index 429ca726c32..b2f4e20ae81 100644
--- a/RELEASE_NOTES
+++ b/RELEASE_NOTES
@@ -1,17 +1,17 @@
-RELEASE NOTES FOR SLURM VERSION 2.1
+RELEASE NOTES FOR SLURM VERSION 2.2
 05 January 2010 (through SLURM 2.1.0)
 
 
 IMPORTANT NOTE:
-SLURM state files in version 2.1 are different from those of version 2.0.
-After installing SLURM version 2.1, plan to restart without preserving
-jobs or other state information. While SLURM version 2.0 is still running,
+SLURM state files in version 2.2 are different from those of version 2.1.
+After installing SLURM version 2.2, plan to restart without preserving
+jobs or other state information. While SLURM version 2.1 is still running,
 cancel all pending and running jobs (e.g.
 "scancel --state=pending; scancel --state=running"). Then stop and restart
 daemons with the "-c" option or use "/etc/init.d/slurm startclean".
 
 If using the slurmdbd (SLURM DataBase Daemon) you must update this first.
-The 2.1 slurmdbd will work with SLURM daemons at version 2.0.0 and above.
+The 2.2 slurmdbd will work with SLURM daemons at version 1.3.0 and above.
 You will not need to update all clusters at the same time, but it is very
 important to update slurmdbd first and having it running before updating
 any other clusters making use of it. No real harm will come from updating
@@ -24,77 +24,12 @@ doc/html/configurator.html that comes with the distribution.
 
 
 HIGHLIGHTS
-* The sched/gang plugin has been removed. The logic is now directly within the
-  slurmctld daemon so that gang scheduling and/or job preemption can be
-  performed with a backfill scheduler.
-* Preempted jobs can now be canceled, checkpointed or requeued rather than
-  only suspended.
-* Support for QOS (Quality Of Service) has been added to the accounting
-  database with configurable limits, priority and preemption rules.
-* Added "--signal=<int>@<time>" option to salloc, sbatch and srun commands to
-  notify programs before reaching the end of their time limit.
-* Added squeue option "--start" to report expected start time of pending jobs.
-  The times are only set if the backfill scheduler is in use.
-* The pam_slurm Pluggable Authentication Module for SLURM previously
-  distributed separately has been moved within the main SLURM distribution
-  and is packaged as a separate RPM.
-* Support has been added for OpenSolaris.
-* Added environment variable support to sattach, salloc, sbatch and srun
-  to permit user control over exit codes so application exit codes can be
-  distinguished from those generated by SLURM. SLURM_EXIT_ERROR specifies the
-  exit code when a SLURM error occurs. SLURM_EXIT_IMMEDIATE specifies the
-  exit code when the --immediate option is specified and resources are not
-  available. Any other non-zero exit code would be that of the application
-  run by SLURM.
 
 
 CONFIGURATION FILE CHANGES (see "man slurm.conf" for details)
-* Added PreemptType parameter to specify the plugin used to identify
-  preemptable jobs (partition priority or quality of service) and
-  PreemptionMode to identify how to preempt jobs (requeue, cancel, checkpoint,
-  or suspend). Default is no job preemption.
-* The sched/gang plugin has be removed, use PreemptType=preempt/partition_prio
-  and PreemptMode=suspend,gang.
-* ControlMachine changed to accept multiple comma-separated hostnames for
-  support of some high-availability architectures.
-* Added MaxTasksPerNode to control how many tasks that the slurmd can launch.
-  The default value is 128 (same as Slurm v2.0 value).
-* Removed SrunIOTimeout parameter. It now uses MessageTimeout value.
-* Added SchedulerParameters option of "max_job_bf=#" to control how far down
-  the queue of pending jobs that SLURM searches in an attempt backfill
-  schedule them. The default value is 50 jobs.
 
 
 COMMAND CHANGES (see man pages for details)
-* Added "sacctmgr show problems" command to display problems in the accounting
-  database (e.g. accounts with no users, users with no UID, etc.).
-* Several redundant squeue output and sorting options have been removed:
-  "%o" (use %D"), "%b" (use "%S"), "%X", %Y, and "%Z" (use "%z").
-* Standardized on the use of the '-Q' flag for all commands that offer the
-  --quiet option.
-* salloc's --wait=<secs> option deprecated by --immediate=<secs> option to
-  match the srun command.
-* Scalability of sview dramatically improved.
-* Added reservation flag of "OVERLAP" to permit a new reservation to use
-  nodes already in another reservation.
-* Added to sacct the ability to use --format NAME%LENGTH similar to sacctmgr.
-* For salloc, sbatch and srun commands:
-  Ignore _maximum_ values for --sockets-per-node, --cores-per-socket and
-  --threads-per-core options.
-  Remove --mincores, --minsockets, --minthreads options (map them to
-  minimum values of -sockets-per-node, --cores-per-socket and
-  --threads-per-core for now).
-  Changed the single character option for dependency from "-P" to the
-  more intuitive, "-d". This obsoletes the use of srun -d to set a
-  slurmd debug level. Use srun --slurmd-debug instead.
-* Changed "scontrol show job" command:
-  ReqProcs (number of processors requested) is replaced by NumCPUs
-  (number of cpus requested or actually allocated)
-  ReqNodes (number of nodes requested) is replaced by NumNodes
-  (number of nodes requested or actually allocated).
-  Added a --detail option to "scontrol show job" to display the
-  cpu/memory allocation information on a node-by-node basis.
-  Reorganized the output into functional groupings.
 
 
 BLUEGENE SPECIFIC CHANGES
@@ -111,76 +46,20 @@ BLUEGENE SPECIFIC CHANGES
 
 API CHANGES
 * General changes:
-  Replaced use of the term "procs" with "cpus"
-  Eliminated min/max specifications for sockets/cores/threads
+
 * Changed the following struct definitions:
-  allocation_msg_thread_t
-  jobacctinfo_t
-  select_jobinfo_t
-  slurm_cred_t
-  switch_jobinfo_t
+
 * Added the following struct definitions:
-  block_info_msg_t
-  block_info_t
-  job_resources_t
-  job_sbcast_cred_msg
-  sbcast_cred_t
-* Renamed select_job_res_t to select_nodeinfo_t
+
+* Renamed
+
 * Changed members of the following structs:
-  job_descriptor
-  job_step_info_t
-  node_info
-  node_info_msg
-  partition_info
-  reserve_info
-  resv_desc_msg
-  slurm_ctl_conf
-  slurm_step_ctx_params_t
-  slurm_step_launch_params_t
-* Changed members of the job_info struct
-  Note that cpu_count_reps, cpus_per_node, and num_cpu_groups were moved
-  to new job_resources struct
+
 * Changed the following enums
-  job_state_reason
-  node_states
-  select_print_mode
-* Added the select_nodedata_type enum
-* Renamed the select_data_type enum to select_jobdata_type
+
 * Added the following API's
-  slurm_ctl_conf_2_key_pairs()
-  slurm_free_block_info_msg()
-  slurm_free_sbcast_cred_msg()
-  slurm_get_select_nodeinfo()
-  slurm_init_update_block_msg()
-  slurm_job_cpus_allocated_on_node()
-  slurm_job_cpus_allocated_on_node_id()
-  slurm_job_node_ready()
-  slurm_load_block_info()
-  slurm_print_block_info()
-  slurm_print_block_info_msg()
-  slurm_sbcast_lookup()
-  slurm_sprint_block_info()
+
 * Changed the following API's
-  slurm_jobinfo_ctx_get()
-  slurm_load_job()
-  slurm_load_jobs()
-  slurm_print_node_table()
-  slurm_sprint_node_table()
+
 
 OTHER CHANGES
-* A mechanism has been added for SPANK plugins to set environment variables
-  for Prolog, Epilog, PrologSLurmctld and EpilogSlurmctld programs using the
-  functions spank_get_job_env, spank_set_job_env, and spank_unset_job_env. See
-  "man spank" for more information.
-* Set a node's power_up/configuring state flag while PrologSlurmctld is
-  running for a job allocated to that node.
-* Added sched/wiki2 (Moab) JOBMODIFY command support for VARIABLELIST option
-  to set supplemental environment variables for pending batch jobs.
-* The RPM previously named "slurm-aix-federation-<version>.rpm" has been
-  renamed to just "slurm-aix-<version>.rpm" (the federation switch plugin may
-  not be present).
-* Environment variables SLURM_TOPOLOGY_ADDR and SLURM_TOPOLOGY_ADDR_PATTERN
-  added to describe the network topology for each launched task when
-  TopologyType=topology/tree is configured
-* Add new job wait reason, ReqNodeNotAvail: Required node is not available
-  (down or drained).
diff --git a/RELEASE_NOTES_LLNL b/RELEASE_NOTES_LLNL
index 5f744e6698f..5bc30b8c689 100644
--- a/RELEASE_NOTES_LLNL
+++ b/RELEASE_NOTES_LLNL
@@ -1,37 +1,17 @@
-LLNL CHAOS-SPECIFIC RELEASE NOTES FOR SLURM VERSION 2.1
-16 October 2009
+LLNL CHAOS-SPECIFIC RELEASE NOTES FOR SLURM VERSION 2.2
+05 January 2010
 
-This lists only the most significant changes from SLURM v2.0 to v2.1
+This lists only the most significant changes from SLURM v2.1 to v2.2
 with respect to Chaos systems. See the file RELEASE_NOTES for other
 changes.
 
 For system administrators:
-* The pam_slurm Pluggable Authentication Module for SLURM previously
-  distributed separately has been moved within the main SLURM distribution
-  and is packaged as a separate RPM.
-* Added command "sacctmgr show problems" to display problems in the accounting
-  database (e.g. accounts with no users, users with no UID, etc.).
-* Completely disable logging of sched/wiki and sched/wiki2 (Maui & Moab)
-  message traffic unless DebugFlag=Wiki is configured.
 
 Mostly for users:
-* Added -"-signal=<int>@<time>" option to salloc, sbatch and srun commands to
-  notify programs before reaching the end of their time limit.
-* Added a --detail option to "scontrol show job" to display the cpu/memory
-  allocation informaton on a node-by-node basis.
-* Add new job wait reason, ReqNodeNotAvail: Required node is not available
-  (down or drained).
-* Added environment variable support to sattach, salloc, sbatch and srun
-  to permit user control over exit codes so application exit codes can be
-  distiguished from those generated by SLURM. SLURM_EXIT_ERROR specifies the
-  exit code when a SLURM error occurs. SLURM_EXIT_IMMEDIATE specifies the
-  exit code when the --immediate option is specified and resources are not
-  available. Any other non-zero exit code would be that of the application
-  run by SLURM.
-
-SLURM state files in version 2.1 are different from those of version 2.1.
-After installing SLURM version 2.1, plan to restart without preserving
-jobs or other state information. While SLURM version 2.0 is still running,
+
+SLURM state files in version 2.2 are different from those of version 2.1.
+After installing SLURM version 2.2, plan to restart without preserving
+jobs or other state information. While SLURM version 2.1 is still running,
 cancel all pending and running jobs (e.g.
-"scancel --state=pending; scancel --state=running"). Then stop and restart
+"scancel --state=pending; scancel --state=running"). Then stop and restart
 daemons with the "-c" option or use "/etc/init.d/slurm startclean".
-- 
GitLab