diff --git a/NEWS b/NEWS
index a1d78354e624bf0907fe73072cafcfe6f5a3f164..ef96e8ce27eb53f5e17da6bbd45fa976d13b89b4 100644
--- a/NEWS
+++ b/NEWS
@@ -1733,4739 +1733,3 @@ documents those changes that are of interest to users and admins.
  -- Replaced many calls to getpwuid() with reentrant uid_to_string()
  -- The slurmstepd will now refresh it's log file handle on a reconfig,
     previously if a log was rolled any output from the stepd was lost.
-
-* Changes in SLURM 2.1.0-pre9
-=============================
- -- Added the "scontrol update SlurmctldDebug" as the preferred alternative to
-    the "scontrol setdebug" command.
- -- BLUEGENE - made it so when removing a block in an error state the nodes in
-    the block are set correctly in accounting as not in error.
- -- Fixed issue where if slurmdbd is not up qos' are set up correctly for
-    associations off of cache.
- -- scontrol, squeue, sview all display the correct node, cpu count along with
-    correct corresponding nodelist on completing jobs.
- -- Patch (Mark Grondona) fixes serious security vulnerability in SLURM in
-    the spank_job_env functionality.
- -- Improve spank_job_env interface and documentation
- -- Add ESPANK_NOT_LOCAL error code to spank_err_t
- -- Made the #define DECAY_INTERVAL used in the priority/multifactor plugin
-    a slurm.conf variable (PriorityCalcPeriod)
- -- Added new macro SLURM_VERSION for use in autoconf scripts to determine
-    current version of slurm installed on system when building against the api.
- -- Patch from Matthieu Hautreux that adds an entry into the error file when
-    a job or step receives a TERM or KILL signal.
- -- Make it so env var SLURM_SRUN_COMM_HOST is overwritten if already in
-    existence in the slurmd.
-
-* Changes in SLURM 2.1.0-pre8
-=============================
- -- Rearranged the "scontrol show job" output into functional groupings
- -- Change the salloc/sbatch/srun -P option to -d (dependency)
- -- Removed the srun -d option; must use srun --slurmd-debug instead
- -- When running the mysql plugin natively MUNGE errors are now eliminated
-    when sending updates to slurmctlds.
- -- Check to make sure we have a default account before looking to
-    fill in default association.
- -- Accounting - Slurmctld and slurmdbd will now set uids of users which were
-    created after the start of the daemons on reconfig. Slurmdbd will
-    attempt to set previously non-existant uids every hour.
- -- Patch from Aaron Knister and Mark Grondona, to parse correctly quoted
-    #SBATCH options in a batch script.
- -- job_desc_msg_t - in, out, err have been changed to std_in, std_out,
-    and std_err respectfully. Needed for PySLURM, since Python sees (in)
-    as a keyword.
- -- Changed the type of addr to struct sockaddr_in in _message_socket_accept()
-    in sattach.c, step_launch.c, and allocate_msg.c, and moved the function
-    into a common place for all the calls since the code was very similar.
- -- proctrack/lua support has been added see contribs/lua/protrack.lua
- -- replaced local gtk m4 test with AM_PATH_GTK_2_0
- -- changed AC_CHECK_LIB to AC_SEARCH_LIBS to avoid extra libs in
-    compile lines.
- -- Patch from Matthieu Hautreux to improve error message in slurmd/req.c
- -- Added support for split groups from (Matthiu Hautreux CEA)
- -- Patch from Mark Grondona to move blcr scripts into pkglibexecdir
- -- Patch from Doug Parisek to calculate a job's projected start time under the
-    builtin scheduler.
- -- Removed most global variables out of src/common/jobacct_common.h
-
-* Changes in SLURM 2.1.0-pre7
-=============================
- -- BLUEGENE - make 2.1 run correctly on a real bluegene cluster
- -- sacctmgr - Display better debug for when an admin specifies a non-existant
-    parent account when changing parent accounts.
- -- Added a mechanism to the slurmd to defer the epilog from starting until
-    after a running prolog has finished.
- -- If a node reboots inbetween checking status the node is marked down unless
-    ReturnToService=2
- -- Added -R option to slurmctld to recover partition state also when
-    restarting or reconfiguring.
-
-* Changes in SLURM 2.1.0-pre6
-=============================
- -- When getting information about nodes in hidden partitions, return a node
-    name of NULL rather than returning no information about the node so that
-    node index information is still valid.
- -- When querying database for jobs in certain state and a time period is
-    given only jobs in that state during the period will be returned,
-    previously if a time period was given in sacct jobs eligible to run or
-    running would be displayed, which is still the default if no states are
-    requested.
- -- One can now query jobs based on size (nodes and or cpus) (mysql plugin only)
- -- Applied patch from Mark Grondona that tests for a missing config file before
-    any other processing in spank_init(). This now prevents fatal errors from
-    being mistakenly treated as recoverable.
- -- --enable-debug no longer has to be stated at configure time to have
-    the slurmctld or slurmstepd dump core on a seg fault.
- -- Moved the errant slurm_job_node_ready() declaration from job_info.h to
-    slurm.h and deleted job_info.h.
- -- Added the slurm_job_cpus_allocated_on_node_id()
-    slurm_job_cpus_allocated_on_node() API for working with the
-    job_resources_t structure.
- -- BLUEGENE - speed up start up for systems that have many blocks (100+)
-    configured on the system.
-
-* Changes in SLURM 2.1.0-pre5
-=============================
- -- Add squeue option "--start" to report expected start time of pending jobs.
- -- Sched/backfill plugin modified to set expected start time of pending jobs.
- -- Add SchedulerParameters option of "max_job_bf=#" to control how far down
-    the queue of pending jobs that SLURM searches in an attempt backfill
-    schedule them. The default value is 50 jobs.
- -- Fixed cause of squeue -o "%C" seg fault.
- -- Add -"-signal=<int>@<time>" option to salloc, sbatch and srun commands to
-    notify programs before reaching the end of their time limit.
- -- Add scontrol option to update a running job's EndTime (also resets the
-    job's time limit).
- -- Add new job wait reason, ReqNodeNotAvail: Required node is not available
-    (down or drained).
- -- Log when slurmctld or slurmd are started with small core file limit.
- -- Permit job's owner to change features, processor count, minimum and
-    maximun node counts of pending jobs (the operation was previously
-    restricted to user root)
- -- Applied patch from Chuck Clouston for scontrol man page with clarifications
-    and additional info
- -- Change slurm errno name from ESLURM_TOO_MANY_REQUESTED_NODES to
-    ESLURM_INVALID_NODE_COUNT to better reflect its meaning.
- -- Fix bug in sched/backfill which could result in invalid memory reference
-    when trying to schedule jobs submitted with --exclude option.
- -- Fix for slurmctld deadlock at startup with PreemptMode=SUSPEND,GANG.
- -- Added preemption plugins to RPM.
- -- Completely disable logging of sched/wiki and sched/wiki2 (Maui & Moab)
-    message traffic unless DebugFlag=Wiki is configured.
- -- Change scontrol show job info: ReqProcs (number of processors requested)
-    is replaced by NumProcs (number of processors requested or actually
-    allocated) and ReqNodes (number of nodes requested) is replaced by
-    NumNodes (number of nodes requested or actually allocated).
- -- Fixed issue when max nodes wasn't specified and was later set by limit
-    to not request that as the actual maximum.
- -- Move job preemption (for requeue, checkpoint and kill modes only) out of
-    gang scheduling module. Make identification of preemptable jobs an argument
-    to the select_g_job_test function rather than calling preempt plugin from
-    the select plugin. Make output of srun --test-only option include a list
-    of preempted job IDs.
- -- Better record keeping for front end systems when registering.
- -- Enable memory allocation logic for jobs step (i.e. allocate resources
-    within the job's memory allocation and enforce limits).
- -- handle error state in sinfo
- -- sview and "scontrol show config" now report as SLURM_VERSION the version
-    of slurmctld rather than that of the command.
- -- Change SuspendTime configuration parameter from 16-bits to 32-bits.
- -- Add environment variable support to sattach, salloc, sbatch and srun
-    to permit user control over exit codes so application exit codes can be
-    distiguished from those generated by SLURM. SLURM_EXIT_ERROR specifies the
-    exit code when a SLURM error occurs. SLURM_EXIT_IMMEDIATE specifies the
-    exit code when the --immediate option is specified and resources are not
-    available. Any other non-zero exit code would be that of the application
-    run by SLURM.
- -- Added a Quality of Service (QOS) html page.
- -- In sched/wiki2, JOBWILLRUN command, add support for identification of
-    preemptable and preempted jobs (both new and old format of commands are
-    supported).
- -- Remove contribs/python/hostlist files. Download the materials as needed
-    directly from http://www.nsc.liu.se/~kent/python-hostlist.
- -- BLUEGENE - Preemption now works on bluegene systems
- -- For salloc, sbatch and srun commands, ignore _maximum_ values for
-    --sockets-per-node, --cores-per-socket and --threads-per-core options.
-    Remove --mincores, --minsockets, --minthreads options (map them to
-    minimum values of -sockets-per-node, --cores-per-socket and
-    --threads-per-core for now).
-
-* Changes in SLURM 2.1.0-pre4
-=============================
- -- Move processing of node configuration information in slurm.conf and
-    topology information in topology.conf from slurmctld into common and load
-    that information into slurmd. Use it to set environment variables for jobs
-    SLURM_TOPOLOGY_ADDR and SLURM_TOPOLOGY_ADDR_PATTERN describing the network
-    topology for each task. Based upon patch from Mattheu Hautreux (CEA).
- -- Correction in computing a job's TotalProcs value when ThreadsPerCore>1 and
-    allocating by cores or sockets.
-
-* Changes in SLURM 2.1.0-pre3
-=============================
- -- Removed sched/gang plugin and moved the logic directly into the slurmctld
-    daemon so that job preemption and gang scheduling can be used with the
-    sched/backfill plugin. Added configuration parameter:
-    PreemptMode=gang|off|suspend|cancel|requeue|checkpoint
-    to enable/disable gang scheduling and job preemption logic (both are
-    disabled by default).
-    (NOTE: There are some problems with memory management which could prevent a
-    job from starting when memory would be freed by a job being requeued or
-    otherwise removed, these are being worked on)
- -- Added PreemptType configuration parameter to identify preemptable jobs.
-    Former users of SchedType=sched/gang should set SchedType=sched/backfill,
-    PreemptType=preempt/partition_prio and PreemptMode=gang,suspend. See
-    web and slurm.conf man page for other options.
-    PreemptType=preempt/qos uses Quality Of Service information in database.
- -- In select/linear, optimize job placement across partitions.
- -- If the --partition option is used with the sinfo or squeue command then
-    print information about even hidden partitions.
- -- Replaced misc cpu allocation members in job_info_t with select_job_res_t
-    which will only be populated when requested (show_flags & SHOW_DETAIL)
- -- Added a --detail option to "scontrol show job" to display the cpu/mem
-    allocation info on a node-by-node basis.
- -- Added logic to give correct request uid for individual steps that
-    were cancelled.
- -- Created a spank_get_item() option (S_JOB_ALLOC_MEM) that conveys the memory
-    that the select/cons_res plugin has allocated to a job.
- -- BLUEGENE - blocks in error state are now handled correctly in accounting.
- -- Modify squeue to print job step information about a specific job ID using
-    the following syntax: "squeue -j <job_id> -s".
- -- BLUEGENE - scontrol delete block and update block can now remove blocks
-    on dynamic laid out systems.
- -- BLUEGENE - Vastly improve Dynamic layout mode algorithm.
- -- Address some issues for SLURM support of Solaris.
- -- Applied patch from Doug Parisek (Doug.Parisek@bull.com) for speeding up
-    start of sview by delaying to creation of tooltips until requested.
- -- Changed GtkToolTips to GtkToolTip for newer versions of GTK.
- -- Applied patch from Rod Schultz (Rod.Schultz@Bull.com) that eliminates
-    ambiguity in the documentation over use of the terms "CPU" and "socket".
- -- Modified get_resource_arg_range() to return full min/max values when input
-    string is null. This fixes the srun -B option to function as documented.
- -- If the job, node, partition, reservation or trigger state file is missing
-    or too small, automatically try using the previously saved state (file
-    name with ".old" suffix).
- -- Set a node's power_up/configuring state flag while PrologSlurmctld is
-    running for a job allocated to that node.
- -- If PrologSlurmctld has a non-zero exit code, requeue the job or kill it.
- -- Added sacct ability to use --format NAME%LENGTH similar to sacctmgr.
- -- Improve hostlist logic for multidimensional systems.
- -- The pam_slurm Pluggable Authentication Module for SLURM previously
-    distributed separately has been moved within the main SLURM distribution
-    and is packaged as a separate RPM.
- -- Added configuration parameter MaxTasksPerNode.
- -- Remove configuration parameter SrunIOTimeout.
- -- Added functionality for sacctmgr show problems. Current problems include
-    Accounts/Users with no associations, Accounts with no users or subaccounts
-    attached in a cluster, and Users with No UID on the system.
- -- Added new option for sacctmgr list assoc and list cluster WOLimits. This
-    gives a smaller default format without the limit information. This may
-    be the new default for list assocations and list clusters.
- -- Users are now required to have an association with there default account.
-    Sacctmgr will now complain when you try to modify a users default account
-    which they are not associated anywhere.
- -- Fix select/linear bug resulting in run_job_cnt underflow message if a
-    suspended job is cancelled.
- -- Add test for fsync() error for state save files. Log and retry as needed.
- -- Log fatal errors from slurmd and slurmctld to syslog.
- -- Added error detection and cleanup for the case in which a compute node is
-    rebooted and restarts its slurmd before its "down" state is noticed.
- -- BLUEGENE systems only - remove vestigal start location from jobinfo.
- -- Add reservation flag of "OVERLAP" to permit a new reservation to use
-    nodes already in another reservation.
- -- Fix so "scontrol update jobid=# nice=0" can clear previous nice value.
- -- BLUEGENE - env vars such as SLURM_NNODES, SLURM_JOB_NUM_NODES, and
-    SLURM_JOB_CPUS_PER_NODE now reference cnode counts instead of midplane
-    counts. SLURM_NODELIST still references midplane names.
- -- Added qos support to salloc/sbatch/srun/squeue
- -- Added to scancel the ability to select jobs by account and qos
- -- Recycled the "-A" argument indicate "account" for all the commands that
-    accept the --account argument (srun -A to allocate is no longer supported.)
- -- Change sbatch response from "sbatch: Submitted batch job #" written to
-    stderr to "Submitted batch job #" written to stdout.
- -- Made shutdown and cleanup a little safer for the mvapich and mpich1_p4
-    plugins.
- -- QOS support added with limits, priority and preemption
-    (no documentation yet).
- -- If a slurmd does not have a node listed in it's slurm.conf (slurm.conf's
-    should be kept the same on all nodes) an error message is printed in the
-    slurmctld log along with the message already being printed in the slurmd
-    log for easier debugging.
-
-* Changes in SLURM 2.1.0-pre2
-=============================
- -- Added support for smap to query off node name for display.
- -- Slurmdbd modified to set user ID and group ID to SlurmUser if started as
-    user root.
- -- Configuration parameter ControlMachine changed to accept multiple comma-
-    separated hostnames for support of some high-availability architectures.
- -- ALTERED API CALL slurm_get_job_steps 0 has been changed to NO_VAL for both
-    job and step id to recieve all jobs/steps. Please make adjustments to
-    your code.
- -- salloc's --wait=<secs> option deprecated by --immediate=<secs> option to
-    match the srun command.
- -- Add new slurmctld list for node features with node bitmaps for simplified
-    scheduling logic.
- -- Multiple features can be specified when creating a reservation. Use "&"
-    (AND) or "|" (OR) separators between the feature names.
- -- Changed internal node name caching so that front-end mode would work with
-    multiple lines of node name definitions.
- -- Add node state flag for power-up/configuring. Represented by "#" suffix
-    on the node state name (e.g. "ALLOCATED#") for command output.
- -- Add CONFIGURING/CF job state flag for node power-up/configuring.
- -- Modify job step cancel logic for scancel and srun (on reciept of SIGTERM
-    or three SIGINT) to immediately send SIGKILL to spawned tasks. Previous
-    logic would send SIGCONT, SIGTERM, wait KillWait seconds, SIGKILL.
- -- Created a spank_get_item() option (S_JOB_ALLOC_CORES) that conveys the cpus
-    that the select/cons_res plugin has allocated to a job.
- -- Improve sview performance (outrageously) on very large machines.
- -- Add support for licenses in resource reservation.
- -- BLUEGENE - Jobs waiting for a block to boot will now be in Configuring
-    state.
- -- bit_fmt now does not return brackets surrounding any set of data.
-
-* Changes in SLURM 2.1.0-pre1
-=============================
- -- Slurmd notifies slurmctld of node boot time to better clean up after node
-    reboots.
- -- Slurmd sends node registration information repeatedly until successful
-    transmit.
- -- Change job_state in job structure to dedicate 8-bits to state flags.
-    Added macros to get state information (IS_JOB_RUNNING(job_ptr), etc.)
- -- Added macros to get node state information (IS_NODE_DOWN(node_ptr), etc).
- -- Added support for Solaris. Patch from David Hoppner.
- -- Rename "slurm-aix-federation-<version>.rpm" to just
-    "slurm-aix-<version>.rpm" (federation switch plugin may not be present).
- -- Eliminated the redundant squeue output format and sort options of
-    "%o" and "%b". Use "%D" and "%S" formats respectively. Also eliminated
-    "%X" and "%Y" and "%Z" formats. Use "%z" instead.
- -- Added mechanism for SPANK plugins to set environment variables for
-    Prolog, Epilog, PrologSLurmctld and EpilogSlurmctld programs using
-    the functions spank_get_job_env, spank_set_job_env, and
-    spank_unset_job_env. See "man spank" for more information.
- -- Completed the work to begun in 2.0.0 to standardize on using '-Q' as the
-    --quiet flag for all the commands.
- -- BLUEGENE - sinfo and sview now display correct cpu counts for partitions
- -- Cleaned up the cons_res plugin. It now uses a ptr to a part_record
-    instead of having to do strcmp's to find the correct one.
- -- Pushed most all the plugin specific info in src/common/node_select.c
-    into the respected plugin.
- -- BLUEGENE - closed some corner cases where a block could had been removed
-    while a job was waiting for it to become ready because an underlying
-    part of the block was put into an error state.
- -- Modify sbcast logic to prevent a user from moving files to nodes they
-    have not been allocated (this would be possible in previous versions
-    only by hacking the sbcast code).
- -- Add contribs/sjstat script (Perl tool to report job state information).
-    Put into new RPM: sjstat.
- -- Add sched/wiki2 (Moab) JOBMODIFY command support for VARIABLELIST option
-    to set supplemental environment variables for pending batch jobs.
- -- BLUEGENE - add support for scontrol show blocks.
- -- Added support for job step time limits.
-
-* Changes in SLURM 2.0.10
-=========================
-
-* Changes in SLURM 2.0.9
-========================
- -- When running the mysql plugin natively MUNGE errors are now eliminated
-    when sending updates to slurmctlds.
- -- Check to make sure we have a default account before looking to
-    fill in default association.
- -- Fix to make it so sched/wiki2 can modify a job's partition or hostlist of
-    non-pending jobs.
- -- Applied slurmctld prolog bug fix from Dennis Leepow to backfill.c
- -- fixed quite a few typos (needed for debian packages)
- -- make it so slurmctld will core dump without --enable-debug set
- -- Fix issue when doing a rollup on reservations before a cluster has been
-    added.
- -- MySQL plugin - When doing archiving end time is now decreased by 1
-    which should be more correct.
- -- BLUEGENE - Fixed issue where --no-rotate didn't work correctly on job
-    submissions.
- -- BLUEGENE - made the buffer longer when submitting jobs to get the entire
-    line. Previously the line could be shortened prematurely.
- -- BLUEGENE - Fix to make sure we don't erroneously set a connection type
-    to SMALL.
- -- Type cast a negative uint64_t to int64_t to avoid confusion when doing
-    arithmetic with it in accounting dealing with over commit time.
-
-* Changes in SLURM 2.0.8
-========================
- -- BLUEGENE - added dub2 of stderr to put error messages sent from underlying
-    libraries of the bridge api to the bridgeapi.log
- -- Fixed issue with sacctmgr when modifing a user and specifying 'where'
-    after giving the user name also.
- -- -L, --allclusters now works with sacct
- -- Modified job table to use 32bit u/gids for those with ids greater
-    than 16 bits.
- -- Made minor changes for slurm to compile cleanly under gcc 4.4.1
- -- Fixed issue with task/affinity when an allocation would run multiple sruns
-    with the --exclusive flag in an allocation with more than 1 node.
-    Previously when the first node was filled up no tasks would ever run
-    on the other nodes.
- -- Fixed sview and sacct to display correct run time and suspend time when
-    job has been suspended.
- -- Applied patch from Mark Grondona that fixes the validation of the
-    PluginDir to support the colon separated list of directories as documented.
- -- BLUEGENE - squeue -o %R now prints more readable info for small blocks
- -- sacct - fixed garbage being printed out on uninitialized variable.
- -- Fix for mysql plugin when used without slurmdbd to register the
-    slurmctld properly.
- -- Fix for mysql plugin putting correct hostnames in for running steps.
-
-* Changes in SLURM 2.0.7
-========================
- -- Fix bug in select/cons_res when nodes are configured in more than one
-    partition and those partitions have different priorities and sched/gang
-    is not configured. CPUs were previously over-allocated.
- -- Fix core of smap when specifying -i option with invalid argument.
- -- Fix issue when using srun --test-only to not put an entry of test
-    job into accounting.
- -- For OpenMPI use of SLURM reserved ports. If any of the tasks fails to
-    acquire a reserved port and has an exit code of 108 then srun will
-    kill all remaining tasks and respawn the tasks. Previous code waited
-    for tasks to exit.
- -- MySQL plugin - When doing archiving we now get a correct end time.
-    Previously it would grab an extra day to archive.
- -- BLUEGENE - Handle initial state correctly, previously was setting initial
-    state to IDLE if UNKNOWN which would make it not set a registration
-    message to accounting, which could lead to nodes not being listed as up
-    when they really were.
- -- Fixed buffer size issue with scontrol show hostlist.
- -- Fixed issue with copy in smap -Dc previously command wouldn't work.
- -- BLUEGENE - Update documentation about small blocks in the bluegene.conf
-    file.
- -- In sched/wiki plugin (for Maui) fix possible message truncation on very
-    large cluster.
- -- BLUEGENE - Fix for handling undocumented Deallocating to Configuring to
-    Free block transition state.
- -- BLUEGENE - Fix for overlap mode loading blocks when midplane is in an
-    error state.
- -- Add range check for SuspendTime configuration parameter.
- -- Moved unzipped python-hostname tarball out and the tarball in.
- -- BLUEGENE - Patched memory leak when running state test.
- -- BLUEGENE - fixed slow down generated by slow call rm_get_BG
-    and polling thread.
-
-* Changes in SLURM 2.0.6
-========================
- -- Fixed seg fault when "scontrol listpids" is invoked for a specific job step
-    on a node on which a stepd is not running.
- -- Fix bug in sched/backfill which could result in invalid memory reference
-    when trying to schedule jobs submitted with --exclude option.
-
-* Changes in SLURM 2.0.5
-========================
- -- BLUEGENE - Added support for emulating systems with a X-dimension of 4.
- -- BLUEGENE - When a nodecard goes down on a non-Dynamic system SLURM will
-    now only drain blocks under 1 midplane, if no such block exists then SLURM
-    will drain the entire midplane and not mark any block in error state.
-    Previously SLURM would drain every overlapping block of the nodecard
-    making it possible for a large block to make other blocks not work since
-    they overlap some other part of the block that really isn't bad.
- -- BLUEGENE - Handle L3 errors on boot better.
- -- Don't revoke a pending batch launch request from the slurmctld if the
-    job is immediately suspended (a normal event with gang scheduling).
- -- BLUEGENE - Fixed issue with restart of slurmctld would allow error block
-    nodes to be considered for building new blocks when testing if a job would
-    run. This is a visual bug only, jobs would never run on new block, but
-    the block would appear in slurm tools.
- -- Better responsiveness when starting new allocations when running with the
-    slurmdbd.
- -- Fixed race condition when reconfiguring the slurmctld and using the
-    consumable resources plugin which would cause the controller to core.
- -- Fixed race condition that sometimes caused jobs to stay in completing
-    state longer than necessary after being terminated.
- -- Fixed issue where if a parent account has a qos added and then a child
-    account has the qos removed the users still get the qos.
- -- BLUEGENE - New blocks in dynamic mode will only be made in the system
-    when the block is actually needed for a job, not when testing.
- -- BLUEGENE - Don't remove larger block used for small block until job starts.
- -- Add new squeue output format and sort option of "%L" to print a job's time
-    left (time limit minus time used).
- -- BLUEGENE - Fixed draining state count for sinfo/sview.
- -- Fix for sview to not core when viewing nodes allocated to a partition
-    and the all jobs finish.
- -- Fix cons_res to not core dump when finishing a job running on a
-    defunct partition.
- -- Don't require a node to have --ntasks-per-node CPUs for use when the
-    --overcommit option is also used.
- -- Increase the maximum number of tasks which can be launched by a job step
-    per node from 64 to 128.
- -- sview - make right click on popup window title show sorted list.
- -- scontrol now displays correct units for job min memory and min tmp disk.
- -- better support for salloc/sbatch arbitrary layout for setting correct
-    SLURM_TASKS_PER_NODE
- -- Env var SLURM_CPUS_ON_NODE is now set correctly depending on the
-    FastSchedule configuration parameter.
- -- Correction to topology/3d_torus plugin calculation when coordinate value
-    exceeds "9" (i.e. a hex value).
- -- In sched/wiki2 - Strip single and double quotes out of a node's reason
-    string to avoid confusing Moab's parser.
- -- Modified scancel to cancel any pending jobs before cancelling any other
- -- Updated sview config info
- -- Fix a couple of bugs with respect to scheduling with overlapping
-    reservations (one with a flag of "Maintenance").
- -- Fix bug when updating a pending job's nice value after explicitly setting
-    it's priority.
- -- We no longer add blank QOS'
- -- Fix task affinity for systems running fastschedule!=0 and they have less
-    resources configured than in existence.
- -- Slurm.pm loads without warning now on AIX systems
- -- modified pmi code to do strncpy's on the correct len
- -- Fix for filling in a qos structure to return SLURM_SUCCESS on success.
- -- BLUEGENE - Added SLURM_BG_NUM_NODES with cnode count of allocation,
-    SLURM_JOB_NUM_NODES represents midplane counts until 2.1.
- -- BLUEGENE - Added fix for if a block is in error state and the midplane
-    containing the block is also set to drain/down. This previously
-    prevented dynamic creation of new blocks when this state was present.
- -- Fixed bug where a users association limits were not enforced, only
-    parent limits were being enforced.
- -- For OpenMPI use of SLURM reserved ports, reserve a count of ports equal to
-    the maximum task count on any node plus one (the plus one is a correction).
- -- Do not reset SLURM_TASKS_PER_NODE when srun --preserve-env option is used
-    (needed by OpenMPI).
- -- Fix possible assert failure in task/affinity if a node is configured with
-    more resources than physically exist.
- -- Sview can now resize columns.
- -- Avoid clearing a drained node's reason field when state is changed from
-    down (i.e. returned to service). Note the drain state flag stays set.
-
-* Changes in SLURM 2.0.4
-========================
- -- Permit node suspend/resume logic to be enabled through "scontrol reconfig"
-    given appropriate changes to slurm configuration file.
- -- Check for return codes on functions with warn_unused_result set.
- -- Fix memory leak in getting step information (as used by squeue -s).
- -- Better logging for when job's request bad output file locations.
- -- Fix issue where if user specified non-existant file to write to slurmstepd
-    will regain privileges before sending batch script ended to the controller.
- -- Fix bug when using the priority_multifactor plugin with no associations
-    yet.
- -- BLUEGENE - we no longer check for the images to sync state. This was
-    needed long ago when rebooting blocks wasn't a possibility and should
-    had been removed when that functionality was available.
- -- Added message about no connection with the database for sacctmgr.
- -- On BlueGene, let srun or salloc exit on SIGINT if slurmctld dies while
-    booting its block.
- -- In select/cons_res fix bug that could result in invalid memory pointer
-    if node configurations in slurm.conf contains 8 or more distinct
-    socket/core/thread counts.
- -- Modify select/cons_res to recognize updated memory size upon node startup
-    if FastSchedule=0.
- -- Fixed bug if not enforcing associations, but running with them and the
-    priority/multifactor, the slurmctld will not core dump on processing usage.
- -- QOS will not be reset to the default when added back a previously deleted
-    association.
- -- Do not set a job's virtual memory limit based upon the job's specified
-    memory limit (which should be a real memory limit, not virtual).
- -- BLUEGENE - fix for sinfo/sview for displaying proper node count for nodes
-    in draining state.
- -- Fix for sview when viewing a certain part of a group (like 1 job) so it
-    doesn't core when the part is gone.
- -- BLUEGENE - Changed order of SYNC's to be on the front of the list to
-    avoid having a job terminated with a TERM before the SYNC of the
-    job happens.
- -- Validate configured PluginDir value is a valid directory before trying to
-    use it.
- -- Fix to resolve agent_queue_request symbol from some checkpoint plugins.
- -- Fix possible execve error for sbatch script read from stdin.
- -- Modify handling of user ID/name and group ID/name in the slurm.conf file
-    to properly handle user names that contain all digits. Return error code
-    from uid_from_string() and gid_from_string() functions rather than a uid of
-    -1, which might be a valid uid or gid on some systems.
- -- Fix in re-calcuation of job priorities due to DOWN or DRAINED nodes.
-
-* Changes in SLURM 2.0.3
-========================
- -- Add reservation creation/update flag of Ignore_Jobs to enable the creation
-    of a reservation that overlaps jobs expected to still be running when
-    the reservation starts. This would be especially useful to reserve all
-    nodes for system maintenence without adjusting time limits of running
-    jobs before creating the reservation. Without this flag, nodes allocated
-    jobs expected to running when the reservation begins can not be placed
-    into a reservation.
- -- In task/affinity plugin, add layer of abstraction to logic translating - block masks to physical machine masks. Patch from Matthieu Hautreux, CEA. - -- Fix for setting the node_bitmap in a job to NULL if the job does not - start correctly when expected to start. - -- Fixed bug in srun --pty logic. Output from the task was split up - arbitrarily into stdout and stderr streams, and sometimes was printed - out of order. - -- If job requests minimum and maximum node count range with select/cons_res, - try to satisfy the higher value (formerly only allocated the minimum). - -- Fix for checking for a nonexistent job when querying steps. - -- For job steps with the --exclusive option, base initial wait time in - part upon the process ID for better performance with many job steps - started at the same time. Maintain exponential back-off as needed. - -- Fix for correct step ordering in sview. - -- Support optional argument to srun and salloc --immediate option. Specify - timeout value in seconds for job or step to be allocated resources. - -* Changes in SLURM 2.0.2 -======================== - -- Fix, don't remove job details when a job is cancelled while pending. - -- Do correct type for mktime so garbage isn't returned on 64bit systems - for accounting archival. - -- Better checking in sacctmgr to avoid infinite loops. - -- Fix minor memory leak in fake_slurm_step_layout_create() - -- Fix node weight (scheduling priority) calculation for powered down - nodes. Patch from Hongjia Cao, NUDT. - -- Fix node suspend/resume rate calculations. Patch from Hongjia Cao, NUDT. - -- Change calculations using ResumeRate and SuspendRate to provide higher - resolution. - -- Log the IP address for incoming messages having an invalid protocol - version number. - -- Fix for sacct to show jobs that start the same second as the sacct - command is issued. - -- BLUEGENE - Fix for -n option to work on correct cpu counts for each - midplane instead of treating -n as a c-node count.
- -- salloc now sets SLURM_NTASKS_PER_NODE if --ntasks-per-node option is set. - -- Fix select/linear to properly set a job's count of allocated processors - (all processors on the allocated nodes). - -- Fix select/cons_res to allocate proper CPU count when --ntasks-per-node - option is used without a task count in the job request. - -- Ensure that no node is allocated to a job for which the CPU count is less - than --ntasks-per-node * --cpus-per-task. - -- Correct AllocProcs reported by "scontrol show node" when ThreadsPerCore - is greater than 1 and select/cons_res is used. - -- Fix scontrol show config for accounting information when values are - not set in the slurm.conf. - -- Added a set of SBATCH_CPU_BIND* and SBATCH_MEM_BIND* env variables to keep - jobsteps launched from within a batch script from inheriting the CPU and - memory affinity that was applied to the batch script. Patch from Matthieu - Hautreux, CEA. - -- Ignore the extra processors on a node above configured size if either - sched/gang or select/cons_res is configured. - -- Fix bug in tracking memory allocated on a node for select/cons_res plugin. - -- Fixed a race condition when writing labelled output with a file per task - or per node, which potentially closed a file before all data was written. - -- BLUEGENE - Fix, for if a job comes in spanning both less than and - over 1 midplane in size we check the connection type appropriately. - -- Make sched/backfill properly schedule jobs with constraints having node - counts. NOTE: Backfill of jobs with constraints having exclusive OR - operators is not fully supported. - -- If srun is cancelled by SIGINT, set the job state to cancelled, not - failed. - -- BLUEGENE - Fix, for if you are setting a subbp into an error mode - where the subbp stated isn't the first ionode in a nodecard. - -- Fix for backfill to not core when checking shared nodes. - -- Fix for scontrol to not core when hitting just return in interactive mode.
- -- Improve sched/backfill logic with respect to shared nodes (multiple jobs - per node). - -- In sched/wiki (Maui interface) add job info fields QOS, RCLASS, DMEM and - TASKSPERNODE. Patch from Bjorn-Helge Mevik, University of Oslo. - -* Changes in SLURM 2.0.1 -======================== - -- Fix, truncate time of start and end for job steps in sacct. - -- Initialize all messages to slurmdbd. Previously uninitialized string could - cause slurmctld to fail with invalid memory reference. - -- BLUEGENE - Fix, for when trying to finish a torus on a block already - visited. Even though this may be possible electrically this isn't valid - in the underlying infrastructure. - -- Fix, in mysql plugins change mediumints to int to support full 32bit - numbers. - -- Add sinfo node state filtering support for NO_RESPOND, POWER_SAVE, FAIL, - MAINT, DRAINED and DRAINING states. The state filter of DRAIN still maps - to any node in either DRAINED or DRAINING state. - -- Fix reservation logic when job requests specific nodes that are already - in some reservation the job can not use. - -- Fix recomputation of a job's end time when allocated nodes that are - being powered up. The end time would be set in the past if the job's - time limit was INFINITE, resulting in it being prematurely terminated. - -- Permit regular user to change the time limit of his pending jobs up to - the partition's limit. - -- Fix "-Q" (quiet) option for salloc and sbatch which was previously - ignored. - -- BLUEGENE - fix for finding odd shaped blocks in dynamic mode. - -- Fix logic supporting SuspendRate and ResumeRate configuration parameters. - Previous logic was changing state of one too many nodes per minute. - -- Save new reservation state file on shutdown (even if no changes). - -- Fix, when partitions are deleted the sched and select plugins are notified. - -- Fix for slurmdbd to create wckeyid's when they don't exist - -- Fix linking problem that prevented checkpoint/aix from working.
- -* Changes in SLURM 2.0.0 -======================== - -- Fix for bluegene systems to be able to create 32 node blocks with only - 16 psets defined in dynamic layout mode. - -- Improve srun_cr handling of child srun forking. Patch from Hongjia Cao, - NUDT. - -- Configuration parameter ResumeDelay replaced by SuspendTimeout and - ResumeTimeout. - -- BLUEGENE - sview/sinfo now displays correct cnode numbers for drained nodes - or blocks in error state. - -- Fix some batch job launch bugs when powering up suspended nodes. - -- Added option '-T' for sacct to truncate time of start and end and set - default of --starttime to Midnight of current day. - -* Changes in SLURM 2.0.0-rc2 -============================ - -- Change fanout logic to start on calling node instead of first node in - message nodelist. - -- Fix bug so that smap builds properly on Sun Constellation system. - -- Filter white-space out from node feature specification. - -- Fixed issue with duration not being honored when updating start time in - reservations. - -- Fix bug in sched/wiki and sched/wiki2 plugins for reporting job resource - allocation properly when node names are configured out of sort order - with more than one numeric suffix (e.g. "tux10-1" is configured after - "tux5-1"). - -- Avoid re-use of job_id (if specified at submit time) when the existing - job is in completing state (possible race condition with Moab). - -- Added SLURM_DISTRIBUTION to env for salloc. - -- Add support for "scontrol takeover" command for backup controller to - assume control immediately. Patch from Matthieu Hautreux, CEA. - -- If srun is unable to communicate with the slurmd tasks are now marked as - failed with the controller. - -- Fixed issues with requeued jobs not being accounted for correctly in - the accounting. - -- Clear node's POWER_SAVE flag if configuration changes to one lacking a - ResumeProgram. - -- Extend a job's time limit as appropriate due to delays powering up nodes. 
- -- If sbatch is used to launch a job step within an existing allocation (as - used by LSF) and the required node is powered down, print the message - "Job step creation temporarily disabled, retrying", sleep, and retry. - -- Configuration parameter ResumeDelay added to control how much time must - elapse after a node has been suspended before resuming it (e.g. powering it back - up). - -- Fix CPU binding for batch program. Patch from Matthieu Hautreux, CEA. - -- Fix for front end systems non-responding nodes now show up correctly in - sinfo. - -* Changes in SLURM 2.0.0-rc1 -============================ - -- Fix bug in preservation of advanced reservations when slurmctld restarts. - -- Updated perlapi to match correctly with slurm.h structures - -- Do not install the srun command on BlueGene systems (mpirun must be used to - launch tasks). - -- Corrections to scheduling logic for topology/tree in configurations where - nodes are configured in multiple leaf switches. - -- Patch from Matthieu Hautreux for backup mysql daemon support. - -- Changed DbdBackup to DbdBackupHost for slurmdbd.conf file - -- Add support for spank_strerror() function and improve error handling in - general for SPANK plugins. - -- Added configuration parameter SrunIOTimeout to optionally ping srun's tasks - for better fault tolerance (e.g. killed and restarted SLURM daemons on - compute node). - -- Add slurmctld and slurmd binding to appropriate communications address - based upon NodeAddr, ControllerAddr and BackupAddr configuration - parameters. Based upon patch from Matthieu Hautreux, CEA. - NOTE: Fails when SlurmDBD is configured with some configurations. - NOTE: You must define BIND_SPECIFIC_ADDR to enable this option. - -- Avoid using powered down nodes when scheduling work if possible. - Fix possible invalid memory reference in power save logic.
- -* Changes in SLURM 1.4.0-pre13 -============================== - -- Added new partition option AllocNodes which controls the hosts from - which jobs can be submitted to this partition. From Matthieu Hautreux, CEA. - -- Better support the --contiguous option for job allocations. - -- Add new scontrol option: show topology (reports contents of topology.conf - file via RPC if topology/tree plugin is configured). - -- Add advanced reservation display to smap command. - -- Replaced remaining references to SLURM_JOBID with SLURM_JOB_ID - except - when needed for backwards compatibility. - -- Fix logic to properly excise a DOWN node from the allocation of a job - with the --no-kill option. - -- The MySQL and PgSQL plugins for accounting storage and job completion are - now only built if the underlying database libraries exist (previously - the plugins were built to produce a fatal error when used). - -- BLUEGENE - scontrol show config will now display bluegene.conf information. - -* Changes in SLURM 1.4.0-pre12 -============================== - -- Added support for hard time limit by associations with added configuration - option PriorityUsageResetPeriod. This specifies the interval at which to - clear the record of time used. This is currently only available with the - priority/multifactor plugin. - -- Added SLURM_SUBMIT_DIR to sbatch's output environment variables. - -- Backup slurmdbd support implemented. - -- Update to checkpoint/xlch logic from Hongjia Cao, NUDT. - -- Added configuration parameter AccountingStorageBackupHost. - -* Changes in SLURM 1.4.0-pre11 -============================== - -- Fix slurm.spec file for RPM build. - -* Changes in SLURM 1.4.0-pre10 -============================== - -- Critical bug fix in task/affinity when the CoresPerSocket is greater - than the ThreadsPerCore (invalid memory reference). - -- Add DebugFlag parameter of "Wiki" to log sched/wiki and wiki2 - communications in greater detail.
- -- Add "-d <slurmstepd_path>" as an option to the slurmd daemon to - specify a non-standard slurmstepd file, used for testing purposes. - -- Minor cleanup to crypto/munge plugin. - - Restrict uid allowed to decode job credentials in crypto/munge - - Get slurm user id early in crypto/munge - - Remove buggy error code handling in crypto/munge - -- Added sprio command - works only with the priority/multifactor plugin - -- Add real topology plugin infrastructure (it was initially added - directly into slurmctld code). To specify topology information, - set TopologyType=topology/tree and add configuration information - to a new file called topology.conf. See "man topology.conf" or - topology.html web page for details. - -- Set "/proc/self/oom_adj" for slurmd and slurmstepd daemons based upon - the values of SLURMD_OOM_ADJ and SLURMSTEPD_OOM_ADJ environment - variables. This can be used to prevent daemons being killed when - a node's memory is exhausted. Based upon patch by Hongjia Cao, NUDT. - -- Fix several bugs in task/affinity: cpuset logic was broken and - --cpus-per-task option not properly handled. - -- Ensure slurmctld adopts SlurmUser GID as well as UID on startup. - -* Changes in SLURM 1.4.0-pre9 -============================= - -- OpenMPI users only: Add srun logic to automatically recreate and - re-launch a job step if the step fails with a reserved port conflict. - -- Added TopologyPlugin configuration parameter. - -- Added switch topology data structure to slurmctld (for use by select - plugin) and load it based upon new slurm.conf parameters: SwitchName, - Nodes, Switches and LinkSpeed. - -- Modify select/linear and select/cons_res plugins to optimize resource - allocation with respect to network topology. - -- Added support for new configuration parameter EpilogSlurmctld (executed - by slurmctld daemon). - -- Added checkpoint/blcr plugin, SLURM now supports job checkpoint/restart - using BLCR. Patch from Hongjia Cao, NUDT, China.
- -- Made a variety of new environment variables available to PrologSlurmctld - and EpilogSlurmctld. See the "Prolog and Epilog Scripts" section of the - slurm.conf man page for details. - -- NOTE: Cold-start (without preserving state) required for upgrade from - version 1.4.0-pre8. - -* Changes in SLURM 1.4.0-pre8 -============================= - -- In order to create a new partition using the scontrol command, use - the "create" option rather than "update" (which will only operate - upon partitions that already exist). - -- Added environment variable SLURM_RESTART_COUNT to batch jobs to - indicate the count of job restarts made. - -- Added sacctmgr command "show config". - -- Added the scancel option --nodelist to cancel any jobs running on a - given list of nodes. - -- Add partition-specific DefaultTime (default time limit for jobs, - if not specified use MaxTime for the partition). Patch from Par - Andersson, National Supercomputer Centre, Sweden. - -- Add support for the scontrol command to be able to change the Weight - associated with nodes. Patch from Krishnakumar Ravi[KK] (HP). - -- Add DebugFlag configuration option of "CPU_Bind" for detailed CPU - binding information to be logged. - -- Fix some significant bugs in task binding logic (possible infinite loops - and memory corruption). - -- Add new node state flag of NODE_STATE_MAINT indicating the node is in - a reservation of type MAINT. - -- Modified task/affinity plugin to automatically bind tasks to sockets, - cores, or threads as appropriate based upon resource allocation and - task count. User can override with srun's --cpu_bind option. - -- Fix bug in backfill logic for select/cons_res plugin, resulted in - error "cons_res:_rm_job_from_res: node_state mis-count". - -- Add logic to bind a batch job to the resources allocated to that job. - -- Add configuration parameter MpiParams for (future) OpenMPI port - management. Add resv_port_cnt and resv_ports fields to the job step - data structures.
Add environment variable SLURM_STEP_RESV_PORTS to - show what ports are reserved for a job step. - -- Add support for SchedulerParameters=interval=<sec> to control the time - interval between executions of the backfill scheduler logic. - -- Preserve record of last job ID in use even when doing a cold-start unless - there is no job state file or there is a change in its format (which only - happens when there is a change in SLURM's major or minor version number: - v1.3 -> v1.4). - -- Added new configuration parameter KillOnBadExit to kill a job step as soon - as any task of a job step exits with a non-zero exit code. Patch based - on work from Eric Lin, Bull. - -- Add spank plugin calls for use by salloc and sbatch command, see - "man spank" for details. - -- NOTE: Cold-start (without preserving state) required for upgrade from - version 1.4.0-pre7. - -* Changes in SLURM 1.4.0-pre7 -============================= - -- Bug fix for preemption with select/cons_res when there are no idle nodes. - -- Bug fix for use of srun options --exclusive and --cpus-per-task together - for job step resource allocation (tracking of cpus in use was bad). - -- Added the srun option --preserve-env to pass the current values of - environment variables SLURM_NNODES and SLURM_NPROCS through to the - executable, rather than computing them from commandline parameters. - -- For select/cons_res or sched/gang only: Validate a job's resource - allocation socket and core count on each allocated node. If the node's - configuration has been changed, then abort the job. - -- For select/cons_res or sched/gang only: Disable updating a node's - processor count if FastSchedule=0. Administrators must set a valid - processor count although the memory and disk space configuration can - be loaded from the compute node when it starts. - -- Add configure option "--disable-iso8601" to disable SLURM use of ISO 8601 - time format at the time of SLURM build. 
Default output for all commands - is now ISO 8601 (yyyy-mm-ddThh:mm:ss). - -- Add support for scontrol to explicitly power a node up or down using the - configured SuspendProg and ResumeProg programs. - -- Fix book select/cons_res logic for tracking the number of allocated - CPUs on a node when a partition's Shared value is YES or FORCE. - -- Added configure options "--enable-cray-xt" and "--with-apbasil=PATH" for - eventual support of Cray-XT systems. - -* Changes in SLURM 1.4.0-pre6 -============================= - -- Fix job preemption when sched/gang and select/linear are configured with - non-sharing partitions. - -- In select/cons_res ensure that required nodes have available resources. - -* Changes in SLURM 1.4.0-pre5 -============================= - -- Correction in setting of SLURM_CPU_BIND environment variable. - -- Rebuild slurmctld's job select_jobinfo->node_bitmap on restart/reconfigure - of the daemon rather than restoring the bitmap since the nodes in a system - can change (be added or removed). - -- Add configuration option "--with-cpusetdir=PATH" for non-standard - locations. - -- Get new multi-core data structures working on BlueGene systems. - -- Modify PMI_Get_clique_ranks() to return an array of integers rather - than a char * to satisfy PMI standard. Correct logic in - PMI_Get_clique_size() for when srun --overcommit option is used. - -- Fix bug in select/cons_res, allocated a job all of the processors on a - node when the --exclusive option is specified as a job submit option. - -- Add NUMA cpu_bind support to the task affinity plugin. Binds tasks to - a set of CPUs that belong to a NUMA locality domain with the appropriate - --cpu-bind option (ldoms, rank_ldom, map_ldom, and mask_ldom), see - "man srun" for more information.
- -* Changes in SLURM 1.4.0-pre4 -============================= - -- For task/affinity, force jobs to use a particular task binding by setting - the TaskPluginParam configuration parameter rather than slurmd's - SLURM_ENFORCED_CPU_BIND environment variable. - -- Enable full preemption of jobs by partition with select/cons_res - (cons_res_preempt.patch from Chris Holmes, HP). - -- Add configuration parameter DebugFlags to provide detailed logging for - specific subsystems (steps and triggers so far). - -- srun's --no-kill option is passed to slurmctld so that a job step is - killed even if the node where srun executes goes down (unless the - --no-kill option is used, previous termination logic would fail if - srun was not responding). - -- Transfer a job step's core bitmap from the slurmctld to the slurmd - within the job step credential. - -- Add cpu_bind, cpu_bind_type, mem_bind and mem_bind_type to job allocation - request and job_details structure in slurmctld. Add support to --cpu_bind - and --mem_bind options from salloc and sbatch commands. - -* Changes in SLURM 1.4.0-pre3 -============================= - -- Internal changes: CPUs per node changed from 32-bit to 16-bit size. - Node count fields changed from 16-bit to 32-bit size in some structures. - -- Remove select plugin functions select_p_get_extra_jobinfo(), - select_p_step_begin() and select_p_step_fini(). - -- Remove the following slurmctld job structure fields: num_cpu_groups, - cpus_per_node, cpu_count_reps, alloc_lps_cnt, alloc_lps, and used_lps. - Use equivalent fields in new "select_job" structure, which is filled - in by the select plugins. - -- Modify mem_per_task in job step request from 16-bit to 32-bit size. - Use new "select_job" structure for the job step's memory management. - -- Add core_bitmap_job to slurmctld's job step structure to identify - which specific cores are allocated to the step. 
- -- Add new configuration option OverTimeLimit to permit jobs to exceed - their (soft) time limit by a configurable amount. Backfill scheduling - will be based upon the soft time limit. - -- Remove select_g_get_job_cores(). That data is now within the slurmctld's - job structure. - -* Changes in SLURM 1.4.0-pre2 -============================= - -- Remove srun's --ctrl-comm-ifhn-addr option (for PMI/MPICH2). It is no - longer needed. - -- Modify power save mode so that nodes can be powered off when idle. See - https://computing.llnl.gov/linux/slurm/power_save.html or - "man slurm.conf" (SuspendProgram and related parameters) for more - information. - -- Added configuration parameter PrologSlurmctld, which can be used to boot - nodes into a particular state for each job. See "man slurm.conf" for - details. - -- Add configuration parameter CompleteTime to control how long to wait for - a job's completion before allocating already released resources to pending - jobs. This can be used to reduce fragmentation of resources. See - "man slurm.conf" for details. - -- Make default CryptoType=crypto/munge. OpenSSL is now completely optional. - -- Make default AuthType=auth/munge rather than auth/none. - -- Change output format of "sinfo -R" from "%35R %N" to "%50R %N". - -* Changes in SLURM 1.4.0-pre1 -============================= - -- Save/restore a job's task_distribution option on slurmctld restart. - NOTE: SLURM must be cold-started on conversion from version 1.3.x. - -- Remove task_mem from job step credential (only job_mem is used now). - -- Remove --task-mem and --job-mem options from salloc, sbatch and srun - (use --mem-per-cpu or --mem instead). - -- Remove DefMemPerTask from slurm.conf (use DefMemPerCPU or DefMemPerNode - instead). - -- Modify slurm_step_launch API call. Move launch host from function argument - to element in the data structure slurm_step_launch_params_t, which is - used as a function argument.
- -- Add state_reason_string to job state with optional details about why - a job is pending. - -- Make "scontrol show node" output match scontrol input for some fields - ("Cores" changed to "CoresPerSocket", etc.). - -- Add support for a new node state "FUTURE" in slurm.conf. These node records - are created in SLURM tables for future use without a reboot of the SLURM - daemons, but are not reported by any SLURM commands or APIs. - -* Changes in SLURM 1.3.17 -========================= - -- Fix bug in configure script that can clear user specified LIBS. - -* Changes in SLURM 1.3.16 -========================= - -- Fix memory leak in forward logic of tree message passing. - -- Fix job exit code recorded for srun job allocation. - -- Bluegene - Bug fix for too many parameters being passed to a debug statement - -- Bluegene - Bug fix for systems running more than 8 in the X dim running - Dynamic mode. - -* Changes in SLURM 1.3.15 -========================= - -- Fix bug in squeue command with sort on job name ("-S j" option) for jobs - that lack a name. Previously generated an invalid memory reference. - -- Permit the TaskProlog to write to the job's standard output by writing - a line containing the prefix "print " to its standard output. - -- Fix for making the slurmdbd agent thread start up correctly when - stopped and then started again. - -- Add squeue option to report jobs by account (-U or --account). Patch from - Par Andersson, National Supercomputer Centre, Sweden. - -- Add -DNUMA_VERSION1_COMPATIBILITY to Makefile CFLAGS for proper behavior - when building with NUMA version 2 APIs. - -- BLUEGENE - slurm works on a BGP system. - -- BLUEGENE - slurm handles HTC blocks - -- BLUEGENE - Added option DenyPassthrough in the bluegene.conf. Can be set - to any combination of X,Y,Z to not allow passthroughs when running in - dynamic layout mode. - -- Fix bug in logic to remove a job's dependency, could result in abort.
- -- Add new error message to sched/wiki and sched/wiki2 (Maui and Moab) for - STARTJOB request: "TASKLIST includes non-responsive nodes". - -- Fix bug in select/linear when used with sched/gang that can result in a - job's required or excluded node specification being ignored. - -- Add logic to handle message connect timeouts (timed-out.patch from - Chuck Clouston, Bull). - -- BLUEGENE - CFLAGS=-m64 is no longer required in configure - -- Update python-hostlist code from Kent Engström (NSC) to v1.5 - - Add hostgrep utility to search for lines matching a hostlist. - - Make each "-" on the command line count as one hostlist argument. - If multiple hostlists are given on stdin they are combined to a - union hostlist before being used in the way requested by the - options. - -- When using -j option in sacct no user restriction will be applied unless - specified with the -u option. - -- For sched/wiki and sched/wiki2, change logging of wiki message traffic - from debug() to debug2(). Only seen if SlurmctldDebug is configured to - 6 or higher. - -- Significant speed up for association based reports in sreport - -- BLUEGENE - fix for checking if job can run with downed nodes. Previously - sbatch etc would tell you node configuration not available now jobs are - accepted but held until nodes are back up. - -- Fix in accounting so if any nodes are removed from the system when they - were previously down will be recorded correctly. - -- For sched/wiki2 (Moab), add flag to note if job is restartable and - prevent deadlock if job requeue fails. - -- Modify squeue to return non-zero exit code on failure. Patch from - Par Andersson (NSC). - -- Correct logic in select/cons_res to allocate a job the maximum node - count from a range rather than minimum (e.g. "sbatch -N1-4 my.sh"). - -- In accounting_storage/filetxt and accounting_storage/pgsql fix - possible invalid memory reference when a job lacks a name. - -- Give srun command an exit code of 1 if the prolog fails.
- -- BLUEGENE - allows for checking nodecard states in the system instead - of midplane state so as to not down an entire midplane if you don't - have to. - -- BLUEGENE - fix creation of MESH blocks - -- BLUEGENE - on job cancellation we call jm_cancel_job and then wait until - the system cleans up the job. Before we would send a SIGKILL right - at the beginning. - -- BLUEGENE - if a user specifies a node count that can not be met the job - will be refused instead of before the plugin would search for the next - larger size that could be created. This prevents users asking for - things that can't be created, and then getting something back they might - not be expecting. - -* Changes in SLURM 1.3.14 -========================= - -- SECURITY BUG: Fix in sbcast logic that permits users to write files based - upon supplemental groups of the slurmd daemon. Similar logic for event - triggers if slurmctld is run as user root (not typical). - -* Changes in SLURM 1.3.13 -========================= - -- Added ability for slurmdbd to archive and purge step and/or job records. - -- Added DefaultQOS as an option to slurmdbd.conf for when clusters are - added the default will be set to this if none is given in the sacctmgr line. - -- Added configure option --enable-sun-const for Sun Constellation system with - 3D torus interconnect. Supports proper smap and sview displays for 3-D - topology. Node names are automatically put into Hilbert curve order given - a one-line nodelist definition in slurm.conf (e.g. NodeNames=sun[000x533]). - -- Fixed bug in parsing time for sacct and sreport to pick the correct year if - none is specified. - -- Provide better scheduling with overlapping partitions (when a job can not - be scheduled due to insufficient resources, reserve the specific nodes - associated with that partition rather than blocking all partitions with - any overlapping nodes).
- -- Correct logic to log in a job's stderr that it was "CANCELLED DUE TO - NODE FAILURE" rather than just "CANCELLED". - -- Fix to crypto/openssl plugin that could result in job launch requests - being spoofed through the use of an improperly formed credential. This bug - could permit a user to launch tasks on compute nodes not allocated for - their use, but will NOT permit them to run tasks as another user. For more - information see http://www.ocert.org/advisories/ocert-2008-016.html - -* Changes in SLURM 1.3.12 -========================= - -- Added support for Workload Characteristic Key (WCKey) in accounting. The - WCkey is something that can be used in accounting to group associations - together across clusters or within clusters that are not related. Use - the --wckey option in srun, sbatch or salloc or set the SLURM_WCKEY env - var to have this set. Use sreport with the wckey option to view reports. - THIS CHANGES THE RPC LEVEL IN THE SLURMDBD. YOU MUST UPGRADE YOUR SLURMDBD - BEFORE YOU UPGRADE THE REST OF YOUR CLUSTERS. THE NEW SLURMDBD WILL TALK - TO OLDER VERSIONS OF SLURM FINE. - -- Added configuration parameter BatchStartTimeout to control how long to - allow for a batch job prolog and environment loading (for Moab) to run. - Previously if job startup took too long, a batch job could be cancelled - before fully starting with a SlurmctldLog message of "Master node lost - JobId=#, killing it". See "man slurm.conf" for details. - -- For a job step, add support for srun's --nodelist and --exclusive options - to be used together. - -- On slurmstepd failure, set node state to DRAIN rather than DOWN. - -- Fix bug in select/cons_res that would incorrectly satisfy a task's - --cpus-per-task specification by allocating the task CPUs on more than - one node. - -- Add support for hostlist expressions containing up to two numeric - expressions (e.g. "rack[0-15]_blade[0-41]").
- -- Fix bug in slurmd message forwarding which left file open in the case of - some communication failures. - -- Correction to sinfo node state information on BlueGene systems. DRAIN - state was replaced with ALLOC or IDLE under some situations. - -- For sched/wiki2 (Moab), strip quotes embedded within job names from the - name reported. - -- Fix bug in jobcomp/script that could cause the slurmctld daemon to exit - upon reconfiguration ("scontrol reconfig" or SIGHUP). - -- Fix to sinfo, don't print a node's memory size or tmp_disk space with - suffix of "K" or "M" (thousands or millions of megabytes). - -- Improve efficiency of scheduling jobs into partitions which do not overlap. - -- Fixed sreport user top report to only display the limit specified - instead of all users. - -* Changes in SLURM 1.3.11 -========================= - -- Bluegene/P support added (minimally tested, but builds correctly). - -- Fix infinite loop when using accounting_storage/mysql plugin either from - the slurmctld or slurmdbd daemon. - -- Added more thread safety for assoc_mgr in the controller. - -- For sched/wiki2 (Moab), permit clearing of a job's dependencies with the - JOB_MODIFY option "DEPEND=0". - -- Do not set a running or pending job's EndTime when changing its time - limit. - -- Fix bug in use of "include" parameter within the plugstack.conf file. - -- Fix bug in the parsing of negative numeric values in configuration files. - -- Propagate --cpus-per-task parameter from salloc or sbatch input line to - the SLURM_CPUS_PER_TASK environment variable in the spawned shell for - srun to use. - -- Add support for srun --cpus_per_task=0. This can be used to spawn tasks - without allocating resources for the job step from the job's allocation - when running multiple job steps with the --exclusive option. - -- Remove registration messages from saved messages when bringing down cluster. - Without this, deadlock occurs if wrong cluster name is given.
- -- Correction to build for srun debugger (export symbols). - -- sacct will now more properly display allocations made with salloc with only - one step. - -- Altered sacctmgr, sreport to look at complete option before applying. - Before we would only look at the first determined significant characters. - -- BLUEGENE - in overlap mode marking a block to error state will now end - jobs on overlapping blocks and free them. - -- Give a batch job 20 minutes to start before considering it missing and - killing it (long delay could result from slurmd being paged out). Changed - the log message from "Master node lost JobId=%u, killing it" to "Batch - JobId=%u missing from master node, killing it". - -- Avoid "Invalid node id" error when a job step within an existing job - allocation specifies a node count which is less than the node count - allocated in order to satisfy the task count specification (e.g. - "srun -n16 -N1 hostname" on allocation of 16 one-CPU nodes). - -- For sched/wiki2 (Moab) disable changing a job's name after it has begun - execution. - -* Changes in SLURM 1.3.10 -========================= - -- Fix several bugs in the hostlist functions: - - Fix hostset_insert_range() to do proper accounting of hl->nhosts (count). - - Avoid assertion failure when calling hostset_create(NULL). - - Fix return type of hostlist and hostset string functions from size_t to - ssize_t. - - Add check for NULL return from hostlist_create(). - - Rewrite of hostrange_hn_within(), avoids reporting "tst0" in the hostlist - "tst". - -- Modify squeue to accept "--nodes=<hostlist>" rather than - "--node=<node_name>" and report all jobs with any allocated nodes from set - of nodes specified. From Par Anderson, National Supercomputer Centre, - Sweden. - -- Fix bug preventing use of TotalView debugger with TaskProlog configured or - srun's --task-prolog option.
- -- Improve reliability of batch job requeue logic in the event that the slurmd - daemon is temporarily non-responsive (for longer than the configured - MessageTimeout value but less than the SlurmdTimeout value). - -- In sched/wiki2 (Moab) report a job's MAXNODES (maximum number of permitted - nodes). - -- Fixed SLURM_TASKS_PER_NODE to live up more to its name on an allocation. - Will now contain the number of tasks per node instead of the number of CPUs - per node. This is only for a resource allocation. Job steps already have - the environment variable set correctly. - -- Configuration parameter PropagateResourceLimits has new option of "NONE". - -- User's --propagate options take precedence over PropagateResourceLimits - configuration parameter in both srun and sbatch commands. - -- When Moab is in use (salloc or sbatch is executed with the --get-user-env - option to be more specific), load the user's default resource limits rather - than propagating the Moab daemon's limits. - -- Fix bug in slurmctld restart logic for recovery of batch jobs that are - initiated as a job step rather than an independent job (used for LSF). - -- Fix bug that can cause slurmctld restart to fail, bug introduced in SLURM - version 1.3.9. From Eygene Ryabinkin, Kurchatov Institute, Russia. - -- Permit slurmd configuration parameters to be set to new values from - previously unset values. - -* Changes in SLURM 1.3.9 -======================== - -- Fix jobs being cancelled by ctrl-C to have correct cancelled state in - accounting. - -- Slurmdbd will only cache user data, made for faster start up - -- Improved support for job steps in FRONT_END systems - -- Added support to dump and load association information in the controller - on start up if slurmdbd is unresponsive - -- BLUEGENE - Added support for sched/backfill plugin - -- sched/backfill modified to initiate multiple jobs per cycle. - -- Increase buffer size in srun to hold task list expressions.
Critical - for jobs with 16k tasks or more. - -- Added support for eligible jobs and downed nodes to be sent to accounting - from the controller the first time accounting is turned on. - -- Correct srun logic to support --tasks-per-node option without task count. - -- Logic in place to handle multiple versions of RPCs within the slurmdbd. - THE SLURMDBD MUST BE UPGRADED TO THIS VERSION BEFORE UPGRADING THE - SLURMCTLD OR THEY WILL NOT TALK. - Older versions of the slurmctld will continue to talk to the new slurmdbd. - -- Add support for new job dependency type: singleton. Only one job from a - given user with a given name will execute with this dependency type. - From Matthieu Hautreux, CEA. - -- Updated contribs/python/hostlist to version 1.3: See "CHANGES" file in - that directory for details. From Kent Engström, NSC. - -- Add SLURM_JOB_NAME environment variable for jobs submitted using sbatch. - In order to prevent the job steps from all having the same name as the - batch job that spawned them, the SLURM_JOB_NAME environment variable is - ignored when setting the name of a job step from within an existing - resource allocation. - -- For use with sched/wiki2 (Moab only), set salloc's default shell based - upon the user who the job runs as rather than the user submitting the job - (user root). - -- Fix to sched/backfill when job specifies no time limit and the partition - time limit is INFINITE. - -- Validate a job's constraints (node features) at job submit or modification - time. Major re-write of resource allocation logic to support more complex - job feature requests. - -- For sched/backfill, correct logic to support job constraint specification - (e.g. node features). - -- Correct power save logic to avoid trying to wake DOWN node. From Matthieu - Hautreux, CEA. 
- -- Cancel a job step when one of its nodes goes DOWN based upon the job - step's --no-kill option, by default the step is killed (previously the - job step remained running even without the --no-kill option). - -- Fix bug in logic to remove whitespace from plugstack.conf. - -- Add new configuration parameter SallocDefaultCommand to control what - shell that salloc launches by default. - -- When enforcing PrivateData configuration parameter, failures return - "Access/permission denied" rather than "Invalid user id". - -- From sbatch and srun, if the --dependency option is specified then set - the environment variable SLURM_JOB_DEPENDENCY to the same value. - -- In plugin jobcomp/filetxt, use ISO8601 formats for time by default (e.g. - YYYY-MM-DDTHH:MM:SS rather than MM/DD-HH:MM:SS). This restores the default - behavior from Slurm version 1.2. Change the value of USE_ISO8601 in - src/plugins/jobcomp/filetxt/jobcomp_filetxt.c to revert the behavior. - -- Add support for configuration option of ReturnToService=2, which will - return a DOWN node to use if the node was previously set DOWN for any reason. - -- Removed Gold accounting plugin. This plugin was to be used for accounting - but has not been maintained and is no longer needed. If using this - please contact slurm-dev@llnl.gov. - -- When not enforcing associations and running accounting if a user - submits a job to an account that does not have an association on the - cluster the account will be changed to the default account to help - avoid trash in the accounting system. If the user's default account - does not have an association on the cluster the requested account - will be used. - -- Add configuration parameter "--have-front-end" to define HAVE_FRONT_END - in config.h and run slurmd only on a front end (suitable only for SLURM - development and testing). - -* Changes in SLURM 1.3.8 -======================== - -- Added PrivateData flags for Users, Usage, and Accounts to Accounting.
- If using slurmdbd, set in the slurmdbd.conf file. Otherwise set in the - slurm.conf file. See "man slurm.conf" or "man slurmdbd.conf" for details. - -- Reduce frequency of resending job kill RPCs. Helpful in the event of - network problems or down nodes. - -- Fix memory leak caused under heavy load when running with select/cons_res - plus sched/backfill. - -- For salloc, if no local command is specified, execute the user's default - shell. - -- BLUEGENE - patch to make sure when starting a job blocks required to be - freed are checked to make sure no job is running on them. If one is found - we will requeue the new job. No job will be lost. - -- BLUEGENE - Set MPI environment variables from salloc. - -- BLUEGENE - Fix threading issue for overlap mode - -- Reject batch scripts containing DOS linebreaks. - -- BLUEGENE - Added wait for block boot to salloc - -* Changes in SLURM 1.3.7 -======================== - -- Add jobid/stepid to MESSAGE_TASK_EXIT to address race condition when - a job step is cancelled, another is started immediately (before the - first one completely terminates) and ports are reused. - NOTE: This change requires that SLURM be updated on all nodes of the - cluster at the same time. There will be no impact upon currently running - jobs (they will ignore the jobid/stepid at the end of the message). - -- Added Python module to process hostlists as used by SLURM. See - contribs/python/hostlist. Supplied by Kent Engstrom, National - Supercomputer Centre, Sweden. - -- Report task termination due to signal (restored functionality present - in slurm v1.2). - -- Remove sbatch test for script size being no larger than 64k bytes. - The current limit is 4GB. - -- Disable FastSchedule=0 use with SchedulerType=sched/gang. Node - configuration must be specified in slurm.conf for gang scheduling now.
- -- For sched/wiki and sched/wiki2 (Maui or Moab scheduler) disable the ability - of a non-root user to change a job's comment field (used by Maui/Moab for - storing scheduler state information). - -- For sched/wiki (Maui) add pending job's future start time to the state - info reported to Maui. - -- Improve reliability of job requeue logic on node failure. - -- Add logic to ping non-responsive nodes even if SlurmdTimeout=0. This permits - the node to be returned to use when it starts responding rather than - remaining in a non-usable state. - -- Honor HealthCheckInterval values that are smaller than SlurmdTimeout. - -- For non-responding nodes, log them all on a single line with a hostlist - expression rather than one line per node. Frequency of log messages is - dependent upon SlurmctldDebug value from 300 seconds at SlurmctldDebug<=3 - to 1 second at SlurmctldDebug>=5. - -- If a DOWN node is resumed, set its state to IDLE & NOT_RESPONDING and - ping the node immediately to clear the NOT_RESPONDING flag. - -- Log that a job's time limit is reached, but don't send SIGXCPU. - -- Fixed gid to be set in slurmstepd when run by root - -- Changed getpwent to getpwent_r in the slurmctld and slurmd - -- Increase timeout on most slurmdbd communications to 60 secs (time for - substantial database updates). - -- Treat srun option of --begin= with a value of now without a numeric - component as a failure (e.g. "--begin=now+hours"). - -- Eliminate a memory leak associated with notifying srun of allocated - nodes having failed. - -- Add scontrol shutdown option of "slurmctld" to just shutdown the - slurmctld daemon and leave the slurmd daemons running. - -- Do not require JobCredentialPrivateKey or JobCredentialPublicCertificate - in slurm.conf if using CryptoType=crypto/munge. - -- Remove SPANK support from sbatch.
- -* Changes in SLURM 1.3.6 -======================== - -- Add new function to get information for a single job rather than always - getting information for all jobs. Improved performance of some commands. - NOTE: This new RPC means that the slurmctld daemons should be updated - before or at the same time as the compute nodes in order to process it. - -- In salloc, sbatch, and srun replace --task-mem options with --mem-per-cpu - (--task-mem will continue to be accepted for now, but is not documented). - Replace DefMemPerTask and MaxMemPerTask with DefMemPerCPU, DefMemPerNode, - MaxMemPerCPU and MaxMemPerNode in slurm.conf (old options still accepted - for now, but mapped to "PerCPU" parameters and not documented). Allocate - a job's memory at the same time that processors are allocated based - upon the --mem or --mem-per-cpu option rather than when job steps are - initiated. - -- Altered QOS in accounting to be a list of admin defined states, an - account or user can have multiple QOS's now. They need to be defined using - 'sacctmgr add qos'. They are no longer an enum. If none are defined - Normal will be the QOS for everything. Right now this is only for use - with MOAB. Does nothing outside of that. - -- Added spank_get_item support for field S_STEP_CPUS_PER_TASK. - -- Make corrections in spank_get_item for field S_JOB_NCPUS, previously - reported task count rather than CPU count. - -- Convert configuration parameter PrivateData from on/off flag to have - separate flags for job, partition, and node data. See "man slurm.conf" - for details. - -- Fix bug, failed to load DisableRootJobs configuration parameter. - -- Altered sacctmgr to always return a non-zero exit code on error and send - error messages to stderr. - -* Changes in SLURM 1.3.5 -======================== - -- Fix processing of auth/munge authentication key for messages originating - in slurmdbd and sent to slurmctld.
- -- If srun is allocating resources (not within sbatch or salloc) and MaxWait - is configured to a non-zero value then wait indefinitely for the resource - allocation rather than aborting the request after MaxWait time. - -- For Moab only: add logic to reap defunct "su" processes that are spawned by - slurmd to load user's environment variables. - -- Added more support for "dumping" account information to a flat file and - read in again to protect data in case something bad happens to the database. - -- Sacct will now report account names for job steps. - -- For AIX: Remove MP_POERESTART_ENV environment variable, disabling - poerestart command. User must explicitly set MP_POERESTART_ENV before - executing poerestart. - -- Put back notification that a job has been allocated resources when it was - pending. - -* Changes in SLURM 1.3.4 -======================== - -- Some updates to man page formatting from Gennaro Oliva, ICAR. - -- Smarter loading of plugins (doesn't stat every file in the plugin dir) - -- In sched/backfill avoid trying to schedule jobs on DOWN or DRAINED nodes. - -- forward exit_code from step completion to slurmdbd - -- Add retry logic to socket connect() call from client which can fail - when the slurmctld is under heavy load. - -- Fixed bug when adding associations to add correctly. - -- Added support for associations for user root. - -- For Moab, sbatch --get-user-env option processed by slurmd daemon - rather than the sbatch command itself to permit faster response - for Moab. - -- IMPORTANT FIX: This only affects use of select/cons_res when allocating - resources by core or socket, not by CPU (default for SelectTypeParameter). - We are not saving a pending job's task distribution, so after restarting - slurmctld, select/cons_res was over-allocating resources based upon an - invalid task distribution value.
Since we can't save the value without - changing the state save file format, we'll just set it to the default - value for now and save it in Slurm v1.4. This may result in a slight - variation on how sockets and cores are allocated to jobs, but at least - resources will not be over-allocated. - -- Correct logic in accumulating resources by node weight when more than - one job can run per node (select/cons_res or partition shared=yes|force). - -- slurm.spec file updated to avoid creating empty RPMs. RPM now *must* be - built with correct specification of which packages to build or not build. - See the top of the slurm.spec file for information about how to control - package building specification. - -- Set SLURM_JOB_CPUS_PER_NODE for jobs allocated using the srun command. - It was already set for salloc and sbatch commands. - -- Fix to handle suspended jobs that were cancelled in accounting - -- BLUEGENE - fix to only include bps given in a name from the bluegene.conf - file. - -- For select/cons_res: Fix record-keeping for core allocations when more - than one partition uses a node or there is more than one socket per node. - -- In output for "scontrol show job" change "StartTime" header to "EligibleTime" - for pending jobs to accurately describe what is reported. - -- Add more slurmdbd.conf parameters: ArchiveScript, ArchiveAge, JobPurge, and - StepPurge (not fully implemented yet). - -- Add slurm.conf parameter EnforcePartLimits to reject jobs which exceed a - partition's size and/or time limits rather than leaving them queued for a - later change in the partition's limits. NOTE: Not reported by - "scontrol show config" to avoid changing RPCs. It will be reported in - SLURM version 1.4. - -- Added idea of coordinator to accounting. A coordinator can add associations - between existing users to the account or any sub-account they are - coordinator to. They can also add/remove other coordinators to those - accounts.
- -- Add support for Hostname and NodeHostname in slurm.conf being fully - qualified domain names (by Vijay Ramasubramanian, University of Maryland). - For more information see "man slurm.conf". - -* Changes in SLURM 1.3.3 -======================== - -- Add mpi_openmpi plugin to the main SLURM RPM. - -- Prevent invalid memory reference when using srun's --cpu_bind=cores option - (slurm-1.3.2-1.cea1.patch from Matthieu Hautreux, CEA). - -- Task affinity plugin modified to support a particular cpu bind type: cores, - sockets, threads, or none. Accomplished by setting an environment variable - SLURM_ENFORCE_CPU_TYPE (slurm-1.3.2-1.cea2.patch from Matthieu Hautreux, - CEA). - -- For BlueGene only, log "Prolog failure" once per job not once per node. - -- Reopen slurmctld log file after reconfigure or SIGHUP is received. - -- In TaskPlugin=task/affinity, fix possible infinite loop for slurmd. - -- Accounting rollup works for mysql plugin. Automatic rollup when using - slurmdbd. - -- Copied job stat logic out of sacct into sstat; in the future sacct -stat - will be deprecated. - -- Correct sbatch processing of --nice option with negative values. - -- Add squeue formatted print option %Q to print a job's integer priority. - -- In sched/backfill, fix bug that was changing a pending job's shared value - to zero (possibly changing a pending job's resource requirements from a - processor on some node to the full node). - -* Changes in SLURM 1.3.2 -======================== - -- Get --ntasks-per-node option working for sbatch command. - -- BLUEGENE: Added logic to give back a best block on overlapped mode - in test_only mode - -- BLUEGENE: Updated debug info and man pages for better help with the - numpsets option and to fail correctly with bad image request for building - blocks. - -- In sched/wiki and sched/wiki2 properly support Slurm license consumption - (job state reported as "Hold" when required licenses are not available).
- -- In sched/wiki2 JobWillRun command, don't return an error code if the job(s) - can not be started at that time. Just return an error message (from - Doug Wightman, CRI). - -- Fix bug if sched/wiki or sched/wiki2 are configured and no job comment is - set. - -- scontrol modified to report partition's "DisableRootJobs" value. - -- Fix bug in setting host address for PMI communications (mpich2 only). - -- Fix for memory size accounting on some architectures. - -- In sbatch and salloc, change --dependency's one letter option from "-d" - to "-P" (continue to accept "-d", but change the documentation). - -- Only check that task_epilog and task_prolog are runnable by the job's - user, not as root. - -- In sbatch, if specifying an alternate directory (--workdir/-D), then - input, output and error files are in that directory rather than the - directory from which the command is executed - -- NOTE: Fully operational with Moab version 5.2.3+. Change SUBMITCMD in - moab.cfg to be the location of sbatch rather than srun. Also set - HostFormat=2 in SLURM's wiki.conf for improved performance. - -- NOTE: We needed to change an RPC from version 1.3.1. You must upgrade - all nodes in a cluster from v1.3.1 to v1.3.2 at the same time. - -- Postgres plugin will work from job accounting, not for association - management yet. - -- For srun/sbatch --get-user-env option (Moab use only) look for "env" - command in both /bin and /usr/sbin (for Suse Linux). - -- Fix bug in processing job feature requests with node counts (could fail - to schedule job if some nodes have no associated features). - -- Added nodecnt and gid to jobcomp/script - -- Ensure that nodes selected in "srun --will-run" command or the equivalent in - sched/wiki2 are in the job's partition. - -- BLUEGENE - changed partition Min|MaxNodes to represent c-node counts - instead of base partitions - -- In sched/gang only, prevent possible invalid memory reference when - slurmctld is reconfigured, e.g.
"scontrol reconfig". - -- In select/linear only, prevent invalid memory reference in log message when - nodes are added to slurm.conf and then "scontrol reconfig" is executed. - -* Changes in SLURM 1.3.1 -======================== - -- Correct logic for processing batch job's memory limit enforcement. - -- Fix bug that was setting a job's requeue value on any update of the - job using the "scontrol update" command. The invalid value of an - updated job prevents its recovery when slurmctld restarts. - -- Add support for cluster-wide consumable resources. See "Licenses" - parameter in slurm.conf man page and "--licenses" option in salloc, - sbatch and srun man pages. - -- Major changes in select/cons_res to support FastSchedule=2 with more - resources configured than actually exist (useful for testing purposes). - -- Modify srun --test-only response to include expected initiation time - for a job as well as the nodes to be allocated and processor count - (for use by Moab). - -- Correct sched/backfill to properly honor job dependencies. - -- Correct select/cons_res logic to allocate CPUs properly if there is - more than one thread per core (previously failed to allocate all cores). - -- Correct select/linear logic in shared job count (was off by 1). - -- Add support for job preemption based upon partition priority (in sched/gang, - preempt.patch from Chris Holmes, HP). - -- Added much better logic for mysql accounting. - -- Finished all basic functionality for sacctmgr. - -- Added load file logic to sacctmgr for setting up a cluster in one step. - -- NOTE: We needed to change an RPC from version 1.3.0. You must upgrade - all nodes in a cluster from v1.3.0 to v1.3.1 at the same time. - -- NOTE: Work is currently underway to improve placement of jobs for gang - scheduling and preemption. - -- NOTE: Work is underway to provide additional tools for reporting - accounting information.
- -* Changes in SLURM 1.3.0 -======================== - -- In sched/wiki2, add processor count to JOBWILLRUN response. - -- Add event trigger for node entering DRAINED state. - -- Build properly without OpenSSL installed (OpenSSL is recommended, but not - required). - -- Added slurmdbd, and modified accounting_storage plugin to talk to it. - Allowing multiple slurm systems to securely store and gather information - not only about jobs, but the system also. See accounting web page for more - information. - -* Changes in SLURM 1.3.0-pre11 -============================== - -- Restructure the sbcast RPC to take advantage of larger buffers available - in Slurm v1.3 RPCs. - -- Fix several memory leaks. - -- In scontrol, show job's Requeue value, permit change of Requeue and Comment - values. - -- In slurmctld job record, add QOS (quality of service) value for accounting - purposes with Maui and Moab. - -- Log to a job's stderr when it is being cancelled explicitly or upon reaching - its time limit. - -- Only permit a job's account to be changed while that job is PENDING. - -- Fix race condition in job suspend/resume (slurmd.sus_res.patch from HP). - -* Changes in SLURM 1.3.0-pre10 -============================== - -- Add support for node-specific "arch" (architecture) and "os" (operating - system) fields. These fields are set based upon values reported by the - slurmd daemon on each compute node using SLURM_ARCH and SLURM_OS environment - variables (if set, the uname function otherwise) and are intended to support - real time changes in operating system. These values are reported - by "scontrol show node" plus the sched/wiki and sched/wiki2 plugins for Maui - and Moab respectively. - -- In sched/wiki and sched/wiki2: add HostFormat and HidePartitionJobs to - "scontrol show config" SCHEDULER_CONF output. - -- In sched/wiki2: accept hostname expression as input for GETNODES command.
- -- Add JobRequeue configuration parameter and --requeue option to the sbatch - command. - -- Add HealthCheckInterval and HealthCheckProgram configuration parameters. - -- Add SlurmDbdAddr, SlurmDbdAuthInfo and SlurmDbdPort configuration parameters. - -- Modify select/linear to achieve better load leveling with gang scheduler. - -- Develop the sched/gang plugin to support select/linear and - select/cons_res. If sched/gang is enabled and Shared=FORCE is configured - for a partition, this plugin will gang-schedule or "timeslice" jobs that - share common resources within the partition. Note that resources that are - shared across partitions are not gang-scheduled. - -- Add EpilogMsgTime configuration parameter. See "man slurm.conf" for details. - -- Increase default MaxJobCount configuration parameter from 2000 to 5000. - -- Move all database common files from src/common to new lib in src/database. - -- Move sacct to src/accounting; added sacctmgr for scontrol like operations - to accounting infrastructure. - -- Basic functions of sacctmgr in place to make for administration of - accounting. - -- Moved clusteracct_storage plugin to accounting_storage plugin, - jobacct_storage is still its own plugin for now. - -- Added template for slurm php extension. - -- Add infrastructure to support allocation of cluster-wide licenses to jobs. - Full support will be added some time after version 1.3.0 is released. - -- In sched/wiki2 with select/bluegene, add support for WILLRUN command - to accept multiple jobs with start time specifications. - -* Changes in SLURM 1.3.0-pre9 -============================= - -- Add spank support to sbatch. Note that spank_local_user() will be called - with step_layout=NULL and gid=SLURM_BATCH_SCRIPT and spank_fini() will - be called immediately afterwards. - -- Made configure use mysql_config to find location of mysql database install - Removed bluegene specific information from the general database tables.
- -- Re-write sched/backfill to utilize new will-run logic in the select - plugins. It now supports select/cons_res and all job options (required - nodes, excluded nodes, contiguous, etc.). - -- Modify scheduling logic to better support overlapping partitions. - -- Add --task-mem option and remove --job-mem option from srun, salloc, and - sbatch commands. Enforce step memory limit, if specified and there is - no job memory limit specified (--mem). Also see DefMemPerTask and - MaxMemPerTask in "man slurm.conf". Enforcement is dependent upon job - accounting being enabled with non-zero value for JobAcctGatherFrequency. - -- Change default node tmp_disk size to zero (for diskless nodes). - -* Changes in SLURM 1.3.0-pre8 -============================= - -- Modify how strings are packed in the RPCs, Maximum string size - increased from 64KB (16-bit size field) to 4GB (32-bit size field). - -- Fix bug that prevented time value of "INFINITE" from being processed. - -- Added new srun/sbatch option "--open-mode" to control how output/error - files are opened ("t" for truncate, "a" for append). - -- Added checkpoint/xlch plugin for use with XLCH (Hongjia Cao, NUDT). - -- Added srun option --checkpoint-path for use with XLCH (Hongjia Cao, NUDT). - -- Added new srun/salloc/sbatch option "--acctg-freq" for user control over - accounting data collection polling interval. - -- In sched/wiki2 add support for hostlist expression use in GETNODES command - with HostFormat=2 in the wiki.conf file. - -- Added new scontrol option "setdebug" that can change the slurmctld daemons - debug level at any time (Hongjia Cao, NUDT). - -- Track total suspend time for jobs and steps for accounting purposes. - -- Add version information to partition state file. - -- Added 'will-run' functionality to all of the select plugins (bluegene, - linear, and cons_res) to return node list and time job can start based - on other jobs running. - -- Major restructuring of node selection logic.
select/linear now supports - partition max_share parameter and tries to match like size jobs on the - same nodes to improve gang scheduling performance. Also supports treating - memory as consumable resource for job preemption and gang scheduling if - SelectTypeParameter=CR_Memory in slurm.conf. - -- BLUEGENE: Reorganized bluegene plugin for maintainability sake. - -- Major restructuring of data structures in select/cons_res. - -- Support job, node and partition names of arbitrary size. - -- Fix bug that caused slurmd to hang when using select/linear with - task/affinity. - -* Changes in SLURM 1.3.0-pre7 -============================= - -- Fix a bug in the processing of srun's --exclusive option for a job step. - -* Changes in SLURM 1.3.0-pre6 -============================= - -- Add support for configurable number of jobs to share resources using the - partition Shared parameter in slurm.conf (e.g. "Shared=FORCE:3" for two - jobs to share the resources). From Chris Holmes, HP. - -- Made salloc use api instead of local code for message handling. - -* Changes in SLURM 1.3.0-pre5 -============================= - -- Add select_g_reconfigure() function to note changes in slurmctld configuration - that can impact node scheduling. - -- scontrol to set/get partition's MaxTime and job's Timelimit in minutes plus - new formats: min:sec, hr:min:sec, days-hr:min:sec, days-hr, etc. - -- scontrol "notify" command added to send message to stdout of srun for - specified job id. - -- For BlueGene, make alpha part of node location specification be case insensitive. - -- Report scheduler-plugin specific configuration information with the - "scontrol show configuration" command on the SCHEDULER_CONF line. This - information is not found in the "slurm.conf" file, but a scheduler plugin - specific configuration (e.g. "wiki.conf"). - -- sview partition information reported now includes partition priority.
- -- Expand job dependency specification to support concurrent execution, - testing of job exit status and multiple job IDs. - -* Changes in SLURM 1.3.0-pre4 -============================= - -- Job step launch in srun is now done from the slurm API; all further - modifications to job launch should be done there. - -- Add new partition configuration parameter Priority. Add job count to - Shared parameter. - -- Add new configuration parameters DefMemPerTask, MaxMemPerTask, and - SchedulerTimeSlice. - -- In sched/wiki2, return REJMESSAGE with details on why a job was - requeued (e.g. what node failed). - -* Changes in SLURM 1.3.0-pre3 -============================= - -- Remove slaunch command - -- Added srun option "--checkpoint=time" for job step to automatically be - checkpointed on a periodic basis. - -- Change behavior of "scancel -s KILL <jobid>" to send SIGKILL to all job - steps rather than cancelling the job. This now matches the behavior of - all other signals. "scancel <jobid>" still cancels the job and all steps. - -- Add support for new job step options --exclusive and --immediate. Permit - job steps to be queued when resources are not available within an existing - job allocation to dedicate the resources to the job step. Useful for - executing simultaneous job steps. Provides resource management both at - the level of jobs and job steps. - -- Add support for feature count in job constraints, for example - srun --nodes=16 --constraint=graphics*4 ... - Based upon work by Kumar Krishna (HP, India). - -- Add multi-core options to salloc and sbatch commands (sbatch.patch and - cleanup.patch from Chris Holmes, HP). - -- In select/cons_res properly release resources allocated to job being - suspended (rmbreak.patch, from Chris Holmes, HP). - -- Removed database and jobacct plugin replaced with jobacct_storage - and jobacct_gather for easier hooks for further expansion of the - jobacct plugin.
- -* Changes in SLURM 1.3.0-pre2 -============================= - -- Added new srun option --pty to start job with pseudo terminal attached - to task 0 (all other tasks have I/O discarded) - -- Disable user specifying jobid when sched/wiki2 configured (needed for - Moab releases until early 2007). - -- Report command, args and working directory for batch jobs with - "scontrol show job". - -* Changes in SLURM 1.3.0-pre1 -============================= - -- !!! SRUN CHANGES !!! - The srun options -A/--allocate, -b/--batch, and -a/--attach have been - removed! That functionality is now available in the separate commands - salloc, sbatch, and sattach, respectively. - -- Add new node state FAILING plus trigger for when node enters that state. - -- Add new configuration parameter "PrivateData". This can be used to - prevent a user from seeing jobs or job steps belonging to other users. - -- Added configuration parameters for node power save mode: ResumeProgram - ResumeRate, SuspendExcNodes, SuspendExcParts, SuspendProgram and - SuspendRate. - -- Slurmctld maintains the IP address (rather than hostname) for srun - communications. This fixes some possible network routing issues. - -- Added global database plugin. Job accounting and Job completion are the - first to use it. Follow documentation to add more to the plugin. - -- Removed no-longer-needed jobacct/common/common_slurmctld.c since that is - replaced by the database plugin. - -- Added new configuration parameter: CryptoType. - Moved existing digital signature logic into new plugin: crypto/openssl. - Added new support for crypto/munge (available with GPL license). - -* Changes in SLURM 1.2.36 -========================= - -- For spank_get_item(S_JOB_ARGV) for batch job with script input via STDIN, - set argc value to 1 (rather than 2, argv[0] still set to path of generated - script). - -- sacct will now display more properly allocations made with salloc with only - one step. 
- -* Changes in SLURM 1.2.35 -========================= - -- Permit SPANK plugins to dynamically register options at runtime based upon - configuration or other runtime checks. - -- Add "include" keyword to SPANK plugstack.conf file to optionally include - other configuration files or directories of configuration files. - -- Srun to wait indefinitely for resource allocation to be made. Used to - abort after two minutes. - -* Changes in SLURM 1.2.34 -========================= - -- Permit the cancellation of a job that is in the process of being - requeued. - -- Ignore the show_flag when getting job, step, node or partition information - for user root. - -- Convert some functions to thread-safe versions: getpwnam, getpwuid, - getgrnam, and getgrgid to similar functions with "_r" suffix. While no - failures have been observed, a race condition would in the worst case - permit a user access to a partition not normally allowed due to the - AllowGroup specification or the wrong user identified in an accounting - record. The job would NOT be run as the wrong user. - -- For PMI only (MPICH2/MVAPICH2) base address to send messages to (the srun) - upon the address from which slurmd gets the task launch request rather than - "hostname" where srun executes. - -- Make test for StateSaveLocation directory more comprehensive. - -- For jobcomp/script plugin, PROCS environment variable is now the actual - count of allocated processors rather than the count of processes to - be started. - -* Changes in SLURM 1.2.33 -========================= - -- Cancelled or Failed jobs will now report their job and step id on exit - -- Add SPANK items available to get: SLURM_VERSION, SLURM_VERSION_MAJOR, - SLURM_VERSION_MINOR and SLURM_VERSION_MICRO. - -- Fixed handling of SIGPIPE in srun. Abort job. - -- Fix bug introduced to MVAPICH plugin preventing use of TotalView debugger.
- -- Modify slurmctld to get srun/salloc network address based upon the incoming - message rather than hostname set by the user command (backport of logic in - SLURM v1.3). - -* Changes in SLURM 1.2.32 -========================= - -- LSF only: Enable scancel of job in RootOnly partition by the job's owner. - -- Add support for sbatch --distribution and --network options. - -- Correct pending job's wait reason to "Priority" rather than "Resources" if - required resources are being held in reserve for a higher priority job. - -- In sched/wiki2 (Moab) report a node's state as "Drained" rather than - "Draining" if it has no allocated work (An undocumented Moab wiki option, - see CRI ticket #2394). - -- Log to job's output when it is cancelled or reaches its time limit (ported - from existing code in slurm v1.3). - -- Add support in salloc and sbatch commands for --network option. - -- Add support for user environment variables that include '\n' (e.g. - bash functions). - -- Partial rewrite of mpi/mvapich plugin for improved scalability. - -* Changes in SLURM 1.2.31 -========================= - -- For Moab only: If GetEnvTimeout=0 in slurm.conf then do not run "su" to get - the user's environment, only use the cache file. - -- For sched/wiki2 (Moab), treat the lack of a wiki.conf file or the lack - of a configured AuthKey as a fatal error (lacks effective security). - -- For sched/wiki and sched/wiki2 (Maui or Moab) report a node's state as - Busy rather than Running when allocated if SelectType=select/linear. Moab - was trying to schedule jobs on nodes that were already allocated to jobs - that were hidden from it via the HidePartitionJobs in Slurm's wiki.conf. - -- In select/cons_res improve the resource selection when a job has specified - a processor count along with a maximum node count.
- -- For an srun command with --ntasks-per-node option and *no* --ntasks count, - spawn a task count equal to the number of nodes selected multiplied by the - --ntasks-per-node value. - -- In jobcomp/script: Set TZ if set in slurmctld's environment. - -- In srun with --verbose option properly format CPU allocation information - logged for clusters with 1000+ nodes and 10+ CPUs per node. - -- Process a job's --mail_type=end option on any type of job termination, not - just normal completion (e.g. all failure modes too). - -* Changes in SLURM 1.2.30 -========================= - -- Fix for gold not to print out 720 error messages since they are - potentially harmful. - -- In sched/wiki2 (Moab), permit changes to a pending job's required features: - CMD=CHANGEJOB ARG=<jobid> RFEATURES=<features> - -- Fix for not aborting when node selection doesn't load, fatal error instead - -- In sched/wiki and sched/wiki2 DO NOT report a job's state as "Hold" if its - dependencies have not been satisfied. This reverses a change made in SLURM - version 1.2.29 (which was requested by Cluster Resources, but places jobs - in a HELD state indefinitely). - -* Changes in SLURM 1.2.29 -========================= - -- Modified global configuration option "DisableRootJobs" from number (0 or 1) - to boolean (YES or NO) to match partition parameter. - -- Set "DisableRootJobs" for a partition to match the global parameter's value - for newly created partitions. - -- In sched/wiki and sched/wiki2 report a node's updated features if changed - after startup using "scontrol update ..." command. - -- In sched/wiki and sched/wiki2 report a job's state as "Hold" if its - dependencies have not been satisfied. - -- In sched/wiki and sched/wiki2 do not process incoming requests until - slurm configuration is completely loaded.
- -- In sched/wiki and sched/wiki2 do not report a job's node count after it - has completed (slurm decrements the allocated node count when the nodes - transition from completing to idle state). - -- If job prolog or epilog fail, log the program's exit code. - -- In jobacct/gold map job names containing any non-alphanumeric characters - to '_' to avoid MySQL parsing problems. - -- In jobacct/linux correct parsing if command name contains spaces. - -- In sched/wiki and sched/wiki2 make job info TASK count reflect the - actual task allocation (not requested tasks) even after job terminates. - Useful for accounting purposes only. - -* Changes in SLURM 1.2.28 -========================= - -- Added configuration option "DisableRootJobs" for parameter - "PartitionName". See "man slurm.conf" for details. - -- Fix for faking a large system to correctly handle node_id in the task - affinity plugin for ia64 systems. - -* Changes in SLURM 1.2.27 -========================= - -- Record job eligible time in accounting database (for jobacct/gold only). - -- Prevent user root from executing a job step within a job allocation - belonging to another user. - -- Fixed limiting issue for strings larger than 4096 in xstrfmtcat - -- Fix bug in how Slurm reports job state to Maui/Moab when a job is requeued - due to a node failure, but we can't terminate the job's spawned processes. - Job was being reported as PENDING when it was really still COMPLETING. - -- Added patch from Jerry Smith for qstat -a output - -- Fixed looking at the correct perl path for Slurm.pm in torque wrappers. - -- Enhance job requeue on node failure to be more robust. - -- Added configuration parameter "DisableRootJobs". See "man slurm.conf" - for details. - -- Fixed issue with account = NULL in Gold job accounting plugin - -* Changes in SLURM 1.2.26 -========================= - -- Correct number of sockets/cores/threads reported by slurmd (from - Par Andersson, National Supercomputer Centre, Sweden).
- -- Update libpmi linking so that libslurm is not required for PMI use - (from Steven McDougal, SiCortex). - -- In srun and sbatch, do not check the PATH env var if an absolute pathname - of the program is specified (previously reported an error if no PATH). - -- Correct output of "sinfo -o %C" (CPU counts by node state). - -* Changes in SLURM 1.2.25 -========================= - -- Bug fix for setting exit code in accounting for batch script. - -- Add salloc option, --no-shell (for LSF). - -- Added new options for sacct output - -- mvapich: Ensure MPIRUN_ID is unique for all job steps within a job. - (Fixes crashes when running multiple job steps within a job on one node) - -- Prevent "scontrol show job" from failing with buffer overflow when a job - has a very long Comment field. - -- Make certain that a job step is purged when a job has been completed. - Previous versions could have the job step persist if an allocated node - went DOWN and the slurmctld restarted. - -- Fix bug in sbcast that can cause communication problems for large files. - -- Add sbcast option -t/--timeout and SBCAST_TIMEOUT environment variable - to control message timeout. - -- Add threaded agent to manage a queue of Gold update requests for - performance reasons. - -- Add salloc options --chdir and --get-user-env (for Moab). - -- Modify scontrol update to support job comment changes. - -- Do not clear a DRAINED node's reason field when slurmctld restarts. - -- Do not cancel a pending job if Moab or Maui try to start it on unusable nodes. - Leave the job queued. - -- Add --requeue option to srun and sbatch (these undocumented options have no - effect in slurm v1.2, but are legitimate options in slurm v1.3). - -* Changes in SLURM 1.2.24 -========================= - -- In sched/wiki and sched/wiki2, support non-zero UPDATE_TIME specification - for GETNODES and GETJOBS commands. - -- Bug fix for sending accounting information multiple times for same - info. patch from Hongjia Cao (NUDT). 
- -- BLUEGENE - try FILE pointer rotation logic to avoid core dump on - bridge log rotate - -- Spread out in time the EPILOG_COMPLETE messages from slurmd to slurmctld - to avoid message congestions and retransmission. - -* Changes in SLURM 1.2.23 -========================= - -- Fix for libpmi to not export unneeded variables like xstr* - -- BLUEGENE - added per partition dynamic block creation - -- fix infinite loop bug in sview when there were multiple partitions - -- Send message to srun command when a job is requeued due to node failure. - Note this will be overwritten in the output file unless JobFileAppend - is set in slurm.conf. In slurm version 1.3, srun's --open-mode=append - option will offer this control for each job. - -- Change a node's default TmpDisk from 1MB to 0MB and change job's default - disk space requirement from 1MB to 0MB. - -- In sched/wiki (Maui scheduler) specify a QOS (quality of service) by - specifying an account of the form "qos-name". - -- In select/linear, fix bug in scheduling required nodes that already have - a job running on them (req.load.patch from Chris Holmes, HP). - -- For use with Moab only: change timeout for srun/sbatch --get-user-env - option to 2 secs, don't get DISPLAY environment variables, but explicitly - set ENVIRONMENT=BATCH and HOSTNAME to the execution host of the batch script. - -- Add configuration parameter GetEnvTimeout for use with Moab. See - "man slurm.conf" for details. - -- Modify salloc and sbatch to accept both "--tasks" and "--ntasks" as - equivalent options for compatibility with srun. - -- If a partition's node list contains space separators, replace them with - commas for easier parsing. - -- BLUEGENE - fixed bug in geometry specs when creating a block. - -- Add support for Moab and Maui to start jobs with select/cons_res plugin - and jobs requiring more than one CPU per task. 
- -* Changes in SLURM 1.2.22 -========================= - -- In sched/wiki2, add support for MODIFYJOB option "MINSTARTTIME=<time>" - to modify a job's earliest start time. - -- In sbcast, fix bug with large files and causing sbcast to die. - -- In sched/wiki2, add support for COMMENT= option in STARTJOB and CANCELJOB - commands. - -- Avoid printing negative job run time in squeue due to clock skew. - -- In sched/wiki and sched/wiki2, add support for wiki.conf option - HidePartitionJobs (see man pages for details). - -- Update to srun/sbatch --get-user-env option logic (needed by Moab). - -- In slurmctld (for Moab) added job->details->reserved_resources field - to report resources that were kept in reserve for job while it was - pending. - -- In sched/wiki (for Maui scheduler) report a pending job's node feature - requirements (from Miguel Roa, BSC). - -- Permit a user to change a pending job's TasksPerNode specification - using scontrol (from Miguel Roa, BSC). - -- Add support for node UP/DOWN event logging in jobacct/gold plugin - WARNING: using the jobacct/gold plugin slows the system startup set the - MessageTimeout variable in the slurm.conf to around 20+. - -- Added check at start of slurmctld to look for /tmp/slurm_gold_first if - there, and using the gold plugin slurm will make record of all nodes in - downed or drained state. - -* Changes in SLURM 1.2.21 -========================= - -- Fixed torque wrappers to look in the correct spot for the perl api - -- Do not treat user resetting his time limit to the current value as - an error. - -- Set correct executable names for Totalview when --multi-prog option - is used and more than one node is allocated to the job step. - -- When a batch job gets requeued, record in accounting logs that - the job was cancelled, the requeued job's submit time will be - set to the time of its requeue so it looks like a different job. 
- -- Prevent communication problems if the slurmd/slurmstepd have a - different JobAcct plugin configured than slurmctld. - -- Adding Gold plugin for job accounting - -- In sched/wiki2, add support for MODIFYJOB option "JOBNAME=<name>" - to modify a job's name. - -- Add configuration check for sys/syslog.h and include it as needed. - -- Add --propagate option to sbatch for control over limit propagation. - -- Added Gold interface to the jobacct plugin. To configure in the config - file specify... - JobAcctType=jobacct/gold - JobAcctLogFile=CLUSTER_NAME:GOLD_AUTH_KEY_FILE:GOLDD_HOST:GOLDD_PORT7112 - -- In slurmctld job record, set begin_time to time when all of a job's - dependencies are met. - -* Changes in SLURM 1.2.20 -========================= - -- In switch/federation, fix small memory leak affecting slurmd. - -- Add PMI_FANOUT_OFF_HOST environment variable to control how message - forwarding is done for PMI (MPICH2). See "man srun" for details. - -- From sbatch set SLURM_NTASKS_PER_NODE when --ntasks-per-node option is - specified. - -- BLUEGENE: Documented the prefix should always be lower case and the 3 - digit suffix should be uppercase if any letters are used as digits. - -- In sched/wiki and sched/wiki2, add support for --cpus-per-task option. - From Miguel Ros, BSC. - -- In sched/wiki2, prevent invalid memory pointer (and likely seg fault) - for job associated with a partition that has since been deleted. - -- In sched/wiki2 plus select/cons_res, prevent invalid memory pointer - (and likely seg fault) when a job is requeued. - -- In sched/wiki, add support for job suspend, resume, and modify. - -- In sched/wiki, add support for processor allocation (not just node allocation) - with layout control. - -- Prevent re-sending job termination RPC to a node that has already completed - the job. Only send it to specific nodes which have not reported completion. - -- Support larger environment variables 64K instead of BUFSIZ (8k on some - systems).
- -- If a job is being requeued, job step create requests will print a - warning and repeatedly retry rather than aborting. - -- Add optional mode value to srun and sbatch --get-user-env option. - -- Print error message and retry job submit commands when MaxJobCount - is reached. From Don Albert, Bull. - -- Treat invalid begin time specification as a fatal error in sbatch and - srun. From Don Albert, Bull. - -- Validate begin time specification to avoid hours >24, minutes >59, etc. - -* Changes in SLURM 1.2.19 -========================= -*** NOTE IMPORTANT CHANGE IN RPM BUILD BELOW **** - -- slurm.spec file (used to build RPMs) was updated in order to support Mock, a - chroot build environment. See https://hosted.fedoraproject.org/projects/mock/ - for more information. The following RPMs are no longer built by default: - aix-federation, auth_none, authd, bluegene, sgijob, and switch-elan. Change - the RPMs built using the following options in ~/rpmmacros: "%_with_authd 1", - "%_without_munge 1", etc. See the slurm.spec file for more details. - -- Print warning if non-privileged user requests negative "--nice" value on - job submission (srun, salloc, and sbatch commands). - -- In sched/wiki and sched/wiki2, add support for srun's --ntasks-per-node - option. - -- In select/bluegene with Groups defined for Images, fix possible memory - corruption. Other configurations are not affected. - -- BLUEGENE - Fix bug that prevented user specification of linux-image, - mloader-image, and ramdisk-image on job submission. - -- BLUEGENE - filter Groups specified for image not just by submitting - user's current group, but all groups the user has access to. - -- BLUEGENE - Add salloc options to specify images to be loaded (--blrts-image, - --linux-image, --mloader-image, and --ramdisk-image). - -- BLUEGENE - In bluegene.conf, permit Groups to be comma separated in addition - to colon separators previously supported.
- -- sbatch will accept batch script containing "#SLURM" options and advise - changed to "#SBATCH". - -- If srun --output or --error specification contains a task number rather - than a file name, send stdout/err from specified task to srun's stdout/err - rather than to a file by the same name as the task's number. - -- For srun --multi-prog option, verify configuration file before attempting - to launch tasks, report clear explanation of any configuration file errors. - -- For sched/wiki2, add optional timeout option to srun's --get-user-env - parameter, change default timeout for "su - <user> env" from 3 to 8 seconds. - On timeout, attempt to load env from file at StateSaveLocation/env_cache/<user>. - The format of this file is the same as output of "env" command. If there - is no env cache file, then abort the request. - -- squeue modified for completing job to remove nodes that have already - completed the job before applying node filter logic. - -- squeue formatted output option added for job comment, "%q" (the obvious - choices for letters are already in use). - -- Added configure option --enable-load-env-no-login for use with Moab. If - set then the user job runs with the environment built without a login - ("su <user> env" rather than "su - <user> env"). - -- Fix output of "srun -o %C" (allocated CPU count) for running jobs. This was - broken in 1.2.18 for handling requeue of Moab jobs. - -- Added logic to mpiexec wrapper to read in the MPIEXEC_TIMEOUT var - -- Updated qstat wrapper to display information for partitions (-Q) option - -- NOTE: SLURM should now work directly with Globus using the PBS GRAM. - -* Changes in SLURM 1.2.18 -========================= - -- BLUEGENE - bug fix for smap stating passthroughs are used when they aren't - -- Fixed bug in sview to be able to edit partitions correctly - -- Fixed bug so in slurm.conf files where SlurmdPort isn't defined things - work correctly. 
- -- In sched/wiki2 and sched/wiki add support for batch job being requeued - in Slurm either when nodes fail or upon request. - -- In sched/wiki2 and sched/wiki with FastSchedule=2 configured and nodes - configured with more CPUs than actually exist, return a value of TASKS - equal to the number of configured CPUs that are allocated to a job rather - than the number of physical CPUs allocated. - -- For sched/wiki2, timeout "srun --get-user-env ..." command after 3 seconds - if unable to perform pseudo-login and get user environment variables. - -- Add contribs/time_login.c program to test how long pseudo-login takes - for specific users or all users. This can identify users for which Moab - job submissions are unable to set the proper environment variables. - -- Fix problem in parallel make of Slurm. - -- Fixed bug in consumable resources when CR_Core_Memory is enabled - -- Add delay in slurmctld for "scontrol shutdown" RPC to get propagated - to slurmd daemons. - -* Changes in SLURM 1.2.17 -========================= - -- In select/cons_res properly release resources allocated to job being - suspended (rmbreak.patch, from Chris Holmes, HP). - -- Fix AIX linking problem for PMI (mpich2) support. - -- Improve PMI logic for greater scalability (up to 16k tasks run). - -- Add srun support for SLURM_THREADS and PMI_FANOUT environment variables. - -- Fix support in squeue for output format with left justification of - reason (%r) and reason/node_list (%R) output. - -- Automatically requeue a batch job when a node allocated to it fails - or the prolog fails (unless --no-requeue or --no-kill option used). - -- In sched/wiki, enable use of wiki.conf parameter ExcludePartitions to - directly schedule selected partitions without Maui control. - -- In sched/backfill, if a job requires specific nodes, schedule other jobs - ahead of it rather than completely stopping backfill scheduling for that - partition. 
- -- BLUEGENE - corrected logic making block allocation work in a circular - fashion instead of linear. - -* Changes in SLURM 1.2.16 -========================= - -- Add --overcommit option to the salloc command. - -- Run task epilog from job's working directory rather than directory - where slurmd daemon started from. - -- Log errors running task prolog or task epilog to srun's output. - -- In sched/wiki2, fix bug processing condensed hostlist expressions. - -- Release contribs/mpich1.slurm.patch without GPL license. - -- Fix bug in mvapich plugin for read/write calls that return EAGAIN. - -- Don't start MVAPICH timeout logic until we know that srun is starting - an MVAPICH program. - -- Fix to srun only allocating number of nodes needed for requested task - count when combining allocation and step creation in srun. - -- Execute task-prolog within proctrack container to insure that all - child processes get terminated. - -- Fixed job accounting to work with sgi_job proctrack plugin. - -* Changes in SLURM 1.2.15 -========================= - -- In sched/wiki2, fix bug processing hostlist expressions where hosts - lack a numeric suffix. - -- Fix bug in srun. When user did not specify time limit, it defaulted to - INFINITE rather than partition's limit. - -- In select/cons_res with SelectTypeParameters=CR_Socket_Memory, fix bug in - memory allocation tracking, mem.patch from Chris Holmes, HP. - -- Add --overcommit option to the sbatch command. - -* Changes in SLURM 1.2.14 -========================= - -- Fix a couple of bugs in MPICH/MX support (from Asier Roa, BSC). - -- Fix perl api for AIX - -- Add wiki.conf parameter ExcludePartitions for selected partitions to - be directly scheduled by Slurm without Moab control - -- Optimize load leveling for shared nodes (alloc.patch, contributed - by Chris Holmes, HP). - -- Added PMI_TIME environment variable for user to control how PMI - communications are spread out in time. See "man srun" for details.
- -- Added PMI timing information to srun debug mode to aid in tuning. - Use "srun -vv ..." to see the information. - -- Added checkpoint/ompi (OpenMPI) plugin (still under development). - -- Fix bug in load leveling logic added to v1.2.13 which can cause an - infinite loop and hang slurmctld when sharing nodes between jobs. - -- Added support for sbatch to read in #PBS options from a script - -* Changes in SLURM 1.2.13 -========================= - -- Add slurm.conf parameter JobFileAppend. - -- Fix for segv in "scontrol listpids" on nodes not in SLURM config. - -- Add support for SCANCEL_CTLD env var. - -- In mpi/mvapich plugin, add startup timeout logic. Time based upon - SLURM_MVAPICH_TIMEOUT (value in seconds). - -- Fixed pick_step_node logic to only pick the number of nodes requested - from the user when excluding nodes, to avoid an error message. - -- Disable salloc, sbatch and srun -I/--immediate options with - Moab scheduler. - -- Added "contribs" directory with a Perl API and Torque wrappers for Torque - to SLURM migration. This directory should be used to put anything that - is outside of SLURM proper such as a different API. Perl APIs contributed - by Hongjia Cao (NUDT). - -- In sched/wiki2: add support for tasklist with node name expressions - and task counts (e.g. TASKLIST=tux[1-4]*2:tux[12-14]*4"). - -- In select/cons_res with sched/wiki2: fix bug in task layout logic. - -- Removed all curses info from the bluegene plugin putting it into smap - where it belongs. - -- Add support for job time limit specification formats: min, min:sec, - hour:min:sec, and days-hour:min:sec (formerly only supported minutes). - Applies to salloc, sbatch, and srun commands. - -- Improve scheduling support for exclusive constraint list, nodes can - now be in more than one constraint specific exclusively for a job - (e.g. 
"srun -C [rack1|rack2|rack3|rowB] srun") - -- Create separate MPICH/MX plugin (split out from MPICH/GM plugin) - -- Increase default MessageTimeout (in slurm.conf) from 5 to 10 secs. - -- Fix bug in batch job requeue if node zero of allocation fails to respond - to task launch request. - -- Improve load leveling logic to more evenly distribute the workload - (best_load.patch, contributed by Chris Holmes, HP). - -* Changes in SLURM 1.2.12 -========================= - -- Increase maximum message size from 1MB to 16MB (from Ernest Artiaga, BSC). - -- In PMI_Abort(), log the event and abort the entire job step. - -- Add support for additional PMI functions: PMI_Get_clique_ranks and - PMI_Get_clique_size (from Chuck Clouston, Bull). - -- Report an error when a hostlist comes in appearing to be a box but not - formatted in XYZxXYZ format. - -- Add support for partition configuration "Shared=exclusive". This is - equivalent to "srun --exclusive" when select/cons_res is configured. - -- In sched/wiki2, report the reason for a node being unavailable for the - GETNODES command using the CAT="<reason>" field. - -- In sched/wiki2 with select/linear, duplicate hostnames in HOSTLIST, one - per allocated processor. - -- Fix bug in scancel with specific signal and job lacks active steps. - -- In sched/wiki2, add support for NOTIFYJOB ARG=<jobid> MSG=<message>. - This sends a message to an active srun command. - -- salloc will now set SLURM_NPROCS to improve srun's behavior under salloc. - -- In sched/wiki2 and select/cons_res: insure that Slurm's CPU allocation - is identical to Moab's (from Ernest Artiaga and Asier Roa, BSC). - -- Added "scontrol show slurmd" command to status local slurmd daemon. - -- Set node DOWN if prolog fails on node zero of batch job launch. - -- Properly handle "srun --cpus-per-task" within a job allocation when - SLURM_TASKS_PER_NODE environment varable is not set. 
- -- Fixed return of slurm_send_rc_msg if msg->conn_fd is < 0 set errno ENOTCONN - and return SLURM_ERROR instead of return ENOTCONN - -- Added read before we send anything down a socket to make sure the socket - is still there. - -- Add slurm.conf variables UnkillableStepProgram and UnkillableStepTimeout. - -- Enable nice file propagation from sbatch command. - -* Changes in SLURM 1.2.11 -========================= - -- Updated "etc/mpich1.slurm.patch" for direct srun launch of MPICH1_P4 - tasks. See the "README" portion of the patch for details. - -- Added new scontrol command "show hostlist <hostnames>" to translate a list - of hostnames into a hostlist expression (e.g. "tux1,tux2" -> "tux[1-2]") - and "show hostnames <list>", returns a list of nodes (one node per line) - from SLURM hostlist expression or from SLURM_NODELIST environment variable - if no hostlist specified. - -- Add the sbatch option "--wrap". - -- Add the sbatch option "--get-user-env". - -- Added support for mpich-mx (use the mpichgm plugin). - -- Make job's stdout and stderr file access rights be based upon user's umask - at job submit time. - -- Add support for additional PMI functions: PMI_Parse_option, - PMI_Args_to_keyval, PMI_Free_keyvals and PMI_Get_options (from Puenlap Lee - and Nancy Kritkausky, Bull). - -- Make default value of SchedulerPort (configuration parameter) be 7321. - -- Use SLURM_UMASK environment variable (if set) at job submit time as umask - for spawned job. - -- Correct some format issues in the man pages (from Gennero Oliva, ICAR). - -- Added support for parallel make across an existing SLURM allocation - based upon GNU make-3.81. Patch is in "etc/make.slurm.patch". - -- Added '-b' option to sbatch for easy MOAB transition to sbatch instead of - srun. Option does nothing in sbatch.
- -- Changed wiki2's handling of a node state in Completing to return 'busy' - instead of 'running' which matches slurm version 1.1 - -* Changes in SLURM 1.2.10 -========================= - -- Fix race condition in jobacct/linux with use of proctrack/pgid and a - realloc issue inside proctrack/linux - -- Added MPICH1_P4 plugin for direct launch of mpich1/p4 tasks using srun - and a patched version of the mpi library. See "etc/mpich1.slurm.patch". - NOTE: This is still under development and not ready for production use. - -* Changes in SLURM 1.2.9 -======================== - -- Add new sinfo sort field "%E", which sorts by the time associated with a - node's state (from Prashanth Tamraparni, HP). - -- In sched/wiki: fix logic for restarting backup slurmctld. - -- Preload SLURM plugins early in the slurmstepd operation to avoid - multiple dlopens after forking (and to avoid a glibc bug - that leaves dlopen locks in a bad state after a fork). - -- Added MPICH1_P4 patch to launch tasks using srun rather than rsh and - automatically generate mpirun's machinefile based upon the job's - allocation. See "etc/mpich1.slurm.patch". - -- BLUEGENE - fix for overlap mode to mark all other base partitions as used - when creating a new block from the file to insure we only use the base - partitions we are asking for. - -* Changes in SLURM 1.2.8 -======================== - -- Added mpi/mpich1_shmem plugin. - -- Fix in proctrack/sgi_job plugin that could cause slurmstepd to seg_fault - preventing timely clean-up of batch jobs in some cases. - -* Changes in SLURM 1.2.7 -======================== - -- BLUEGENE - code to make it so you can make a 36x36x36 system. - The wiring should be correct for a system with x-dim of 1,2,4,5,8,13 - in emulation mode. It will work with any real system no matter the size. - -- Major re-write of jobcomp/script plugin: fix memory leak and - general code clean-up. - -- Add ability to change MaxNodes and ExcNodeList for pending job - using scontrol.
- -- Purge zombie processes spawned via event triggers. - -- Add support for power saving mode (experimental code to reduce voltage - and frequency on nodes that stay in the IDLE state, for more information - see http://www.llnl.gov/linux/slurm/power_save.html). None of this - code is enabled by default. - -* Changes in SLURM 1.2.6 -======================== - -- Fix MPIRUN_PORT env variable in mvapich plugin - -- Disable setting triggers by other than user SlurmUser unless SlurmUser - is root for improved security. - -- Add event trigger for IDLE nodes. - -* Changes in SLURM 1.2.5 -======================== - -- Fix nodelist truncation in "scontrol show jobs" output - -- In mpi/mpichgm, fix potential problem formatting GMPI_PORT, from - Ernest Artiaga, BSC. - -- In sched/wiki2 - Report job's account, from Ernest Artiaga, BSC. - -- Add sbatch option "--ntasks-per-node". - -* Changes in SLURM 1.2.4 -======================== - -- In select/cons_res - fix for function argument type mis-match in getting - CPU count for a job, from Ernest Artiaga, BSC. - -- In sched/wiki2 - Report job's tasks_per_node requirement. - -- In forward logic fix to check if the forwarding node recieves a connection - but doesn't ever get the message from the sender (network issue or - something) also check to make sure if we get something back we make sure - we account for everything we sent out before we call it good. - -- Another fix to make sure steps with requested nodes have correct cpus - accounted for and a fix to make sure the user can't allocate more - cpus than the have requested. - -* Changes in SLURM 1.2.3 -======================== - -- Cpuset logic added to task/affinity, from Don Albert (Bull) and - Moe Jette (LLNL). The /dev/cpuset file system must be mounted and - set "TaskPluginParam=cpusets" in slurm.conf to enable. - -- In sched/wiki2, fix possible overflow in job's nodelist, from - Ernest Artiaga, BSC. - -- Defer creation of new job steps until a suspended job is resumed. 
- -- In select/linear - fix for potential stack corruption bug. - -* Changes in SLURM 1.2.2 -======================== - -- Added new command "strigger" for event trigger management, a new - capability. See "man strigger" for details. - -- srun --get-user-env now sends su's stderr to /dev/null - -- Fix in node_scheduling logic with multiple node_sets, from - Ernest Artiaga, BSC. - -- In select/cons_res, fix for function argument type mis-match in getting - CPU count for a job. - -* Changes in SLURM 1.2.1 -======================== - -- MPICHGM support bug fixes from Ernest Artiaga, BSC. - -- Support longer hostlist strings, from Ernest Artiaga, BSC. - -* Changes in SLURM 1.2.0 -======================== - -- Srun to use env vars for SLURM_PROLOG, SLURM_EPILOG, SLURM_TASK_PROLOG, - and SLURM_TASK_EPILOG. patch.1.2.0-pre11.070201.envproepilog from - Dan Palermo, HP. - -- Documenation update. patch.1.2.0-pre11.070201.mchtml from Dan Palermo, HP. - -- Set SLURM_DIST_CYCLIC = 1 (needed for HP MPI, slurm.hp.env.patch). - -* Changes in SLURM 1.2.0-pre15 -============================== - -- Fix for another spot where the backup controller calls switch/federation - code before switch/federation is initialized. - -* Changes in SLURM 1.2.0-pre14 -============================== - -- In sched/wiki2, clear required nodes list when a job is requeued. - Note that the required node list is set to every node used when - a job is started via sched/wiki2. - -- BLUEGENE - Added display of deallocating blocks to smap and other tools. - -- Make slurmctld's working directory be same as SlurmctldLogFile (if any), - otherwise StateSaveDir (which is likely a shared directory, possibly - making core file identification more difficult). - -- Fix bug in switch/federation that results in the backup controller - aborting if it receives an epilog-complete message. - -* Changes in SLURM 1.2.0-pre13 -============================== - -- Fix for --get-user-env. 
- -* Changes in SLURM 1.2.0-pre12 -============================== - -- BLUEGENE - Added correct node info for sinfo and sview for viewing - allocated nodes in a partition. - -- BLUEGENE - Added state save on slurmctld shutdown of blocks in an error - state on real systems and total block config on emulation systems. - -- Major update to Slurm's PMI internal logic for better scalability. - Communications now supported directly between application tasks via - Slurm's PMI library. Srun sends single message to one task on each node - and that tasks forwards key-pairs to other tasks on that nodes. The old - code sent key-pairs directly to each task. - NOTE: PMI applications must re-link with this new library. - -- For multi-core support: Fix task distribution bug and add automated - tests, patch.1.2.0-pre11.070111.plane from Dan Palermo (HP). - -* Changes in SLURM 1.2.0-pre11 -============================== - -- Add multi-core options to slurm_step_launch API. - -- Add man pages for slurm_step_launch() and related functions. - -- Jobacct plugin only looks at the proctrack list instead of the entire - list of processes running on the node. Cutting down a lot of unnecessary - file opens in linux and cutting down the time to query the procs by - more than half. - -- Multi-core bug fix, mask re-use with multiple job steps, - patch.1.2.0-pre10.061214.affinity_stepid from Dan Palermo (HP). - -- Modify jobacct/linux plugin to completely eliminate open /proc files. - -- Added slurm_sched_plugin_reconfig() function to re-read config files. - -- BLUEGENE - --reboot option to srun, salloc, and sbatch actually works. - -- Modified step context and step launch APIs. - -* Changes in SLURM 1.2.0-pre10 -============================== - -- Fix for sinfo node state counts by state (%A and %F output options). - -- Add ability to change a node's features via "scontrol update". NOTE: - Update slurm.conf also to preserve changes over slurmctld restart or - reconfig. 
- NOTE: Job and node state information can not be preserved from earlier - versions. - -- Added new slurm.conf parameter TaskPluginParam. - -- Fix for job requeue and credential revoke logic from Hongjia Cao (NUDT). - -- Fix for incorrectly generated masks for task/affinity plugin, - patch.1.2.0-pre9.061207.bitfmthex from Dan Palermo (HP). - -- Make mask_cpu options of srun and slaunch commands not requeue prefix - of "0x". patch.1.2.0-pre9.061208.srun_maskparse from Dan Palermo (HP). - -- Add -c support to the -B automatic mask generation for multi-core - support, patch.1.2.0-pre9.061208.mcore_cpuspertask from Dan Palermo (HP). - -- Fix bug in MASK_CPU calculation, - patch.1.2.0-pre9.061211.avail_cpuspertask from Dan Palermo (HP). - -- BLUEGENE - Added --reboot option to srun, salloc, and sbatch commands. - -- Add "scontrol listpids [JOBID[.STEPID]]" support. - -- Multi-core support patches, fixed SEGV and clean up output for large - task counts, patch.1.2.0-pre9.061212.cpubind_verbose from Dan Palermo (HP). - -- Make sure jobacct plugin files are closed before exec of user tasks to - prevent problems with job checkpoint/restart (based on work by - Hongjia Cao, NUDT). - -* Changes in SLURM 1.2.0-pre9 -============================= - -- Fix for select/cons_res state preservation over slurmctld restart, - patch.1.2.0-pre7.061130.cr_state from Dan Palermo. - -- Validate product of socket*core*thread count on node registration rather - than individual values. Correct values will need to be specified in slurm.conf - with FastSchedule=1 for correct multi-core scheduling behavior. - -* Changes in SLURM 1.2.0-pre8 -============================= - -- Modity job state "reason" field to report why a job failed (previously - previously reported only reason waiting to run). Requires cold-start of - slurmctld (-c option). - -- For sched/wiki2 job state request, return REJMESSAGE= with reason for - a job's failure. 
- -- New FastSchedule configuration parameter option "2" means to base - scheduling decisions upon the node's configuration as specified in - slurm.conf and ignore the node's actual hardware configuration. This - can be useful for testing. - -- Add sinfo output format option "%C" for CPUs (active/idle/other/total). - Based upon work by Anne-Marie Wunderlin (BULL). - -- Assorted multi-core bug fixes (patch1.2.0-pre7.061128.mcorefixes). - -- Report SelectTypeParameters from "scontrol show config". - -- Build sched/wiki plugin for Maui Scheduler (based upon new sched/wiki2 - code for Moab Scheduler). - -- BLUEGENE - changed way of keeping track of smaller partitions using - ionode range instead of quarter nodecard notation. - (i.e. bgl000[0-3] instead of bgl000.0.0) - -- Patch from Hongjia Cao (EINPROGRESS error message change) - -- Fix for correct requid for jobacct plugin - -- Added subsec timing display for sacct - -* Changes in SLURM 1.2.0-pre7 -============================= - -- BLUEGENE - added configurable images for bluegene block creation. - -- Plug a bunch of memory leaks. - -- Support processors, core, and physical IDs that are not in numeric - order (in slurmd to gathering node state information, based on patch - by Don Albert, Bull). - -- Fixed bug with aix not looking in the correct dir for the proctrack - include files - -- Removed global_srun.* from common merged it into srun proper - -- Added bluegene section to troubleshooting guide (web page). - -- NOTE: Requires cold-start when moving from 1.2.0-pre6, save state - info for jobs changed. - -- BLUEGENE - Changed logic for wiring bgl blocks to be more maintainable. - (Haven't tested on large system yet, works on 2 base partition system) - -- Do not read the select/cons_res state save file if slurmctld is - cold-started (with the "-c" option). - -* Changes in SLURM 1.2.0-pre6 -============================= - -- Maintain actually job step run time with suspend/resume use. 
- -- Allow slurm.conf options to appear multiple times. SLURM will use the - last instance of any particular option. - -- Add version number to node state save file. Will not recover node - state information on restart from older version. - -- Add logic to save/restore multi-core state information. - -- Updated multi-core logic to use types uint16_t and uint32_t instead - of just type int. - -- Race condition for forwarding logic fix from Hongjia Cao - -- Add support for Portable Linux Processor Affinity (PLPA, see - http://www.open-mpi.org/software/plpa). - -- When a job epilog completes on all non-DOWN nodes, immediately purge - it's job steps that lack switch windows. Needed for LSF operation. - Based upon slurm.hp.node_fail.patch. - -- Modify srun to ignore entries on --nodelist for job step creation - if their count exceeds the task count. Based on slurm.hp.srun.patch. - -* Changes in SLURM 1.2.0-pre5 -============================= - -- Patch from HP patch.1.2.0.pre4.061017.crcore_hints, supports cores as - consumable resource. - -* Changes in SLURM 1.2.0-pre4 -============================= - -- Added node_inx to job_step_info_t to get the node indecies for mapping out - steps in a job by nodes. - -- sview grid added - -- BLUEGENE node_inx added to blocks for reference. - -- Automatic CPU_MASK generation for task launch, new srun option -B. - -- Automatic logical to physical processor identification and mapping. - -- Added new srun options to --cpu_bind: sockets, cores, and threads - -- Updated select/cons_res to operate as socket granularity. - -- New srun task distribution options to -m: plane - -- Multi-core support in sinfo, squeue, and scontrol. - -- Memory can be treated as a consumable resource. - -- New srun options --ntasks-per-[node|socket|core]. - -* Changes in SLURM 1.2.0-pre3 -============================= - -- Remove configuration parameter ShedulerAuth (defunct). - -- Add NextJobId to "scontrol show config" output. 
- -- Add new slurm.conf parameter MailProg. - -- New forwarding logic. New recieve_msg functions depending on what you - are expecting to get back. No srun_node_id anymore passed around in - a slurm_msg_t - -- Remove sched/wiki plugin (use sched/wiki2 for now) - -- Disable pthread_create() for PMI_send when TotalView is running for - better performance. - -- Fixed certain tests in test suite to not run with bluegene or front-end - systems - -- Removed addresses from slurm_step_layout_t - -- Added new job field, "comment". Set by srun, salloc and sbatch. See - with "scontrol show job". Used in sched/wiki2. - -- Report a job's exit status in "scontrol show job". - -- In sched/wiki2: add support for JOBREQUEUE command. - -* Changes in SLURM 1.2.0-pre2 -============================= - -- Added function slurm_init_slurm_msg to be used to init any slurm_msg_t - you no longer need do any other type of initialization to the type. - -* Changes in SLURM 1.2.0-pre2 -============================= - -- Fixed task dist to work with hostfile and warn about asking for more tasks - than you have nodes for in arbitray mode. - -- Added "account" field to job and step accounting information and sacct output. - -- Moved task layout to slurmctld instead of srun. Job step create returns - step_layout structure with hostnames and addresses that corrisponds - to those nodes. - -- Changed api slurm_lookup_allocation params, - resource_allocation_response_msg_t changed to job_alloc_info_response_msg_t - this structure is being renamed so contents are the same. - -- alter resource_allocation_response_msg_t see slurm.h.in - -- remove old_job_alloc_msg_t and function slurm_confirm_alloc - -- Slurm configuration files now support an "Include" directive to - include other files inline. - -- BLUEGENE New --enable-bluegene-emulation configure parameter to allow - running system in bluegene emulation mode. Only - really useful for developers. 
- -- New added new tool sview GUI for displaying slurm info. - -- fixed bug in step layout to lay out tasks correctly - -* Changes in SLURM 1.2.0-pre1 -============================= - -- Fix bug that could run a job's prolog more than once - -- Permit batch jobs to be requeued, scontrol requeue <jobid> - -- Send overcommit flag from srun in RPCs and have slurmd set SLURM_OVERCOMMIT - flag at batch job launch time. - -- Added new configuration parameter MessageTimeout (replaces #define in - the code) - -- Added support for OSX build. - -* Changes in SLURM 1.1.37 -========================= - - In sched/wiki2: Add NAME to job record. - - Changed -w (--nodelist) option to only read in number of nodes specified - by -N option unless nprocs was set and in Arbitrary layout mode. - - Added some loops around pthread creates incase they fail and also fixed an - issue in srun to fail job has failed instead of waiting around for threads - that will never end. - - Added fork handlers in the slurmstepd - - In sched/wiki2: fix logic for restarting backup slurmctld. - - In sched/wiki2: if job has no time limit specified, return the partition's - time limit (which is the default for the job) rather than 365 days. - -* Changes in SLURM 1.1.36 -========================= - - Permit node state specification of DRAIN in slurm.conf. - - In jobcomp/script - fix bug that prevented UID and JOBID environment - variables from being set. - -* Changes in SLURM 1.1.35 -========================= - - In sched/wiki2: Add support for CMD=SIGNALJOB to accept option - of VALUE=SIGXXX in addition to VALUE=# and VALUE=XXX options. - - In sched/wiki2: Add support for CMD=MODIFYJOB to accept option of - DEPEND=afterany:<jobid>, specify jobid=0 to clear. - - Correct logic for job allocation with task count (srun -n ...) AND - FastSchedule=0 AND low CPUs count in Slurm's node configuration. 
- - Add new and undocumented scancel option, --ctld, to route signal - requests through slurmctld rather than directly to slurmd daemons. - Useful for testing purposes. - - Fixed issue with hostfile support not working in a job step. - - Set supplemental groups for SlurmUser in slurmctld daemon, from - Anne Marie Wunderlin, Bull. - - In jobcomp/script: Add ACCOUNT and PROCS (count) to environment - variables set. Fix bug that prevented UID and JOBID from being - overwritten. - -* Changes in SLURM 1.1.34 -========================= - - Insure that slurm_signal_job_step() is defined in srun for mvapich - and mpichgm error conditions. - - Modify /etc/init.d/slurm restart command to wait for daemon to terminate - before starting a new one - - Permit job steps to be started on draining nodes that have already - been allocated to that job. - - Prevent backup slurmctld from purging pending batch job scripts when a - SIGHUP is received. - - BLUEGENE - check to make sure set_block_user works when the block - is in a ready state. - - Fix to slurmstepd to not use local variables in a pthread create. - - In sched/wiki2 - add wiki.conf parameter HostFormat specifying - format of hostlists exchanged between Slurm and Moab (experimental). - - mpi/mvapich: Support Adam Moody's fast MPI initialization protocol - (MVAPICH protocol version 8). - -* Changes in SLURM 1.1.33 -========================= - - sched/wiki2 - Do not wait for job completion before permitting - additional jobs to be scheduled. - - Add srun SLURM_EXCLUSIVE environment variable support, from - Gilles Civario (Bull). - - sched/wiki2 - Report job's node sharing options. - - sched/wiki2 - If SchedulerPort is in use, retry opening it indefinitely. - - sched/wiki2 - Add support for changing the size of a pending job. - - BLUEGENE - Fix to correctly look at downed/drained nodes with picking - a block to run a job and not confuse it with another running job. 
- -* Changes in SLURM 1.1.32 -========================= - - If a job's stdout/err file names are unusable (bad path), use the - default names. - - sched/wiki2 - Fix logic to be compatible with select/cons_res plugin - for allocating individual processors within nodes. - - Fix job end time calculation when changed from an initial value of - INFINITE. - -* Changes in SLURM 1.1.31 -========================= - - Correctly identify a user's login shell when running "srun -b --uid" - as root. Use the --uid field for the /etc/passwd lookup instead of - getuid(). - -* Changes in SLURM 1.1.30 -========================= - - Fix to make sure users don't include and exclude the same node in - their srun line. - - mpi/mvapich: Forcibly terminate job 60s after first MPI_Abort() - to avoid waiting indefinitely for hung processes. - - proctrack/sgi_job: Fix segv when destroying an active job container - with processes still running. - - Abort a job's stdout/err to srun if not processed within 5 minutes - (prevents node hanging in completing state if the srun is stopped). - -* Changes in SLURM 1.1.29 -========================= - - Fix bug which could leave orphan process put into background from - batch script. - -* Changes in SLURM 1.1.28 -========================= - - BLUEGENE - Fixed issue with nodes that return to service outside of an - admin state is now updated in the bluegene plugin. - - Fix for --get-user-env parsing of non-printing characters in users' logins. - - Restore "squeue -n localhost" support. - - Report lack of PATH env var as verbose message, not error in srun. - -* Changes in SLURM 1.1.27 -========================= - - Fix possible race condition for two simultaneous "scontrol show config" - calls resulting in slurm_xfree() Error: from read_config.c:642 - - BLUEGENE - Put back logic to make a block fail a boot 3 times before - cancelling a users job. - - Fix problem using srun --exclude option for a job step. 
- - Fix problem generating slurmd error "Unrecognized request: 0" with - some compilers. - -* Changes in SLURM 1.1.26 -========================= - - In sched/wiki2, fixes for support of job features. - - In sched/wiki2, add "FLAGS=INTERACTIVE;" to GETJOBS response for - non-batch (not srun --batch) jobs. - -* Changes in SLURM 1.1.25 -========================= - - switch/elan: Fix for "Failed to initialise stats structure" from - libelan when ELAN_STATKEY > MAX_INT. - - Tune PMI support logic for better scalability and performance. - - Fix for running a task on each node of an allocation if not specified. - - In sched/wiki2, set TASKLIST for running jobs. - - In sched/wiki2, set STARTDATE for pending jobs with deferred start. - - Added srun --get-user-env option (for Moab scheduler). - -* Changes in SLURM 1.1.24 -========================= - - In sched/wiki2, add support for direct "srun --dependency=" use. - - mpi/mvapich: Add support for MVAPICH protocol version 6. - - In sched/wiki2, change "JOBMODIFY" command to "MODIFYJOB". - - In sched/wiki2, change "JOBREQUEUE" command to "REQUEUEJOB". - - For sched/wiki2, permit normal user to specify arbitrary job id. - - In sched/wiki2, set buffer pointer to NULL after free() to avoid - possible memory corruption. - - In sched/wiki2, report a job's exit code on completion. - - For AIX, fix mail for job event notification. - - Add documentation for propagation options in man srun and slurm.conf. - -* Changes in SLURM 1.1.23 -========================= - - Fix bug in non-blocking connect() code affecting AIX. - -* Changes in SLURM 1.1.22 -========================= - - Add squeue option to print a job step's task count (-o %A). - - Initialize forward_struct to avoid trying to free a bad pointer, - patch from Anton Blanchard (SAMBA). - - In sched/wiki2, fix fatal race condition on slurmctld startup. - - Fix for displaying launching verbose messages for each node under the - tree instead of just the head one. 
- - Fix job suspend bug, job accounting plugin would SEGV when given a - bad job ID. - -* Changes in SLURM 1.1.21 -========================= - - BLUEGENE - Wait on a fini to make sure all threads are finished before - cleaning up. - - BLUEGENE - replacements to not destroy lists but just empty it to avoid - losing the pointer to the list in the block allocator. - - BLUEGENE - added --enable-bluegene-emulation configure option to 1.1 - - In sched/wiki2, enclose a job's COMMENT value in double quotes. - - In sched/wiki2, support newly defined SIGNALJOB command. - - In sched/wiki2, maintain open event socket, don't open and close - for each event. - - In sched/wiki2, fix for scalability problem starting large jobs. - - Fix logic to execute a batch job step (under an existing resource - allocation) as needed by LSF. - - Patches from Hongjia Cao (pmi finialize issues and type declaration) - - Delete pending job if it's associated partition is deleted. - - fix for handling batch steps completing correctly and setting the - return code. - - Altered ncurses check to make sure programs can link before saying we - have a working curses lib and header. - - Fixed an init issue with forward_struct_init not being set correctly in - a few locations in the slurmd. - - Fix for user to use the NodeHostname (when specified in the slurm.conf file) - to start jobs on. - -* Changes in SLURM 1.1.20 -========================= - - Added new SPANK plugin hook slurm_spank_local_user_init() called - from srun after node allocation. - - Fixed bug with hostfile support not working on a direct srun - -* Changes in SLURM 1.1.19 -========================= - - BLUEGENE - make sure the order of blocks read in from the bluegene.conf - are created in that order (static mode). - - Fix logic in connect(), slurmctld fail-over was broken in v1.1.18. - - Fix logic to calculate the correct timeout for fan out. 
- -* Changes in SLURM 1.1.18 -========================= - - In sched/wiki2, add support for EHost and EHostBackup configuration - parameters in wiki.conf file - - In sched/wiki2, fix memory management bug for JOBWILLRUN command. - - In sched/wiki2, consider job Busy while in Completing state for - KillWait+10 seconds (used to be 30 seconds). - - BLUEGENE - Fixes to allow full block creation on the system and not to add - passthrough nodes to the allocation when creating a block. - - BLUEGENE - Fix deadlock issue with starting and failing jobs at the same - time - - Make connect() non-blocking and poll() with timeout to avoid huge - waits under some conditions. - - Set "ENVIRONMENT=BATCH" environment variable for "srun --batch" jobs only. - - Add logic to save/restore select/cons_res state information. - - BLUEGENE - make all sprintf's into snprintf's - - Fix for "srun -A" segfault on a node failure. - -* Changes in SLURM 1.1.17 -========================= - - BLUEGENE - fix to make dynamic partitioning not go create block where - there are nodes that are down or draining. - - Fix srun's default node count with an existing allocation when neither - SLURM_NNODES nor -N are set. - - Stop srun from setting SLURM_DISTRIBUTION under job steps when a - specific was not explicitly requested by the user. - -* Changes in SLURM 1.1.16 -========================= - - BLUEGENE - fix to make prolog run 5 minutes longer to make sure we have - enough time to free the overlapping blocks when starting a new job on a - block. - - BLUEGENE - edit to the libsched_if.so to read env and look at - MPIRUN_PARTITION to see if we are in slurm or running mpirun natively. - - Plugins are now dlopened RTLD_LAZY instead of RTLD_NOW. - -* Changes in SLURM 1.1.15 -========================= - - BLUEGENE - fix to be able to create static partitions - - Fixed fanout timeout logic. - - Fix for slurmctld timeout on outgoing message (Hongjia Cao, NUDT.edu.cn). 
- -* Changes in SLURM 1.1.14 -========================= - - In sched/wiki2: report job/node id and state only if no changes since - time specified in request. - - In sched/wiki2: include a job's exit code in job state information. - - In sched/wiki2: add event notification logic on job submit and completion. - - In sched/wiki2: add support for JOBWILLRUN command type. - - In sched/wiki2: for job info, include required HOSTLIST if applicable. - - In sched/wiki2: for job info, replace PARTITIONMASK with RCLASS (report - partition name associated with a job, but no task count) - - In sched/wiki2: for job and node info, report all data if TS==0, - volitile data if TS<=update_time, state only if TS>update_time - - In sched/wiki2: add support for CMD=JOBSIGNAL ARG=jobid SIGNAL=name or # - - In sched/wiki2: add support for CMD=JOBMODIFY ARG=jobid [BANK=name] - [TIMELIMIT=minutes] [PARTITION=name] - - In sched/wiki2: add support for CMD=INITIALIZE ARG=[USEHOSTEXP=T|F] - [EPORT=#]; RESPONSE=EPORT=# USEHOSTEXP=T - - In sched/wiki2: fix memory leak. - - Fix sinfo node state filtering when asking for idle nodes that are also - draining. - - Add Fortran extension to slurm_get_rem_time() API. - - Fix bug when changing the time limit of a running job that has previously - been suspended (formerly failed to account for suspend time in setting - termination time). - - fix for step allocation to be able to specify only a few nodes in a - step and ask for more that specified. - - patch from Hongjia Cao for forwarding logic - - BLUEGENE - able to allocate specific nodes without locking up. - - BLUEGENE - better tracking of blocks that are created dynamically, - less hitting the db2. - -* Changes in SLURM 1.1.13 -========================= - - Fix hang in sched/wiki2 if Moab stops responding responding when - response is outgoing. - - BLUEGENE - fix to make sure the block is good to go when picking it - - BLUEGENE - add libsched_if.so so mpirun doesn't try to create a block - by itself. 
- - Enable specification of srun --jobid=# option with --batch (for user root). - - Verify that job actually starts when requested by sched/wiki2. - - Add new wiki.conf parameters: EPort and JobAggregationTime for event - notification logic (see wiki.conf man page for details) - -* Changes in SLURM 1.1.12 -========================= - - Sched/wiki2 to report a job's account as COMMENT response to GETJOBS - request. - - Add srun option "--comment" (maps to job account until slurm v1.2, - needed for Moab scheduler functionality). - - fixed some timeout issues in the controller hopefully stopping all the - issues with excessive timeouts. - - unit conversion (i.e. 1024 => 1k) only happens on bgl systems for node - count. - - Sched/wiki2 to report a job's COMPETETIME and SUSPENDTIME in GETJOBS - response. - - Added support for Mellanox's version of mvapich-0.9.7. - -* Changes in SLURM 1.1.11 -========================= - - Update file headers adding permission to link with OpenSSL. - - Enable sched/wiki2 message authentication. - - Fix libpmi compilation issue. - - Remove "gcc-c++ python" from slurm.spec BuildRequires. It breaks - the AIX build, so we'll have to find another way to deal with that. - -* Changes in SLURM 1.1.10 -========================= - -- task distribution fix for steps that are smaller than job allocation. - -- BLUEGENE - fix to only send a success when block was created when trying - to allocate the block. - -- fix so if slurm_send_recv_node_msg fails on the send the auth_cred returned - by the resp is NULL. - -- Fix switch/federation plugin so backup controller can assume control - repeatedly without leaking or corrupting memory. - -- Add new error code (for Maui/Moab scheduler): ESLURM_JOB_HELD - -- Tweak slurmctld's node ping logic to better handle failed nodes with - hierarchical communications fail-over logic. - -- Add support for sched/wiki specific configuration file "wiki.conf". - -- Added sched/wiki2 plugin (new experimental wiki plugin). 
- -* Changes in SLURM 1.1.9 -======================== - -- BLUEGENE - fix to handle a NO_VAL sent in as num procs in the job - description. - -- Fix bug in slurmstepd code for parsing --multi-prog command script. - Parser was failing for commands with no arguments. - -- Fix bug to check unsigned ints correctly in bitstring.c - -- Alter node count covert to kilo to only convert number divisible by - 1024 or 512 - -* Changes in SLURM 1.1.8 -======================== - -- Added bug fixes (fault-tolerance and memory leaks) from Hongjia Cao - <hjcao@nudt.edu.cn> - -- Gixed some potential BLUEGENE issues with the bridge log file not having - a mutex around the fclose and fopen. - -- BLUEGENE - srun -n procs now regristers correctly - -- Fixed problem with reattach double allocating step_layout->tids - -- BLUEGENE - fix race condition where job is finished before it starts. - -* Changes in SLURM 1.1.7 -======================== - -- BLUEGENE - fixed issue with doing an allocation for nodes since asking - for 32,128, or 512 all mean 1 to the controller. - -- Add "Include" directive to slurm.conf files. If "Include" is found - at the beginning of a line followed by whitespace and then - the full path to a file, that file is included inline with the current - slurm.conf file. - -* Changes in SLURM 1.1.6 -======================== - -- Improved task layout for relative positions - -- Fixed heterogeous cpu overcommit issue - -- Fix bug where srun would hang if it ran on one node and that - node's slurmd died - -- Fix bug where srun task layout would be bad when min-max node range is - specified (e.g. "srun -N1-4 ...") - -- Made slurmctld_conf.node_prefix only be set on Bluegene systems. - -- Fixed a race condition in the controller to make it so a plugin thread - wouldn't be able to access the slurmctld_conf structure before it was - filled. - -* Changes in SLURM 1.1.5 -======================== - -- Ignore partition's MaxNodes for SlurmUser and root. 
- -- Fix possible memory corruption with use of PMI_KVS_Create call. - -- Fix race condition when multiple PMI_KVS_Barrier calls. - -- Fix logic in which slurmctld outgoing RPC requests could get delayed. - -- Fix logic for laying out steps without a hostlist. - -* Changes in SLURM 1.1.4 -======================== - -- Improve error handling in hierarchical communications logic. - -* Changes in SLURM 1.1.3 -======================== - -- Fix big-endian bug in the bitstring code which plagued AIX. - -- Fix bug in handling srun's --multi-prog option, could go off end of buffer. - -- Added support for job step completion (and switch window release) on - subset of allocated nodes. - -- BLUEGENE - removed configure option --with-bg-link bridge is linked with - dlopen now no longer needing fake database so files on frontend node. - -- BLUEGENE - implemented use of rm_get_partition_info instead of - ...partitions_info which has made a much better design improving stability. - -- Streamline PMI communications and increase timeouts for highly parallel - jobs. Improves scalability of PMI. - -* Changes in SLURM 1.1.2 -======================== - -- Fix bug in jobcomp/filetxt plugin to report proper NodeCnt when a job - fails due to a node failure. - -- Fix Bluegene configure to work with the new 64bit libs. - -- Fix bug in controller that causes it to segfault when hit with a malformed - message. - -- For "srun --attach=X" to other users job, report an error and exit (it - previously just hung). - -- BLUEGENE - fix for doing correct small block logic on user error. - -- BLUEGENE - Added support in slurmd to create a fake libdb2.so if it - doesn't exist so smap won't seg fault - -- BLUEGENE - "scontrol show job" reports "MaxProcs=None" and "Start=None" - if values are not specified at job submit time - -- Add retry logic for PMI communications, may be needed for highly parallel - jobs. 
- -- Fix bug in slurmd where variable is used in logging message after freed - (slurmstepd rank info). - -- Fix bug in scontrol show daemons if NodeName=localhost will work now to - display slurmd as place where it is running. - -- Patch from HP for init nodes before init_bitmaps - -- ctrl-c killed sruns will result in job state as cancelled instead of - completed. - -- BLUEGENE - added configure option --with-bg-link to choose dynamic linking - or static linking with the bridgeapi. - -* Changes in SLURM 1.1.1 -======================== - -- Fix bug in packing job suspend/resume RPC. - -- If a user breaks out of srun before the allocation takes place, mark the - job as CANCELLED rather than COMPLETED and change its start and end time - to that time. - -- Fix bug in PMI support that prevented use of second PMI_Barrier call. - This fix is needed for MVAPICH2 use. - -- Add "-V" options to slurmctld and slurmd to print version number and exit. - -- Fix scalability bug in sbcast. - -- Fix bug in cons_res allocation strategy. - -- Fix bug in forwarding with mpi - -- Fix bug sacct forwarding with stat option - -- Added nodeid to sacct stat information - -- cleaned up way slurm_send_recv_node_msg works no more clearing errno - -- Fix error handling bug in the networking code that causes the slurmd to - xassert if the server is not running when the slurmd tries to register. - -* Changes in SLURM 1.1.0 -======================== - -- Fix bug that could temporarily make nodes DOWN when they are really - responding. - -- Fix bug preventing backup slurmctld from responding to PING RPCs. - -- Set "CFLAGS=-DISO8601" before configuration to get ISO8601 format - times for all SLURM commands. NOTE: This may break Moab, Maui, and/or - LSF schedulers. - -- Fix for srun -n and -O options when paired with -b. 
- -- Added logic for fanout to failover to forward list if main node is - unreachable - -- sacct also now keeps track of submitted, started and ending times of jobs - -- reinit config file mutex at beginning of slurmstepd to avoid fork issues - -* Changes in SLURM 1.1.0-pre8 -============================= - -- Fix bug in enforcement of partition's MaxNodes limit. - -- BLUEGENE - added support for srun -w option also fixed the geometry option - for srun. - -* Changes in SLURM 1.1.0-pre7 -============================= - -- Accounting works for aix systems, use jobacct/aix - -- Support large (over 2GB) files on 32-bit linux systems - -- changed all writes to safe_write in srun - -- added $float to globals.example in the testsuite - -- Set job's num_proc correctly for jobs that do not have exclusive use - of it's allocated nodes. - -- Change in support for test suite: 'testsuite/expect/globals.example' - is now 'testsuite/expect/globals' and you can override variable - settings with a new file 'testsuite/expect/globals.local'. - -- Job suspend now sends SIGTSTP, sleep(1), sends SIGSTOP for better - MPI support. - -- Plug a bunch of memory leaks in various places. - -- Bluegene - before assigning a job to a block the plugin will check the bps - to make sure they aren't in error state. - -- Change time format in job completion logging (JobCompType=jobcomp/filetxt) - from "MM/DD HH:MM:SS" to "YYYY-MM-DDTHH:MM:SS", conforming with the ISO8601 - standard format. - -* Changes in SLURM 1.1.0-pre6 -============================= - -- Added logic to "stat" a running job with sacct option -S use -j to specify - job.step - -- removed jobacct/bluegene (no real need for this) meaning, I don't think - there is a way to gather the data yet. - -- Added support for mapping "%h" in configured SlurmdLog to the hostname. - -- Add PropagatePrioProcess to control propagation of a user's nice value - to spawned tasks (based upon work by Daniel Christians, HP). 
- -* Changes in SLURM 1.1.0-pre5 -============================= - -- Added step completion RPC logic - -- Vastly changed sacct and the jobacct plugin. Read documentation for full - details. - -- Added jobacct plugin for AIX and BlueGene, they currently don't work, - but infrastructure is in place. - -- Add support for srun option --ctrl-comm-ifhn to set PMI communications - address (Hongjia Cao, National University of Defense Technology). - -- Moved safe_read/write to slurm_protocol_defs.h removing multiple copies. - -- Remove vestigial functions slurm_allocate_resources_and_run() and - slurm_free_resource_allocation_and_run_response_msg(). - -- Added support for different executable files and arguments by task based - upon a configuration file. See srun's --multi-prog option (based upon - work by Hongjia Cao, National University of Defense Technology). - -- moved the way forward logic waited for fanout logic mostly eliminating - problems with scalability issues. - -- changed -l option in sacct to display different params see sacct/sacct.h - for details. - -* Changes in SLURM 1.1.0-pre4 -============================= - -- Bluegene specific - Added support to set bluegene block state to - free/error via scontrol update BlockName - -- Add needed symbol to select/bluegene in order to load plugin. - -* Changes in SLURM 1.1.0-pre3 -============================= - -- Added framework for XCPU job launch support. - -- New general configuration file parser and slurm.conf handling code. - Allows long lines to be continued on the next line by ending with a "\". - Whitespace is allowed between the key and "=", and between the "=" and - value. - WARNING: A NodeName may now occur only once in a slurm.conf file. - If you want to temporarily make nodes DOWN in the slurm.conf, - use the new DownNodes keyword (see "man slurm.conf"). - -- Gracefully handle request to submit batch job from within an existing - batch job. 
- -- Warn user attempting to create a job allocation from within an existing job - allocation. - -- Add web page description for proctrack plugin. - -- Add new function slurm_get_rem_time() for job's time limit. - -- JobAcct plugin renamed from "log" to "linux" in preparation for support of - new system types. - WARNING: "JobAcctType=jobacct/log" is no longer supported. - -- Removed vestigal 'bg' names from bluegene plugin and smap - -- InactiveLimit parameter is not enforced for RootOnly partitions. - -- Update select/cons_res web page (Susanne Balle, HP, - cons_res_doc_patch_3_29_06). - -- Build a "slurmd.test" along with slurmd. slurmd.test has the path to - slurmstepd set allowing it to run unmodified out of the builddir for - testing (Mark Grondona). - -* Changes in SLURM 1.1.0-pre2 -============================= - -- Added "bcast" command to transmit copies of a file to compute nodes - with message fanout. - -- Bluegene specific - Added support for overlapping partitions and - dynamic partitioning. - -- Bluegene specific - Added support for nodecard sized blocks. - -- Added logic to accept 1k for 1024 and so on for --nodes option of srun. - This logic is through display tools such as smap, sinfo, scontrol, and - squeue. - -- Added bluegene.conf man page. - -- Added support for memory affinity, see srun --mem_bind option. - -* Changes in SLURM 1.1.0-pre1 -============================= - -- New --enable-multiple-slurmd configure parameter to allow running - more than one copy of slurmd on a node at the same time. Only - really useful for developers. - -- New communication is now branched on all processes to slurmd's from - slurmctld and srun launch command. This is done with a tree type - algorithm. Spawn and batch mode work the same as before. New slurm.conf - variable TreeWidth=50 is default. This is the number of threads per - stop on the tree. - -- Configuration parameter HeartBeatInterval is depracated. 
Now used half - of SlurmdTimeout and SlurmctldTimeout for communications to slurmd and - slurmctld daemons repsectively. - -- Add hash tables for select/cons_res plugin (Susanne Balle, HP, - patch_02222006). - -- Remove some use of cr_enabled flag in slurmctld job record, use - new flag "test_only" in select_g_job_test() instead. - -* Changes in SLURM 1.0.17 -========================= - -- Set correct user groups for task epilogs. - -- Add more debugging for tracking slow slurmd job initiations - (slurm.hp.replaydebug.patch). - -* Changes in SLURM 1.0.16 -========================= - -- For "srun --attach=X" to other users job, report an error and exit (it - previously just hung). - -- Make sure that "scancel -s KILL" terminates the job just like "scancel" - including deletion of all job steps (Chris Holmes, HP, slurm,patch). - -- Recognize ISO-8859 input to srun as a script (for non-English scripts). - -- switch/elan: Fix bug in propagation of ELAN_STATKEY environment variable. - -- Fix bug in slurmstepd IO code that can result in it spinning if a - certain error occurs. - -- Remove nodes from srun's required node list if their count exceeds - the number of requested tasks. - -- sched/backfill to schedule around jobs that are hung in a completing - state. - -- Avoid possibly re-running the epilog for a job on slurmctld restart or - reconfig by saving and restoring a hostlist of nodes still completing - the job. - -* Changes in SLURM 1.0.15 -========================= - -- In srun, reset stdin to blocking mode (if it was originally blocking before - we set it to O_NONBLOCK) on exit to avoid trouble with things like running - srun under a bash shell in an emacs *shell* buffer. - -- Fix srun race condition that occasionally causes segfaults at shutdown - -- Fix obscure locking issues in log.c code. - -- Explicitly close IO related sockets. 
If an srun gets "stuck", possibly - because of unkillable tasks in its job step, it will not hold many TCP - sockets in the CLOSE_WAIT state. - -- Increase the SLURM protocol timeout from 5 seconds to 10 seconds. - (In 1.2 there will be a slurm.conf parameter for this, rather than having - it hardcoded.) - -* Changes in SLURM 1.0.14 -========================= - -- Fix for bad xfree() call in auth/munge which can raise an assert(). - -- Fix installed fork handlers for the conf mutex for slurmd and slurmstepd. - -* Changes in SLURM 1.0.13 -========================= - -- Fix for AllowGroups option to work when the /etc/group file doesn't - contain all users in group by adding the uids of the names in /etc/passwd - that have a gid of that which we are looking for. - -- Fix bug in InactiveLimit support that can potentially purge active jobs. - NOTE: This is highly unlikely except on very large AIX clusters. - -- Fix bug for reiniting the config_lock around the control_file in - slurm_protocol_api.c logic has changed in 1.1 so no need to merge - -* Changes in SLURM 1.0.12 -========================= - -- Report node state of DRAIN rather than DOWN if DOWN with DRAIN flag set. - -- Initialize job->mail_type to 0 (NONE) for job submission. - -- Fix for stalled task stdout/stderr when buffered I/O is used, and - a single line exceeds 4096 bytes. - -- Memory leak fixes for maui plugin (hjcao@nudt.edu.cn) - -- Fix for spinning srun when the terminal to which srun is talking - goes away. - -- Don't set avail_node_bitmap for DRAINED nodes on slurmctld reconfig - (can schedule a job on drained node after reconfig). - - -* Changes in SLURM 1.0.11 -========================= - -- Fix for slurmstepd hang when launching a task. (Needed to install - list library's atfork handlers). - -- Fix memory leak on AIX (and possibly other architectures) due to - missing pthread_attr_destroy() calls. - -- Fix rare task standard I/O setup bug. 
When the bug hit, stdin, stdout, - or stderr could be an invalid file descriptor. - -- General slurmstepd file descriptor cleanup. - -- Fix memory leak in job accounting logic (Andy Riebs, HP, memory_leak.patch). - -* Changes in SLURM 1.0.10 -========================= - -- Fix for job accounting logic submitted from Andy Riebs to handle issues - with suspending jobs and such. patch file named requeue.patch - -- Make select/cons_res interoperate with mpi/lam plugin for task counts. - -- Fix race condition where srun could seg-fault due to use of logging functions - within pthread after calling log_fini. - -- Code changes for clean build with gcc 2.96 (gcc_2_96.patch, Takao Hatazaki, HP). - -- Add CacheGroups configuration support in configurator.html (configurator.patch, - Takao Hatazaki, HP). - -- Fix bug preventing use of mpich-gm plugin (mpichgm.patch, Takao Hatazaki, HP). - -* Changes in SLURM 1.0.9 -======================== - -- Fix job accounting logic to open new log file on slurmctld reconfig. - (Andy Riebs, slurm.hp.logfile.patch). - -- Fix bug which allows a user to run a batch script on a node not allocated - by the slurmctld. - -- Fix poe MP_HOSTFILE handling bug on AIX. - -* Changes in SLURM 1.0.8 -======================== - -- Fix to communication between slurmd and slurmstepd to allow for partial - reads and writes on their communication pipes. - -* Changes in SLURM 1.0.7 -======================== - -- Change in how AuthType=auth/dummy is handled for security testing. - -- Fix for bluegene systems to allow full system partitions to stay booted - when other jobs are submitted to the queue. - -* Changes in SLURM 1.0.6 -======================== - -- Prevent slurmstepd from crashing when srun attaches to batch job. - -* Changes in SLURM 1.0.5 -======================== - -- Restructure logic for scheduling BlueGene small block jobs. Added - "test_only" flag to select_p_job_test() in select plugin. 
- -- Correct squeue "NODELIST" output for BlueGene small block jobs. - -- Fix possible deadlock situations on BlueGene plugin on errors. - -* Changes in SLURM 1.0.4 -======================== - -- Release job allocation if step creation fails (especially for BlueGene). - -- Fix bug select/bluegene warm start with changed bglblock layout. - -- Fix bug for queuing full-system BlueGene jobs. - -* Changes in SLURM 1.0.3 -======================== - -- Fix bug that could refuse to queue batch jobs for BlueGene system. - -- Add BlueGene plugin mutex lock for reconfig. - -- Ignore BlueGene bgljobs in ERROR state (don't try to kill). - -- Fix job accounting for batch jobs (Andy Riebs, HP, - slurm.hp.jobacct_divby0a.patch). - -- Added proctrack/linuxproc.so to the main RPM. - -- Added mutex around bridge api file to avoid locking up the api. - -- BlueGene mod: Terminate slurm_prolog and slurm_epilog immediately if - SLURM_JOBID environment variable is invalid. - -- Federation driver: allow selection of a sepecific switch interface - (sni0, sni1, etc.) with -euidevice/MP_EUIDEVICE. - -- Return an error for "scontrol reconfig" if there is already one in - progress - -* Changes in SLURM 1.0.2 -======================== - -- Correctly report DRAINED node state as type OTHER for "sinfo --summarize". - -- Fixes in sacct use of malloc (Andy Riebs, HP, sacct_malloc.patch). - -- Smap mods: eliminate screen flicker, fix window resize, report more clear - message if window too small (Dan Palermo, HP, patch.1.0.0.1.060126.smap). - -- Sacct mods for inconsistent records (race condition) and replace --debug - option with --verbose (Andy Riebs, HP, slurm.hp.sacct_exp_vvv.patch). - -- scancel of a job step will now send a job-step-completed message - to the controller after verifying that the step has completed on all nodes. - -- Fix task layout bug in srun. - -- Added times to node "Reason" field when set down for insufficient - resources or if not responding. 
- -- Validate operation with Elan switch and heterogeneous nodes. - -* Changes in SLURM 1.0.1 -======================== - -- Assorted updates and clarifications in documentation. - -- Detect which munge installation to use 32/64 bit. - -* Changes in SLURM 1.0.0 -======================== - -- Fix sinfo filtering bug, especially "sinfo -R" output. - -- Fix node state change bug, resuming down or drained nodes. - -- Fix "scontrol show config" to display JobCredentialPrivateKey instead - of JobCredPrivateKey and JobCredentialPublicCertificate instead of - JobCredPublicKey. They now match the options in the slurm.conf. - -- Fix bug in job accounting for very long node list records (Andy Riebs, - HP, sacct_buf.patch). - -- BLUEGENE SPECIFIC - added load function to smap to load an already - exsistant bluegene.conf file. - -- Fix bug in sacct: If user requests specific job or job step ID, - only the last one with that ID will be reported. If multiple - nodes fail, the job has its state recorded as "JOB_TERMINATED...nf" - (Andy Riebs, HP, slurm.hp.sacct_dup.patch). - -- Fix some inconsistencies in sacct's help message (Andy Riebs, HP, - slurm.hp.sacct_help.patch). - -- Validate input to sacct command and allows embedded spaces in - arguments (Andy Riebs, HP, slurm.hp.sacct_validate.patch). - -* Changes in SLURM 0.7.0-pre8 -============================= - -- BGL specific -- bug fix for smap configure function down configuration - -- Add support for job suspend/resume. - -- Add slurmd cache for group IDs (Takao Hatazaki, HP). - -- Fix bug in processing of "#SLURM" batch script option parsing. - -* Changes in SLURM 0.7.0-pre7 -============================= - -- Fix issue with NODE_STATE_COMPLETING, could start job on node before - epilog completed. - -- Added some infrastructure for job suspend/resume (scontrol, api, and - slurmctld stub). - -- Set job's num_procs to the actual processor count allocated to the job. - -- Fix bug in HAVE_FRONT_END support for cluster emulation. 
- -* Changes in SLURM 0.7.0-pre6 -============================= - -- Added support for task affinity for binding tasks to CPUs (Daniel - Palermo, HP). - -- Integrate task affinity support with configuration, add validation - test. - -* Changes in SLURM 0.7.0-pre5 -============================= - -- Enhanced performance and debugging for slurmctld reconfiguration. - -- Add "scontrol update Jobid=# Nice=#" support. - -- Basic slurmctld and tool functionality validated to 16k nodes. - -- squeue and smap now display correct info for jobs in bluegene enviornment. - -- Fix setting of SLURM_NODELIST for batch jobs. - -- Add SubmitTime to job information available for display. - -- API function slurm_confirm_allocation() has been marked OBSOLETE - and will go away in some future version of SLURM. Use - slurm_allocation_lookup() instead. - -- New API calls slurm_signal_job and slurm_signal_job_step to send - signals directly to the slurmds without triggering the shutdown sequence. - -- remove "uid" from old_job_alloc_msg_t, no longer needed. - -- Several bug fixes in maui scheduler plugin from Dave Jackon - (Cluster Resources). - -* Changes in SLURM 0.7.0-pre4 -============================= - -- Remove BNR libary functions and add those for PMI (KVS and basic - MPI-1 functions only for now) - -- Added Hostfile support for POE and srun. MP_HOSTFILE env var to set - location of hostfile. Tasks will run from list order in the file. - -- Removes the slurmd's use of SysV shared memory. Instead the slurmd - communicates with the slurmstepd processes through the slurmstepd's - new named unix domain socket. The "stepd_api" is used to talk to the - slurmstepd (src/slurmd/common/stepd_api.[ch]). - -- Bluegene specific - bluegene block allocator will find most any - partition size now. Added support to start at any point in smap - to request a partition instead of always starting at 000. - -- Bluegene specific - Support to smap to down or bring up nodes in - configure mode. 
Added commands include allup, alldown, - up [range], down [range] - -- Time format in sinfo/squeue/smap/sacct changed from D:HH:MM:SS to - D-HH:MM:SS per POSIX standards document. - -- Treat scontrol update request without any requested changes as an - error condition. - -- Bluegene plugin renamed with BG instead of BGL. partition_allocator moved - into bluegene plugin and renamed block_allocator. Format for bluegene.conf - file changed also. Read bluegene html page. Code is backwards compatable - smap will generate in new form - -- Add srun option --nice to give user some control over job priority. - -* Changes in SLURM 0.7.0-pre3 -============================= - -- Restructure node states: DRAINING and DRAINED states are replaced - with a DRAIN flag. COMPLETING state is changed to a COMPLETING flag. - -- Test suite moved into testsuite/expect from separate repository. - -- Added new document describing slurm APIs (doc/html/api.html). - -- Permit nodes to be in multiple partitions simultaneously. - -* Changes in SLURM 0.7.0-pre2 -============================= - -- New stdio protocol. Now srun has just a single TCP stream to each node - of a job-step. srun and slurmd comminicate over the TCP stream using a - simple messaging protocol. - -- Added task plugin and use task prolog/epilog(s). - -- New slurmd_step functionality added. Fork exec instead of using shared - memory. Not completely tested. - -- BGL small partition logic in place in plugin and smap. Scheduler needs - to be rewritten to handle multiple partitions on a single node. No - documentation written on process yet. - -- If running select/bluegene plugin without access to BGL DB2, then - full-system bglblock is of system size defined in bluegene.conf. - -* Changes in SLURM 0.7.0-pre1 -============================= - -- Support defered initiation of job (e.g. srun --begin=11:30 ...). - -- Add support for srun --cpus-per-task through task allocation in - slurmctld. 
- -- fixed partition_allocator to work without curses - -- made change to srun to start message thread before other threads - to make sure localtime doesn't interfere. - -- Added new RPCs for slurmctld REQUEST_TERMINATE_JOB or TASKS, - REQUEST_KILL_JOB/TASKS changed to REQUEST_SIGNAL_JOB/TASKS. - -- Add support for e-mail notification on job state changes. - -- Some infrastructure added for task launch controls (slurm.conf: - TaskProlog, TaskEpilog, TaskPlugin; srun --task-prolog, --task-epilog). - -* Changes in SLURM 0.6.11 -========================= - -- Fix bug in sinfo partition sorting order. - -- Fix bugs in srun use of #SLURM options in batch script. - -- Use full Elan credential space rather than re-using credentials as soon - as job step completes (helps with fault-tolerance). - -* Changes in SLURM 0.6.10 -========================= - -- Fix for slurmd job termination logic (could hang in COMPLETING state). - -- Sacct bug fixes: Report correct user name for job step, show "uid.gid" - as fifth field of job step record (Andy Riebs, slurm.hp.sacct_uid.patch). - -- Add job_id to maui scheduler plugin start job status message. - -- Fix for srun's handling of null characters in stdout or stderr. - -- Update job accounting for larger systems (Andy Riebs, uptodate.patch). - -- Fixes for proctrack/linuxproc and mpich-gm support (Takao Hatazaki, HP). - -- Fix bug in switch/elan for large task count job having irregular task - distribution across nodes. - -* Changes in SLURM 0.6.9 -======================== - -- Fix bug in mpi plugin to set the ID correctly - -- Accounting bug causing segv fixed (Andy Riebs, 14oct.jobacct.patch) - -- Fix for failed launch of a debugged job (e.g. bad executable name). - -- Wiki plugin fix for tracking allocated nodes (Ernest Artiaga, BSC). - -- Fix memory leaks in slurmctld and federation plugin. - -- Fix sefault in federation plugin function fed_libstate_clear(). 
- -- Align job accounting data (Andy Riebs, slurm.hp.unal_jobacct.patch) - -- Restore switch state in backup controller restarts - -* Changes in SLURM 0.6.8 -======================== - -- Invalid AllowGroup value in slurm.conf to not cause seg fault. - -- Fix bug that would cause slurmctld to seg-fault with select/cons_res - and batch job containing more than one step. - -* Changes in SLURM 0.6.7 -======================== - -- Make proctrack/linuxproc thread safe, could cause slurmd seg fault. - -- Propagate umask from srun to spawned tasks. - -- Fix problem in switch/elan error handling that could hang a slurmd - step manager process. - -- Build on AIX with -bmaxdata:0x70000000 for memory limit more than 256MB. - -- Restore srun's return code support. - -* Changes in SLURM 0.6.6 -======================== - -- Fix for bad socket close() in the spawn-io code. - -* Changes in SLURM 0.6.5 -======================== - -- Sacct to report on job steps that never actually started. - -- Added proctrack/rms to elan rpm. - -- Restructure slurmctld/agent.c logic to insure timely reaping of - terminating pthreads. - -- Srun not to hang if job fails before task launches not all completed. - -- Fix for consumable resources properly scheduling nodes that have more - nodes than configured (Susanne Balle, HP, cons_res_patch.10.14.2005) - -* Changes in SLURM 0.6.4 -======================== - -- Bluegene plugin drains an entire bglblock on repeated boot failures - only if it has not identified a specific node as being bad. - -* Changes in SLURM 0.6.3 -======================== - -- Fix slurmctld mem leaks (step name and hostlist struct). - -- Bluegene plugin sets end time for job terminated due to removed - bglblock. - -* Changes in SLURM 0.6.2 -======================== - -- Fix sinfo and squeue formatting to properly handle slurm nodes, - jobs, and other names containing "%". 
- -* Changes in SLURM 0.6.1 -======================== - -- Fixed smap -Db to display slurm partitions correctly (take 2). - -- Add srun fork() retry logic for very heavily loaded system. - -- Fix possible srun hang on task launch failure. - -- Add support for mvapich v0.9.4, 0.9.5 and gen2. - -* Changes in SLURM 0.6.0 -======================== - -- Add documentation for ProctrackType=proctrack/rms. - -- Make proctrack/rms be the default for switch/elan. - -- Do not preceed SIGKILL or SIGTERM to job step with (non-requested) SIGCONT. - -- Fixed smap -Db to display slurm partitions correctly. - -- Explicitly disallow ProctrackType=proctrack/linuxproc with - SwitchType=switch/elan. They will not work properly together. - -* Changes in SLURM 0.6.0-pre8 -============================= - -- Remove debugging xassert in switch/federation that were accidentally - committed - -- Make slurmd step manager retry slurm_container_destroy() indefinitely - instead of giving up after 30 seconds. If something prevents a job - step's processes from being killed, the job will be stuck in the - completing until the container destroy succeeds. - -* Changes in SLURM 0.6.0-pre7 -============================= - -- Disable localtime_r() calls from forked processes (semaphore set - in another pthread can deadlock calls to localtime_r made from - the forked process, this will be properly fixed in the next - major release of SLURM). - -- Added SLURM_LOCALID environment variable for spawned tasks - (Dan Palermo, HP). - -- Modify switch logic to restore state based exclusively upon - recovered job steps (not state save file). - -- Gracefully refuse job if there are too many job steps in slurmd. - -- Fix race condition in job completion that can leave nodes in - COMPLETING state after job is COMPLETED. - -- Added frees for BGL BrigeAPI strdups that were to this point unknown. - -- smap scrolls correctly for BGL systems. 
- -- slurm_pid2jobid() API call will now return the jobid for a step - manager slurmd process. - -* Changes in SLURM 0.6.0-pre6 -============================= - -- Added logic to return scheduled nodes to Maui scheduler (David - Jackson, Cluster Resources) - -- Fix bug in handling job request with maximum node count. - -- Fix node selection scheduling bug with heterogeneous nodes and - srun --cpus-per-task option - -- Generate error file to note prolog failures. - -* Changes in SLURM 0.6.0-pre5 -============================= - -- Modify sfree (BGL command) so that --all option no longer requires - an argument. - -- Modify smap so it shows all nodes and partitions by default (even - nodes that the user can't access, otherwise there are holes in - its maps). - -- Added module to parse time string (src/common/parse_time.c) for - future use. - -- Fix BlueGene hostlist processing for non-rectangular prisms and - add string length checking. - -- Modify orphan batch job time calculation for BGL to account for - slowness when booting many bglblocks at the same time. - -* Changes in SLURM 0.6.0-pre4 -============================= - -- Added etc/slurm.epilog.clean to kill processes initiated outside of - slurm when a user's last job on a node terminates. - -- Added config.xml and configurator.html files for use by OSCAR. - -- Increased maximum job step count from 64 to 130 for BGL systems only. - -* Changes in SLURM 0.6.0-pre3 -============================= - -- Add code so job request for shared nodes gets explicitly requested - nodes, but lightly loaded nodes otherwise. - -- Add job step name field. - -- Add job step network specification field. - -- Add proctrack/rms plugin - -- Change the proctrack API to send a slurmd_job_t pointer to both - slurm_container_create() and slurm_container_add(). One of those - functions MUST set job->cont_id. - -- Remove vestigial node_use (virtual or coprocessor) field from job - request RPC. 
- -- Fix mpich-gm bugs, thanks to Takao Hatazaki (HP). - -- Fix code for clean build with gcc 2.96, Takao Hatazaki (HP). - -- Add node update state of "RESUME" to return DRAINED, DRAINING, or - DOWN node to service (IDLE or ALLOCATED state). - -- smap keeps trying to connect to slurmctld in iterative mode rather - than just aborting on failure. - -- Add squeue option --node to filter by node name. - -- Modify squeue --user option to accept not only user names, but also - user IDs. - -* Changes in SLURM 0.6.0-pre2 -============================= - -- Removed "make rpm" target. - -* Changes in SLURM 0.6.0-pre1 -============================= - -- Added bgl/partition_allocator/smap changes from 0.5.7. - -- Added configurable resource limit propagation (Daniel Christians, HP). - -- Added mpi plugin specify at start of srun. - -- Changed SlurmUser ID from 16-bit to 32-bit. - -- Added MpiDefault slurm.conf parameter. - -- Remove KillTree configuration parameter (replace with - "ProctrackType=proctrack/linuxproc") - -- Remove MpichGmDirectSupport configuration parameter (replace with - "MpiDefault=mpich-gm") - -- Make default plugin be "none" for mpi. - -- Added mpi/none plugin and made it the default. - -- Replace extern program_invocation_short_name with program_invocation_name - due to short name being truncated to 16 bytes on some systems. - -- Added support for Elan clusters with different CPU counts on nodes - (Chris Holmes, HP). - -- Added Consumable Resources web page (Susanne Balle, HP). - -- "Session manager" slurmd process has been eliminated. - -- switch/federation fixes migrated from 0.5.* - -- srun pthreads really set detached, fixes scaling problem - -- srun spawns message handler process so it can now be stopped (via - Ctrl-Z or TotalView) without inducing failures. 
- -* Changes in SLURM 0.5.7 -======================== - -- added infrastructure for (eventual) support of AIX checkpointing - of slurm batch and interactive poe jobs - -- added wiring for BGL to do wiring for physical location first and then - logical. - -- only one thread used to query database before polling thread is there. - -* Changes in SLURM 0.5.6 -======================== - -- fix for BGL hostnames and full system partition finding - -* Changes in SLURM 0.5.5 -======================== - -- Increase SLURM_MESSAGE_TIMEOUT_MSEC_STATIC to 15000 - -- Fix for premature timeout in _slurm_send_timeout - -- Fix for federation overlapping calls to non-thread-safe _get_adapters - -* Changes in SLURM 0.5.4 -======================== - -- Added support for no reboot for VN to CO on BGL - -- Fix for if a job starts after it finishes on BGL - -* Changes in SLURM 0.5.3 -======================== - -- federation patch so the slurm controller has sane window status at - start-up regardless of the window status reported in the slurmd - registration. 
- -- federation driver exits with fatal() if the federation driver can not - find all of the adapters listed in the federation.conf - -* Changes in SLURM 0.5.2 -======================== - -- Extra federation driver sanity checks - -* Changes in SLURM 0.5.1 -======================== - -- Fix federation driver bad free(), other minor fed fixes - -- Allow slurm to parse very long lines in the slurm.conf - -* Changes in SLURM 0.5.0 -======================== - -- Fix race condition in job accounting plugin, could hang slurmd - -- Report SlurmUser id over 16 bits as an error (fix on v0.6) - -* Changes in SLURM 0.5.0-pre19 -============================== - -- Fix memory management bug in federation driver - -* Changes in SLURM 0.5.0-pre18 -============================== - -- elan switch plugin memory leak plugged - -- added g_slurmctld_jobacct_fini() to release all memory (useful - to confirm no memory leaks) - -- Fix slurmd bug introduced in pre17 - -* Changes in SLURM 0.5.0-pre17 -============================== - -- slurmd calls the proctrack destroy function at job step completion - -- federation driver tries harder to clean up switch windows - -- BGL wiring changes - -* Changes in SLURM 0.5.0-pre16 -============================== - -- Check slurm.conf values for under/overflows (some are 16 bit values). - -- Federation driver clears windows at job step completion - -- Modify code for clean build with gcc v4.0 - -- New SLURM_NETWORK environment variable used by slurm_ll_api - -* Changes in SLURM 0.5.0-pre15 -============================== - -- Added "network" field to "scontrol show job" output. - -- Federation fix for unfreed windows when multiple adapters on - one node use the same LID - -* Changes in SLURM 0.5.0-pre14 -============================== - -- RDMA works on fed plugin. - -* Changes in SLURM 0.5.0-pre13 -============================== - -- Major mods to support checkpoint on AIX.
- -- Job accounting documentation expanded, added tuning options, minor bug fixes - -- BGL wiring will now work on <= 4 node X-dim partitions and also 8 node - X-dim partitions. - -- ENV variables set for spawning jobs. - -- jobacct patch from HP to not erroneously lock a mutex in the - jobacct_log plugin. - -- switch/federation supports multiple adapters per task. sn_all behaviour - is now correct, and it also supports sn_single. - -* Changes in SLURM 0.5.0-pre12 -============================== - -- Minor build changes to support RPM creation on AIX - -* Changes in SLURM 0.5.0-pre11 -============================== - -- Slurmd tests for initialized session manager (user's) slurmd pid before - killing it to avoid killing system daemon (race condition). - -- srun --output or --error file names of "none" mapped to /dev/null for - batch jobs rather than a file actually named "none". - -- BGL: don't try to read bglblock state until they are all created to - avoid having BGL Bridge API seg fault. - -* Changes in SLURM 0.5.0-pre10 -============================== - -- Fix bug that was resetting BGL job geometry on unrelated field update. - -- squeue and sinfo print timestamp in iterate mode by default. - -- added scrolling windows in smap - -- introduced new variable to start polling thread in the bluegene plugin. - -- Latest accounting patches from Riebs/HP, retry communications. - -- Added srun option --kill-on-bad-exit from Holmes/HP. - -- Support large (64-bit address) log files where possible. - -- Fix problem of signals being delivered twice to tasks. Note that as - part of the fix the slurmd session manager no longer calls setsid to - create a new session. - -* Changes in SLURM 0.5.0-pre9 -============================= - -- If a job and node are in COMPLETING state and slurmd stops responding for - SlurmdTimeout, then set the node DOWN and the job COMPLETED.
- -- Add logic to switch/elan to track contexts allocated to active job steps - rather than just using a cyclic counter and hoping to avoid collisions. - -- Plug memory leak in freeing job info retrieved using API. - -- Bluegene Plugin handles long deallocating states from driver 202. - -- Fix bug in bitfmt2int() which can go off allocated memory. - -* Changes in SLURM 0.5.0-pre8 -============================= - -- BlueGene srun --geometry was not getting propagated properly. - -- Fix race condition with multiple simultaneous epilogs. - -- Modify slurmd to resend job completion RPC to slurmctld in the - case where slurmctld is not responding. - -- Updated sacct: handle cancelled jobs correctly, add user/group - output, add ntasks as synonym for nprocs, display error field - by default, display ncpus instead of nprocs - -- Parallelization of queuing jobs up to 32 at once. Variable - MAX_AGENT_COUNT used in bgl_job_run.c to specify. - -- bgl_job_run.c fixed threading issue with uid_to_string use. - -* Changes in SLURM 0.5.0-pre7 -============================= - -- Preserve next_job_id across restarts. - -- Add support for really long job names (256 bytes). - -- Add configuration parameter SchedulerRootFilter to control what - entity manages prioritization of jobs in RootOnly partition - (internal scheduler plugin or external entity). - -- Added support for job accounting. - -- Added support for consumable resource based node scheduling. - -- Permit batch job to be launched to pre-existing allocation. - -* Changes in SLURM 0.5.0-pre6 -============================= - -- Load bluegene.conf and federation.conf based upon SLURM_CONF env - var (if set). - -- Fix slurmd shutdown signal synchronization bug (not consistently - terminating). - -- Add doc/html/ibm.html document. Update bluegene.html. - -- Add sfree to bluegene plugin. - -- Remove geometry[SYSTEM_DIMENSIONS] from opaque node_select data - type if SYSTEM_DIMENSIONS==0 (not ANSI-C compliant).
- -- Modify smap to test for valid libdb2.so before issuing any BGL - Bridge API calls. - -- Modify spec file for optional inclusion of select_bluegene and - sched_wiki plugin libraries. - -- Initialize job->network in data structure, could cause job - submit/update to fail depending upon what is left on stack. - -* Changes in SLURM 0.5.0-pre5 -============================= - -- Expand buffer to hold node_select info in job termination log. - -- Modify slurmctld node hashing function to reduce collisions. - -- Treat bglblock vanishing as fatal error for job, prolog and epilog - exit immediately. - -- bug fix for following multiple X-dim partitions - -* Changes in SLURM 0.5.0-pre4 -============================= - -- Fix bug in slurmd that could double KillWait time on job timeout. - -- Fix bug in srun's error code reporting to slurmctld, could DOWN - a node if job run as root has non-zero error code. - -- Remove a node's partition info when removed from existing partition. - -- Use proctrack plugin to call all processes in a job step before - calling interconnect_postfini() to insure no processes escape from - job and prevent switch windows from being released. - -- Added mail.html web page telling how to get on slurm mailing lists. - -- Added another directory to search for DB2 files on BGL system. - -- Added overview man page slurm.1. - -- Added new configure option "--with-db2-dir=PATH" for BGL. - -* Changes in SLURM 0.5.0-pre3 -============================= - -- Merge of SLURM v0.4-branch into v0.5/HEAD. - -* Changes in SLURM 0.5.0-pre2 -============================= - -- Fix bug in srun to clean-up upon failure of an allocated node - (srun -A would generate a segmentation fault, Chris Holmes, HP). - -- If slurmd's node name is mapped to NULL (due to bad configuration) - terminate slurmd with a fatal error and don't crash slurmctld. - -- Add SLURMD_DEBUG env var for use with AIX/POE in spawn_task RPC. 
- -- Always pack job's "features" for access by prolog/epilog - -* Changes in SLURM 0.5.0-pre1 -============================= - -- Add network option to srun and job creation API for specification - of communication protocol over IBM Federation switch. - -- Add new slurm.conf parameter ProctrackType (process tracking) and - associated plugin in the slurmd module. - -- Send node's switch state with job epilog completion RPC and - node registration (only when slurmd starts, not on periodic - registrations). - -- Add federation switch plugin. - -- Add new configuration keyword, SchedulerRootFilter, to control - external scheduler control of RootOnly partition (Chris Holmes, HP). - -- Modify logic to set process group ID for spawned processes (last - patch from slurm v0.3.11). - -- "srun -A" modified to return exit code of last command executed - (Chris Holmes, HP). - -- Add support for different slurm.conf files controlled via SLURM_CONF - env var (Brian O'Sullivan, pathscale) - -- Fix bug if srun given --uid without --gid option (Chris Holmes, HP). - -* Changes in SLURM 0.4.24 -========================= - -- DRAIN nodes with switches on base partitions are in ERROR, MISSING, - or DOWN states. - -* Changes in SLURM 0.4.23 -========================= - -- Modified bluegene plugin to only sync bglblocks to jobs on initial - startup, not on reconfig. Fixes race condition. - -- Modified bluegene plugin to work with 141 driver. Enabling it to - only have to reboot when switching from coproc -> virtual and back. - -- added support for a full system partition to make sure every other - partition is free and vice versa. - -- smap resizing issue fixed. - -- change prolog not to add time when a partition is in deallocating - state. - -- NOTE: This version of SLURM requires BGL driver 141/2005. - -* Changes in SLURM 0.4.22 -========================= - -- Modified bluegene plugin to not do anything if the bluegene.conf file - is altered.
- -- added checking for lists before trying to create iterator on the list. - -* Changes in SLURM 0.4.21 -========================= - -- Fix in race condition with time in Status Thread of BGL - -- Fix no leading zeros in smap output. - -* Changes in SLURM 0.4.20 -========================= - -- Smap output is more user friendly with -c option - -* Changes in SLURM 0.4.19 -========================= - -- Added new RPCs for getting bglblock state info remotely and cache data - within the plugin (permits removal of DB2 access from BGL FEN and - dramatically increases smap responsiveness, also changed prolog/epilog - operation) - -- Move smap executable to main slurm RPM (from separate RPM). - -- smap uses RPC instead of DB2 to get info about bgl partitions. - -- Status function added to bluegene_agent thread. Keeps current state - of BGL partitions updating every second. Will handle multiple attempts - at booting if booting a partition fails. - -* Changes in SLURM 0.4.18 -========================= - -- Added error checking of rm_remove_partition calls. - -- job_term() was terminating a job in real time rather than - queueing the request. This would result in slurmctld hanging - for many seconds when a job termination was required. - -* Changes in SLURM 0.4.17 -======================== - -- Bug fixes from testing .16. - -* Changes in SLURM 0.4.16 -======================== - -- Added error checking to a bunch of Bridge API calls and more - gracefully handle failure modes. - -- Made smap more robust for more jobs. - -* Changes in SLURM 0.4.15 -======================== - -- Added error checking to a bunch of Bridge API calls and more - gracefully handle failure modes. - -* Changes in SLURM 0.4.14 -======================== - -- job state is kept on warm start of slurm - -* Changes in SLURM 0.4.13 -======================== - -- epilog fix for bgl plugin - -* Changes in SLURM 0.4.12 -======================== - -- bug shot for new api calls.
- -- added BridgeAPILogFile as an option for bluegene.conf file - -* Changes in SLURM 0.4.11 -======================== - -- changed as many rm_get_partition() to rm_get_partitions_info as we could - for time saving. - -* Changes in SLURM 0.4.10 -======================== - -- redesign for BGL external wiring. - -- smap display bug fix for smaller systems. - -* Changes in SLURM 0.4.9 -======================== - -- setpnum works now, have to include this in bluegene.conf - -* Changes in SLURM 0.4.8 -======================== - -- Changed the prolog and the epilog to use the env var MPIRUN_PARTITION - instead of BGL_PARTITION_ID - -* Changes in SLURM 0.4.7 -======================== - -- Remove some BGL specific headers that IBM now distributes, NOTE - BGL driver 080 or greater required. - -- Change autogen.sh to deal with problems running autoconf on one - system and configure on another with different software versions. - -* Changes in SLURM 0.4.6 -======================== - -- smap now works on non-BGL systems. - -- took tv.h out of partition_allocator so it would work with driver 080 - from IBM. - -- updated slurmd signal handling to prevent possible user killing of daemon. - -* Changes in SLURM 0.4.5 -======================== - -- Change sinfo default time limit field to have 10 bytes (up from 9). - -- Fix bug in bluegene partition selection (sorting bug). - -- Don't display any completed jobs in smap. - -- Add NodeCnt to filetxt job completion plugin. - -- Minor restructuring of how MMCS is polled for DOWN nodes and switches. - -- Fix squeue output format for "%s" (node select data). - -- Queue job requesting more resources than exist in a partition if - that partition's state is DOWN (rather than just abort it). - -- Add prolog/epilog for bluegene to code base (moved from mpirun in CVS) - -- Add prolog, epilog and bluegene.conf.example to bluegene RPM - -- In smap, Admin can get the Rack/midplane id from an XYZ input and vice versa.
- -- Add smap line-oriented output capability. - -* Changes in SLURM 0.4.4 -======================== - -- Fix race condition in slurmd setting pgid of spawned tasks for - process tracking. - -- Fix scontrol reconfig does nothing to running jobs nor crash the system - -- Fix sort of bgl_list only happens once in select_bluegene.c instead of every - time a new job is inserted. - -* Changes in SLURM 0.4.3 -======================== - -- Turn off some RPM build checks (bug in RPM, see slurm.spec.in) - -- starting slurmctld will destroy all RMP*** partitions every time. - -* Changes in SLURM 0.4.2 -======================== - -- Fix memory leak in BlueGene plugin. - -- Srun's --test-only option takes precedence over --batch option. - -- Add sleep(1) after setting bglblock owner due to apparent race condition - in the BGL API. - -- Slurm was timing out and killing batch jobs if the node registered when - a job prolog was still running. - -* Changes in SLURM 0.4.1 -======================== - -- BlueGene plugin kills jobs running in defunct bglblock on restart. - -- Smap displays pending jobs now, in addition to running and completing jobs. - -- Remove node "use=" from bluegene.conf file, create both coprocessor and - virtual bglblocks for now (later create just one and use API to change - it when such an API is available). - -- Add "ChangeNumpsets" parameter to bluegene.conf to use script to - update the numpsets parameter for newly created bglblocks (to be - removed once the API functions). - -- Add all patches from slurm v0.3.11 (through 2/7/2005) - - Added srun option --disable-status,-X to disable srun status feature - and instead forward SIGINT immediately to job upon receipt of Ctrl-C. - - Fix for bogus slurmd error message "Unable to put task N into pgrp..." - - Fix case where slurmd may erroneously detect shared memory entry - as "stale" and delete entry for unkillable or slow-to-exit job. - - (qsnet) Fix for running slurmd on node without an elan3 adapter.
- - Fix for reported problem: slurm/538: user tasks block writing to stdio - -* Changes in SLURM 0.4.0 -======================== - -- Minor tweak to init.d/slurm for BlueGene systems. - -- Added smap RPM package (to install binary built on BlueGene - service node on front-end nodes). - -- Added wait between bglblock destroy and creation of new blocks - so that MMCS can complete the operation. - -- Fix bug in synchronizing bglblock owners on slurmctld restart. - -* Changes in SLURM 0.4.0-pre11 -============================== - -- Add new srun option "--test-only" for testing slurm_job_will_run API. - -- Fix bugs in slurm_job_will_run() processing. - -- Change slurm_job_will_run() to not return a message, just an error code. - -- Sync partition owners with running jobs on slurmctld restart. - -* Changes in SLURM 0.4.0-pre10 -============================== - -- Specify number of I/O nodes associated with BlueGene partition. - -- Do not launch a job's tasks if the job is cancelled while its - prolog is running (which can be slow on BlueGene). - -- Add new error code, ESLURM_BATCH_ONLY for attempts to launch - job steps on front-end system (e.g. Blue Gene). - -- Updates to html documents. - -- Assorted fixes in smap, partition creation mode. - -- Add proper support for "srun -n" option on BGL recognizing - processor count in both virtual and coprocessor modes. - -- Make default node_use on Blue Gene be coprocessor, as documented. - -- Add SIGKILL to BlueGene jobs as part of cleanup. - -* Changes in SLURM 0.4.0-pre9 -============================= - -- Change in /etc/init.d/slurm for RedHat and SuSE compatibility - -* Changes in SLURM 0.4.0-pre8 -============================= - -- Add logic to create and destroy Bluegene Blocks automatically as needed. - -- Update smap man page to include Bluegene configuration commands.
- -* Changes in SLURM 0.4.0-pre7 -============================= - -- Port all patches from slurm v0.3 up through v0.3.10: - - Remove calls in auth/munge plugin deprecated by munge-0.4. - - Allow single task id to be selected with --input, --output, and --error. - - Create shared memory segment for Elan statistics when using the - switch/elan plugin. - - More fixes necessary for TotalView. - -* Changes in SLURM 0.4.0-pre6 -============================= - -- Add new job reason value "JobHeld" for jobs with priority==0 - -- Move startup script from "/etc/rc.d/init.d/slurm" to "/etc/init.d/slurm" - -- Modify prolog/epilog logic in slurmd to accommodate very long run times, - on BGL these scripts wait for events that can take a very long time - (tens of seconds). - -- This code base was used for BGLb acceptance test with pre-defined - BGL blocks. - -* Changes in SLURM 0.4.0-pre5 -============================= - -- select/bluegene plugin confirms db.properties file in $sysconfdir - and copies it to StateSaveLocation (slurmctld's working directory) - -- select/bluegene plugin confirms environment variable required for - DB2 interaction are set (execute "db2profile" script before slurmctld) - -- slurmd to always give jobs KillWait time between SIGTERM and SIGKILL - at termination - -- set job's start_time and end_time = now rather than leaving zero if - they fail to execute - -- modify srun to forward SIGTERM - -- enable select/bluegene testing for DOWN nodes and switches - -- select/bluegene plugin to delete orphan jobs, free BGLblocks and - set owner as jobs terminate/start - -* Changes in SLURM 0.4.0-pre4 -============================= - -- Fixes for reported problems: - - slurm/512: Let job steps run on DRAINING nodes - - slurm/513: Gracefully deal with UIDs missing from passwd file - -- Add support for MPICH-GM (from takao.hatazaki@hp.com) - -- Add support for NodeHostname in node configuration - -- Make "scontrol show daemons" function properly on front-end system -
(e.g. Blue Gene) - -- Fix srun bug when --input, --output and --error are all "none" - -- Don't schedule jobs for user root if partition is DOWN - -- Modify select/bluegene to honor job's required node list - -- Modify user name logic to explicitly set UID=0 to "root", - Suse Linux was not handling multiple users with UID=0 well. - -* Changes in SLURM 0.4.0-pre3 -============================= - -- Send SIGTERM to batch script before SIGKILL for mpirun cleanup on - Blue Gene/L - -- Create new allocation as needed for debugger in case old allocation - has been purged - -- Add Blue Gene User Guide to html documents - -- Fix srun bug that could cause seg fault with --no-shell option if not - running under a debugger - -- Propagate job's task count (if set) for batch job via SLURM_NPROCS. - -- Add new job parameters for Blue Gene: geometry, rotate, mode (virtual - or co-processor), communications type (mesh or torus), and partition ID. - -- Exercise a bunch of new switch plugin functions for Federation - switch support. - -- Fix bug in scheduling jobs when a processor count is specified - and FastSchedule=0 and the cluster is heterogeneous.
- -* Changes in SLURM 0.4.0-pre2 -============================= - -- NOTE: "startclean" when transitioning from version 0.4.0-pre1, JOBS ARE LOST - -- Fixes for reported problems: - - slurm/477: Signal of batch job script (scancel -b) fixed - - slurm/481: Permit clearing of AllowGroups field for a partition - - slurm/482: Adjust Elan base context number to match RMS range - - slurm/489: Job completion logger was writing NULL to text file - -- Preserve job's requested processor count info after job is initiated - (for viewing by squeue and scontrol) - -- srun cancels created job if job step creation fails - -- Added a lot of Blue Gene/L support logic: slurmd executes on a single - node to front-end the 512-CPU base-partitions (Blue Gene/L's nodes) - -- Add node selection plugin infrastructure, relocate existing logic - to select/linear, add configuration parameter SelectType - -- Modify node hashing algorithm for better performance on Blue Gene/L - -- Add ability to specify node ranges for 3-D rectangular prism - -* Changes in SLURM 0.4.0-pre1 -============================= - -- NOTE: "startclean" when transitioning from version 0.3, JOBS ARE LOST - -- Added support for job account information (arbitrary string) - -- Added support for job dependencies (start job X after job Y completes) - -- Added support for configuration parameter CheckpointType - -- Added new job state "CANCELLED" - -- Don't strip binaries, breaks parallel debuggers - -- Fix bug in Munge authentication retry logic - -- Change srun handling of interrupts to work properly with TotalView - -- Added "reason" field to job info showing why a job is waiting to run - -* Changes in SLURM 0.3.7 -======================== - -- Fixes required for TotalView operability under RHEL3.0 - (Reported by Dong Ahn <dahn@llnl.gov>) - - Do not create detached threads when running under parallel debugger. - - Handle EINTR from sigwait().
- -* Changes in SLURM 0.3.6 -======================== - -- Fixes for reported problems: - - slurm/459: Properly support partition's "Shared=force" configuration. - -- Resync node state to DRAINED or DRAINING on restart in case job - and node state recovered are out of sync. - -- Added jobcomp/script plugin (execute script on job completion, - from Nathan Huff, North Dakota State University). - -- Added new error code ESLURM_FRAGMENTED for immediate resource - allocation requests which are refused due to completing job (formerly - returned ESLURM_NOT_TOP_PRIORITY) - -- Modified job completion logging plugin calling sequence. - -- Added much of the infrastructure required for system checkpoint - (APIs, RPCs, and NULL plugin) - -* Changes in SLURM 0.3.5 -======================== - -- Fix "SLURM_RLIMIT_* not found in environment" error message when - distributing large rlimit to jobs. - -- Add support for slurm_spawn() and associated APIs (needed for IBM - SP systems). - -- Fix bug in update of node state to DRAINING/DRAINED when update - request occurs prior to initial node registration. - -- Fix bug in purging of batch jobs (active batch jobs were being - improperly purged starting in version 0.3.0). - -- When updating a node state to DRAINING/DRAINED a Reason must be - provided. The user name and a timestamp will automatically be - appended to that Reason. - -* Changes in SLURM 0.3.4 -======================== - -- Fixes for reported problems: - - slurm/404: Explicitly set pthread stack size to 1MB for srun - -- Allow srun to respond to ctrl-c and kill queued job while waiting - for allocation from controller. - -* Changes in SLURM 0.3.3 -======================== - -- Fix slurmctld handling of heterogeneous processor count on elan - switch (was setting DRAINED nodes in state DRAINING). - -- Fix sinfo -R, --list-reasons to list all relevant node states. - -- Fix slurmctld to honor srun's node configuration specifications - with FastSchedule==0 configuration. 
- -- Added srun option --debugger-test to confirm that slurm's debugger - infrastructure is operational. - -- Removed debugging hacks for srun.wrapper.c. Temporarily use - RPM's debugedit utility if available for similar effect. - -* Changes in SLURM 0.3.2 -======================== - -- The srun command wakes immediately upon resource allocation (via new RPC) - rather than polling. - -- SLURM daemons log current version number at startup. - -- If slurmd can't respond to ping (e.g. paging is keeping it from - responding in a timely fashion) then send a registration RPC - to slurmctld. - -- Fix slurmd -M option to call mlockall() after daemonizing. - -- Add "slurm_" prefix to slurm's hostlist_ function man pages. - -- More AIX support added. - -- Change get info calls from using show_all to more general show_flags - with #define for SHOW_ALL flag. - -* Changes in SLURM 0.3.1 -======================== - -- Set SLURM_TASKS_PER_NODE env var for batch jobs (and LAM/MPI). - -- Fix for slurmd spinning when stdin buffers full (gnats:434) - -- Change some slurmctld malloc sizes to reduce demand for realloc calls, - improves performance and eliminates realloc failure on RH EL3 under - extremely heavy workload apparently due to memory fragmentation. - -- Fix scheduling logic for heterogeneous processor count. - -- Modify security_2_2 test to function with release 0.3 - -- Fix broken rpm build when libslurm not already installed. - -- New slurmd option -M to mlock() slurmd process into memory. - -- New srun option --no-shell causes srun to exit instead of spawning - shell when using --allocate, -A. - -- Modify srun --uid=user and --gid=group options to maintain invoking - user's credentials until after nodes have been allocated to requested - user/group (allows root to run jobs and allocate nodes for other users - in a RootOnly partition). - -- Fix node processing if state change requested via scontrol prior to - initial node registration.
- -* Changes in SLURM 0.3.0 -======================== - -- Support for AIX added (a few bugs do remain). - -- Fix memory leak in slurmctld, slurm_cred_create(). - -- On ELF systems, export BNR_* functions from SLURM API. - -- Add support for "hidden" partitions (applies to their - nodes, jobs, and job steps as well). APIs and commands - modified to optionally display hidden partitions. - -- Modify partition's group_allow test to be based upon the user - of the allocation rather than the user making the allocation - request (user root for LCRM batch jobs). - -- Restructure plugin directory structure. - -- New --core=type option in srun for lightweight corefile support. - (requires liblwcf). - -- Let user root and SlurmUser exceed any partition limits. - -- Srun treats "--time=0" as a request for an infinite time limit. - -* Changes in SLURM 0.3.0.0-pre10 -================================ - -- Fix bugs in support of slurmctld "-f" option (specify different - slurm.conf pathname). - -- Remove slurmd "-f" option. - -- Several documentation changes for slurm administrators. - -- On ELF systems, export only slurm_* functions from slurm API and - ensure plugins use only slurm_ prefixed functions (created aliases - where necessary). - -- New srun option -Q, --quiet to suppress informational messages. - -- Fix bug in slurmctld's building of nodelist for job (failed if - more than one numeric field in node name). - -- Change "scontrol completing" and "sinfo" to use job's node bitmap - to identify nodes associated with that particular job that are - still processing job completion. This will work properly for - shared nodes. - -- Set SLURM_DISTRIBUTION environment variable for user tasks. - -- Fix for file descriptor leak in slurmd. - -- Propagate stacksize limit to jobs along with other resource limits - that were previously ignored.
- -* Changes in SLURM 0.3.0.0-pre9 -=============================== - -- Restructure how slurmctld state saves are performed for better - scalability. - -- New sinfo option "--list-reason" or "-R". Displays down or drained - nodes along with their REASON field. - -* Changes in SLURM 0.3.0.0-pre8 -=============================== - -- Queue outgoing message traffic rather than immediately spawning - pthreads (under heavy load this resulted in hundreds of pthreads - using more memory than was available). - -- Restructure slurmctld message agent for higher throughput. - -- Add new sinfo options --responding and --dead (i.e. non-responding) - for filtering node states. - -- Fix bug in sinfo to properly process specified state filter including - "*" suffix for non-responding nodes. - -- Create StateSaveLocation directory if changes via slurmctld reconfig - -* Changes in SLURM 0.3.0.0-pre7 -=============================== - -- Fixes for reported problems: - - slurm/381: Hold jobs requesting more resources than partition limit. - - slurm/387: Jobs lost and nodes DOWN on slurmctld restart. - -- Add support for getting node's real memory size on AIX. - -- Sinfo sort partitions in slurm.conf order, new sort option ("#P"). - -- Document how to gracefully change plugin values. - -- Slurmctld does not attempt to recover jobs when the switch plugin - value changes (decision reached when any job's switch state recovery - fails). - -- Node does not transition from COMPLETING to DOWN state due to - not responding. Wait for tasks to complete or admin to set DOWN. - -- Always chmod SlurmdSpoolDir to 755 (a umask of 007 was resulting - in batch jobs failing). - -- Return errors when trying to change configuration parameters - AuthType, SchedulerType, and SwitchType via "scontrol reconfig" - or SIGHUP. Document how to safely change these parameters. - -- Plugin-specific error number definitions and descriptive strings - moved from common into plugin modules. 
- -- Documentation for writing scheduler, switch, and job completion - logging plugins added. - -- Added job and node state descriptions to the squeue and sinfo man pages. - -- Backup slurmctld to generate core file on SIGABRT. - -- Backup slurmctld to re-read slurm.conf on SIGHUP. - -- Added -q,--quit-on-interrupt option to srun. - -- Elan switch plugin now starts neterr resolver thread on all Elan3 - systems (QsNet and QsNetII). - -- Added some missing read locks for references for slurmctld's - configuration data structure - -- Modify processing of queued slurmctld message traffic to get better - throughput (resulted in job inactivity limit being reached improperly - when hundreds of jobs running simultaneously) - -* Changes in SLURM 0.3.0.0-pre6 -=============================== - -- Fixes for reported problems: - - slurm/372: job state descriptions added to squeue man page - -- Switch plugin added. Add "SwitchType=switch/elan" to slurm.conf for - systems with Quadrics Elan3 or Elan4 switches. 
- -- Don't treat DOWN nodes with too few CPUs as a fatal error on Elan - -- Major re-write of html documents - -- Updates to node pinging for large numbers of unresponsive nodes - -- Explicitly set default action for SIGTERM (action on Thunder was - to ignore SIGTERM) - -- Sinfo "--exact" option only applies to fields actually displayed - -- Partition processor count not correctly computed for heterogeneous - clusters with FastSchedule=0 configuration - -- Only return DOWN nodes to service if the reason for them being in - that state is non-responsiveness and "ReturnToService=1" configuration - -- Partition processor count now correctly computed for heterogeneous - clusters with FastSchedule configured off - -- New macros and function to export SLURM version number - -* Changes in SLURM 0.3.0.0-pre5 -=============================== - -- Fixes for reported problems: - - slurm/346: Support multiple colon-separated PluginDir values - -- Fix node state transition: DOWN to DRAINED (instead of DRAINING) - -- Fix a couple of minor slurmctld memory leaks - -* Changes in SLURM 0.3.0.0-pre4 -=============================== - -- Fix bug where early launch failures (such as invalid UID/GID) resulted - in jobs not terminating properly. - -- Initial support for BNR committed (not yet functional). - -- QsNet: SLURM now uses /etc/elanhosts exclusively for converting - hostnames to ElanIDs. 
- -* Changes in SLURM 0.3.0.0-pre3 -=============================== - -- Fixes for reported problems: - - slurm/328: Slurmd was restarting with a new shared memory segment and - losing track of jobs - - slurm/329: Job processing may be left running when one task dies - - slurm/333: Slurmd fails to launch a job and deletes a step, due to - a race condition in shared memory management - - slurm/334: Slurmd was getting a segv due to a race condition in shared - memory management - - slurm/342: Properly handle nodes being removed from configuration - even when there are partitions, nodes, or job steps still associated - with them - -- Srun properly terminates jobs/steps upon node failure (used to hang - waiting for I/O completion) - -- Job time limits enforced even if InactiveLimit configured as zero - -- Support the sending of an arbitrary signal to a batch script (but not - the processes in its job steps) - -- Re-read slurm configuration file whenever changed, needed by users - of SLURM APIs - -- Scancel was generating an assert failure - -- Slurmctld sends a launch response message upon scheduling of a queued - job (for immediate srun response) - -- Maui scheduler plugin added - -- Backfill scheduler plugin added - -- Batch scripts can now have arguments that are propagated - -- MPICH support added (via patch, not in SLURM CVS) - -- New SLURM environment variables added SLURM_CPUS_ON_NODE and - SLURM_LAUNCH_NODE_IPADDR, these provide support needed for LAM/MPI - (version 7.0.4+) - -- The TMPDIR directory is created as needed before job launch - -- Do not create duplicate SLURM environment variables with the same name - -- Insure proper enforcement of node sharing by job - -- Treat lack of SpoolDir or StateSaveDir as a fatal error - -- Quickstart.html guide expanded - -- Increase maximum jobs steps per node from 16 to 64 - -- Delete correct shared memory segment on slurmd -c (clean start) - -* Changes in SLURM 0.3.0.0-pre2 -=============================== - -- Fixes
for reported problems: - - slurm/326: Properly clean-up jobs terminating on non-responding nodes - -- Move all configuration data structure into common/read_config, scontrol - now always shows default values if not specified in slurm.conf file - -- Remove the unused "Prioritize" configuration parameter - -* Changes in SLURM 0.3.0.0-pre1 -=============================== - -- Fixes for reported problems: - - slurm/252: "jobs left orphaned when using TotalView:" SLURM controller - now pings srun and kills defunct jobs. - - slurm/253: "srun fails to accept new IO connection." - - slurm/317: "Lack of default partition in config file causes errors." - - slurm/319: Socket errors on multiple simultaneous job launches fixed - - slurm/321: slurmd shared memory synchronization error. - -- Removed slurm_tv_clean daemon which has been obsoleted by slurm/252 fix. - -- New scontrol command ``delete'' and RPC added to delete a partition - -- Squeue can now print and sort by group id/name - -- Scancel has new option -q,--quiet to not report an error if a job - is already complete - -- Add the excluded node list to job information reported. - -- RPC version mis-match now properly handled - -- New job completion plugin interface added for logging completed jobs. - -- Fixed lost digit in scontrol job priority specification. - -- Remove restriction in the number of consecutive node sets (no longer - needed after DPCS upgrade) - -- Incomplete state save write now properly handled. - -- Modified slurmd setrlimit error for greater clarity. - -- Slurmctld performs load-leveling across shared nodes. - -- New user function added slurm_get_end_time for user jobs. - -- Always compile srun with stabs debug section when TotalView support - is requested. 
- -* Changes in SLURM 0.2.21 -========================= - -- Fixes for reported problems: - - slurm/253: Try using different port if connect() fails (was rarely - failing when an existing defunct connection was in TIME_WAIT state) - - slurm/300: Possibly killing wrong job on slurmd restart - - slurm/312: Freeing non-allocated memory and killing slurmd - -- Assorted changes to support RedHat Enterprise Linux 3.0 and IA64 - -- Initial Elan4 and libelanctrl support (--with-elan). - -- Slurmctld was sometimes inappropriately setting a job's priority - to 1 when a node was down (even if up nodes could be used for the - job when a running job completes) - -- Convert all user commands from use of popt library to getopt_long() - -- If TotalView support is requested, srun exports "totalview_jobid" - variable for `%J' expansion in TV bulk launch string. - -- Fix several locking bugs in slurmd IO layer. - -- Throttle back repetitious error messages in slurmd to avoid filling - log files. - - -* Changes in SLURM 0.2.20 -========================= - -- Fixes for reported problems: - - slurm/298: Elan initialization error (Invalid vp 2147483674). - - slurm/299: srun fails to exit with multiple ^C's. - -- Temporarily prevent DPCS from allocating jobs with more than eight - sets of consecutive nodes. This was likely causing user applications - to fail with libelan errors. This will be removed after DPCS is updated. - -- Fix bug in popt use, was failing in some versions of Linux. - -- Resend KILL_JOB messages as needed to clear COMPLETING jobs. - -- Install dummy SIGCHLD handler in slurmd to fix problem on NPTL systems - where slurmd was not notified of terminated tasks. - -* Changes in SLURM 0.2.19 -========================= - -- Memory corruption bug fixed, it was causing slurmctld to seg-fault - -* Changes in SLURM 0.2.18 -========================= - -- Fixes for reported problems: - - slurm/287: slurm protocol timeouts when using TotalView. 
- - slurm/291: srun fails using ``-n 1'' under multi-node allocation. - - slurm/294: srun IO buffer reports ENOSPC. - -- Memory corruption bug fixed, it was causing slurmctld to seg-fault - -- Non-responding nodes now go from DRAINING to DRAINED state when - jobs complete - -- Do not schedule pending jobs while any job is actively COMPLETING - unless the submitted job specifically identifies its nodes (like DPCS) - -- Reset priority of jobs with priority==1 when a non-responding node - starts to respond again - -- Ignore jobs with priority==1 when establishing new baseline upon - slurmctld restart - -- Make slurmctld/message retry be timer based rather than queue based - for better scalability - -- Slurmctld logging is more concise, using hostlists more - -- srun --no-allocate used special job_id range to avoid conflicts - or premature job termination (purging by slurmctld) - -- New --jobid=id option in srun to initiate job step under an existing - allocation. - -- Support in srun for TotalView bulk launch. - -* Changes in SLURM 0.2.17 -========================= - -- Fixes for reported problems: - - slurm/279: Hold jobs that can't execute due to DOWN or DRAINED - nodes and release when nodes are returned to service. - - slurm/285: "srun killed due to SIGPIPE" - -- Support for running job steps on nodes relative to current - allocation via srun -r, --relative=n option. - -- SIGKILL no longer broadcasted to job via srun on task failure unless - --no-allocate option is used. - -- Re-enabled "chkconfig --add" in default RPMs. - -- Backup controller setting proper PID into slurmctld.pid file. - -- Backup controller restores QSW state each time it assumes control - -- Backup controller purges old job records before assuming control - to avoid resurrecting defunct jobs. - -- Kill jobs on non-responding DRAINING nodes and make their state - DRAINED. 
- -- Save state upon completion of a job's last EPILOG_COMPLETION to - reduce possibility of inconsistent job and node records when the - controller is transitioning between primary and backup. - -- Change logging level of detailed communication errors to not print - them unless detailed debugging is requested. - -- Increase number of concurrent controller server threads from 20 - to 50 and restructure code to handle backlogs more efficiently. - -- Partition state at controller startup is based upon slurm.conf - rather than previously saved state. Additional improvements to - avoid inconsistent job/node/partition states at restart. Job state - information is used to arbitrate conflicts. - -- Orphaned file descriptors eliminated. - -* Changes in SLURM 0.2.16 -========================= - -- Fixes for reported problems: - - slurm/265: Early termination of srun could cause job to remain in queue. - - slurm/268: Slurmctld could deadlock if there was a delay in the - termination of a large node-count job. An EPILOG_COMPLETE RPC was - added so that slurmd could notify slurmctld whenever the job - termination was completed. - - slurm/270: Segfault in sinfo if a configured node lacked a partition. - - slurm/278: Exit code in scontrol did not indicate failure. - -- Fixed bug in slurmd that caused the daemon to occaisionally kill itself. - -- Fixed bug in srun when running with --no-allocate and >1 process per node. - -- Small fixes and updates for srun manual. - -* Changes in SLURM 0.2.15 -========================= - -- Fixes for reported problems: - - slurm/265: Job was orphaned when allocation response message could - not be sent. Job is now killed on allocation response message transmit - failure and socket error details are logged. - - Fix for slurm/267: "Job epilog may run multiple times." - -- Squeue job TIMELIMIT format changed from "h:mm" to "d:h:mm:ss". - -- DPCS initiated jobs have steps execute properly without explicit - specification of node count. 
- -* Changes in SLURM 0.2.14 -========================= - -- Fixes for reported problems: - - slurm/194: "srun doesn't handle most options when run under an allocation." - - slurm/244: "REQ: squeue shows requested size of pending jobs." - -- SLURM_NODELIST environment variable now exported to all jobs, not - only batch jobs. - -- Nodelist displayed in squeue for completing jobs is now restricted to - completing nodes. - -- Node "reason" field properly displayed in sinfo even with filtering. - -- ``slurm_tv_clean'' daemon now supports a log file. - -- Batch jobs are now re-queued on launch failure. - -- Controller confirms job scripts for batch jobs are still running on - node zero at node registration. - -- Default RPMs no longer stop/start SLURM daemons on upgrade or install. - -* Changes in SLURM 0.2.13 -========================= - -- Fixes for reported problems: - - Fixed bug in slurmctld where "drained" nodes would go back into - the "idle" state under some conditions (slurm/228). - - Added possible fix for slurm/229: "slurmd occasionally fails - to reap all children." - -- Fixed memory leak in auth_munge plugin. - -- Added fix to slurmctld to allow arbitrarily large job specifications - to be saved and recovered in the state file. - -- Allow "updates" in the configuration file of previously defined - node state and reason. - -- On "forceful termination" of a running job step, srun now exits - unconditionally, instead of waiting for all I/O. - -- Slurmctld now uses pidfile to kill old daemon when a new one is started. - -- Addition of new daemon "slurm_tv_clean" used to clean up jobs orphaned - due to use of the TotalView parallel debugger. - -* Changes in SLURM 0.2.12 -========================= - -- Fixes for reported problems: - - Fix for "waitpid: No child processes" when using TotalView (slurm/217). - - Implemented temporary workaround for slurm/223: "Munge decode failed: - Munged communication error." 
- - Temporary fix for slurm/222: "elan3_create(0): Invalid argument." - -- Fixed memory leaks in slurmctld (mostly due to reconfigure). - -- More squeue/sinfo interface changes (see squeue(1), sinfo(1)). - -- Sinfo now accepts list of node states to -t,--state option. - -- Node "reason" field now available via sinfo command (see sinfo(1)). - -- Wrapper source for srun (srun.wrapper.c) now installed and available - for TotalView support. - -- Improved retry login in user commands for periods when slurmctld - primary is down and backup has not yet taken over. - -* Changes in SLURM 0.2.11 -========================= - -- Changes in srun: - - Fixed bug in signal handling that occaisonally resulted in orphaned - jobs when using Ctrl-C. - - Return non-zero exit code when remote tasks are killed by a signal. - - SIGALRM is now blocked by default. - -- Added ``reason'' string for down, drained, or draining nodes. - -- Added -V,--version option to squeue and sinfo. - -- Improved some error messages from user utilities. - -* Changes in SLURM 0.2.10 -========================= - -- New slurm.conf configuration parameters: - - WaitTime: Default for srun -w,--wait parameter. - - MaxJobCount: Maximum number of jobs SLURM can handle at one time. - - MinJobAge: Minimum time since completing before job is purged from - slurmctld memory. - -- Block user defined signals USR1 and USR2 in slurmd session manager. - -- More squeue cleanup. - -- Support for passing options to sinfo via environment variables. - -- Added option to scontrol to find intersection of completing jobs and nodes. - -- Added fix in auth_munge to prevent "Munged communication error" message. - -* Changes in SLURM 0.2.9 -======================== - -- Fixes for reported problems: - - Argument to srun `-n' option was taken as octal if preceded with a `0'. - -- New format for Elan hosts config file (/etc/elanhosts. See README) - -- Various fixes for managing COMPLETING jobs. 
- -- Support for passing options to squeue via environment variables - (see squeue(1)) - -* Changes in SLURM 0.2.8 -========================= - -- Fix for bug in slurmd that could make debug messages appear in job output. - -- Fix for bug in slurmctld retry count computation. - -- Srun now times out slow launch threads. - -- "Time Used" output in squeue now includes seconds. - -* Changes in SLURM 0.2.7 -========================= - -- Fix for bug in Elan module that results in slurmd hang. - -- Added completing job state to default list of states to print with squeue. - -* Changes in SLURM 0.2.6 -========================= - -- More fixes for handling cleanup of slow terminating jobs. - -- Fixed bug in srun that might leave nodes allocated after a Ctrl-C. - -* Changes in SLURM 0.2.5 -========================= - -- Various fixes for cleanup of slow terminating or unkillable jobs. - -- Fixed some small memory leaks in communications code. - -- Added hack for synchronized exit of jobs on large node count. - -- Long lists of nodes are no longer truncated in sinfo. - -- Print more descriptive error message when tasks exit with nonzero status. - -- Fixed bug in srun where unsuccessful launch attempts weren't detected. - -- Elan network error resolver thread now runs from elan module in slurmd. - -- Slurmctld uses consecutive Elan context and program description numbers - instead of choosing them randomly. - -* Changes in SLURM 0.2.4 -========================== - -- Fix for file descriptor leak in slurmctld. - -- auth_munge plugin now prints credential info on decode failure. - -- Minor changes to scancel interface. - -- Filename format option "%J" now works again for srun --output and --error. - -* Changes in SLURM 0.2.3 -========================== - -- Fix bug in srun when using per-task files for stderr. - -- Better error reporting on failure to open per-task input/output files. - -- Update auth_munge plugin for munge 0.1. - -- Minor changes to squeue interface. 
- -- New srun option `--hold' to submit job in "held" state. - -* Changes in SLURM 0.2.2 -========================== - -- Fixes for reported problems: - - Execution of script allocate mode fails in some cases. (gnats:161) - - Errors using per-task input files with Elan support. (gnats:162) - - srun doesn't handle all environment variables properly. (gnats:164) - -- Parallel job is now terminated if a task is killed by a signal. - -- Exit status of srun is set based on exit codes of tasks. - -- Redesign of sinfo interface and options. - -- Shutdown of slurmctld no longer propagates shutdown to all nodes. - -* Changes in SLURM 0.2.1 -=========================== - -- Fix bug where reconfigure request to slurmctld killed the daemon. - -* Changes in SLURM 0.2.0 -============================ - - -- SlurmdTimeout of 0 means never set a non-responding node to DOWN. - -- New srun option, -u,--unbuffered, for unbuffered stdout. - -- Enhancements for sinfo - - Non-responding nodes show "*" character appended instead of "NoResp+". - - Node states show abbreviated variant by default - -- Enhancements for scontrol. - - Added "ping" command to show current state of SLURM controllers. - - Job dump in scontrol shows user name as well as UID. - - Node state of DRAIN is appropriately mapped to DRAINING or DRAINED. - -- Fix for bug where request for task count greater than partition limit - was queued anyway. - -- Fix for bugs in job end time handling. - -- Modifications for error free builds on 64 bit architectures. - -- Job cancel immediately deallocates nodes instead of waiting on srun. - -- Attempt to create slurmd spool if it does not exist. - -- Fixed signal handling bug in srun allocate mode. - -- Earlier error detection in slurmd startup. - -- "fatal: _shm_unlock: Numerical result out of range" bug fixed in slurmd. - -- Config file parsing is now case insensitive. - -- SLURM_NODELIST environment variable now set in allocate mode. 
- -* Changes in SLURM 0.2.0-pre2 -============================= - - -- Fix for reconfigure when public/private key path is changed. - -- Shared memory fixes in slurmd. - - fix for infinite semaphore incrementation bug. - -- Semaphore fixes in slurmctld. - -- Slurmctld now remembers which nodes have registered after recover. - -- Fixed reattach bug when tasks have exited. - -- Change directory to /tmp in slurmd if daemonizing. - -- Logfiles are reopened on reconfigure. - -$Id$