This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 2.1.1
=============================
 -- Fix for case-sensitive databases: when a slurmctld has a mixed-case
    cluster name, lower-case the string to ease comparisons.
 -- Fix squeue to print the remaining nodes rather than a failure message
    when a job is completing after having failed.
 -- Fix sview core when searching for partitions by state.
 -- Fixed setting the start time when querying in sacct to the
    beginning of the day if not set previously.
 -- Defined slurm_free_reservation_info_msg and slurm_free_topo_info_msg
    in common/slurm_protocol_defs.h
 -- Avoid generating error when a job step includes a memory specification and 
    memory is not configured as a consumable resource.
 -- Patch for small memory leak in src/common/plugstack.c
 -- Fix sview search on node state.

* Changes in SLURM 2.1.0
=============================
 -- Improve sview layout of blocks in use.
 -- A user can now change the dimensions of the grid in sview.
 -- BLUEGENE - improved startup speed further for large numbers of defined
    blocks
 -- Fix to _get_job_min_nodes() in wiki2/get_jobs.c suggested by Michal Novotny
 -- BLUEGENE - fixed issues when updating a pending job whose node
    count was incorrect for the requested connection type.
 -- BLUEGENE - fixed issue when combining blocks that are in ready states to
    make a larger block from those or make multiple smaller blocks by
    splitting the larger block.  Previously this would only work with blocks
    in a free state.
 -- Fix bug in wiki(2) plugins so that the task list is no longer truncated
    when HostFormat=2 and the list is greater than 64.  Previously a truncated
    task list was sent when doing a get jobs, which would mess up Moab.
 -- Added update slurmctld debug level to sview when in admin mode.
 -- Added logic to make sure if enforcing a memory limit when using the
    jobacct_gather plugin a user can no longer turn off the logic to enforce
    the limit.
 -- Replaced many calls to getpwuid() with reentrant uid_to_string()
 -- The slurmstepd will now refresh its log file handle on a reconfig.
    Previously, if a log was rolled, any output from the stepd was lost.

* Changes in SLURM 2.1.0-pre9
=============================
 -- Added the "scontrol update SlurmctldDebug" as the preferred alternative to
    the "scontrol setdebug" command.
 -- BLUEGENE - made it so when removing a block in an error state the nodes in
    the block are set correctly in accounting as not in error.
 -- Fixed issue so that when slurmdbd is not up, QOS are set up correctly for
    associations from the cache.
 -- scontrol, squeue, sview all display the correct node, cpu count along with
    correct corresponding nodelist on completing jobs.
 -- Patch (Mark Grondona) fixes serious security vulnerability in SLURM in
    the spank_job_env functionality.
 -- Improve spank_job_env interface and documentation
 -- Add ESPANK_NOT_LOCAL error code to spank_err_t
 -- Made the #define DECAY_INTERVAL used in the priority/multifactor plugin
    a slurm.conf variable (PriorityCalcPeriod)
 -- Added new macro SLURM_VERSION for use in autoconf scripts to determine
    current version of slurm installed on system when building against the api.
 -- Patch from Matthieu Hautreux that adds an entry into the error file when
    a job or step receives a TERM or KILL signal.
 -- Make it so env var SLURM_SRUN_COMM_HOST is overwritten if already in 
    existence in the slurmd.

* Changes in SLURM 2.1.0-pre8
=============================
 -- Rearranged the "scontrol show job" output into functional groupings
 -- Change the salloc/sbatch/srun -P option to -d (dependency)
 -- Removed the srun -d option; must use srun --slurmd-debug instead
 -- When running the mysql plugin natively MUNGE errors are now eliminated 
    when sending updates to slurmctlds.
 -- Check to make sure we have a default account before looking to 
    fill in default association. 
 -- Accounting - Slurmctld and slurmdbd will now set uids of users which were 
    created after the start of the daemons on reconfig.  Slurmdbd will 
    attempt to set previously non-existent uids every hour.
 -- Patch from Aaron Knister and Mark Grondona, to parse correctly quoted 
    #SBATCH options in a batch script.
 -- job_desc_msg_t - in, out, err have been changed to std_in, std_out,
    and std_err respectively.  Needed for PySLURM, since Python sees (in)
    as a keyword.
 -- Changed the type of addr to struct sockaddr_in in _message_socket_accept()
    in sattach.c, step_launch.c, and allocate_msg.c, and moved the function 
    into a common place for all the calls since the code was very similar.
 -- proctrack/lua support has been added; see contribs/lua/protrack.lua
 -- replaced local gtk m4 test with AM_PATH_GTK_2_0
 -- changed AC_CHECK_LIB to AC_SEARCH_LIBS to avoid extra libs in
    compile lines.
 -- Patch from Matthieu Hautreux to improve error message in slurmd/req.c
 -- Added support for split groups (from Matthieu Hautreux, CEA)
 -- Patch from Mark Grondona to move blcr scripts into pkglibexecdir
 -- Patch from Doug Parisek to calculate a job's projected start time under the
    builtin scheduler.
 -- Removed most global variables out of src/common/jobacct_common.h
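
As a brief illustration of the job_desc_msg_t rename noted above: "in" is a
reserved word in Python, so a binding cannot expose a field by that name as an
ordinary attribute. The sketch below uses only the standard keyword module.
The helper name is_safe_attribute is hypothetical and is not part of PySLURM
or the SLURM API.

```python
import keyword

# Field names before and after the job_desc_msg_t rename noted above.
OLD_FIELDS = ["in", "out", "err"]
NEW_FIELDS = ["std_in", "std_out", "std_err"]


def is_safe_attribute(name):
    """Return True if name can be used as a plain Python attribute.

    Hypothetical helper for illustration only; not part of PySLURM.
    """
    return name.isidentifier() and not keyword.iskeyword(name)


if __name__ == "__main__":
    # Report which field names a Python binding could expose directly.
    for name in OLD_FIELDS + NEW_FIELDS:
        print(name, is_safe_attribute(name))
```

Only "in" fails the check, but renaming all three fields keeps the naming
consistent.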

* Changes in SLURM 2.1.0-pre7
=============================
 -- BLUEGENE - make 2.1 run correctly on a real bluegene cluster
 -- sacctmgr - Display better debug output when an admin specifies a
    non-existent parent account when changing parent accounts.
 -- Added a mechanism to the slurmd to defer the epilog from starting until
    after a running prolog has finished.
 -- If a node reboots in between status checks, the node is marked down unless
    ReturnToService=2
 -- Added -R option to slurmctld to recover partition state also when 
    restarting or reconfiguring.
 
* Changes in SLURM 2.1.0-pre6
=============================
 -- When getting information about nodes in hidden partitions, return a node
    name of NULL rather than returning no information about the node so that 
    node index information is still valid.
 -- When querying the database for jobs in a certain state and a time period
    is given, only jobs in that state during the period will be returned.
    Previously, if a time period was given in sacct, jobs eligible to run or
    running would be displayed; that is still the default if no states are
    requested.
 -- One can now query jobs based on size (nodes and/or cpus) (mysql plugin only)
 -- Applied patch from Mark Grondona that tests for a missing config file before
    any other processing in spank_init().  This now prevents fatal errors from
    being mistakenly treated as recoverable.
 -- --enable-debug no longer has to be stated at configure time to have 
    the slurmctld or slurmstepd dump core on a seg fault.
 -- Moved the errant slurm_job_node_ready() declaration from job_info.h to
    slurm.h and deleted job_info.h.
 -- Added the slurm_job_cpus_allocated_on_node_id() and
    slurm_job_cpus_allocated_on_node() API for working with the
    job_resources_t structure.
 -- BLUEGENE - speed up start up for systems that have many blocks (100+)
    configured on the system.
* Changes in SLURM 2.1.0-pre5
=============================
 -- Add squeue option "--start" to report expected start time of pending jobs.
 -- Sched/backfill plugin modified to set expected start time of pending jobs.
 -- Add SchedulerParameters option of "max_job_bf=#" to control how far down
    the queue of pending jobs that SLURM searches in an attempt to backfill
    schedule them. The default value is 50 jobs.
 -- Fixed cause of squeue -o "%C" seg fault.
 -- Add "--signal=<int>@<time>" option to salloc, sbatch and srun commands to
    notify programs before reaching the end of their time limit.
 -- Add scontrol option to update a running job's EndTime (also resets the 
    job's time limit).
 -- Add new job wait reason, ReqNodeNotAvail: Required node is not available 
    (down or drained).
 -- Log when slurmctld or slurmd are started with small core file limit.
 -- Permit job's owner to change features, processor count, minimum and 
    maximum node counts of pending jobs (the operation was previously
    restricted to user root)
 -- Applied patch from Chuck Clouston for scontrol man page with clarifications
    and additional info
 -- Change slurm errno name from ESLURM_TOO_MANY_REQUESTED_NODES to 
    ESLURM_INVALID_NODE_COUNT to better reflect its meaning.
 -- Fix bug in sched/backfill which could result in invalid memory reference
    when trying to schedule jobs submitted with --exclude option.
 -- Fix for slurmctld deadlock at startup with PreemptMode=SUSPEND,GANG.
 -- Added preemption plugins to RPM.
 -- Completely disable logging of sched/wiki and sched/wiki2 (Maui & Moab) 
    message traffic unless DebugFlag=Wiki is configured.
 -- Change scontrol show job info: ReqProcs (number of processors requested) 
    is replaced by NumProcs (number of processors requested or actually 
    allocated) and ReqNodes (number of nodes requested) is replaced by 
    NumNodes (number of nodes requested or actually allocated).
 -- Fixed issue so that when max nodes was not specified and was later set by
    a limit, that value is not requested as the actual maximum.
 -- Move job preemption (for requeue, checkpoint and kill modes only) out of
    gang scheduling module. Make identification of preemptable jobs an argument
    to the select_g_job_test function rather than calling preempt plugin from
    the select plugin. Make output of srun --test-only option include a list
    of preempted job IDs. 
 -- Better record keeping for front end systems when registering.
 -- Enable memory allocation logic for job steps (i.e. allocate resources
    within the job's memory allocation and enforce limits).
 -- handle error state in sinfo
 -- sview and "scontrol show config" now report as SLURM_VERSION the version 
    of slurmctld rather than that of the command.
 -- Change SuspendTime configuration parameter from 16-bits to 32-bits.
 -- Add environment variable support to sattach, salloc, sbatch and srun
    to permit user control over exit codes so application exit codes can be
    distinguished from those generated by SLURM. SLURM_EXIT_ERROR specifies the
    exit code when a SLURM error occurs. SLURM_EXIT_IMMEDIATE specifies the 
    exit code when the --immediate option is specified and resources are not
    available. Any other non-zero exit code would be that of the application
    run by SLURM.
 -- Added a Quality of Service (QOS) html page.
 -- In sched/wiki2, JOBWILLRUN command, add support for identification of 
    preemptable and preempted jobs (both new and old format of commands are
    supported).
 -- Remove contribs/python/hostlist files. Download the materials as needed
    directly from http://www.nsc.liu.se/~kent/python-hostlist.
 -- BLUEGENE - Preemption now works on bluegene systems
 -- For salloc, sbatch and srun commands, ignore _maximum_ values for
    --sockets-per-node, --cores-per-socket and --threads-per-core options.
    Remove --mincores, --minsockets, --minthreads options (map them to
    minimum values of --sockets-per-node, --cores-per-socket and
    --threads-per-core for now).
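
As a rough sketch of how the SLURM_EXIT_ERROR and SLURM_EXIT_IMMEDIATE
variables described above might be used, a wrapper script could compare a
command's exit code against the values it exported beforehand. The function
name classify_exit and the sample values 200 and 201 are assumptions chosen
for illustration; they are not defined by SLURM.

```python
import os


def _as_int(value):
    """Convert an environment string to int, or None if unset or invalid."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def classify_exit(code, environ=None):
    """Classify an exit code returned by salloc, sbatch, srun or sattach.

    Hypothetical helper: assumes the caller exported SLURM_EXIT_ERROR and
    SLURM_EXIT_IMMEDIATE with distinct values before running the command.
    """
    env = os.environ if environ is None else environ
    if code == 0:
        return "success"
    if code == _as_int(env.get("SLURM_EXIT_ERROR")):
        return "slurm_error"
    if code == _as_int(env.get("SLURM_EXIT_IMMEDIATE")):
        return "slurm_immediate"
    # Any other non-zero code is taken to come from the application itself.
    return "application"


if __name__ == "__main__":
    # Sample values only; pick codes the application never returns.
    sample_env = {"SLURM_EXIT_ERROR": "200", "SLURM_EXIT_IMMEDIATE": "201"}
    for code in (0, 1, 200, 201):
        print(code, classify_exit(code, sample_env))
```

Choosing values that the application itself never returns is what makes the
distinction reliable.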