This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 1.3.0-pre8
=============================
 -- Modify how strings are packed in RPCs. Maximum string size is
    increased from 64KB (16-bit size field) to 4GB (32-bit size field).
 -- Fix bug that prevented time value of "INFINITE" from being processed.
 -- Added new srun/sbatch option "--open-mode" to control how output/error 
    files are opened ("t" for truncate, "a" for append).
 -- Added checkpoint/xlch plugin for use with XLCH (Hongjia Cao, NUDT).
 -- Added srun option --checkpoint-path for use with XLCH (Hongjia Cao, NUDT).
 -- Added new srun/salloc/sbatch option "--acctg-freq" for user control over 
    accounting data collection polling interval.
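
The RPC string-packing change above can be illustrated with a minimal C sketch. It assumes a length-prefixed wire format: a 32-bit big-endian length followed by the string bytes. The names buf_t, pack_str32() and unpack_str32() are hypothetical and are not SLURM's pack.c API. Counting the terminating NUL in the length, with 0 meaning a NULL string, is also an assumption.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>  /* htonl, ntohl */

/* Hypothetical growable buffer (not SLURM's Buf type). */
typedef struct {
    unsigned char *data;
    size_t size;      /* bytes written */
    size_t capacity;  /* bytes allocated */
    size_t offset;    /* read position used when unpacking */
} buf_t;

static int buf_reserve(buf_t *b, size_t extra)
{
    if (b->size + extra <= b->capacity)
        return 0;
    size_t cap = b->capacity ? b->capacity : 64;
    while (cap < b->size + extra)
        cap *= 2;
    unsigned char *p = realloc(b->data, cap);
    if (!p)
        return -1;
    b->data = p;
    b->capacity = cap;
    return 0;
}

/* Pack s as a 32-bit length (including the NUL) followed by its bytes.
 * A 32-bit field allows strings up to ~4GB instead of 64KB. */
int pack_str32(buf_t *b, const char *s)
{
    size_t len = s ? strlen(s) + 1 : 0;   /* 0 encodes a NULL string */
    if (len > UINT32_MAX)
        return -1;
    if (buf_reserve(b, 4 + len) != 0)
        return -1;
    uint32_t nlen = htonl((uint32_t)len); /* network byte order */
    memcpy(b->data + b->size, &nlen, 4);
    b->size += 4;
    if (len) {
        memcpy(b->data + b->size, s, len);
        b->size += len;
    }
    return 0;
}

/* Unpack one string into a malloc'd copy (*out is NULL for a NULL string). */
int unpack_str32(buf_t *b, char **out)
{
    uint32_t nlen;
    if (b->size - b->offset < 4)
        return -1;
    memcpy(&nlen, b->data + b->offset, 4);
    b->offset += 4;
    uint32_t len = ntohl(nlen);
    if (len == 0) {
        *out = NULL;
        return 0;
    }
    if (b->size - b->offset < len)
        return -1;
    if (b->data[b->offset + len - 1] != '\0')  /* must be NUL-terminated */
        return -1;
    *out = malloc(len);
    if (!*out)
        return -1;
    memcpy(*out, b->data + b->offset, len);
    b->offset += len;
    return 0;
}
```

Packing "hello" writes 4 length bytes plus 6 string bytes. Unpacking reads the same values back in order.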

* Changes in SLURM 1.3.0-pre7
=============================
 -- Fix a bug in the processing of srun's --exclusive option for a job step.

* Changes in SLURM 1.3.0-pre6
=============================
 -- Add support for configurable number of jobs to share resources using the 
    partition Shared parameter in slurm.conf (e.g. "Shared=FORCE:3" for two 
    jobs to share the resources). From Chris Holmes, HP.
 -- Made salloc use the API instead of local code for message handling.

* Changes in SLURM 1.3.0-pre5
=============================
 -- Add select_g_reconfigure() function to note changes in slurmctld
    configuration that can impact node scheduling.
 -- scontrol now sets/gets a partition's MaxTime and a job's Timelimit in
    minutes, plus new formats: min:sec, hr:min:sec, days-hr:min:sec, days-hr, etc.
 -- scontrol "notify" command added to send a message to the stdout of srun
    for the specified job ID.
 -- For BlueGene, make the alphabetic part of the node location specification
    case insensitive.
 -- Report scheduler-plugin specific configuration information with the
    "scontrol show configuration" command on the SCHEDULER_CONF line. This
    information is not found in the "slurm.conf" file, but in a scheduler
    plugin specific configuration file (e.g. "wiki.conf").
 -- sview partition information reported now includes partition priority.
 -- Expand job dependency specification to support concurrent execution, 
    testing of job exit status and multiple job IDs.
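
The new scontrol time formats can be sketched as a small C parser. time_str_to_secs(), TIME_INFINITE and TIME_INVALID are hypothetical names, not SLURM identifiers. The sketch returns seconds and does not show how scontrol rounds the result to whole minutes. Beyond the formats listed above, it assumes that "etc." covers plain "min" and "days-hr:min", and that "INFINITE" is matched case-insensitively.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>  /* strcasecmp (POSIX) */

#define TIME_INFINITE (-2L)  /* hypothetical sentinel for "INFINITE" */
#define TIME_INVALID  (-1L)  /* hypothetical sentinel for bad input */

/* Parse up to three colon-separated non-negative integers.
 * Returns the number of fields parsed, or -1 if malformed. */
static int parse_fields(const char *s, long f[3])
{
    int n = 0;
    while (*s) {
        if (n == 3 || !isdigit((unsigned char)*s))
            return -1;
        char *end;
        f[n++] = strtol(s, &end, 10);
        s = end;
        if (*s == ':') {
            s++;
            if (*s == '\0')
                return -1;  /* trailing colon */
        } else if (*s != '\0') {
            return -1;
        }
    }
    return n;
}

/* Accepts min, min:sec, hr:min:sec, days-hr, days-hr:min, days-hr:min:sec. */
long time_str_to_secs(const char *str)
{
    long f[3] = {0, 0, 0};
    const char *dash = strchr(str, '-');
    int n;

    if (strcasecmp(str, "INFINITE") == 0)
        return TIME_INFINITE;

    if (dash) {
        /* days-hr[:min[:sec]]: fields after the dash are hr, min, sec */
        char *end;
        if (!isdigit((unsigned char)*str))
            return TIME_INVALID;
        long days = strtol(str, &end, 10);
        if (end != dash)
            return TIME_INVALID;
        n = parse_fields(dash + 1, f);
        if (n < 1)
            return TIME_INVALID;
        return ((days * 24 + f[0]) * 60 + f[1]) * 60 + f[2];
    }

    n = parse_fields(str, f);
    switch (n) {
    case 1:  return f[0] * 60;                       /* min */
    case 2:  return f[0] * 60 + f[1];                /* min:sec */
    case 3:  return (f[0] * 60 + f[1]) * 60 + f[2];  /* hr:min:sec */
    default: return TIME_INVALID;
    }
}
```

With this sketch, "30" means 30 minutes and "2-12" means 2 days and 12 hours. The dash is what switches the interpretation of the leading field.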

* Changes in SLURM 1.3.0-pre4
=============================
 -- Job step launch in srun is now done through the SLURM API; all further
    modifications to job launch should be made there.
 -- Add new partition configuration parameter Priority. Add job count to 
    Shared parameter.
 -- Add new configuration parameters DefMemPerTask, MaxMemPerTask, and 
    SchedulerTimeSlice.
 -- In sched/wiki2, return REJMESSAGE with details on why a job was 
    requeued (e.g. what node failed).

* Changes in SLURM 1.3.0-pre3
=============================
 -- Remove slaunch command.
 -- Added srun option "--checkpoint=time" for a job step to be automatically
    checkpointed on a periodic basis.
 -- Change behavior of "scancel -s KILL <jobid>" to send SIGKILL to all job
    steps rather than cancelling the job. This now matches the behavior of
    all other signals. "scancel <jobid>" still cancels the job and all steps.
 -- Add support for new job step options --exclusive and --immediate. Permit
    job steps to be queued when resources within an existing job allocation
    are not available to be dedicated to the job step. Useful for executing
    simultaneous job steps. Provides resource management at the level of both
    jobs and job steps.
 -- Add support for feature count in job constraints, for example
    srun --nodes=16 --constraint=graphics*4 ...
    Based upon work by Kumar Krishna (HP, India).
 -- Add multi-core options to salloc and sbatch commands (sbatch.patch and
    cleanup.patch from Chris Holmes, HP).
 -- In select/cons_res, properly release resources allocated to a job being
    suspended (rmbreak.patch, from Chris Holmes, HP).
 -- Removed the database and jobacct plugins, replacing them with the
    jobacct_storage and jobacct_gather plugins to provide easier hooks for
    further expansion of job accounting.
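
The feature count syntax in job constraints ("graphics*4") can be sketched in C as a parser for one constraint term. This sketch assumes the count means that at least that many of the allocated nodes must have the feature. Under that reading, "--nodes=16 --constraint=graphics*4" requests 16 nodes, at least 4 of which have "graphics". struct feature_req, parse_feature() and feature_count_fits() are hypothetical, and constraint lists with several terms or operators are not handled.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of a single term such as "graphics*4". */
struct feature_req {
    char name[64];
    int count;   /* 0 means no count was given */
};

/* Split "name[*count]" into its parts. Returns 0 on success, -1 if malformed. */
int parse_feature(const char *spec, struct feature_req *req)
{
    const char *star = strchr(spec, '*');
    size_t name_len = star ? (size_t)(star - spec) : strlen(spec);

    if (name_len == 0 || name_len >= sizeof(req->name))
        return -1;
    memcpy(req->name, spec, name_len);
    req->name[name_len] = '\0';
    req->count = 0;

    if (star) {
        char *end;
        long c;
        if (!isdigit((unsigned char)star[1]))
            return -1;            /* "name*" with no count */
        c = strtol(star + 1, &end, 10);
        if (*end != '\0' || c <= 0 || c > INT_MAX)
            return -1;            /* trailing junk or out of range */
        req->count = (int)c;
    }
    return 0;
}

/* The feature count cannot exceed the number of nodes requested. */
int feature_count_fits(const struct feature_req *req, int nodes)
{
    return req->count <= nodes;
}
```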

* Changes in SLURM 1.3.0-pre2
=============================
 -- Added new srun option --pty to start job with a pseudo-terminal attached
    to task 0 (all other tasks have I/O discarded).
 -- Prevent users from specifying a job ID when sched/wiki2 is configured
    (needed for Moab releases until early 2007).
 -- Report command, args and working directory for batch jobs with 
    "scontrol show job".

* Changes in SLURM 1.3.0-pre1
=============================
 -- !!! SRUN CHANGES !!!
    The srun options -A/--allocate, -b/--batch, and -a/--attach have been
    removed!  That functionality is now available in the separate commands
    salloc, sbatch, and sattach, respectively.
 -- Add new node state FAILING plus a trigger for when a node enters that state.
 -- Add new configuration parameter "PrivateData". This can be used to
    prevent a user from seeing jobs or job steps belonging to other users.
 -- Added configuration parameters for node power save mode: ResumeProgram,
    ResumeRate, SuspendExcNodes, SuspendExcParts, SuspendProgram and
    SuspendRate.
 -- Slurmctld maintains the IP address (rather than hostname) for srun 
    communications. This fixes some possible network routing issues.
 -- Added global database plugin. Job accounting and job completion are the
    first to use it. Follow the documentation to add more to the plugin.
 -- Removed no-longer-needed jobacct/common/common_slurmctld.c since that is
    replaced by the database plugin.
 -- Added new configuration parameter: CryptoType.
    Moved existing digital signature logic into new plugin: crypto/openssl.
    Added new support for crypto/munge (available with GPL license).

* Changes in SLURM 1.2.21
=========================
 -- Fixed torque wrappers to look in the correct spot for the Perl API.
 -- Do not treat user resetting his time limit to the current value as
    an error.
 -- Set correct executable names for Totalview when --multi-prog option 
    is used and more than one node is allocated to the job step.
 -- When a batch job gets requeued, record in the accounting logs that
    the job was cancelled; the requeued job's submit time will be
    set to the time of its requeue so it looks like a different job.
 -- Prevent communication problems if the slurmd/slurmstepd have a 
    different JobAcct plugin configured than slurmctld.
 -- Added Gold plugin for job accounting.
 -- In sched/wiki2, add support for MODIFYJOB option "JOBNAME=<name>"
    to modify a job's name.
 -- Add configuration check for sys/syslog.h and include it as needed.

* Changes in SLURM 1.2.20
=========================
 -- In switch/federation, fix small memory leak affecting slurmd.
 -- Add PMI_FANOUT_OFF_HOST environment variable to control how message 
    forwarding is done for PMI (MPICH2). See "man srun" for details.
 -- From sbatch set SLURM_NTASKS_PER_NODE when --ntasks-per-node option is 
    specified.
 -- BLUEGENE: Documented that the prefix should always be lowercase and that
    the 3-digit suffix should be uppercase if any letters are used as digits.
 -- In sched/wiki and sched/wiki2, add support for --cpus-per-task option.
    From Miguel Ros, BSC.
 -- In sched/wiki2, prevent invalid memory pointer (and likely seg fault) 
    for job associated with a partition that has since been deleted.
 -- In sched/wiki2 plus select/cons_res, prevent invalid memory pointer 
    (and likely seg fault) when a job is requeued.
 -- In sched/wiki, add support for job suspend, resume, and modify.
 -- In sched/wiki, add support for processor allocation (not just node
    allocation) with layout control.
 -- Prevent re-sending job termination RPC to a node that has already completed 
    the job. Only send it to specific nodes which have not reported completion.
 -- Support larger environment variables: 64KB instead of BUFSIZ (8KB on
    some systems).
 -- If a job is being requeued, job step create requests will print a 
    warning and repeatedly retry rather than aborting.
 -- Add optional mode value to srun and sbatch --get-user-env option.
 -- Print error message and retry job submit commands when MaxJobCount 
    is reached. From Don Albert, Bull.
 -- Treat invalid begin time specification as a fatal error in sbatch and 
    srun. From Don Albert, Bull.
 -- Validate begin time specification to avoid hours >24, minutes >59, etc.
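
The begin time range check can be sketched in C as a validator for a time-of-day value. valid_time_of_day() is a hypothetical helper. It assumes a strict "HH:MM" or "HH:MM:SS" form with two-digit fields on a 24-hour clock (hours 0-23). Real --begin values may take other forms that this sketch does not cover.

```c
#include <ctype.h>
#include <stdbool.h>

/* Read exactly two digits from s into *out. */
static bool two_digits(const char *s, int *out)
{
    /* short-circuit keeps us from reading past a terminating NUL */
    if (!isdigit((unsigned char)s[0]) || !isdigit((unsigned char)s[1]))
        return false;
    *out = (s[0] - '0') * 10 + (s[1] - '0');
    return true;
}

/* Accept "HH:MM" or "HH:MM:SS"; reject hours > 23, minutes or seconds > 59. */
bool valid_time_of_day(const char *s)
{
    int hh, mm, ss = 0;

    if (!two_digits(s, &hh) || s[2] != ':' || !two_digits(s + 3, &mm))
        return false;
    if (s[5] == ':') {
        if (!two_digits(s + 6, &ss) || s[8] != '\0')
            return false;
    } else if (s[5] != '\0') {
        return false;
    }
    return hh <= 23 && mm <= 59 && ss <= 59;
}
```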

* Changes in SLURM 1.2.19
=========================
**** NOTE IMPORTANT CHANGE IN RPM BUILD BELOW ****
 -- slurm.spec file (used to build RPMs) was updated in order to support Mock, a
    chroot build environment. See https://hosted.fedoraproject.org/projects/mock/
    for more information. The following RPMs are no longer built by default:
    aix-federation, auth_none, authd, bluegene, sgijob, and switch-elan. Change
    the RPMs built using the following options in ~/.rpmmacros: "%_with_authd 1",
    "%_without_munge 1", etc. See the slurm.spec file for more details.
 -- Print warning if non-privileged user requests negative "--nice" value on
    job submission (srun, salloc, and sbatch commands).
 -- In sched/wiki and sched/wiki2, add support for srun's --ntasks-per-node 
    option.
 -- In select/bluegene with Groups defined for Images, fix possible memory 
    corruption. Other configurations are not affected. 
 -- BLUEGENE - Fix bug that prevented user specification of linux-image, 
    mloader-image, and ramdisk-image on job submission.
 -- BLUEGENE - filter Groups specified for image not just by submitting 
    user's current group, but all groups the user has access to.
 -- BLUEGENE - Add salloc options to specify images to be loaded (--blrts-image, 
    --linux-image, --mloader-image, and --ramdisk-image).
 -- BLUEGENE - In bluegene.conf, permit Groups to be comma separated in addition 
    to colon separators previously supported.
 -- sbatch will accept a batch script containing "#SLURM" options and advise
    that they be changed to "#SBATCH".
 -- If srun --output or --error specification contains a task number rather 
    than a file name, send stdout/err from specified task to srun's stdout/err
    rather than to a file by the same name as the task's number.
 -- For the srun --multi-prog option, verify the configuration file before
    attempting to launch tasks and report a clear explanation of any
    configuration file errors.
 -- For sched/wiki2, add optional timeout option to srun's --get-user-env
    parameter, change default timeout for "su - <user> env" from 3 to 8 seconds.
    On timeout, attempt to load env from file at StateSaveLocation/env_cache/<user>.
    The format of this file is the same as output of "env" command. If there
    is no env cache file, then abort the request.
 -- For completing jobs, squeue now removes nodes that have already
    completed the job before applying node filter logic.
 -- squeue formatted output option added for job comment, "%q" (the obvious 
    choices for letters are already in use).
 -- Added configure option --enable-load-env-no-login for use with Moab. If
    set then the user job runs with the environment built without a login
    ("su <user> env" rather than "su - <user> env").
 -- Fix output of "srun -o %C" (allocated CPU count) for running jobs. This was