This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.

* Changes in Slurm 16.05.0pre2
==============================
 -- Split partition's "Priority" field into "PriorityTier" (used to order
    partitions for scheduling and preemption) plus "PriorityJobFactor" (used by
    priority/multifactor plugin in calculating job priority, which is used to
    order jobs within a partition for scheduling).
 -- Revert call to getaddrinfo (introduced in Slurm 16.05.0pre1), which was
    failing on some systems, restoring gethostbyaddr.
 -- knl_cray.conf - Added AllowMCDRAM, AllowNUMA and AllowUserBoot
    configuration options.
 -- Add node_features_p_user_update() function to node_features plugin.
 -- Don't print Weight=1 lines in 'scontrol write config' (it's the default).
 -- Remove PARAMS macro from slurm.h.
 -- Remove BEGIN_C_DECLS and END_C_DECLS macros.
 -- Check that PowerSave mode is configured for node_features/knl_cray plugin.
    It is required to reconfigure and reboot nodes.
 -- Update documentation to reflect new cgroup default location change from
    /cgroup to /sys/fs/cgroup.
 -- If NodeHealthCheckProgram is configured and HealthCheckInterval is non-zero,
    then modify slurmd to run it before registering with slurmctld.
 -- Fix for tasks being packed onto cores when the requested --cpus-per-task is
    greater than the number of threads on a core and --ntasks-per-core is 1.
 -- Make it so jobs/steps track ':' named gres/tres. Previously gres/gpu:tesla
    would only track gres/gpu; now it will track both gres/gpu and
    gres/gpu:tesla as separate gres if configured like
    AccountingStorageTRES=gres/gpu,gres/gpu:tesla
 -- Added new job dependency type of "aftercorr" which will start a task of a
    job array after the corresponding task of another job array completes.
 -- Increase default MaxTasksPerNode configuration parameter from 128 to 512.
 -- Enable sbcast data compression logic (compress option previously ignored).
 -- Add --compress option to srun command for use with --bcast option.
 -- Add TCPTimeout option to slurm[dbd].conf. Decouples MessageTimeout from TCP
    connections.
 -- Don't call primary controller for every RPC when backup is in control.
 -- Add --gres-flags=enforce-binding option to salloc, sbatch and srun commands.
    If set, the only CPUs available to the job will be those bound to the
    selected GRES (i.e. the CPUs identified in the gres.conf file will be
    strictly enforced rather than advisory).
 -- Change how a node's allocated CPU count is calculated to avoid double
    counting CPUs allocated to multiple jobs at the same time.
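
The ':' named gres/tres tracking entry above can be illustrated with a small
sketch. This is not Slurm's implementation; the function name tracked_tres()
and the assumption that a request is written as "name" or "name:type" (with
no count) are hypothetical and used only for illustration.

```python
def tracked_tres(gres_request, accounting_storage_tres):
    """Return the configured TRES names a GRES request counts toward.

    gres_request: hypothetical form "name" or "name:type", e.g. "gpu:tesla".
    accounting_storage_tres: the comma-separated AccountingStorageTRES value.
    """
    configured = [t.strip() for t in accounting_storage_tres.split(",")
                  if t.strip()]
    name = gres_request.split(":", 1)[0]
    # The generic GRES (gres/gpu) is always a candidate.
    candidates = ["gres/" + name]
    # A typed request (gpu:tesla) is also a candidate under its full name.
    if ":" in gres_request:
        candidates.append("gres/" + gres_request)
    # Only TRES listed in AccountingStorageTRES are tracked.
    return [c for c in candidates if c in configured]


print(tracked_tres("gpu:tesla", "gres/gpu,gres/gpu:tesla"))
```

With only gres/gpu configured, the same request would count toward gres/gpu
alone in this sketch.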

* Changes in Slurm 16.05.0pre1
==============================
 -- Add sbatch "--wait" option that waits for job completion before exiting.
    Exit code will match that of spawned job.
 -- Modify advanced reservation save/restore logic for core reservations to
    support configuration changes (changes in configured node or core counts).
 -- Allow ControlMachine, BackupController, DbdHost and DbdBackupHost to be
    either short or long hostname.
 -- Job output and error files can now contain "%" character by specifying
    a file name with two consecutive "%" characters. For example,
    "sbatch -o slurm.%%.%j" for job ID 123 will generate an output file named
    "slurm.%.123".
 -- Pass user name in Prolog RPC from controller to slurmd when using
    PrologFlags=Alloc. Allows SLURM_JOB_USER env variable to be set when using
    Native Slurm on a Cray.
 -- Add "NumTasks" to job information visible to Slurm commands.
 -- Add mail wrapper script "smail" that will include job statistics in email
    notification messages.
 -- Remove vestigial "SICP" job option (inter-cluster job option). Completely
    different logic will be forthcoming.
 -- Fix case where the primary and backup dbds would both be performing rollup.
 -- Add an ack reply from slurmd to slurmstepd when job setup is done and the
    job is ready to be executed.
 -- Removed support for authd. authd has not been developed or supported for
    several years.
 -- Introduce a new parameter requeue_setup_env_fail in SchedulerParameters.
    A job that fails to set up the environment will be requeued and the node
    drained.
 -- Add ValidateTimeout and OtherTimeout to "scontrol show burst" output.
 -- Increase default sbcast buffer size from 512KB to 8MB.
 -- Enable the hdf5 profiling of the batch step.
 -- Eliminate redundant environment and script files for job arrays.
 -- Stop searching sbatch scripts for #PBS directives after 100 lines of
    non-comments. Stop parsing #PBS or #SLURM directives after 1024 characters
    into a line. Required for decent performance with huge scripts.
 -- Add debug flag for timing Cray portions of the code.
 -- Remove all *.la files from RPMs.
 -- Add Multi-Category Security (MCS) infrastructure to permit nodes to be bound
    to specific users or groups.
 -- Install the pmi2 unix sockets in slurmd spool directory instead of /tmp.
 -- Implement the getaddrinfo and getnameinfo instead of gethostbyaddr and
    gethostbyname.
 -- Finished PMIx implementation.
 -- Implemented the --without=package option for configure.
 -- Fix sshare to show each individual cluster with -M,--clusters option.
 -- Added --deadline option to salloc, sbatch and srun. Jobs which can not be
    completed by the user specified deadline will be terminated with a state of
    "Deadline" or "DL".
 -- Implemented and documented PMIX protocol which is used to bootstrap an
    MPI job. PMIX is an alternative to PMI and PMI2.
 -- Change default CgroupMountpoint (in cgroup.conf) from "/cgroup" to
    "/sys/fs/cgroup" to match current standard.
 -- Add #BSUB options to sbatch to read in from the batch script.
 -- HDF: Change group name of node from nodename to nodeid.
 -- The partition-specific SelectTypeParameters parameter can now be used to
    change the memory allocation tracking specification in the global
    SelectTypeParameters configuration parameter. Supported partition-specific
    values are CR_Core, CR_Core_Memory, CR_Socket and CR_Socket_Memory. If the
    global SelectTypeParameters value includes memory allocation management and
    the partition-specific value does not, then memory allocation management for
    that partition will NOT be supported (i.e. memory can be over-allocated).
    Likewise the global SelectTypeParameters might not include memory management
    while the partition-specific value does.
 -- Burst buffer/cray - Add support for multiple buffer pools including support
    for different resource granularity by pool.
 -- Burst buffer advanced reservation units treated as bytes (per documentation)
    rather than GB.
 -- Add an "scontrol top <jobid>" command to re-order the priorities of a user's
    pending jobs. May be disabled with the "disable_user_top" option in the
    SchedulerParameters configuration parameter.
 -- Modify sview to display negative job nice values.
 -- Increase job's nice value field from 16 to 32 bits.
 -- Remove deprecated job_submit/cnode plugin.
 -- Enhance slurm.conf option EnforcePartLimit to include options like "ANY" and
    "ALL".  "Any" is equivalent to "Yes" and "All" will check all partitions
    a job is submitted to and if any partition limit is violated the job will
    be rejected even if it could possibly run on another partition.
 -- Add "features_act" field (currently active features) to the node
    information. Output of scontrol, sinfo, and sview changed accordingly.
    The field previously displayed as "Features" is now "AvailableFeatures"
    while the new field is displayed as "ActiveFeatures".
 -- Remove Sun Constellation, IBM Federation Switches (replaced by NRT switch
    plugin) and long-defunct Quadrics Elan support.
 -- Add -M<clusters> option to sreport.
 -- Rework group caching to work better in environments with
    enumeration disabled. Removed CacheGroups config directive, group
    membership lists are now always cached, controlled by
    GroupUpdateTime parameter. GroupUpdateForce parameter default
    value changed to 1.
 -- Add reservation flag of "purge_comp" which will purge an advanced
    reservation once it has no more active (pending, suspended or running) jobs.
 -- Add new configuration parameter "KNLPlugins" and plugin infrastructure.
 -- Add optional job "features" to node reboot RPC.
 -- Add slurmd "-b" option to report node rebooted at daemon start time. Used
    for testing purposes.
 -- contribs/cray: Add framework for powering nodes up and down.
 -- For job constraint, convert comma separator to "&".
 -- Add Max*PerAccount options for QOS.
 -- Protect slurm_mutex_* calls with abort() on failure.
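
The "%%" escape for job output and error file names listed among the changes
above can be sketched as follows. This is a minimal illustration, not the
sbatch code: the function name expand_output_name() is hypothetical, and only
the "%%" and "%j" patterns mentioned in that entry are handled.

```python
import re


def expand_output_name(pattern, job_id):
    """Expand "%%" to a literal "%" and "%j" to the job ID."""
    def replace(match):
        code = match.group(1)
        if code == "%":
            return "%"
        if code == "j":
            return str(job_id)
        # Any other pattern is left untouched in this sketch.
        return match.group(0)

    # Scan left to right so "%%" is consumed as one unit before "%j".
    return re.sub(r"%(.)", replace, pattern)


print(expand_output_name("slurm.%%.%j", 123))  # slurm.%.123
```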

* Changes in Slurm 15.08.9
==========================
 -- BurstBuffer/cray - Defer job cancellation or time limit while "pre-run"
    operation in progress to avoid inconsistent state due to multiple calls
    to job termination functions.
 -- Fix issue with resizing jobs and limits not being tracked correctly.
 -- BGQ - Remove redeclaration of job_read_lock.
 -- BGQ - Tighter locks around structures when nodes/cables change state.
 -- Make it possible to change CPUsPerTask with scontrol.
 -- Make it so scontrol update part qos= will take away a partition QOS from
    a partition.
 -- Fix issue where SocketsPerBoard didn't translate to Sockets when CPUS=
    was also given.
 -- Add note to slurm.conf man page about setting "--cpu_bind=no" as part
    of SallocDefaultCommand if a TaskPlugin is in use.
 -- Set correct reason when a QOS' MaxTresMins is violated.
 -- Ensure that a job is completely launched before trying to suspend it.
 -- Remove historical presentations and design notes. Only distribute
    maintained doc/html and doc/man directories.
 -- Remove duplicate xmalloc() in task/cgroup plugin.
 -- Backfill scheduler to validate correct job partition for job submitted to
    multiple partitions.
 -- Force close on exec on first 256 file descriptors when launching a
    slurmstepd to close potential open ones.
 -- Step GRES value changed from type "int" to "int64_t" to support larger
    values.
 -- Fix getting reservations to database when database is down.
 -- Fix issue with sbcast not doing a correct fanout.
 -- Fix issue where steps weren't always getting the gres/tres involved.
 -- Fixed double read lock on getting job's gres/tres.
 -- Fix display for RoutePlugin parameter to display the correct value.
 -- Fix route/topology plugin to prevent segfault in sbcast when in use.
 -- Fix Cray slurmconfgen_smw.py script to use nid as nid, not nic.
 -- Fix Cray NHC spawning on job requeue. Previous logic would leave nodes
    allocated to a requeued job as non-usable on job termination.
 -- burst_buffer/cray plugin: Prevent a requeued job from being restarted while
    file stage-out is still in progress. Previous logic could restart the job
    and not perform a new stage-in.
 -- Fix job array formatting to return the [0-100:2] display for arrays with
    step functions rather than [0,2,4,6,8,...].
 -- FreeBSD - replace Linux-specific set_oom_adj to avoid errors in slurmd log.
 -- Add option for TopologyParam=NoInAddrAnyCtld to make the slurmctld listen
    on only one port like TopologyParam=NoInAddrAny does for everything else.
 -- Fix burst buffer plugin to prevent corruption of the CPU TRES data when bb
    is not set as an AccountingStorageTRES type.
 -- Suppress error messages in acct_gather_energy/ipmi plugin after repeated
    failures.
 -- Change burst buffer use completion email message from
    "SLURM Job_id=1360353 Name=tmp Staged Out, StageOut time 00:01:47" to
    "SLURM Job_id=1360353 Name=tmp StageOut/Teardown time 00:01:47"
 -- Generate burst buffer use completion email immediately after teardown
    completes rather than at job purge time (likely minutes later).
 -- Fix issue when adding a new TRES to AccountingStorageTRES for the first
    time.
 -- Update gang scheduling tables when job manually suspended or resumed. Prior
    logic could mess up job suspend/resume sequencing.
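
The job array display change listed above ([0-100:2] rather than
[0,2,4,6,8,...]) can be illustrated with a short sketch. It is not Slurm's
formatting code: format_array_ids() is a hypothetical helper that only
collapses a list of task IDs when the whole list forms a single evenly
spaced sequence.

```python
def format_array_ids(task_ids):
    """Format task IDs as [first-last:step] when they are evenly spaced."""
    ids = sorted(set(task_ids))
    if len(ids) >= 3:
        step = ids[1] - ids[0]
        # Every neighbouring pair must be separated by the same step.
        if all(b - a == step for a, b in zip(ids, ids[1:])):
            if step == 1:
                return "[%d-%d]" % (ids[0], ids[-1])
            return "[%d-%d:%d]" % (ids[0], ids[-1], step)
    # Fall back to an explicit comma-separated list.
    return "[" + ",".join(str(i) for i in ids) + "]"


print(format_array_ids(range(0, 101, 2)))  # [0-100:2]
```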