Commit ee5eb188, authored 16 years ago by Moe Jette
Start RELEASE_NOTES file and update news.html for slurm v1.4
parent e41110bd
Showing 3 changed files with 54 additions and 250 deletions:
  NEWS                 +2   -1
  RELEASE_NOTES        +36  -246
  doc/html/news.shtml  +16  -3
NEWS  +2 −1
@@ -16,7 +16,8 @@ documents those changes that are of interest to users and admins.
 -- Add core_bitmap_job to slurmctld's job step structure to identify
    which specific cores are allocated to the step.
 -- Add new configuration option OverTimeLimit to permit jobs to exceed
-   their (soft) time limits by a configurable amount.
+   their (soft) time limit by a configurable amount. Backfill scheduling
+   will be based upon the soft time limit.
 -- Remove select_g_get_job_cores(). That data is now within the slurmctld's
    job structure.
RELEASE_NOTES  +36 −246
-RELEASE NOTES FOR SLURM VERSION 1.3
-27 June 2008
+RELEASE NOTES FOR SLURM VERSION 1.4
+6 October 2008
IMPORTANT NOTE:
-SLURM state files in version 1.3 are different from those of version 1.2.
-After installing SLURM version 1.3, plan to restart without preserving
-jobs or other state information. While SLURM version 1.2 is still running,
+SLURM state files in version 1.4 are different from those of version 1.3.
+After installing SLURM version 1.4, plan to restart without preserving
+jobs or other state information. While SLURM version 1.3 is still running,
cancel all pending and running jobs (e.g.
"scancel --state=pending; scancel --state=running"). Then stop and restart
daemons with the "-c" option or use "/etc/init.d/slurm startclean".
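As an illustrative sketch of that clean-restart sequence (the init script
location and daemon options vary by site; this is not from the commit itself):
   scancel --state=pending
   scancel --state=running
   /etc/init.d/slurm stop
   /etc/init.d/slurm startclean   # starts the daemons as with the "-c" option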
There are substantial changes in the slurm.conf configuration file. It
is recommended that you rebuild your configuration file using the tool
doc/html/configurator.html that comes with the distribution. The node
information is unchanged and the partition information only changes for
the Shared and Priority parameters, so those portions of your old
slurm.conf file may be copied into the new file.
Two areas of substantial change are accounting and job scheduling.
Slurm is now able to save accounting information in a database,
either MySQL or PostGreSQL. We have written a new daemon, slurmdbd
(Slurm DataBase Daemon), to serve as a centralized data manager for
multiple Slurm clusters. A new tool sacctmgr is available to manage
user accounting information through SlurmDBD and a variety of
other tools are still under development to generate assorted
accounting reports including graphics and a web interface. Slurm
now supports gang scheduling (time-slicing of parallel jobs for
improved responsiveness and system utilization). Many related
scheduling changes have also been made.
There are changes in SLURM's RPMs. "slurm-auth-munge" was changed to
"slurm-munge" since it now contains the authentication and cryptographic
signature plugins for Munge. The SLURM plugins have been moved out of
the main "slurm" RPM to a new RPM called "slurm-plugins". There is a
new RPM called "slurm-slurmdbd" (SLURM DataBase Daemon). Slurmdbd is
used to provide a secure SLURM database interface for accounting purposes
(more information about that below). The "slurm-slurmdbd" RPM requires
the "slurm-plugins" RPM, but none of the other SLURM RPMs. The main
"slurm" RPM also requires the "slurm-plugins" RPM.
To archive accounting records in a database, database RPMs must be
installed where the SLURM RPMs are built and where the database is used.
You have a choice of database, either "mysql" plus "mysql-devel" or
"postgres" plus "postgres-devel" RPMs.
Many enhancements have been made for better Slurm integration with
Moab and Maui schedulers. Moab version 5.2.3 or higher should be
used with SLURM version 1.3. In the Moab configuration file, moab.cfg,
change the SUBMITCMD option from "srun --batch" to "sbatch" since the
"srun --batch" option is no longer valid (use of full pathnames to
the commands are recommended, e.g. "/usr/local/bin/sbatch").
Major changes in Slurm version 1.3 are described below. Some changes
made after the initial release of Slurm version 1.2 are also noted.
Many less significant changes are not identified here. A complete list
of changes can be found in the NEWS file. Man pages should be consulted
for more details about command and configuration parameter changes.
COMMAND CHANGES
===============
* The srun options --allocate, --attach and --batch have been removed.
Use the new commands added in SLURM version 1.2 for this functionality:
salloc - Create a job allocation (functions like "srun --allocate")
sattach - Attach to an existing job step (functions like "srun --attach")
sbatch - Submit a batch job script (functions like "srun --batch")
These commands generally have the same options as the srun command.
See the individual man pages for more information.
* The slaunch command has been removed. Use the srun command instead.
* The srun option --exclusive has been added for job steps to be
allocated processors not already assigned to other job steps. This
can be used to execute multiple job steps simultaneously within a
job allocation and have SLURM perform resource management for the
job steps much like it does for jobs. If dedicated resources are
not immediately available, the job step will be executed later
unless the --immediate option is also set (see the example after this list).
* Support is now provided for feature counts in job constraints. For
example: srun --nodes=16 --constraint=graphics*4 ...
* The srun option --pty has been added to start the job with a pseudo
terminal attached to task zero (all other tasks have I/O discarded).
* Job time limits can be specified using the following formats: min,
  min:sec, hour:min:sec, and days-hour:min:sec (formerly only minutes
  were supported).
* scontrol now shows job TimeLimit and partition MaxTime in the format of
[days-]hours:minutes:seconds or "UNLIMITED". The scontrol update options
for times now accept minutes, minutes:seconds, hours:minutes:seconds,
days-hours, days-hours:minutes, days-hours:minutes:seconds or "UNLIMITED".
This new format also applies to partition MaxTime in the slurm.conf file.
* scontrol "notify" command added to send message to stdout of srun for
specified job id.
* Support has been added for a much richer job dependency specification
including testing of exit codes and multiple dependencies.
* The srun options --checkpoint=<interval> and --checkpoint-path=<file_path>
have been added.
* Event trigger support was added in Slurm v1.2.2. The command strigger
was added to manage the triggers.
* Added a --task-mem option and removed --job-mem option from srun, salloc,
and sbatch commands. Memory limits are applied on a per-task basis.
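As a sketch of the --exclusive job step usage mentioned above (script and
program names are illustrative only, not from this commit):
   #!/bin/sh
   # Submitted with: sbatch --nodes=2 this_script.sh
   # Each step below is allocated dedicated processors within the job's
   # allocation, so the two steps run concurrently without sharing CPUs.
   srun --exclusive -n2 ./program_a &
   srun --exclusive -n2 ./program_b &
   wait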
SCHEDULING CHANGES
==================
* The sched/backfill plugin has been largely re-written. It now supports
select/cons_res and all job options (required nodes, excluded nodes,
contiguous, etc.).
* Added a new partition parameter, Priority. A job's scheduling priority is
  based upon two factors: first the priority of its partition, and second the
  job's own priority. Since nodes can be configured in multiple partitions,
  this can be used to configure high priority partitions (queues).
* The partition parameter Shared now has a job count. For example:
Shared=YES:4 (Up to 4 jobs may share each resource, user control)
Shared=FORCE:2 (Up to 2 jobs may share each resource, no user control)
* Added new parameters DefMemPerTask and MaxMemPerTask to control the default
and maximum memory per task. Any task that exceeds the specified size will
be terminated (enforcement requires job accounting to be enabled with a
non-zero value for JobAcctGatherFrequency).
* The select linear plugin (allocating whole nodes to jobs) can treat memory
as a consumable resource with SelectTypeParameter=CR_Memory configured.
* A new scheduler type, gang, was added for gang scheduling (time-slicing of
parallel jobs). Note: The Slurm gang scheduler is not compatible with the
LSF, Maui or Moab schedulers.
* The new parameter, SchedulerTimeSlice, controls the length of gang scheduler
  time slices (see the configuration sketch after this list).
* Added a new parameter, Licenses, to support cluster-wide consumable
resources. The --licenses option was also added to salloc, sbatch,
and srun.
* The Shared=exclusive option in conjunction with SelectType=select/cons_res
can be used to dedicate whole nodes to jobs in specific partitions while
allocating sockets, cores, or hyperthreads in other partitions.
* Changes in the interface with the Moab and Maui schedulers have been
  extensive, providing far better integration between the systems.
* Many more parameters are shared between the systems.
* A new wiki.conf parameter, ExcludePartitions, can be used to enable
Slurm-based scheduling of jobs in specific partitions to achieve
better responsiveness while losing Moab or Maui policy controls.
* Another new wiki.conf parameter, HidePartitionJobs, can be used to
  hide jobs in specific partitions from Moab or Maui as well. See
  the wiki.conf man pages for details.
* Moab relies upon Slurm to get a user's environment variables upon
job submission. If this cannot be accomplished within a few seconds
(see the GetEnvTimeout parameter) then cache files can be used. Use
contribs/env_cache_builder.c to build these cache files.
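A minimal slurm.conf sketch combining several of the scheduling parameters
above (node and partition names are illustrative, not from this commit):
   SchedulerType=sched/gang
   SchedulerTimeSlice=30     # seconds per gang scheduling time slice
   PartitionName=debug Nodes=tux[0-31] Priority=100 MaxTime=30 Shared=NO
   PartitionName=batch Nodes=tux[0-31] Priority=10 Shared=FORCE:2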
ACCOUNTING CHANGES
==================
* The job accounting plugin has been split into two components: gathering
of data and storing the data. The JobAcctType parameter has been replaced by
JobAcctGatherType (AIX or Linux) and AccountingStorageType (MySQL, PostGreSQL,
filetext, and SlurmDBD). Storing the accounting information into a database
will provide you with greater flexibility in managing the data (see the
configuration sketch after this list).
* A new daemon SlurmDBD (Slurm DataBase Daemon) has been added. This can
be used to securely manage the accounting data for several Slurm clusters
in a central location. Several new parameters have been added to support
SlurmDBD, all starting with SlurmDBD. Note that the SlurmDBD daemon is
designed to use a Slurm accounting storage plugin and currently supports
MySQL. It also uses existing Slurm authentication plugins.
* A new command, sacctmgr, is available for managing user accounts in
  SlurmDBD. This information is required for use of SlurmDBD
to manage job accounting data. Information is maintained based upon
an "association", which has four components: cluster name, Slurm partition,
user name and bank account. This tool can also be used to maintain
scheduling policy information that can be uploaded to Moab (various
resource limits and fair-share values). See the sacctmgr man page and
accounting web page for more information. Additional tools to generate
accounting reports are currently under development and will be released
soon.
* A new command, sreport, is available for generating accounting reports.
While the sacct command can be used to generate information about
individual jobs, sreport can combine this data to report utilization
information by cluster, bank account, user, etc.
* Job completion records can now be written to a MySQL or PostGreSQL
  database in addition to a text file, as controlled using the JobCompType
  parameter.
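A minimal slurm.conf sketch of the accounting setup described above (the
plugin names reflect the gather/storage split; treat this as an assumed
typical configuration, not text from the commit):
   JobAcctGatherType=jobacct_gather/linux
   JobAcctGatherFrequency=30                        # sample every 30 seconds
   AccountingStorageType=accounting_storage/slurmdbd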
OTHER CONFIGURATION CHANGES
===========================
* A new parameter, JobRequeue, controls the default job behavior after a node
  failure (requeue or kill the job). The sbatch --requeue option can be used
  to override the system default.
* Added new parameters HealthCheckInterval and HealthCheckProgram to
  automatically test the health of compute nodes (see the configuration
  sketch after this list).
* New parameters UnkillableStepProgram and UnkillableStepTimeout offer
  better control when user processes cannot be killed. For example,
  nodes can be automatically rebooted (added in Slurm v1.2.12).
* A new parameter, JobFileAppend, controls how to proceed when a job's
output or error file already exists (truncate the file or append to it,
added in slurm v1.2.13). Users can override this using the --open-mode
option when submitting a job.
* A new parameter, EnforcePartLimits, was added. If set, jobs that exceed a
  partition's size and/or time limits are rejected immediately rather than
  being queued to await a later change in the partition's limits. NOTE: Not
  reported by "scontrol show config" to avoid changing RPCs. It will be
  reported in SLURM version 1.4.
* Checkpoint plugins have been added for XLCH and OpenMPI.
* A new parameter, PrivateData, can be used to prevent users from being
able to view jobs or job steps belonging to other users.
* A new parameter, CryptoType, specifies the digital signature plugin to be
  used. Options are crypto/openssl (default) or crypto/munge (GPL licensed).
* Several Slurm MPI plugins were added to support srun launch of MPI tasks
including mpich1_p4 (Slurm v1.2.10) and mpich-mx (Slurm v1.2.11).
* Cpuset logic was added to the task/affinity plugin in Slurm v1.2.3.
Set TaskPluginParam=cpusets to enable.
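A slurm.conf sketch combining several of these parameters (all program
paths are illustrative placeholders):
   HealthCheckInterval=300                      # run the check every 5 minutes
   HealthCheckProgram=/usr/sbin/node_health     # illustrative path
   UnkillableStepTimeout=120
   UnkillableStepProgram=/usr/sbin/node_reboot  # illustrative path
   JobRequeue=1                                 # requeue jobs after node failure
   PrivateData=jobs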
OTHER CHANGES
=============
* Perl APIs and Torque wrappers for Torque/PBS to SLURM migration were
added in Slurm v1.2.13 in the contribs directory. SLURM now works
directly with Globus using the PBS GRAM.
* Support was added for several additional PMI functions to be used by
MPICH2 and MVAPICH2. Support for a PMI_TIME environment variable was
also added for users to control how PMI communications are spread out
in time. Scalability up to 16k tasks has been achieved.
* New node state FAILING has been added along with event trigger for it.
This is similar to DRAINING, but is intended for fault prediction work.
A trigger was also added for nodes becoming DRAINED.
HIGHLIGHTS
* Nodes can now be completely powered down when idle and automatically
  restarted when there is work available. For more information see:
https://computing.llnl.gov/linux/slurm/power_save.html
* SLURM has been modified to allocate specific cores to jobs and job steps in
order to effectively preempt or gang schedule jobs.
* A new configuration parameter, PrologSlurmctld, can be used to support the
booting of different operating systems for each job.
CONFIGURATION FILE CHANGES (see "man slurm.conf" for details)
* DefMemPerTask has been removed. Use DefMemPerCPU or DefMemPerNode instead.
* Added new node state of "FUTURE". These node records are created in SLURM
tables for future use without a reboot of the SLURM daemons, but are not
reported by any SLURM commands or APIs.
* Default AuthType is now "auth/munge" rather than auth/none.
* Default CryptoType is now "crypto/munge". OpenSSL is no longer required by
SLURM in the default configuration.
* CompleteTime has been added to control how long to wait for a job's
completion before allocating already released resources to pending jobs.
* OverTimeLimit added to permit jobs to exceed their (soft) time limit by a
configurable amount. Backfill scheduling will be based upon the soft time
limit.
* PrologSlurmctld has been added and can be used to boot nodes into a
  particular state for each job (see the configuration sketch after this list).
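A brief slurm.conf sketch of the new parameters (the values and the prolog
path are illustrative only):
   DefMemPerCPU=512                          # default real memory per CPU, in MB
   OverTimeLimit=5                           # soft time limit may be exceeded by 5 minutes
   PrologSlurmctld=/etc/slurm/prolog_ctld    # illustrative path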
COMMAND CHANGES (see man pages for details)
* --task-mem and --job-mem options have been removed from salloc, sbatch and
  srun. Use --mem-per-cpu or --mem instead, as shown below.
* --ctrl-comm-ifhn-addr option has been removed from the srun command.
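For example (an illustrative command line, not from this commit):
   srun --mem-per-cpu=512 -n16 ./a.out    # replaces the removed --task-mem option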
doc/html/news.shtml  +16 −3
@@ -7,7 +7,8 @@
 <li><a href="#11">SLURM Version 1.1, May 2006</a></li>
 <li><a href="#12">SLURM Version 1.2, February 2007</a></li>
 <li><a href="#13">SLURM Version 1.3, March 2008</a></li>
-<li><a href="#14">SLURM Version 1.4 and beyond</a></li>
+<li><a href="#14">SLURM Version 1.4, May 2009</a></li>
+<li><a href="#15">SLURM Version 1.5 and beyond</a></li>
 </ul>
<h2><a name="11">Major Updates in SLURM Version 1.1</a></h2>
@@ -86,7 +87,19 @@ spawned tasks.</li>
 including testing of exit codes and multiple dependencies.</li>
 </ul>
-<h2><a name="14">Major Updates in SLURM Version 1.4 and beyond</a></h2>
+<h2><a name="14">Major Updates in SLURM Version 1.4</a></h2>
+<p>SLURM Version 1.4 is scheduled for release in May 2009.
+Major enhancements include:
+<ul>
+<li>Nodes can now be completely powered down when idle and automatically
+restarted when there is work available.</li>
+<li>Specific cores are allocated to jobs and job steps in order to effectively
+preempt or gang schedule jobs.</li>
+<li>A new configuration parameter, <i>PrologSlurmctld</i>, can be used to
+support the booting of different operating systems for each job.</li>
+</ul>
+<h2><a name="15">Major Updates in SLURM Version 1.5 and beyond</a></h2>
<p> Detailed plans for release dates and contents of future SLURM releases have
not been finalized. Anyone desiring to perform SLURM development should notify
<a href="mailto:slurm-dev@lists.llnl.gov">slurm-dev@lists.llnl.gov</a>
@@ -97,6 +110,6 @@ to coordinate activities. Future development plans include:
 and refresh.</li>
 </ul>
-<p style="text-align:center;">Last modified 11 March 2008</p>
+<p style="text-align:center;">Last modified 6 October 2008</p>
<!--#include virtual="footer.txt"-->