Commit
f5f13a68
authored
16 years ago
by
Moe Jette
add draft scheduling policy document
parent
346042d9
doc/html/sched_policy.shtml
+109
-0
109 additions, 0 deletions
<!--#include virtual="header.txt"-->
<h1>Scheduling Policy</h1>
<p>SLURM scheduling policy support was significantly changed
in version 1.3 in order to take advantage of the database
integration used for storing accounting information.
This document describes the capabilities available in
SLURM version 1.3.4.
New features are under active development.
Familiarity with SLURM's <a href="accounting">Accounting</a> web page
is strongly recommended before use of this document.</p>
<h2>Configuration</h2>
<p>Scheduling policy information must be stored in a database
as specified by the <b>AccountingStorageType</b> configuration parameter
in the <b>slurm.conf</b> configuration file.
Information can be recorded in either <a href="http://www.mysql.com/">MySQL</a>
or <a href="http://www.postgresql.org/">PostgreSQL</a>.
For security and performance reasons, the use of
SlurmDBD (SLURM Database Daemon) as a front-end to the
database is strongly recommended.
SlurmDBD uses a SLURM authentication plugin (e.g. Munge).
SlurmDBD also uses an existing SLURM accounting storage plugin
to maximize code reuse.
SlurmDBD uses data caching and prioritization of pending requests
in order to optimize performance.
While SlurmDBD relies upon existing SLURM plugins for authentication
and database use, the other SLURM commands and daemons are not required
on the host where SlurmDBD is installed.
Only the <i>slurmdbd</i> and <i>slurm-plugins</i> RPMs are required
for SlurmDBD execution.</p>
<p>Both accounting and scheduling policy are configured based upon
an <i>association</i>. An <i>association</i> is a 4-tuple consisting
of the cluster name, bank account, user and (optionally) the SLURM
partition.
In order to enforce scheduling policy, set the value of
<b>AccountingStorageEnforce</b> to "1" in <b>slurm.conf</b>.
This prevents users from running any jobs without a valid
<i>association</i> record in the database and enforces scheduling
policy limits that have been configured.</p>
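<p>The association lookup described above can be sketched as follows.
This is only an illustrative model, not SLURM's implementation: the
<i>Association</i> class, the <i>find_association</i> and <i>may_run</i>
helpers, and the cluster, account and user names are all hypothetical.
It also assumes that a record without a partition matches jobs in
any partition.</p>

```python
from typing import NamedTuple, Optional


class Association(NamedTuple):
    """Hypothetical in-memory model of an association record."""
    cluster: str
    account: str
    user: str
    partition: Optional[str] = None  # the partition is optional


def find_association(records, cluster, account, user, partition=None):
    """Return the matching record, or None if there is no valid association.

    Assumption: a record without a partition matches any partition.
    """
    for rec in records:
        if (rec.cluster, rec.account, rec.user) != (cluster, account, user):
            continue
        if rec.partition is None or rec.partition == partition:
            return rec
    return None


def may_run(records, enforce, cluster, account, user, partition=None):
    """With enforcement enabled, a job needs a valid association to run."""
    if not enforce:
        return True
    return find_association(records, cluster, account, user, partition) is not None


# Hypothetical records: alice may use any partition, brian only "debug".
records = [
    Association("tux", "physics", "alice"),
    Association("tux", "physics", "brian", "debug"),
]
```

<p>With these records and enforcement enabled, a job from <i>brian</i>
in a partition other than <i>debug</i> has no valid association and
would not be allowed to run.</p>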
<h2>Tools</h2>
<p>The tool used to manage accounting policy is <i>sacctmgr</i>.
It can be used to create and delete cluster, user, bank account,
and partition records plus their combined <i>association</i> record.
See <i>man sacctmgr</i> for details on this tool and examples of
its use.</p>
<p>A web interface with graphical output is currently under development.</p>
<p>Changes made to the scheduling policy are uploaded to
the SLURM control daemons on the various clusters and take effect
immediately. When an association is deleted, all running or
pending jobs that belong to that association are immediately cancelled.
When limits are lowered, running jobs will not be cancelled to
satisfy the new limits, but the new lower limits will be enforced.</p>
<h2>Policies supported</h2>
<p>A limited subset of scheduling policy options is currently
supported.
The available options are expected to increase as development
continues.
Most of these scheduling policy options are available not only
for an association, but also for each cluster and account.
If a new association is created for some user and a scheduling
policy option is not specified, the default is the value set
for the cluster plus account pair. If that is not specified,
the value set for the cluster is used, and if that is not
specified either, no limit applies.</p>
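<p>The fallback order for unspecified options can be sketched as a
simple lookup chain. This is a minimal illustration, not SLURM code:
the <i>resolve_limit</i> function, the dictionaries and the limit values
are hypothetical, and <i>None</i> is used here to mean "no limit".</p>

```python
def resolve_limit(option, user_assoc, account_defaults, cluster_defaults):
    """Return the effective value of `option`, or None when no limit applies.

    Lookup order follows the text: the association itself, then the
    cluster plus account pair, then the cluster.
    """
    for level in (user_assoc, account_defaults, cluster_defaults):
        value = level.get(option)
        if value is not None:
            return value
    return None


# Hypothetical settings at each level of the hierarchy.
cluster_defaults = {"MaxJobs": 20, "MaxNodes": 64}
account_defaults = {"MaxNodes": 32}
user_assoc = {"MaxJobs": 5}

effective = {
    option: resolve_limit(option, user_assoc, account_defaults, cluster_defaults)
    for option in ("MaxJobs", "MaxNodes", "MaxWall")
}
```

<p>Here <i>MaxJobs</i> comes from the association (5), <i>MaxNodes</i>
from the cluster plus account pair (32), and <i>MaxWall</i> is set
nowhere, so no limit applies.</p>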
<p>Currently available scheduling policy options:</p>
<ul>
<li><b>MaxJobs</b> Maximum number of running jobs for this association</li>
<li><b>MaxNodes</b> Maximum number of nodes for any single job in this association</li>
<li><b>MaxWall</b> Maximum wall clock time limit for any single job in this association</li>
</ul>
<p>The <b>MaxNodes</b> and <b>MaxWall</b> options already exist in
SLURM's configuration on a per-partition basis, but these options
provide the ability to establish limits on a per-user basis.
The <b>MaxJobs</b> option provides an entirely new mechanism
for SLURM to control the workload any individual may place on
a cluster in order to achieve some balance between users.</p>
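<p>How the three limits might be checked for a new job can be sketched
as below. This is an assumption-laden illustration rather than SLURM's
logic: the <i>violated_limits</i> function is hypothetical, <i>MaxWall</i>
is taken to be in minutes, and the sketch only reports which limits
would be exceeded without deciding whether such a job is rejected or
left pending.</p>

```python
def violated_limits(limits, running_jobs, job_nodes, job_wall_minutes):
    """Return the names of the limits a new job would exceed.

    `limits` maps option names to integers; a missing key or a None
    value means no limit. Assumption: MaxWall is expressed in minutes.
    """
    checks = (
        ("MaxJobs", running_jobs + 1),   # job count if this job started
        ("MaxNodes", job_nodes),         # nodes requested by this job
        ("MaxWall", job_wall_minutes),   # time limit requested by this job
    )
    return [
        name
        for name, requested in checks
        if limits.get(name) is not None and requested > limits[name]
    ]


# Hypothetical limits for one association.
limits = {"MaxJobs": 2, "MaxNodes": 16, "MaxWall": 60}
```

<p>For example, with two jobs already running, a third job exceeds
<i>MaxJobs</i> even if its node count and time limit are acceptable.</p>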
<p>The next scheduling policy expected to be added is fair-share
scheduling based upon the hierarchical bank account
data that is already maintained in the SLURM database.
The priorities of pending jobs will be adjusted in order to
deliver resources in proportion to each association's fair-share.
Consider the trivial example of a single bank account with
two users named Alice and Brian.
We might allocate Alice 60 percent of the resources and Brian the
remaining 40 percent.
If Alice has actually used 80 percent of available resources in the
recent past, then Brian's pending jobs will automatically be given a
higher priority than Alice's in order to deliver resources in
proportion to the fair-share target.
The time window considered in fair-share scheduling will be configurable,
as will the relative importance of job age (time waiting to run),
but this example illustrates the concepts involved.</p>
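<p>The Alice and Brian example can be worked through numerically.
The sketch below is not the planned SLURM algorithm, whose formula is
not specified here. It assumes users are ranked by recent usage divided
by target share, that Brian used the remaining 20 percent, and that
every target share is greater than zero. The <i>fair_share_order</i>
function is hypothetical.</p>

```python
def fair_share_order(targets, usage):
    """Order users from most under-served to most over-served.

    Assumption: the ranking key is recent usage divided by target share.
    A ratio below 1 means the user received less than the target.
    """
    return sorted(targets, key=lambda user: usage.get(user, 0.0) / targets[user])


targets = {"alice": 0.60, "brian": 0.40}   # allocated shares from the example
usage = {"alice": 0.80, "brian": 0.20}     # brian's 20 percent is assumed

order = fair_share_order(targets, usage)   # first entry gets higher priority
```

<p>Alice's ratio is above 1 and Brian's is below 1, so Brian is
listed first, matching the outcome described above.</p>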
<p style="text-align: center;">Last modified 24 June 2008</p>
</ul></body></html>