Commit 5da3d363, authored 22 years ago by Moe Jette
Updated job info in admin.guide.html
Parent: d2336ded
doc/html/admin.guide.html (56 additions, 28 deletions)
...
...
@@ -99,10 +99,6 @@ For example, if the configuration for NodeName=charlie immediately
 follows the configuration for NodeName=baker they will be
 considered adjacent in the computer.
-<dt>CPUs
-<dd>Number of processors on the node (e.g. "2").
-The default value is 1.
 <dt>Feature
 <dd>A comma delimited list of arbitrary strings indicative of some
 characteristic associated with the node.
...
...
@@ -115,7 +111,11 @@ By default a node has no features.
 <dd>Size of real memory on the node in MegaBytes (e.g. "2048").
 The default value is 1.
-<dt>State
+<dt>Procs
+<dd>Number of processors on the node (e.g. "2").
+The default value is 1.
+<dt>State
 <dd>State of the node with respect to the initiation of user jobs.
 Acceptable values are "DOWN", "UNKNOWN", "IDLE", and "DRAINING".
 The <a href="#NodeStates">node states</a> are fully described below.
...
...
@@ -145,7 +145,7 @@ utilization, responsiveness and capability. It would be
 preferable to allocate smaller memory nodes rather than larger
 memory nodes if either will satisfy a job's requirements.
 The units of weight are arbitrary, but larger weights
-should be assigned to nodes with more CPUs, memory,
+should be assigned to nodes with more processors, memory,
 disk space, higher processor speed, etc.
 Weight is an integer value with a default value of 1.
...
...
@@ -163,7 +163,7 @@ scheduling process by permitting it to compare job requirements
 against these (relatively few) configuration parameters and
 possibly avoid having to perform checks job requirements
 against every individual node's configuration.
-The resources checked at node registration time are: CPUs,
+The resources checked at node registration time are: Procs,
 RealMemory and TmpDisk.
 While baseline values for each of these can be established
 in the configuration file, the actual values upon node
...
...
@@ -254,8 +254,8 @@ A sample SLURM configuration file (node information only) follows.
 #
 NodeName=DEFAULT TmpDisk=16384 State=IDLE
 NodeName=lx[0001-0002] State=DRAINED
-NodeName=lx[0003-8000] CPUs=16 RealMemory=2048 Weight=16
-NodeName=lx[8001-9999] CPUs=32 RealMemory=4096 Weight=40 Feature=1200MHz,VizTools
+NodeName=lx[0003-8000] Procs=16 RealMemory=2048 Weight=16
+NodeName=lx[8001-9999] Procs=32 RealMemory=4096 Weight=40 Feature=1200MHz,VizTools
 </pre>
 <p>
 The partition configuration permits you to establish different job
...
...
@@ -362,11 +362,8 @@ scontrol administration tool to modify job state information:
 <dl>
 <dt>Contiguous
-<dd>If this keyword is present, the nodes allocated to the job must
-be contiguous.
-<dt>CPUCount
-<dd>Minimum total number of CPUs to be allocated to the job.
+<dd>Determine if the nodes allocated to the job must be contiguous.
+Acceptable values are "YES" and "NO" with the default being "NO".
 <dt>Features
 <dd>Required features of nodes to be allocated to this job.
...
...
@@ -386,9 +383,17 @@ subset of nodes assessing a single parallel file system.
 This might be specified with a specification of
 "Features=[PFS1|PFS2|PFS3|PFS4]".
-<dt>Group
+<dt>Groups
 <dd>Comma separated list of group names to which the user belongs.
+<dt>JobName
+<dd>Name to be associated with the job
+<dt>JobId
+<dd>Identification for the job. By default this is the partition's
+name, followed by a period, followed by a sequence number (e.g.
+"batch.1234").
 <dt>Key
 <dd>Key granted to user root for optional access control to partitions.
 <dt>Name
...
...
@@ -399,8 +404,8 @@ This name can be specified by users when submitting their jobs.
 <dd>Maximum wall-time limit for the job in minutes. An "UNLIMITED"
 value is represented internally as -1.
-<dt>MinCPUs
-<dd>Minimum number of CPUs per node.
+<dt>MinProcs
+<dd>Minimum number of processors per node.
 <dt>MinRealMemory
 <dd>Minimum number of megabytes of real memory per node.
...
...
@@ -408,11 +413,8 @@ value is represented internally as -1.
 <dt>MinTmpDisk
 <dd>Minimum number of megabytes of temporary disk storage per node.
-<dt>NodeCount
-<dd>Minimum total number of nodes to be allocated to the job.
 <dt>NodeList
 <dd>Comma separated list of nodes which are allocated to the job.
 <dt>ReqNodes
 <dd>Comma separated list of nodes which must be allocated to the job.
 The nodes may be specified using regular expressions (e.g.
 "lx[0010-0020],lx[0033-0040]".
 This value may not be changed by scontrol.
...
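The ReqNodes entry above describes node lists such as "lx[0010-0020],lx[0033-0040]". As a rough illustration of how such an expression could be expanded into individual node names, here is a minimal Python sketch. It is not SLURM's own parser: the function name `expand_node_list` is hypothetical, and the sketch assumes each bracket holds a single zero-padded `low-high` range with no commas inside the brackets.

```python
import re

# Hypothetical helper, not part of SLURM: expands "prefix[low-high]" items.
_RANGE = re.compile(r"([^\[\]]+)\[(\d+)-(\d+)\]")

def expand_node_list(expr: str) -> list[str]:
    """Expand e.g. 'lx[0010-0012],lx0033' into individual node names."""
    names = []
    # Assumption: commas only separate items, never appear inside brackets.
    for part in expr.split(","):
        part = part.strip()
        match = _RANGE.fullmatch(part)
        if match is None:
            names.append(part)  # plain node name, no range to expand
            continue
        prefix, low, high = match.groups()
        width = len(low)  # keep the zero padding used in the expression
        names.extend(
            f"{prefix}{i:0{width}d}" for i in range(int(low), int(high) + 1)
        )
    return names

if __name__ == "__main__":
    print(expand_node_list("lx[0010-0012],lx0033"))
```

Under these assumptions, the sample entry `lx[0003-8000]` from the configuration file expands to 7998 names.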
...
@@ -424,14 +426,34 @@ may not be changed by scontrol.
 <dt>Partition
 <dd>Name of the partition in which this job should execute.
-<dt>State
-<dd>State of the job. Possible values are "PENDING", "STARTING",
-"RUNNING", and "ENDING".
+<dt>Priority
+<dd>Floating point priority of the pending job. The value may
+be specified by user root initiated jobs, otherwise SLURM will
+select a value. Generally, higher priority jobs will be initiated
+before lower priority jobs. Backfill scheduling will permit
+lower priority jobs to be initiated before higher priority jobs
+only if doing so will not delay the anticipated initiation time
+of the higher priority job .
+<dt>Script
+<dd>Pathname of the script to be executed for the job.
+The script will typically contain "srun" commands to initiate
+the parallel commands.
+<dt>Shared
+<dd>Job can share nodes with other jobs. Possible values are 1
+and 0 for YES and NO respectively.
+<dt>State
+<dd>State of the job. Possible values are "PENDING", "STARTING",
+"RUNNING", and "ENDING".
+<dt>TotalNodes
+<dd>Minimum total number of nodes to be allocated to the job.
+<dt>TotalProcs
+<dd>Minimum total number of processors to be allocated to the job.
 <dt>User
 <dd>Name of the user executing this job.
...
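The Priority entry above states the backfill rule: a lower priority job may start early only if that does not delay the anticipated start of a higher priority job. The Python sketch below illustrates that rule in its most conservative form. All names (`Job`, `can_backfill`, `reserved_start`) are hypothetical and not taken from SLURM, and the sketch assumes the higher priority job will need every node the candidate occupies.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    priority: float   # higher values are initiated first
    nodes: int        # nodes the job needs
    minutes: int      # wall-time limit in minutes

def can_backfill(candidate: Job, free_nodes: int, now: int, reserved_start: int) -> bool:
    """Return True if the candidate may start now without delaying the
    higher priority job whose anticipated start time is reserved_start.

    Conservative assumption: the higher priority job needs the same nodes,
    so the candidate must fit in the free nodes AND finish (by its
    wall-time limit) before reserved_start. Times are in minutes.
    """
    fits = candidate.nodes <= free_nodes
    ends_in_time = now + candidate.minutes <= reserved_start
    return fits and ends_in_time

if __name__ == "__main__":
    small = Job("small", priority=1.0, nodes=4, minutes=30)
    print(can_backfill(small, free_nodes=8, now=0, reserved_start=60))
```

A real scheduler could be less strict, for example by letting the candidate run past `reserved_start` on nodes the higher priority job does not need.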
...
@@ -468,6 +490,12 @@ This can be used to remove temporary files created by the job or other clean-up.
 This file must be accessible to every SLURM compute server.
 By default there is no epilog program.
+<dt>FAST_SCHEDULE
+<dd>SLURM will only check the job's memory, processor, and disk
+contraints against the configuration file entries if set. If set,
+the specific values of each node will not be tested and scheduling
+will be considerably faster for large clusters.
 <dt>HASH_BASE
 <dd>SLURM uses a hash table in order to locate table entries rapidly.
 Each table entry can be directly accessed without any searching
...
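The FAST_SCHEDULE entry above trades accuracy for speed: job constraints are compared against the few configuration file entries instead of each node's registered values. The following Python sketch shows the idea only. The names `NodeConfig`, `meets`, and `eligible_nodes` are hypothetical, and the data layout (a mapping from configuration entry to node names) is an assumption, not SLURM's internal structure.

```python
from typing import NamedTuple

class NodeConfig(NamedTuple):
    procs: int
    real_memory: int  # megabytes
    tmp_disk: int     # megabytes

def meets(cfg: NodeConfig, need: tuple) -> bool:
    """True if every resource in cfg is at least the requested minimum."""
    return all(have >= want for have, want in zip(cfg, need))

def eligible_nodes(config_groups, registered, fast_schedule,
                   min_procs=1, min_real_memory=1, min_tmp_disk=1):
    """config_groups: configuration entry -> node names sharing that entry.
    registered: node name -> values the node actually reported."""
    need = (min_procs, min_real_memory, min_tmp_disk)
    if fast_schedule:
        # One comparison per configuration file entry, not per node.
        return [n for cfg, nodes in config_groups.items()
                if meets(cfg, need) for n in nodes]
    # Otherwise test the specific values of each individual node.
    return [n for n, actual in registered.items() if meets(actual, need)]

if __name__ == "__main__":
    small = NodeConfig(16, 2048, 16384)
    groups = {small: ["lx0003", "lx0004"]}
    # lx0004 registered with less memory than configured.
    actual = {"lx0003": small, "lx0004": NodeConfig(16, 1024, 16384)}
    print(eligible_nodes(groups, actual, True, min_real_memory=2048))
    print(eligible_nodes(groups, actual, False, min_real_memory=2048))
```

In this sketch the fast path would still select a node that registered with less memory than its configuration entry claims, which is the trade-off the entry describes.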
...
@@ -607,8 +635,8 @@ BackupController=lx0002
 #
 NodeName=DEFAULT TmpDisk=16384 State=IDLE
 NodeName=lx[0001-0002] State=DRAINED
-NodeName=lx[0003-8000] CPUs=16 RealMemory=2048 Weight=16
-NodeName=lx[8001-9999] CPUs=32 RealMemory=4096 Weight=40 Feature=1200MHz
+NodeName=lx[0003-8000] Procs=16 RealMemory=2048 Weight=16
+NodeName=lx[8001-9999] Procs=32 RealMemory=4096 Weight=40 Feature=1200MHz
 #
 # Partition Configurations
 #
...
...
@@ -631,7 +659,7 @@ Remove node lx0030 from service, removing jobs as needed:
 scontrol: show job 1234
 Job 1234 not found
 scontrol: show node lx0030
-Name=lx0030 Partition=class State=DRAINED CPUs=16 RealMemory=2048 TmpDisk=16384
+Name=lx0030 Partition=class State=DRAINED Procs=16 RealMemory=2048 TmpDisk=16384
 scontrol: quit
 </pre>
...
...