tud-zih-energy / Slurm
Commit 5da3d363
authored 22 years ago by Moe Jette

Updated job info in admin.guide.html

parent d2336ded

Showing 1 changed file: doc/html/admin.guide.html, with 56 additions and 28 deletions
@@ -99,10 +99,6 @@ For example, if the configuration for NodeName=charlie immediately
 follows the configuration for NodeName=baker they will be
 considered adjacent in the computer.
-<dt>CPUs
-<dd>Number of processors on the node (e.g. "2").
-The default value is 1.
 <dt>Feature
 <dd>A comma delimited list of arbitrary strings indicative of some
 characteristic associated with the node.
@@ -115,7 +111,11 @@ By default a node has no features.
 <dd>Size of real memory on the node in MegaBytes (e.g. "2048").
 The default value is 1.
-<dt>State
+<dt>Procs
+<dd>Number of processors on the node (e.g. "2").
+The default value is 1.
+<dt>State
 <dd>State of the node with respect to the initiation of user jobs.
 Acceptable values are "DOWN", "UNKNOWN", "IDLE", and "DRAINING".
 The <a href="#NodeStates">node states</a> are fully described below.
@@ -145,7 +145,7 @@ utilization, responsiveness and capability. It would be
 preferable to allocate smaller memory nodes rather than larger
 memory nodes if either will satisfy a job's requirements.
 The units of weight are arbitrary, but larger weights
-should be assigned to nodes with more CPUs, memory,
+should be assigned to nodes with more processors, memory,
 disk space, higher processor speed, etc.
 Weight is an integer value with a default value of 1.
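The weight rule in this hunk can be illustrated with a minimal Python sketch. It is not SLURM's implementation: the `Node` class and `pick_nodes` function are hypothetical names, and the only behavior assumed is what the guide text states, namely that among nodes able to satisfy a job, those with lower weights are preferred.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node record; field names mirror the guide's keywords."""
    name: str
    procs: int
    real_memory: int  # megabytes
    weight: int = 1   # the guide gives 1 as the default weight


def pick_nodes(nodes, count, min_procs=1, min_memory=1):
    """Return `count` eligible nodes, preferring the lowest weights.

    Returns an empty list when fewer than `count` nodes qualify.
    """
    eligible = [
        n for n in nodes
        if n.procs >= min_procs and n.real_memory >= min_memory
    ]
    if len(eligible) < count:
        return []
    # sorted() is stable, so equally weighted nodes keep their listed order.
    return sorted(eligible, key=lambda n: n.weight)[:count]


nodes = [
    Node("lx8001", procs=32, real_memory=4096, weight=40),
    Node("lx0003", procs=16, real_memory=2048, weight=16),
    Node("lx0004", procs=16, real_memory=2048, weight=16),
]
chosen = pick_nodes(nodes, count=2, min_procs=16, min_memory=2048)
print([n.name for n in chosen])
```

With the sample values above, the two 16-processor nodes are chosen ahead of the larger node, which stays free for jobs that actually need it.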
@@ -163,7 +163,7 @@ scheduling process by permitting it to compare job requirements
 against these (relatively few) configuration parameters and
 possibly avoid having to perform checks of job requirements
 against every individual node's configuration.
-The resources checked at node registration time are: CPUs,
+The resources checked at node registration time are: Procs,
 RealMemory and TmpDisk.
 While baseline values for each of these can be established
 in the configuration file, the actual values upon node
@@ -254,8 +254,8 @@ A sample SLURM configuration file (node information only) follows.
 #
 NodeName=DEFAULT TmpDisk=16384 State=IDLE
 NodeName=lx[0001-0002] State=DRAINED
-NodeName=lx[0003-8000] CPUs=16 RealMemory=2048 Weight=16
-NodeName=lx[8001-9999] CPUs=32 RealMemory=4096 Weight=40 Feature=1200MHz,VizTools
+NodeName=lx[0003-8000] Procs=16 RealMemory=2048 Weight=16
+NodeName=lx[8001-9999] Procs=32 RealMemory=4096 Weight=40 Feature=1200MHz,VizTools
 </pre>
 <p>
 The partition configuration permits you to establish different job
@@ -362,11 +362,8 @@ scontrol administration tool to modify job state information:
 <dl>
 <dt>Contiguous
-<dd>If this keyword is present, the nodes allocated to the job must
-be contiguous.
-<dt>CPUCount
-<dd>Minimum total number of CPUs to be allocated to the job.
+<dd>Determine if the nodes allocated to the job must be contiguous.
+Acceptable values are "YES" and "NO" with the default being "NO".
 <dt>Features
 <dd>Required features of nodes to be allocated to this job.
@@ -386,9 +383,17 @@ subset of nodes accessing a single parallel file system.
 This might be specified with a specification of
 "Features=[PFS1|PFS2|PFS3|PFS4]".
-<dt>Group
+<dt>Groups
 <dd>Comma separated list of group names to which the user belongs.
+<dt>JobName
+<dd>Name to be associated with the job.
+<dt>JobId
+<dd>Identification for the job. By default this is the partition's
+name, followed by a period, followed by a sequence number (e.g.
+"batch.1234").
 <dt>Key
 <dd>Key granted to user root for optional access control to partitions.
 <dt>Name
@@ -399,8 +404,8 @@ This name can be specified by users when submitting their jobs.
...
@@ -399,8 +404,8 @@ This name can be specified by users when submitting their jobs.
<dd>
Maximum wall-time limit for the job in minutes. An "UNLIMITED"
<dd>
Maximum wall-time limit for the job in minutes. An "UNLIMITED"
value is represented internally as -1.
value is represented internally as -1.
<dt>
Min
CPU
s
<dt>
Min
Proc
s
<dd>
Minimum number of
CPU
s per node.
<dd>
Minimum number of
processor
s per node.
<dt>
MinRealMemory
<dt>
MinRealMemory
<dd>
Minimum number of megabytes of real memory per node.
<dd>
Minimum number of megabytes of real memory per node.
@@ -408,11 +413,8 @@ value is represented internally as -1.
 <dt>MinTmpDisk
 <dd>Minimum number of megabytes of temporary disk storage per node.
-<dt>NodeCount
-<dd>Minimum total number of nodes to be allocated to the job.
-<dt>NodeList
-<dd>Comma separated list of nodes which are allocated to the job.
+<dt>ReqNodes
+<dd>Comma separated list of nodes which must be allocated to the job.
 The nodes may be specified using regular expressions (e.g.
 "lx[0010-0020],lx[0033-0040]").
 This value may not be changed by scontrol.
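The bracketed node-range notation used for ReqNodes can be made concrete with a short Python sketch. This is an illustration only, not SLURM's parser: `expand_nodes` is a hypothetical helper, and it assumes each comma-separated item is either a plain name or a prefix followed by a single zero-padded `[low-high]` range, as in the guide's example.

```python
import re

# One item: a prefix, optionally followed by a single "[low-high]" range.
_ITEM = re.compile(r"([^,\[\]]+)(?:\[(\d+)-(\d+)\])?")


def expand_nodes(spec):
    """Expand e.g. 'lx[0010-0012],lx0030' into a list of node names."""
    names = []
    for item in spec.split(","):
        match = _ITEM.fullmatch(item.strip())
        if match is None:
            raise ValueError(f"unsupported node expression: {item!r}")
        prefix, low, high = match.groups()
        if low is None:
            # Plain node name with no range.
            names.append(prefix)
            continue
        width = len(low)  # keep the zero padding of the lower bound
        for index in range(int(low), int(high) + 1):
            names.append(f"{prefix}{index:0{width}d}")
    return names


print(expand_nodes("lx[0010-0012],lx0030"))
```

Under these assumptions the guide's own example, "lx[0010-0020],lx[0033-0040]", expands to 19 node names. Ranges containing commas or multiple bracket groups are deliberately rejected rather than guessed at.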
@@ -424,14 +426,34 @@ may not be changed by scontrol.
 <dt>Partition
 <dd>Name of the partition in which this job should execute.
-<dt>State
-<dd>State of the job. Possible values are "PENDING", "STARTING",
-"RUNNING", and "ENDING".
+<dt>Priority
+<dd>Floating point priority of the pending job. The value may
+be specified by user root initiated jobs, otherwise SLURM will
+select a value. Generally, higher priority jobs will be initiated
+before lower priority jobs. Backfill scheduling will permit
+lower priority jobs to be initiated before higher priority jobs
+only if doing so will not delay the anticipated initiation time
+of the higher priority job.
+<dt>Script
+<dd>Pathname of the script to be executed for the job.
+The script will typically contain "srun" commands to initiate
+the parallel commands.
 <dt>Shared
 <dd>Job can share nodes with other jobs. Possible values are 1
 and 0 for YES and NO respectively.
+<dt>State
+<dd>State of the job. Possible values are "PENDING", "STARTING",
+"RUNNING", and "ENDING".
+<dt>TotalNodes
+<dd>Minimum total number of nodes to be allocated to the job.
+<dt>TotalProcs
+<dd>Minimum total number of processors to be allocated to the job.
 <dt>User
 <dd>Name of the user executing this job.
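The backfill condition described under Priority can be sketched as a single predicate. This is a deliberately conservative simplification and not SLURM's scheduler: `may_backfill` and all of its parameters are hypothetical, times are in minutes, and the sketch assumes the lower priority job would occupy nodes that the higher priority job is waiting for.

```python
def may_backfill(now, candidate_runtime, candidate_nodes, free_nodes,
                 high_priority_start):
    """Return True if a lower priority job may start ahead of a higher one.

    The candidate must fit in the currently free nodes and must finish
    no later than the anticipated initiation time of the higher priority
    job, so that job is not delayed.
    """
    fits_now = candidate_nodes <= free_nodes
    finishes_in_time = now + candidate_runtime <= high_priority_start
    return fits_now and finishes_in_time


# A 30 minute job on 4 of 8 free nodes, with the higher priority job
# expected to start at minute 60, can be backfilled.
print(may_backfill(0, 30, 4, 8, 60))
# A 90 minute job would still be running at minute 60, so it cannot.
print(may_backfill(0, 90, 4, 8, 60))
```

A real scheduler could also admit a long-running candidate if it uses only nodes the higher priority job will never need; that refinement is omitted here.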
@@ -468,6 +490,12 @@ This can be used to remove temporary files created by the job or other clean-up.
 This file must be accessible to every SLURM compute server.
 By default there is no epilog program.
+<dt>FAST_SCHEDULE
+<dd>SLURM will only check the job's memory, processor, and disk
+constraints against the configuration file entries if set. If set,
+the specific values of each node will not be tested and scheduling
+will be considerably faster for large clusters.
 <dt>HASH_BASE
 <dd>SLURM uses a hash table in order to locate table entries rapidly.
 Each table entry can be directly accessed without any searching
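The effect of FAST_SCHEDULE can be sketched in Python as a choice of which values a job's constraints are compared against. This is an assumption-laden illustration, not SLURM code: `node_satisfies` is a hypothetical function, the dictionaries stand in for configuration file entries and for the values a node reports at registration, and the job keys reuse the MinProcs, MinRealMemory and MinTmpDisk names from this guide.

```python
# Job constraint keyword -> node resource keyword (names from the guide).
CONSTRAINTS = {
    "MinProcs": "Procs",
    "MinRealMemory": "RealMemory",
    "MinTmpDisk": "TmpDisk",
}


def node_satisfies(job, configured, registered, fast_schedule):
    """Check a job's per-node minimums against one node.

    With fast_schedule set, only the configuration file entry is
    consulted; otherwise the node's own registered values are tested.
    """
    values = configured if fast_schedule else registered
    return all(
        values[resource] >= job.get(constraint, 0)
        for constraint, resource in CONSTRAINTS.items()
    )


job = {"MinProcs": 16, "MinRealMemory": 2048}
configured = {"Procs": 16, "RealMemory": 2048, "TmpDisk": 16384}
# Hypothetical node that registered with less memory than configured.
registered = {"Procs": 16, "RealMemory": 1024, "TmpDisk": 16384}

print(node_satisfies(job, configured, registered, fast_schedule=True))
print(node_satisfies(job, configured, registered, fast_schedule=False))
```

The two calls differ only in the flag: the fast path accepts the node on the strength of its configuration entry, while the per-node path rejects it because the registered memory is too small. In a real cluster the fast path can evaluate one configuration record for many nodes at once, which is where the speedup for large clusters would come from.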
@@ -607,8 +635,8 @@ BackupController=lx0002
 #
 NodeName=DEFAULT TmpDisk=16384 State=IDLE
 NodeName=lx[0001-0002] State=DRAINED
-NodeName=lx[0003-8000] CPUs=16 RealMemory=2048 Weight=16
-NodeName=lx[8001-9999] CPUs=32 RealMemory=4096 Weight=40 Feature=1200MHz
+NodeName=lx[0003-8000] Procs=16 RealMemory=2048 Weight=16
+NodeName=lx[8001-9999] Procs=32 RealMemory=4096 Weight=40 Feature=1200MHz
 #
 # Partition Configurations
 #
@@ -631,7 +659,7 @@ Remove node lx0030 from service, removing jobs as needed:
 scontrol: show job 1234
 Job 1234 not found
 scontrol: show node lx0030
-Name=lx0030 Partition=class State=DRAINED CPUs=16 RealMemory=2048 TmpDisk=16384
+Name=lx0030 Partition=class State=DRAINED Procs=16 RealMemory=2048 TmpDisk=16384
 scontrol: quit
 </pre>