tud-zih-energy / Slurm
Commit d3176332, authored 22 years ago by Moe Jette
Added some clarifications and a "get key" call for partition control.
It still needs some work, but is getting close.
Parent: b6d1e845
1 changed file: doc/txt/message.summary.txt (30 additions, 15 deletions)
@@ -10,35 +10,45 @@ Command(s): Get job information, separate commands for
 accounting, node, partition, job step and build info
 Client: squeue and scontrol commands, plus DPCS from API, any node in cluster
 Server: slurmctld
-Input: time-stamp, version
-flags : might be useful for filtering data sent, e.g. just this user's jobs
+Input: time-stamp, version, user id
 Output: error code, version, time-stamp, record count, array of records
-Notes:
+Notes: most information generally available, some might be restricted by user id
+Command(s): Get key
+Client: API call (used by DPCS)
+Server: slurmctld
+Input: uid (must be root)
+Output: key
+Notes: used to control access to some partitions. for example, any user
+can run jobs in the "batch" partition, but only when initiated by
+a batch controller (e.g. DPCS). this prevents users from running
+jobs outside of the queue structure
 Command(s): Allocate
 Client: srun or slurm api call
 Server: slurmctld
-Input: username/uid,nnodes,ntasks,cpus_per_task,distribution
-optional: partition,time_limit,constraints,features
-flags : wait_for_resources
-Output: jobid, return code, error code, node list, ncpus/node
+Input: username/uid,nnodes,ntasks, group
+optional: partition,time_limit,constraints,features,node list, key
+flags : wait_for_resources, test only (don't allocate resources,
+just reply whether or not allocate would have succeeded,
+used by DPCS)
+Output: jobid, return code, error code, node list, ncpus for *each* node in list
 Notes: allocate resources to a ``job''
 Command(s): Submit
 Client: srun or slurm api call
 Server: slurmctld
 Input: Allocate input + script path, environment, cwd
 optional: partition, time_limit, constraints, features,
-I/O location, signal handling
+I/O location, signal handling, key
 flags:
 Output: jobid, return code, error code
 Notes: submit a batch job to the slurm queue
+Command(s): will job run inquiry
+Client: slurm api call (e.g. DPCS)
+Server: slurmctld
+Input: like Allocate
+Output: error code, version, job_id, node list
+Notes:
 Command(s): Run Job Step
 Client: srun or slurm api call
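This commit adds a root-only "Get key" call and an optional `key` field on Allocate and Submit, so that a partition such as "batch" accepts only jobs whose request carries the key given to a batch controller like DPCS; it also adds a test-only flag to Allocate. The C sketch below shows one way slurmctld might enforce both. Everything in it is an assumption for illustration: the names `partition_rec`, `alloc_request`, `get_key` and `allocate`, the single controller-wide key, the error codes and the free-node counter are hypothetical and are not SLURM's actual data structures or API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical return codes used only in this sketch. */
enum { ALLOC_OK = 0, ALLOC_EPERM, ALLOC_EKEY, ALLOC_ENODES };

/* Hypothetical partition record; "batch" would set key_required. */
struct partition_rec {
    const char *name;
    bool key_required;   /* only jobs presenting the key may run here */
    int free_nodes;      /* nodes currently available in the partition */
};

/* Hypothetical subset of the Allocate request fields named in the diff. */
struct alloc_request {
    uid_t uid;
    int nnodes;
    uint32_t key;        /* optional; 0 means no key was supplied */
    bool test_only;      /* only report whether allocation would succeed */
};

/* Assumption: one controller-wide, nonzero key chosen at startup. */
static const uint32_t controller_key = 0x5eedu;

/* "Get key": only root (uid 0) may obtain the key. */
int get_key(uid_t caller, uint32_t *key_out)
{
    assert(key_out != NULL);
    if (caller != 0)
        return ALLOC_EPERM;
    *key_out = controller_key;
    return ALLOC_OK;
}

/* "Allocate": enforce the partition key, then test or really allocate. */
int allocate(struct partition_rec *part, const struct alloc_request *req)
{
    assert(part != NULL && req != NULL);
    if (part->key_required && req->key != controller_key)
        return ALLOC_EKEY;
    if (req->nnodes > part->free_nodes)
        return ALLOC_ENODES;
    if (!req->test_only)
        part->free_nodes -= req->nnodes;  /* test only: leave state as is */
    return ALLOC_OK;
}
```

With `test_only` set, `allocate()` reports whether the request would succeed without changing `free_nodes`, which is the kind of answer the Allocate test-only flag and the "will job run" inquiry are described as giving DPCS. A real controller would also return the node list and check time limits, constraints and features, all of which this sketch omits.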
@@ -53,6 +63,7 @@ Notes: run a set of parallel tasks under an allocated job
 allocate resources if jobid < MIN_JOBID, otherwise assume
 resources are already available
 Command(s): Job Resource Request
 Client: srun, scancel
 Server: slurmctld
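The Run Job Step notes state a simple rule: allocate resources if the job id is below `MIN_JOBID`, otherwise assume the job already holds them. A minimal C sketch of that decision follows. It assumes a value of 1 for `MIN_JOBID` (the document names the constant but does not give its value) and a hypothetical allocator callback; `step_needs_allocation`, `resolve_step_job` and `stand_in_allocate` are illustrative names only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: the document names MIN_JOBID but does not define it. */
#define MIN_JOBID 1u

/* Hypothetical allocator callback that returns a newly allocated job id. */
typedef uint32_t (*allocate_fn)(void);

/* Rule from the Run Job Step notes: a job id below MIN_JOBID means the
 * step has no allocation yet, so resources must be allocated first. */
bool step_needs_allocation(uint32_t jobid)
{
    return jobid < MIN_JOBID;
}

/* Return the job id the step should run under. */
uint32_t resolve_step_job(uint32_t jobid, allocate_fn alloc)
{
    if (step_needs_allocation(jobid))
        return alloc();   /* no existing allocation: request one */
    return jobid;         /* resources are already available */
}

/* Stand-in allocator for demonstration only: hands out 100, 101, ... */
static uint32_t next_jobid = 100;

uint32_t stand_in_allocate(void)
{
    return next_jobid++;
}
```

With this stand-in, a first call to `resolve_step_job(0, stand_in_allocate)` obtains job id 100, while an existing id such as 42 is passed through unchanged and no allocation is made.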
@@ -60,7 +71,8 @@ Input: stepid
 Output: return code, error code, node list, ncpus/node, credentials
 Notes: obtain a new set of credentials for a job. Needed for
 at least `srun --attach`
 Command(s): Run Job Request
 Client: srun or slurmctld
 Server: slurmd
@@ -78,6 +90,7 @@ Input: uid, jobid or stepid, signal no.
 Output: return code
 Notes:
 Command(s): Kill Job Request
 Client: srun or slurmctld (possibly scancel)
 Server: slurmd
@@ -86,6 +99,7 @@ Output: return code
 Notes: explicitly kill job as opposed to implicit job kill
 with a signal job request.
 Command(s): Job Attach Request
 Client: srun
 Server: slurmd
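The Kill Job Request note distinguishes an explicit kill from the implicit kill that results from a signal job request. The sketch below models that distinction in a hypothetical slurmd handler that records job state instead of signalling real processes. The message names, the `job_rec` fields, the return codes and the rule that only the job owner or root may act are assumptions, not taken from the document.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical message types for the two slurmd requests. */
enum msg_type { REQ_SIGNAL_JOB, REQ_KILL_JOB };

/* Hypothetical return codes used only in this sketch. */
enum { RC_OK = 0, RC_EPERM, RC_EDONE };

/* Hypothetical per-job state kept by slurmd. */
struct job_rec {
    uint32_t jobid;
    uid_t owner;
    int last_signal;   /* last signal delivered, 0 if none */
    bool terminated;
};

/* Assumption: only the job owner or root may signal or kill a job. */
static bool may_control(const struct job_rec *job, uid_t uid)
{
    return uid == 0 || uid == job->owner;
}

int handle_request(struct job_rec *job, enum msg_type type,
                   uid_t uid, int signo)
{
    if (!may_control(job, uid))
        return RC_EPERM;
    if (job->terminated)
        return RC_EDONE;
    switch (type) {
    case REQ_SIGNAL_JOB:
        /* Stand-in for delivering signo to every task of the job. */
        job->last_signal = signo;
        /* Implicit kill: SIGKILL cannot be caught, so the job ends. */
        if (signo == SIGKILL)
            job->terminated = true;
        break;
    case REQ_KILL_JOB:
        /* Explicit kill: the request needs no signal number. */
        job->terminated = true;
        break;
    }
    return RC_OK;
}
```

Keeping a separate kill request means a client does not have to choose a signal that is guaranteed to end the job; in this sketch only SIGKILL counts as an implicit kill, because it cannot be caught or ignored.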
@@ -96,6 +110,7 @@ Notes: srun process ``attaches'' to a currently running job. This
 request is used for srun recovery, or by a user who wants
 to interactively reattach to a batch job.
 Command(s): Cancel job or allocation
 Client: scancel user command, plus DPCS from API, any node in cluster
 Server: slurmctld
@@ -142,7 +157,7 @@ Client: DPCS API
 Server: slurmd daemon on the same node as DPCS API is executed
 Input: process id
 Output: SLURM job id
-Notes: until SLURM accounting is funcational, DPCS needs help figuring
+Notes: until SLURM accounting is fully funcational, DPCS needs help figuring
 out what processes are associated with each job
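The last hunk describes a slurmd request that maps a process id to a SLURM job id, because DPCS cannot yet rely on SLURM accounting to associate processes with jobs. One plausible approach, sketched below, is for slurmd to remember the pid of each task it launched and walk up the parent chain from the queried pid. The `task_rec` table, the `parent_fn` callback (on Linux it could read the parent pid from `/proc/<pid>/stat`), the depth limit, the use of 0 for "no job" and the `demo_parent` stand-in are all assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical record slurmd keeps for each task it launched. */
struct task_rec {
    pid_t pid;
    uint32_t jobid;
};

/* Hypothetical parent-pid lookup; returns 0 when the parent is unknown. */
typedef pid_t (*parent_fn)(pid_t pid);

/* Assumed bound on how far up the process tree to walk. */
#define MAX_DEPTH 64

/* Return the job id owning pid, or 0 if pid does not descend from any
 * task slurmd launched. Walking up the parent chain attributes processes
 * forked by a task to that task's job. */
uint32_t pid_to_jobid(const struct task_rec *tasks, size_t ntasks,
                      pid_t pid, parent_fn parent_of)
{
    for (int depth = 0; depth < MAX_DEPTH && pid > 1; depth++) {
        for (size_t i = 0; i < ntasks; i++)
            if (tasks[i].pid == pid)
                return tasks[i].jobid;
        pid = parent_of(pid);  /* stops at init (1) or unknown (0) */
    }
    return 0;
}

/* Stand-in parent table for demonstration only: 300 -> 200 -> 100 -> 1. */
pid_t demo_parent(pid_t pid)
{
    switch (pid) {
    case 300: return 200;
    case 200: return 100;
    case 100: return 1;
    default:  return 0;
    }
}
```

With a single task record `{ 200, 42 }`, querying pid 300 (a child of the task) returns job 42, while pid 100 (the task's parent) returns 0 because it is not under any launched task.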