tud-zih-energy / Slurm
Commit 5b2f74b9, authored 7 years ago by Morris Jette
scancel of pack job leader signals all pack job components
Parent: dffaeaad
Showing 2 changed files with 44 additions and 3 deletions:
doc/html/heterogeneous_jobs.shtml (22 additions, 3 deletions)
src/slurmctld/job_mgr.c (22 additions, 0 deletions)
doc/html/heterogeneous_jobs.shtml (+22, -3)
@@ -41,7 +41,7 @@ unique <i>job_id</i>.</li>
 <li><i>pack_job_id</i>: This identification number applies to all components
 of the heterogeneous job. All components of the same job will have the same
 <i>pack_job_id</i> value and it will be equal to the <i>job_id</i> of the
-first component.</li>
+first component. We refer to this as the "pack leader".</li>
 <li><i>pack_job_id_set</i>: Regular expression identifying all <i>job_id</i>
 values associated with the job.</li>
 <li><i>pack_job_offset</i>: A unique sequence number applied to each component
@@ -72,8 +72,8 @@ For example "123+4" would represent heterogeneous job id 123 and it's fifth
 component (note: the first component has a <i>pack_job_offset</i>value of 0).</p>
 <p>A request for a specific job ID that identifes a ID of the first component
-of a heterogenous job will return information about
-all pack job components. For example:</p>
+of a heterogenous job (i.e. the "pack leader" will return information about
+all components of that job. For example:</p>
 <pre>
 $ squeue --job=93
 JOBID PARTITION NAME USER ST TIME NODES NODELIST
@@ -82,6 +82,25 @@ JOBID PARTITION NAME USER ST TIME NODES NODELIST
 93+2 debug bash adam R 18:18 1 nid00021
 </pre>
+<p>A request to cancel or otherwise signal a pack leader will be applied to
+all components of that pack job. A request to cancel a specific component of
+the pack job using the "#+#" notation will apply on to that specific component.
+For example:</p>
+<pre>
+$ squeue --job=93
+JOBID PARTITION NAME USER ST TIME NODES NODELIST
+93+0 debug bash adam R 19:18 1 nid00001
+93+1 debug bash adam R 19:18 1 nid00011
+93+2 debug bash adam R 19:18 1 nid00021
+$ scancel 93+1
+$ squeue --job=93
+JOBID PARTITION NAME USER ST TIME NODES NODELIST
+93+0 debug bash adam R 19:38 1 nid00001
+93+2 debug bash adam R 19:38 1 nid00021
+$ squeue --job=93
+JOBID PARTITION NAME USER ST TIME NODES NODELIST
+</pre>
 <h2><a name="limitations">Limitations</a></h2>
 <p>In a federation of clusters, a heterogeneous job will execute entirely on
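
Note on the "#+#" notation documented above: the added paragraph distinguishes a bare job ID (the pack leader, which addresses every component) from an ID with a "+offset" suffix (one component only). The following is a minimal sketch of how such a string could be split into its two parts. It is an illustration only, not the parser Slurm uses; `struct pack_target` and `parse_pack_target` are hypothetical names introduced here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type for this sketch; not a Slurm structure. */
struct pack_target {
	uint32_t pack_job_id;	/* job_id of the pack leader */
	uint32_t pack_offset;	/* component index, meaningful only if has_offset */
	bool has_offset;	/* true for "#+#", false for a bare "#" */
};

/*
 * Parse "93" or "93+1" into a pack_target.
 * Returns 0 on success and -1 on malformed input.
 * On failure the contents of *out are unspecified.
 */
static int parse_pack_target(const char *s, struct pack_target *out)
{
	char *end;

	/* Require a leading digit so strtoul() cannot accept a sign or spaces */
	if (!s || !out || s[0] < '0' || s[0] > '9')
		return -1;

	out->pack_job_id = (uint32_t) strtoul(s, &end, 10);
	out->pack_offset = 0;
	out->has_offset = false;

	if (end[0] == '\0')
		return 0;	/* bare job ID: addresses the whole pack job */

	/* Anything after the ID must be '+' followed by digits only */
	if (end[0] != '+' || end[1] < '0' || end[1] > '9')
		return -1;

	out->pack_offset = (uint32_t) strtoul(end + 1, &end, 10);
	if (end[0] != '\0')
		return -1;

	out->has_offset = true;
	return 0;
}
```

Under this sketch, "93" yields `has_offset == false`, so a caller would signal all components, while "93+1" yields offset 1, so only that component would be signaled. Range checking of very large numbers is deliberately left out.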
src/slurmctld/job_mgr.c (+22, -0)
@@ -4785,6 +4785,24 @@ extern int job_signal(uint32_t job_id, uint16_t signal, uint16_t flags,
 	return _job_signal(job_ptr, signal, flags, uid, preempt);
 }
+/* Signal all components of a pack job */
+static int _pack_job_signal(struct job_record *job_ptr, uint16_t signal,
+			    uint16_t flags, uid_t uid, bool preempt)
+{
+	ListIterator iter;
+	int rc = SLURM_SUCCESS, rc1;
+	struct job_record *pack_job_ptr;
+	iter = list_iterator_create(job_ptr->pack_job_list);
+	while ((pack_job_ptr = (struct job_record *) list_next(iter))) {
+		rc1 = _job_signal(pack_job_ptr, signal, flags, uid, preempt);
+		rc = MAX(rc, rc1);
+	}
+	list_iterator_destroy(iter);
+	return rc;
+}
 /*
  * job_str_signal - signal the specified job
  * IN job_id_str - id of the job to be signaled, valid formats include "#"
@@ -4830,6 +4848,10 @@ extern int job_str_signal(char *job_id_str, uint16_t signal, uint16_t flags,
 	int jobs_done = 0, jobs_signalled = 0;
 	struct job_record *job_ptr_done = NULL;
 	job_ptr = find_job_record(job_id);
+	if (job_ptr && job_ptr->pack_job_list) {
+		return _pack_job_signal(job_ptr, signal, flags, uid,
+					preempt);
+	}
 	if (job_ptr && (job_ptr->array_task_id == NO_VAL) &&
 	    (job_ptr->array_recs == NULL)) {
 		/* This is a regular job, not a job array */
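
Note on the return-code handling in `_pack_job_signal`: the loop signals every component even if an earlier one fails, and reports the largest code seen via `MAX(rc, rc1)`. The standalone sketch below reproduces only that aggregation pattern so it can be compiled and tried outside slurmctld. `struct component`, `signal_one`, and `signal_all` are hypothetical stand-ins, with 0 playing the role of `SLURM_SUCCESS`; none of them are Slurm APIs.

```c
#include <stddef.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical stand-in for one pack job component; not struct job_record. */
struct component {
	int job_id;
	int signal_rc;	/* code the simulated signal call will return */
};

/* Stand-in for _job_signal(): returns the preset code for this component. */
static int signal_one(const struct component *c, int sig)
{
	(void) sig;	/* the signal number does not matter in this sketch */
	return c->signal_rc;
}

/*
 * Signal every component, continuing after failures, and return the
 * largest return code observed (0 if all calls returned 0 or n == 0).
 */
static int signal_all(const struct component *comps, size_t n, int sig)
{
	int rc = 0;	/* 0 stands in for SLURM_SUCCESS */

	for (size_t i = 0; i < n; i++) {
		int rc1 = signal_one(&comps[i], sig);
		rc = MAX(rc, rc1);
	}
	return rc;
}
```

One consequence of this pattern worth keeping in mind when reviewing: because the accumulator starts at 0, any negative code returned by a component is masked by the MAX, so only positive codes propagate to the caller. Whether that matters here depends on which codes `_job_signal` can actually return, which this diff does not show.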