tud-zih-energy / Slurm / Commits

Commit 541aa2b9, authored 10 years ago by Morris Jette

    Merge branch 'slurm-14.03'

Parents: d7d055cc 90a2ab79
Changes: 3 changed files, with 36 additions and 25 deletions

  NEWS                 +2   -0
  doc/html/faq.shtml   +21  -19
  src/squeue/print.c   +13  -6
NEWS  (+2 -0)

@@ -66,6 +66,8 @@ documents those changes that are of interest to users and admins.
  -- Fix segfault of sacct -c if spaces are in the variables.
  -- Release held job only with "scontrol release <jobid>" and not by resetting
     the job's priority. This is needed to support job arrays better.
+ -- Correct squeue command not to merge jobs with state pending and completing
+    together.
 * Changes in Slurm 14.03.1-2
 ==========================
doc/html/faq.shtml  (+21 -19)
@@ -178,13 +178,18 @@ launch a shell on a node in the job's allocation?</a></li>
 Free Open Source Software (FOSS) does not mean that it is without cost.
 It does mean that the you have access to the code so that you are free to
 use it, study it, and/or enhance it.
 These reasons contribute to Slurm (and FOSS in general) being subject to
 active research and development worldwide, displacing proprietary software
 in many environments.
 If the software is large and complex, like Slurm or the Linux kernel,
-then its use is not without cost.
+then while there is no license fee,
+its use is not without cost.
+</p>
+<p>
 If your work is important, you'll want the leading Slurm experts at your
 disposal to keep your systems operating at peak efficiency.
 While Slurm has a global development community incorporating leading edge
 technology, <a href="http://www.schedmd.com">SchedMD</a> personnel have developed
 most of the code and can provide competitively priced commercial support.
 SchedMD works with various organizations to provide a range of support
 options ranging from remote level-3 support to 24x7 on-site personnel.
 Customers switching from commercial workload mangers to Slurm typically
 report higher scalability, better performance and lower costs.</p>
@@ -629,13 +634,13 @@ or <b>--distribution</b>' is 'arbitrary'. This means you can tell slurm to
 layout your tasks in any fashion you want. For instance if I had an
 allocation of 2 nodes and wanted to run 4 tasks on the first node and
 1 task on the second and my nodes allocated from SLURM_NODELIST
-where tux[0-1] my srun line would look like this.<p>
-<i>srun -n5 -m arbitrary -w tux[0,0,0,0,1] hostname</i><p>
+where tux[0-1] my srun line would look like this:<br><br>
+<i>srun -n5 -m arbitrary -w tux[0,0,0,0,1] hostname</i><br><br>
 If I wanted something similar but wanted the third task to be on tux 1
-I could run this...<p>
-<i>srun -n5 -m arbitrary -w tux[0,0,1,0,0] hostname</i><p>
+I could run this:<br><br>
+<i>srun -n5 -m arbitrary -w tux[0,0,1,0,0] hostname</i><br><br>
 Here is a simple perl script named arbitrary.pl that can be ran to easily lay
-out tasks on nodes as they are in SLURM_NODELIST<p>
+out tasks on nodes as they are in SLURM_NODELIST.</p>
 <pre>
 #!/usr/bin/perl
 my @tasks = split(',', $ARGV[0]);
@@ -663,9 +668,9 @@ foreach my $task (@tasks) {
 print $layout;
 </pre>
-We can now use this script in our srun line in this fashion.<p>
-<i>srun -m arbitrary -n5 -w `arbitrary.pl 4,1` -l hostname</i><p>
+<p>
+We can now use this script in our srun line in this fashion.<br><br>
+<i>srun -m arbitrary -n5 -w `arbitrary.pl 4,1` -l hostname</i><br><br>
+<p>
 This will layout 4 tasks on the first node in the allocation and 1
 task on the second node.</p>
 <p><a name="hold"><b>21. How can I temporarily prevent a job from running
@@ -926,11 +931,10 @@ $ srun -p mic ./hello.mic
 <br>
 <p>
 Slurm supports requeue jobs in done or failed state. Use the
 command:
+</p>
 <p align=left><b>scontrol requeue job_id</b></p>
-</head>
-</p>
-The job will be requeued back in PENDING state and scheduled again.
+<p>The job will be requeued back in PENDING state and scheduled again.
 See man(1) scontrol.
 </p>
 <p>Consider a simple job like this:</p>
@@ -957,12 +961,10 @@ $->squeue
   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
      10      mira    zoppo    david  R       0:03      1 alanz1
 </pre>
-<p>
-Slurm supports requeuing jobs in hold state with the command:
+<p>Slurm supports requeuing jobs in hold state with the command:</p>
 <p align=left><b>'scontrol requeuehold job_id'</b></p>
-The job can be in state RUNNING, SUSPENDED, COMPLETED or FAILED
-before being requeued.
-</p>
+<p>The job can be in state RUNNING, SUSPENDED, COMPLETED or FAILED
+before being requeued.</p>
 <pre>
 $->scontrol requeuehold 10
 $->squeue
@@ -1929,6 +1931,6 @@ sacctmgr delete user name=adam cluster=tux account=chemistry
 <p class="footer"><a href="#top">top</a></p>
-<p style="text-align:center;">Last modified 25 April 2014</p>
+<p style="text-align:center;">Last modified 30 April 2014</p>
 <!--#include virtual="footer.txt"-->
src/squeue/print.c  (+13 -6)
@@ -157,16 +157,21 @@ static bool _merge_job_array(List l, job_info_t * job_ptr)
 		return merge;
 	if (!IS_JOB_PENDING(job_ptr))
 		return merge;
+	if (IS_JOB_COMPLETING(job_ptr))
+		return merge;
 	xfree(job_ptr->node_inx);
 	if (!l)
 		return merge;
 
 	iter = list_iterator_create(l);
 	while ((list_job_ptr = list_next(iter))) {
 		if ((list_job_ptr->array_task_id == NO_VAL) ||
 		    (job_ptr->array_job_id != list_job_ptr->array_job_id) ||
-		    (!IS_JOB_PENDING(list_job_ptr)))
+		    (!IS_JOB_PENDING(list_job_ptr)) ||
+		    (IS_JOB_COMPLETING(list_job_ptr)))
 			continue;
 		/* We re-purpose the job's node_inx array to store the
 		 * array_task_id values */
 		if (!list_job_ptr->node_inx) {
@@ -396,9 +401,11 @@ int _print_job_job_id(job_info_t * job, int width, bool right, char* suffix)
 {
 	if (job == NULL) {	/* Print the Header instead */
 		_print_str("JOBID", width, right, true);
 	} else if ((job->array_task_id != NO_VAL) && !params.array_flag &&
-		   IS_JOB_PENDING(job) && job->node_inx) {
+		   IS_JOB_PENDING(job) && job->node_inx &&
+		   (!IS_JOB_COMPLETING(job))) {
 		uint32_t i, local_width = width, max_task_id = 0;
 		char *id, *task_str;
 		bitstr_t *task_bits;