Commit 541aa2b9 authored by Morris Jette
Merge branch 'slurm-14.03'

parents d7d055cc 90a2ab79
......@@ -66,6 +66,8 @@ documents those changes that are of interest to users and admins.
-- Fix segfault of sacct -c if spaces are in the variables.
-- Release held job only with "scontrol release <jobid>" and not by resetting
the job's priority. This is needed to support job arrays better.
-- Correct squeue command not to merge jobs with state pending and completing
together.
* Changes in Slurm 14.03.1-2
==========================
......
......@@ -178,13 +178,18 @@ launch a shell on a node in the job's allocation?</a></li>
Free Open Source Software (FOSS) does not mean that it is without cost.
It does mean that the you have access to the code so that you are free to
use it, study it, and/or enhance it.
These reasons contribute to Slurm (and FOSS in general) being subject to
active research and development worldwide, displacing proprietary software
in many environments.
If the software is large and complex, like Slurm or the Linux kernel,
then its use is not without cost.
If your work is important, you'll want the leading Slurm experts at your
then while there is no license fee, its use is not without cost.</p>
<p>If your work is important, you'll want the leading Slurm experts at your
disposal to keep your systems operating at peak efficiency.
While Slurm has a global development community incorporating leading edge
technology, <a href="http://www.schedmd.com">SchedMD</a> personnel have developed
most of the code and can provide competitively priced commercial support.
SchedMD works with various organizations to provide a range of support
options ranging from remote level-3 support to 24x7 on-site personnel.
Customers switching from commercial workload mangers to Slurm typically
report higher scalability, better performance and lower costs.</p>
......@@ -629,13 +634,13 @@ or <b>--distribution</b>' is 'arbitrary'. This means you can tell slurm to
layout your tasks in any fashion you want. For instance if I had an
allocation of 2 nodes and wanted to run 4 tasks on the first node and
1 task on the second and my nodes allocated from SLURM_NODELIST
where tux[0-1] my srun line would look like this.<p>
<i>srun -n5 -m arbitrary -w tux[0,0,0,0,1] hostname</i><p>
where tux[0-1] my srun line would look like this:<br><br>
<i>srun -n5 -m arbitrary -w tux[0,0,0,0,1] hostname</i><br><br>
If I wanted something similar but wanted the third task to be on tux 1
I could run this...<p>
<i>srun -n5 -m arbitrary -w tux[0,0,1,0,0] hostname</i><p>
I could run this:<br><br>
<i>srun -n5 -m arbitrary -w tux[0,0,1,0,0] hostname</i><br><br>
Here is a simple perl script named arbitrary.pl that can be ran to easily lay
out tasks on nodes as they are in SLURM_NODELIST<p>
out tasks on nodes as they are in SLURM_NODELIST.</p>
<pre>
#!/usr/bin/perl
my @tasks = split(',', $ARGV[0]);
......@@ -663,9 +668,9 @@ foreach my $task (@tasks) {
print $layout;
</pre>
We can now use this script in our srun line in this fashion.<p>
<i>srun -m arbitrary -n5 -w `arbitrary.pl 4,1` -l hostname</i><p>
This will layout 4 tasks on the first node in the allocation and 1
<p>We can now use this script in our srun line in this fashion.<br><br>
<i>srun -m arbitrary -n5 -w `arbitrary.pl 4,1` -l hostname</i><br><br>
<p>This will layout 4 tasks on the first node in the allocation and 1
task on the second node.</p>
<p><a name="hold"><b>21. How can I temporarily prevent a job from running
......@@ -926,11 +931,10 @@ $ srun -p mic ./hello.mic
<br>
<p>
Slurm supports requeue jobs in done or failed state. Use the
command:
command:</p>
<p align=left><b>scontrol requeue job_id</b></p>
</head>
</p>
The job will be requeued back in PENDING state and scheduled again.
<p>The job will be requeued back in PENDING state and scheduled again.
See man(1) scontrol.
</p>
<p>Consider a simple job like this:</p>
......@@ -957,12 +961,10 @@ $->squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
10 mira zoppo david R 0:03 1 alanz1
</pre>
<p>
Slurm supports requeuing jobs in hold state with the command:
<p>Slurm supports requeuing jobs in hold state with the command:</p>
<p align=left><b>'scontrol requeuehold job_id'</b></p>
The job can be in state RUNNING, SUSPENDED, COMPLETED or FAILED
before being requeued.
</p>
<p>The job can be in state RUNNING, SUSPENDED, COMPLETED or FAILED
before being requeued.</p>
<pre>
$->scontrol requeuehold 10
$->squeue
......@@ -1929,6 +1931,6 @@ sacctmgr delete user name=adam cluster=tux account=chemistry
<p class="footer"><a href="#top">top</a></p>
<p style="text-align:center;">Last modified 25 April 2014</p>
<p style="text-align:center;">Last modified 30 April 2014</p>
<!--#include virtual="footer.txt"-->
......@@ -157,16 +157,21 @@ static bool _merge_job_array(List l, job_info_t * job_ptr)
return merge;
if (!IS_JOB_PENDING(job_ptr))
return merge;
if (IS_JOB_COMPLETING(job_ptr))
return merge;
xfree(job_ptr->node_inx);
if (!l)
return merge;
iter = list_iterator_create(l);
while ((list_job_ptr = list_next(iter))) {
if ((list_job_ptr->array_task_id == NO_VAL) ||
(job_ptr->array_job_id != list_job_ptr->array_job_id) ||
(!IS_JOB_PENDING(list_job_ptr)))
if ((list_job_ptr->array_task_id == NO_VAL)
|| (job_ptr->array_job_id != list_job_ptr->array_job_id)
|| (!IS_JOB_PENDING(list_job_ptr))
|| (IS_JOB_COMPLETING(list_job_ptr)))
continue;
/* We re-purpose the job's node_inx array to store the
* array_task_id values */
if (!list_job_ptr->node_inx) {
......@@ -396,9 +401,11 @@ int _print_job_job_id(job_info_t * job, int width, bool right, char* suffix)
{
if (job == NULL) { /* Print the Header instead */
_print_str("JOBID", width, right, true);
} else if ((job->array_task_id != NO_VAL) &&
!params.array_flag && IS_JOB_PENDING(job) &&
job->node_inx) {
} else if ((job->array_task_id != NO_VAL)
&& !params.array_flag
&& IS_JOB_PENDING(job)
&& job->node_inx
&& (!IS_JOB_COMPLETING(job))) {
uint32_t i, local_width = width, max_task_id = 0;
char *id, *task_str;
bitstr_t *task_bits;
......