Commit 43bb4daa authored by Morris Jette

Add squeue filtering for jobs or steps with job array ID values

parent bd8732a6
...@@ -33,20 +33,6 @@ $ sbatch --array=1,3,5,7 -N1 tmp ...@@ -33,20 +33,6 @@ $ sbatch --array=1,3,5,7 -N1 tmp
$ sbatch --array=1-7:2 -N1 tmp $ sbatch --array=1-7:2 -N1 tmp
</pre> </pre>
<p>Slurm support for job arrays at this time does not use a meta-job data
structure, but creates a separate job record for each element of the array.
Two additional fields were added to Slurm's job record for managing job arrays.
The first new field is internally called "array_job_id" and is the job ID of
the first job in the array.
Subsequent elements of the job array will have a unique Slurm "job_id", but
all will have the same "array_job_id" value.
Some Slurm commands interpret the array_job_id as representing all elements of
the job array, while other commands use the unique job_id assigned to each.
Support for Slurm job arrays can be expected to improve in later releases.
The second new field is called "array_task_id" which is the job array index
value of the job array element.
More details about these new fields follow.</p>
<h2>Job ID and Environment Variables</h2> <h2>Job ID and Environment Variables</h2>
<p>Job arrays will have two additional environment variable set. <p>Job arrays will have two additional environment variable set.
...@@ -147,7 +133,17 @@ $ squeue ...@@ -147,7 +133,17 @@ $ squeue
1088_4 debug tmp mac R 0:03 1 tux3 1088_4 debug tmp mac R 0:03 1 tux3
</pre> </pre>
<p>Two additional job output formats have been added to squeue:<br> <p>The squeue --step/-s and --job/-j options can accept job or step
specifications of the same format.</p>
<pre>
$ squeue -j 1234_2,1234_3
...
$ squeue -s 1234_2.0,1234_3.0
...
</pre>
<p>Two additional job output format field options have been added to squeue:<br>
<b>%F</b> prints the array_job_id value<br> <b>%F</b> prints the array_job_id value<br>
<b>%K</b> prints the array_task_id value<br> <b>%K</b> prints the array_task_id value<br>
(all of the obvious letters to use were already assigned to other job fields).</p> (all of the obvious letters to use were already assigned to other job fields).</p>
...@@ -198,6 +194,19 @@ The default value of MaxArraySize is 1001. Be mindful about the value of ...@@ -198,6 +194,19 @@ The default value of MaxArraySize is 1001. Be mindful about the value of
MaxArraySize as job arrays offer an easy way for users to submit large numbers MaxArraySize as job arrays offer an easy way for users to submit large numbers
of jobs very quickly.</p> of jobs very quickly.</p>
<p>Slurm support for job arrays at this time does not use a meta-job data
structure, but creates a separate job record for each element of the array.
Two additional fields were added to Slurm's job record for managing job arrays.
The first new field is internally called "array_job_id" and is the job ID of
the first job in the array.
Subsequent elements of the job array will have a unique Slurm "job_id", but
all will have the same "array_job_id" value.
Some Slurm commands interpret the array_job_id as representing all elements of
the job array, while other commands use the unique job_id assigned to each.
Support for Slurm job arrays can be expected to improve in later releases.
The second new field is called "array_task_id" which is the job array index
value of the job array element.</p>
<p style="text-align:center;">Last modified 25 January 2013</p> <p style="text-align:center;">Last modified 25 January 2013</p>
<!--#include virtual="footer.txt"--> <!--#include virtual="footer.txt"-->
...@@ -51,7 +51,7 @@ By default, prints a time stamp with the header. ...@@ -51,7 +51,7 @@ By default, prints a time stamp with the header.
.TP .TP
\fB\-j <job_id_list>\fR, \fB\-\-jobs=<job_id_list>\fR \fB\-j <job_id_list>\fR, \fB\-\-jobs=<job_id_list>\fR
Requests a comma separated list of job ids to display. Defaults to all jobs. Requests a comma separated list of job IDs to display. Defaults to all jobs.
The \fB\-\-jobs=<job_id_list>\fR option may be used in conjunction with the The \fB\-\-jobs=<job_id_list>\fR option may be used in conjunction with the
\fB\-\-steps\fR option to print step information about specific jobs. \fB\-\-steps\fR option to print step information about specific jobs.
Note: If a list of job IDs is provided, the jobs are displayed even if Note: If a list of job IDs is provided, the jobs are displayed even if
...@@ -59,6 +59,7 @@ they are on hidden partitions. Since this option's argument is optional, ...@@ -59,6 +59,7 @@ they are on hidden partitions. Since this option's argument is optional,
for proper parsing the single letter option must be followed immediately for proper parsing the single letter option must be followed immediately
with the value and not include a space between them. For example "\-j1008" with the value and not include a space between them. For example "\-j1008"
and not "\-j 1008". and not "\-j 1008".
The job ID format is "job_id[_array_id]".
Performance of the command can be measurably improved for systems with large Performance of the command can be measurably improved for systems with large
numbers of jobs when a single job ID is specified. numbers of jobs when a single job ID is specified.
...@@ -373,7 +374,7 @@ Specify the reservation of the jobs to view. ...@@ -373,7 +374,7 @@ Specify the reservation of the jobs to view.
\fB\-s\fR, \fB\-\-steps\fR \fB\-s\fR, \fB\-\-steps\fR
Specify the job steps to view. This flag indicates that a comma separated list Specify the job steps to view. This flag indicates that a comma separated list
of job steps to view follows without an equal sign (see examples). of job steps to view follows without an equal sign (see examples).
The job step format is "job_id.step_id". Defaults to all job The job step format is "job_id[_array_id].step_id". Defaults to all job
steps. Since this option's argument is optional, for proper parsing steps. Since this option's argument is optional, for proper parsing
the single letter option must be followed immediately with the value the single letter option must be followed immediately with the value
and not include a space between them. For example "\-s1008.0" and not and not include a space between them. For example "\-s1008.0" and not
......
...@@ -904,7 +904,6 @@ _print_options(void) ...@@ -904,7 +904,6 @@ _print_options(void)
uint32_t *user; uint32_t *user;
enum job_states *state_id; enum job_states *state_id;
squeue_job_step_t *job_step_id; squeue_job_step_t *job_step_id;
uint32_t *job_id;
char hostlist[8192]; char hostlist[8192];
if (params.nodes) { if (params.nodes) {
...@@ -936,8 +935,15 @@ _print_options(void) ...@@ -936,8 +935,15 @@ _print_options(void)
if ((params.verbose > 1) && params.job_list) { if ((params.verbose > 1) && params.job_list) {
i = 0; i = 0;
iterator = list_iterator_create( params.job_list ); iterator = list_iterator_create( params.job_list );
while ( (job_id = list_next( iterator )) ) { while ( (job_step_id = list_next( iterator )) ) {
printf( "job_list[%d] = %u\n", i++, *job_id); if (job_step_id->array_id == (uint16_t) NO_VAL) {
printf( "job_list[%d] = %u\n", i++,
job_step_id->job_id );
} else {
printf( "job_list[%d] = %u_%u\n", i++,
job_step_id->job_id,
job_step_id->array_id );
}
} }
list_iterator_destroy( iterator ); list_iterator_destroy( iterator );
} }
...@@ -975,8 +981,16 @@ _print_options(void) ...@@ -975,8 +981,16 @@ _print_options(void)
i = 0; i = 0;
iterator = list_iterator_create( params.step_list ); iterator = list_iterator_create( params.step_list );
while ( (job_step_id = list_next( iterator )) ) { while ( (job_step_id = list_next( iterator )) ) {
printf( "step_list[%d] = %u.%u\n", i++, if (job_step_id->array_id == (uint16_t) NO_VAL) {
job_step_id->job_id, job_step_id->step_id ); printf( "step_list[%d] = %u.%u\n", i++,
job_step_id->job_id,
job_step_id->step_id );
} else {
printf( "step_list[%d] = %u_%u.%u\n", i++,
job_step_id->job_id,
job_step_id->array_id,
job_step_id->step_id );
}
} }
list_iterator_destroy( iterator ); list_iterator_destroy( iterator );
} }
...@@ -1003,9 +1017,9 @@ static List ...@@ -1003,9 +1017,9 @@ static List
_build_job_list( char* str ) _build_job_list( char* str )
{ {
List my_list; List my_list;
char *job = NULL, *tmp_char = NULL, *my_job_list = NULL; char *end_ptr, *job = NULL, *tmp_char = NULL, *my_job_list = NULL;
int i; int job_id, array_id;
uint32_t *job_id = NULL; squeue_job_step_t *job_step_id;
if ( str == NULL ) if ( str == NULL )
return NULL; return NULL;
...@@ -1013,14 +1027,20 @@ _build_job_list( char* str ) ...@@ -1013,14 +1027,20 @@ _build_job_list( char* str )
my_job_list = xstrdup( str ); my_job_list = xstrdup( str );
job = strtok_r( my_job_list, ",", &tmp_char ); job = strtok_r( my_job_list, ",", &tmp_char );
while (job) { while (job) {
i = strtol( job, (char **) NULL, 10 ); job_id = strtol( job, &end_ptr, 10 );
if (i <= 0) { if (end_ptr[0] == '_')
array_id = strtol( end_ptr + 1, &end_ptr, 10 );
else
array_id = (uint16_t) NO_VAL;
if (job_id <= 0) {
error( "Invalid job id: %s", job ); error( "Invalid job id: %s", job );
exit( 1 ); exit( 1 );
} }
job_id = xmalloc( sizeof( uint32_t ) );
*job_id = (uint32_t) i; job_step_id = xmalloc( sizeof( squeue_job_step_t ) );
list_append( my_list, job_id ); job_step_id->job_id = (uint32_t) job_id;
job_step_id->array_id = (uint16_t) array_id;
list_append( my_list, job_step_id );
job = strtok_r (NULL, ",", &tmp_char); job = strtok_r (NULL, ",", &tmp_char);
} }
return my_list; return my_list;
...@@ -1113,16 +1133,16 @@ _build_all_states_list( void ) ...@@ -1113,16 +1133,16 @@ _build_all_states_list( void )
/* /*
* _build_step_list- build a list of job/step_ids * _build_step_list- build a list of job/step_ids
* IN str - comma separated list of job_id.step_ids * IN str - comma separated list of job_id[array_id].step_id values
* RET List of job/step_ids (structure of uint32_t's) * RET List of job/step_ids (structure of uint32_t's)
*/ */
static List static List
_build_step_list( char* str ) _build_step_list( char* str )
{ {
List my_list; List my_list;
char *step = NULL, *tmp_char = NULL, *tmps_char = NULL; char *end_ptr, *step = NULL, *tmp_char = NULL, *tmps_char = NULL;
char *job_name = NULL, *step_name = NULL, *my_step_list = NULL; char *job_name = NULL, *step_name = NULL, *my_step_list = NULL;
int i, j; int job_id, array_id, step_id;
squeue_job_step_t *job_step_id = NULL; squeue_job_step_t *job_step_id = NULL;
if ( str == NULL) if ( str == NULL)
...@@ -1130,25 +1150,29 @@ _build_step_list( char* str ) ...@@ -1130,25 +1150,29 @@ _build_step_list( char* str )
my_list = list_create( NULL ); my_list = list_create( NULL );
my_step_list = xstrdup( str ); my_step_list = xstrdup( str );
step = strtok_r( my_step_list, ",", &tmp_char ); step = strtok_r( my_step_list, ",", &tmp_char );
while (step) while (step) {
{
job_name = strtok_r( step, ".", &tmps_char ); job_name = strtok_r( step, ".", &tmps_char );
step_name = strtok_r( NULL, ".", &tmps_char ); step_name = strtok_r( NULL, ".", &tmps_char );
i = strtol( job_name, (char **) NULL, 10 ); job_id = strtol( job_name, &end_ptr, 10 );
if (end_ptr[0] == '_')
array_id = strtol( end_ptr + 1, &end_ptr, 10 );
else
array_id = (uint16_t) NO_VAL;
if (step_name == NULL) { if (step_name == NULL) {
error ( "Invalid job_step id: %s.??", error ( "Invalid job_step id: %s.??",
job_name ); job_name );
exit( 1 ); exit( 1 );
} }
j = strtol( step_name, (char **) NULL, 10 ); step_id = strtol( step_name, &end_ptr, 10 );
if ((i <= 0) || (j < 0)) { if ((job_id <= 0) || (step_id < 0)) {
error( "Invalid job_step id: %s.%s", error( "Invalid job_step id: %s.%s",
job_name, step_name ); job_name, step_name );
exit( 1 ); exit( 1 );
} }
job_step_id = xmalloc( sizeof( squeue_job_step_t ) ); job_step_id = xmalloc( sizeof( squeue_job_step_t ) );
job_step_id->job_id = (uint32_t) i; job_step_id->job_id = (uint32_t) job_id;
job_step_id->step_id = (uint32_t) j; job_step_id->array_id = (uint16_t) array_id;
job_step_id->step_id = (uint32_t) step_id;
list_append( my_list, job_step_id ); list_append( my_list, job_step_id );
step = strtok_r( NULL, ",", &tmp_char); step = strtok_r( NULL, ",", &tmp_char);
} }
......
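Note on the parsing hunks above: both _build_job_list() and _build_step_list() read the job ID with strtol(), then look at end_ptr[0] to decide whether an "_array_id" suffix follows. The standalone sketch below shows that same technique for a single "job_id[_array_id]" token. It is not the code from this commit: parse_job_spec(), job_step_spec_t, and ARRAY_ID_NONE are hypothetical names, the 0xfffe value standing in for (uint16_t) NO_VAL is an assumption, and the extra rejection of trailing characters and empty suffixes is an addition that the diff does not perform.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for Slurm's (uint16_t) NO_VAL; the numeric value is an assumption. */
#define ARRAY_ID_NONE ((uint16_t) 0xfffe)

/* Same three fields as squeue_job_step_t in the diff (hypothetical type name). */
typedef struct {
    uint32_t job_id;
    uint16_t array_id;
    uint32_t step_id;
} job_step_spec_t;

/*
 * parse_job_spec - hypothetical helper, parses one "job_id[_array_id]" token.
 * Returns 0 on success and -1 if the token is malformed.
 */
static int parse_job_spec(const char *tok, job_step_spec_t *out)
{
    char *end_ptr;
    long job_id, array_id;

    /* Leading digits are the job ID; end_ptr stops at the first non-digit. */
    job_id = strtol(tok, &end_ptr, 10);
    if (end_ptr == tok || job_id <= 0 ||
        (unsigned long) job_id > UINT32_MAX)
        return -1;

    if (end_ptr[0] == '_') {
        /* An underscore introduces the array task index. */
        const char *start = end_ptr + 1;
        array_id = strtol(start, &end_ptr, 10);
        if (end_ptr == start || array_id < 0 ||
            array_id >= (long) ARRAY_ID_NONE)
            return -1;
    } else {
        /* No suffix: the spec refers to the job (or whole array). */
        array_id = ARRAY_ID_NONE;
    }

    /* Stricter than the diff: reject anything left after the number(s). */
    if (end_ptr[0] != '\0')
        return -1;

    out->job_id = (uint32_t) job_id;
    out->array_id = (uint16_t) array_id;
    out->step_id = 0;
    return 0;
}
```

With this sketch, "1234_2" yields job_id 1234 and array_id 2, "1234" yields array_id ARRAY_ID_NONE, and tokens such as "1234_" or "1234x" are rejected. The code in the diff accepts those last two forms because it never inspects end_ptr after the final strtol() call.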
...@@ -1474,17 +1474,19 @@ static int _filter_job(job_info_t * job) ...@@ -1474,17 +1474,19 @@ static int _filter_job(job_info_t * job)
{ {
int filter; int filter;
ListIterator iterator; ListIterator iterator;
uint32_t *job_id, *user; uint32_t *user;
uint16_t *state_id; uint16_t *state_id;
char *account, *part, *qos, *name; char *account, *part, *qos, *name;
squeue_job_step_t *job_step_id;
if (params.job_list) { if (params.job_list) {
filter = 1; filter = 1;
iterator = list_iterator_create(params.job_list); iterator = list_iterator_create(params.job_list);
while ((job_id = list_next(iterator))) { while ((job_step_id = list_next(iterator))) {
if ((*job_id == job->job_id) || if (((job_step_id->array_id == (uint16_t) NO_VAL) &&
((job->array_task_id != (uint16_t) NO_VAL) && (job_step_id->job_id == job->array_job_id)) ||
(*job_id == job->array_job_id))) { ((job_step_id->array_id == job->array_task_id) &&
(job_step_id->job_id == job->array_job_id))) {
filter = 0; filter = 0;
break; break;
} }
...@@ -1622,7 +1624,7 @@ static int _filter_step(job_step_info_t * step) ...@@ -1622,7 +1624,7 @@ static int _filter_step(job_step_info_t * step)
{ {
int filter; int filter;
ListIterator iterator; ListIterator iterator;
uint32_t *job_id, *user; uint32_t *user;
char *part; char *part;
squeue_job_step_t *job_step_id; squeue_job_step_t *job_step_id;
...@@ -1632,10 +1634,11 @@ static int _filter_step(job_step_info_t * step) ...@@ -1632,10 +1634,11 @@ static int _filter_step(job_step_info_t * step)
if (params.job_list) { if (params.job_list) {
filter = 1; filter = 1;
iterator = list_iterator_create(params.job_list); iterator = list_iterator_create(params.job_list);
while ((job_id = list_next(iterator))) { while ((job_step_id = list_next(iterator))) {
if ((*job_id == step->job_id) || if (((job_step_id->array_id == (uint16_t) NO_VAL) &&
((step->array_task_id != (uint16_t) NO_VAL) && (job_step_id->job_id == step->array_job_id)) ||
(*job_id == step->array_job_id))) { ((job_step_id->array_id == step->array_task_id) &&
(job_step_id->job_id == step->array_job_id))) {
filter = 0; filter = 0;
break; break;
} }
...@@ -1663,8 +1666,12 @@ static int _filter_step(job_step_info_t * step) ...@@ -1663,8 +1666,12 @@ static int _filter_step(job_step_info_t * step)
filter = 1; filter = 1;
iterator = list_iterator_create(params.step_list); iterator = list_iterator_create(params.step_list);
while ((job_step_id = list_next(iterator))) { while ((job_step_id = list_next(iterator))) {
if ((job_step_id->job_id == step->job_id) && if (job_step_id->step_id != step->step_id)
(job_step_id->step_id == step->step_id)) { continue;
if (((job_step_id->array_id == (uint16_t) NO_VAL) &&
(job_step_id->job_id == step->array_job_id)) ||
((job_step_id->array_id == step->array_task_id) &&
(job_step_id->job_id == step->array_job_id))) {
filter = 0; filter = 0;
break; break;
} }
......
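Note on the filter hunks above: _filter_job() and _filter_step() now apply the same test to each entry of params.job_list. A spec without an array index selects every record whose array_job_id equals the spec's job_id, and a spec with an index selects only the record whose array_task_id also matches. The sketch below isolates that predicate so it can be read and tested on its own. The names spec_matches(), job_spec_t, job_rec_t, and ARRAY_ID_NONE are hypothetical, and 0xfffe as the value of (uint16_t) NO_VAL is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for Slurm's (uint16_t) NO_VAL; the numeric value is an assumption. */
#define ARRAY_ID_NONE ((uint16_t) 0xfffe)

/* What the user asked for: "job_id" or "job_id_array_id" (hypothetical type). */
typedef struct {
    uint32_t job_id;
    uint16_t array_id;
} job_spec_t;

/* The fields of a job record that the predicate reads (hypothetical type). */
typedef struct {
    uint32_t job_id;        /* unique ID of this array element */
    uint32_t array_job_id;  /* job ID of the first job in the array */
    uint16_t array_task_id; /* index of this element within the array */
} job_rec_t;

/*
 * spec_matches - true if the record should be displayed for this spec.
 * Logically equivalent to the condition in the diff:
 *   (no index given AND job_id == array_job_id) OR
 *   (index == array_task_id AND job_id == array_job_id)
 */
static bool spec_matches(const job_spec_t *spec, const job_rec_t *job)
{
    if (spec->job_id != job->array_job_id)
        return false;
    return (spec->array_id == ARRAY_ID_NONE) ||
           (spec->array_id == job->array_task_id);
}
```

One thing worth checking in review: the removed lines also compared the requested ID against the record's unique job_id, while the new condition compares only against array_job_id. The new test therefore depends on array_job_id holding a usable value for every record being filtered, including jobs that are not part of an array, and nothing in this diff shows whether that is the case.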
...@@ -70,11 +70,11 @@ ...@@ -70,11 +70,11 @@
#include "src/common/slurmdb_defs.h" #include "src/common/slurmdb_defs.h"
#include "src/squeue/print.h" #include "src/squeue/print.h"
struct job_step { typedef struct job_step {
uint32_t job_id; uint32_t job_id;
uint16_t array_id;
uint32_t step_id; uint32_t step_id;
}; } squeue_job_step_t;
typedef struct job_step squeue_job_step_t;
struct squeue_parameters { struct squeue_parameters {
bool all_flag; bool all_flag;
......