Commit a34a24b2 authored by Morris Jette

Document job step resource contention operation

parent 822b3da8
@@ -331,6 +331,17 @@ P0 1: nid00012
P1 0: Wed Jul 5 16:23:07 MDT 2017
</pre>
<p>If multiple srun commands are executed concurrently, resource contention may
result (e.g. memory limits may prevent some job step components from being
allocated resources while two srun commands execute at the same time).
If the srun --pack-group option is used to create multiple job steps (for the
different components of a heterogeneous job), those job steps will be created
sequentially.
When multiple srun commands execute at the same time, some step allocations
may be granted while others are delayed.
The application will be launched only after all job step allocations have been
granted.</p>
<h2><a name="env_var">Environment Variables</a></h2>
<p>Slurm environment variables will be set independently for each component of
@@ -507,6 +518,6 @@ especially other heterogeneous jobs.</p>
<p class="footer"><a href="#top">top</a></p>
<p style="text-align:center;">Last modified 16 August 2017</p> <p style="text-align:center;">Last modified 17 August 2017</p>
<!--#include virtual="footer.txt"-->
@@ -798,7 +798,6 @@ extern int launch_p_step_wait(srun_job_t *job, bool got_alloc, opt_t *opt_local)
{
int rc = 0;
//FIXME-PACK: should we create multiple steps in a single RPC or use threads?
slurm_step_launch_wait_finish(job->step_ctx);
if ((MPIR_being_debugged == 0) && retry_step_begin &&
(retry_step_cnt < MAX_STEP_RETRIES)) {