tud-zih-energy / Slurm
Commit 2d28ef52, authored 11 years ago by Morris Jette
Update switch documentation to clarify most calls are for a job step
Rather than a job
Parent: 0e4f975a
Showing 1 changed file with 42 additions and 42 deletions:
doc/html/switchplugins.shtml (+42 -42)
@@ -36,8 +36,8 @@ for sample implementations of a SLURM switch plugin.</p>
 <h2>Data Objects</h2>
 <p> The implementation must support two opaque data classes.
-One is used as an job's switch "credential."
-This class must encapsulate all job-specific information necessary
+One is used as an job step's switch "credential."
+This class must encapsulate all job step specific information necessary
 for the operation of the API specification below.
 The second is a node's switch state record.
 Both data classes are referred to in SLURM code using an anonymous
@@ -116,7 +116,7 @@ switch state information on a periodic basis.</p>
 <p class="commandline">int switch_p_clear_node_state (void);</p>
 <p style="margin-left:.2in"><b>Description</b>: Initialize node state.
-If any switch state has previously been established for a job, it will be cleared.
+If any switch state has previously been established for a job step, it will be cleared.
 This will be used to establish a "clean" state for the switch on the node upon
 which it is executed.</p>
 <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. On failure,
@@ -192,10 +192,10 @@ of buf in bytes.</p>
 <h3>Job's Switch Credential Management Functions</h3>
 <p class="commandline">int switch_p_alloc_jobinfo(switch_jobinfo_t *switch_job);</p>
-<p style="margin-left:.2in"><b>Description</b>: Allocate storage for a job's switch credential.
+<p style="margin-left:.2in"><b>Description</b>: Allocate storage for a job step's switch credential.
 It is recommended that the credential contain a magic number for validation purposes.</p>
 <p style="margin-left:.2in"><b>Arguments</b>:<span class="commandline"> switch_job</span>
-(output) location for writing location of job's switch credential.</p>
+(output) location for writing location of job step's switch credential.</p>
 <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. On failure,
 the plugin should return SLURM_ERROR and set the errno to an appropriate value
 to indicate the reason for failure.</p>
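The hunk above recommends that the credential carry a magic number for validation. A minimal sketch of that idea follows. Every name in it (`example_jobinfo`, `EXAMPLE_JOBINFO_MAGIC`, `EXAMPLE_SUCCESS`, `EXAMPLE_ERROR`) is hypothetical and stands in for plugin-specific types and for `SLURM_SUCCESS` / `SLURM_ERROR`; it does not use the real Slurm headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define EXAMPLE_SUCCESS 0                /* stand-in for SLURM_SUCCESS */
#define EXAMPLE_ERROR (-1)               /* stand-in for SLURM_ERROR */
#define EXAMPLE_JOBINFO_MAGIC 0x5357u    /* arbitrary value chosen for this sketch */

/* Hypothetical job step switch credential. */
struct example_jobinfo {
    uint32_t magic;    /* set at allocation, checked before every use */
    uint32_t step_id;  /* placeholder for step-specific switch data */
};

/* Allocate a zeroed credential and write its location to *out. */
int example_alloc_jobinfo(struct example_jobinfo **out)
{
    if (out == NULL) {
        errno = EINVAL;
        return EXAMPLE_ERROR;
    }
    struct example_jobinfo *j = calloc(1, sizeof(*j));
    if (j == NULL) {
        errno = ENOMEM;
        return EXAMPLE_ERROR;
    }
    j->magic = EXAMPLE_JOBINFO_MAGIC;
    *out = j;
    return EXAMPLE_SUCCESS;
}

/* Non-zero when the pointer looks like a live credential. */
int example_jobinfo_is_valid(const struct example_jobinfo *j)
{
    return j != NULL && j->magic == EXAMPLE_JOBINFO_MAGIC;
}

/* Release a credential; the magic is cleared so stale pointers fail validation. */
void example_free_jobinfo(struct example_jobinfo *j)
{
    if (!example_jobinfo_is_valid(j))
        return;
    j->magic = 0;
    free(j);
}
```

Clearing the magic before `free()` is a design choice of this sketch, intended to make accidental reuse of a released credential easier to catch.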
@@ -208,14 +208,14 @@ It is recommended that the credential's magic number be validated.</p>
 <span class="commandline">switch_job</span> (input/output) Job's
 switch credential to be updated<br>
 <span class="commandline">nodelist</span> (input) List of nodes
-allocated to the job. This may contain expressions to specify node ranges (e.g.
+allocated to the job step. This may contain expressions to specify node ranges (e.g.
 "linux[1-20]" or "linux[2,4,6,8]").<br>
 <span class="commandline">tasks_per_node</span> (input) count
-of processes per node to be initiated as part of the job.<br>
+of processes per node to be initiated as part of the job step.<br>
 <span class="commandline">tids</span> (input) List of task
 IDs to be initiated. The first array index is the node ID. The second array
 index ranges from 0 to tasks_per_node of that node ID minus 1.<br>
-<span class="commandline">network</span> (input) Job's network
+<span class="commandline">network</span> (input) Job step's network
 specification from srun command. </p>
 <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. On failure,
 the plugin should return SLURM_ERROR and set the errno to an appropriate value
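The indexing rule for `tids` described above (first index is the node ID, second index runs from 0 to `tasks_per_node` of that node minus 1) can be sketched as follows. The helper names and the exact integer widths are assumptions made for illustration; the diff does not show the real prototype.

```c
#include <assert.h>
#include <stdint.h>

/* Total number of tasks in the step: the sum of the per-node counts. */
uint32_t example_count_tasks(uint32_t nnodes, const uint16_t *tasks_per_node)
{
    uint32_t total = 0;
    for (uint32_t n = 0; n < nnodes; n++)
        total += tasks_per_node[n];
    return total;
}

/*
 * Return the node index that hosts task_id, or -1 if no node lists it.
 * tids[n][t] is valid for 0 <= t < tasks_per_node[n].
 */
int example_find_task_node(uint32_t nnodes, const uint16_t *tasks_per_node,
                           uint32_t *const *tids, uint32_t task_id)
{
    for (uint32_t n = 0; n < nnodes; n++) {
        for (uint16_t t = 0; t < tasks_per_node[n]; t++) {
            if (tids[n][t] == task_id)
                return (int)n;
        }
    }
    return -1;
}
```

For a two-node step with task IDs {0, 1} on node 0 and {2} on node 1, looking up task 2 yields node index 1.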
@@ -225,23 +225,23 @@ to indicate the reason for failure.</p>
 <p style="margin-left:.2in"><b>Description</b>: Allocate storage for a job's switch credential
 and copy an existing credential to that location.</p>
 <p style="margin-left:.2in"><b>Arguments</b>:<span class="commandline"> switch_job</span>
-(input) an existing job switch credential.</p>
-<p style="margin-left:.2in"><b>Returns</b>: A newly allocated job switch credential containing a
-copy of the function argument.</p>
+(input) an existing job step switch credential.</p>
+<p style="margin-left:.2in"><b>Returns</b>: A newly allocated job step switch
+credential containing a copy of the function argument.</p>
 <p class="commandline">void switch_p_free_jobinfo (switch_jobinfo_t switch_job);</p>
 <p style="margin-left:.2in"><b>Description</b>: Release the storage associated with a job's
 switch credential.</p>
 <p style="margin-left:.2in"><b>Arguments</b>:<span class="commandline"> switch_job</span>
-(input) an existing job switch credential.</p>
+(input) an existing job step switch credential.</p>
 <p style="margin-left:.2in"><b>Returns</b>: None</p>
 <p class="commandline">int switch_p_pack_jobinfo (switch_jobinfo_t switch_job, Buf buffer);</p>
-<p style="margin-left:.2in"><b>Description</b>: Pack the data associated with a job's
+<p style="margin-left:.2in"><b>Description</b>: Pack the data associated with a job step's
 switch credential into a buffer for network transmission.</p>
 <p style="margin-left:.2in"><b>Arguments</b>:<br>
-<span class="commandline"> switch_job</span> (input) an existing job
-switch credential.<br>
+<span class="commandline"> switch_job</span> (input) an
+existing job step switch credential.<br>
 <span class="commandline"> buffer</span> (input/output) buffer onto
 which the credential's contents are appended.</p>
 <p style="margin-left:.2in"><b>Returns</b>:
@@ -254,7 +254,7 @@ to indicate the reason for failure.</p>
 switch credential from a buffer.</p>
 <p style="margin-left:.2in"><b>Arguments</b>:<br>
 <span class="commandline"> switch_job</span> (input/output) a previously
-allocated job switch credential to be filled in with data read from the buffer.<br>
+allocated job step switch credential to be filled in with data read from the buffer.<br>
 <span class="commandline"> buffer</span> (input/output) buffer from
 which the credential's contents are read.</p>
 <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. On failure,
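The pack and unpack descriptions above amount to a round trip: the credential's contents are appended to a buffer for transmission, then read back into a previously allocated credential. A minimal sketch follows, assuming a hypothetical fixed-size `example_buf` in place of Slurm's `Buf` type and a hypothetical two-field credential. Fields are written big-endian by hand so the example needs no networking headers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAGIC 0x5357u  /* arbitrary value chosen for this sketch */

/* Hypothetical job step switch credential. */
struct example_jobinfo {
    uint32_t magic;
    uint32_t step_id;
};

/* Hypothetical stand-in for Slurm's Buf: len = bytes written, pos = read cursor. */
struct example_buf {
    uint8_t data[64];
    size_t len;
    size_t pos;
};

/* Append one 32-bit value in big-endian order (caller has checked space). */
static void put_u32(struct example_buf *b, uint32_t v)
{
    b->data[b->len++] = (uint8_t)(v >> 24);
    b->data[b->len++] = (uint8_t)(v >> 16);
    b->data[b->len++] = (uint8_t)(v >> 8);
    b->data[b->len++] = (uint8_t)v;
}

/* Read one big-endian 32-bit value (caller has checked remaining bytes). */
static uint32_t get_u32(struct example_buf *b)
{
    uint32_t v = ((uint32_t)b->data[b->pos] << 24) |
                 ((uint32_t)b->data[b->pos + 1] << 16) |
                 ((uint32_t)b->data[b->pos + 2] << 8) |
                 (uint32_t)b->data[b->pos + 3];
    b->pos += 4;
    return v;
}

/* Append the credential to the buffer; 0 on success, -1 with errno set on failure. */
int example_pack_jobinfo(const struct example_jobinfo *j, struct example_buf *b)
{
    if (j == NULL || b == NULL || j->magic != EXAMPLE_MAGIC) {
        errno = EINVAL;
        return -1;
    }
    if (sizeof(b->data) - b->len < 8) {
        errno = ENOSPC;
        return -1;
    }
    put_u32(b, j->magic);
    put_u32(b, j->step_id);
    return 0;
}

/* Fill a previously allocated credential from the buffer. */
int example_unpack_jobinfo(struct example_jobinfo *j, struct example_buf *b)
{
    if (j == NULL || b == NULL || b->pos > b->len || b->len - b->pos < 8) {
        errno = EINVAL;
        return -1;
    }
    uint32_t magic = get_u32(b);
    uint32_t step_id = get_u32(b);
    if (magic != EXAMPLE_MAGIC) {
        errno = EINVAL;
        return -1;
    }
    j->magic = magic;
    j->step_id = step_id;
    return 0;
}
```

Validating the magic number on both sides follows the recommendation in the documentation being changed; the wire layout itself is purely illustrative.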
@@ -279,10 +279,10 @@ char *nodelist);</p>
 with the specified nodelist has completed execution.</p>
 <p style="margin-left:.2in"><b>Arguments</b>:<br>
 <span class="commandline"> switch_job</span> (input)
-The completed job's switch credential.<br>
+The completed job step's switch credential.<br>
 <span class="commandline"> nodelist</span> (input) A list of nodes
-on which the job has completed. This may contain expressions to specify node ranges.
-(e.g. "linux[1-20]" or "linux[2,4,6,8]").</p>
+on which the job step has completed. This may contain expressions to specify
+node ranges. (e.g. "linux[1-20]" or "linux[2,4,6,8]").</p>
 <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. On failure,
 the plugin should return SLURM_ERROR and set the errno to an appropriate value
 to indicate the reason for failure.</p>
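To illustrate what the node range expressions quoted above denote, here is a small sketch that counts the hosts named by one expression of the form `prefix[a-b]` or `prefix[a,b,c]`. It is a simplified, hypothetical parser written for this note: it is not Slurm's own hostlist code and handles only a single trailing bracket group.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count the hosts named by one expression such as "linux[1-20]" or
 * "linux[2,4,6,8]". A name without brackets counts as one host.
 * Returns -1 for input this simplified parser does not understand.
 */
long example_count_hosts(const char *expr)
{
    if (expr == NULL || *expr == '\0')
        return -1;

    const char *p = strchr(expr, '[');
    if (p == NULL)
        return 1;                       /* plain host name */
    p++;                                /* skip '[' */

    long count = 0;
    for (;;) {
        char *end;
        if (!isdigit((unsigned char)*p))
            return -1;
        unsigned long lo = strtoul(p, &end, 10);
        unsigned long hi = lo;
        p = end;
        if (*p == '-') {                /* range item "lo-hi" */
            p++;
            if (!isdigit((unsigned char)*p))
                return -1;
            hi = strtoul(p, &end, 10);
            p = end;
            if (hi < lo)
                return -1;
        }
        count += (long)(hi - lo + 1);
        if (*p == ',') {                /* another item follows */
            p++;
            continue;
        }
        if (*p == ']')                  /* group must end the expression */
            return (p[1] == '\0') ? count : -1;
        return -1;
    }
}
```

With the two examples from the documentation, `"linux[1-20]"` names 20 hosts and `"linux[2,4,6,8]"` names 4.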
@@ -335,11 +335,11 @@ bytes</p>
 <p class="commandline">int switch_p_get_data_jobinfo(switch_jobinfo_t switch_job,
 int key, void *resulting_data);</p>
-<p style="margin-left:.2in"><b>Description</b>: Get data from a job's
+<p style="margin-left:.2in"><b>Description</b>: Get data from a job step's
 switch credential.</p>
 <p style="margin-left:.2in"><b>Arguments</b>:<br>
-<span class="commandline"> switch_job</span> (input) a job's
-switch credential.<br>
+<span class="commandline"> switch_job</span> (input) a job
+step's switch credential.<br>
 <span class="commandline"> key</span> (input) identification
 of the type of data to be retrieved from the switch credential. NOTE: The
 interpretation of this key is dependent upon the switch type. <br>
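Because the interpretation of `key` depends on the switch type, a plugin would typically dispatch on it and write through `resulting_data` with the type that the key implies. The sketch below assumes two hypothetical keys and a hypothetical credential layout; neither comes from the Slurm sources.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAGIC 0x5357u  /* arbitrary value chosen for this sketch */

/* Hypothetical keys; a real plugin defines its own, per switch type. */
enum example_key {
    EXAMPLE_KEY_STEP_ID = 1,       /* resulting_data points to uint32_t */
    EXAMPLE_KEY_WINDOW_COUNT = 2   /* resulting_data points to uint16_t */
};

/* Hypothetical job step switch credential. */
struct example_jobinfo {
    uint32_t magic;
    uint32_t step_id;
    uint16_t window_count;
};

/* Copy the field selected by key into *resulting_data. */
int example_get_data_jobinfo(const struct example_jobinfo *j, int key,
                             void *resulting_data)
{
    if (j == NULL || resulting_data == NULL || j->magic != EXAMPLE_MAGIC) {
        errno = EINVAL;
        return -1;
    }
    switch (key) {
    case EXAMPLE_KEY_STEP_ID:
        *(uint32_t *)resulting_data = j->step_id;
        return 0;
    case EXAMPLE_KEY_WINDOW_COUNT:
        *(uint16_t *)resulting_data = j->window_count;
        return 0;
    default:
        errno = EINVAL;   /* unknown key for this switch type */
        return -1;
    }
}
```

The caller is responsible for passing a pointer of the width that matches the key, which is the main hazard of a `void *` output parameter.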
@@ -370,7 +370,7 @@ the plugin should return SLURM_ERROR and set the errno to an appropriate value
 to indicate the reason for failure.</p>
 <p class="footer"><a href="#top">top</a></p>
-<h3>Job Management Functions</h3>
+<h3>Job Step Management Functions</h3>
 <pre>
 =========================================================================
 Process 1 (root) Process 2 (root, user) | Process 3 (user task)
@@ -387,7 +387,7 @@ switch_p_job_postfini |
 <p class="commandline">int switch_p_job_preinit (switch_jobinfo_t jobinfo switch_job);</p>
 <p style="margin-left:.2in"><b>Description</b>: Preinit is run as root in the first slurmd process,
-the so called job manager. This function can be used to perform any initialization
+the so called job step manager. This function can be used to perform any initialization
 that needs to be performed in the same process as switch_p_job_fini().</p>
 <p style="margin-left:.2in"><b>Arguments</b>:
 <span class="commandline"> switch_job</span> (input) a job's
@@ -402,10 +402,10 @@ This function is run from the second slurmd process (some interconnect implement
 may require the switch_p_job_init functions to be executed from a separate process
 than the process executing switch_p_job_fini() [e.g. Quadrics Elan]).</p>
 <p style="margin-left:.2in"><b>Arguments</b>:<br>
-<span class="commandline"> switch_job</span> (input) a job's
-switch credential.<br>
+<span class="commandline"> switch_job</span> (input) a job
+step's switch credential.<br>
 <span class="commandline"> uid</span> (input) the user id
-to execute a job.</p>
+to execute a job step.</p>
 <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. On failure,
 the plugin should return SLURM_ERROR and set the errno to an appropriate value
 to indicate the reason for failure.</p>
@@ -419,16 +419,16 @@ environment variables here).</p>
 <span class="commandline"> switch_job</span> (input) a job's
 switch credential.<br>
 <span class="commandline"> env</span> (input/output) the
-environment variables to be set upon job initiation. Switch specific environment
-variables are added as needed.<br>
+environment variables to be set upon job step initiation. Switch specific
+environment variables are added as needed.<br>
 <span class="commandline"> nodeid</span> (input) zero-origin
 id of this node.<br>
 <span class="commandline"> procid</span> (input) zero-origin
 process id local to slurmd and <b>not</b> equivalent to the global task id or MPI rank.<br>
 <span class="commandline"> nnodes</span> (input) count of
-nodes allocated to this job.<br>
+nodes allocated to this job step.<br>
 <span class="commandline"> nprocs</span> (input) total count of
-processes or tasks to be initiated for this job.<br>
+processes or tasks to be initiated for this job step.<br>
 <span class="commandline"> rank</span> (input) zero-origin
 id of this task.</p>
 <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. On failure,
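The `env` argument above is input/output: the plugin adds switch-specific variables to the set passed to the job step. One way to picture this is a helper that appends a `NAME=value` entry to a NULL-terminated, heap-allocated array. The helper name, the `char ***` shape of the parameter, and the variable names are assumptions made for this sketch only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Append "name=value" to a NULL-terminated environment array.
 * *env must be NULL or point to memory obtained from malloc/realloc.
 * Returns 0 on success, -1 with errno set on failure.
 */
int example_env_append(char ***env, const char *name, uint32_t value)
{
    if (env == NULL || name == NULL) {
        errno = EINVAL;
        return -1;
    }

    /* Count the existing entries. */
    size_t n = 0;
    if (*env != NULL) {
        while ((*env)[n] != NULL)
            n++;
    }

    /* Format the new entry into exactly sized storage. */
    int need = snprintf(NULL, 0, "%s=%u", name, (unsigned)value);
    if (need < 0) {
        errno = EINVAL;
        return -1;
    }
    char *entry = malloc((size_t)need + 1);
    if (entry == NULL) {
        errno = ENOMEM;
        return -1;
    }
    snprintf(entry, (size_t)need + 1, "%s=%u", name, (unsigned)value);

    /* Grow the array by one slot plus the terminating NULL. */
    char **grown = realloc(*env, (n + 2) * sizeof(*grown));
    if (grown == NULL) {
        free(entry);
        errno = ENOMEM;
        return -1;
    }
    grown[n] = entry;
    grown[n + 1] = NULL;
    *env = grown;
    return 0;
}
```

A plugin following this pattern might export the zero-origin `nodeid` or `rank` values under names of its own choosing; `EXAMPLE_SWITCH_NODEID` would be one such made-up name.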
@@ -438,10 +438,10 @@ to indicate the reason for failure.</p>
 <p class="commandline">int switch_p_job_fini (switch_jobinfo_t jobinfo switch_job);</p>
 <p style="margin-left:.2in"><b>Description</b>: This function is run from the same process
 as switch_p_job_init() after all job tasks have exited. It is *not* run as root, because
-the process in question has already setuid to the job owner.</p>
+the process in question has already setuid to the job step owner.</p>
 <p style="margin-left:.2in"><b>Arguments</b>:
-<span class="commandline"> switch_job</span> (input) a job's
-switch credential.</p>
+<span class="commandline"> switch_job</span> (input) a job
+step's switch credential.</p>
 <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. On failure,
 the plugin should return SLURM_ERROR and set the errno to an appropriate value
 to indicate the reason for failure.</p>
@@ -452,8 +452,8 @@ uid_t pgid, uint32_t job_id, uint32_t step_id );</p>
 process (same process as switch_p_job_preinit()), and is run as root. Any cleanup routines
 that need to be run with root privileges should be run from this function.</p>
 <p style="margin-left:.2in"><b>Arguments</b>:<br>
-<span class="commandline"> switch_job</span> (input) a job's
-switch credential.<br>
+<span class="commandline"> switch_job</span> (input) a job
+step's switch credential.<br>
 <span class="commandline"> pgid</span> (input) The process
 group id associated with this task.<br>
 <span class="commandline"> job_id</span> (input) the
@@ -489,9 +489,9 @@ to indicate the reason for failure.</p>
 <p style="margin-left:.2in"><b>Description</b>: Determine if a specific job
 step can be preempted.</p>
 <p style="margin-left:.2in"><b>Arguments</b>:<br>
-<span class="commandline"> switch_job</span> (input) a job's
-switch credential.</p>
-<p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if the job can be
+<span class="commandline"> switch_job</span> (input) a job
+step's switch credential.</p>
+<p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if the job step can be
 preempted and SLURM_ERROR otherwise.</p>
 <p class="commandline">void switch_p_job_suspend_info_get(switch_jobinfo_t *switch_job,
@@ -509,7 +509,7 @@ for addition function call (i.e. for each addition job step).</p>
 <p class="commandline">void switch_p_job_suspend_info_pack(void *suspend_info, Buf buffer);</p>
 <p style="margin-left:.2in"><b>Description</b>: Pack the information needed
-for a job step to be preempted into a buffer</p>
+for a job to be preempted into a buffer</p>
 <p style="margin-left:.2in"><b>Arguments</b>:<br>
 <span class="commandline"> suspend_info</span> (input)
 information needed for a job to be preempted, including information for all
@@ -519,7 +519,7 @@ the buffer that has suspend_info added to it.</p>
 <p class="commandline">int switch_p_job_suspend_info_unpack(void **suspend_info, Buf buffer);</p>
 <p style="margin-left:.2in"><b>Description</b>: Unpack the information needed
-for a job step to be preempted from a buffer.<br>
+for a job to be preempted from a buffer.<br>
 <b>NOTE</b>: Use switch_p_job_suspend_info_free() to free the opaque data structure.</p>
 <p style="margin-left:.2in"><b>Arguments</b>:<br>
 <span class="commandline"> suspend_info</span> (output)
@@ -590,6 +590,6 @@ plugin that transmitted it. It is at the discretion of the plugin author whether
 to maintain data format compatibility across different versions of the plugin.</p>
 <p class="footer"><a href="#top">top</a></p>
-<p style="text-align:center;">Last modified 6 August 2012</p>
+<p style="text-align:center;">Last modified 26 June 2013</p>
 <!--#include virtual="footer.txt"-->