From 2e03f3893b380e47ed8b8e5b733570aeccae22ab Mon Sep 17 00:00:00 2001
From: Moe Jette <jette1@llnl.gov>
Date: Fri, 10 May 2002 02:19:11 +0000
Subject: [PATCH] Major revision to programmer guide based upon API development
 work for DPCS.

---
 doc/html/programmer.guide.html | 427 ++++++++++++++++++++++-----------
 1 file changed, 289 insertions(+), 138 deletions(-)

diff --git a/doc/html/programmer.guide.html b/doc/html/programmer.guide.html
index 86f40604767..787759db017 100644
--- a/doc/html/programmer.guide.html
+++ b/doc/html/programmer.guide.html
@@ -10,8 +10,12 @@ fault-tolerant, and highly scalable cluster management and job scheduling
 system for Linux clusters of thousands of nodes. Components include machine
 status, partition management, job management, and scheduling modules. The design also
-includes a scalable, general-purpose communication infrastructure.
-SLURM requires no kernel modifications and is relatively self-contained.
+includes a scalable, general-purpose communication infrastructure
+(MONGO, to be described elsewhere).
+SLURM requires no kernel modifications for its operation and is
+relatively self-contained.
+Initial target platforms include Red Hat Linux clusters with
+Quadrics interconnect and the IBM Blue Gene product line.

 <h2>Overview</h2>
 There is a description of the components and their interactions available
@@ -22,16 +26,72 @@ for Resource Management</a>.
 <a href="http://www.linuxhq.com/kernel/v2.4/doc/CodingStyle.html">
 http://www.linuxhq.com/kernel/v2.4/doc/CodingStyle.html</a>.

-<h2> API Modules</h2>
+<p>Functions are divided into several categories, each in its own
+directory. The details of each directory's contents are provided
+below. The directories are as follows:
+
+<dl>
+<dt>api
+<dd>Application program interfaces into the SLURM code.
+Used to get or send SLURM information.
+
+<dt>common
+<dd>General purpose functions for widespread use.
+
+<dt>popt
+<dd>TBD
+
+<dt>scancel
+<dd>User command to cancel a job or allocation.
+
+<dt>scontrol
+<dd>Administrator command to manage SLURM.
+
+<dt>slurmctld
+<dd>SLURM central manager code.
+
+<dt>slurmd
+<dd>SLURM code to manage the nodes used for executing user applications
+under the control of slurmctld.
+
+<dt>squeue
+<dd>User command to get information on SLURM jobs and allocations.
+
+<dt>srun
+<dd>User command to submit a job, get an allocation, and/or initiate
+a parallel job step.
+
+<dt>test
+<dd>Functions for testing individual SLURM modules.
+</dl>
+
+<h2>API Modules</h2>
 This directory contains modules supporting the SLURM API functions.
+The APIs to get SLURM information accept a time-stamp. If the data
+has not changed since the specified time, a return code indicates this
+and no other data is returned. Otherwise a data structure is returned
+including its time-stamp, element count, and an array of structures
+describing the state of each entity (node, job, partition, etc.).
+Each of these functions also has a corresponding function to
+release all storage associated with the data structure.

 <dl>
 <dt>allocate.c
-<dd>API to allocate resources for a job's initiation.
+<dd>API to allocate resources for a job's initiation.
+This creates a job entry and allocates resources to it.
+The resources can be claimed at a later time to actually
+run a parallel job. If the requested resources are not
+currently available, the request will fail.

 <dt>build_info.c
 <dd>API to report SLURM build parameter values.

+<dt>cancel.c
+<dd>API to cancel (i.e. terminate) a running or pending job.
+
+<dt>job_info.c
+<dd>API to report job state and configuration values.
+
 <dt>node_info.c
 <dd>API to report node state and configuration values.

@@ -41,10 +101,19 @@ This directory contains modules supporting the SLURM API functions.
 <dt>reconfigure.c
 <dd>API to request that slurmctld reload configuration information.

+<dt>submit.c
+<dd>API to submit a job to SLURM. The job will be queued
+for initiation when resources are available.
+
 <dt>update_config.c
 <dd>API to update job, node or partition state information.
 </dl>

+<i>Future components to include: job step support (a set of parallel
+tasks associated with a job or allocation; multiple job steps may
+execute in serial or parallel within an allocation), association of
+an allocation with a job step, and resource accounting.</i>
+
 <h2>Common Modules</h2>
 This directory contains modules of general use throughout the SLURM
 code. The modules are described below.
@@ -53,6 +122,17 @@ The modules are described below.
 <dt>bits_bytes.c
 <dd>A collection of functions for processing bit maps and strings for parsing.

+<dt>bits_bytes.h
+<dd>Function definitions for bits_bytes.c.
+
+<dt>bitstring.c
+<dd>A collection of functions for managing bitmaps. We use these for rapid
+node management functions, including scheduling and associating partitions
+and jobs with nodes.
+
+<dt>bitstring.h
+<dd>Function definitions for bitstring.c.
+
 <dt>list.c
 <dd>Module is a general purpose list manager. One can define a
 list, add and delete entries, search for entries, etc.
@@ -60,13 +140,61 @@ list, add and delete entries, search for entries, etc.
 <dt>list.h
 <dd>Module contains definitions for list.c and documentation for its functions.

+<dt>log.c
+<dd>Module is a general purpose log manager. It can filter log messages
+based upon severity and route them to stderr, syslog, or a log file.
+
+<dt>log.h
+<dd>Module contains definitions for log.c and documentation for its functions.
+
+<dt>macros.h
+<dd>General purpose SLURM macro definitions.
+
+<dt>pack.c
+<dd>Module for packing and unpacking unsigned integers and strings
+for transmission over the network. The unsigned integers are translated
+to/from machine independent form. Strings are transmitted with a length
+value.
+
+<dt>pack.h
+<dd>Module contains definitions for pack.c and documentation for its functions.
+
+<dt>qsw.c
+<dd>Functions for interacting with the Quadrics interconnect.
+
+<dt>qsw.h
+<dd>Module contains definitions for qsw.c and documentation for its functions.
+
+<dt>strlcpy.c
+<dd>TBD
+
 <dt>slurm.h
 <dd>Definitions for common SLURM data structures and functions.

 <dt>slurmlib.h
 <dd>Definitions for SLURM API data structures and functions.
-</dl>
+This would be included in user applications linked with the SLURM APIs.
+
+<dt>xassert.c
+<dd>TBD
+
+<dt>xassert.h
+<dd>Module contains definitions for xassert.c and documentation for its functions.
+
+<dt>xmalloc.c
+<dd>"Safe" memory management functions. Includes magic cookies to ensure
+that freed memory was in fact allocated by its functions.
+
+<dt>xmalloc.h
+<dd>Module contains definitions for xmalloc.c and documentation for its functions.
+
+<dt>xstring.c
+<dd>A collection of functions for string manipulation with automatic expansion
+of allocated memory as needed.
+
+<dt>xstring.h
+<dd>Module contains definitions for xstring.c and documentation for its functions.
+</dl>

 <h2>scancel Modules</h2>
 scancel is a command to cancel running or pending jobs.
@@ -168,141 +296,164 @@ and 4 node sets rather than use the smaller sets).
 All functions described below can be issued from any node in the SLURM cluster.
 <dl>
-<dt>void free_node_info(void);
-<dd>Free the node information buffer (if allocated)
-<dd>NOTE: Buffer is loaded by load_node and used by load_node_name.
-
-<dt>void free_part_info(void);
-<dd>Free the partition information buffer (if allocated)
-<dd>NOTE: Buffer is loaded by load_part and used by load_part_name.
-
-<dt>int get_job_info(TBD);
-<dd>Function to be defined.
-
-<dt>int load_node(time_t *last_update_time);
-<dd>Load the supplied node information buffer for use by info gathering APIs if
-node records have changed since the time specified.
-<dd>Input: Buffer - Pointer to node information buffer
-<dd>Buffer_Size - size of Buffer
-<dd>Output: Returns 0 if no error, EINVAL if the buffer is invalid, ENOMEM if malloc failure
-<dd>NOTE: Buffer is loaded by load_node and freed by Free_Node_Info.
-
-<dt>int load_node_config(char *req_name, char *next_name, int *cpus,
-int *real_memory, int *tmp_disk, int *weight, char *features,
-char *partition, char *node_state);
-<dd>Load the state information about the named node
-<dd>Input: req_name - Name of the node for which information is requested
-if "", then get info for the first node in list
-<dd>next_name - Location into which the name of the next node is
-stored, "" if no more
-<dd>cpus, etc. - Pointers into which the information is to be stored
-<dd>Output: next_name - Name of the next node in the list
-<dd>cpus, etc. - The node's state information
-<dd>Returns 0 on success, ENOENT if not found, or EINVAL if buffer is bad
-<dd>NOTE: req_name, next_name, Partition, and NodeState must be declared by the
-caller and have length MAX_NAME_LEN or larger.
-Features must be declared by the caller and have length FEATURE_SIZE or larger
-<dd>NOTE: Buffer is loaded by load_node and freed by Free_Node_Info.
-
-<dt>int load_part(time_t *last_update_time);
-<dd>Update the partition information buffer for use by info gathering APIs if
-partition records have changed since the time specified.
-<dd>Input: last_update_time - Pointer to time of last buffer
-<dd>Output: last_update_time - Time reset if buffer is updated
-<dd>Returns 0 if no error, EINVAL if the buffer is invalid, ENOMEM if malloc failure
-<dd>NOTE: Buffer is used by load_part_name and free by Free_Part_Info.
-
-<dt>int load_part_name(char *req_name, char *next_name, int *max_time, int *max_nodes,
-int *total_nodes, int *total_cpus, int *key, int *state_up, int *shared, int *default,
-char *nodes, char *allow_groups);
-<dd>Load the state information about the named partition
-<dd>Input: req_name - Name of the partition for which information is requested
-if "", then get info for the first partition in list
-<dd>next_name - Location into which the name of the next partition is
-stored, "" if no more
-<dd>max_time, etc. - Pointers into which the information is to be stored
-<dd>Output: req_name - The partition's name is stored here
-<dd>next_name - The name of the next partition in the list is stored here
-<dd>max_time, etc. - The partition's state information
-<dd>Returns 0 on success, ENOENT if not found, or EINVAL if buffer is bad
-<dd>NOTE: req_name and next_name must be declared by caller with have length MAX_NAME_LEN or larger.
-<dd>Nodes and AllowGroups must be declared by caller with length of FEATURE_SIZE or larger.
-<dd>NOTE: Buffer is loaded by load_part and free by Free_Part_Info.
-
-<dt>int reconfigure(void);
-<dd>Request that slurmctld re-read the configuration files
-Output: Returns 0 on success, errno otherwise
-
-<dt>int slurm_allocate(char *spec, char **node_list);
-<dd>Allocate nodes for a job with supplied contraints.
-<dd>Input: spec - Specification of the job's constraints;
-<dd>node_list - Place into which a node list pointer can be placed;
-<dd>Output: node_list - List of allocated nodes;
-<dd>Returns 0 if no error, EINVAL if the request is invalid,
-EAGAIN if the request can not be satisfied at present;
-<dd>NOTE: Acceptable specifications include: JobName=<name> NodeList=<list>,
-Features=<features>, Groups=<groups>, Partition=<part_name>, Contiguous,
-TotalCPUs=<number>, TotalNodes=<number>, MinCPUs=<number>,
-MinMemory=<number>, MinTmpDisk=<number>, Key=<number>, Shared=<0|1>
-<dd>NOTE: The calling function must free the allocated storage at node_list[0]
-
-<dt>void slurm_free_build_info(void);
-<dd>Free the build information buffer (if allocated).
-<dd>NOTE: Buffer is loaded by slurm_load_build and used by slurm_load_build_name.
-
-<dt>int slurm_get_key(? *key);
-<dd>Load into the location key the value of an authorization key.
-<dd>To be defined.
-
-<dt>int slurm_kill_job(int job_id);
-<dd>Terminate the specified SLURM job.
-<dd>TBD.
-
-<dt>int slurm_load_build(void);
-<dd>Update the build information buffer for use by info gathering APIs
-<dd>Output: Returns 0 if no error, EINVAL if the buffer is invalid, ENOMEM if malloc failure.
-
-<dt>int slurm_load_build_name(char *req_name, char *next_name, char *value);
-<dd>Load the state information about the named build parameter
-<dd>Input: req_name - Name of the parameter for which information is requested
-if "", then get info for the first parameter in list
-<dd>next_name - Location into which the name of the next parameter is
-stored, "" if no more
-<dd>value - Pointer to location into which the information is to be stored
-<dd>Output: req_name - The parameter's name is stored here
-<dd>next_name - The name of the next parameter in the list is stored here
-<dd>value - The parameter's value is stored here
-<dd>Returns 0 on success, ENOENT if not found, or EINVAL if buffer is bad
-<dd>NOTE: req_name, next_name, and value must be declared by caller with have
-length BUILD_SIZE or larger
-<dd>NOTE: Buffer is loaded by slurm_load_build and freed by slurm_free_build_info.
-<dd>See the <a href="admin.guide.html">SLURM administrator guide</a>
-for valid build parameter names.
-
-<dt>int slurm_run_job(char *job_spec);
-<dd>Initiate the job with the specification job_spec.
-<dd>TBD.
-
-<dt>int slurm_signal_job(int job_id, int signal);
-<dd>Send the specified signal to the specified SLURM job.
-<dd>TBD.
-
-<dt>int slurm_transfer_resources(pid_t pid, int job_id);
-<dd>Transfer the ownership of resources associated with the specified
-<dd>TBD.
-
-<dt>int update(char *spec);
-<dd>Request that slurmctld update its configuration per request
-<dd>Input: A line containing configuration information per the configuration file format
-<dd>Output: Returns 0 on success, errno otherwise
-
-<dt>int slurm_will_job_run(char *job_spec);
-<dd>TBD.
+
+<dt>int slurm_load_build (time_t update_time, struct build_buffer **build_buffer_ptr);
+<dd>If the SLURM build information has changed since <i>update_time</i>, then
+download from slurmctld the current information.
+The information includes the data's time of update, the machine on which
+the primary slurmctld server runs, the pathname of the prolog program,
+the pathname of the temporary file system, etc.
+See slurmlib.h for a full description of the information available.
+Execute slurm_free_build_info to release the memory allocated by slurm_load_build.
+
+<dt>void slurm_free_build_info (struct build_buffer *build_buffer_ptr);
+<dd>Release memory allocated by the slurm_load_build function.
+
+<dt>int slurm_load_job (time_t update_time, struct job_buffer **job_buffer_ptr);
+<dd>If any SLURM job information has changed since <i>update_time</i>, then
+download from slurmctld the current information. The information includes
+a count of job entries, and each job's name, job id, user id, allocated
+nodes, etc. Included with the job information is an array of indices
+into the node table information as downloaded with slurm_load_node.
+See slurmlib.h for a full description of the information available.
+Execute slurm_free_job_info to release the memory allocated by slurm_load_job.
+
+<dt>void slurm_free_job_info (struct job_buffer *job_buffer_ptr);
+<dd>Release memory allocated by the slurm_load_job function.
+
+<dt>int slurm_load_node (time_t update_time, struct node_buffer **node_buffer_ptr);
+<dd>If any SLURM node information has changed since <i>update_time</i>, then
+download from slurmctld the current information. The information includes
+a count of node entries, and each node's name, real memory size, temporary
+disk space, processor count, features, etc.
+See slurmlib.h for a full description of the information available.
+Execute slurm_free_node_info to release the memory allocated by slurm_load_node.
+
+<dt>void slurm_free_node_info (struct node_buffer *node_buffer_ptr);
+<dd>Release memory allocated by the slurm_load_node function.
+
+<dt>int slurm_load_part (time_t update_time, struct part_buffer **part_buffer_ptr);
+<dd>If any SLURM partition information has changed since <i>update_time</i>, then
+download from slurmctld the current information. The information includes
+a count of partition entries, and each partition's name, node count limit
+(per job), time limit (per job), group access restrictions, associated
+nodes, etc. Included with the partition information is an array of indices
+into the node table information as downloaded with slurm_load_node.
+See slurmlib.h for a full description of the information available.
+Execute slurm_free_part_info to release the memory allocated by slurm_load_part.
+
+<dt>void slurm_free_part_info (struct part_buffer *part_buffer_ptr);
+<dd>Release memory allocated by the slurm_load_part function.
+
 </dl>

 <h2>Examples of API Use</h2>
-Please see the source code of scancel, scontrol, squeue, and srun for examples
-of all APIs.
+
+<pre>
+#include <stdio.h>
+#include <stdlib.h>
+#include "slurmlib.h"
+
+int
+main (int argc, char *argv[])
+{
+	static time_t last_update_time = (time_t) 0;
+	int error_code, i, j, k;
+	struct build_buffer *build_buffer_ptr = NULL;
+	struct build_table *build_table_ptr = NULL;
+	struct job_buffer *job_buffer_ptr = NULL;
+	struct job_table *job_ptr = NULL;
+	struct node_buffer *node_buffer_ptr = NULL;
+	struct node_table *node_ptr = NULL;
+	struct part_buffer *part_buffer_ptr = NULL;
+	struct part_table *part_ptr = NULL;
+
+	/* get and dump some build information */
+	error_code = slurm_load_build (last_update_time, &build_buffer_ptr);
+	if (error_code) {
+		printf ("slurm_load_build error %d\n", error_code);
+		exit (1);
+	}
+
+	build_table_ptr = build_buffer_ptr->build_table_ptr;
+	printf("backup_interval = %u\n", build_table_ptr->backup_interval);
+	printf("backup_location = %s\n", build_table_ptr->backup_location);
+	slurm_free_build_info (build_buffer_ptr);
+
+	/* get and dump some job information */
+	error_code = slurm_load_job
+		(last_update_time, &job_buffer_ptr);
+	if (error_code) {
+		printf ("slurm_load_job error %d\n", error_code);
+		exit (error_code);
+	}
+
+	printf("Jobs updated at %lx, record count %d\n",
+		job_buffer_ptr->last_update, job_buffer_ptr->job_count);
+	job_ptr = job_buffer_ptr->job_table_ptr;
+
+	for (i = 0; i < job_buffer_ptr->job_count; i++) {
+		printf ("JobId=%s UserId=%u\n",
+			job_ptr[i].job_id, job_ptr[i].user_id);
+	}
+	slurm_free_job_info (job_buffer_ptr);
+
+	/* get and dump some node information */
+	error_code = slurm_load_node (last_update_time, &node_buffer_ptr);
+	if (error_code) {
+		printf ("slurm_load_node error %d\n", error_code);
+		exit (error_code);
+	}
+
+	node_ptr = node_buffer_ptr->node_table_ptr;
+	for (i = 0; i < node_buffer_ptr->node_count; i++) {
+		printf ("NodeName=%s CPUs=%u\n",
+			node_ptr[i].name, node_ptr[i].cpus);
+	}
+
+	/* get and dump some partition information */
+	/* note that we use the node information loaded above */
+	/* we assume the node table entries have not changed since */
+	/* loaded above (only updated on slurmctld reconfiguration) */
+	error_code = slurm_load_part (last_update_time, &part_buffer_ptr);
+	if (error_code) {
+		printf ("slurm_load_part error %d\n", error_code);
+		exit (error_code);
+	}
+	printf("Partitions updated at %lx, record count %d\n",
+		part_buffer_ptr->last_update, part_buffer_ptr->part_count);
+	part_ptr = part_buffer_ptr->part_table_ptr;
+
+	for (i = 0; i < part_buffer_ptr->part_count; i++) {
+		printf ("PartitionName=%s MaxTime=%u ",
+			part_ptr[i].name, part_ptr[i].max_time);
+		printf ("Nodes=%s AllowGroups=%s\n",
+			part_ptr[i].nodes, part_ptr[i].allow_groups);
+		printf ("  NodeIndices=");
+		for (j = 0; part_ptr[i].node_inx[j] != -1; j++) {
+			if (j > 0)
+				printf(",%d", part_ptr[i].node_inx[j]);
+			else
+				printf("%d", part_ptr[i].node_inx[j]);
+		}
+		printf("\n  NodeList=");
+		for (j = 0; part_ptr[i].node_inx[j] != -1; j += 2) {
+			for
+			    (k = part_ptr[i].node_inx[j];
+			     k <= part_ptr[i].node_inx[j+1]; k++) {
+				printf("%s,", node_ptr[k].name);
+			}
+		}
+		printf("\n\n");
+	}
+	slurm_free_node_info (node_buffer_ptr);
+	slurm_free_part_info (part_buffer_ptr);
+	exit (0);
+}
+</pre>

 <h2>To Do</h2>
 <ul>
@@ -314,7 +465,7 @@ of all APIs.
 <hr>
 URL = http://www-lc.llnl.gov/dctg-lc/slurm/programmer.guide.html
-<p>Last Modified April 15, 2002</p>
+<p>Last Modified May 9, 2002</p>
 <address>Maintained by <a href="mailto:slurm-dev@lists.llnl.gov">
 slurm-dev@lists.llnl.gov</a></address>
 </body>
-- 
GitLab