diff --git a/doc/html/ext_sensorsplugins.shtml b/doc/html/ext_sensorsplugins.shtml index ee58c8ff7ee1b6b3470c73f35c17263f75fb28a3..1daef16373c0c24341a1b374e974e2621fdaeaa2 100644 --- a/doc/html/ext_sensorsplugins.shtml +++ b/doc/html/ext_sensorsplugins.shtml @@ -28,7 +28,7 @@ for the type of external sensors. We currently use RRD database. </ul> <p>The programmer is urged to study -<span class="commandline">src/plugins/ext_sensors/rrd</span> and +<span class="commandline">src/plugins/ext_sensors/rrd</span> and <span class="commandline">src/common/slurm_ext_sensors.c</span> for a sample implementation of a SLURM external sensors plugin. <p class="footer"><a href="#top">top</a> @@ -39,7 +39,7 @@ implemented must be stubbed. <p class="commandline">extern int ext_sensors_read_conf(void) <p style="margin-left:.2in"><b>Description</b>:<br> -Reads the external sensors plugin configuration file (ext_sensors.conf) +Reads the external sensors plugin configuration file (ext_sensors.conf) and populates the configuration structure. Called by the slurmctld daemon. <p style="margin-left:.2in"><b>Arguments</b>: <br> @@ -60,7 +60,7 @@ Called by the slurmctld daemon. <p class="commandline">extern int ext_sensors_p_update_component_data(void) <p style="margin-left:.2in"><b>Description</b>:<br> -Updates external sensors data for data types and component types as configured +Updates external sensors data for data types and component types as configured in ext_sensors.conf. Called by the slurmctld daemon. <p style="margin-left:.2in"><b>Arguments</b>: <br> @@ -100,9 +100,9 @@ plugin and the frequency at which to gather external sensors data.</p> </dl> <h2>Versioning</h2> -<p>This document describes version 1 of the SLURM External Sensors Plugin API. -Future releases of SLURM may revise this API. A, energy accounting plugin -conveys its ability to implement a particular API version using the mechanism +<p>This document describes version 1 of the SLURM External Sensors Plugin API. +Future releases of SLURM may revise this API. A, energy accounting plugin +conveys its ability to implement a particular API version using the mechanism outlined for SLURM plugins.</p> <p class="footer"><a href="#top">top</a> diff --git a/doc/man/man1/scontrol.1 b/doc/man/man1/scontrol.1 index 10d32d9399af1850ceb9ad73ad3cea6bb6ca9788..f248dd5517cd56a845ae2653d2aa5138330838c4 100644 --- a/doc/man/man1/scontrol.1 +++ b/doc/man/man1/scontrol.1 @@ -190,7 +190,7 @@ be released by the job's owner. .TP \fBnotify\fP \fIjob_id\fP \fImessage\fP -Send a message to standard error of the salloc or srun command or batch job +Send a message to standard error of the salloc or srun command or batch job associated with the specified \fIjob_id\fP. .TP @@ -274,10 +274,10 @@ Resume a previously suspended job. Also see \fBsuspend\fR. .TP \fBschedloglevel\fP \fILEVEL\fP Enable or disable scheduler logging. -\fILEVEL\fP may be "0", "1", "disable" or "enable". "0" has the same +\fILEVEL\fP may be "0", "1", "disable" or "enable". "0" has the same effect as "disable". "1" has the same effect as "enable". -This value is temporary and will be overwritten when the slurmctld -daemon reads the slurm.conf configuration file (e.g. when the daemon +This value is temporary and will be overwritten when the slurmctld +daemon reads the slurm.conf configuration file (e.g. when the daemon is restarted or \fBscontrol reconfigure\fR is executed) if the SlurmSchedLogLevel parameter is present. @@ -393,7 +393,7 @@ system administrator (also see the \fBhold\fP command). .TP \fBupdate\fP \fISPECIFICATION\fP -Update job, step, node, partition, or reservation configuration per the +Update job, step, node, partition, or reservation configuration per the supplied specification. \fISPECIFICATION\fP is in the same format as the Slurm configuration file and the output of the \fIshow\fP command described above. It may be desirable to execute the \fIshow\fP command (described above) on the @@ -442,13 +442,13 @@ Set the job's requirement for contiguous (consecutive) nodes to be allocated. Possible values are "YES" and "NO". .TP \fIDependency\fP=<dependency_list> -Defer job's initiation until specified job dependency specification +Defer job's initiation until specified job dependency specification is satisfied. Cancel dependency with an empty dependency_list (e.g. "Dependency="). <\fIdependency_list\fR> is of the form <\fItype:job_id[:job_id][,type:job_id[:job_id]]\fR>. Many jobs can share the same dependency and these jobs may even belong to -different users. +different users. .PD .RS .TP @@ -761,7 +761,7 @@ Time the job was last suspended or resumed. \fIUserId\fP \fIGroupId\fP The user and group under which the job was submitted. .TP -NOTE on information displayed for various job states: +NOTE on information displayed for various job states: When you submit a request for the "show job" function the scontrol process makes an RPC request call to slurmctld with a REQUEST_JOB_INFO message type. If the state of the job is PENDING, then it returns @@ -776,8 +776,8 @@ started. \fBSPECIFICATIONS FOR UPDATE COMMAND, STEPS\fR .TP \fIStepId\fP=<job_id>[.<step_id>] -Identify the step to be updated. -If the job_id is given, but no step_id is specified then all steps of +Identify the step to be updated. +If the job_id is given, but no step_id is specified then all steps of the identified job will be modified. This specification is required. .TP @@ -805,10 +805,10 @@ simple node range expressions (e.g. "lx[10\-20]"). This specification is require \fIFeatures\fP=<features> Identify feature(s) to be associated with the specified node. Any previously defined feature(s) will be overwritten with the new value. -Features assigned via \fBscontrol\fR will only persist across the restart -of the slurmctld daemon with the \fI\-R\fR option and state files -preserved or slurmctld's receipt of a SIGHUP. -Update slurm.conf with any changes meant to be persistent across normal +Features assigned via \fBscontrol\fR will only persist across the restart +of the slurmctld daemon with the \fI\-R\fR option and state files +preserved or slurmctld's receipt of a SIGHUP. +Update slurm.conf with any changes meant to be persistent across normal restarts of slurmctld or the execution of \fBscontrol reconfig\fR. .TP @@ -868,10 +868,10 @@ systems. Use Cray tools such as \fIxtprocadmin\fR instead. Identify weight to be associated with specified nodes. This allows dynamic changes to weight associated with nodes, which will be used for the subsequent node allocation decisions. -Weight assigned via \fBscontrol\fR will only persist across the restart -of the slurmctld daemon with the \fI\-R\fR option and state files -preserved or slurmctld's receipt of a SIGHUP. -Update slurm.conf with any changes meant to be persistent across normal +Weight assigned via \fBscontrol\fR will only persist across the restart +of the slurmctld daemon with the \fI\-R\fR option and state files +preserved or slurmctld's receipt of a SIGHUP. +Update slurm.conf with any changes meant to be persistent across normal restarts of slurmctld or the execution of \fBscontrol reconfig\fR. .TP @@ -1050,7 +1050,7 @@ each resource. .TP \fIState\fP=<up|down|drain|inactive> -Specify if jobs can be allocated nodes or queued in this partition. +Specify if jobs can be allocated nodes or queued in this partition. Possible values are "UP", "DOWN", "DRAIN" and "INACTIVE". .RS .TP 10 @@ -1330,7 +1330,7 @@ The meaning of the external sensors information is as follows: .TP \fIExtSensorsJoules\fP -The energy consumed by the node between the last time it was powered on +The energy consumed by the node between the last time it was powered on and the last external sensors plugin node sample, in joules. @@ -1346,7 +1346,7 @@ node sample, in celsius. .PP If the reported value is "n/s" (not supported), the node does not support the -configured \fBExtSensorsType\fR plugin. +configured \fBExtSensorsType\fR plugin. .SH "ENVIRONMENT VARIABLES" .PP diff --git a/doc/man/man5/ext_sensors.conf.5 b/doc/man/man5/ext_sensors.conf.5 index e44021ab8bf2cddbe5b4d1d6135d2839ca48826a..d948f63f19e8843747fafdcf17670a85212ea37b 100644 --- a/doc/man/man5/ext_sensors.conf.5 +++ b/doc/man/man5/ext_sensors.conf.5 @@ -23,7 +23,7 @@ of the command "scontrol reconfigure" unless otherwise noted. .LP The following ext_sensors.conf parameters are defined to control data -collection by the ext_sensors plugins. All of these parameters are optional. +collection by the ext_sensors plugins. All of these parameters are optional. If a parameter is omitted, data collection of the omitted type is disabled. .TP diff --git a/doc/man/man5/slurm.conf.5 b/doc/man/man5/slurm.conf.5 index 763d956c04d56c2b9d408733e9f4c7eaf3e4da43..1d9094e6d9f4974015d0cdd3c1f64518c2202410 100644 --- a/doc/man/man5/slurm.conf.5 +++ b/doc/man/man5/slurm.conf.5 @@ -160,7 +160,7 @@ is "YES". \fBAcctGatherNodeFreq\fR The AcctGather plugins sampling interval for node accounting. For AcctGather plugin values of none, this parameter is ignored. -For all other values this parameter is the number +For all other values this parameter is the number of seconds between node accounting samples. For the acct_gather_energy/rapl plugin, set a value less than 300 because the counters may overflow beyond this rate. @@ -172,12 +172,12 @@ determined by the value of \fBJobAcctGatherFrequency\fR. \fBAcctGatherEnergyType\fR Identifies the plugin to be used for energy consumption accounting. The jobacct_gather plugin and slurmd daemon call this plugin to collect -energy consumption data for jobs and nodes. The collection of energy -consumption data takes place on node level, hence only in case of exclusive -job allocation the energy consumption measurements will reflect the jobs -real consumption. In case of node sharing between jobs the reported consumed -energy per job (through sstat or sacct) will not reflect the real energy -consumed by the jobs. +energy consumption data for jobs and nodes. The collection of energy +consumption data takes place on node level, hence only in case of exclusive +job allocation the energy consumption measurements will reflect the jobs +real consumption. In case of node sharing between jobs the reported consumed +energy per job (through sstat or sacct) will not reflect the real energy +consumed by the jobs. Configurable values at present are: .RS @@ -540,18 +540,18 @@ See \fBProlog and Epilog Scripts\fR for more information. \fBExtSensorsFreq\fR The external sensors plugin sampling interval. If \fBExtSensorsType=ext_sensors/none\fR, this parameter is ignored. -For all other values of \fBExtSensorsType\fR, this parameter is the number +For all other values of \fBExtSensorsType\fR, this parameter is the number of seconds between external sensors samples for hardware components (nodes, -switches, etc.) The default value is zero. This value disables external +switches, etc.) The default value is zero. This value disables external sensors sampling. Note: This parameter does not affect external sensors data collection for jobs/steps. .TP \fBExtSensorsType\fR Identifies the plugin to be used for external sensors data collection. -Slurmctld calls this plugin to collect external sensors data for jobs/steps -and hardware components. In case of node sharing between jobs the reported -values per job/step (through sstat or sacct) may not be accurate. See also +Slurmctld calls this plugin to collect external sensors data for jobs/steps +and hardware components. In case of node sharing between jobs the reported +values per job/step (through sstat or sacct) may not be accurate. See also "man ext_sensors.conf". Configurable values at present are: @@ -714,7 +714,7 @@ system), "jobacct_gather/linux" (for Linux operating system), The default value is "jobacct_gather/none". "jobacct_gather/cgroup" is an experimental plugin for the Linux operating system that uses cgroups to collect accounting statistics. The plugin collects the -following statistics: From the cgroup memory subsystem: memory.usage_in_bytes +following statistics: From the cgroup memory subsystem: memory.usage_in_bytes (reported as 'pages') and rss from memory.stat (reported as 'rss'). From the cgroup cpuacct subsystem: user cpu time and system cpu time. No value is provided by cgroups for virtual memory size ('vsize'). @@ -1086,8 +1086,8 @@ A suspended job will resume execution once the high priority job preempting it completes. The \fBSUSPEND\fR may only be used with the \fBGANG\fR option (the gang scheduler module performs the job resume operation) -and with \fBPreemptType=preempt/partition_prio\fR (the logic to -suspend and resume jobs current only has the data structures to +and with \fBPreemptType=preempt/partition_prio\fR (the logic to +suspend and resume jobs current only has the data structures to support partitions). .RE @@ -1145,7 +1145,7 @@ Applicable only if PriorityType=priority/multifactor. .RS .TP 17 \fBACCRUE_ALWAYS\fR -If set, priority age factor will be increased despite job dependencies +If set, priority age factor will be increased despite job dependencies or holds. .TP \fBTICKET_BASED\fR @@ -1203,7 +1203,7 @@ priority. Supported values are "priority/basic" (jobs are prioritized by order of arrival, also suitable for sched/wiki and sched/wiki2), "priority/multifactor" (jobs are prioritized based upon size, age, fair\-share of allocation, etc). -Also see \fBPriorityFlags\fR for configuration options. +Also see \fBPriorityFlags\fR for configuration options. The default value is "priority/basic". .TP @@ -1319,7 +1319,7 @@ which uses a site\-specific LUA script to track processes which uses Quadrics kernel patch and is the default if "SwitchType=switch/elan" .TP \fBproctrack/sgi_job\fR -which uses SGI's Process Aggregates (PAGG) kernel module, +which uses SGI's Process Aggregates (PAGG) kernel module, see \fIhttp://oss.sgi.com/projects/pagg/\fR for more information .TP \fBproctrack/pgid\fR @@ -1430,7 +1430,7 @@ The maximum size of a process's data segment .TP \fBFSIZE\fR The maximum size of files created. Note that if the user sets FSIZE to less -than the current size of the slurmd.log, job launches will fail with +than the current size of the slurmd.log, job launches will fail with a 'File size limit exceeded' error. .TP \fBMEMLOCK\fR @@ -1770,8 +1770,8 @@ For the Wiki interface to the Moab Cluster Suite \fBSelectType\fR Identifies the type of resource selection algorithm to be used. Changing this value can only be done by restarting the slurmctld daemon -and will result in the loss of all job information (running and pending) -since the job state save format used by each plugin is different. +and will result in the loss of all job information (running and pending) +since the job state save format used by each plugin is different. Acceptable values include .RS .TP @@ -1809,7 +1809,7 @@ The permitted values of \fBSelectTypeParameters\fR depend upon the configured value of \fBSelectType\fR. \fBSelectType=select/bluegene\fR supports no \fBSelectTypeParameters\fR. The only supported option for \fBSelectType=select/linear\fR are -\fBCR_ONE_TASK_PER_CORE\fR and +\fBCR_ONE_TASK_PER_CORE\fR and \fBCR_Memory\fR, which treats memory as a consumable resource and prevents memory over subscription with job preemption or gang scheduling. The following values are supported for \fBSelectType=select/cons_res\fR: @@ -1856,15 +1856,15 @@ Setting a value for \fBDefMemPerCPU\fR is strongly recommended. .TP \fBCR_ONE_TASK_PER_CORE\fR Allocate one task per core by default. -Without this option, by default one task will be allocated per +Without this option, by default one task will be allocated per thread on nodes with more than one \fBThreadsPerCore\fR configured. .TP \fBCR_CORE_DEFAULT_DIST_BLOCK\fR Allocate cores within a node using block distribution by default. This is a pseudo\-best\-fit algorithm that minimizes the number of boards and minimizes the number of sockets (within minimum boards) -used for the allocation. -This default behavior can be overridden specifying a particular +used for the allocation. +This default behavior can be overridden specifying a particular "\-m" parameter with srun/salloc/sbatch. Without this option, cores will be allocated cyclicly across the sockets. .TP @@ -2077,19 +2077,19 @@ The value may not exceed 65533 seconds. .TP \fBSlurmSchedLogFile\fR -Fully qualified pathname of the scheduling event logging file. -The syntax of this parameter is the same as for \fBSlurmctldLogFile\fR. +Fully qualified pathname of the scheduling event logging file. +The syntax of this parameter is the same as for \fBSlurmctldLogFile\fR. In order to configure scheduler logging, set both the \fBSlurmSchedLogFile\fR and \fBSlurmSchedLogLevel\fR parameters. .TP \fBSlurmSchedLogLevel\fR -The initial level of scheduling event logging, similar to the -\fBSlurmctlDebug\fR parameter used to control the initial level of -\fBslurmctld\fR logging. -Valid values for \fBSlurmSchedLogLevel\fR are "0" (scheduler logging -disabled) and "1" (scheduler logging enabled). -If this parameter is omitted, the value defaults to "0" (disabled). +The initial level of scheduling event logging, similar to the +\fBSlurmctlDebug\fR parameter used to control the initial level of +\fBslurmctld\fR logging. +Valid values for \fBSlurmSchedLogLevel\fR are "0" (scheduler logging +disabled) and "1" (scheduler logging enabled). +If this parameter is omitted, the value defaults to "0" (disabled). In order to configure scheduler logging, set both the \fBSlurmSchedLogFile\fR and \fBSlurmSchedLogLevel\fR parameters. The scheduler logging level can be changed dynamically using \fBscontrol\fR. @@ -2308,7 +2308,7 @@ variables and output for the user program. .RS .TP 20 \fBexport NAME=value\fR -Will set environment variables for the task being spawned. +Will set environment variables for the task being spawned. Everything after the equal sign to the end of the line will be used as the value for the environment variable. Exporting of functions is not currently supported. @@ -2318,7 +2318,7 @@ Will cause that line (without the leading "print ") to be printed to the job's standard output. .TP \fBunset NAME\fR -Will clear environment variables for the task being spawned. +Will clear environment variables for the task being spawned. .TP The order of task prolog/epilog execution is as follows: .TP @@ -2450,14 +2450,14 @@ lines (see above), where \fBslurm\fR is the service\-name, should be added. .TP \fBVSizeFactor\fR -Memory specifications in job requests apply to real memory size (also known -as resident set size). It is possible to enforce virtual memory limits for +Memory specifications in job requests apply to real memory size (also known +as resident set size). It is possible to enforce virtual memory limits for both jobs and job steps by limiting their virtual memory to some percentage -of their real memory allocation. The \fBVSizeFactor\fR parameter specifies -the job's or job step's virtual memory limit as a percentage of its real -memory limit. For example, if a job's real memory limit is 500MB and +of their real memory allocation. The \fBVSizeFactor\fR parameter specifies +the job's or job step's virtual memory limit as a percentage of its real +memory limit. For example, if a job's real memory limit is 500MB and VSizeFactor is set to 101 then the job will be killed if its real memory -exceeds 500MB or its virtual memory exceeds 505MB (101 percent of the +exceeds 500MB or its virtual memory exceeds 505MB (101 percent of the real memory limit). The default valus is 0, which disables enforcement of virtual memory limits. The value may not exceed 65533 percent. @@ -2653,7 +2653,7 @@ Also see \fBFeature\fR. \fBPort\fR The port number that the SLURM compute node daemon, \fBslurmd\fR, listens to for work on this particular node. By default there is a single port number -for all \fBslurmd\fR daemons on all compute nodes as defined by the +for all \fBslurmd\fR daemons on all compute nodes as defined by the \fBSlurmdPort\fR configuration parameter. Use of this option is not generally recommended except for development or testing purposes. If multiple \fBslurmd\fR daemons execute on a node this can specify a range of ports @@ -3147,9 +3147,9 @@ The value may not exceed 65533. .TP \fBReqResv\fR Specifies users of this partition are required to designate a reservation -when submitting a job. This option can be useful in restricting usage +when submitting a job. This option can be useful in restricting usage of a partition that may have higher priority or additional resources to be -allowed only within a reservation. +allowed only within a reservation. Possible values are "YES" and "NO". The default value is "NO". @@ -3656,7 +3656,7 @@ See the section \fBFILE AND DIRECTORY PERMISSIONS\fR for information about the various files and directories used by SLURM. .LP It is recommended that the logrotate utility be used to insure that -various log files do not become too large. +various log files do not become too large. This also applies to text files used for accounting, process tracking, and the slurmdbd log if they are used. .LP @@ -3664,43 +3664,43 @@ Here is a sample logrotate configuration. Make appropriate site modifications and save as /etc/logrotate.d/slurm on all nodes. See the \fBlogrotate\fR man page for more details. .LP -## +## .br -# SLURM Logrotate Configuration +# SLURM Logrotate Configuration .br -## +## .br /var/log/slurm/*log { .br - compress + compress .br - missingok + missingok .br - nocopytruncate + nocopytruncate .br - nocreate + nocreate .br - nodelaycompress + nodelaycompress .br - nomail + nomail .br - notifempty + notifempty .br - noolddir + noolddir .br - rotate 5 + rotate 5 .br - sharedscripts + sharedscripts .br - size=5M + size=5M .br - create 640 slurm root + create 640 slurm root .br - postrotate + postrotate .br - /etc/init.d/slurm reconfig + /etc/init.d/slurm reconfig .br - endscript + endscript .br } .br diff --git a/src/api/node_info.c b/src/api/node_info.c index 960e86e10e08604532651415cec76f12c22d5c80..6fec43412ffb2eb8d094408a72490ab11cee3579 100644 --- a/src/api/node_info.c +++ b/src/api/node_info.c @@ -241,7 +241,7 @@ slurm_sprint_node_table (node_info_t * node_ptr, snprintf(tmp_line, sizeof(tmp_line), "NodeAddr=%s NodeHostName=%s", node_ptr->node_addr, node_ptr->node_hostname); - xstrcat(out, tmp_line); + xstrcat(out, tmp_line); if (one_liner) xstrcat(out, " "); else @@ -312,7 +312,7 @@ slurm_sprint_node_table (node_info_t * node_ptr, "LowestJoules=%u ConsumedJoules=%u", node_ptr->energy->current_watts, node_ptr->energy->base_watts, - node_ptr->energy->consumed_energy); + node_ptr->energy->consumed_energy); xstrcat(out, tmp_line); if (one_liner) xstrcat(out, " "); @@ -322,8 +322,8 @@ slurm_sprint_node_table (node_info_t * node_ptr, /****** external sensors Line ******/ if (node_ptr->ext_sensors->consumed_energy == NO_VAL) snprintf(tmp_line, sizeof(tmp_line), "ExtSensorsJoules=n/s "); - else - snprintf(tmp_line, sizeof(tmp_line), "ExtSensorsJoules=%u ", + else + snprintf(tmp_line, sizeof(tmp_line), "ExtSensorsJoules=%u ", node_ptr->ext_sensors->consumed_energy); xstrcat(out, tmp_line); if (node_ptr->ext_sensors->current_watts == NO_VAL)