diff --git a/doc/html/accounting.shtml b/doc/html/accounting.shtml
index 6c2910f14b93bb7b6e2965310e62efce6245b5a4..3cfa3e3e4cafa6bbbe012a965aa946c02c672f1f 100644
--- a/doc/html/accounting.shtml
+++ b/doc/html/accounting.shtml
@@ -267,7 +267,8 @@ SlurmDBD executes</li>
 <li><b>AccountingStoragePass</b>: If using SlurmDBD with a second MUNGE
 daemon, store the pathname of the named socket used by MUNGE to provide
-enterprise-wide. Otherwise the default MUNGE daemon will be used.</li>
+enterprise-wide authentication (e.g. /var/run/munge/moab.socket.2). Otherwise
+the default MUNGE daemon will be used.</li>
 <li><b>AccountingStoragePort</b>: The network port that SlurmDBD accepts
 communication on.</li>
@@ -635,9 +636,18 @@ options are available:
 privileges to this user. Valid options are
 <ul>
 <li>None</li>
-<li>Operator: can add, modify,and remove users, and add other operators)</li>
-<li>Admin: In addition to operator privileges these users can add, modify,
-and remove accounts and clusters</li>
+<li>Operator: can add, modify, and remove any database object (user,
+account, etc.), and add other operators
+<br>On a slurmctld served by SlurmDBD these users can<br>
+<ul>
+<li>View information that is blocked to regular users by a PrivateData
+ flag</li>
+<li>Create/Alter/Delete Reservations</li>
+</ul>
+</li>
+<li>Admin: These users have the same level of privileges as an
+operator in the database. They can also alter anything on a
+slurmctld served by SlurmDBD as if they were the slurm user or root.</li>
 </ul>
 <li><b>Cluster=</b> Only add to accounts on these clusters (default is
 all clusters)</li>
@@ -680,6 +690,13 @@ in the <a href="resource_limits.html">Resource Limits</a> document.</p>
 jobs will be allowed to run.
 </li>

+<li><b>GrpCPURunMins=</b> Maximum number of CPU minutes all jobs
+ running with this association and its children can run at the same
+ time. This takes into consideration the time limits of running jobs.
+ If the limit is reached, no new jobs are started until running jobs
+ finish and free up time under the limit.
+</li>
+
 <li><b>GrpCPUs=</b> The total count of cpus able to be used at any given
 time from jobs running from this association and its children. If
 this limit is reached new jobs will be queued but only allowed to
diff --git a/doc/html/qos.shtml b/doc/html/qos.shtml
index 9f34b8d843f414d0370394fc05e5960bfd253dcc..013fbdc02dc22ef643223019e68f79944269efef 100644
--- a/doc/html/qos.shtml
+++ b/doc/html/qos.shtml
@@ -10,6 +10,7 @@ the job in three ways:
 <li> <a href=#priority>Job Scheduling Priority</a>
 <li> <a href=#preemption>Job Preemption</a>
 <li> <a href=#limits>Job Limits</a>
+<li> <a href=#qos_other>Other QOS Options</a>
 </ul>
 <P> The QOS's are defined in the SLURM database using the <i>sacctmgr</i>
@@ -74,23 +75,77 @@ will take precedence over the association's limits.
 QOS</P>
 <UL>
-<LI><b>GrpCPUMins</b> Maximum number of CPU*minutes all jobs with this QOS can run.
-<LI><b>MaxCPUMinsPerJob</b> Maximum number of CPU*minutes any job with this QOS can run.
 <LI><b>GrpCpus</b> Maximum number of CPU's all jobs with this QOS can be allocated.
-<LI><b>MaxCpusPerJob</b> Maximum number of CPU's any job with this QOS can be allocated.
-<LI><b>MaxCpusPerUser</b> Maximum number of CPU's any user with this QOS can be allocated.
+<LI><b>GrpCPUMins</b> A hard limit of CPU minutes to be used by jobs
+ running from this QOS. If this limit is reached, all jobs running in
+ this group will be killed, and no new jobs will be allowed to run.
+<LI><b>GrpCPURunMins</b> Maximum number of CPU minutes all jobs
+ running with this QOS can run at the same time. This takes into
+ consideration the time limits of running jobs. If the limit is
+ reached, no new jobs are started until running jobs finish and free
+ up time under the limit.
 <LI><b>GrpJobs</b> Maximum number of jobs that can run with this QOS.
-<LI><b>MaxJobsPerUser</b> Maximum number of jobs a user can run with this QOS.
 <LI><b>GrpMemory</b> Maximum amount of memory (MB) all jobs with this QOS can be allocated.
 <LI><b>GrpNodes</b> Maximum number of nodes that can be allocated to all jobs with this QOS.
-<LI><b>MaxNodesPerJob</b> Maximum number of nodes that can be allocated to any job with this QOS.
-<LI><b>MaxNodesPerUser</b> Maximum number of nodes that can be allocated to any user with this QOS.
 <LI><b>GrpSubmitJobs</b> Maximum number of jobs with this QOS that can be in the system (no matter what state).
-<LI><b>MaxSubmitJobsPerUser</b> Maximum number of jobs with this QOS that can be in the system.
 <LI><b>GrpWall</b> Wall clock limit for all jobs running with this QOS.
+<LI><b>MaxCpusPerJob</b> Maximum number of CPU's any job with this QOS can be allocated.
+<LI><b>MaxCPUMinsPerJob</b> Maximum number of CPU*minutes any job with this QOS can run.
+<LI><b>MaxNodesPerJob</b> Maximum number of nodes that can be allocated to any job with this QOS.
 <LI><b>MaxWallDurationPerJob</b> Wall clock limit for any jobs running with this QOS.
+<LI><b>MaxCpusPerUser</b> Maximum number of CPU's any user with this QOS can be allocated.
+<LI><b>MaxJobsPerUser</b> Maximum number of jobs a user can run with this QOS.
+<LI><b>MaxNodesPerUser</b> Maximum number of nodes that can be allocated to any user with this QOS.
+<LI><b>MaxSubmitJobsPerUser</b> Maximum number of jobs with this QOS that can be in the system.
 </UL>
+<a name=qos_other>
+<h2>Other QOS Options</h2></a>
+<ul>
+<li><b>Flags</b> Used by the slurmctld to override or enforce certain
+ characteristics. Valid options are
+
+<ul>
+<li><b>EnforceUsageThreshold</b> If set, and the QOS also has a UsageThreshold,
+any jobs submitted with this QOS that fall below the UsageThreshold
+will be held until their fairshare usage goes above the threshold.
+
+<li><b>NoReserve</b> If this flag is set and backfill scheduling is used,
+jobs using this QOS will not reserve resources in the backfill
+schedule's map of resources allocated through time. This flag is
+intended for use with a QOS that may be preempted by jobs associated
+with all other QOS (e.g. use with a "standby" QOS). If this flag is
+used with a QOS which cannot be preempted by all other QOS, it could
+result in starvation of larger jobs.
+
+<li><b>PartitionMaxNodes</b> If set, jobs using this QOS will be able to
+override the requested partition's MaxNodes limit.
+
+<li><b>PartitionMinNodes</b> If set, jobs using this QOS will be able to
+override the requested partition's MinNodes limit.
+
+<li><b>PartitionTimeLimit</b> If set, jobs using this QOS will be able to
+override the requested partition's TimeLimit.
+
+<li><b>RequiresReservation</b> If set, jobs using this QOS must designate a
+reservation when submitting a job. This option can be useful in
+restricting usage of a QOS that may have greater preemptive capability
+or additional resources to be allowed only within a reservation.
+</ul>
+
+<li><b>GraceTime</b> Preemption grace time to be extended to a job
+ which has been selected for preemption.
+<li><b>UsageFactor</b> Usage factor when running with this QOS
+ (e.g. a value of .5 would make jobs use only half the time as normal
+ in accounting, and 2 would make them use twice as much.)
+<li><b>UsageThreshold</b>
+A float representing the lowest fairshare of an association allowed
+to run a job. If an association falls below this threshold and has
+pending jobs or submits new jobs, those jobs will be held until the
+usage goes back above the threshold. Use <i>sshare</i> to see current
+shares on the system.
+</ul>
+
 <h2>Configuration</h2>
 <P> To summarize the above, the QOS's and their associated limits are
diff --git a/doc/man/man1/sacctmgr.1 b/doc/man/man1/sacctmgr.1
index c63b4ed87ba6a42f34e3d0a733084d7ed4f61187..730dbd17167e374a3068b25d81db7b853ce43a03 100644
--- a/doc/man/man1/sacctmgr.1
+++ b/doc/man/man1/sacctmgr.1
@@ -270,6 +270,14 @@ is reached all associated jobs running will be killed and all future
 jobs submitted with associations in the group will be delayed until
 they are able to run inside the limit.

+.TP
+\fIGrpCPURunMins\fP=<max cpu run minutes>
+Maximum number of CPU minutes all jobs
+running with this association and all its child associations can run
+at the same time. This takes into consideration the time limits of
+running jobs. If the limit is reached, no new jobs are started until
+running jobs finish and free up time under the limit.
+
 .TP
 \fIGrpCPUs\fP=<max cpus>
 Maximum number of CPUs running jobs are able to be allocated in aggregate for
@@ -564,6 +572,14 @@ Maximum number of CPU minutes running jobs are able to be allocated in
 aggregate for this association and all associations which are children
 of this association.

+.TP
+\fIGrpCPURunMins\fP
+Maximum number of CPU minutes all jobs
+running with this association and all its child associations can run
+at the same time. This takes into consideration the time limits of
+running jobs. If the limit is reached, no new jobs are started until
+running jobs finish and free up time under the limit.
+
 .TP
 \fIGrpCPUs\fP
 Maximum number of CPUs running jobs are able to be allocated in aggregate for
@@ -960,6 +976,13 @@ selected for preemption.
 Maximum number of CPU minutes running jobs are able to be allocated in
 aggregate for this QOS.

+.TP
+\fIGrpCPURunMins\fP
+Maximum number of CPU minutes all jobs running with this QOS can run
+at the same time. This takes into consideration the time limits of
+running jobs. If the limit is reached, no new jobs are started until
+running jobs finish and free up time under the limit.
+
 .TP
 \fIGrpCPUs\fP
 Maximum number of CPUs running jobs are able to be allocated in aggregate for
diff --git a/doc/man/man5/slurmdbd.conf.5 b/doc/man/man5/slurmdbd.conf.5
index 1433f858a0700ff0bb21537c16292f3057ddf3ef..5177056e0dcf280a73de64bdcc1e25bd874dee0d 100644
--- a/doc/man/man5/slurmdbd.conf.5
+++ b/doc/man/man5/slurmdbd.conf.5
@@ -29,8 +29,8 @@ The overall configuration parameters available include:
 \fBArchiveDir\fR
 If ArchiveScript is not set the slurmdbd will generate a file that can be
 read in anytime with sacctmgr load filename. This directory is where the
-file will be placed archive has ran. Default is /tmp. The format for this
-files name is
+file will be placed after a purge event has happened and archiving for that
+element is set to true. Default is /tmp. The format for this file's name is
 .na
 $ArchiveDir/$ClusterName_$ArchiveObject_archive_$BeginTimeStamp_$endTimeStamp
 .ad
@@ -117,8 +117,8 @@ This may be fine for testing purposes, but
 "http://www.theether.org/authd/" for more information).
"auth/munge" indicates that LLNL's Munge system is to be used (this is the best supported authentication mechanism for SLURM, -see "http://home.gna.org/munge/" for more information). -SlurmDbd must be terminated prior to changing the value of \fBAuthType\fR +see "https://code.google.com/p/munge/" for more information). +SlurmDBD must be terminated prior to changing the value of \fBAuthType\fR and later restarted. .TP