diff --git a/NEWS b/NEWS index 34a3050cf20fa91c4fe03d4c05de5fc2e2c6b6b5..2f1d12b78e46244b292e4c0cbe7790ed4fab9912 100644 --- a/NEWS +++ b/NEWS @@ -111,6 +111,13 @@ documents those changes that are of interest to users and administrators. -- job_submit/lua - expose admin_comment field. -- Allow AdminComment field to be set by the job_submit plugin. -- Allow AdminComment field to be changed by any Administrator. + -- Fix key words in jobcomp select. + -- MYSQL - Streamline job flush sql when doing a clean start on the slurmctld. + -- Fix potential infinite loop when talking to the DBD when shutting down + the slurmctld. + -- Fix MCS filter. + -- Make it so pmix can be included in the plugin rpm without having to + specify --with-pmix. * Changes in Slurm 17.02.1-2 ============================ diff --git a/doc/html/mcs.shtml b/doc/html/mcs.shtml index fb003b146706d534ee179ce05e8989ab2fa3d1ac..cc9da0391c8ae3d97435f5f76caa2fe9ecaefd06 100644 --- a/doc/html/mcs.shtml +++ b/doc/html/mcs.shtml @@ -2,60 +2,168 @@ <h1>Multi-Category Security (MCS) Guide</h1> -<h2>MCS Overview</h2> -<p>Slurm can be configured to associate a category label to jobs and optionally -ensure that nodes can only be shared among jobs having the same category label. -Job and node information can optionally be filtered based on their MCS labels -in coordination with the PrivateData option: -only users having access to the associated MCS label will have access -to the information. The MCS plugin is responsible for these features.</p> - -<p>Users may either request a particular category label for a job, -or use the default value generated by the MCS plugin implementation. -The MCS plugin is responsible for checking that the user provided label -is valid for the user.</p> - -<p>MCS labels can be either enforced or specified on demand on jobs. -When set to ondemand, MCS label will only be set when users specify a valid one -at submission time. -It is the responsibility of the MCS plugin to validate the correctness -of the requested labels. -When enforced, the MCS plugin implementation will always associate -the default MCS label of users to their jobs unless users specify another -valid one.</p> - -<p>The selection of nodes can be filtered on MCS labels : -on demand (ondemand) or always (select) or never (noselect). -User can force the filter with --exclusive=mcs option (except if noselect mode).</p> - -<p>The MCS category label (also called MCS label) for a job is shown in -squeue with the format option mcslabel. -The node's inherited MCS label is shown with scontrol show nodes. -The sview command can also be used to see those MCS labels.</p> - -<p>The following configuration parameters are available:</p> +<h2>Overview</h2> +<p>The MCS Plugin is meant to extend the current Slurm functionality related to +job nodes <b>exclusivity</b> and job/node information display <b>privacy</b>. + +<p>Slurm <b>OverSubscribe</b> option controls the ability of a partition to +execute more than one job at a time on each resource, no matter "what type" +of job. Slurm job submission clients can also use the <b>--exclusive</b> and +<b>--oversubscribe</b> parameters to request how the job can be +<a href="cons_res_share.html">shared</a>. In Slurm 15.08, <b>ExclusiveUser</b> +slurm.conf parameter and the --exclusive=<b>user</b> client parameter value +were introduced, extending the original exclusivity functionality. 
With this
+parameter enabled, the "type of job" now matters when considering exclusivity,
+so that jobs can share resources <b>based on</b> job users, meaning that
+only jobs belonging to the same user can share resources. This added a new
+dimension to how Slurm manages exclusivity. With the introduction of the MCS
+Plugin, Slurm can now be configured to associate an <b>MCS_label</b> with jobs
+and optionally ensure that nodes can only be shared among jobs having the same
+label. This adds even more degrees of freedom to how Slurm manages exclusivity,
+giving end users much more flexibility in this area.</p>
+
+<p>Slurm also has the <b>PrivateData</b> <a href="slurm.conf.html">slurm.conf</a>
+parameter, which is used to control what type of information is hidden from
+regular users. As with the exclusivity property, the MCS Plugin also extends
+the <b>privacy</b> functionality by filtering job and/or node information
+based on the users' access to their <b>MCS_label</b>. This means that privacy
+is now less restrictive: information is no longer simply hidden from regular
+users, but filtered <b>depending</b> on these configurable/requestable
+labels in coordination with the PrivateData option.</p>
+
+<h2>Configuration</h2>
+<p>Two parameters are currently available to configure MCS: <b>MCSPlugin</b>
+and <b>MCSParameters</b>.</p>
+
+<ul>
+<li><b>MCSPlugin</b> Specifies which plugin should be used. Plugins are
+mutually exclusive, and the type of label to be associated depends
+on the loaded plugin.</li><br>
+<ul>
+<li><b>mcs/none</b> is the default and disables MCS labels and functionality.
+</li><br>
+<li><b>mcs/account</b> MCS labels can only take a value equal to the job's
+--account. NOTE: this option requires accounting to be enabled.
+</li><br>
+<li><b>mcs/group</b> MCS labels can only take a value equal to the job's user
+group.
+</li><br>
+<li><b>mcs/user</b> MCS labels can only take a value equal to the username of
+the job's --uid.
+</li><br>
+</ul>
+</ul>
+
+<p>A job's MCS_label can be displayed with 'squeue' using the format option
+<b>mcslabel</b>, or with 'scontrol show job'. Nodes also acquire an MCS_label,
+which is inherited from the allocated job's MCS_label. The node's label can be
+displayed with 'scontrol show nodes'. The 'sview' command can also be used to
+see these MCS_labels.</p>
+
+<p>Users may either request a particular category label for a job (through the
+<b>--mcs-label</b> option), or use the default value generated by the specific
+MCS plugin implementation. Labels can be configured to be enforced or set
+on demand, and the specific MCS Plugin is responsible for checking the validity
+of these labels. When enforced, the MCS Plugin implementation will always
+associate an MCS label with a submitted job, either the default value or the
+one requested by the user (if it is valid).</p>
+
+<p>The selection (exclusivity) of nodes can be filtered on MCS labels either
+on demand (ondemand), always (select) or never (noselect). Users can force the
+filter with the <b>--exclusive=mcs</b> option (except in noselect mode).
+</p>
+
+<p>Label enforcement, node selection filtering policy, private data based on
+labels and a list of user groups allowed to be mapped to MCS labels can all be
+configured through the MCSParameters option.</p>
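+
+<p>For illustration only, a minimal slurm.conf sketch combining these options
+might look like the following (the group names "geo" and "bio" are
+hypothetical; the other values mirror the settings exercised by the test in
+this patch). It enforces labels taken from the listed groups, always filters
+nodes on them, and hides job/node information across labels:</p>
+<pre>
+# Hypothetical sketch, not a recommended configuration
+MCSPlugin=mcs/group
+MCSParameters=enforced,select,privatedata:geo|bio
+PrivateData=jobs,nodes
+</pre>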
+
 <ul>
-<li><b>MCSPlugin:</b>
-Specifies which plugin should be used.
+<li><b>MCSParameters</b> Specifies the options to pass to the specific MCS
+Plugin implementation. Options should satisfy the following expression:
+<br>
+"[ondemand|enforced][,noselect|select|ondemandselect][,privatedata]:[mcs_plugin_parameters]".
+The defaults are "ondemand,ondemandselect" and no privatedata.
 </li>
-<li><b>MCSParameters:</b>
-Specifies options to pass to the MCS plugin implementation.
-The string is of the form:<br>
-"[ondemand|enforced][,noselect|,select|,ondemandselect][,privatedata]:[mcs_plugin_parameters]"<br>
-The defaults are "ondemand,ondemandselect" and no privatedata</li>
+<br>
 <ul>
-<li>[ondemand|enforced]: set MCS label on jobs either on demand (using --mcs-label option) or always</li>
-<li>[,noselect|,select|,ondemandselect]: select nodes with filter on MCS label : never,
-always or on demand (using --exclusive=mcs option)</li>
-<li>[,privatedata]: accordingly with privatedata option:<br>
-if privatedata and privatedata=jobs: jobs informations are filtered based on their MCS labels<br>
-if privatedata and privatedata=nodes: nodes informations are filtered based on their MCS labels<br>
-Only mcs/group is currently supporting the mcs_plugin_parameters option.
-It can be used to specify the list of user groups (separated by |) that can be mapped
-to MCS labels by the mcs/group plugin.</li>
+<li><b>ondemand|enforced</b> Set the MCS label on jobs either on demand (using
+the --mcs-label option) or always.</li>
+<li><b>noselect|select|ondemandselect</b> Filter nodes on the MCS label:
+never, always, or on demand (using the --exclusive=mcs option).</li>
+<li><b>privatedata</b> Works in coordination with the PrivateData option:<br>
+<ul>
+<li>
+if MCS' privatedata is set and PrivateData includes jobs, job information is
+filtered based on MCS labels<br>
+</li>
+<li>
+if MCS' privatedata is set and PrivateData includes nodes, node information is
+filtered based on MCS labels<br>
+</li>
 </ul>
+<li><b>mcs_plugin_parameters</b> Only mcs/group currently supports this
+option. It can be used to specify the list of user groups (separated by the
+'<b>|</b>' character) that are allowed to be mapped to MCS labels by the
+mcs/group plugin.</li>
 </li>
 </ul>
+</li>
+</ul>
+
+Different requests and configurations lead to different combinations of use
+cases. The following table is intended to help the end user understand the
+expected behavior (related to exclusivity) for a subset of these use cases:
+<br>
+<br>
+
+<table style="page-break-inside: avoid; font-family: Arial,Helvetica,sans-serif;" border="1" bordercolor="#000000" cellpadding="6" cellspacing="0" width="100%">
+  <tr>
+    <td bgcolor="#e0e0e0">
+      Node filtering:
+    </td>
+    <td bgcolor="#e0e0e0">
+      Label enforcement: <b>ondemand</b><br>
+      (MCS_label set only if requested.)
+    </td>
+    <td bgcolor="#e0e0e0">
+      Label enforcement: <b>enforced</b><br>
+      (MCS_label is mandatory.)
+    </td>
+  </tr>
+  <tr>
+    <td bgcolor="#e0e0e0">
+      <b>noselect</b>
+    </td>
+    <td>
+      No filter on nodes even if --exclusive=mcs requested.
+    </td>
+    <td>
+      No filter on nodes even if --exclusive=mcs requested.
+    </td>
+  </tr>
+  <tr>
+    <td bgcolor="#e0e0e0">
+      <b>select</b>
+    </td>
+    <td>
+      Filter on nodes <b>only</b> if job MCS_label is set.
+    </td>
+    <td>
+      Always filter on nodes.
+    </td>
+  </tr>
+  <tr>
+    <td bgcolor="#e0e0e0">
+      <b>ondemandselect</b>
+    </td>
+    <td>
+      Filter on nodes <b>only</b> if --exclusive=mcs.
+    </td>
+    <td>
+      Filter on nodes <b>only</b> if --exclusive=mcs.
+ </td> + </tr> +</table> <h2>Some examples</h2> diff --git a/slurm.spec b/slurm.spec index 6bb3014d7eca1162920c2b8794a2a7eec40525e7..41a4957bebf54e0844e13e7ddd3b67fd350c73d0 100644 --- a/slurm.spec +++ b/slurm.spec @@ -16,7 +16,6 @@ # --with cray_alps %_with_cray_alps 1 build for a Cray system with ALPS # --with cray_network %_with_cray_network 1 build for a non-Cray system with a Cray network # --without debug %_without_debug 1 don't compile with debugging symbols -# --with pmix %_with_pmix 1 build pmix support # --with lua %_with_lua 1 build Slurm lua bindings (proctrack only for now) # --without munge %_without_munge path don't build auth-munge RPM # --with mysql %_with_mysql 1 require mysql/mariadb support @@ -46,7 +45,6 @@ %slurm_without_opt cray_network %slurm_without_opt salloc_background %slurm_without_opt multiple_slurmd -%slurm_without_opt pmix # These options are only here to force there to be these on the build. # If they are not set they will still be compiled if the packages exist. @@ -397,7 +395,7 @@ according to the Slurm %{?slurm_with_salloc_background:--enable-salloc-background} \ %{!?slurm_with_readline:--without-readline} \ %{?slurm_with_multiple_slurmd:--enable-multiple-slurmd} \ - %{?slurm_with_pmix:--with-pmix=%{?with_pmix_dir}} \ + %{?slurm_with_pmix:--with-pmix=%{?slurm_with_pmix}} \ %{?with_freeipmi:--with-freeipmi=%{?with_freeipmi}}\ %{?slurm_with_shared_libslurm:--with-shared-libslurm}\ %{?with_cflags} \ @@ -666,6 +664,12 @@ test -f $RPM_BUILD_ROOT/%{_libdir}/slurm/launch_aprun.so && echo %{_libdir}/slurm/launch_aprun.so >> $LIST test -f $RPM_BUILD_ROOT/%{_libdir}/slurm/mpi_mvapich.so && echo %{_libdir}/slurm/mpi_mvapich.so >> $LIST +test -f $RPM_BUILD_ROOT/%{_libdir}/slurm/mpi_pmix.so && + echo %{_libdir}/slurm/mpi_pmix.so >> $LIST +test -f $RPM_BUILD_ROOT/%{_libdir}/slurm/mpi_pmix_v1.so && + echo %{_libdir}/slurm/mpi_pmix_v1.so >> $LIST +test -f $RPM_BUILD_ROOT/%{_libdir}/slurm/mpi_pmix_v2.so && + echo %{_libdir}/slurm/mpi_pmix_v2.so >> $LIST test -f $RPM_BUILD_ROOT/%{_libdir}/slurm/node_features_knl_cray.so && echo %{_libdir}/slurm/node_features_knl_cray.so >> $LIST test -f $RPM_BUILD_ROOT/%{_libdir}/slurm/node_features_knl_generic.so && @@ -881,9 +885,6 @@ rm -rf $RPM_BUILD_ROOT %{_libdir}/slurm/mpi_pmi2.so %endif %{_libdir}/slurm/mpi_none.so -%if %{slurm_with pmix} -%{_libdir}/slurm/mpi_pmix.so -%endif %{_libdir}/slurm/power_none.so %{_libdir}/slurm/preempt_job_prio.so %{_libdir}/slurm/preempt_none.so diff --git a/src/common/slurm_mcs.c b/src/common/slurm_mcs.c index e1ae551ac7fcc7ce1620a155ba40f08e33896e40..cd4a692f3128cc469c51a924f6b97f83087aff6c 100644 --- a/src/common/slurm_mcs.c +++ b/src/common/slurm_mcs.c @@ -79,20 +79,20 @@ extern int slurm_mcs_init(void) if (init_run && g_mcs_context) return retval; - slurm_mutex_lock(&g_mcs_context_lock); + slurm_mutex_lock(&g_mcs_context_lock); if (g_mcs_context) goto done; xfree(mcs_params); xfree(mcs_params_common); xfree(mcs_params_specific); - type = slurm_get_mcs_plugin(); mcs_params = slurm_get_mcs_plugin_params(); - if (mcs_params == NULL) { + + if (mcs_params == NULL) info("No parameter for mcs plugin, default values set"); - } else { + else { mcs_params_common = xstrdup(mcs_params); sep = xstrchr(mcs_params_common, ':'); if (sep != NULL) { @@ -101,6 +101,7 @@ extern int slurm_mcs_init(void) *sep = '\0'; } } + _slurm_mcs_check_and_load_privatedata(mcs_params_common); _slurm_mcs_check_and_load_enforced(mcs_params_common); _slurm_mcs_check_and_load_select(mcs_params_common); @@ -113,8 +114,8 @@ extern int 
slurm_mcs_init(void) retval = SLURM_ERROR; goto done; } - init_run = true; + init_run = true; done: slurm_mutex_unlock(&g_mcs_context_lock); xfree(type); @@ -140,7 +141,7 @@ extern int slurm_mcs_fini(void) extern int slurm_mcs_reconfig(void) { slurm_mcs_fini(); - return(slurm_mcs_init()); + return slurm_mcs_init(); } /* slurm_mcs_get_params_specific @@ -157,28 +158,32 @@ extern char *slurm_mcs_get_params_specific(void) static int _slurm_mcs_check_and_load_enforced(char *params) { label_strict_enforced = false; + if ((params != NULL) && xstrcasestr(params, "enforced")) label_strict_enforced = true; else info("mcs: MCSParameters = %s. ondemand set.", params); + return SLURM_SUCCESS; } static int _slurm_mcs_check_and_load_select(char *params) { select_value = MCS_SELECT_ONDEMANDSELECT; + if (params == NULL) { return SLURM_SUCCESS; } - if (xstrcasestr(params, "noselect")) { + + if (xstrcasestr(params, "noselect")) select_value = MCS_SELECT_NOSELECT; - } else if (xstrcasestr(params, "ondemandselect")) { + else if (xstrcasestr(params, "ondemandselect")) select_value = MCS_SELECT_ONDEMANDSELECT; - } else if (xstrcasestr(params, "select")) { + else if (xstrcasestr(params, "select")) select_value = MCS_SELECT_SELECT; - } else { + else info("mcs: MCSParameters = %s. ondemandselect set.", params); - } + return SLURM_SUCCESS; } @@ -188,11 +193,12 @@ static int _slurm_mcs_check_and_load_privatedata(char *params) private_data = false; return SLURM_SUCCESS; } - if (xstrcasestr(params, "privatedata")) { + + if (xstrcasestr(params, "privatedata")) private_data = true; - } else { + else private_data = false; - } + return SLURM_SUCCESS; } @@ -201,18 +207,19 @@ extern int slurm_mcs_reset_params(void) label_strict_enforced = false; select_value = MCS_SELECT_ONDEMANDSELECT; private_data = false; + return SLURM_SUCCESS; } extern int slurm_mcs_get_enforced(void) { - return(label_strict_enforced); + return label_strict_enforced; } extern int slurm_mcs_get_select(struct job_record *job_ptr) { if ((select_value == MCS_SELECT_SELECT) || - ((select_value == MCS_SELECT_ONDEMANDSELECT) && + ((select_value == MCS_SELECT_ONDEMANDSELECT) && job_ptr->details && (job_ptr->details->whole_node == WHOLE_NODE_MCS))) return 1; @@ -237,5 +244,6 @@ extern int mcs_g_check_mcs_label(uint32_t user_id, char *mcs_label) { if (slurm_mcs_init() < 0) return 0; - return (int) (*(ops.check))(user_id, mcs_label); + + return (int)(*(ops.check))(user_id, mcs_label); } diff --git a/src/common/slurm_persist_conn.c b/src/common/slurm_persist_conn.c index ee749c08890c404e30db7574c26ba0983e0cf984..0ddd859b8d74ae955f955ac180af5caf24f1dd73 100644 --- a/src/common/slurm_persist_conn.c +++ b/src/common/slurm_persist_conn.c @@ -914,11 +914,12 @@ extern Buf slurm_persist_recv_msg(slurm_persist_conn_t *persist_conn) return buffer; endit: - /* Close it since we abondoned it. If the connection does still exist + /* Close it since we abandoned it. If the connection does still exist * on the other end we can't rely on it after this point since we didn't * listen long enough for this response. 
*/ - if (persist_conn->flags & PERSIST_FLAG_RECONNECT) + if (!(*persist_conn->shutdown) && + persist_conn->flags & PERSIST_FLAG_RECONNECT) slurm_persist_conn_reopen(persist_conn, true); return NULL; diff --git a/src/plugins/accounting_storage/mysql/as_mysql_job.c b/src/plugins/accounting_storage/mysql/as_mysql_job.c index 06858d18f5cd76f94aae86759ef34077ae14644c..a00a88567491175c069d8b6aaadbf3da9fa2edf1 100644 --- a/src/plugins/accounting_storage/mysql/as_mysql_job.c +++ b/src/plugins/accounting_storage/mysql/as_mysql_job.c @@ -1554,20 +1554,21 @@ extern int as_mysql_flush_jobs_on_cluster( if (state == JOB_SUSPENDED) { if (suspended_char) xstrfmtcat(suspended_char, - " || job_db_inx=%s", row[0]); + ", %s", row[0]); else - xstrfmtcat(suspended_char, "job_db_inx=%s", + xstrfmtcat(suspended_char, "job_db_inx in (%s", row[0]); } if (id_char) - xstrfmtcat(id_char, " || job_db_inx=%s", row[0]); + xstrfmtcat(id_char, ", %s", row[0]); else - xstrfmtcat(id_char, "job_db_inx=%s", row[0]); + xstrfmtcat(id_char, "job_db_inx in (%s", row[0]); } mysql_free_result(result); if (suspended_char) { + xstrfmtcat(suspended_char, ")"); xstrfmtcat(query, "update \"%s_%s\" set " "time_suspended=%ld-time_suspended " @@ -1588,6 +1589,7 @@ extern int as_mysql_flush_jobs_on_cluster( xfree(suspended_char); } if (id_char) { + xstrfmtcat(id_char, ")"); xstrfmtcat(query, "update \"%s_%s\" set state=%d, " "time_end=%ld where %s;", diff --git a/src/plugins/jobcomp/mysql/mysql_jobcomp_process.c b/src/plugins/jobcomp/mysql/mysql_jobcomp_process.c index 6ebe3fad8e4c2c0949170a7effe36702063b9dd8..8f9c92aa98c6e7f903074f2a95142d4911a7541b 100644 --- a/src/plugins/jobcomp/mysql/mysql_jobcomp_process.c +++ b/src/plugins/jobcomp/mysql/mysql_jobcomp_process.c @@ -107,7 +107,7 @@ extern List mysql_jobcomp_process_get_jobs(slurmdb_job_cond_t *job_cond) while(jobcomp_table_fields[i].name) { if (i) xstrcat(tmp, ", "); - xstrcat(tmp, jobcomp_table_fields[i].name); + xstrfmtcat(tmp, "`%s`", jobcomp_table_fields[i].name); i++; } diff --git a/src/plugins/mcs/account/mcs_account.c b/src/plugins/mcs/account/mcs_account.c index 3496742c5361d060d01959671ff271707da45ebb..e8c78734523fdf476e7164a7e8dae16496861005 100644 --- a/src/plugins/mcs/account/mcs_account.c +++ b/src/plugins/mcs/account/mcs_account.c @@ -98,21 +98,21 @@ extern int mcs_p_set_mcs_label(struct job_record *job_ptr, char *label) { int rc = SLURM_SUCCESS; xfree(job_ptr->mcs_label); + if (label != NULL) { /* test label param */ - if (!xstrcmp(label, job_ptr->account)) { + if (!xstrcmp(label, job_ptr->account)) job_ptr->mcs_label = xstrdup(job_ptr->account); - } else { + else rc = SLURM_ERROR; - } } else { if ((slurm_mcs_get_enforced() == 0) && job_ptr->details && - (job_ptr->details->whole_node != WHOLE_NODE_MCS)) { + (job_ptr->details->whole_node != WHOLE_NODE_MCS)) ; - } else { + else job_ptr->mcs_label = xstrdup(job_ptr->account); - } } + return rc; } @@ -127,14 +127,16 @@ extern int mcs_p_check_mcs_label(uint32_t user_id, char *mcs_label) memset(&assoc_rec, 0, sizeof(slurmdb_assoc_rec_t)); assoc_rec.acct = mcs_label; assoc_rec.uid = user_id; + if (mcs_label != NULL) { if (!assoc_mgr_fill_in_assoc(acct_db_conn, &assoc_rec, - accounting_enforce, - (slurmdb_assoc_rec_t **) - NULL, false)) + accounting_enforce, + (slurmdb_assoc_rec_t **) NULL, + false)) rc = SLURM_SUCCESS; else rc = SLURM_ERROR; } + return rc; } diff --git a/src/plugins/mcs/group/mcs_group.c b/src/plugins/mcs/group/mcs_group.c index bbf3e84c9ceba5f2e9fba63f36ba69fe77766af2..7dc77b2ff2e9dbfd824b4cb446b99237a8519ce8 
100644 --- a/src/plugins/mcs/group/mcs_group.c +++ b/src/plugins/mcs/group/mcs_group.c @@ -92,12 +92,15 @@ extern int init(void) { debug("%s loaded", plugin_name); mcs_params_specific = slurm_mcs_get_params_specific(); + if (_check_and_load_params() != 0) { - info("mcs: plugin warning : no group in %s", mcs_params_specific); + info("mcs: plugin warning : no group in %s", + mcs_params_specific); xfree(mcs_params_specific); /* no need to check others options : default values used */ return SLURM_SUCCESS; } + xfree(mcs_params_specific); return SLURM_SUCCESS; } @@ -125,6 +128,7 @@ static int _get_user_groups(uint32_t user_id, uint32_t group_id, user_name = uid_to_string((uid_t) user_id); *ngroups = max_groups; rc = getgrouplist(user_name, (gid_t) group_id, groups, ngroups); + if (rc < 0) { error("getgrouplist(%s): %m", user_name); rc = SLURM_ERROR; @@ -132,6 +136,7 @@ static int _get_user_groups(uint32_t user_id, uint32_t group_id, *ngroups = rc; rc = SLURM_SUCCESS; } + xfree(user_name); return rc; } @@ -154,15 +159,17 @@ static int _check_and_load_params(void) slurm_mcs_reset_params(); return SLURM_ERROR; } + n = strlen(mcs_params_specific); for (i = 0 ; i < n ; i++) { if (mcs_params_specific[i] == '|') nb_mcs_groups = nb_mcs_groups + 1; } + if (nb_mcs_groups == 0) { /* no | in param : just one group */ if (mcs_params_specific != NULL) { - if ( gid_from_string(mcs_params_specific, &gid ) != 0 ) { + if (gid_from_string(mcs_params_specific, &gid ) != 0 ) { info("mcs: Only one invalid group : %s. " "ondemand, ondemandselect set", groups_names); nb_mcs_groups = 0; @@ -190,10 +197,12 @@ static int _check_and_load_params(void) } return SLURM_SUCCESS; } + nb_mcs_groups = nb_mcs_groups + 1; array_mcs_parameter = xmalloc(nb_mcs_groups * sizeof(uint32_t)); tmp_params = xstrdup(mcs_params_specific); groups_names = strtok_r(tmp_params, "|", &name_ptr); + i = 0; while (groups_names) { if (i == (nb_mcs_groups - 1)) { @@ -213,6 +222,7 @@ static int _check_and_load_params(void) i = i + 1; groups_names = strtok_r(NULL, "|", &name_ptr); } + /* if no valid group : deselect all params */ if (nb_valid_group == 0) { slurm_mcs_reset_params(); @@ -220,6 +230,7 @@ static int _check_and_load_params(void) xfree(tmp_params); return SLURM_ERROR; } + xfree(tmp_params); return SLURM_SUCCESS; } @@ -236,20 +247,20 @@ static int _find_mcs_label(gid_t *groups, int ngroups, char **result) struct group *gr; if (ngroups == 0) - rc = SLURM_ERROR; - else { - for( i = 0 ; i < nb_mcs_groups ; i++) { - for ( j = 0 ; j < ngroups ; j++) { - tmp_group = (uint32_t) groups[j]; - if (array_mcs_parameter[i] == tmp_group ) { - gr = getgrgid(groups[j]); - *result = gr->gr_name; - return rc; - } + return SLURM_ERROR; + + for (i = 0; i < nb_mcs_groups; i++) { + for (j = 0; j < ngroups; j++) { + tmp_group = (uint32_t) groups[j]; + if (array_mcs_parameter[i] == tmp_group) { + gr = getgrgid(groups[j]); + *result = gr->gr_name; + return rc; } } - rc = SLURM_ERROR; } + rc = SLURM_ERROR; + return rc; } @@ -266,7 +277,7 @@ static int _check_mcs_label (struct job_record *job_ptr, char *label) int ngroups = -1; /* test if real unix group */ - if ( gid_from_string(label, &gid ) != 0 ) + if (gid_from_string(label, &gid ) != 0) return rc; /* test if this group is owned by the user */ @@ -274,6 +285,7 @@ static int _check_mcs_label (struct job_record *job_ptr, char *label) groups, MAX_GROUPS, &ngroups); if (rc) /* Failed to get groups */ return rc; + rc = SLURM_ERROR; for (i = 0; i < ngroups; i++) { tmp_group = (uint32_t) groups[i]; @@ -282,16 +294,19 @@ 
static int _check_mcs_label (struct job_record *job_ptr, char *label) break; } } + if (rc == SLURM_ERROR) return rc; + rc = SLURM_ERROR; /* test if mcs_label is in list of possible mcs_label */ - for( i = 0 ; i < nb_mcs_groups ; i++) { - if (array_mcs_parameter[i] == gid ) { + for (i = 0; i < nb_mcs_groups; i++) { + if (array_mcs_parameter[i] == gid) { rc = SLURM_SUCCESS; return rc; } } + return rc; } @@ -311,6 +326,7 @@ extern int mcs_p_set_mcs_label (struct job_record *job_ptr, char *label) if ((slurm_mcs_get_enforced() == 0) && job_ptr->details && (job_ptr->details->whole_node != WHOLE_NODE_MCS)) return SLURM_SUCCESS; + rc = _get_user_groups(job_ptr->user_id,job_ptr->group_id, groups, MAX_GROUPS, &ngroups); if (rc) { /* Failed to get groups */ @@ -319,6 +335,7 @@ extern int mcs_p_set_mcs_label (struct job_record *job_ptr, char *label) else return SLURM_ERROR; } + rc = _find_mcs_label(groups, ngroups, &result); if (rc) { return SLURM_ERROR; @@ -351,9 +368,9 @@ extern int mcs_p_check_mcs_label (uint32_t user_id, char *mcs_label) if (mcs_label != NULL) { /* test if real unix group */ - if ( gid_from_string(mcs_label, &gid ) != 0 ) { + if (gid_from_string(mcs_label, &gid ) != 0) return rc; - } + /* test if this group is owned by the user */ slurm_user_gid = gid_from_uid(user_id); group_id = (uint32_t) slurm_user_gid; @@ -361,6 +378,7 @@ extern int mcs_p_check_mcs_label (uint32_t user_id, char *mcs_label) &ngroups); if (rc) /* Failed to get groups */ return rc; + rc = SLURM_ERROR; for (i = 0; i < ngroups; i++) { tmp_group = (uint32_t) groups[i]; @@ -369,8 +387,8 @@ extern int mcs_p_check_mcs_label (uint32_t user_id, char *mcs_label) break; } } - } else { + } else rc = SLURM_SUCCESS; - } + return rc; } diff --git a/src/plugins/mcs/user/mcs_user.c b/src/plugins/mcs/user/mcs_user.c index 5d5a71991e5feb2d40634ee5c61d3cf5170e546d..2ff26736f70fb1d8ae3b2fed0f1879209ba3110c 100644 --- a/src/plugins/mcs/user/mcs_user.c +++ b/src/plugins/mcs/user/mcs_user.c @@ -95,23 +95,24 @@ extern int mcs_p_set_mcs_label (struct job_record *job_ptr, char *label) { char *user = NULL; int rc = SLURM_SUCCESS; + user = uid_to_string((uid_t) job_ptr->user_id); xfree(job_ptr->mcs_label); + if (label != NULL) { /* test label param */ - if (xstrcmp(label, user) == 0) { + if (xstrcmp(label, user) == 0) job_ptr->mcs_label = xstrdup(user); - } else { + else rc = SLURM_ERROR; - } } else { if ((slurm_mcs_get_enforced() == 0) && job_ptr->details && - (job_ptr->details->whole_node != WHOLE_NODE_MCS)) { + (job_ptr->details->whole_node != WHOLE_NODE_MCS)) ; - } else { + else job_ptr->mcs_label = xstrdup(user); - } } + xfree(user); return rc; } @@ -123,16 +124,16 @@ extern int mcs_p_check_mcs_label (uint32_t user_id, char *mcs_label) { char *user = NULL; int rc = SLURM_SUCCESS; + user = uid_to_string((uid_t) user_id); if (mcs_label != NULL) { - if (xstrcmp(mcs_label, user) == 0) { + if (xstrcmp(mcs_label, user) == 0) rc = SLURM_SUCCESS; - } else { + else rc = SLURM_ERROR; - } - } else { + } else rc = SLURM_SUCCESS; - } + xfree(user); return rc; } diff --git a/src/slurmctld/node_scheduler.c b/src/slurmctld/node_scheduler.c index 848d127307cc5440ca0117a80275dd1676ca4e67..cf0fab0b2f8c901ac3205c9101139d485cfcf71f 100644 --- a/src/slurmctld/node_scheduler.c +++ b/src/slurmctld/node_scheduler.c @@ -941,11 +941,8 @@ extern void filter_by_node_mcs(struct job_record *job_ptr, int mcs_select, struct node_record *node_ptr; int i; - if (mcs_select != 1) - return; - /* Need to filter out any nodes allocated with other mcs */ - if 
(job_ptr->mcs_label != NULL) { + if (job_ptr->mcs_label && (mcs_select == 1)) { for (i = 0, node_ptr = node_record_table_ptr; i < node_record_count; i++, node_ptr++) { /* if there is a mcs_label -> OK if it's the same */ diff --git a/testsuite/expect/test17.51 b/testsuite/expect/test17.51 index 3ea77d2b361e4aaa67a6372647a3a1ad463bf03d..693a68d21692fed44f2bde4aca48cdf0740af6af 100755 --- a/testsuite/expect/test17.51 +++ b/testsuite/expect/test17.51 @@ -116,10 +116,11 @@ if {[is_super_user] == 0} { get_conf_path copy_conf -send_user "\n---Checking sbatch uses mcs-label only for some jobs (ondemand mode)---\n" +send_user "\n---Checking sbatch uses mcs-label only for some jobs (ondemand,select mode)---\n" # # Change the slurm.conf MCSparameters and MCSPlugin +# test with ondemand,select # exec $bin_sed -i /^\[\t\s\]*MCSPlugin\[\t\s\]*=/Id $config_path/slurm.conf exec $bin_sed -i /^\[\t\s\]*MCSParameters\[\t\s\]*=/Id $config_path/slurm.conf @@ -158,9 +159,9 @@ if {$found == 0} { send_user "\n---Checking sbatch fails with a bad mcs-label ---\n" set timeout $max_job_delay -make_bash_script $tmp_job "sleep 10" +make_bash_script $tmp_job "sleep 30" -spawn $sbatch -N1 --mcs-label=foo -t1 $tmp_job +spawn $sbatch -N1 --mcs-label=foo -t10 $tmp_job expect { -re "Batch job submission failed: Invalid mcs_label specified" { send_user "\nThis error is expected, no worries\n" @@ -179,9 +180,9 @@ expect { ###### Check that sbatch uses mcs-label=user ###### send_user "\n---Checking sbatch uses mcs-label=user---\n" -make_bash_script $tmp_job "sleep 10" +make_bash_script $tmp_job "sleep 30" set job_id 0 -spawn $sbatch -N1 -o/dev/null --exclusive=mcs -t1 $tmp_job +spawn $sbatch -N1 -o/dev/null --exclusive=mcs -t10 $tmp_job expect { -re "Submitted batch job ($number)" { set job_id $expect_out(1,string) @@ -331,9 +332,9 @@ cancel_job $job_id # # Change the slurm.conf MCSparameters and MCSPlugin -# test with enforced +# test with enforced,noselect # -send_user "\n---Checking sbatch uses mcs-label with all jobs (enforced mode)---\n" +send_user "n---Checking sbatch uses mcs-label with all jobs (enforced,noselect mode)---\n" exec $bin_sed -i /^\[\t\s\]*MCSPlugin\[\t\s\]*=/Id $config_path/slurm.conf exec $bin_sed -i /^\[\t\s\]*MCSParameters\[\t\s\]*=/Id $config_path/slurm.conf exec $bin_sed -i /^\[\t\s\]*PrivateData\[\t\s\]*=/Id $config_path/slurm.conf @@ -346,7 +347,7 @@ update_conf ###### Check that sbatch uses mcs-label=user ###### send_user "\n---Checking sbatch uses mcs-label=user---\n" -make_bash_script $tmp_job "sleep 10" +make_bash_script $tmp_job "sleep 30" spawn $sbatch -N1 -o/dev/null -t1 $tmp_job expect { @@ -443,6 +444,291 @@ expect { cancel_job $job_id +# +# Change the slurm.conf MCSparameters and MCSPlugin +# test with ondemand,noselect +# +send_user "\n---Checking sbatch doesn't use mcs-label on filter (ondemand,noselect mode)---\n" +exec $bin_sed -i /MCSPlugin/d $config_path/slurm.conf +exec $bin_sed -i /MCSParameters/d $config_path/slurm.conf +exec $bin_sed -i /PrivateData/d $config_path/slurm.conf +exec $bin_echo MCSPlugin=mcs/user >> $config_path/slurm.conf +exec $bin_echo MCSParameters=ondemand,noselect,privatedata >> $config_path/slurm.conf +exec $bin_echo PrivateData=jobs,nodes >> $config_path/slurm.conf +update_conf + +###### Check that sbatch uses mcs-label=user ###### +send_user "\n---Checking sbatch uses --exclusive=mcs ---\n" + +make_bash_script $tmp_job "sleep 30" + +spawn $sbatch -N1 --exclusive=mcs -o/dev/null -t10 $tmp_job +expect { + -re "Submitted batch job ($number)" { + set job_id 
$expect_out(1,string) + exp_continue + } + timeout { + send_user "\nFAILURE: sbatch is not responding\n" + set exit_code 1 + } + eof { + wait + } +} + +if {$job_id == 0} { + send_user "\nFAILURE: job was not submitted\n" + set exit_code 1 +} + +set found 0 +sleep 3 +spawn $squeue --jobs=$job_id --noheader -O "mcslabel" +expect { + -re "$user_name" { + send_user "\nMCS-label OK for this job\n" + set found 1 + exp_continue + } + -re "(null)" { + send_user "\nFAILURE: NO MCS-label for this job\n" + set exit_code 1 + exp_continue + } + timeout { + send_user "\nFAILURE: squeue is not responding\n" + set exit_code 1 + } + eof { + wait + } +} + +if {$found == 0} { + send_user "\nFAILURE: job was submitted with a bad mcs-label\n" + set exit_code 1 +} + +set found 0 +set node 0 +spawn $squeue --jobs=$job_id --noheader -O "nodelist" +expect { + -re "($alpha_numeric_under)" { + set node $expect_out(1,string) + send_user "\nNode for this job : $node\n" + set found 1 + } + timeout { + send_user "\nFAILURE: squeue is not responding\n" + set exit_code 1 + } + eof { + wait + } +} + +if {$found == 0} { + send_user "\nFAILURE: no node found in squeue command\n" + set exit_code 1 +} + +# +# verify MCS of nodes +# +set found 0 +spawn -noecho $bin_bash -c "exec $scontrol show node=$node | $bin_grep MCS" +expect { + -re "MCS_label=$user_name" { + send_user "\nFAILURE: a mcs_label is found for this job. It was not expected\n" + set exit_code 1 + } + -re "MCS_label=N/A" { + send_user "\nNo mcs_label for this node. It was expected\n" + } + timeout { + send_user "\nFAILURE: scontrol is not responding\n" + set exit_code 1 + } + eof { + wait + } +} + +cancel_job $job_id + +###### Check that sbatch doesn't use mcs-label ###### +send_user "\n---Checking sbatch uses --exclusive=mcs ---\n" + +make_bash_script $tmp_job "sleep 30" +set job_id 0 +spawn $sbatch -N1 -o/dev/null -t10 $tmp_job +expect { + -re "Submitted batch job ($number)" { + set job_id $expect_out(1,string) + exp_continue + } + timeout { + send_user "\nFAILURE: sbatch is not responding\n" + set exit_code 1 + } + eof { + wait + } +} + +if {$job_id == 0} { + send_user "\nFAILURE: job was not submitted\n" + set exit_code 1 +} +set found 0 + +set found 0 +spawn $squeue --jobs=$job_id --noheader -O "mcslabel" +expect { + -re "(null)" { + send_user "\nNO MCS-label for this job : this is expected\n" + exp_continue + } + -re "$user_name" { + send_user "\na MCS-label for this job : this is not expected\n" + set found 1 + exp_continue + } + timeout { + send_user "\nFAILURE: squeue is not responding\n" + set exit_code 1 + } + eof { + wait + } +} + +if {$found == 1} { + send_user "\nFAILURE: job was submitted with a bad mcs-label\n" + set exit_code 1 +} + +cancel_job $job_id +sleep 5 +send_user "\n---Checking sbatch uses mcs-label for all jobs (enforced,select mode)---\n" + +# +# Change the slurm.conf MCSparameters and MCSPlugin +# test with enforced,select +# +exec $bin_sed -i /MCSPlugin/d $config_path/slurm.conf +exec $bin_sed -i /MCSParameters/d $config_path/slurm.conf +exec $bin_sed -i /PrivateData/d $config_path/slurm.conf +exec $bin_echo MCSPlugin=mcs/user >> $config_path/slurm.conf +exec $bin_echo MCSParameters=enforced,select,privatedata >> $config_path/slurm.conf +exec $bin_echo PrivateData=jobs,nodes >> $config_path/slurm.conf +update_conf + +###### Check that sbatch uses mcs-label=user ###### +send_user "\n---Checking sbatch with no --exclusive=mcs ---\n" + +make_bash_script $tmp_job "sleep 30" +set job_id 0 +spawn $sbatch -N1 -o/dev/null -t20 $tmp_job 
+expect { + -re "Submitted batch job ($number)" { + set job_id $expect_out(1,string) + exp_continue + } + timeout { + send_user "\nFAILURE: sbatch is not responding\n" + set exit_code 1 + } + eof { + wait + } +} + +if {$job_id == 0} { + send_user "\nFAILURE: job was not submitted\n" + set exit_code 1 + exit 1 +} + +set found 0 +spawn $squeue --jobs=$job_id --noheader -O "mcslabel" +expect { + -re "(null)" { + send_user "\nNO MCS-label for this job : this is not expected\n" + exp_continue + } + -re "$user_name" { + send_user "\nMCS-label OK for this job\n" + set found 1 + exp_continue + } + timeout { + send_user "\nFAILURE: squeue is not responding\n" + set exit_code 1 + } + eof { + wait + } +} + +if {$found == 0} { + send_user "\nFAILURE: job was submitted with a bad mcs-label\n" + set exit_code 1 +} + +set found 0 +set node 0 +sleep 2 +spawn $squeue --jobs=$job_id --noheader -O "nodelist" +expect { + -re "($alpha_numeric_under)" { + set node $expect_out(1,string) + send_user "\nNode for this job : $node\n" + set found 1 + } + timeout { + send_user "\nFAILURE: squeue is not responding\n" + set exit_code 1 + } + eof { + wait + } +} + +if {$found == 0} { + send_user "\nFAILURE: no node found in squeue command\n" + set exit_code 1 +} + + +# +# verify MCS of nodes +# +set found 0 +spawn -noecho $bin_bash -c "exec $scontrol show node=$node | $bin_grep MCS" +expect { + -re "MCS_label=$user_name" { + send_user "\n mcs_label OK for node $node\n" + set found 1 + exp_continue + } + timeout { + send_user "\nFAILURE: scontrol is not responding\n" + set exit_code 1 + } + eof { + wait + } +} + +if {$found == 0} { + send_user "\nFAILURE: job was submitted with node with bad mcs-label\n" + set exit_code 1 +} + +cancel_job $job_id + # Clean up vestigial files and restore original slurm.conf file send_user "\nChanging slurm.conf back\n" exec $bin_cp -v $cwd/slurm.conf.orig $config_path/slurm.conf