- Sep 09, 2017
Tim Wickberg authored
-
- Sep 08, 2017
Tim Wickberg authored
Since ReleaseAgent is no longer required, we can strip out all the supporting logic for it.
-
Morris Jette authored
Accidentally removed bracket in checked-in code
-
Morris Jette authored
-
- Sep 07, 2017
Dominik Bartkiewicz authored
Bug 3824
-
Morris Jette authored
Do not run the Node Health Check on termination of the external step, since that termination happens when the job allocation ends and the job's NHC will be executed then anyway. Bug 4074
-
Danny Auble authored
-
- Sep 05, 2017
Morris Jette authored
Reported by Clang
-
- Sep 01, 2017
Morris Jette authored
Prevent a heterogeneous job allocation from including the same nodes in multiple components (required by MPI jobs spanning components).
-
Tim Wickberg authored
A lot of slurmdbd operations are authenticated by the accounting_storage plugin rather than in slurmdbd itself. To allow the drop_priv flag to work, it must be checked in is_user_min_admin_level() in addition to the various functions in proc_req.c.
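Roughly how such a check might look; this is a sketch only, with an assumed connection struct, invented helper, and simplified signature rather than the actual Slurm code:

/*
 * Illustrative sketch, not the real patch: an authentication helper used by
 * the accounting_storage plugin honors a drop_priv flag carried on the
 * connection, so a privileged caller can be demoted for testing.
 */
#include <stdbool.h>
#include <sys/types.h>

typedef enum { ADMIN_NONE, ADMIN_OPERATOR, ADMIN_SUPER_USER } admin_level_t;

struct db_conn {
	bool drop_priv;		/* request that privileges be dropped (assumed field) */
	/* ... */
};

static admin_level_t _lookup_admin_level(struct db_conn *conn, uid_t uid)
{
	(void)conn;
	(void)uid;
	return ADMIN_SUPER_USER;	/* placeholder for the real user lookup */
}

bool is_user_min_admin_level(struct db_conn *conn, uid_t uid,
			     admin_level_t min_level)
{
	/* Honor drop_priv here as well as in the slurmdbd RPC handlers,
	 * since many operations are only authenticated in this plugin. */
	if (conn && conn->drop_priv)
		return false;

	return _lookup_admin_level(conn, uid) >= min_level;
}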
-
Tim Wickberg authored
This reverts commit 47dad9e8.
-
Tim Wickberg authored
On second thought, we should be using these or quite similar functions. This reverts commit a44ef130.
-
- Aug 31, 2017
Tim Wickberg authored
A lot of slurmdbd operations are authenticated by the accounting_storage plugin rather than in slurmdbd itself. To allow the drop_priv flag to work, it must be checked in is_user_min_admin_level() in addition to the various functions in proc_req.c.
-
Tim Wickberg authored
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Add a new performance debugging feature to the pmix plugin that allows measuring plain collective performance on selected message sizes. This functionality is similar to a ping-pong feature. Performance results for the existing point-to-point modes:

size     sapi          dtcp          ducx
1        0.002319442   0.000492334   0.000135243
2        0.002318223   0.000453552   0.000137356
4        0.00227046    0.00045832    0.000137117
8        0.002342675   0.000463539   0.000136455
16       0.00235131    0.000481208   0.00013619
32       0.002333058   0.000562986   0.000140756
64       0.002456691   0.000883791   0.000142574
128      0.002953556   0.001326429   0.000142336
256      0.003892236   0.002324766   0.000161224
512      0.006044123   0.004371988   0.000177675
1024     0.010324001   0.008485476   0.000224325
2048     0.018556118   0.016488896   0.000347243
4096     0.035331223   0.032744778   0.000481764
8192     0.06957123    0.065519465   0.001194106
16384    0.137925333   0.130130662   0.002544668
32768    0.272100422   0.259290563   0.009916888
65536    0.543431362   0.486692217   0.012841119

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
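A hedged sketch of the measurement idea only; the real code lives in the pmix plugin and drives its own transports (sapi/dtcp/ducx), while everything below is a stand-in: loop over message sizes, time a number of collective rounds, and report the average latency, much like a ping-pong benchmark.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* stand-in for one collective exchange of `size` bytes over a given mode */
static void run_collective(const char *mode, void *buf, size_t size)
{
	(void)mode;
	(void)buf;
	(void)size;	/* transport-specific work happens here */
}

static double now_sec(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
	const int rounds = 100;
	const char *mode = "dtcp";	/* one of the modes in the table above */

	for (size_t size = 1; size <= 65536; size *= 2) {
		void *buf = malloc(size);
		double t0 = now_sec();

		for (int i = 0; i < rounds; i++)
			run_collective(mode, buf, size);

		/* print "size  average seconds per collective" */
		printf("%zu %.9f\n", size, (now_sec() - t0) / rounds);
		free(buf);
	}
	return 0;
}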
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
There were segmentation faults because of a double free of a pending list when the UCX component was trying to connect multiple times.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
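A minimal sketch of the failure mode and the usual remedy, not the actual pmix/UCX plugin code: if a connect routine can be retried, the pending list must not be freed twice, so free and NULL the pointer in one place.

#include <stdlib.h>

struct ucx_ep {
	void **pending;		/* queue of deferred sends (illustrative) */
	size_t pending_cnt;
};

static void _free_pending(struct ucx_ep *ep)
{
	if (!ep->pending)	/* already released by an earlier attempt */
		return;
	free(ep->pending);
	ep->pending = NULL;	/* makes a second call harmless */
	ep->pending_cnt = 0;
}

int ucx_connect(struct ucx_ep *ep)
{
	/* ... establish the endpoint, flush deferred sends ... */
	_free_pending(ep);	/* safe even if connect is invoked again */
	return 0;
}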
-
- Aug 30, 2017
David Gloe authored
Statically linked Cray PMI applications still expect to use some file paths containing the old SLURM_ID_HASH format. Some Cray customers have certification requirements that make recompilation difficult. The attached patch defines a macro to convert the new SLURM_ID_HASH to the old format, and writes the files and symlinks necessary for statically linked Cray PMI applications to work. Bug 4114
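A sketch of the compatibility idea only; the exact bit layouts below are assumptions for illustration, not taken from the patch. Suppose the new hash packs the step ID into the upper 32 bits of a 64-bit value, while the old Cray PMI format encoded it as stepid * 10^10 + jobid; a conversion macro then lets statically linked applications keep using paths built from the old format.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* assumed new-style layout: step ID in the upper 32 bits */
#define ID_HASH(jobid, stepid) \
	(((uint64_t)(stepid) << 32) | (uint64_t)(jobid))

/* re-encode a new-style hash in the assumed legacy decimal layout */
#define ID_HASH_LEGACY(hash) \
	(((hash) >> 32) * 10000000000ULL + ((hash) & 0xffffffffULL))

int main(void)
{
	uint64_t h = ID_HASH(1234, 7);
	printf("new=%" PRIu64 " legacy=%" PRIu64 "\n", h, ID_HASH_LEGACY(h));
	return 0;
}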
-
- Aug 29, 2017
Brian Christiansen authored
Bug 4090
-
Danny Auble authored
-
Morris Jette authored
This applies to job steps for MPI. Even if there is more than one pack job component in a single MPI_COMM_WORLD, they will share a common SLURM_JOBID.
-
Brian Christiansen authored
as reported when compiling with optimizations (-O2).
-
Brian Christiansen authored
as reported when compiling with optimizations (-O2). Initialize the variables early since they were being initialized inside their loops and later checked for -1. Technically this couldn't have happened since, for example, user_part_inx1 would only be set to -1 if max_backfill_job_per_user_part was set, and user_part_inx is only checked later if max_backfill_job_per_user_part is set. The same applies to part_inx with max_backfill_job_per_part and user_inx with max_backfill_job_per_user.
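A generic illustration of the pattern (simplified names, not the backfill scheduler itself): a variable assigned only inside a conditional loop trips -Wmaybe-uninitialized at -O2 even when later uses are guarded by the same condition, and initializing it at declaration silences the false positive.

#include <stdio.h>

int find_index(const int *table, int len, int limit)
{
	int idx = -1;			/* initialize early, not inside the loop */

	if (limit) {			/* only scan when a limit is configured */
		for (int i = 0; i < len; i++) {
			if (table[i] > limit) {
				idx = i;
				break;
			}
		}
	}

	/* idx is only meaningful when limit was set, mirroring how the
	 * backfill indexes are only checked when their limit is configured */
	if (limit && idx != -1)
		printf("first entry over limit at %d\n", idx);

	return idx;
}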
-
Brian Christiansen authored
reported when compiling with optimizations (-O2). The compiler ignores the (void) cast and reports the error.
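The commit does not name the call site; this generic example shows the class of warning being described, assuming a warn_unused_result function such as write() under -O2 with fortification, where a (void) cast is not enough and the result has to be consumed instead.

#include <unistd.h>

void log_bytes(int fd, const void *buf, size_t len)
{
	/* (void)write(fd, buf, len);   -- the cast alone can still warn */
	ssize_t rc = write(fd, buf, len);	/* consume the result instead */
	if (rc < 0) {
		/* best-effort logging: nothing more to do */
	}
}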
-
Brian Christiansen authored
reported when compiling with optimizations (-O2). field_id may be uninitialized or hold the value from the previous iteration of the while loop. The only possible values of dataset_loc->type are:

typedef enum {
	PROFILE_FIELD_NOT_SET,
	PROFILE_FIELD_UINT64,
	PROFILE_FIELD_DOUBLE
} acct_gather_profile_field_type_t;

and the while loop condition ensures that PROFILE_FIELD_NOT_SET is never handled. So instead of handling PROFILE_FIELD_NOT_SET directly, just catch everything else with the "default" case statement and continue the loop.
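A sketch of that pattern with simplified surroundings (the loop body and type codes are stand-ins): the loop only ever sees PROFILE_FIELD_UINT64 or PROFILE_FIELD_DOUBLE, so the remaining enum value is folded into a "default" branch that skips the iteration, which also guarantees field_id is set before use.

typedef enum {
	PROFILE_FIELD_NOT_SET,
	PROFILE_FIELD_UINT64,
	PROFILE_FIELD_DOUBLE
} acct_gather_profile_field_type_t;

struct dataset {
	acct_gather_profile_field_type_t type;
};

void pack_datasets(const struct dataset *sets, int count)
{
	for (int i = 0; i < count && sets[i].type != PROFILE_FIELD_NOT_SET; i++) {
		int field_id;	/* assigned on every path below */

		switch (sets[i].type) {
		case PROFILE_FIELD_UINT64:
			field_id = 1;	/* illustrative type codes */
			break;
		case PROFILE_FIELD_DOUBLE:
			field_id = 2;
			break;
		default:		/* PROFILE_FIELD_NOT_SET: nothing to pack */
			continue;
		}

		(void)field_id;		/* ... pack the dataset using field_id ... */
	}
}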
-
Brian Christiansen authored
-
Danny Auble authored
relies on the primary to do so. There is a potential race condition if the backup DBD tries to create/check the database at the same time as the primary. This patch removes this race by not allowing the backup to do the check/create. Bug 3827
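A minimal sketch of the guard with invented names, not the actual slurmdbd code: only the primary performs the database create/check at startup, so a backup can never race the primary on the same schema.

#include <stdbool.h>
#include <stdio.h>

static bool backup_mode = true;		/* set from configuration in reality */

static int create_or_check_database(void)
{
	/* connect to the database, create/verify tables, etc. */
	return 0;
}

static int dbd_startup(void)
{
	if (backup_mode) {
		printf("backup: skipping database check, primary owns it\n");
		return 0;
	}
	return create_or_check_database();
}

int main(void)
{
	return dbd_startup();
}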
-
Morris Jette authored
Coverity 44943, 44944
-
Morris Jette authored
Coverity CID 44941, 44942
-
Morris Jette authored
This sets more per-pack-job environment variables for launched steps. All of the following are used by Open MPI: SLURM_CPUS_PER_TASK, SLURM_STEP_NUM_TASKS, SLURM_TASKS_PER_NODE. A few more env vars are still needed by OpenMPI.
-
- Aug 25, 2017
Morris Jette authored
These are required by OpenMPI
-
Morris Jette authored
Coverity CID 44723
-
- Aug 24, 2017
Alejandro Sanchez authored
Testing whether curl_handle != NULL or rc != SLURM_SUCCESS was already done in the if/else statements directly above, jumping to the corresponding goto cleanup label if needed. Thus the removed test could never evaluate to true, and Coverity properly warned about this. Regression introduced in commit 5f5e6472 (code cleanup).
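The shape of the dead code Coverity flagged, with stand-in names and no real libcurl calls: once both failure paths above jump to cleanup, a repeated test of the same conditions further down can never be true.

#define SUCCESS 0
#define ERROR  (-1)

static void *fake_init(void)		/* stands in for curl_easy_init() */
{
	static int handle;
	return &handle;
}

static int perform_transfer(void *handle)	/* stands in for the real transfer */
{
	(void)handle;
	return SUCCESS;
}

int send_record(void)
{
	void *curl_handle = NULL;
	int rc = SUCCESS;

	curl_handle = fake_init();
	if (curl_handle == NULL) {
		rc = ERROR;
		goto cleanup;		/* failure already routed to cleanup */
	} else {
		rc = perform_transfer(curl_handle);
		if (rc != SUCCESS)
			goto cleanup;	/* ...as is this one */
	}

	/*
	 * A further "if (!curl_handle || rc != SUCCESS)" test here is dead:
	 * both conditions were handled above, so removing it is safe.
	 */

cleanup:
	/* release curl_handle, free buffers, etc. */
	return rc;
}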
-
Morris Jette authored
-
- Aug 23, 2017
Alejandro Sanchez authored
-
Alejandro Sanchez authored
Running slurmctld under valgrind while operating with jobcomp/elasticsearch reported the following bytes definitely lost:

==27403== 658 bytes in 1 blocks are definitely lost in loss record 301 of 342
==27403==    at 0x4C2FD4F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27403==    by 0x2281B3: slurm_xrealloc (xmalloc.c:137)
==27403==    by 0x22856A: makespace (xstring.c:114)
==27403==    by 0x2285D0: _xstrcat (xstring.c:132)
==27403==    by 0x228CE0: _xstrfmtcat (xstring.c:291)
==27403==    by 0x83C5BCD: ???
==27403==    by 0x30A913: g_slurm_jobcomp_write (slurm_jobcomp.c:172)
==27403==    by 0x18D8FC: job_completion_logger (job_mgr.c:13652)

It turns out the generated buffer in slurm_jobcomp_log_record was xstrdup'ed to the corresponding job_node->serialized_job, but the originally generated buffer wasn't freed afterwards. The fix consists in changing the transfer so that instead of xstrdup'ing the char * we just assign the pointer and NULL the buffer. The job_node->serialized_job was already xfree'd properly later when the job was indexed. Discovered while working on Bug 4065.
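A sketch of the ownership transfer described above, using plain malloc/free and simplified names in place of Slurm's xmalloc wrappers: instead of duplicating the buffer (and leaking the original), hand the pointer to the job node and NULL the source so exactly one owner frees it later.

#include <stdlib.h>
#include <string.h>

struct job_node {
	char *serialized_job;
};

void save_record(struct job_node *node, char **buffer)
{
	/* leaky version:
	 *   node->serialized_job = strdup(*buffer);   // *buffer never freed
	 */
	node->serialized_job = *buffer;	/* take ownership of the buffer */
	*buffer = NULL;			/* caller must not free or reuse it */
}

void index_job(struct job_node *node)
{
	/* the single owner releases it once the job has been indexed */
	free(node->serialized_job);
	node->serialized_job = NULL;
}

int main(void)
{
	char *buffer = malloc(16);
	struct job_node node = { NULL };

	strcpy(buffer, "serialized job");
	save_record(&node, &buffer);	/* buffer is now NULL, node owns it */
	index_job(&node);
	return 0;
}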
-
Tim Wickberg authored
This should only happen due to ESLURM_RESULT_TOO_LARGE, which leads to no list being packed. Follow-on to 390da8cf / 8cf1835c. Bug 3624.
-