Commits · 1a603a7b452675b0a1d8629b77a02cd05dfa0494 · tud-zih-energy / Slurm

Aug 24, 2017

Fix Coverity CID 174746: Control flow issues (DEADCODE). · 1a603a7b

The tests for curl_handle != NULL and rc != SLURM_SUCCESS were already
performed in the if/else statements immediately above, which jump to the
cleanup label when needed. The removed test could therefore never
evaluate to true, and Coverity correctly warned about it.

Regression introduced in commit 5f5e6472 (code cleanup).

1a603a7b
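
A minimal sketch of the kind of control flow Coverity flags as DEADCODE. This is not the actual jobcomp/elasticsearch code: `send_record`, `REC_OK`, and `REC_FAIL` are hypothetical names used only for illustration. The early exits already cover both failure conditions, so a later re-test of them could never be true.

```c
#include <assert.h>
#include <stddef.h>

enum { REC_OK = 0, REC_FAIL = -1 };	/* hypothetical return codes */

/*
 * Hypothetical helper: returns REC_OK after "sending", or an error code.
 * *sent is set to 1 only when the work was actually performed.
 */
static int send_record(const void *handle, int rc, int *sent)
{
	assert(sent != NULL);
	*sent = 0;

	if (handle == NULL) {
		rc = REC_FAIL;
		goto cleanup;
	} else if (rc != REC_OK) {
		goto cleanup;
	}

	/*
	 * Reaching this point implies handle != NULL and rc == REC_OK.
	 * A further "if (handle == NULL || rc != REC_OK) goto cleanup;"
	 * placed here could never be taken: that is the dead-code pattern.
	 */
	*sent = 1;

cleanup:
	return rc;
}
```

Removing such a redundant test does not change behavior; it only removes a branch that static analysis can prove unreachable.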

Merge branch 'slurm-17.02' · 75777f44
Alejandro Sanchez authored 7 years ago

75777f44

Prevent slurmstepd ABRT when parsing gres.conf CPUs. · 3e1fffb6

Alejandro Sanchez authored 7 years ago

Calling bit_unfmt() with a zero bit_size() bitmap leads to a later
call to bit_nclear() with start=0 and stop=-1, leading to the ABRT.

This scenario happened when cgroup.conf has ConstrainDevices=yes and
task_cgroup_devices_create() tries to collect the GRES devices
while gres_cpu_cnt=0, so p->cpus_bitmap = bit_alloc(gres_cpu_cnt)
creates a zero-size bitmap, which is then passed as an argument to bit_unfmt().

gres_cpu_cnt is 0 because we have defined a gres.conf like this:

Name=gpu Type=tesla File=/tmp/gres/tesla0 CPUs=0,1
Name=gpu Type=tesla File=/tmp/gres/tesla1 CPUs=0,1
Name=gpu Type=kepler File=/tmp/gres/kepler0 CPUs=2,3
Name=gpu Type=kepler File=/tmp/gres/kepler1 CPUs=2,3

but have neither GresTypes nor a GRES option in the slurm.conf / node config definition.

Bug 3974

3e1fffb6
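
The failure mode can be sketched with a toy bitmap. This is an illustration under stated assumptions, not Slurm's `bitstr_t` implementation and not necessarily the guard the commit applied: `toy_bitmap_t`, `toy_alloc`, `toy_nclear`, and `toy_clear_all` are hypothetical names. Clearing "all bits" of an empty bitmap computes `stop = nbits - 1 = -1`, which a strict range check rejects (the real library aborts), so the empty case is handled before the range is computed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy bitmap used only for illustration; this is not Slurm's bitstr_t. */
typedef struct {
	int64_t nbits;		/* number of valid bits, may be 0 */
	uint8_t *bytes;		/* backing storage */
} toy_bitmap_t;

static toy_bitmap_t toy_alloc(int64_t nbits)
{
	toy_bitmap_t b;

	assert(nbits >= 0);
	b.nbits = nbits;
	/* Always allocate at least one byte so bytes is never NULL. */
	b.bytes = calloc((size_t)(nbits / 8) + 1, 1);
	assert(b.bytes != NULL);
	return b;
}

/* Clear bits start..stop inclusive; returns -1 where a strict library would abort. */
static int toy_nclear(toy_bitmap_t *b, int64_t start, int64_t stop)
{
	if (start < 0 || stop < start || stop >= b->nbits)
		return -1;
	for (int64_t i = start; i <= stop; i++)
		b->bytes[i / 8] &= (uint8_t)~(1u << (i % 8));
	return 0;
}

/* Clear the whole bitmap, treating an empty bitmap as a no-op. */
static int toy_clear_all(toy_bitmap_t *b)
{
	if (b->nbits == 0)
		return 0;	/* avoids calling toy_nclear(b, 0, -1) */
	return toy_nclear(b, 0, b->nbits - 1);
}
```

With a zero-size bitmap, `toy_nclear(&b, 0, b.nbits - 1)` fails the range check, while `toy_clear_all(&b)` returns success without touching any bits.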

Fix warning of using gtk_tree_selection_selected_foreach to edit things. · bac1effd
Alejandro Sanchez authored 7 years ago

Bug 3217

bac1effd

Simplify common code in sview. · 90ca37b0
Danny Auble authored 7 years ago

90ca37b0

Aug 23, 2017

jobcomp/elasticsearch - code cleanup, no functional change. · 5f5e6472
Alejandro Sanchez authored 7 years ago

5f5e6472
Merge branch 'slurm-17.02' · d61c0d3e
Alejandro Sanchez authored 7 years ago

d61c0d3e

jobcomp/elasticsearch - fix memory leak when transferring generated buffer. · 8172b7df

Alejandro Sanchez authored 7 years ago

Running slurmctld under valgrind while operating with jobcomp/elasticsearch
reported the following bytes definitely lost:

==27403== 658 bytes in 1 blocks are definitely lost in loss record 301 of 342
==27403==    at 0x4C2FD4F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27403==    by 0x2281B3: slurm_xrealloc (xmalloc.c:137)
==27403==    by 0x22856A: makespace (xstring.c:114)
==27403==    by 0x2285D0: _xstrcat (xstring.c:132)
==27403==    by 0x228CE0: _xstrfmtcat (xstring.c:291)
==27403==    by 0x83C5BCD: ???
==27403==    by 0x30A913: g_slurm_jobcomp_write (slurm_jobcomp.c:172)
==27403==    by 0x18D8FC: job_completion_logger (job_mgr.c:13652)

It turns out the generated buffer in slurm_jobcomp_log_record was xstrdup'ed to
the corresponding job_node->serialized_job, but the originally generated buffer
wasn't freed afterwards. The fix changes the transfer so that, instead of
xstrdup'ing the char *, we just assign the pointer and NULL the buffer.

The job_node->serialized_job was already xfree'd properly later when the job
was indexed.

Discovered while working on Bug 4065.

8172b7df
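
The difference between copying the buffer and transferring ownership can be sketched in plain C. This is an illustration only: it uses `malloc`/`free` rather than Slurm's `xstrdup`/`xfree`, and `struct job_node`, `store_by_copy`, and `store_by_transfer` are hypothetical stand-ins, not the plugin's actual definitions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the plugin's per-job node. */
struct job_node {
	char *serialized_job;
};

/*
 * Leaky pattern: the node receives a copy, so the caller still owns
 * `buffer` and leaks it unless it remembers to free it afterwards.
 */
static void store_by_copy(struct job_node *node, const char *buffer)
{
	size_t len = strlen(buffer) + 1;

	node->serialized_job = malloc(len);
	assert(node->serialized_job != NULL);
	memcpy(node->serialized_job, buffer, len);
}

/*
 * Transfer pattern: hand the allocation to the node and NULL the
 * caller's pointer, so exactly one owner remains and no copy is made.
 */
static void store_by_transfer(struct job_node *node, char **buffer)
{
	assert(buffer != NULL);
	node->serialized_job = *buffer;
	*buffer = NULL;
}
```

After `store_by_transfer(&node, &buf)`, `buf` is NULL and the single allocation is released when the node's `serialized_job` is freed later.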

Print a warning if no results list is available. · 6d15591f

Tim Wickberg authored 7 years ago

This should only happen due to ESLURM_RESULT_TOO_LARGE,
which leads to no list being packed.

Follow on to 390da8cf / 8cf1835c.

Bug 3624.

6d15591f

Fix from change in ffe2c5dc that removed last parameter to the · d24525d4
Danny Auble authored 7 years ago

launch_g_step_wait() function.

d24525d4

Aug 22, 2017
- Strip trailing slashes from the JobCompLoc for jobcomp/elasticsearch. · 60eed77f
  Alejandro Sanchez authored 7 years ago
  
  Otherwise the resulting URL may be invalid. Update documentation while here as well. Bug 4065.
  60eed77f
- Change capmc_node_bitmap to a local variable. · b56f12e0
  Tim Shaw authored 7 years ago
  
  Otherwise a race between threads in _check_node_status leads to a crash. Bug 4093.
  b56f12e0
- Fail on EPERM as you would any other error. · a5b47f7b
  Tim Wickberg authored 7 years ago
  
  Modification of commit c7e6d864. Bug 4095.
  a5b47f7b
- Signal the purge thread if ever removing something from the job_list. · d0f56c67
  Danny Auble authored 7 years ago
  
  d0f56c67
- Merge branch 'slurm-17.02' · 6c4cd4a1
  Morris Jette authored 7 years ago
  
  6c4cd4a1
- In salloc with --uid option, drop supplementary groups before changing UID · c7e6d864
  Philip Kovacs authored 7 years ago
  
  bug 4095
  c7e6d864
- Merge branch 'slurm-17.02' · 512e4610
  Morris Jette authored 7 years ago
  
  512e4610
- In salloc with --uid option, drop supplementary groups before changing UID · 1efbd459
  Philip Kovacs authored 7 years ago
  
  bug 4095
  1efbd459
- Note new contributor · fe1cd70b
  Morris Jette authored 7 years ago
  
  fe1cd70b
- Eliminate -Wformat-truncation warnings · d04fa289
  Philip Kovacs authored 7 years ago
  
  Bug 4094
  d04fa289
- Add error checking on file format · 7ba8e020
  Morris Jette authored 7 years ago
  
  Coverity CID 166001
  7ba8e020
- log fcntl error · ca23df45
  Morris Jette authored 7 years ago
  
  Coverity CID 44725, 44726, 44747, 44728
  ca23df45
- Log job not found, don't use NULL pointer · 70f86051
  Morris Jette authored 7 years ago
  
  Coverity CID 44968
  70f86051
- remove dead code · db233217
  Morris Jette authored 7 years ago
  
  Coverity CID 44810
  db233217
- Prevent possible overflow on multiply · f5546cdd
  Morris Jette authored 7 years ago
  
  Coverity CID 53126
  f5546cdd
- Avoid possible overflow on multiply · 1f96f213
  Morris Jette authored 7 years ago
  
  Coverity CID 53127
  1f96f213
- Log write error · 020defef
  Morris Jette authored 7 years ago
  
  Coverity CID 44761
  020defef
- log invalid QOS · 21ea996c
  Morris Jette authored 7 years ago
  
  Coverity CID 44696
  21ea996c
- Log file delete error · 041d50e2
  Morris Jette authored 7 years ago
  
  Coverity CID 44729
  041d50e2
- Log remove file error · 46528dfd
  Morris Jette authored 7 years ago
  
  Coverity CID 44700
  46528dfd
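
For the JobCompLoc change in the list above (60eed77f), stripping trailing slashes might look like the following sketch. `strip_trailing_slashes` is a hypothetical helper, not the function used in the commit, and the URL in the usage note is only an example value.

```c
#include <assert.h>
#include <string.h>

/* Remove every trailing '/' from url, in place. */
static void strip_trailing_slashes(char *url)
{
	size_t len;

	assert(url != NULL);
	len = strlen(url);
	while (len > 0 && url[len - 1] == '/')
		url[--len] = '\0';
}
```

Given `http://localhost:9200///`, the helper leaves `http://localhost:9200`, so appending a path that starts with `/` no longer yields a doubled slash.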
Aug 21, 2017

jobcomp/filetxt/elasticsearch - fix [derived]/exit_code fields storage · 9c720d8f

Alejandro Sanchez authored 7 years ago

The exit status value for these two fields was incorrectly saved as-is. The
patch makes use of the appropriate macros to properly decode the low-order
8 bits of the exit status and the signal number (if any).

bug 3942

9c720d8f
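
A sketch of decoding a wait status with the standard POSIX macros from `<sys/wait.h>`. It assumes the stored value is a `wait()`-style status; `decode_status` and `status_of_child_exit` are hypothetical helpers, not the plugin code.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Split a wait()-style status into an exit code and a signal number. */
static void decode_status(int status, int *exit_code, int *signal_no)
{
	assert(exit_code != NULL && signal_no != NULL);
	*exit_code = 0;
	*signal_no = 0;

	if (WIFEXITED(status))
		*exit_code = WEXITSTATUS(status);	/* low-order 8 bits */
	else if (WIFSIGNALED(status))
		*signal_no = WTERMSIG(status);
}

/*
 * Demo helper: fork a child that exits with `code` and return the raw
 * status reported by waitpid(), or -1 if fork/waitpid fails.
 */
static int status_of_child_exit(int code)
{
	int status = 0;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code);
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return status;
}
```

Storing the raw status as-is mixes the exit code and signal bits into one integer, whereas the macros separate them; a child calling `_exit(259)` is reported with exit code 3 because only the low-order 8 bits survive.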

Print numbers using exponential format as needed · c125759d

Isaac Hartung authored 7 years ago

Print numbers using exponential format if required to fit in the allocated
field width. The sacctmgr and sshare commands are impacted.

bug 1749

c125759d
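
One way to fall back to exponential notation when a number does not fit a field is sketched below. This is an assumption about the general technique, not the code used by sacctmgr or sshare; `format_fit` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format `value` into `out` using at most `width` characters if possible.
 * Plain notation is tried first, then exponential notation with as many
 * digits as fit. Returns the formatted length; if even "%.0e" is too
 * wide, that form is left in `out` and its (larger) length is returned.
 * `out_size` should be greater than `width` to avoid truncation.
 */
static int format_fit(double value, int width, char *out, size_t out_size)
{
	int n;

	assert(out != NULL && out_size > 0 && width > 0);

	n = snprintf(out, out_size, "%.0f", value);
	if (n >= 0 && n <= width)
		return n;

	/* Drop fractional digits until the exponential form fits. */
	for (int prec = width; prec >= 0; prec--) {
		n = snprintf(out, out_size, "%.*e", prec, value);
		if (n >= 0 && n <= width)
			return n;
	}
	return n;
}
```

For example, with a 10-character field, 1234 is printed as `1234`, while 123456789012 is printed as `1.2346e+11`.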

Put back Gres debug flag string · 9263f45c
Brian Christiansen authored 7 years ago

It was removed in 2705f9c5.
Its absence caused sview to crash when viewing the debug_flags.

9263f45c
Merge branch 'slurm-17.02' · f7ce8cc8
Morris Jette authored 7 years ago

f7ce8cc8
Clarify use of --switches option on dragonfly network · 1542ee84
Morris Jette authored 7 years ago

bug 4056

1542ee84

select/cons_res - fix bug with Dragonfly and --switches count timeout · 46c0919d

Alejandro Sanchez authored 7 years ago

Given a configuration whose TopologyParam includes the Dragonfly option, if a
job requested a --switches count, the count timeout specified by either
the job request or the max_switch_wait SchedulerParameters option was not
respected. This was because the leaf_switch_count variable was not incremented
in the _eval_nodes_dfly() function when needed, as is done in
_eval_nodes_topo(), the latter being an execution path that already waits
correctly for the switch count timeout.

Bug 4056

46c0919d

Aug 19, 2017
- Update contributor list · 71f344ae
  Morris Jette authored 7 years ago
  
  For commit 35b505cc, bug 3982
  71f344ae
- Remove dead code · 31b9a2cc
  Morris Jette authored 7 years ago
  
  Coverity CID 44808
  31b9a2cc
- Remove redundant NULL variable check · d4c36a65
  Morris Jette authored 7 years ago
  
  Coverity CID 45157
  d4c36a65
Aug 18, 2017
- Fix QOS usage factor applying to TRES run mins · d2f08d4a
  Brian Christiansen authored 7 years ago
  
  d2f08d4a