  1. Aug 23, 2017
      jobcomp/elasticsearch - fix memory leak when transferring generated buffer. · 8172b7df
      Alejandro Sanchez authored
      Running slurmctld under valgrind while operating with jobcomp/elasticsearch
      reported the following bytes definitely lost:
      
      ==27403== 658 bytes in 1 blocks are definitely lost in loss record 301 of 342
      ==27403==    at 0x4C2FD4F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
      ==27403==    by 0x2281B3: slurm_xrealloc (xmalloc.c:137)
      ==27403==    by 0x22856A: makespace (xstring.c:114)
      ==27403==    by 0x2285D0: _xstrcat (xstring.c:132)
      ==27403==    by 0x228CE0: _xstrfmtcat (xstring.c:291)
      ==27403==    by 0x83C5BCD: ???
      ==27403==    by 0x30A913: g_slurm_jobcomp_write (slurm_jobcomp.c:172)
      ==27403==    by 0x18D8FC: job_completion_logger (job_mgr.c:13652)
      
      It turns out the generated buffer in slurm_jobcomp_log_record was xstrdup'ed to
      the corresponding job_node->serialized_job, but the originally generated buffer
      wasn't freed afterwards. The fix changes the transfer so that, instead of
      xstrdup'ing the char *, we just assign the pointer and NULL the buffer.
      
      The job_node->serialized_job was already xfree'd properly later when the job
      was indexed.
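
      The difference between the two transfer styles can be sketched as follows.
      This is a minimal illustration, not the plugin's code: struct job_node is a
      hypothetical stand-in that keeps only the serialized_job field named above,
      the function names are invented for the example, and plain malloc/free
      replace Slurm's xmalloc/xstrdup/xfree wrappers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the plugin's job node; only the
 * serialized_job field name comes from the commit message. */
struct job_node {
	char *serialized_job;
};

/* Leaky pattern: the node receives a copy, so *buffer still owns its
 * allocation and leaks unless the caller remembers to free it. */
int transfer_by_copy(struct job_node *node, char **buffer)
{
	size_t len = strlen(*buffer) + 1;

	node->serialized_job = malloc(len);
	if (!node->serialized_job)
		return -1;
	memcpy(node->serialized_job, *buffer, len);
	return 0;
}

/* Fixed pattern: hand the allocation over to the node and NULL the
 * source pointer, so exactly one owner remains to free it later. */
void transfer_by_move(struct job_node *node, char **buffer)
{
	assert(node && buffer);
	node->serialized_job = *buffer;
	*buffer = NULL;
}
```

      With the move variant there is a single allocation and a single owner, so
      the later free of serialized_job releases everything that was generated.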
      
      Discovered while working on Bug 4065.
  2. Aug 22, 2017
  3. Aug 21, 2017
      select/cons_res - fix bug with Dragonfly and --switches count timeout · 46c0919d
      Alejandro Sanchez authored
      Given a configuration with TopologyParam including the Dragonfly option, if
      a job requested --switches count, the count timeout specified by either
      the job request or the max_switch_wait SchedulerParameters option was not
      respected. This was because the leaf_switch_count variable was not
      incremented in the _eval_nodes_dfly() function when needed, as is done in
      _eval_nodes_topo(), the latter being an execution path that already waits
      correctly for the switch count timeout.
      
      Bug 4056
  4. Aug 17, 2017
  5. Aug 16, 2017
  6. Aug 14, 2017
  7. Aug 11, 2017
  8. Aug 10, 2017
  9. Aug 07, 2017
  10. Aug 04, 2017
  11. Aug 01, 2017
  12. Jul 28, 2017
  13. Jul 26, 2017
      Fix regression in commit e5c05549 that would put the stepd pid into the... · f28b1a97
      Dominik Bartkiewicz authored
      Fix regression in commit e5c05549 that would put the stepd pid into the memory cgroup instead of the task's pid.
      
      Previously this would put the result of getpid() into the cgroup. Before
      e5c05549 this was done in the child of the fork, which would get you
      the task's pid, but moving it to run in the parent broke this logic.
      
      This patch adds pid to the input parameters of task_g_pre_launch_priv
      so that the correct pid can be used.
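
      The underlying POSIX behaviour can be sketched in a few lines. This is an
      illustration only, not slurmstepd code: fork_and_compare is a hypothetical
      helper written for this example. In the parent, getpid() returns the
      parent's own pid, while the child's pid is only available as the return
      value of fork(), which is why it has to be passed along explicitly.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits immediately and report
 * both the child's pid (from fork's return value) and the pid that
 * getpid() yields when called in the parent. Returns 0 on success. */
int fork_and_compare(pid_t *child_pid, pid_t *parent_view)
{
	int status;
	pid_t pid;

	assert(child_pid && parent_view);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);		/* child: nothing else to do */

	*child_pid = pid;		/* the task's pid, known via fork() */
	*parent_view = getpid();	/* the parent's own pid, not the task's */

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return 0;
}
```

      Code that runs in the parent and wants to act on the task therefore needs
      the value returned by fork() handed to it as a parameter.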
  14. Jul 19, 2017
  15. Jul 07, 2017
  16. Jun 30, 2017
  17. Jun 13, 2017
      Add LaunchParameters option of cray_net_exclusive. · 23721c4c
      Tim Wickberg authored
      Changes the alpsc_configure_nic() call to set the exclusive flag and to
      pass 100 for both the cpu and memory scaling values.
      
      Should only be used with exclusive jobs without concurrent steps
      running on a node; otherwise, oversubscription of the GNI resources
      can occur, leading to performance issues.
      
      Bug 3713.
  18. Jun 12, 2017
  19. Jun 09, 2017
  20. Jun 08, 2017
  21. Jun 02, 2017
  22. May 31, 2017
  23. May 30, 2017
  24. May 26, 2017