  1. Aug 09, 2023
  2. Aug 08, 2023
  3. Aug 07, 2023
    • Send slurmctld timeout to slurmstepd · 16946d63
      David Gloe authored
      If using a backup slurmctld and the primary goes down temporarily,
      slurmstepd uses the slurmctld timeout to determine when to retry
      contacting the primary. Send the value to slurmstepd so it waits the
      correct amount of time, rather than NO_VAL16/2.
      
      The slurmctld timeout is initialized to NO_VAL16 in init_slurm_conf()
      and the sleep time of slurmctld_timeout / 2 happens in
      slurm_send_recv_controller_msg().
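
      To illustrate why the unset value matters, here is a minimal sketch of the
      retry wait described above. It is not Slurm code: the helper name is
      hypothetical, NO_VAL16 is assumed to be 0xfffe (as in slurm.h), and 120
      seconds is only an example of a configured timeout.

```python
# Assumed value of Slurm's 16-bit "unset" sentinel (0xfffe in slurm.h).
NO_VAL16 = 0xFFFE


def retry_sleep_seconds(slurmctld_timeout: int) -> int:
    """Hypothetical helper: wait half of the slurmctld timeout before retrying."""
    return slurmctld_timeout // 2


# If slurmstepd never receives the configured value, it uses the sentinel.
unset_wait = retry_sleep_seconds(NO_VAL16)

# With an example configured timeout of 120 seconds, the wait is much shorter.
configured_wait = retry_sleep_seconds(120)

print(unset_wait, configured_wait)
```

      Under these assumptions the unset case waits roughly nine hours between
      retries, while the configured case waits one minute.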
      
      Bug 17232
    • Docs - Fix typos in reservations page · 314ad71a
      Ben Roberts authored
      Bug 17360
    • Fix allocating fewer nodes for a job than required · 8e5bc322
      Marshall Garey authored
      
      This partially reverts commit dfb07974. Commit dfb07974 removed code
      that limits avail_cpus on a node. This commit restores that code only for
      jobs that do not request gres. For jobs that request gres, avail_cpus is
      limited later by gres_select_filter_sock_core.
      
      Prior to this commit, for jobs that do not request gres, avail_cpus was not
      limited based on the maximum number of tasks that can run on the node.
      Therefore, fewer nodes could be assigned to the job than its tasks required.
      
      For example, given nodes with 16 cpus, the following job needs 4 nodes,
      because each node has room for only one 12-cpu task. However, after commit
      dfb07974 and before this commit, the job would be assigned 3 nodes. Three
      nodes have enough cpus in total (48) to cover the job request, but they
      cannot satisfy the cpus_per_task request for all four tasks.
      
          srun --mem=0 -n4 --cpus-per-task=12 --exclusive hostname
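
      The arithmetic behind this example can be sketched as follows. This is a
      simplified model, not the scheduler's code: both function names are
      hypothetical, and the model ignores memory, gres and partial-node sharing.

```python
def nodes_by_cpu_total(ntasks: int, cpus_per_task: int, cpus_per_node: int) -> int:
    """Node count if only the total cpu count is considered."""
    total_cpus = ntasks * cpus_per_task
    return -(-total_cpus // cpus_per_node)  # ceiling division


def nodes_by_task_fit(ntasks: int, cpus_per_task: int, cpus_per_node: int) -> int:
    """Node count when every task must fit entirely on one node."""
    tasks_per_node = cpus_per_node // cpus_per_task
    if tasks_per_node == 0:
        raise ValueError("a single task does not fit on one node")
    return -(-ntasks // tasks_per_node)  # ceiling division


# Values from the srun example: 4 tasks, 12 cpus per task, 16-cpu nodes.
by_total = nodes_by_cpu_total(4, 12, 16)
by_fit = nodes_by_task_fit(4, 12, 16)
print(by_total, by_fit)
```

      Counting only total cpus gives 3 nodes, while requiring each task to fit
      on a node gives 4, which matches the discrepancy described above.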
      
      Bug 17185
      
      Signed-off-by: Marcin Stolarek <cinek@schedmd.com>
    • Tim Wickberg's avatar
      d1850852
  4. Aug 04, 2023
  5. Aug 03, 2023
    • Fix gres_select_filter_sock_core for CR_CPU_MEMORY · 966cc0b7
      Marcin Stolarek authored
      With CR_CPU_MEMORY, can_job_run_on_node can set the number of CPUs
      available to the job, based on --mem-per-cpu, to a value that is not a
      multiple of ThreadsPerCore. In that case we can't assume that removing
      a core always removes ThreadsPerCore CPUs. Instead of relying on that
      assumption, just reset the value to the number of available cores *
      ThreadsPerCore.
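
      A toy model of the two approaches, to show how they can diverge. This is
      only a sketch under stated assumptions: the function names and the node
      shape (4 cores, 2 threads per core, 7 CPUs allowed by memory) are
      hypothetical and do not come from the Slurm source.

```python
def cpus_by_subtraction(avail_cpus: int, cores_removed: int, threads_per_core: int) -> int:
    """Old assumption: every removed core takes ThreadsPerCore CPUs with it."""
    return avail_cpus - cores_removed * threads_per_core


def cpus_by_reset(avail_cores: int, threads_per_core: int) -> int:
    """Approach described above: recompute from the cores still available."""
    return avail_cores * threads_per_core


threads_per_core = 2
total_cores = 4
avail_cpus = 7      # limited by --mem-per-cpu, not a multiple of 2
cores_removed = 1   # one core filtered out

old_value = cpus_by_subtraction(avail_cpus, cores_removed, threads_per_core)
new_value = cpus_by_reset(total_cores - cores_removed, threads_per_core)
print(old_value, new_value)
```

      In this toy case subtraction yields 5 CPUs while three cores remain,
      whereas the reset yields 6, one value per remaining hardware thread.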
      
      Bug 17229
    • Upgrade rate limit messages from verbose() to info() · f90dcd4d
      Benjamin Witham authored
      This is an update of commit 49bcc0d4. Originally the message was at
      debug(), then changed to verbose(), and is now being upgraded to info().
      
      Bug 16664, 17341.
  6. Aug 02, 2023
  7. Aug 01, 2023
  8. Jul 31, 2023