  7. Oct 02, 2012
    • Correct --mem-per-cpu logic for multiple threads per core · 6a103f2e
      Morris Jette authored
      See bugzilla bug 132
      
      When using select/cons_res and CR_Core_Memory, hyperthreaded nodes may be
      overcommitted on memory when CPU counts are scaled. I've tested 2.4.2 and HEAD
      (2.5.0-pre3).
      
      Conditions:
      -----------
      * SelectType=select/cons_res
      * SelectTypeParameters=CR_Core_Memory
      * Nodes configured with more than one thread per core
        - Ex. "NodeName=linux0 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=400"
      
      Description:
      ------------
      In the cons_res plugin, _verify_node_state() in job_test.c checks whether a
      node has sufficient memory for a job. However, the per-CPU memory limit
      appears to be scaled afterwards by the number of threads per core, and the
      scaled value may exceed the memory available on the node. Once a node is
      overcommitted on memory, later memory checks in _verify_node_state() always
      succeed, because the unsigned subtraction in the check wraps around.
      
      Scenario to reproduce:
      ----------------------
      With the example node linux0, we run a single-core job with 250MB/core
          srun --mem-per-cpu=250 sleep 60
      
      cons_res checks that it will fit: ((real - alloc) >= job mem)
          ((400 - 0) >= 250) and the job starts
      
      Then the CPU count is scaled and the memory requirement doubles to 500MB:
          "slurmctld: error: cons_res: node linux0 memory is overallocated (500) for job X"
          "slurmd: scaling CPU count by factor of 2"
      
      This job should not have started
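
      The arithmetic behind the doubled requirement can be sketched in C. This is
      an illustration only, not the code in job_test.c: the helper names
      job_mem_mb() and cpus_for_cores() are hypothetical, and the sketch assumes
      that a job's memory on a node is the per-CPU limit times the CPUs charged
      to it, with each allocated core counting as ThreadsPerCore CPUs.

      ```c
      #include <stdint.h>

      /* Hypothetical helper (not a Slurm function): memory in MB that a job
       * needs on a node, given its per-CPU limit and the CPUs charged to it. */
      static uint32_t job_mem_mb(uint32_t mem_per_cpu_mb, uint32_t cpus)
      {
          return mem_per_cpu_mb * cpus;
      }

      /* Hypothetical helper: CPUs charged when whole cores are allocated on a
       * node with several hardware threads per core. */
      static uint32_t cpus_for_cores(uint32_t cores, uint32_t threads_per_core)
      {
          return cores * threads_per_core;
      }
      ```

      For linux0, job_mem_mb(250, 1) is the 250MB that the check saw, while
      job_mem_mb(250, cpus_for_cores(1, 2)) is the 500MB that was actually
      charged, which is more than the node's RealMemory of 400.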
      
      While the first job is still running, we submit a second, identical job
          srun --mem-per-cpu=250 sleep 60
      
      cons_res checks that it will fit:
          ((400 - 500) >= 250): the unsigned subtraction wraps around, the test
          passes, and the job starts
      
      This second job also should not have started
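
      The wraparound in the second check can be reproduced with a minimal C
      sketch. It assumes the memory counters are 32-bit unsigned values, as the
      description above suggests; fits_naive() and fits_checked() are
      hypothetical names, and fits_checked() is one possible guard, not
      necessarily the change made in this commit.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      /* Mirrors the check described above: (real - alloc) >= job mem.
       * When alloc_mb > real_mb the unsigned difference wraps to a huge
       * value, so the comparison passes. */
      static bool fits_naive(uint32_t real_mb, uint32_t alloc_mb, uint32_t job_mb)
      {
          return (uint32_t)(real_mb - alloc_mb) >= job_mb;
      }

      /* One possible guard (an assumption, not the actual patch): reject the
       * node outright when it is already overallocated, then subtract. */
      static bool fits_checked(uint32_t real_mb, uint32_t alloc_mb, uint32_t job_mb)
      {
          if (alloc_mb > real_mb)
              return false;
          return (real_mb - alloc_mb) >= job_mb;
      }
      ```

      With the numbers from the scenario, fits_naive(400, 500, 250) is true even
      though the node has no free memory, while fits_checked(400, 500, 250) is
      false. Both agree on the first job: (400, 0, 250) fits.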
    • one more fix · fb0269f3
      Danny Auble authored