  1. Sep 06, 2017
  2. Sep 04, 2017
    • Fix potential part_record NULL dereference. · 459af7f7
      Alejandro Sanchez authored
      ** CID 175193: (FORWARD_NULL)
      
      Theoretically we shouldn't have a job_desc_msg_t without an associated
      part_record, but just in case let's harden the code.
      
      Introduced in previous commit: 24365514.
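
The hardening described in this commit amounts to a guard placed before the partition pointer is used. A minimal sketch, assuming simplified stand-in types (`sketch_part_rec_t`, `sketch_job_desc_t`) and a hypothetical helper `sketch_get_part_mem_limit()`; none of these are the actual Slurm definitions:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for illustration only; not the real Slurm structs. */
typedef struct {
    uint64_t max_mem_per_node; /* hypothetical field, in MB */
} sketch_part_rec_t;

typedef struct {
    const sketch_part_rec_t *part_ptr; /* should be set, but may be NULL */
} sketch_job_desc_t;

/*
 * Store the partition memory limit in *limit_out and return 0.
 * Return -1 without touching *limit_out if any pointer is NULL,
 * so the caller never dereferences a missing partition record.
 */
int sketch_get_part_mem_limit(const sketch_job_desc_t *job_desc,
                              uint64_t *limit_out)
{
    if (job_desc == NULL || job_desc->part_ptr == NULL || limit_out == NULL)
        return -1;
    *limit_out = job_desc->part_ptr->max_mem_per_node;
    return 0;
}
```

A caller can treat the `-1` return as "no partition limit available" and fall back to whatever default it already uses.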
    • Fix to test job mem against MaxMemPer[CPU|Node] limits at scheduling time. · 24365514
      Alejandro Sanchez authored
      Initially, job mem limits were tested only at submission time, through
      _validate_min_mem_partition() -> _valid_pn_min_mem(), and not tested
      again at scheduling time. As a result, jobs could be incorrectly
      scheduled in partitions whose MaxMemPer* limit they exceeded
      (a limit that can in turn be inherited from the system-wide limit).
      
      NOTE: New WAIT_PN_MEM_LIMIT job_state_reason enum component added to support
      this new waiting reason.
      
      Bug 2291.
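
The scheduling-time check described in this commit can be pictured as follows. This is a rough sketch under stated assumptions, not the code from the commit: the function `sketch_test_mem_limit()` and the `SKETCH_` enum names are hypothetical, a value of 0 is assumed to mean "limit not set", and the job request and the limits are assumed to be expressed in the same unit (the real code must also distinguish per-CPU from per-node values).

```c
#include <stdint.h>

/* Hypothetical reason codes; the commit only names WAIT_PN_MEM_LIMIT. */
typedef enum {
    SKETCH_WAIT_NO_REASON = 0,
    SKETCH_WAIT_PN_MEM_LIMIT
} sketch_wait_reason_t;

/*
 * Test a job's memory request against a partition limit at scheduling
 * time. If the partition sets no limit (0), the system-wide limit is
 * inherited. If neither is set, the job is not restricted.
 */
sketch_wait_reason_t sketch_test_mem_limit(uint64_t job_mem_mb,
                                           uint64_t part_max_mb,
                                           uint64_t sys_max_mb)
{
    uint64_t limit = part_max_mb ? part_max_mb : sys_max_mb;

    if (limit != 0 && job_mem_mb > limit)
        return SKETCH_WAIT_PN_MEM_LIMIT; /* job keeps waiting with this reason */
    return SKETCH_WAIT_NO_REASON;
}
```

Running the same test for each candidate partition at scheduling time, rather than once at submission, is what keeps a multi-partition job out of a partition whose limit it exceeds.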
  3. Sep 01, 2017
  4. Aug 22, 2017
  5. Aug 15, 2017
  6. Aug 14, 2017
  7. Aug 12, 2017
  8. Aug 11, 2017
  9. Aug 09, 2017
  10. Aug 02, 2017
    • Fix srun jobs to run in high prio partition · 948de46b
      Marshall Garey authored
      srun jobs that could start immediately and requested multiple partitions
      didn't run in the highest priority partition if the highest priority
      partition wasn't listed first.
      
      As a side effect, scontrol show jobs may now show the partition list
      in priority order, since the job's partition list gets sorted by
      priority.
      
      Bug 4015
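
Sorting a job's partition list by priority, as this commit describes, can be sketched with the standard `qsort()`. The struct `sketch_prio_part_t`, its `priority` field, and the function names are assumptions for illustration; Slurm keeps partitions in its own list type rather than a plain array.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified partition entry. */
typedef struct {
    const char *name;
    uint16_t priority; /* larger value = higher priority (assumed) */
} sketch_prio_part_t;

/* qsort comparator: highest priority first. */
static int sketch_cmp_prio_desc(const void *a, const void *b)
{
    const sketch_prio_part_t *pa = a;
    const sketch_prio_part_t *pb = b;

    if (pa->priority > pb->priority)
        return -1;
    if (pa->priority < pb->priority)
        return 1;
    return 0;
}

/* Sort the partitions a job requested so the best candidate comes first. */
void sketch_sort_parts(sketch_prio_part_t *parts, size_t count)
{
    qsort(parts, count, sizeof(parts[0]), sketch_cmp_prio_desc);
}
```

With the list sorted this way, trying partitions front to back picks the highest priority partition regardless of the order the user listed them in.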
  11. Jul 28, 2017
    • Deallocate pack job start failure · 17952cdf
      Morris Jette authored
      If a pack job is only partially allocated resources (likely due
      to limits), deallocate resources from those components which
      have been started and requeue them.
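
The all-or-nothing behavior described in this commit can be sketched as a start loop with a rollback. The types and the function `sketch_start_pack_job()` are hypothetical, and the `can_start` flag stands in for whatever resource and limit checks decide whether a component can run.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { SKETCH_COMP_PENDING, SKETCH_COMP_RUNNING } sketch_comp_state_t;

/* Hypothetical pack job component. */
typedef struct {
    sketch_comp_state_t state;
    bool can_start; /* stand-in for the real resource/limit checks */
} sketch_component_t;

/*
 * Try to start every component of a pack job. If any component cannot
 * start, return the already started ones to the pending state (modeling
 * deallocate + requeue) and report failure.
 */
bool sketch_start_pack_job(sketch_component_t *comps, size_t count)
{
    size_t started;

    for (started = 0; started < count; started++) {
        if (!comps[started].can_start)
            break;
        comps[started].state = SKETCH_COMP_RUNNING;
    }
    if (started == count)
        return true;

    /* Partial allocation: roll back the components started so far. */
    for (size_t i = 0; i < started; i++)
        comps[i].state = SKETCH_COMP_PENDING;
    return false;
}
```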
  12. Jul 27, 2017
    • Refactor limits logic for pack job · db10eae9
      Morris Jette authored
      This change adds a new function and moves some logic around so that
      limits can be tested on a pack job as a whole (that logic still
      needs to be developed).
  13. Jul 25, 2017
  14. Jul 24, 2017
  15. Jul 19, 2017
  16. Jul 05, 2017
    • Delete federated origin jobs after minjobage · f1441da3
      Brian Christiansen authored
      Origin jobs were not being deleted after minjobage.
    • Let remote fed jobs stay in queue till minjobage · f715668e
      Brian Christiansen authored
      Previously, remote jobs were removed from the job_list as quickly as
      possible, both to prevent collisions with requeued jobs and to clear up
      the jobs, while the origin job stayed around till minjobage on the
      origin. But the origin job didn't have the details from the job that
      ran on a remote cluster.
      
      Now just don't show revoked jobs. The origin tracking job will remain as
      revoked and not shown and the remote job will hang around for display
      till minjobage. scontrol show jobs will show the job from the cluster
      that ran the job. The job is requeuable as long as the origin job is
      still in the origin cluster's job_list.
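
Keeping a finished job record around "till minjobage" comes down to an age test before the record is purged. A minimal sketch, assuming a hypothetical helper `sketch_job_purgeable()` and a minimum age given in seconds; it is not the function Slurm uses:

```c
#include <stdbool.h>
#include <time.h>

/*
 * A job record may be purged only once the job has finished and at
 * least min_job_age seconds have passed since it ended.
 */
bool sketch_job_purgeable(bool finished, time_t end_time, time_t now,
                          unsigned int min_job_age)
{
    if (!finished)
        return false;
    return difftime(now, end_time) >= (double)min_job_age;
}
```

Until this test passes, the record stays in the job list and remains available for display.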
    • Don't schedule or show revoked jobs · fcd22f9b
      Brian Christiansen authored
      Just check for the revoked state instead of checking if it's a tracker
      job since an origin job will be revoked if it can't run on the origin or
      if it's running on a remote cluster.
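
Checking the revoked state directly, rather than asking whether a job is a tracker job, can be sketched as a single flag test applied wherever jobs are scheduled or listed. The flag value `SKETCH_JOB_REVOKED`, the struct, and both functions are hypothetical stand-ins, not Slurm's definitions:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_JOB_REVOKED 0x1u /* hypothetical state flag bit */

typedef struct {
    uint32_t job_id;
    uint32_t state_flags;
} sketch_job_t;

/* A revoked job is neither scheduled nor shown. */
bool sketch_job_is_visible(const sketch_job_t *job)
{
    return (job->state_flags & SKETCH_JOB_REVOKED) == 0;
}

/* Count the jobs that would be displayed from a job array. */
size_t sketch_count_visible(const sketch_job_t *jobs, size_t count)
{
    size_t visible = 0;

    for (size_t i = 0; i < count; i++) {
        if (sketch_job_is_visible(&jobs[i]))
            visible++;
    }
    return visible;
}
```

The single state test covers both cases named in the commit: an origin job that cannot run on the origin, and one that is running on a remote cluster.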
  17. Jun 27, 2017
  18. Jun 22, 2017
  19. Jun 21, 2017
  20. Jun 20, 2017
  21. Jun 19, 2017
  22. Jun 16, 2017
  23. Jun 13, 2017
  24. Jun 08, 2017