Commits · e1a00772c58a7b6f82c2489ee9169aad719dbb2d · tud-zih-energy / Slurm

Jun 09, 2015

Fix scheduling inconsistency with GRES · e1a00772

Morris Jette authored 9 years ago

1. I submit a first job that uses 1 GPU:
$ srun --gres gpu:1 --pty bash
$ echo $CUDA_VISIBLE_DEVICES
0

2. while the first one is still running, a 2-GPU job asking for 1 task per node
waits (and I don't really understand why):
$ srun --ntasks-per-node=1 --gres=gpu:2 --pty bash
srun: job 2390816 queued and waiting for resources

3. whereas a 2-GPU job requesting 1 core per socket (so just 1 socket) actually
gets GPUs allocated from two different sockets!
$ srun -n 1  --cores-per-socket=1 --gres=gpu:2 -p testk --pty bash
$ echo $CUDA_VISIBLE_DEVICES
1,2

With this change #2 works the same way as #3.
bug 1725

e1a00772

Move definitions into alphabetic order · 5f337d38
Morris Jette authored 9 years ago

5f337d38
Update broken links in webpages · 4a41e4d7
Danny Auble authored 9 years ago

4a41e4d7
Replace /usr/bin with a more manageable approach · 321a48b3
Danny Auble authored 9 years ago

321a48b3
Corrections to slurm.conf formatting · b373847e
Morris Jette authored 9 years ago

b373847e
In test4.3 unset the SINFO_FORMAT since it conflicts with the --long · 3fad9df1
David Bigagli authored 9 years ago
option.
3fad9df1

Jun 05, 2015
- Correct eof/wait logic in a test · 752a33db
  Morris Jette authored 9 years ago
  
  752a33db
- More wording changes in addition to commit 3d11b90f · 619c4372
  Danny Auble authored 9 years ago
  
  619c4372
- update unsubscription methods · 3d11b90f
  Danny Auble authored 9 years ago
  
  3d11b90f
- Revert "Fix issue where command line options were parsed twice in sbatch." · b37004e2
  Danny Auble authored 9 years ago
  
  Only going to do this in the master as it may affect scripts. This reverts commit 454f78e6. Conflicts: NEWS
  b37004e2
- Update gres.conf description of file regular expressions · 22b7f1ad
  Morris Jette authored 9 years ago
  
  bug 1724
  22b7f1ad
- Small typo in expect code for test28.[345]. · e053d6d4
  Nicolas Joly authored 9 years ago
  
  e053d6d4
- Spelling fixes in testsuite. · 6f56f61b
  Nicolas Joly authored 9 years ago
  
  6f56f61b
- Update some LLNL links to SchedMD · ed84f96c
  Morris Jette authored 9 years ago
  
  ed84f96c
Jun 04, 2015
- Partially modify the commit 971d0021. · 707268a5
  David Bigagli authored 9 years ago
  
  707268a5
- Remove old code. · 05976915
  David Bigagli authored 9 years ago
  
  05976915
- Move around some code to be cleaner · d0f6c4ac
  Morris Jette authored 9 years ago
  
  d0f6c4ac
- fix test if insufficient resources · 05eadb57
  Veronique Legrand authored 9 years ago
  
  Previously the test would generate an error if the default partition contained fewer than 3 nodes. Bug 1720
  05eadb57
- Fix parsing for NetBSD sleep error message · ee72ee8c
  Nicolas Joly authored 9 years ago
  
  ee72ee8c
- Cut and Paste error with variables in If statement · 61ad32e8
  Nancy Kritkausky authored 9 years ago
  
  61ad32e8
- Fix broken build on non Cray. · df6fce57
  David Bigagli authored 9 years ago
  
  df6fce57
- Fix sacctmgr archive loading of older versions. · bf07cfcc
  David Bigagli authored 9 years ago
  
  bf07cfcc
Jun 03, 2015

switch/cray: Refine PMI_CRAY_NO_SMP_ENV set · ef66b2eb

Morris Jette authored 9 years ago

switch/cray: Refine logic to set PMI_CRAY_NO_SMP_ENV environment variable.
Rather than testing for the task distribution option, test the actual
task IDs to see if they are monotonically increasing across all nodes.
Based upon idea from Brian Gilmer (Cray).

ef66b2eb
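
The check described in this commit message can be sketched roughly as follows. This is only an illustration, not the plugin's actual code: the function name `task_ids_monotonic`, its parameters, and the per-node array layout are all assumptions made for the example. It walks the nodes in order and reports whether every global task ID is larger than the one before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the switch/cray plugin).
 * tids[n] holds the global task IDs placed on node n, and
 * tasks_per_node[n] is the number of entries in tids[n].
 * Returns true if the IDs strictly increase when the nodes are
 * visited in order, false otherwise.
 */
bool task_ids_monotonic(uint32_t node_cnt,
                        const uint16_t *tasks_per_node,
                        const uint32_t *const *tids)
{
    bool have_prev = false;   /* no task seen yet */
    uint32_t prev = 0;

    assert(tasks_per_node != NULL && tids != NULL);

    for (uint32_t n = 0; n < node_cnt; n++) {
        for (uint16_t t = 0; t < tasks_per_node[n]; t++) {
            uint32_t cur = tids[n][t];
            if (have_prev && cur <= prev)
                return false; /* an ID went backwards or repeated */
            prev = cur;
            have_prev = true;
        }
    }
    return true;
}
```

With two nodes and four tasks, a block layout such as {0, 1} and {2, 3} passes this check, while a cyclic layout such as {0, 2} and {1, 3} does not. Testing the IDs themselves, rather than the distribution option that was requested, means the result reflects the layout that was actually produced.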

Jun 02, 2015
- Fix issue where command line options were parsed twice in sbatch. · 454f78e6
  Danny Auble authored 9 years ago
  
  454f78e6
- Fix issue where sbatch would set ntasks-per-node to 0 making any srun · 9f67ad99
  Danny Auble authored 9 years ago
  
  afterward cause a divide by zero error.
  9f67ad99
- When deleting a job from the system set the job_id to 0 to avoid memory · 0b007678
  Danny Auble authored 9 years ago
  
  corruption if thread uses the pointer basing validity off the id. Bug 1710
  0b007678
Jun 01, 2015
- Update NEWS. · c3383298
  David Bigagli authored 9 years ago
  
  c3383298
- Fix squeue -o %X output to correctly handle NO_VAL and suffix. · 1cee0d58
  Nicolas Joly authored 9 years ago
  
  1cee0d58
- Disable test when only one job can run at a time · 1361c7cf
  Morris Jette authored 9 years ago
  
  Disable test with select/linear and only one node
  1361c7cf
May 30, 2015
- CRAY - Remove libpmi from rpm install · 374f2db9
  Danny Auble authored 9 years ago
  
  374f2db9
May 29, 2015
- Fix race condition where last array task might not get updated in the db. · d95f1ed6
  Brian Christiansen authored 9 years ago
  
  Bug 1495
  d95f1ed6
- select/linear: Correct CPU count · 58623ec7
  Morris Jette authored 9 years ago
  
  Correct count of CPUs allocated to job on system with hyperthreads. The bug was introduced in commit a6d3074d.
  On a system with hyperthreads:
  srun -n1 --ntasks-per-core=1 hostname
  you would get:
  slurmctld: error: job_update_cpu_cnt: cpu_cnt underflow on job_id 67072
  58623ec7
- Fix sreport core dump. · 75339f3e
  David Bigagli authored 9 years ago
  
  75339f3e
- preempt/job_prio plugin: Implement "warm-up" time · f5a8c6fb
  Morris Jette authored 9 years ago
  
  preempt/job_prio plugin: Implement the concept of Warm-up Time here. Use the QoS GraceTime as the amount of time to wait before preempting. Basically, skip preemption if your time is not up.
  f5a8c6fb
- preempt/job_prio plugin: fix for infinite loop · 5c302f8d
  Morris Jette authored 9 years ago
  
  5c302f8d
- Add job sorting order to slurm.conf man page. · 432e2b6b
  Brian Christiansen authored 9 years ago
  
  432e2b6b
- [PATCH] Remove -W option from sbatch/salloc usage string · a507ddcb
  Dorian Krause authored 9 years ago
  
  -W/--wait is only supported by srun and should not show up in the usage string of sbatch or salloc.
  a507ddcb
- Fix issue with incorrect time calculation in the priority plugin when · cc49f09a
  Danny Auble authored 9 years ago
  
  a job runs past its time limit.
  cc49f09a
- minor fixes for commented out debug statements · ce916dc1
  Danny Auble authored 9 years ago
  
  ce916dc1
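
The "warm-up" time described in the preempt/job_prio commit above amounts to a time comparison, which might look something like the sketch below. It is not the plugin's code. The function name `warmup_elapsed` and its parameters are hypothetical, and it assumes the wait is measured from the start time of the job that would be preempted, which the commit message does not state explicitly.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper (not taken from the preempt/job_prio plugin).
 * Returns true once at least grace_time_secs seconds have passed since
 * start_time. A caller would skip preemption while this returns false.
 * Assumption: start_time is the start time of the candidate job.
 */
bool warmup_elapsed(time_t start_time, uint32_t grace_time_secs, time_t now)
{
    if (now < start_time)
        return false;   /* clock skew or job not started: treat as not elapsed */

    return difftime(now, start_time) >= (double)grace_time_secs;
}
```

A caller would pass the QoS GraceTime as `grace_time_secs` and the current time as `now`, and leave the job alone until the function returns true.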
May 28, 2015
- Fix squeue -o %m and %d unit conversion to Megabytes. · c18bbab6
  Brian Christiansen authored 9 years ago
  
  Bug 1705
  c18bbab6