- Jun 09, 2015
-
-
Morris Jette authored
1. I submit a first job that uses 1 GPU:
     $ srun --gres gpu:1 --pty bash
     $ echo $CUDA_VISIBLE_DEVICES
     0
2. While the first one is still running, a 2-GPU job asking for 1 task per node waits (and I don't really understand why):
     $ srun --ntasks-per-node=1 --gres=gpu:2 --pty bash
     srun: job 2390816 queued and waiting for resources
3. Whereas a 2-GPU job requesting 1 core per socket (so just 1 socket) actually gets GPUs allocated from two different sockets!
     $ srun -n 1 --cores-per-socket=1 --gres=gpu:2 -p testk --pty bash
     $ echo $CUDA_VISIBLE_DEVICES
     1,2
With this change #2 works the same way as #3.
bug 1725
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
-
David Bigagli authored
option.
-
- Jun 05, 2015
-
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
Only going to do this in the master as it may affect scripts. This reverts commit 454f78e6. Conflicts: NEWS
-
Morris Jette authored
bug 1724
-
Nicolas Joly authored
-
Nicolas Joly authored
-
Morris Jette authored
-
- Jun 04, 2015
-
-
David Bigagli authored
-
David Bigagli authored
-
Morris Jette authored
-
Veronique Legrand authored
Previously the test would generate an error if the default partition contained fewer than 3 nodes. bug 1720
-
Nicolas Joly authored
-
Nancy Kritkausky authored
-
David Bigagli authored
-
David Bigagli authored
-
- Jun 03, 2015
-
-
Morris Jette authored
switch/cray: Refine logic to set PMI_CRAY_NO_SMP_ENV environment variable. Rather than testing for the task distribution option, test the actual task IDs to see if they are monotonically increasing across all nodes. Based upon idea from Brian Gilmer (Cray).
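The check described in this commit message can be illustrated with a short sketch. This is not the plugin's code (Slurm itself is written in C); the function name `task_ids_monotonic` and the list-of-lists input are hypothetical, chosen only to show what "monotonically increasing across all nodes" means when task IDs are read node by node.

```python
from itertools import chain


def task_ids_monotonic(task_ids_by_node):
    """Return True if task IDs strictly increase when nodes are read in order.

    task_ids_by_node is a hypothetical structure: one list of task IDs per
    node, in node order.
    """
    flat = list(chain.from_iterable(task_ids_by_node))
    # Compare each ID with its successor in the flattened sequence.
    return all(a < b for a, b in zip(flat, flat[1:]))


# Block-style layout: node 0 runs tasks 0-1, node 1 runs tasks 2-3.
print(task_ids_monotonic([[0, 1], [2, 3]]))  # True
# Cyclic-style layout: tasks alternate between the two nodes.
print(task_ids_monotonic([[0, 2], [1, 3]]))  # False
```

Inspecting the IDs directly covers any layout that happens to produce an ordered sequence, regardless of which distribution option was requested.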
-
- Jun 02, 2015
-
-
Danny Auble authored
-
Danny Auble authored
afterward cause a divide by zero error.
-
Danny Auble authored
corruption if thread uses the pointer basing validity off the id. Bug 1710
-
- Jun 01, 2015
-
-
David Bigagli authored
-
Nicolas Joly authored
-
Morris Jette authored
Disable test with select/linear and only one node
-
- May 30, 2015
-
-
Danny Auble authored
-
- May 29, 2015
-
-
Brian Christiansen authored
Bug 1495
-
Morris Jette authored
Correct count of CPUs allocated to job on system with hyperthreads. The bug was introduced in commit a6d3074d.
On a system with hyperthreads:
     srun -n1 --ntasks-per-core=1 hostname
you would get:
     slurmctld: error: job_update_cpu_cnt: cpu_cnt underflow on job_id 67072
-
David Bigagli authored
-
Morris Jette authored
preempt/job_prio plugin: Implement the concept of Warm-up Time here. Use the QoS GraceTime as the amount of time to wait before preempting. Basically, skip preemption if your time is not up.
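As a rough illustration of the warm-up rule described above, the sketch below skips preemption while the grace period has not elapsed. It is not the plugin's code: the function name and parameters are hypothetical, and it assumes the wait is measured from the job's start time, which the commit message does not state.

```python
def skip_preemption(start_time, now, grace_time):
    """Return True while the job is still inside its warm-up window.

    All arguments are in seconds. Measuring from start_time is an
    assumption made for this sketch; grace_time stands in for the
    QoS GraceTime value.
    """
    return (now - start_time) < grace_time


# A job that started 30 s ago with a 60 s grace time is left alone.
print(skip_preemption(start_time=1000, now=1030, grace_time=60))  # True
# After 90 s the grace time is up, so preemption may proceed.
print(skip_preemption(start_time=1000, now=1090, grace_time=60))  # False
```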
-
Morris Jette authored
-
Brian Christiansen authored
-
Dorian Krause authored
-W/--wait is only supported by srun and should not show up in the usage string of sbatch or salloc.
-
Danny Auble authored
a job runs past its time limit.
-
Danny Auble authored
-
- May 28, 2015
-
-
Brian Christiansen authored
Bug 1705
-