- Jul 02, 2015
Morris Jette authored
Add association usage information to "scontrol show cache" command output.
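For reference, the command named in this change can be run directly on a system with accounting enabled; the exact fields shown (now including association usage) depend on the Slurm version and configuration:
  $ scontrol show cache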
-
- Jul 01, 2015
Brian Christiansen authored
When submitting a job with srun -n#, the job may be allocated more than # CPUs because the job was given a whole core or socket (e.g. CR_CORE, CR_SOCKET). sacct showed only what the step used, not the full allocation. This commit shows both the job and the step when the job and step CPU counts differ.
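A minimal sketch of how this would be observed (the job ID is a placeholder and the field list is illustrative): submit a one-task job on a cluster configured with CR_CORE, then query accounting. The job line should now report the CPUs of the whole allocated core while the step line still reports only what the step used.
  $ srun -n 1 hostname
  $ sacct -j <jobid> --format=JobID,AllocCPUS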
-
Morris Jette authored
Major re-write of the sreport command to support the --tres option and permit users to select specific trackable resources (TRES) to generate reports for. For most reports, each TRES is listed on a separate line of output with its name. The default TRES type is "cpu" to minimize changes to output.
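As a hedged example of the new option (the TRES names below are illustrative and must actually be tracked by the cluster), a utilization report can be broken out by a specific trackable resource:
  $ sreport cluster utilization --tres=cpu
  $ sreport cluster utilization --tres=gres/gpu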
-
- Jun 30, 2015
Thomas Cadeau authored
Bug 1745
-
Brian Christiansen authored
This reverts commit 3f91f4b2.
-
- Jun 29, 2015
Nathan Yee authored
Bug 1745
-
David Bigagli authored
-
- Jun 26, 2015
Danny Auble authored
-
Brian Christiansen authored
Bug 1746
-
- Jun 25, 2015
Morris Jette authored
-
Morris Jette authored
-
- Jun 24, 2015
David Bigagli authored
-
Morris Jette authored
-
- Jun 23, 2015
David Bigagli authored
-
- Jun 22, 2015
Morris Jette authored
Updates of existing bluegene advanced reservations did not work at all. Some multi-core configurations resulted in an abort due to core_bitmaps being created for the reservation with only one bit per node rather than one bit per core. These bugs were introduced in commit 5f258072.
-
David Bigagli authored
-
David Bigagli authored
-
- Jun 19, 2015
David Bigagli authored
-
- Jun 15, 2015
Morris Jette authored
The logic assumed the reservation had a node bitmap, which was used to check for overlapping jobs. If there is no node bitmap (e.g. a licenses-only reservation), an abort would result.
-
- Jun 12, 2015
Brian Christiansen authored
Bug 1739
-
Brian Christiansen authored
Bug 1743
-
Brian Christiansen authored
Bug 1743
-
- Jun 11, 2015
Brian Christiansen authored
Bug 1733
-
- Jun 10, 2015
Morris Jette authored
-
- Jun 09, 2015
David Bigagli authored
-
Morris Jette authored
1. I submit a first job that uses 1 GPU:
   $ srun --gres gpu:1 --pty bash
   $ echo $CUDA_VISIBLE_DEVICES
   0
2. While the first one is still running, a 2-GPU job asking for 1 task per node waits (and I don't really understand why):
   $ srun --ntasks-per-node=1 --gres=gpu:2 --pty bash
   srun: job 2390816 queued and waiting for resources
3. Whereas a 2-GPU job requesting 1 core per socket (so just 1 socket) actually gets GPUs allocated from two different sockets:
   $ srun -n 1 --cores-per-socket=1 --gres=gpu:2 -p testk --pty bash
   $ echo $CUDA_VISIBLE_DEVICES
   1,2
With this change, case #2 works the same way as case #3. Bug 1725
-
Brian Christiansen authored
Bug 1572
-
Brian Christiansen authored
Bug 1572
-
- Jun 05, 2015
Danny Auble authored
of 4.
-
Danny Auble authored
Only going to do this in the master as it may affect scripts. This reverts commit 454f78e6. Conflicts: NEWS
-
- Jun 04, 2015
David Bigagli authored
-
David Bigagli authored
-
- Jun 03, 2015
Morris Jette authored
switch/cray: Refine logic to set the PMI_CRAY_NO_SMP_ENV environment variable. Rather than testing for the task distribution option, test the actual task IDs to see if they are monotonically increasing across all nodes. Based upon an idea from Brian Gilmer (Cray).
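To see why checking the actual task IDs is more robust than checking the distribution option, compare the labelled task ID layout under different distributions (node and task counts here are arbitrary): with block distribution the task IDs increase monotonically on each node, while with cyclic distribution they interleave across nodes.
  $ srun -N2 --ntasks-per-node=2 -l --distribution=block hostname
  $ srun -N2 --ntasks-per-node=2 -l --distribution=cyclic hostname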
-
Morris Jette authored
Add the srun --accel-bind option to control how tasks are bound to GPU and NIC Generic RESources (GRES). Based in part upon work by Matthieu Ospici (ATOS). The gres/nic plugin was modified to set the OMPI_MCA_btl_openib_if_include environment variable based upon the allocated devices (usable with OpenMPI and Mellanox). GRES environment variables are now reset after task affinity is set.
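A minimal usage sketch (GPU count and application name are placeholders): "g" requests binding of each task to the GPUs closest to its allocated CPUs, and "v" adds verbose reporting of the binding chosen.
  $ srun -N1 -n4 --gres=gpu:4 --accel-bind=g ./my_app
  $ srun -N1 -n4 --gres=gpu:4 --accel-bind=gv ./my_app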
-
- Jun 02, 2015
Danny Auble authored
-
Danny Auble authored
afterward cause a divide by zero error.
-
Danny Auble authored
corruption if a thread uses the pointer, basing validity off the id. Bug 1710
-
- Jun 01, 2015
Morris Jette authored
If an salloc or srun command is executed on a "front-end" configuration, the job will, when possible, be assigned a slurmd shepherd daemon on the same host used to execute the command rather than a slurmd daemon on an arbitrary front-end node.
-
David Bigagli authored
-
- May 30, 2015
Danny Auble authored
-