- May 12, 2014
-
-
Morris Jette authored
-
Puenlap Lee authored
Also correct related documentation
-
Hongjia Cao authored
Completing nodes are removed when calling _try_sched() for a job, as is the case in select_nodes(). If _try_sched() thinks the job can run now but select_nodes() returns ESLURM_NODES_BUSY, the backfill loop will be ended.
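A self-contained sketch of the control flow described here may help; try_sched() and alloc_nodes() are hypothetical stand-ins for _try_sched() and select_nodes(), and only the ESLURM_NODES_BUSY name comes from SLURM (its value below is a placeholder).

    /* Sketch only, not SLURM source: trial scheduling with COMPLETING nodes
     * removed versus the real allocation, and the backfill pass ending when
     * they disagree. */
    #include <stdbool.h>
    #include <stdio.h>

    #define SLURM_SUCCESS     0
    #define ESLURM_NODES_BUSY 1      /* placeholder value for illustration */

    /* Trial schedule performed with COMPLETING nodes removed from consideration. */
    static bool try_sched(int job_id)   { (void)job_id; return true; }

    /* Real allocation; can still fail while nodes finish a previous job. */
    static int  alloc_nodes(int job_id) { (void)job_id; return ESLURM_NODES_BUSY; }

    int main(void)
    {
        int pending[] = { 101, 102, 103 };
        int n = (int)(sizeof(pending) / sizeof(pending[0]));

        for (int i = 0; i < n; i++) {
            if (!try_sched(pending[i]))
                continue;                     /* cannot start yet; try next job */
            if (alloc_nodes(pending[i]) != SLURM_SUCCESS) {
                /* The case described above: the trial said "can run now" but
                 * the allocation returned ESLURM_NODES_BUSY, so this backfill
                 * pass ends here. */
                printf("job %d busy, ending backfill pass\n", pending[i]);
                break;
            }
            printf("job %d started\n", pending[i]);
        }
        return 0;
    }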
-
- May 09, 2014
-
-
Danny Auble authored
-
Danny Auble authored
-
- May 08, 2014
-
-
Morris Jette authored
Fix sinfo -R to print each down/drained node once, rather than once per partition. This was broken in the sinfo change to process each partition's information in a separate pthread.
-
Morris Jette authored
Correct sinfo --sort fields to match documentation: E -> Reason, H -> Reason Time (new), R -> Partition Name, u/U -> Reason user (new)
-
- May 07, 2014
-
-
Morris Jette authored
Without this patch, jobs with an infinite time limit would have their preemption GraceTime ignored.
-
Morris Jette authored
Related to bug 789
-
Danny Auble authored
-
Danny Auble authored
them.
-
- May 06, 2014
-
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
In the slurm.spec file, replace "Requires cray-MySQL-devel-enterprise" with "Requires mysql-devel", per David Gloe.
-
- May 05, 2014
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
Related to bug 771
-
Morris Jette authored
Version 14.03.2 was using "slurm_<jobid>_4294967294.out" due to an error in the job array logic.
-
Danny Auble authored
cnode counts.
-
- May 02, 2014
-
-
Danny Auble authored
-
Danny Auble authored
This is for bug 775
-
Danny Auble authored
-
Danny Auble authored
-
- May 01, 2014
-
-
Danny Auble authored
Regression from 2a674aee
-
Danny Auble authored
-
Danny Auble authored
is running.
-
Danny Auble authored
-
- Apr 30, 2014
-
-
David Bigagli authored
together.
-
Morris Jette authored
Switch/nrt - Properly track usage of CAU and RDMA resources with multiple tasks per compute node. Previous logic would allocate resources once per task and then deallocate once per node, leaking CAU and RDMA resources and preventing their use by future jobs.
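The leak pattern described (allocate once per task, deallocate once per node) can be illustrated with a small counter sketch; cau_alloc()/cau_free() are made-up names, not switch/nrt plugin functions.

    /* Sketch only, not the switch/nrt plugin: why per-task allocation paired
     * with per-node deallocation leaks resources on a node. */
    #include <stdio.h>

    static int cau_in_use = 0;                  /* resources held on one node */

    static void cau_alloc(void) { cau_in_use++; }
    static void cau_free(void)  { if (cau_in_use > 0) cau_in_use--; }

    int main(void)
    {
        int tasks_on_node = 4;

        for (int t = 0; t < tasks_on_node; t++)
            cau_alloc();                        /* old logic: once per task */
        cau_free();                             /* old logic: once per node */
        printf("leaked after per-node free: %d\n", cau_in_use);    /* 3 */

        /* Symmetric bookkeeping (free as many times as allocated) returns
         * the count to zero so future jobs can use the resources. */
        for (int t = 0; t < tasks_on_node - 1; t++)
            cau_free();
        printf("after symmetric release: %d\n", cau_in_use);       /* 0 */
        return 0;
    }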
-
Morris Jette authored
If a job is held, release it only with the "scontrol release <jobid>" command rather than by a simple reset of the job's priority. This is needed to better support job arrays. Otherwise a priority reset of a job array would free all requeued/held jobs in that job array rather than leaving them held.
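The distinction can be sketched with a hypothetical held flag alongside the priority field; the struct, field, and function names below are illustrative, not SLURM's actual data structures.

    /* Sketch only: a priority recalculation leaves a held job held, while an
     * explicit release clears the hold. */
    #include <stdbool.h>
    #include <stdio.h>

    struct job {
        int      job_id;
        unsigned priority;      /* 0 == held */
        bool     user_held;     /* set by "scontrol hold", cleared only by release */
    };

    static void recalc_priority(struct job *j, unsigned new_prio)
    {
        if (j->user_held)
            return;             /* a priority reset alone must not release the job */
        j->priority = new_prio;
    }

    static void release_job(struct job *j, unsigned new_prio)
    {
        j->user_held = false;   /* explicit "scontrol release <jobid>" */
        j->priority  = new_prio;
    }

    int main(void)
    {
        struct job j = { .job_id = 1234, .priority = 0, .user_held = true };

        recalc_priority(&j, 5000);
        printf("after recalc:  priority=%u (still held)\n", j.priority);
        release_job(&j, 5000);
        printf("after release: priority=%u\n", j.priority);
        return 0;
    }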
-
- Apr 28, 2014
-
-
Danny Auble authored
-
Danny Auble authored
in 2.0 :)
-
Morris Jette authored
Previously, partition priority was only considered as a component of a job's priority with the priority/multifactor plugin. Now the partition priority is considered first, as documented, and the job priority second. Bug 764.
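A minimal, hypothetical comparator shows the new ordering (partition priority first, then job priority); the struct and comparator below are assumptions, not SLURM's scheduler code.

    /* Sketch only: order candidate jobs by partition priority, then by the
     * job's own priority. Higher values sort first. */
    #include <stdio.h>
    #include <stdlib.h>

    struct sched_rec {
        unsigned part_priority;     /* priority of the job's partition */
        unsigned job_priority;      /* job's own (multifactor) priority */
        int      job_id;
    };

    static int sched_cmp(const void *a, const void *b)
    {
        const struct sched_rec *x = a, *y = b;

        if (x->part_priority != y->part_priority)       /* partition first */
            return (x->part_priority > y->part_priority) ? -1 : 1;
        if (x->job_priority != y->job_priority)         /* then job priority */
            return (x->job_priority > y->job_priority) ? -1 : 1;
        return 0;
    }

    int main(void)
    {
        struct sched_rec q[] = {
            { 10, 900, 1 },     /* low-priority partition, high job priority */
            { 20, 100, 2 },     /* high-priority partition, low job priority */
        };

        qsort(q, 2, sizeof(q[0]), sched_cmp);
        printf("first to schedule: job %d\n", q[0].job_id);     /* job 2 */
        return 0;
    }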
-
- Apr 26, 2014
-
-
Stuart Midgley authored
Add --priority option to the salloc, sbatch and srun commands.
-
Danny Auble authored
This code was originally put here to enforce checks to make sure jobs didn't go over the limit. If a job didn't request an amount, we set the limit and worked off that as if it were a request. If we did this now we could get jobs denied, which would cancel the job at submit with a very unrelated note as to why the job failed. Since we now check these limits after node selection, this isn't needed.
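A rough sketch of the ordering described above: instead of substituting the limit as a request at submit time, the limit is checked against the allocation produced by node selection. Only the NO_VAL sentinel mirrors SLURM; the functions and struct here are illustrative.

    /* Sketch only: check a limit against what node selection actually chose
     * rather than treating a filled-in default as a user request. */
    #include <stdbool.h>
    #include <stdio.h>

    #define NO_VAL 0xfffffffeu          /* "not requested" sentinel, as in SLURM */

    struct limits { unsigned max_cpus; };

    static unsigned select_nodes_cpus(void) { return 32; }  /* stand-in allocation */

    static bool submit_job(unsigned requested_cpus, const struct limits *lim)
    {
        /* New ordering: when nothing was requested, wait for node selection
         * and check the limit against the chosen allocation, instead of
         * denying the job at submit with an unrelated-looking reason. */
        unsigned allocated = (requested_cpus == NO_VAL) ? select_nodes_cpus()
                                                        : requested_cpus;
        if (allocated > lim->max_cpus) {
            printf("denied: %u cpus exceeds limit %u\n", allocated, lim->max_cpus);
            return false;
        }
        printf("accepted with %u cpus\n", allocated);
        return true;
    }

    int main(void)
    {
        struct limits lim = { .max_cpus = 64 };
        submit_job(NO_VAL, &lim);       /* no explicit request */
        return 0;
    }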
-
Danny Auble authored
-
- Apr 25, 2014
-
-
Morris Jette authored
In addition to a job ID, the scontrol hold and release commands now also accept a job name as an argument (e.g. "scontrol hold my.bash").
-
Morris Jette authored
Add a job's exit state (COMPLETED, FAILED, etc.) and exit code to the email message. Bug 737.
-