Commits · 82807e6b68cecf96d97c7e61b599e9c47256ea96 · tud-zih-energy / Slurm

Jun 16, 2014
- Update META for v14.03.4 tag · 82807e6b
  Morris Jette authored 10 years ago
  
  82807e6b
Jun 14, 2014

Don't reject job on fast-schedule · a8c0b701

jette authored 10 years ago

If FastSchedule=0 is configured and some nodes have not registered
for service (so we do not know their actual resource counts), then
leave the job pending rather than rejecting it without knowing if
it can run later (when the node registers and we know its specs).
bug 872

a8c0b701

Replace RPC number with name in log · 24bec0bf
jette authored 10 years ago

24bec0bf

Jun 13, 2014
- slurmdbd to print message type string on error · 14deba01
  jette authored 10 years ago
  
  Rather than the numeric value
  14deba01
- num2string() to print numeric value for unknown · 874325e1
  jette authored 10 years ago
  
  If a numeric value is not found in the function's table, print that number rather than UNKNOWN.
  874325e1
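
The fallback described for num2string() in commit 874325e1 can be illustrated with a minimal sketch. The table contents, the struct, and the function name `example_num2string` below are hypothetical and are not taken from the Slurm source; only the behavior (print the number itself when it is missing from the table) follows the commit message.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical number-to-name table; the real Slurm table differs. */
struct example_num_name {
    int num;
    const char *name;
};

static const struct example_num_name example_table[] = {
    { 1001, "EXAMPLE_MSG_A" },
    { 1002, "EXAMPLE_MSG_B" },
};

/*
 * Return the name for num if it is in the table. Otherwise write the
 * decimal value into buf and return buf, rather than a fixed "UNKNOWN".
 */
const char *example_num2string(int num, char *buf, size_t buf_len)
{
    size_t i;
    size_t cnt = sizeof(example_table) / sizeof(example_table[0]);

    assert(buf != NULL && buf_len > 0);
    for (i = 0; i < cnt; i++) {
        if (example_table[i].num == num)
            return example_table[i].name;
    }
    snprintf(buf, buf_len, "%d", num);
    return buf;
}
```

Returning the number keeps the log line useful when a newer peer sends a value the local table does not know about.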
Jun 12, 2014

scontrol show job report correct CPU_IDs · 04921aa2

Morris Jette authored 10 years ago

For "scontrol --details show job" report the correct CPU_IDs when there are
multiple threads per core (we are translating a core bitmap to CPU IDs).
This is an enhancement of commit 83d626ca
so the node table is only loaded once for the entire job table.
bug 850

04921aa2
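
The core-to-CPU translation mentioned above can be sketched as follows. This is an illustration only: it assumes the CPUs of core `c` are numbered `c * threads_per_core` through `c * threads_per_core + threads_per_core - 1`, and it uses one byte per core instead of Slurm's bitmap type. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Expand a set of allocated cores into CPU IDs.
 * core_set[i] != 0 means core i is allocated (one byte per core here,
 * purely for readability).
 * Assumption: hardware threads of a core have consecutive CPU IDs.
 * Returns the number of CPU IDs the allocation maps to; at most max_ids
 * of them are stored in cpu_ids.
 */
size_t example_core_bitmap_to_cpu_ids(const unsigned char *core_set,
                                      size_t core_cnt,
                                      unsigned threads_per_core,
                                      unsigned *cpu_ids, size_t max_ids)
{
    size_t core, n = 0;
    unsigned t;

    assert(threads_per_core > 0);
    for (core = 0; core < core_cnt; core++) {
        if (!core_set[core])
            continue;
        for (t = 0; t < threads_per_core; t++) {
            if (n < max_ids)
                cpu_ids[n] = (unsigned) core * threads_per_core + t;
            n++;
        }
    }
    return n;
}
```

With two threads per core, cores 1 and 3 map to CPU IDs 2, 3, 6 and 7 under this numbering, whereas reporting the core indexes directly would give 1 and 3.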

Correct ID of CPUs allocated to job · 83d626ca

Martin Perry authored 10 years ago

Correct the record of CPU_IDs allocated to a job if there is more
than one CPU per core.

83d626ca

Fix job --exclusive option enforcement · f07f19eb

Morris Jette authored 10 years ago

If job requests --exclusive then do not use nodes which have any cores in an
advanced reservation. Also prevents case where nodes can be shared by other
jobs.

f07f19eb

select/cons_res log change · 9ed92aa2

Morris Jette authored 10 years ago

Disable some logging that would be very slow unless
the _DEBUG flag is set in the plugin

9ed92aa2

Honor job exclusive with reserved cores · f5d6bda0

Morris Jette authored 10 years ago

If job requests --exclusive then do not use nodes which have any cores in an
advanced reservation. Previously the job would be allocated all of the cores
outside of the advanced reservation.

f5d6bda0

Fix shared=yes support · c773b750

Morris Jette authored 10 years ago

Correct support for partition with Shared=YES configuration.
Previous logic would share resources for jobs by default
(i.e. if user did not explicitly request --exclusive).
bug 758

c773b750

Correct PTY use FAQ · e4798fcf
Jens Dreger authored 10 years ago

e4798fcf
backfill - minor code re-order · 9b1bbb1a
Morris Jette authored 10 years ago
```
This reordering of some code results in cleaner logic
```
9b1bbb1a

sched/backfill performance improvements · abdc4bd3

Morris Jette authored 10 years ago

Collapse the scheduling table when possible to reduce the
number of time slots to check for pending jobs. This should
improve performance considerably.

abdc4bd3

sched/backfill: Add more logging · 43f0091e
Morris Jette authored 10 years ago

43f0091e
Modify the description of -E and -S option of sacct command. · 175df013
David Bigagli authored 10 years ago

175df013
Updates to NEWS for v14.03.4 · f5002a22
Morris Jette authored 10 years ago

f5002a22
backfill map build fix · c0151604
Morris Jette authored 10 years ago
```
Previous logic was sometimes building incomplete map
```
c0151604

Jun 11, 2014

Fix slurmstepd core dump. · 561be64f
David Bigagli authored 10 years ago

561be64f
backfill logging improvement · 95ae7150
Morris Jette authored 10 years ago

95ae7150

backfill - continue after failed job start · b493c5dc

Morris Jette authored 10 years ago

When a decision is made to start a job, if for some
reason that job's start failed, the backfill scheduler
would previously just exit. With this change, it logs
the event and reserves the resources expected to be
used and continues down the job queue.

b493c5dc

backfill - improve resource map build · 6c42ef26

Morris Jette authored 10 years ago

This change prevents creation of some back-to-back records with
the same resources, but different times.

6c42ef26
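
One way to avoid back-to-back records with the same resources is to extend the previous record instead of appending a new one. The sketch below is not the sched/backfill code; the record layout (a time range plus a node bitmask) and the names `example_rec` and `example_add_record` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resource map record: nodes in avail are free in [begin, end). */
struct example_rec {
    long begin;
    long end;
    unsigned long avail;
};

/*
 * Append a record to recs (capacity cap, current length cnt) and return
 * the new length. If the new record starts exactly where the last one
 * ends and has the same resources, the last record is extended instead.
 */
size_t example_add_record(struct example_rec *recs, size_t cnt, size_t cap,
                          long begin, long end, unsigned long avail)
{
    if (cnt > 0 && recs[cnt - 1].avail == avail &&
        recs[cnt - 1].end == begin) {
        recs[cnt - 1].end = end;
        return cnt;
    }
    assert(cnt < cap);
    recs[cnt] = (struct example_rec) { begin, end, avail };
    return cnt + 1;
}
```

Fewer records mean fewer time slots to test for each pending job, which is the same goal as the table collapsing described in commit abdc4bd3.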

Format improvements · 4439417c
Morris Jette authored 10 years ago
```
No change in logic
```
4439417c

backfill improvements · a5ab240f

Morris Jette authored 10 years ago

Improved logging of backfill scheduling actions.
Better handling of backfill_resolution logic to avoid creating some records that are not needed.
Avoid creating some backfill scheduling maps with zero duration.
The net effect should be slightly improved performance with no significant difference in action.

a5ab240f

Add DebugFlag BackfillMap document · 4c901129

Morris Jette authored 10 years ago

Update slurm.conf man page for DebugFlag BackfillMap. This should be
considered part of commit 3c2bffb6

4c901129

Add DebugFlag of BackfillMap · 3c2bffb6

Morris Jette authored 10 years ago

Add DebugFlag of BackfillMap. Previously a DebugFlag value of Backfill
logged information about what it was doing plus a map of expected resource
use in the future. Now that very verbose resource use map is only logged
with a DebugFlag value of BackfillMap

3c2bffb6
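
As a configuration fragment, the split between the two flags might look like the following in slurm.conf. Whether both flags are wanted together depends on the site; the combination shown is only an example.

```
# Log backfill scheduling actions and, additionally, the very verbose
# map of expected future resource use.
DebugFlags=Backfill,BackfillMap
```

Dropping BackfillMap from the list keeps the backfill action logging without the resource use map.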

Backfill logging · 4824daf0

Morris Jette authored 10 years ago

Log not only the count of jobs tested since the last time locks
were released, but also the total job count since the backfill
scheduler started.

4824daf0

Update upgrade documentation · 863ecabe
Morris Jette authored 10 years ago

863ecabe

sched/backfill performance improvements · 1666af5e

Morris Jette authored 10 years ago

Remove duplicate backfill scheduling tests. For example there is
no need to test if a job can be started if the only difference
from the previous test involves nodes in other partitions that
can not be used by the job we are trying to start.

1666af5e

Jun 10, 2014
- Log proper backfill job test time · c19cb29e
  Morris Jette authored 10 years ago
  
  The backfill scheduler was always reporting the time that a job was being considered as NOW rather than the time that was really being considered.
  c19cb29e
- Set the number of free licenses to be 0 if the global license count · 448645b8
  David Bigagli authored 10 years ago
  
  decreases and total is less than in use.
  448645b8
- Fix typo · c069a7ff
  Danny Auble authored 10 years ago
  
  c069a7ff
- Correction for MaxCPUsPerNode logic · 51a192f8
  James authored 10 years ago
  
  This is a correction to commit 308b7432
  51a192f8
- Improve slurmd/stepd error msg · 9c32bd41
  Morris Jette authored 10 years ago
  
  Improve how failures in slurmd/slurmstepd communications are logged.
  9c32bd41
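
The license fix in commit 448645b8 above amounts to clamping the free count at zero. A minimal sketch, assuming unsigned counters and using the hypothetical name `example_free_licenses` (not the Slurm implementation):

```c
/*
 * Number of free licenses after the configured total may have been
 * lowered. If more licenses are in use than now exist, report 0 free
 * rather than letting the unsigned subtraction wrap around.
 */
unsigned example_free_licenses(unsigned total, unsigned in_use)
{
    if (in_use >= total)
        return 0;
    return total - in_use;
}
```

Without the check, lowering the total from 10 to 3 while 5 licenses are in use would compute `3 - 5` in unsigned arithmetic and report a very large number of free licenses.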
Jun 09, 2014
- Print job array ID in email notifications · 58050b1f
  Morris Jette authored 10 years ago
  
  Mail messages for job array events now print the job ID using the format "#_# (#)" rather than just the internal job ID.
  58050b1f
- Correct commit bab22e4f. Allow requeue hold of completing jobs. · a9c1c8e5
  David Bigagli authored 10 years ago
  
  a9c1c8e5
- Test for duplicate job state files · 10aec843
  Morris Jette authored 10 years ago
  
  This will help limit damage from two active primary slurmctld (split brain problem).
  10aec843
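
The "#_# (#)" format from commit 58050b1f above can be sketched with snprintf(). The reading of the three fields as array job ID, array task ID, and internal job ID is an assumption based on the commit message, and the function name is hypothetical.

```c
#include <stdio.h>

/*
 * Format a job array element as "<array_job_id>_<task_id> (<job_id>)".
 * Returns the snprintf() result: the length the full string needs,
 * excluding the terminating NUL.
 */
int example_format_array_job_id(char *buf, size_t buf_len,
                                unsigned array_job_id,
                                unsigned array_task_id,
                                unsigned job_id)
{
    return snprintf(buf, buf_len, "%u_%u (%u)",
                    array_job_id, array_task_id, job_id);
}
```

For example, array job 1234, task 7, internal job ID 1241 is formatted as `1234_7 (1241)`.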
Jun 07, 2014
- When reconfiguring the controller don't restart the slurmctld epilog if · 59b1fdb5
  David Bigagli authored 10 years ago
  
  it is already running.
  59b1fdb5
- Fix to strigger test · 6480d487
  Morris Jette authored 10 years ago
  
  Duplicate triggers are not allowed
  6480d487
- Fix test to work with job profiling · 70f251ff
  Morris Jette authored 10 years ago
  
  Job profiling leaves a file open
  70f251ff