- Jan 23, 2017
-
Danny Auble authored
-
Morris Jette authored
Add new knl.conf parameters to the capmc_suspend and capmc_resume programs. They are not used by those programs, but we need to prevent an error if those new parameters are used.
-
Morris Jette authored
-
Morris Jette authored
Reset a job's memory limit based upon what's available after node reboot, which can change on a KNL node if the MCDRAM mode is changed on reboot
-
Morris Jette authored
This bug was likely the root cause of bug 3366. If the backfill scheduler allocates resources for a batch job and a node reboot is required, the batch launch RPC is sent to the agent. At that point, there is a race condition between the agent and the job_time_limit() function, which tests for boot completion. If the job_time_limit() function ran first, it would trigger a second launch RPC being sent to the agent. bug 3366
-
Morris Jette authored
Clean up logic to test if a job is configuring. bug 3366
-
Morris Jette authored
Do not launch a batch step while the job is configuring. Previous logic checked for the PrologSlurmctld running, but not nodes booting. Checking the job's CONFIGURING state flag will validate both. bug 3366
-
Morris Jette authored
Add a check to prevent the step allocation logic from executing the job configuration completion logic multiple times (check if the job is configuring before clearing the flag and resetting the time limit). bug 3366
-
Brian Christiansen authored
-
Morris Jette authored
slurmctld/agent race condition fix: Prevent job launch while PrologSlurmctld daemon is running or node boot in progress. bug 3366
-
Morris Jette authored
This is required to manage the configuration completion. bug 3366
-
Morris Jette authored
This will be required to lock the job structure. bug 3366
-
Morris Jette authored
Remove the return value from the agent_retry() function. It is not used anywhere and needs to be removed to run as a pthread. bug 3366
-
- Jan 21, 2017
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
Reasonable NFS systems do not need a minute to propagate changes.
-
- Jan 20, 2017
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
In favor of just using the -a option to show the federated tracking jobs. This allows scontrol -a show jobs to show the tracking jobs as well.
-
Brian Christiansen authored
-
Brian Christiansen authored
to indicate whether the job was requeue held or not. This enables the federation to trigger off whether the job was requeue held.
-
Brian Christiansen authored
So that the origin job can tell a remote cluster to cancel the job but mark it as requeued in the database. See the note about the KILL_* flags actually using 12 bits instead of the noted 8 bits.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
Follows pattern from c5ace562
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
If a job was requeued while in the completing state, the database wasn't being updated with the requeue state.
-
Brian Christiansen authored
When a fed job is requeued, it needs to be requeued on the clusters that it was submitted to.
-
Brian Christiansen authored
When a fed job is requeued and new sibling jobs are submitted to the other sibling clusters, the restart_cnt needs to go to the siblings in case the job runs on a remote sibling.
-
Brian Christiansen authored
The federation needs to make a job_desc when requeueing jobs to siblings.
-