Commits · e1dd4845f96c2847fd6f1c94fd73917da32578b8 · tud-zih-energy / Slurm

Jun 01, 2017

Merge remote-tracking branch 'origin/slurm-17.02' · e1dd4845
Danny Auble authored 7 years ago
```
# Conflicts:
#	src/slurmctld/job_mgr.c
```
e1dd4845
Add wall-time to seff output · eeebf5c8
Pablo Escobar authored 7 years ago
```
bug 3846
```
eeebf5c8
Note why we use list_dequeue here. · dd0f7e4e
Danny Auble authored 7 years ago

dd0f7e4e
While not needed, put a list_destroy function in the creation of · 7d94f6d3
Danny Auble authored 7 years ago
```
purge_files_list.
```
7d94f6d3
Put in the word 'extern' for consistency's sake. · 6c45c680
Danny Auble authored 7 years ago

6c45c680

Handle file deletion for purge_old_job() in a separate thread. · b9719be2

Tim Wickberg authored 7 years ago

File deletion can be slow, especially when StateSaveLocation is on
NFS or other network filesystems. Since purge_old_job() holds all
the slurmctld write locks, this is especially performance sensitive.

Moving this to an independent thread lets the slower filesystem
cleanup happen without owning these locks. purge_old_job() then
results in the purged job ids being queued in the purge_list.

A race with the job id potentially wrapping around again is already
prevented by _dup_job_file_test() in get_next_job_id().

Bug 3763.

b9719be2
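
The hand-off described in b9719be2 can be sketched as a producer/consumer pair. This is a minimal illustration in Python, not slurmctld's C implementation: `purge_queue`, `purge_worker`, `state_lock`, and the `job.<id>` file layout are hypothetical names chosen for the example. The point it shows is that the caller only enqueues a job id while holding the lock, and the slow file removal happens in a thread that never takes that lock.

```python
import os
import queue
import tempfile
import threading

# Hypothetical stand-in for the purge list named in the commit message.
purge_queue = queue.Queue()
_STOP = object()  # sentinel telling the worker thread to exit


def purge_old_job(job_id, state_lock):
    """Fast path: runs under the lock and only records the job id."""
    with state_lock:
        purge_queue.put(job_id)


def purge_worker(state_dir, deleted):
    """Slow path: removes job files without ever holding state_lock."""
    while True:
        job_id = purge_queue.get()
        if job_id is _STOP:
            break
        path = os.path.join(state_dir, f"job.{job_id}")  # assumed layout
        try:
            os.remove(path)
            deleted.append(job_id)
        except FileNotFoundError:
            pass  # already gone, nothing to clean up


def demo():
    """Create three fake job files, purge them, and report what happened."""
    state_lock = threading.Lock()
    deleted = []
    with tempfile.TemporaryDirectory() as state_dir:
        for job_id in (1, 2, 3):
            open(os.path.join(state_dir, f"job.{job_id}"), "w").close()
        worker = threading.Thread(target=purge_worker, args=(state_dir, deleted))
        worker.start()
        for job_id in (1, 2, 3):
            purge_old_job(job_id, state_lock)
        purge_queue.put(_STOP)
        worker.join()
        remaining = os.listdir(state_dir)
    return deleted, remaining
```

Calling `demo()` returns the ids the worker removed and whatever is left in the temporary directory. A single worker draining a FIFO queue keeps deletions in submission order.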

Make _delete_job_details a static function. · ce2cd1b2
Tim Wickberg authored 7 years ago
```
Only called from _list_delete_job once the MinJobAge has
passed.
```
ce2cd1b2

Remove timeout code from job_purge_old. · 843e5d38

Tim Wickberg authored 7 years ago

This will need to be handled differently. The timeout can
lead to the purge process falling further and further behind
on high throughput systems if the number of job scripts that
can be deleted within a second is lower than the job submission
and completion rate of the cluster, eventually leading to
the MaxJobCount limit being reached.

Bug 3763.

843e5d38
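
The failure mode described in 843e5d38 is simple rate arithmetic: if jobs finish faster than their scripts can be deleted, the backlog grows linearly until it hits the job-count limit. The sketch below is a simplified model with made-up rates. The function name and parameters are hypothetical, and it assumes that records awaiting purge count against the limit.

```python
def seconds_until_limit(finish_rate, purge_rate, max_job_count, backlog=0):
    """Seconds until the purge backlog reaches max_job_count.

    finish_rate and purge_rate are whole jobs per second. Returns None
    when the purge keeps up and the limit is never reached.
    """
    if backlog >= max_job_count:
        return 0
    growth = finish_rate - purge_rate  # net backlog growth per second
    if growth <= 0:
        return None
    remaining = max_job_count - backlog
    return -(-remaining // growth)  # ceiling division on integers
```

For example, with 500 jobs finishing per second, 300 deletions per second, and an illustrative limit of 10000, the backlog grows by 200 per second and reaches the limit after 50 seconds. With the two rates swapped, the function returns `None`.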

Better commit from last · cff4e661
Danny Auble authored 7 years ago

cff4e661
Update test to print warning about rxvt/aterm ignoring SIGFPE. · 3b20bc1d
Danny Auble authored 7 years ago

3b20bc1d

May 31, 2017
- Fix test to print out the slurmd node name instead of gethostname so · 9f2b3ffe
  Danny Auble authored 7 years ago
  
  it works better on multi-slurmd installs.
  9f2b3ffe
- Fix test · a036fa5a
  Isaac Hartung authored 7 years ago
  
  Should be fed1,2,3 and not fed1,2,2
  a036fa5a
- Add test to verify federated scontrol notify · b06726b5
  Isaac Hartung authored 7 years ago
  
  Bug 3839
  b06726b5
- Docs - clarify performance issues from ConstrainRAMSpace. · 2e833147
  Tim Wickberg authored 7 years ago
  
  Revert some of my b50f4661. Elaborate on tradeoffs, and point to the HTC page as well, which is a better location for this info.
  2e833147
- Add warning about libcurl-devel not being installed during configure. · 0e582365
  Danny Auble authored 7 years ago
  
  0e582365
- Merge branch 'fed_async' · 4a62dac3
  Brian Christiansen authored 7 years ago
  
  4a62dac3
- Docs - remove reference to ConstrainRAMSpace in HTC. · b50f4661
  Tim Wickberg authored 7 years ago
  
  This is better discussed in the high_throughput.shtml doc. Also, "Contrain" is misspelled, adding to the confusion.
  b50f4661
- Enable --cluster-constraints=!<features> · 02c126ac
  Isaac Hartung authored 7 years ago
  
  Allows submitting sibling jobs to clusters that don't have the specified features. Bug 3859.
  02c126ac
- Change test to use cluster variables · d6a3ee07
  Isaac Hartung authored 7 years ago
  
  d6a3ee07
- Fix spelling · ce91d533
  Brian Christiansen authored 7 years ago
  
  ce91d533
- test37.7 to test local jobs after cluster removed · be7ff6c6
  Isaac Hartung authored 7 years ago
  
  Bug 3640
  be7ff6c6
- Update test37.7 with fed drain/remove updates · d917d5e9
  Brian Christiansen authored 7 years ago
  
  d917d5e9
- Improve process of draining fed cluster · cfb7a484
  Brian Christiansen authored 7 years ago
  
  Instead of waiting for all jobs to clear out of the system, wait until all jobs are completed. This way you don't have to wait as long for the cluster to be drained and/or removed.
  cfb7a484
- Fix fed communications to talk on siblings proto · 398c4d9f
  Brian Christiansen authored 7 years ago
  
  Clusters in the federation could be running different rpc_versions, so each cluster needs to talk the other's language.
  398c4d9f
- Make scontrol notify work in a federation · 7bebc4f8
  Brian Christiansen authored 7 years ago
  
  Routes request to origin cluster if it isn't running the job.
  7bebc4f8
- Pull locks out of function · 2c924ef3
  Brian Christiansen authored 7 years ago
  
  to give more flexibility.
  2c924ef3
- Route requeue message to origin cluster · 472d52b5
  Brian Christiansen authored 7 years ago
  
  if a federated job.
  472d52b5
- Extract common func to determine routing to origin · 51b0d55c
  Brian Christiansen authored 7 years ago
  
  51b0d55c
- Move rerouting of msg to deeper level · ba46ffbe
  Brian Christiansen authored 7 years ago
  
  Move from slurm_send_recv_controller_rc_msg to slurm_send_recv_controller_msg. This allows scontrol requeue to be rerouted to the origin.
  ba46ffbe
- Remove outdated comments · db7d7507
  Brian Christiansen authored 7 years ago
  
  db7d7507
- Add test37.13 for validating federated arrays · 45a2e65a
  Brian Christiansen authored 7 years ago
  
  45a2e65a
- Spelling fix. · de0061b6
  Tim Wickberg authored 7 years ago
  
  de0061b6
- Merge branch 'slurm-17.02' · 816d0d60
  Tim Wickberg authored 7 years ago
  
  816d0d60
- Prevent segfault in sacctmgr due to bad handling of return code. · 15276c01
  Tim Shaw authored 7 years ago
  
  Bug 3840.
  15276c01
- Fix NEWS line from commit 56ea068c. · 1503bdcc
  Tim Shaw authored 7 years ago
  
  1503bdcc
- Fix another build issue with elasticsearch. · 5940a5f8
  Tim Wickberg authored 7 years ago
  
  Another issue inadvertently caused by 8d10ebf5.
  5940a5f8
- Remove used variable. · 266725ce
  Tim Wickberg authored 7 years ago
  
  Fixes build for elasticsearch plugin, inadvertently broken by commit 8d10ebf5.
  266725ce
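
The negated form introduced in 02c126ac (`--cluster-constraints=!<features>`) can be pictured as a filter over the federation's clusters. The following is a hedged sketch in Python, not Slurm's code. It assumes that a plain constraint selects clusters having at least one listed feature and that the `!` form selects clusters having none of them. `eligible_clusters` and the feature names are hypothetical.

```python
def eligible_clusters(cluster_features, constraint):
    """Return the names of clusters that satisfy a constraint string.

    cluster_features maps a cluster name to its set of features.
    constraint is "a,b" (assumed: cluster has at least one of them)
    or "!a,b" (assumed: cluster has none of them).
    """
    negate = constraint.startswith("!")
    spec = constraint[1:] if negate else constraint
    wanted = {feature for feature in spec.split(",") if feature}
    selected = []
    for name, features in cluster_features.items():
        has_any = bool(wanted & set(features))
        if has_any != negate:  # keep matches, or non-matches when negated
            selected.append(name)
    return sorted(selected)
```

With clusters `{"fed1": {"gpu"}, "fed2": {"knl"}, "fed3": set()}`, the constraint `"gpu"` selects only `fed1`, while `"!gpu"` selects `fed2` and `fed3`.
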
May 30, 2017
- Merge branch 'slurm-17.02' · a2344915
  Morris Jette authored 7 years ago
  
  a2344915
- don't clear GRES from non-KNL node · 56ea068c
  Tim Shaw authored 7 years ago
  
  node_features/knl_cray plugin: Don't clear configured GRES from non-KNL node. Bug 3768.
  56ea068c
- Cosmetic changes to cray/ccm driver · b1d03680
  Morris Jette authored 7 years ago
  
  b1d03680