Commits · 9b81dec42b6202521741e32ea1b92a90d83ad4ef · tud-zih-energy / Slurm

Jul 21, 2016
- display job array tasks in x-y:z format only when there are more than 5 tasks · 9b81dec4
  Dominik Bartkiewicz authored 8 years ago
  
  bug 2837
  9b81dec4
- Merge branch 'slurm-16.05' · 02d092b5
  Morris Jette authored 8 years ago
  
  02d092b5
- Add SLURM_ARRAY_TASK_COUNT env var · 8dede1aa
  Morris Jette authored 8 years ago
  
  Add SLURM_ARRAY_TASK_COUNT environment variable. Total number of tasks in a job array (e.g. "--array=2,4,8" will set SLURM_ARRAY_TASK_COUNT=3). bug 2150
  8dede1aa
- Add README info about some contribs files · a307e1ae
  Morris Jette authored 8 years ago
  
  a307e1ae
- Merge branch 'slurm-16.05' · 71ccb1e3
  Morris Jette authored 8 years ago
  
  71ccb1e3
- Treat invalid user ID in AllowUserBoot as error · 59e66700
  Morris Jette authored 8 years ago
  
  Treat invalid user ID in AllowUserBoot option of knl.conf file as error rather than fatal (log and do not exit).
  59e66700
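
As a rough illustration of what SLURM_ARRAY_TASK_COUNT (commit 8dede1aa above) reports, the sketch below counts the tasks described by an `--array` expression. It is a Python approximation and not Slurm's own C parser: the function names `expand_array_spec` and `array_task_count` are hypothetical, and only comma lists, `first-last` ranges, an optional `:step`, and an optional `%limit` suffix are handled.

```python
def expand_array_spec(spec):
    """Expand an sbatch-style --array expression into sorted task IDs.

    Assumption: only "a,b,c", "first-last", "first-last:step" and a
    trailing "%limit" are supported; no validation against MaxArraySize.
    """
    # A "%N" suffix only limits how many tasks run at once; it does not
    # change how many tasks exist, so it is dropped before counting.
    spec = spec.split("%", 1)[0]
    task_ids = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        rng, _, step = part.partition(":")
        first, dash, last = rng.partition("-")
        start = int(first)
        stop = int(last) if dash else start
        stride = int(step) if step else 1
        task_ids.update(range(start, stop + 1, stride))
    return sorted(task_ids)


def array_task_count(spec):
    """Number of tasks in the array, i.e. the value the env var would carry."""
    return len(expand_array_spec(spec))
```

With the example from the commit message, `array_task_count("2,4,8")` gives 3; a stepped range such as `"0-15:4"` expands to tasks 0, 4, 8 and 12.
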
Jul 20, 2016
- Merge branch 'slurm-16.05' · f91d135b
  Morris Jette authored 8 years ago
  
  f91d135b
- Prevent slurmctld abort on kill of job waiting node reboot · 1aa7af7d
  Morris Jette authored 8 years ago
  
  Prevent slurmctld abort if job is killed or requeued while waiting for reboot of its allocated compute nodes. The _wait_boot() function would reference job_ptr->node_bitmap, which would be NULL.
  1aa7af7d
- Merge remote-tracking branch 'origin/slurm-16.05' · 5a6488d2
  Danny Auble authored 8 years ago
  
  5a6488d2
- Fixed race condition in PMIx Fence logic · cf6733be
  Boris Karasev authored 8 years ago
  
  Bug 2908
  cf6733be
- Continuation of commit 65b4f283 · 71ddc0a5
  Danny Auble authored 8 years ago
  
  71ddc0a5
- Prevent segfault when attempting to cleanup a SLURM_PENDING_STEP. · 3b914e5b
  Tim Wickberg authored 8 years ago
  
  Step hasn't been assigned resources, so the select_jobinfo struct hasn't yet been populated. Calling select_g_step_finish will dereference it, causing a segfault. Bug 2922.
  3b914e5b
- Add burst buffer job array test · 7dd26078
  Morris Jette authored 8 years ago
  
  7dd26078
- Merge branch 'slurm-16.05' · 74cd7acc
  Morris Jette authored 8 years ago
  
  74cd7acc
Jul 19, 2016

- Add routing queue info to Slurm FAQ web page · f88119ff
  Morris Jette authored 8 years ago
  
  f88119ff
- Show running jobs from all to be deleted clusters · d04f2d21
  
  Allow "sacctmgr delete cluster" to show running jobs on multiple clusters when attempting to delete clusters with running jobs.
  
  Freeing "object" when jobs were already found on other clusters prevents _check_jobs_before_remove_assoc() from selecting jobs from the cluster, because "cluster_name" will be NULL.
  d04f2d21
- Fix small mem leak when deleting clusters from db · eb18e53a
  Brian Christiansen authored 8 years ago
  
  eb18e53a
- Fix invalid read when deleting cluster from db. · 401886ab
  Brian Christiansen authored 8 years ago
  
  Happens when there are running jobs on the cluster.
  401886ab
- Don't call extra func unless there is work to do · 6aae9b76
  Brian Christiansen authored 8 years ago
  
  _process_running_jobs_result() only does something if there were results returned.
  6aae9b76
- Fix small mem leak when job fails to load state · 6c4df688
  Brian Christiansen authored 8 years ago
  
  6c4df688
- Fix some typos in comments and logs · 5a45503c
  Gennaro Oliva authored 8 years ago
  
  5a45503c
- Merge branch 'slurm-16.05' · 6977482d
  Morris Jette authored 8 years ago
  
  6977482d
- Improve partition AllowGroups caching · 7e381982
  Morris Jette authored 8 years ago
  
  If the user is now allowed to use the partition, then do not check that user's group access again for 5 seconds. bug 2913
  7e381982
- Improve partition AllowGroups caching · 98dc38b2
  Morris Jette authored 8 years ago
  
  Improve partition AllowGroups caching. Update the table of UIDs permitted to use a partition based upon its AllowGroups configuration parameter as new valid UIDs are found rather than looking up that user's group information for every job they submit, which can involve considerable overhead for some systems. bug 2913
  98dc38b2
- Merge branch 'slurm-16.05' · c4835a73
  Morris Jette authored 8 years ago
  
  c4835a73
- Minimize preempted jobs · b9f17b18
  Morris Jette authored 8 years ago
  
  Minimize preempted jobs for configurations with multiple jobs per node. Previous logic would preempt every job on node allocated to pending job. bug 2906
  b9f17b18
- gres-flags=enforce-binding fix · 5df8509f
  Morris Jette authored 8 years ago
  
  Fix for core selection with job --gres-flags=enforce-binding option. Previous logic would in some cases allocate a job zero cores, resulting in slurmctld abort. bug 2808
  5df8509f

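The AllowGroups caching described in commits 98dc38b2 and 7e381982 above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the slurmctld implementation: the class name, the `group_lookup` callable and the injectable `clock` are all hypothetical, and the reading that the 5-second hold-off applies to users whose lookup just failed is an interpretation of the commit message rather than something the log confirms.

```python
import time


class PartitionAccessCache:
    """Cache of UIDs permitted to use one partition (illustrative only)."""

    RECHECK_SECONDS = 5.0  # hold-off taken from the commit message

    def __init__(self, group_lookup, clock=time.monotonic):
        # group_lookup: hypothetical callable, uid -> bool, standing in for
        # the (potentially expensive) group membership query.
        self._group_lookup = group_lookup
        self._clock = clock
        self._allowed_uids = set()   # grows as new valid UIDs are found
        self._last_denied = {}       # uid -> time of the last failed lookup

    def is_allowed(self, uid):
        # Known-good UIDs never trigger another group lookup.
        if uid in self._allowed_uids:
            return True
        now = self._clock()
        denied_at = self._last_denied.get(uid)
        # Assumed behaviour: skip the lookup if this UID failed very recently.
        if denied_at is not None and now - denied_at < self.RECHECK_SECONDS:
            return False
        if self._group_lookup(uid):
            self._allowed_uids.add(uid)
            self._last_denied.pop(uid, None)
            return True
        self._last_denied[uid] = now
        return False
```

The point of the design is that a user submitting many jobs in a burst costs at most one group lookup, whether or not that user turns out to be permitted.
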
Jul 18, 2016
- Improve GRES log format · b5e54e11
  Morris Jette authored 8 years ago
  
  Add some indentation so that GRES topology-specific information logged is more readable.
  b5e54e11
- Merge branch 'slurm-16.05' · 5115dabf
  Morris Jette authored 8 years ago
  
  5115dabf
- Select/cons_res memory corruption fix · c06db0de
  Morris Jette authored 8 years ago
  
  A job allocation selecting nodes and no cores/CPUs could write off the end of arrays and corrupt memory. Now to figure out how the logic reached this point in the first place. bug 2808
  c06db0de
- Add SLUGM16 dinner info · 6dc074c8
  Morris Jette authored 8 years ago
  
  6dc074c8
Jul 16, 2016

- Add SLURM_PENDING_STEP id so it won't be confused with SLURM_EXTERN_CONT. · 0c7bd6d0
  Danny Auble authored 8 years ago
  
  In commit b8190e5d many places that were meant to be pending step ids were changed to be extern_step id. The main problem was when we came up with the idea of the extern step we reused -1 (INFINITE) for the id. So pending steps also appeared to be extern steps as well. Hopefully this fixes the situation. Bug 2907
  0c7bd6d0
- Merge branch 'slurm-16.05' · b8705d7f
  Morris Jette authored 8 years ago
  
  b8705d7f
- Remove vestigial comment · 71800937
  Morris Jette authored 8 years ago
  
  71800937
- Move startup of power save thread · fb8e3558
  Morris Jette authored 8 years ago
  
  Start power save thread only after the partition information is read in order to avoid trying to interpret the SuspendExcParts configuration information before the partition information is available, which would result in a slurmctld abort.
  fb8e3558
- Prevent slurmctld race condition · c7cae55b
  Morris Jette authored 8 years ago
  
  Do not try to access part_list variable (partition list pointer) if not yet initialized. Return NULL pointer rather than aborting with NULL pointer.
  c7cae55b

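The ID collision described in commit 0c7bd6d0 above can be illustrated with a small sketch. It is written in Python rather than Slurm's C, the helper `describe_step` is hypothetical, and the distinct value chosen for the pending step is for illustration only; the only detail taken from the commit message is that both IDs originally shared -1 (INFINITE).

```python
INFINITE = 0xFFFFFFFF  # -1 when stored as an unsigned 32-bit step id

# Before the fix: both special steps reused the same sentinel.
OLD_PENDING_STEP = INFINITE
OLD_EXTERN_CONT = INFINITE

# After the fix: each special step has its own id (values illustrative).
SLURM_PENDING_STEP = 0xFFFFFFFD
SLURM_EXTERN_CONT = 0xFFFFFFFF


def describe_step(step_id, pending_id, extern_id):
    """Classify a step id against the given sentinel values."""
    if step_id == extern_id:
        return "extern"
    if step_id == pending_id:
        return "pending"
    return "step %d" % step_id


# With a shared sentinel a pending step is reported as an extern step.
old = describe_step(OLD_PENDING_STEP, OLD_PENDING_STEP, OLD_EXTERN_CONT)
# With distinct sentinels the two cases can be told apart.
new = describe_step(SLURM_PENDING_STEP, SLURM_PENDING_STEP, SLURM_EXTERN_CONT)
```

Here `old` comes out as "extern" even though a pending step was passed in, while `new` is "pending".
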
Jul 15, 2016
- Fix spelling of hierarchy in comments · 4f3a0a02
  Tim Wickberg authored 8 years ago
  
  4f3a0a02
- Do not schedule powered down nodes in FAILED state · 310de98d
  Jacek Budzowski authored 8 years ago
  
  bug 2900
  310de98d
- Remove unnecessary test for super user in regression test · 2a7d01a5
  Nicolas Joly authored 8 years ago
  
  2a7d01a5
- Cleanup generated files if test cannot run due to inappropriate conditions. · b9abe288
  Nicolas Joly authored 8 years ago
  
  b9abe288