Commits · cd95423ee6000809afe14a80137b64e96655fdbe · tud-zih-energy / Slurm

Jun 25, 2014
- Add more reservation logs · cd95423e
  Morris Jette authored 10 years ago
  
  cd95423e
- Expand backfill logging · 7a187f10
  Morris Jette authored 10 years ago
  
  Log more backfill scheduling actions, including where and when jobs will run, even if they cannot start until far in the future.
  7a187f10
- Correct comment describing a function · 92fb2a4b
  Morris Jette authored 10 years ago
  
  92fb2a4b
- simpler code · ed223bbe
  Danny Auble authored 10 years ago
  
  ed223bbe
- Fix issue with sacctmgr 'load' not able to gracefully handle a badly formatted · dad8bd76
  Remi Palancher authored 10 years ago
  
  file.
  dad8bd76
- Make sure we lock on the conf when sending slurmd's conf to the slurmstepd. · 354e4311
  Bill Brophy authored 10 years ago
  
  If we don't, it could get messed up on a reconfig.
  354e4311
- If a job's partition was taken away from it, don't allow a requeue. · 566b5a50
  Danny Auble authored 10 years ago
  
  566b5a50
- Minor reword of comment · 2e4ca6ca
  Danny Auble authored 10 years ago
  
  2e4ca6ca
- Print the state of requeued job as REQUEUED. · e58e7ea3
  David Bigagli authored 10 years ago
  
  e58e7ea3
- When a job is requeued make sure accounting marks it as such. · 2f6f3e09
  Danny Auble authored 10 years ago
  
  2f6f3e09
- Improve accuracy of a log message · 464abac8
  Morris Jette authored 10 years ago
  
  464abac8
- Remove some vestigial logging function arguments · 230fa908
  Morris Jette authored 10 years ago
  
  230fa908
- fix gres topo array index · 369eb585
  Morris Jette authored 10 years ago
  
  Logic used to identify cores which are usable with each GRES was incorrect. Bug 905.
  369eb585
- Fix issue where association maxnodes wouldn't be evaluated correctly if a · cecd5588
  Danny Auble authored 10 years ago
  
  QOS had a GrpNodes set.
  cecd5588
- Fix header of file to represent the correct file · 133f5359
  Danny Auble authored 10 years ago
  
  133f5359
Jun 24, 2014

- core reservation distribution fix · aeecd03c
  Morris Jette authored 10 years ago
  
  Fix for core-based advanced reservations where the distribution of cores
  across nodes is not even. Failing test case:
  system has 10 nodes, 1 of which is fully occupied
  create reservation with 9 nodes and 10 cores
  always would fail with "busy nodes" error
  aeecd03c
- Purely cosmetic changes · 339753a1
  Morris Jette authored 10 years ago
  
  339753a1
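
The failing case described in aeecd03c asks for 10 cores on 9 nodes, which cannot be split evenly: at least one node has to contribute 2 cores. A minimal sketch of that arithmetic follows, assuming a simple policy in which the first nodes absorb the remainder. The function `cores_for_node` and its parameters are hypothetical and are not taken from Slurm's reservation code.

```c
#include <assert.h>

/*
 * Hypothetical helper (not Slurm code): spread total_cores over node_cnt
 * nodes as evenly as possible and return the share of the node at
 * position node_inx (0-based). The first (total_cores % node_cnt) nodes
 * receive one extra core.
 */
int cores_for_node(int total_cores, int node_cnt, int node_inx)
{
	int base, extra;

	assert(total_cores >= 0);
	assert(node_cnt > 0);
	assert(node_inx >= 0 && node_inx < node_cnt);

	base  = total_cores / node_cnt;   /* cores every node contributes */
	extra = total_cores % node_cnt;   /* nodes that contribute one more */
	return base + (node_inx < extra ? 1 : 0);
}
```

With 10 cores and 9 nodes, node 0 contributes 2 cores and the remaining eight nodes contribute 1 each, so a check that expects the same count on every node would reject a request that is in fact satisfiable.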

Jun 20, 2014

- Fix in HDF5 for NULL hostname · 5520ec6a
  Morris Jette authored 10 years ago
  
  The hostname was being set as an HDF5 value before the field was
  set in slurmstepd, resulting in SEGV. This change sets hostname
  before the HDF5 call and also tests for NULL before trying to set
  the value. Backtrace of failure:
  (gdb) bt
  0  strlen () at ../sysdeps/x86_64/strlen.S:106
  1  0x00007f4af5bb756d in put_string_attribute (parent=33554432,
     name=0x7f4af5bb8591 "Node Name", value=0x0)
     at src/plugins/acct_gather_profile/hdf5/hdf5_api.c:1711
  2  0x00007f4af5bad224 in acct_gather_profile_p_node_step_start (job=0x194cc80)
     at src/plugins/acct_gather_profile/hdf5/acct_gather_profile_hdf5.c:372
  3  0x000000000052ca86 in acct_gather_profile_g_conf_set (tbl=0x194cc80)
     at src/common/slurm_acct_gather_profile.c:490
  4  0x000000000042e028 in batch_stepd_step_rec_create (msg=0x194d280)
     at src/slurmd/slurmstepd/slurmstepd_job.c:496
  5  0x0000000000426ae5 in mgr_launch_batch_job_setup (msg=0x194d280, cli=0x194bec0)
     at src/slurmd/slurmstepd/mgr.c:422
  6  0x00000000004263da in _step_setup (cli=0x194bec0, self=0x0, msg=0x194bd90)
     at slurmd/slurmstepd/slurmstepd.c:516
  7  0x0000000000424302 in main (argc=1, argv=0x7fff1f7c6c98)
      at src/slurmd/slurmstepd/slurmstepd.c:127
  5520ec6a
- Fix task/cgroup to handle -mblock:fcyclic correctly · 992ec094
  Matthieu Hautreux authored 10 years ago
  
  992ec094
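
The backtrace in 5520ec6a shows `strlen()` being reached with `value=0x0`, and the commit message says the fix tests for NULL before setting the value. The sketch below illustrates that kind of guard in isolation. `struct attr`, `ATTR_MAX`, and `set_string_attr` are hypothetical stand-ins invented for this example; they are not the HDF5 plugin's actual types or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ATTR_MAX 64	/* hypothetical fixed attribute size */

/* Hypothetical attribute record, standing in for a stored string value. */
struct attr {
	char value[ATTR_MAX];
	int  is_set;
};

/*
 * Store a string attribute. Returns 0 on success, or -1 if value is NULL,
 * in which case the attribute is left untouched. Without the NULL test,
 * strlen(NULL) would be undefined behavior and typically crashes.
 */
int set_string_attr(struct attr *a, const char *value)
{
	size_t len;

	assert(a != NULL);
	if (value == NULL)
		return -1;

	len = strlen(value);
	if (len >= ATTR_MAX)
		len = ATTR_MAX - 1;	/* truncate so the result stays terminated */
	memcpy(a->value, value, len);
	a->value[len] = '\0';
	a->is_set = 1;
	return 0;
}
```

Returning an error code instead of silently writing an empty string lets the caller decide whether a missing hostname is worth logging.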

Jun 19, 2014
- Print Slurm error string in scontrol update job and reset the Slurm · 8982fb6b
  David Bigagli authored 10 years ago
  
  errno before each call to the API.
  8982fb6b
- Correct job shared info · fad1f773
  jette authored 10 years ago
  
  Correct Shared field in job state information seen by scontrol, sview, etc.
  fad1f773
- Document when a job's shared field can change · 8855d4ad
  jette authored 10 years ago
  
  Only for pending jobs. Bug 899.
  8855d4ad
- If a srun runs in an exclusive allocation and doesn't use the entire · da86481d
  Danny Auble authored 10 years ago
  
  allocation and CR_PACK_NODES is set, lay out tasks appropriately. This is related to bug 890.
  da86481d
- fixes to the NEWS file · 692c179c
  Danny Auble authored 10 years ago
  
  692c179c
Jun 18, 2014
- Serialize sinfo as needed · 5351d393
  David Bigagli authored 10 years ago
  
  The sinfo command is parallelized for performance reasons, but it cannot be completely parallelized for some use cases. See bug 883.
  5351d393
- Merge branch 'slurm-2.6' into slurm-14.03 · 7e7814c0
  Morris Jette authored 10 years ago
  
  7e7814c0
Jun 17, 2014
- Correct CPU ID on Power7 · ebaa4366
  Morris Jette authored 10 years ago
  
  Correct logic to support Power7 processor with 1 or 2 threads per core (CPU IDs are not consecutive). Bug 891.
  ebaa4366
- Fix a couple of "make check" issues · e88b9899
  Morris Jette authored 10 years ago
  
  There was some logic copied from bitstring.c and an unused variable problem reported by the compiler.
  e88b9899
- Update META for v14.03.4-2 · 13c454fa
  Morris Jette authored 10 years ago
  
  13c454fa
- Fix for possible scontrol segv · 34fc03ba
  Morris Jette authored 10 years ago
  
  This is due to a bug introduced in commit 83d626ca. Some configurations could result in NULL names in the node list table (e.g. hidden partitions).
  34fc03ba
- Improve performance of "scontrol show job" · aaad924a
  Morris Jette authored 10 years ago
  
  Slowness introduced in commit 83d626ca.
  aaad924a
- Fix for shared partitions · 9f7a4250
  Morris Jette authored 10 years ago
  
  This reverts commit 0d6a9965. That patch would permit a job with shared resources to run on the same node as a job without shared resources. Unfortunately, it let those jobs share CPUs. Finer-grained sharing might be possible with extensive code changes, but it is not something to work on now.
  9f7a4250
- Merge branch 'slurm-14.03' of https://github.com/SchedMD/slurm into slurm-14.03 · 90f8688f
  jette authored 10 years ago
  
  90f8688f
- Fix for shared partitions · 0d6a9965
  jette authored 10 years ago
  
  Without this change, the job's --shared option when used with a partition configuration of Shared=YES was not being honored by the select/cons_res or select/serial plugin.
  0d6a9965
- Shared=, don't implicitly set job's shared field · 8672400e
  jette authored 10 years ago
  
  Original code was implicitly setting a job's shared field to 1 for select/cons_res.
  8672400e
- Partition Shared=YES fix · 16fbc7a7
  jette authored 10 years ago
  
  This reverts commit c773b750
  16fbc7a7
- Fix minor memory leak when reading in incomplete node data checkpoint file. · 34d59e1e
  Danny Auble authored 10 years ago
  
  34d59e1e
- Update sinfo.1 man page and NEWS. · 13d71ca5
  David Bigagli authored 10 years ago
  
  13d71ca5
- Enlarge the width specifier when printing partition SHARE to · f6b61804
  ggeorgakoudis authored 10 years ago
  
  display larger sharing values.
  f6b61804
- Print the details of the accepted connection if DebugFlags=Protocol. · 242d2628
  David Bigagli authored 10 years ago
  
  242d2628
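
The Power7 fix listed under Jun 17 (ebaa4366) notes that CPU IDs are not consecutive when only 1 or 2 threads per core are in use. A rough sketch of what such a numbering can look like follows, assuming each core reserves a fixed block of IDs and only the first few in each block are enabled. The function `cpu_id_for`, its parameters, and the stride of 4 used in the usage note are assumptions made for illustration, not values or logic taken from the Slurm source.

```c
#include <assert.h>

/*
 * Hypothetical mapping (not Slurm code): each core owns id_stride
 * consecutive CPU IDs, of which only the first threads_enabled are in
 * use. Returns the CPU ID of the given thread on the given core.
 */
int cpu_id_for(int core, int thread, int threads_enabled, int id_stride)
{
	assert(id_stride > 0);
	assert(threads_enabled > 0 && threads_enabled <= id_stride);
	assert(core >= 0);
	assert(thread >= 0 && thread < threads_enabled);

	return core * id_stride + thread;
}
```

Under these assumptions, a stride of 4 with 2 enabled threads gives IDs 0, 1, 4, 5, 8, 9, and so on, so code that assumes IDs run 0, 1, 2, 3 would address the wrong CPUs.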