Commits · 06ea19c47dd982851307fcaaa9ba48a2549c3b85 · tud-zih-energy / Slurm

Mar 03, 2015
- Fix associations not getting default qos set until after a restart. · 06ea19c4
  Brian Christiansen authored 10 years ago
  
  Bug 1492
- Abort I/O for debugged app launch fail · 49770e20
  Morris Jette authored 10 years ago
  
  For a job running under a debugger, if the exec of the task fails, then cancel its I/O and abort immediately rather than waiting 60 seconds for the I/O timeout.
Mar 02, 2015
- Change the level of debug messages. · 971d0021
  David Bigagli authored 10 years ago
- Correct the initialization of QOS MinCPUs per job limit. · 862cc80b
  David Bigagli authored 10 years ago
- minor doc update · 94149da1
  Danny Auble authored 10 years ago
- update meetings · c70d5091
  Danny Auble authored 10 years ago
Feb 27, 2015
- Update sched plugin web description · eee7bf80
  Nicolas Joly authored 10 years ago
  
  Add missing arguments to slurm_sched_p_newalloc/slurm_sched_p_freealloc documentation.
- Use consistent case style for job accounting fields description. · a5179e9a
  Nicolas Joly authored 10 years ago
- Small typo in sacct man page. · 18a82a34
  Nicolas Joly authored 10 years ago
- Cosmetic mods, no change in logic · 60841159
  Morris Jette authored 10 years ago
- Fix job getting EligibleTime set before meeting dependency requirements. · ab773f65
  Brian Christiansen authored 10 years ago
  
  Bug 1476
Feb 26, 2015
- Account all CPUs to the batch steps. · cc8c2e3e
  David Bigagli authored 10 years ago
- task/affinity clean up · 7b313990
  Morris Jette authored 10 years ago
  
  Improved logging and some code restructuring. No change in logic.
Feb 25, 2015
- Revert "Remove unused variable." · 663ec8f2
  David Bigagli authored 10 years ago
  
  This reverts commit e24a418b.
- Remove unused variable. · e24a418b
  David Bigagli authored 10 years ago
- Add job_submit build instructions · ee90e55a
  Morris Jette authored 10 years ago
- select/alps - Reverse .my.cnf search order · 96363d42
  Morris Jette authored 10 years ago
  
  This is a variation on commit 5391b8cc: check $HOME/.my.cnf last rather than first, to follow a more standard search order.
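Taken together, 5391b8cc (below) and this commit describe a fixed list of candidate files followed by $HOME/.my.cnf. A minimal C sketch of that ordering, assuming the candidate list quoted in the 5391b8cc discussion and assuming the first readable file wins; `cnf_search_list()` and `find_my_cnf()` are hypothetical helpers, not the select/alps code:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CNF_PATH_MAX 4096
#define CNF_MAX_CANDIDATES 8

/* System-wide candidates in search order. Assumed from the
 * default_conf_paths[] array quoted in commit 5391b8cc, with the
 * proposed /root/.my.cnf entry included. */
static const char *default_conf_paths[] = {
	"/root/.my.cnf",
	"/etc/my.cnf",
	"/etc/opt/cray/MySQL/my.cnf",
	"/etc/mysql/my.cnf",
	NULL
};

/* Hypothetical helper: fill out[] with candidate paths in search order,
 * appending $HOME/.my.cnf last. Returns the number of entries written. */
int cnf_search_list(const char *home, char out[][CNF_PATH_MAX], int max)
{
	int n = 0;

	for (int i = 0; default_conf_paths[i] && n < max; i++)
		snprintf(out[n++], CNF_PATH_MAX, "%s", default_conf_paths[i]);
	if (home && n < max)
		snprintf(out[n++], CNF_PATH_MAX, "%s/.my.cnf", home);
	return n;
}

/* Hypothetical helper: copy the first readable candidate into buf and
 * return buf, or return NULL if no candidate is readable (assumes
 * first-match-wins semantics). */
char *find_my_cnf(char *buf, size_t len)
{
	static char list[CNF_MAX_CANDIDATES][CNF_PATH_MAX];
	int n = cnf_search_list(getenv("HOME"), list, CNF_MAX_CANDIDATES);

	for (int i = 0; i < n; i++) {
		if (access(list[i], R_OK) == 0) {
			snprintf(buf, len, "%s", list[i]);
			return buf;
		}
	}
	return NULL;
}
```

With HOME=/home/alice the list ends with /home/alice/.my.cnf, so under this assumption a per-user file is used only when none of the system-wide files is readable. The commits do not say whether Slurm stops at the first match or reads several files.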
Feb 24, 2015
- Fix sprio showing wrong priority for job arrays until priority is recalculated. · 423029d8
  Brian Christiansen authored 10 years ago
  
  Bug 1469
- cray/basil: read MySQL credentials from /root/.my.cnf · 5391b8cc
  Nina Suvanphim authored 10 years ago
  
  The /root/.my.cnf file would typically contain the login credentials for root. If those are needed for Slurm, then it should be checking that directory.
  
  (In reply to Nina Suvanphim from comment #0)
  > ...
  > const char *default_conf_paths[] = {
  >         "/root/.my.cnf",    <------- add this line
  >         "/etc/my.cnf",
  >         "/etc/opt/cray/MySQL/my.cnf",
  >         "/etc/mysql/my.cnf",
  >         NULL };
  
  I'll also note that typically the $HOME/.my.cnf file would be checked last rather than first.
- Add some SUG photo links · 3d67a89a
  Morris Jette authored 10 years ago
- Fix code for Apple computers, where SOL_TCP is not defined · ac0343be
  Danny Auble authored 10 years ago
- Fix wrong variables used in the wrapper functions needed for systems that don't support strong_alias · 8d0c9901
  Danny Auble authored 10 years ago
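The diff for ac0343be is not shown here. A common portable approach, sketched below as an assumption rather than the actual change, is to fall back to IPPROTO_TCP, the POSIX protocol level for TCP socket options, when the Linux-specific SOL_TCP macro is missing. `set_tcp_nodelay()` is a hypothetical helper used only for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* macOS and other BSD-derived systems may not define the Linux-specific
 * SOL_TCP; IPPROTO_TCP is the portable level for TCP socket options. */
#ifndef SOL_TCP
#define SOL_TCP IPPROTO_TCP
#endif

/* Hypothetical helper: disable Nagle's algorithm on a TCP socket.
 * Returns 0 on success, or -1 with errno set on failure. */
int set_tcp_nodelay(int fd)
{
	int on = 1;

	return setsockopt(fd, SOL_TCP, TCP_NODELAY, &on, sizeof(on));
}
```

On Linux, SOL_TCP and IPPROTO_TCP both have the value 6, so the fallback does not change behavior there.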
Feb 20, 2015

Dorian Krause authored 10 years ago

We came across the following error message in the slurmctld logs when
using non-consumable resources:

error: gres/potion: job 39 dealloc of node node1 bad node_offset 0 count is 0

The error comes from _job_dealloc():

    ... node_gres_data=0x7f8a18000b70, node_offset=0, gres_name=0x1999e00 "potion", job_id=46, node_name=0x1987ab0 "node1") at gres.c:3980
    ... (job_gres_list=0x199b7c0, node_gres_list=0x199bc38, node_offset=0, job_id=46, node_name=0x1987ab0 "node1") at gres.c:4190
    ... job_ptr=0x19e9d50, pre_err=0x7f8a31353cb0 "_will_run_test", remove_all=true) at select_linear.c:2091
    ... bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, max_share=1, req_nodes=1, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40) at select_linear.c:3176
    ... bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0) at select_linear.c:3390
    ... bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0) at node_select.c:588
    ... avail_bitmap=0x7f8a2f910d38, min_nodes=1, max_nodes=1, req_nodes=1, exc_core_bitmap=0x0) at backfill.c:367

The cause of this problem is that _node_state_dup() in gres.c does not
duplicate the no_consume flag.
The cr_ptr passed to _rm_job_from_nodes() is created with _dup_cr()
which calls _node_state_dup().

Below is a simple patch to fix the problem. A "future-proof" alternative
might be to memcpy() from gres_ptr to new_gres and
only handle pointers separately.

33c48ac5
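The "future-proof" alternative suggested in the report can be sketched as follows: memcpy() the whole structure so every scalar field, including no_consume, is copied, then give the copy its own heap buffers. This is a minimal sketch on a simplified, hypothetical struct; only the no_consume flag and the duplicate-then-fix-pointers idea come from the report, and the real gres.c structure and the fix in 33c48ac5 may differ:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-in for the per-node GRES state; the
 * real gres.c structure has many more fields. */
typedef struct {
	uint64_t gres_cnt_avail;
	uint64_t gres_cnt_alloc;
	bool no_consume;        /* the flag the original dup forgot */
	int topo_cnt;
	uint64_t *topo_cnt_arr; /* heap buffer, needs its own copy */
} node_gres_state_t;

/* Copy all scalar fields at once so flags added later are duplicated
 * automatically, then deep-copy each pointer member. */
node_gres_state_t *node_state_dup(const node_gres_state_t *src)
{
	node_gres_state_t *dst = malloc(sizeof(*dst));

	if (!dst)
		return NULL;
	memcpy(dst, src, sizeof(*dst));
	dst->topo_cnt_arr = NULL; /* never share the source's buffer */
	if (src->topo_cnt > 0 && src->topo_cnt_arr) {
		size_t bytes = (size_t)src->topo_cnt * sizeof(uint64_t);

		dst->topo_cnt_arr = malloc(bytes);
		if (!dst->topo_cnt_arr) {
			free(dst);
			return NULL;
		}
		memcpy(dst->topo_cnt_arr, src->topo_cnt_arr, bytes);
	}
	return dst;
}

void node_state_free(node_gres_state_t *s)
{
	if (s) {
		free(s->topo_cnt_arr);
		free(s);
	}
}
```

Clearing the copied pointer before deep-copying ensures the duplicate never shares a buffer with the original, so freeing both structures cannot release the same memory twice.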

Feb 19, 2015
- Load lua-5.2 library if using lua5.2 for lua job submit plugin. · 408c108e
  Brian Christiansen authored 10 years ago
  
  Bug 1471
- Remove vestigial/wrong documentation · 2e7bed24
  Morris Jette authored 10 years ago
  
  "If you specify a maximum node count and the host list contains more nodes, the extra node names will be silently ignored." Not so.
- MySQL - Fix potential issue when PrivateData=Usage and a normal user runs certain sreport reports. · 9a03f2a5
  Danny Auble authored 10 years ago
Feb 18, 2015
- Add SLURM_JOB_GPUS to Prolog · 2e95c20b
  Morris Jette authored 10 years ago
  
  Add the SLURM_JOB_GPUS environment variable to those available in Prolog. Also add a list of the environment variables available in the various prologs and epilogs to the web page. Bug 1458
- Print FAIR_TREE in "scontrol show config" output for PriorityFlags. · 27eef95d
  Brian Christiansen authored 10 years ago
Feb 17, 2015
- BGQ - Close very small window where a step could have been removed before the runjob happened, and the step was part of an array · c169c935
  Danny Auble authored 10 years ago
  
  This is an addition to commit 49e0f5f2.
- BGQ - Fix issue with job arrays not being handled correctly in the runjob_mux plugin. · 49e0f5f2
  Danny Auble authored 10 years ago
- Update NEWS · 6984348d
  Brian Christiansen authored 10 years ago
  
  Bug 1461 Commit: 2e2d924e
- Prevent slurmdbd abort if node DOWN with NULL reason · 2e2d924e
  Morris Jette authored 10 years ago
  
  See bug 1461
Feb 13, 2015

- Fix squeue. · c13e8540
  David Bigagli authored 10 years ago
- Avoid triggering accounting if node state unchanged · 23f84ace
  Morris Jette authored 10 years ago
  
  If a call was made to change a node's state to the state it was already in, and to set its reason to the value it already had, an accounting record was still generated. If a script (say, NodeHealthCheck) repeatedly sets a node's state (say, DRAIN), it could generate a huge number of redundant accounting records. This change eliminates those redundant records.
  Related to bug 1437.
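The check this commit describes amounts to comparing the requested state and reason with the current ones, and reporting a change (and hence writing an accounting record) only when either differs. A minimal sketch, assuming a simplified, hypothetical node record; `node_update()` is not a Slurm function. The sketch treats a NULL reason as an empty string so it can be compared safely:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified node record (not Slurm's node_record). */
typedef struct {
	uint32_t state;     /* e.g. a DRAIN bit */
	char reason[128];
} node_rec_t;

/* Compare two reasons, treating NULL the same as "". */
static bool reason_equal(const char *a, const char *b)
{
	if (!a)
		a = "";
	if (!b)
		b = "";
	return strcmp(a, b) == 0;
}

/* Apply a state/reason update. Returns true only if something actually
 * changed, i.e. when an accounting record should be generated. */
bool node_update(node_rec_t *node, uint32_t new_state, const char *new_reason)
{
	if (node->state == new_state && reason_equal(node->reason, new_reason))
		return false; /* unchanged: skip the redundant record */
	node->state = new_state;
	snprintf(node->reason, sizeof(node->reason), "%s",
		 new_reason ? new_reason : "");
	return true;
}
```

Commit 1685ba56 below applies the equivalent check on the MySQL side, skipping the insert into the event table when the state and reason are unchanged.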

Feb 12, 2015
- Start v14.11.5 NEWS file · 4531ab3f
  Morris Jette authored 10 years ago
- Update META for v14.11.4 tag · 1b2c8e18
  Morris Jette authored 10 years ago
- Fix perlapi tests for libslurm perl module. · ea7a0c7c
  Brian Christiansen authored 10 years ago
- Fix issue with "sreport cluster AccountUtilizationByUser" when using PrivateData=users. · 37b56085
  Brian Christiansen authored 10 years ago
  
  Bug 1446
Feb 11, 2015
- MySQL - If a node state and reason are the same on a node state change, don't insert a new row in the event table. · 1685ba56
  Danny Auble authored 10 years ago
Feb 10, 2015
- Additional fix to 50e0c84f. · 50b43afd
  Brian Christiansen authored 10 years ago
  
  UIDs are 0 when associations are loaded.